Part two of three.

Two distinct forms of AI offer viable solutions to the challenges faced by data observability systems. Let’s review the value of both.

Predictive AI

The first form is Predictive AI, often referred to as a forecasting model. Forecasting poses a considerable challenge, requiring a deep understanding of probabilities derived from historical data and trend analysis. Fortunately, numerous models are available for deployment.

However, achieving effective data observability still proves difficult due to the dynamic nature of historical trends and the many influences acting on the system at per-minute, hourly, daily, quarterly, and yearly intervals. The predictive model must be highly adaptable, able to accommodate shifts in these trends. In addition, integrating predictive capabilities into a data observability platform requires careful selection and collection of metrics aligned with the intended observability goals. While such metrics commonly focus on CPU and memory constraints, more granular aspects can be measured based on organizational requirements.
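
To make this concrete, the sketch below shows one plausible shape for such a forecasting component: it fits a Holt-Winters (exponential smoothing) model to hourly CPU-utilization samples and projects the next 24 hours. The metric names, the daily seasonal period, and the choice of statsmodels are illustrative assumptions, not details of any particular platform.

    # Minimal forecasting sketch (assumed stack): project hourly CPU utilization
    # 24 hours ahead with Holt-Winters exponential smoothing.
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    def forecast_cpu(history: pd.Series, horizon_hours: int = 24) -> pd.Series:
        """Fit an additive Holt-Winters model and forecast the next horizon_hours."""
        model = ExponentialSmoothing(
            history,
            trend="add",          # capture gradual drift in utilization
            seasonal="add",       # capture the daily usage cycle
            seasonal_periods=24,  # 24 hourly samples per daily cycle
        )
        return model.fit().forecast(horizon_hours)

    # Example usage with a CSV of hourly samples (column names are assumptions):
    # cpu_history = pd.read_csv("cpu_metrics.csv", index_col="ts",
    #                           parse_dates=True)["cpu_util"]
    # print(forecast_cpu(cpu_history))

Any forecasting model with comparable inputs could fill the same role; the observability platform only needs the projected values.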

A key point: predictive metrics are effective for powering recommendation-style systems, but they should not be treated as catalogs of definitive outcomes. These predictions derive their accuracy from known variables only, not from indirect influences on the underlying values. Despite this limitation, organizations stand to gain real advantages by adopting a proactive approach to data observability, surpassing the value of today’s reactive or near real-time evaluation practices. A predictive system delivers recommendations ahead of time, tailored to the unique occurrences and conditions of the system. To fully exploit the potential of such integration and facilitate seamless interpretation of results, however, a Generative AI (GenAI) platform is needed.
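
In that spirit, a forecast can be reduced to advisory flags rather than hard alerts; the minimal sketch below illustrates the idea. The 85% saturation threshold and the message wording are hypothetical choices, not recommendations from the solution discussed here.

    # Turn a forecast into advisory recommendations rather than definitive alerts.
    # The saturation threshold and messages are illustrative assumptions.
    import pandas as pd

    def recommend_from_forecast(forecast: pd.Series, saturation: float = 0.85) -> list[str]:
        hot_hours = forecast[forecast >= saturation]
        if hot_hours.empty:
            return ["No predicted CPU saturation in the forecast window."]
        return [
            f"Predicted CPU utilization exceeds {saturation:.0%} for "
            f"{len(hot_hours)} of the next {len(forecast)} hours; consider "
            "rescheduling batch jobs or reviewing capacity."
        ]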

GenAI

Currently, Generative AI dominates the conversation across multiple industries, with organizations working fervently to integrate GenAI applications into their business platforms. The aim is to leverage GenAI’s capabilities across various use cases, gaining deeper insights into operational strategies – and a competitive edge. This pursuit extends to data observability, where understanding data on both a granular and operational level is paramount for informed decision making. Predictive AI facilitates this transition toward proactive solutions, as previously discussed. However, the challenge lies in translating these insights into actionable information tailored to each stakeholder. That’s where a GenAI solution can take over.

Integrating GenAI into predictive models gives users the ability to comprehend their data within a self-explanatory, generative context. This allows users to pose both explanatory and exploratory questions, offering insights into system effects and potential limitations in unique scenarios. For instance, by pairing an LLM such as Mixtral with ensemble models, organizations can extract nuanced insights and address complex inquiries.
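
One plausible integration pattern, sketched below, is to condense the predictive model’s output into a compact summary and pass it, together with the user’s question, to an OpenAI-compatible chat endpoint serving Mixtral (for example, one exposed by vLLM). The endpoint URL, model name, and prompt wording are assumptions for illustration rather than the actual implementation.

    # Sketch: answer observability questions by combining metric summaries with a
    # Mixtral-serving, OpenAI-compatible chat endpoint. URL, model name, and
    # prompt are illustrative assumptions.
    import json
    import requests

    LLM_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical endpoint

    def ask_observability_question(question: str, metric_summary: dict) -> str:
        payload = {
            "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
            "messages": [
                {"role": "system",
                 "content": "You are a data observability assistant. Answer using "
                            "only the metric summary provided."},
                {"role": "user",
                 "content": f"Metric summary:\n{json.dumps(metric_summary, indent=2)}\n\n"
                            f"Question: {question}"},
            ],
            "temperature": 0.2,
        }
        response = requests.post(LLM_URL, json=payload, timeout=60)
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]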

Below are some examples from a real-world solution currently testing this concept.

Q1. Can you provide insights into the system’s sustainability with its current configuration?

A1. Based on the provided data, there are some concerns regarding the sustainability of the system’s CPU usage. The mean CPU utilization is around 0.73, which indicates that the CPU is being utilized quite heavily. It is also important to note that the median CPU utilization is 0.75, which suggests that CPU usage is consistently high. Additionally, the standard deviation of CPU utilization is around 0.20, indicating some variability in CPU usage over time. Overall, these metrics suggest that the system’s CPU usage is consistently high and may be approaching unsustainable levels. It is recommended to investigate potential causes of high CPU usage and consider implementing measures to reduce it where possible.

Q2. In the upcoming month, can we explore opportunities to trim down on resource utilization?

A2. Yes, there are opportunities to optimize resource utilization in the upcoming month. Based on the data provided, here are some recommendations:

  • Monitor the HOST_CPU utilization closely as it has shown signs of saturation during peak hours. Implementing load balancing techniques could help distribute the workloads evenly across resources.
  • Investigate the cause of the negative values in the HOST_MEMORY utilization data; they may indicate that the allocated memory is not being utilized effectively. Consider reallocating memory resources accordingly.
  • Keep track of the failure rate, which has been consistently high throughout the reporting period. Implementing redundancy mechanisms and fault tolerance techniques could improve system reliability.
  • Continuously monitor resource utilization metrics and adjust resource allocation policies as needed to ensure optimal performance and prevent saturation.

Q3. What were the CPU details of my system between January 25 and January 27?

A3. Here are the CPU details for January 25 through January 27, along with some general insights about CPU usage during that timeframe.

On January 25, the mean CPU utilization was 0.73, the median was 0.75, the minimum was 0.39, the maximum was 1.0, the standard deviation was 0.21, the 25th percentile was 0.56, and the 75th percentile was 0.92.

On January 26, the mean CPU utilization was 0.73, the median was 0.75, the minimum was 0.40, the maximum was 1.0, the standard deviation was 0.20, the 25th percentile was 0.56, and the 75th percentile was 0.92.

On January 27, the mean CPU utilization was 0.93, the median was 0.86, the minimum was 0.63, the maximum was 1.0, the standard deviation was 0.54, the 25th percentile was 0.67, and the 75th percentile was 0.96.

These figures indicate an increase in CPU utilization on January 27. Please note that these values represent overall CPU utilization and do not provide information about the individual processes or applications running on the system.

Please see the CPU utilization dashboard per your request.
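
Summary statistics of the kind quoted above (per-day mean, median, minimum, maximum, standard deviation, and percentiles) are straightforward to derive from raw utilization samples. The pandas sketch below shows one way to compute them; the column names are assumptions for illustration.

    # Per-day CPU utilization summary, similar to the figures quoted above.
    # Column names ("ts", "cpu_util") are illustrative assumptions.
    import pandas as pd

    def daily_cpu_summary(df: pd.DataFrame) -> pd.DataFrame:
        grouped = df.groupby(df["ts"].dt.date)["cpu_util"]
        return grouped.agg(
            mean="mean",
            median="median",
            minimum="min",
            maximum="max",
            std="std",
            p25=lambda s: s.quantile(0.25),
            p75=lambda s: s.quantile(0.75),
        )

    # df = pd.read_csv("cpu_metrics.csv", parse_dates=["ts"])
    # print(daily_cpu_summary(df))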

Through the integration of Predictive and GenAI methodologies, the solution provides comprehensive insights into system dynamics (as shown above), offering timely recommendations rooted in historical events and the likelihood of their recurrence. With continuous monitoring, this approach affords greater visibility into system usage, empowering DBAs and system admins to proactively address concerns before they manifest.

In our final blog of this three-part series, we will introduce a ready-to-use, GenAI-enabled database observability tool and delve more deeply into its architecture and methodology.