Rubber Meets the Road: Reality of AI in Infrastructure Monitoring

In this special guest feature, Farhan Abrol, Head of Machine Learning Products at Pure Storage, examines the disparity between the hype and what’s been delivered, and where we’ll see the most impactful advancements in efficiency and capacity in the coming year. Farhan oversees the development and execution of Meta, a machine learning engine that uses IoT data to optimize customer experience. He has 5+ years of enterprise technology experience and has successfully led engineering teams in enhancing data resiliency, streamlining management operations, and improving customer experience. Farhan is also passionate about shaping the future of the industry and is an active mentor for future IT leaders.

Enterprise investment in intelligent infrastructure management is growing seemingly in lockstep with the rise in hype around the potential for AI and machine learning to improve IT infrastructure – yet the anticipated value is only beginning to be realized.

IDC estimates global spending on cognitive systems will reach $31.3B by 2019. Many of the applications and use cases for these systems are in infrastructure management services – currently one of the least automated parts of IT capabilities.

That represents a big opportunity for many organizations – particularly around things like monitoring and workload planning. Intelligent IT infrastructure is predictive, self-healing, self-optimizing, and self-protective – automating all of these key elements without the need for human intervention. The most common focus areas for this type of investment include capacity planning, performance tuning, and observability and log analysis. Yet complexity, siloed data and other issues are proving to be barriers as IT teams seek to modernize with artificial intelligence for IT operations (AIOps).

We’ve already seen some examples of activity in AIOps that point to where we’re heading. NetApp recently acquired Cognigo, an Israeli data compliance and security vendor whose AI-driven data protection platform helps enterprises protect their data and stay compliant with privacy regulations such as Europe’s GDPR.

In June, HPE expanded its hyper-converged infrastructure portfolio when it introduced HPE Nimble Storage dHCI, which provides self-optimizing performance, predictive support automation and problem prevention. And ServiceNow acquired Israel-based Loom Systems to add AIOps to its portfolio.

Workload planning

As these investments play out, they will begin to provide actual value to customers across the enterprise landscape. But we’re not there yet. While some use cases have yet to be put into action in a big way, perhaps the most meaningful benefits that AI has brought to infrastructure management so far are around planning and tuning.

Workload planning is one of the hardest things to do in storage. A storage environment isn’t static – new requirements come in every day. Changes may include, for example, a need to double an Oracle workload or support five times the number of VDI users.

To anticipate those requirements, every infrastructure admin needs to do planning and tuning. That planning used to take hours and still tended to result in either underutilization or overuse – neither of which is optimal.

Machine learning models, on the other hand, can predict load and capacity, find the ideal location for workloads, and minimize risk. Being able to see ahead means IT teams can sleep easier, knowing they can forecast and simulate the impact of changing hardware components on both load and capacity.
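To make that concrete, here is a minimal sketch – with made-up numbers and a deliberately simple trend fit, not Pure Storage’s actual models – of how capacity history can be turned into a “days until full” estimate, and how a hypothetical hardware change shifts that estimate:

```python
# Minimal illustration: fit a linear trend to historical capacity usage,
# extrapolate it forward, and ask how a hypothetical expansion changes
# the forecast. All numbers here are synthetic.
import numpy as np

days = np.arange(90)                                        # 90 days of history
used_tib = 120 + 0.8 * days + np.random.normal(0, 2, 90)    # synthetic usage data
raw_capacity_tib = 250.0

# Straight-line growth model; real systems would use richer time-series models.
slope, intercept = np.polyfit(days, used_tib, 1)

def days_until_full(capacity_tib: float) -> float:
    """Extrapolate the fitted trend to the day usage crosses total capacity."""
    return (capacity_tib - intercept) / slope - days[-1]

print(f"Current headroom runs out in ~{days_until_full(raw_capacity_tib):.0f} days")
# What-if: an expansion shelf adds 100 TiB of usable capacity.
print(f"With +100 TiB, headroom runs out in ~{days_until_full(raw_capacity_tib + 100):.0f} days")
```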

Getting to the ‘workload DNA’ for ML models

At the heart of building intelligent planning tools that impact the different outcome measures we care about – namely performance and capacity – is having as many signals as possible. The more data one can gather around the nature of the workload – the time series of read bandwidth, write IOPS, IOSize, the spatial overwrite pattern and the like – the better the models can perform. We call this holistic representation the “workload DNA.”
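As a rough illustration only, the “workload DNA” for a single workload might be represented along these lines – the field names and summary statistics below are hypothetical, not Pure Storage’s actual schema:

```python
# Sketch of a "workload DNA" record as model input; fields are illustrative.
from dataclasses import dataclass
import numpy as np

def _summary(signal: np.ndarray) -> list:
    """Collapse a raw time series into a few coarse statistics."""
    return [signal.mean(), signal.std(), np.percentile(signal, 95)]

@dataclass
class WorkloadDNA:
    read_bandwidth_mbps: np.ndarray   # time series of read bandwidth
    write_iops: np.ndarray            # time series of write IOPS
    io_size_kb: np.ndarray            # time series of average I/O size
    overwrite_ratio: float            # summary of the spatial overwrite pattern

    def features(self) -> np.ndarray:
        """Flatten the signals into one feature vector for forecasting/regression models."""
        return np.array(
            _summary(self.read_bandwidth_mbps)
            + _summary(self.write_iops)
            + _summary(self.io_size_kb)
            + [self.overwrite_ratio]
        )

# Example: one day of per-minute samples for a made-up Oracle-style workload.
rng = np.random.default_rng(0)
dna = WorkloadDNA(
    read_bandwidth_mbps=rng.gamma(5, 40, 1440),
    write_iops=rng.gamma(3, 800, 1440),
    io_size_kb=rng.gamma(2, 16, 1440),
    overwrite_ratio=0.35,
)
print(dna.features())
```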

Our models take the “workload DNA” as inputs and build time series forecasting and regression models to predict how performance and capacity will evolve on a piece of infrastructure. Doing this across all pieces of infrastructure allows us to give IT teams an easy view into which components will run out of steam on which axis, and when.
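A toy version of that fleet-wide view might look like the following, assuming each array reports daily utilization histories on a load axis and a capacity axis – the data, threshold, and linear trend are placeholders, not the production models:

```python
# Toy fleet-wide forecast: extrapolate each array's utilization on each axis
# and rank components by how soon they are forecast to hit a threshold.
import numpy as np

def days_to_threshold(history: np.ndarray, threshold: float = 90.0) -> float:
    """Fit a linear trend to a utilization history (%) and extrapolate to the threshold."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, 1)
    if slope <= 0:
        return float("inf")       # flat or shrinking: no exhaustion forecast
    return (threshold - intercept) / slope - t[-1]

fleet = {
    "array-01": {"load": np.linspace(40, 70, 180), "capacity": np.linspace(55, 65, 180)},
    "array-02": {"load": np.linspace(30, 35, 180), "capacity": np.linspace(60, 88, 180)},
}

# Rank components by whichever axis is forecast to hit the threshold first.
forecasts = [
    (name, axis, days_to_threshold(series))
    for name, axes in fleet.items()
    for axis, series in axes.items()
]
for name, axis, days in sorted(forecasts, key=lambda row: row[2]):
    print(f"{name}: {axis} reaches 90% in ~{days:.0f} days")
```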

We then take the next step and allow teams to simulate migrating workloads between different nodes and suggest placements for optimal load balancing and utilization across performance and capacity. All this is enabled by the core prediction models. It’s exciting stuff and the tip of the iceberg for AIOps.
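A simple greedy heuristic gives the flavor of that placement step. The nodes, workload demands, and cost function below are illustrative assumptions, not the actual placement algorithm:

```python
# Greedy placement sketch: given predicted per-workload demand on load and
# capacity, suggest a node for each workload so the busiest node stays as
# idle as possible. All figures are fractions of each node's total.
nodes = {"node-a": {"load": 0.40, "capacity": 0.55},
         "node-b": {"load": 0.65, "capacity": 0.30}}

workloads = [("oracle-prod", {"load": 0.20, "capacity": 0.10}),
             ("vdi-users",   {"load": 0.10, "capacity": 0.25})]

def placement_cost(node: dict, demand: dict) -> float:
    """Headroom-based cost: the worst resulting utilization on either axis."""
    return max(node["load"] + demand["load"], node["capacity"] + demand["capacity"])

for name, demand in workloads:
    best = min(nodes, key=lambda n: placement_cost(nodes[n], demand))
    # Commit the simulated move so later suggestions account for it.
    for axis in ("load", "capacity"):
        nodes[best][axis] += demand[axis]
    print(f"Place {name} on {best}; simulated utilization: {nodes[best]}")
```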

The road ahead

As models mature, machine learning for infrastructure planning will move from advisory roles to automated action, and the focus will turn to data. For predictive feedback systems to scale and apply in more contexts, machine learning will be applied to the task of efficiently finding where a model performs poorly and augmenting the data for that part of the feature space. Just look at the number of companies being funded to label data. The quality of the models is reasonably mature; the quality of the data dictates how effective they’ll be.
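One way to make that “find where the model is weak” step concrete – my own sketch of a common error-analysis pattern, not a description of any particular product’s pipeline – is to bucket validation samples by a feature, measure error per bucket, and flag the worst buckets as targets for collecting or labeling more data:

```python
# Bucket validation samples by one feature and surface the weak spots.
# The feature, errors, and bucket edges below are synthetic.
import numpy as np

rng = np.random.default_rng(1)
io_size_kb = rng.uniform(4, 512, 2000)                          # feature of interest
abs_error = np.abs(rng.normal(0, 1 + io_size_kb / 256, 2000))   # per-sample model error

bins = np.linspace(4, 512, 9)                                   # 8 feature-space buckets
bucket = np.digitize(io_size_kb, bins)
for b in range(1, len(bins)):
    mask = bucket == b
    print(f"IO size {bins[b-1]:.0f}-{bins[b]:.0f} KB: "
          f"mean error {abs_error[mask].mean():.2f} over {mask.sum()} samples")
# Buckets with high error and/or few samples are where extra data pays off most.
```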
