Turning Big Data into better data with MLOps

In this special guest feature, Chida Sadayappan, Lead Specialist for Data Cloud and Machine Learning at Deloitte Consulting, discusses Machine Learning Operations (MLOps). Chida works with CxOs, supporting them in cloud, machine learning and data analytics, and data modernization and transformation strategy. His expertise helps enterprises successfully adopt AI and integrate cloud technology.

Over the past decade, Big Data has been an outsize force in reshaping how businesses operate. But as data continues its breakneck proliferation—an estimated 59 zettabytes were generated in 2020—businesses are increasingly challenged to aggregate, understand, and use these massive jumbles of data. [1]

That’s where Machine Learning Operations (MLOps) comes in. This emerging discipline uses automation to accelerate ingestion of data and to more quickly develop, test, deploy, and monitor cloud machine learning (ML) models that use Big Data. Taken together, these steps comprise governance practices that help ensure the integrity of data throughout its life cycle. [2]

Like its predecessor DevOps, MLOps uses automated development pipelines, processes, and tools designed to streamline the design and deployment of ML models and workflows. Operating on data is a key component of ML and a strong suit of MLOps, which manages data more effectively than DevOps does. MLOps isn’t an algorithm, but it does operationalize the algorithm to simplify the predictive process. MLOps enables the appropriate use of ML algorithms to teach systems how to identify and classify data today and to “learn” new, more effective techniques for doing so in the future. These decision-making ML algorithms help businesses recognize patterns that predict consumer preferences, identify fraud, monitor financial performance, and reimagine customer experience, to name a few use cases, and they become operationalized with MLOps.

Given these potential outcomes, it’s not surprising that businesses that have invested in cloud-based ML are taking a serious look at MLOps to enable, monitor, and enhance ML models. It’s a nascent discipline, but the global MLOps market is expected to soar to almost $4 billion by 2025, up from $350 million in 2019. [3]

Toward a framework for MLOps implementation

While there is no single strategy for implementing MLOps, an end-to-end framework typically comprises four basic elements: versioning the model, autoscaling, continuous model monitoring and training, and retraining and redeployment.

  • Versioning the model: It’s important that organizations explore different data sets and algorithms that could solve the same business problems. Reproducibility is critical, and versioning each data set, algorithm, and ingestion pipeline is essential to creating results that can be reproduced.
  • Autoscaling: Once deployed, the ML model should be able to rapidly scale up or down as demand dictates. That’s essential because large organizations may eventually create thousands of models.
  • Continuous model monitoring and training: It’s critical to continually monitor the performance of models to help ensure that they produce accurate results. That’s because external factors like economic conditions are constantly in flux, which can make the data used in the initial training process obsolete. Monitoring helps evaluate model output and track drift and effectiveness over time.
  • Retraining and redeployment: As model drift occurs, businesses should be prepared to retrain the model using new data and then redeploy it.
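The versioning element above can be illustrated with a minimal sketch. It assumes that a data set, an algorithm choice, and an ingestion pipeline can each be described as JSON-serializable values; the names `fingerprint` and `make_run_record` are hypothetical and not part of any MLOps product. The idea is simply that a training run is reproducible only if every input to it has a recorded version.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable object."""
    # sort_keys makes the digest independent of dict insertion order.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_run_record(dataset_rows, algorithm_name, hyperparameters, pipeline_version):
    """Bundle the versions of everything that went into one training run."""
    record = {
        "dataset_hash": fingerprint(dataset_rows),
        "algorithm": algorithm_name,
        "hyperparameters": hyperparameters,
        "pipeline_version": pipeline_version,
    }
    # The run id changes whenever any versioned input changes.
    record["run_id"] = fingerprint(record)[:12]
    return record


rows = [{"user": 1, "watched": "drama"}]
run = make_run_record(rows, "logistic_regression", {"C": 1.0}, "ingest-v1")
print(run["run_id"])
```

Two runs with identical data, algorithm, settings, and pipeline version get the same `run_id`, so a result can be traced back to exactly what produced it.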

Data preparation is the foundation of MLOps

It’s impossible to overstate the importance of precise, standardized data preparation when planning an MLOps initiative. The problem is that selecting and correctly preparing the right data for ML training and modeling is an arduous task for most businesses. It requires that they identify raw, chaotic data and convert it to a clean and consistent format that can be used across models. What’s more, a formal methodology for data preparation is needed to replicate and version models.

The first step in data preparation is to identify and access the appropriate data for use in ML training models and algorithms. To do so, businesses will need to assign attributes to data that are meaningful indicators for achieving MLOps objectives. Also essential is the ability to share information among internal teams to improve collaboration and accelerate development life cycles. This will require that data can be quickly located, accessed, indexed, and reused in the cloud.
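As a rough illustration of converting raw data to a clean and consistent format, the sketch below normalizes field names and values and drops records that cannot be used. The field names `customer_id` and `amount`, and the helper names `clean_record` and `prepare`, are assumptions made for this example only; a real pipeline would define its own schema and rules.

```python
REQUIRED_FIELDS = ("customer_id", "amount")  # hypothetical schema


def clean_record(raw: dict):
    """Normalize one raw record; return None if a required field is unusable."""
    # Standardize field names: trimmed, lowercase, underscores for spaces.
    record = {str(k).strip().lower().replace(" ", "_"): v for k, v in raw.items()}
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return None
    record["customer_id"] = str(record["customer_id"]).strip()
    try:
        # Accept "1,234.50" style strings as well as plain numbers.
        record["amount"] = float(str(record["amount"]).replace(",", ""))
    except ValueError:
        return None
    return record


def prepare(raw_records):
    """Return only the records that could be cleaned, in their original order."""
    cleaned = (clean_record(r) for r in raw_records)
    return [r for r in cleaned if r is not None]


raw = [
    {" Customer ID ": " 42 ", "Amount": "1,234.50"},
    {"customer_id": "7", "amount": ""},
]
print(prepare(raw))
```

Because the same function is applied to every source, models trained on the output see one format regardless of where the data came from.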

MLOps is an open-ended process that includes continuous monitoring, evaluation of models, and data training. Continual tracking is critical to enhancing visibility into the performance and accuracy of ML outcomes. Monitoring can also help detect and address model drift, which occurs when ML algorithms no longer make accurate predictions, typically because of changes in data or customer behaviors.
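One simple way to picture drift monitoring is to compare a model's recent accuracy with the accuracy it had at deployment. The sketch below is an assumption-laden illustration, not a standard method: the class name `AccuracyMonitor`, the window size, and the tolerance are all hypothetical choices, and production systems often use richer statistical tests.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag drift."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        # deque with maxlen discards the oldest outcome automatically.
        self.outcomes = deque(maxlen=window_size)

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_detected(self):
        # Wait for a full window so a few early misses do not raise an alarm.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline_accuracy=0.9, window_size=10)
for _ in range(10):
    monitor.record("drama", "drama")
print(monitor.drift_detected())
```

When `drift_detected()` returns `True`, that is the signal to investigate and, if warranted, retrain the model on newer data.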

Consider, as an example, a video streaming service that uses ML to predict a customer’s preferences. An algorithm delivers a personalized recommendation of videos for individual subscribers, while MLOps monitors what users actually watch. If subscribers don’t click a recommended video, the streaming service will need to adjust the algorithm to better convince users to watch recommended titles.

MLOps streamlines this iterative process by comparing the customer response with the recommendation and then determining, if necessary, how to adjust the recommendations. In some cases, the training data used to determine subscriber recommendations shifts over time. In others, user tastes and interests may morph due to real-world events like the COVID-19 crisis. Either way, the subscriber may find that once spot-on suggestions have become decidedly off base. So, if the personalization system recommends a romantic comedy to a diehard fan of WWII epics, there’s little chance the user will select the lighter fare. And that can diminish customer satisfaction and ultimately imperil reputation and revenues.
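The comparison of recommendations with what subscribers actually watch could be sketched as follows. This is a simplified illustration under stated assumptions: each event is a hypothetical pair of the recommended genre and whether the title was watched, and the 20 percent threshold is arbitrary. The function names are invented for this example.

```python
from collections import defaultdict


def acceptance_by_genre(events):
    """Compute the share of recommendations watched, per recommended genre.

    events: iterable of (recommended_genre, was_watched) pairs.
    """
    shown = defaultdict(int)
    watched = defaultdict(int)
    for genre, was_watched in events:
        shown[genre] += 1
        if was_watched:
            watched[genre] += 1
    return {genre: watched[genre] / shown[genre] for genre in shown}


def genres_to_review(events, min_rate=0.2):
    """List genres whose recommendations are accepted less often than min_rate."""
    rates = acceptance_by_genre(events)
    return sorted(genre for genre, rate in rates.items() if rate < min_rate)


events = (
    [("romantic comedy", False)] * 9
    + [("romantic comedy", True)]
    + [("war epic", True)] * 3
    + [("war epic", False)]
)
print(genres_to_review(events))
```

A genre that appears in this list is a candidate for adjusting the recommendation model, since subscribers are consistently passing over what it suggests.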

Finally, MLOps is an iterative and continuous process that thrives on experimentation and innovation. It’s important to explore different data sets and algorithms—you just might uncover more accurate, streamlined ways to tackle the business problem. Also, an attitude that embraces trial and error and the notion of “failing fast” can help improve machine learning and MLOps capabilities while accelerating time to innovation.

The right data for ML

For businesses that are drowning in a deluge of Big Data, MLOps can help operationalize the important data that could have an impact on the ML algorithm or its underlying model. In this way, MLOps helps cut through the Big Data noise to bring in the right data, based on its relationship to the ML model, to achieve greater AI operational efficiencies, productivity, and innovation. MLOps can help improve and operationalize learning of ML algorithms and, indeed, the entire ML process. So, while Big Data has long been a critical first step in training AI/ML models, MLOps is now helping improve long-term tuning of these ML models based on real interactions.


[1] IDC, IDC’s Global DataSphere Forecast Shows Continued Steady Growth in the Creation and Consumption of Data, May 2020

[2] Deloitte, Time, technology, talent: The three-pronged promise of cloud ML, December 10, 2020

[3] Deloitte, Tech Trends 2021: MLOps: Industrialized AI, December 15, 2020
