While there are numerous tools that implement MLOps in various ways, it’s important to remember that MLOps is not a specific toolset. You cannot download MLOps, no matter how attractive the advertising for it has become. That’s not to say that there aren’t plenty of tools out there that will help you implement MLOps. But you must first fully understand your workflow and then choose tools that will help implement it. So, what is it? It is, first and foremost, a methodology and workflow that is intended to change the mindset of people working in all aspects of data science applications at scale.
Several personas are involved in a production-level machine learning workflow. The main players are:
- Data scientists work with big data, figuring out how to summarize and transform it so that it is understandable and usable. The transformed data is then used to develop and train models that predict something about the future data the system will encounter.
- Software developers build the applications that use these models in real-world settings.
- DevOps engineers ensure that all the infrastructure is configured and operating properly at the required scale and availability. They also own the CI/CD pipelines and make sure that quality gates are defined and enforced.
DevOps emerged in the 2000s, when the software industry recognized the dysfunction of having completely separate teams, each responsible only for its own segment, with nobody who really understood the end-to-end pipeline. DevOps methodologies were developed in response: a single group of people would understand the entire end-to-end pipeline, putting an end to the blame game.
The machine learning paradigm added new aspects to the end-to-end pipeline, including sourcing, handling, training on, and versioning data. The DevOps model had to evolve to cover these new aspects. The idea was to train DevOps engineers to understand enough of the data science field to take full responsibility for the entire pipeline, and new tools had to be developed to handle all the automation this model requires. MLOps was born as a superset of the DevOps methodologies.
What is included in an MLOps workflow?
The first way MLOps differs from a standard DevOps workflow is that it involves working with data. During model development and testing, data scientists generally have access to a static data lake where they can train and test their model. The model is then integrated into an application that runs it against new data to produce predictions.
Although predictions are generated automatically, it is wise to add a check that prediction values fall within an acceptable range and to raise a flag when one falls outside it. This prompts the data scientists to verify that the model is performing as expected and that nothing was missed.
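A range check of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming predictions are single numbers and that "raising a flag" means recording the case and logging a warning; `RangeMonitor`, its bounds, and the sample records are hypothetical, not part of any specific MLOps tool.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RangeMonitor:
    """Hypothetical helper that flags predictions outside an acceptable range."""
    lower: float
    upper: float
    flagged: List[Tuple[str, float]] = field(default_factory=list)

    def check(self, record_id: str, prediction: float) -> bool:
        """Return True if the prediction is within [lower, upper]; otherwise flag it."""
        if self.lower <= prediction <= self.upper:
            return True
        # Out of range: keep it for data scientists to review and emit a warning.
        self.flagged.append((record_id, prediction))
        logging.warning("Prediction %s for record %s outside [%s, %s]",
                        prediction, record_id, self.lower, self.upper)
        return False


# Example (assumed): a model whose outputs are expected to lie between 0 and 100.
monitor = RangeMonitor(lower=0.0, upper=100.0)
for rid, pred in [("a", 42.0), ("b", 150.0), ("c", -3.5)]:
    monitor.check(rid, pred)
print(monitor.flagged)  # [('b', 150.0), ('c', -3.5)]
```

In a real deployment the flag would more likely open a ticket or trigger an alert; the list simply makes the flagged cases easy to inspect.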
Before promoting a new version to production, check both the old and the new model against the same data. This helps catch regressions and gives you a baseline for confirming that the new model performs as expected.
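A side-by-side check like this might look as follows. It is a sketch under stated assumptions: a "model" is any Python callable returning a number, mean absolute error is the chosen metric, and the new model is promoted only if it is no worse than the old one. The function names, the `tolerance` parameter, and the toy models are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a "model" is any callable mapping one input to a numeric prediction.
Model = Callable[[float], float]


def mean_absolute_error(model: Model, inputs: Sequence[float],
                        targets: Sequence[float]) -> float:
    """Average absolute difference between the model's predictions and known targets."""
    errors = [abs(model(x) - y) for x, y in zip(inputs, targets)]
    return sum(errors) / len(errors)


def compare_models(old: Model, new: Model, inputs: Sequence[float],
                   targets: Sequence[float],
                   tolerance: float = 0.0) -> Tuple[float, float, bool]:
    """Score both models on the SAME data; promote only if the new one is no worse."""
    old_err = mean_absolute_error(old, inputs, targets)  # baseline: current model
    new_err = mean_absolute_error(new, inputs, targets)  # candidate score
    promote = new_err <= old_err + tolerance
    return old_err, new_err, promote


def old_model(x: float) -> float:
    """Toy stand-in for the current production model."""
    return 2.0 * x


def new_model(x: float) -> float:
    """Toy stand-in for the retrained candidate."""
    return 2.0 * x + 0.1


inputs = [1.0, 2.0, 3.0]
targets = [2.0 * x + 0.1 for x in inputs]
old_err, new_err, promote = compare_models(old_model, new_model, inputs, targets)
print(promote)  # True
```

Returning both error values, not just the decision, keeps the old model's score available as the baseline the paragraph above mentions.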
Retraining should be executed regularly to ensure that the models remain accurate on production data. Production data changes over time, and if the model isn’t retrained, its accuracy will degrade as the data drifts away from what the model was trained on (data skew). Retraining should be triggered automatically whenever key tracking metrics fall below a defined threshold. For this to work correctly, you need baseline results against which to confirm that the model was retrained properly. If the metrics aren’t improving after a defined period, this should raise a flag for manual intervention.
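The trigger-and-escalate logic can be sketched as a small policy object. This assumes one tracked metric where higher is better, and it interprets "a defined period" as a number of consecutive checks (`patience`); `RetrainPolicy`, the action strings, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    """Hypothetical policy: retrain when a metric drops below `threshold`, and
    escalate to a human if it stays below for more than `patience` checks."""
    threshold: float
    patience: int
    checks_below: int = 0

    def evaluate(self, metric: float) -> str:
        if metric >= self.threshold:
            # Metric is healthy again: reset the counter.
            self.checks_below = 0
            return "ok"
        self.checks_below += 1
        if self.checks_below > self.patience:
            # Retraining has not restored the metric in time: ask for a human.
            return "manual_review"
        return "retrain"


policy = RetrainPolicy(threshold=0.9, patience=2)
actions = [policy.evaluate(m) for m in [0.95, 0.88, 0.87, 0.86, 0.93]]
print(actions)  # ['ok', 'retrain', 'retrain', 'manual_review', 'ok']
```

Resetting the counter on any healthy reading is one design choice; a stricter policy could require several healthy readings in a row before clearing the flag.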
In a production workflow, the raw data is generally thrown away after processing. However, you still need a way to retain raw data for a specified period so that it can be manually verified and validated. Metadata describing the decision process should be saved regularly as well, because it helps explain why specific predictions were made when something is unclear.
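One way to combine a raw-data retention window with longer-lived decision metadata is sketched below. It is an in-memory illustration only, assuming each prediction is stored with its model version and a timestamp, and that expiring a record means discarding its raw input while keeping the rest; `PredictionRecord`, `RetentionStore`, and its methods are hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class PredictionRecord:
    """Hypothetical record: raw input plus metadata about how the prediction was made."""
    record_id: str
    model_version: str
    raw_input: Optional[dict]
    prediction: float
    created_at: float = field(default_factory=time.time)


class RetentionStore:
    """In-memory sketch: raw inputs expire after retention_seconds; metadata is kept."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self.records: Dict[str, PredictionRecord] = {}

    def save(self, record: PredictionRecord) -> None:
        self.records[record.record_id] = record

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Drop raw inputs older than the retention window; return how many were dropped."""
        now = time.time() if now is None else now
        purged = 0
        for rec in self.records.values():
            if rec.raw_input is not None and now - rec.created_at > self.retention_seconds:
                rec.raw_input = None  # raw data discarded, decision metadata retained
                purged += 1
        return purged

    def explain(self, record_id: str) -> str:
        """Return the stored metadata for one prediction as JSON."""
        return json.dumps(asdict(self.records[record_id]))


store = RetentionStore(retention_seconds=3600)
store.save(PredictionRecord("r1", "v1", {"x": 1.0}, 2.0, created_at=0.0))
store.save(PredictionRecord("r2", "v1", {"x": 2.0}, 4.0, created_at=5000.0))
print(store.purge_expired(now=5000.0))  # 1
```

A production system would persist these records in a database or object store; the point here is only the split between short-lived raw data and long-lived metadata.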
As with any modern application, a continuous integration (CI) pipeline is required to test every change as it is developed. This ensures that if there is a problem, the developers receive feedback as quickly as possible. The CI pipeline should include all aspects of a production deployment (though not necessarily at production scale or with high-availability components) so that the application won’t behave differently when it is rolled out to production. The continuous deployment (CD) pipeline takes the latest working code and pushes it directly to production. This full automation ensures that customers get the latest features and bug fixes as quickly as possible.
Sim Zacks is a DevOps Architect at GM, working on connecting the pieces to deliver the software for the autonomous vehicle. His 25+ years of experience span multiple technology fields, including DevOps/CI-CD, software development, system & network administration, database development & design, quality engineering and leading large projects. Sim uses his knowledge & experience to seamlessly connect and translate all aspects of the technology stack with business requirements. He is a mentor & strategist, working with people to overcome everyday challenges and further their careers. He writes a blog on personal strategy, which can be found on his LinkedIn profile. He also speaks at local and international conferences. Sim earned an MBA from the University of Phoenix and a BSc in Computer Science from Lawrence Technological University.