
MLOps best practices

Building machine learning models is an iterative process that should be scalable, reproducible, and collaborative. The aim is to move fast in the experimentation phase while trying as many things as possible, for example, different algorithms and parameter combinations. In this process, you may get well-performing models that you’d like to move to production. However, it is impossible to reproduce these models without a system to track the experimentation process. MLOps is, therefore, a set of principles, tools, and techniques for making the process of building machine learning models scalable, reproducible, and collaborative.

This article will teach you the best practices to keep in mind while implementing an MLOps system. The assumption is that you have a problem that should be solved with machine learning; some problems can be solved without machine learning or modeled with other techniques. Building machine learning systems also requires enough training data, so if you don’t have the data, the first step is to collect it.

Data processing

Let’s start by looking at some of the best practices to keep in mind while dealing with data. 

Use simpler features 

It is better to have many simple features than a few complex features. This will help keep your pipeline and model simple in the beginning. For instance, you can use feature elimination techniques to remove features that apply to very few examples.

Combine existing features

You can create new features by combining existing features in different ways. Two common approaches are discretization and crosses. Machine learning packages such as TensorFlow provide tools for doing this. Discretization is achieved by creating discrete features from continuous values, and crossing is done by combining features. Crossing should be done carefully because crossing many features can lead to overfitting. 
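As a sketch of both ideas, here is how discretization and a simple feature cross might look with pandas (the dataset and column names are made up for illustration):

```python
import pandas as pd

# Hypothetical dataset with one continuous and two categorical features.
df = pd.DataFrame({
    "age": [23, 35, 47, 62, 19],
    "country": ["US", "DE", "US", "FR", "DE"],
    "device": ["mobile", "desktop", "mobile", "mobile", "desktop"],
})

# Discretization: bucket the continuous "age" column into discrete ranges.
df["age_bucket"] = pd.cut(df["age"], bins=[0, 30, 50, 100],
                          labels=["young", "middle", "senior"])

# Crossing: combine two categorical features into one composite feature.
# Cross sparingly -- crossing many features inflates dimensionality and
# can lead to overfitting.
df["country_x_device"] = df["country"].astype(str) + "_" + df["device"]

print(df[["age_bucket", "country_x_device"]])
```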

Remove unused features

There is no need to keep numerous unused features in your system. They could lead to confusion when updating the system because you might need clarification on whether the feature was used elsewhere. Keep your infrastructure simple by removing all unused features. Remember that you or someone else can add them back in the future. 

Sanity check external data sources 

You may be feeding data from external sources into your pipelines. Therefore, you need to ensure that the data exists and is not corrupted, since errors in the data will lead to model degradation. To confirm the sanity of the data, you can:

  • Check that the data types are as expected
  • Check for missing values
  • Confirm that the data is within the expected range

You can use various methods to impute missing data, such as replacing it with the mean or median. You should include this verification process in your data pipeline and be able to identify and track any changes that happen to the data.
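A minimal sketch of such a verification step is shown below, with an illustrative schema; dedicated tools (such as Great Expectations or TensorFlow Data Validation) offer a more complete version of the same idea:

```python
import pandas as pd

def sanity_check(df: pd.DataFrame, schema: dict) -> list:
    """Return a list of problems found in an incoming batch of data.

    `schema` maps column name -> (expected dtype kind, (min, max)).
    """
    problems = []
    for col, (kind, value_range) in schema.items():
        if col not in df.columns:
            problems.append(f"{col}: missing column")
            continue
        # Check that the data type is as expected.
        if df[col].dtype.kind != kind:
            problems.append(f"{col}: unexpected dtype {df[col].dtype}")
        # Check for missing values.
        if df[col].isna().any():
            problems.append(f"{col}: contains missing values")
        # Confirm the values are within the expected range.
        lo, hi = value_range
        if not df[col].dropna().between(lo, hi).all():
            problems.append(f"{col}: values outside [{lo}, {hi}]")
    return problems

batch = pd.DataFrame({"age": [25, 31, None], "clicks": [3, 5, 2]})
schema = {"age": ("f", (0, 120)), "clicks": ("i", (0, 1000))}
issues = sanity_check(batch, schema)
print(issues)
```

Running this check inside the pipeline lets you reject or quarantine bad batches before they reach the model.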

Write data scripts that can be tested

Code written for data cleaning and exploratory data analysis is usually messy and disorganized, which is quite common when using notebooks. However, before training the models, it’s good practice to convert this code into functions that can be tested. This will make it possible to reuse and integrate the code into your pipeline.
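For example, a cleaning step pulled out of a notebook into a pure function becomes trivially testable (the column name is hypothetical):

```python
import pandas as pd

def clean_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing prices and clip negative values to zero."""
    out = df.dropna(subset=["price"]).copy()
    out["price"] = out["price"].clip(lower=0)
    return out

# A unit test for the function is now trivial to write:
def test_clean_prices():
    raw = pd.DataFrame({"price": [10.0, None, -5.0]})
    cleaned = clean_prices(raw)
    assert len(cleaned) == 2              # the missing row is dropped
    assert (cleaned["price"] >= 0).all()  # negatives are clipped

test_clean_prices()
```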

Control data labeling 

Data is the most crucial thing in a machine learning pipeline, and high-quality data will lead to high-quality models. Therefore, the labeling process should be highly controlled. Ensure that labels are peer-reviewed to achieve high quality. 

Make datasets available on shared infrastructure

Machine learning models usually process large datasets. Making this data available from a single shared source is essential because:

  • It prevents data duplication
  • It prevents copying large files, which can lead to system delays
  • It makes it easy to keep the data up to date
  • It helps in implementing access control to the dataset

Model development 

Before models are put into production, they need to be built and tested. When creating models, keep the following items in mind. 

Start with a simple model

With a model in place, most of the challenges you will face are engineering problems. Starting with a simple model enables you to get the infrastructure right. For instance, you can start with simple features to create a baseline model and metrics. Having a complex model in the beginning introduces two problems: 

  • Debugging a complicated model
  • Optimizing the environment in which the model runs


There are some important things to keep in mind when building models. Do you want the model to be used for live predictions? Or would you rather precompute predictions, store them in a database, and serve them from there?

Build an interpretable model 

Some algorithms act like black boxes, making it difficult to understand how the model works. Starting with an interpretable model makes debugging easier.


Model training

Training machine learning models can be a resource-intensive task depending on the data and the type of models. In this section, you will learn about some of the best practices for the training phase.

Automate hyperparameter tuning

Tuning model hyperparameters manually can take ages. Using automated tools to do this will save you a ton of time. Machine learning packages such as scikit-learn offer this capability, as do some experiment tracking tools. 
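As an illustration with scikit-learn, `RandomizedSearchCV` samples hyperparameter combinations automatically; the model and search space below are arbitrary examples:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data stands in for a real training set.
X, y = make_classification(n_samples=200, random_state=0)

# Let the search explore the space instead of tuning by hand.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [3, 5, None],
    },
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Experiment tracking tools can log each sampled configuration alongside its score, so the whole sweep stays reproducible.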

Set clear training objectives

Failing to set clear objectives can lead to wasted time trying to optimize for the wrong goals. It’s imperative to set and share clear objectives, especially when working in a team, to prevent the loss of precious engineering time. For instance, in a time series problem, you may require that the model be trained with data that is a certain age. 

Document manually generated features 

Manually generated features should have clear owners and be well documented, making them easy to hand over from one engineer to another. Documentation also makes a feature easier to maintain, because the reasoning behind it is recorded and the information is shared with the entire team.

Peer review training scripts 

Peer-reviewing training scripts ensures that buggy code is not shipped to production and makes debugging easier. It also helps in the following:

  • Transferring knowledge between engineers
  • Creating high-quality code that is easy to maintain
  • Responding to incidents faster, because team members are familiar with the code

Automate feature generation and selection

Generating and selecting optimal features can be a time-consuming process. For instance, you can use recursive feature elimination to automate feature selection. You should, however, evaluate the generated and selected features. Apart from reducing human effort in this process, automating the process can lead to better features. The automatically generated and selected features should also have documentation and an owner. 
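For example, scikit-learn’s `RFE` automates the selection step; the synthetic dataset below is just for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# 10 features, only 4 of them informative; let RFE find a good subset.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X, y)

# Boolean mask of the features RFE kept.
print(selector.support_)
```

The kept features should still be reviewed, documented, and assigned an owner, as noted above.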

Evaluate the model for bias

Evaluating the performance of your model on various groups is important: it ensures that the model is not biased toward a particular group. This is done by assessing fairness metrics such as the False Positive Rate (FPR) and False Negative Rate (FNR). TensorFlow’s Fairness Indicators dashboard is an excellent tool for this.
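A minimal sketch of the idea: computing FPR and FNR per group with scikit-learn (the labels and groups below are synthetic):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def rates_by_group(y_true, y_pred, groups):
    """Compute FPR and FNR per group to spot biased behavior."""
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        # ravel() returns tn, fp, fn, tp for binary labels [0, 1].
        tn, fp, fn, tp = confusion_matrix(
            y_true[mask], y_pred[mask], labels=[0, 1]).ravel()
        out[g] = {"fpr": fp / (fp + tn), "fnr": fn / (fn + tp)}
    return out

y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1])
groups = np.array(["a", "a", "a", "b", "b", "b"])
rates = rates_by_group(y_true, y_pred, groups)
print(rates)
```

Large gaps in FPR or FNR between groups are a signal to investigate the data and model before shipping.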

Implement versioning

Versioning in experiment tracking makes it possible to reproduce experiments. You should, therefore, version the data, model configurations, and training scripts. The artifacts resulting from every experiment should also be versioned. Apart from enabling reproducibility, versioning is crucial for traceability, compliance, and auditing.
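One lightweight way to sketch this: fingerprint the data, configuration, and code version that produced each run. Dedicated trackers (MLflow, DVC, Weights & Biases) do this more thoroughly; the fields here are illustrative:

```python
import hashlib
import json
import os
import tempfile

def fingerprint_run(data_path: str, config: dict, script_version: str) -> dict:
    """Record what went into a training run so it can be reproduced."""
    with open(data_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "data_hash": data_hash,
        "config_hash": hashlib.sha256(
            json.dumps(config, sort_keys=True).encode()).hexdigest(),
        "script_version": script_version,  # e.g. a git commit SHA
    }

# Demo with a temporary file standing in for a dataset.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"feature,label\n1,0\n")
    path = f.name
run = fingerprint_run(path, {"lr": 0.01}, "abc123")
os.unlink(path)
print(run)
```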

Avoid over-parameterized models 

Large models consume a lot of resources and take longer during inference. Use smaller models that use fewer resources and are fast at inference. This can be achieved by using pruned and distilled models. 

For example, you can introduce up to 90% sparsity to your model and still retain the model’s accuracy. You are, therefore, able to make better use of computational resources without losing the model’s performance. 
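As a toy illustration of the idea behind pruning, here is one-shot magnitude pruning on a raw weight matrix. Real frameworks (e.g. tensorflow-model-optimization) prune gradually during training rather than in one shot:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(weights.size * sparsity)
    # Threshold below which weights are considered negligible.
    threshold = np.sort(np.abs(weights.ravel()))[k]
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((100, 10))
pruned = magnitude_prune(w, 0.9)
print((pruned == 0).mean())  # roughly 0.9
```

Sparse weight matrices can then be stored and executed more cheaply, which is where the inference savings come from.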

Other things to keep in mind in the training phase are: 

  • Keeping the training objective simple to measure and easy to understand
  • Ensuring that there are no bugs in the feature extraction code 
  • Running training experiments in parallel so experimentation doesn’t stall behind long-running jobs
  • Automatically configuring model structure to obtain the best performance
  • Creating an environment where information is shared freely, such as the result of experiments

Model deployment and serving

Once you have a trained model, the next step is to move it to production. Productionalizing a model is more of an engineering than a machine learning problem. Let’s discuss some practices that can make this stage easier for you. 

Launch and improve

The first model you build will not be the best. Therefore, there is often no need to wait for a perfect model before launching. Launching is essential because you can test the ML system on actual users and iterate. As you move along, you can develop new features and improve the model’s performance. 

Test infrastructure 

Initially, start with a simple model that enables you to test your infrastructure. For instance, you can: 

  • Test the statistics of the data in your pipeline with the data available offline to ensure they are the same
  • Check that the deployed model gives similar results as the model in your training environment

Test the model on future data

After training the model on data up to a specific date, test it on data from a later date. This shows how the model performs on future data; performance may be slightly worse, but it shouldn’t degrade significantly.
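A minimal sketch of such a time-based split with pandas (the dates and column names are illustrative):

```python
import pandas as pd

# Hypothetical daily data with a timestamp column.
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=10, freq="D"),
    "value": range(10),
})

# Train on everything up to a cutoff, then evaluate on the "future".
cutoff = pd.Timestamp("2023-01-08")
train = df[df["date"] < cutoff]
future_test = df[df["date"] >= cutoff]

print(len(train), len(future_test))  # 7 3
```

Comparing metrics on `future_test` against the usual hold-out set reveals how quickly the model ages.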

Create a monitoring system

Monitoring your ML system is vital in ensuring that the model’s performance doesn’t degrade over time. For instance, some models will need to be monitored daily and others monthly. You can set up an alerting system that sends you an email or create a dashboard where you can see all alerts. 


You can include the following information in the dashboard: 

  • A record of any problems detected when exporting the model. You can achieve this by running sanity checks before deploying the model.
  • When the data was last updated. Stale data leads to poor performance, and refreshing the data can yield massive improvements.
  • The owner of each feature. This makes it easy to get features updated, and documenting each feature makes it easier to maintain if the creator isn’t available.

Use the same features during training and serving

Training-serving skew occurs when the model performs differently during training and serving. It is caused by differences in how data is handled in the training and serving pipelines, or by changes in the data between training and serving. To avoid it, ensure you use the same features and processing steps during training and serving.
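One way to enforce this is to bundle preprocessing and model into a single artifact, for example with a scikit-learn pipeline, so serving always applies the exact transforms learned at training time:

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, random_state=0)

# Preprocessing and model travel together as one object.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# At serving time, the same artifact is loaded: the scaler runs with
# the statistics learned during training, so there is no skew.
blob = pickle.dumps(model)
served = pickle.loads(blob)
print((served.predict(X[:5]) == model.predict(X[:5])).all())
```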

Measure training-serving skew

You will most likely see differences in the model’s performance on the hold-out data and data from a future date. Measuring this difference helps you determine the acceptable difference based on your problem. It can also help unearth time-sensitive data causing the model to degrade. 

Build simple ensembles 

Ensembling machine learning models leads to better performance by taking advantage of the strengths of several models. Create a simple ensemble that takes the output from base learners as the input. You should also check how an increase in the performance of the base model affects the performance of the ensemble. 
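For instance, scikit-learn’s `VotingClassifier` simply averages the predicted probabilities of a few base learners, which is about as simple as ensembles get (the data and base models below are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft voting averages predicted probabilities from the base learners.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
print(ensemble.score(X_te, y_te))
```

Scoring the base learners individually alongside the ensemble shows how much each one contributes.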

Perform automated regression tests

Tests are essential in ensuring that bugs are not introduced in production code. Any new changes should be tested to ensure existing functionalities are not broken. Therefore, code that fails to pass tests should not be merged. An app like bors on GitHub can be used to check that all tests have passed and merge code automatically if there are no issues. 

Automate model deployments

Packaging models for production can take a lot of time. By automating model deployment, you free up engineers to focus on other issues. For instance, models that pass a particular test can be made available to a subset of users for testing. Automated deployments will involve packaging models automatically with all their required dependencies and moving them to a staging or production environment. Automation also involves constant monitoring of models and rolling back to a previous version in case of poor performance. 

Implement shadow production

Shadow deployment helps you understand how a model performs in the real world. Predictions are made on real-world data, but the results aren’t used in business decisions; instead, they are compared with the results of the existing system. You can then promote the model to live use once you are confident its results are acceptable.

Attach predictions to model versions and data

Attaching the model and data versions to every prediction is important for several reasons: 

  • It makes it possible to track every prediction in the system
  • It makes reproducing results possible
  • It enables debugging of models in the event of unexpected behavior
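A minimal sketch of such a prediction record (the field names and destination are illustrative; in production this would go to a log store rather than stdout):

```python
import datetime
import json

def log_prediction(features, prediction, model_version, data_version):
    """Attach model and data versions to every prediction record."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "features": features,
        "prediction": prediction,
        "model_version": model_version,
        "data_version": data_version,
    }
    print(json.dumps(record))
    return record

rec = log_prediction({"age": 31}, 0.87, model_version="v1.4.0",
                     data_version="2023-06-01")
```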

Other things to keep in mind while deploying your application include the following:

  • Ensuring the application is secure to prevent data from being stolen or the application from going down
  • Implementing a process for performing post-mortem analysis of any incident to enhance learning and continuous improvement
  • Monitoring deployed models often to make sure that they are behaving as expected
  • Ensuring that models perform well in training and production by checking for skew between them
  • Implementing a process for performing automatic roll-backs of poorly performing models


Collaboration

Collaboration is an essential aspect of building machine learning models, especially in the experimentation phase. Using tools that allow easy collaboration helps an engineering team move faster. In a collaborative system, team members should be able to access code, data, features, and project documentation.


Governance

Keep the following things in mind from a governance point of view:

  • Ensure that your machine learning models are performing within specific ethical values
  • Perform risk assessments to make sure that your machine learning application doesn’t have any unintended negative impact
  • Enforce fairness and privacy in your machine learning application 
  • Educate users on how to use your machine learning application
  • Ensure that users know a machine learning algorithm is doing the work, and what its limitations are
  • Explain to users why your machine learning application made certain decisions
  • Provide channels for users of your machine learning application to offer feedback 
  • Get an external and trusted third party to audit your machine learning application to help unearth any weaknesses and build trust 

Final thoughts

There are many things to keep in mind while creating an MLOps system. Most importantly, launch and keep iterating. Keep your ear to the ground for the latest tools and best practices, as this is a quickly developing field. This article introduced some best practices to keep in mind while setting up your MLOps system. The list is not exhaustive, but it should serve as a starting point.

