
Deploy your machine learning models with Kubernetes

You’re an AI expert. A deep learning Ninja. A master of machine learning. You’ve just completed another iteration of training your awesome model. This new model is the most accurate you have ever created, and it’s guaranteed to bring a lot of value to your company.

But…

You hit a roadblock that holds back your model’s potential. You have full control of the model throughout the process. You can train it, you can tweak it, and you can even verify it using the test set. But, time and time again, you reach the point where your model is ready for production and your progress grinds to a halt. You need to communicate with DevOps, who likely have a task list down to the floor that takes priority over your model. You patiently wait your turn, until you become unbearably restless in your spinning chair. You have every right to be restless. You know that your model has the potential to produce record-breaking results for your company. Why waste any more time?

There is another way…

Publish your models on Kubernetes. Kubernetes is quickly becoming the cloud standard. Once you know how to deploy your model on Kubernetes, you can do it on any cloud provider that runs it, such as Google Cloud or AWS.

How to deploy models to production using Kubernetes

You’ll never believe how simple deploying models can be. All you need is to wrap your code a little bit. Soon you’ll be able to build and control your machine learning models from research to production. Here’s how:

Layer 1- your predict code

Since you have already trained your model, you already have predict code. The predict code takes a single sample, passes it to the trained model and returns a prediction.

The example for this post takes a sentence as input and returns a number that represents the sentiment of the sentence as predicted by the model. The model was trained on the IMDB dataset.
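To make the shape of this layer concrete, here is a minimal sketch of a predict function. It is not the model from this post: the tiny word-weight table is a hypothetical stand-in for a trained IMDB sentiment model, and the names `WORD_WEIGHTS` and `predict` are assumptions chosen for illustration. What matters is the contract: one sentence in, one number out.

```python
import math
import re

# Hypothetical stand-in for a trained sentiment model (NOT the model from the post).
# In practice you would load your trained model once at import time instead.
WORD_WEIGHTS = {
    "love": 1.0, "loved": 1.0, "amazing": 1.0, "great": 0.8,
    "boring": -0.8, "awful": -1.0, "hate": -1.0,
}


def predict(sentence: str) -> float:
    """Return a sentiment score between 0 and 1; higher means more positive."""
    # Lowercase the sentence and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Sum the weights of the known words; unknown words contribute nothing.
    raw_score = sum(WORD_WEIGHTS.get(token, 0.0) for token in tokens)
    # Squash the raw score into the (0, 1) range with a logistic function.
    return 1.0 / (1.0 + math.exp(-raw_score))


if __name__ == "__main__":
    print(predict("I loved this video"))
```

With a real model, only the body of `predict` changes. Keeping the signature stable is what lets the next layers stay untouched when you retrain.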

*Tip
To make deploying even easier, make sure to track all of your code dependencies in a requirements file.

Layer 2- flask server

After we have a working example of the predict code, we need to start speaking HTTP instead of Python.

The way to achieve this is to run a Flask server that accepts the input as arguments to its requests and returns the model’s prediction in its responses.

The server imports Flask and defines a route to listen on. When a request arrives at the /predict route, the server takes the request arguments and passes them to the predict function we wrote in the first layer. The function’s return value is sent back to the client in the HTTP response.
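A server along these lines could look like the sketch below. It assumes the input arrives as a JSON body with an `input_params` field, matching the curl call later in this post. The `{"prediction": ...}` response shape, the 400 error response and the placeholder `predict` function are assumptions made for illustration, not the original code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(sentence: str) -> float:
    """Placeholder for the layer 1 predict function (assumption for this sketch)."""
    return 0.5


@app.route("/predict", methods=["POST"])
def predict_route():
    # Parse the JSON body; silent=True returns None instead of raising on bad input.
    payload = request.get_json(silent=True)
    sentence = payload.get("input_params") if isinstance(payload, dict) else None
    if not isinstance(sentence, str):
        # Reject requests that do not carry a string under "input_params".
        return jsonify(error="expected a JSON body with a string 'input_params'"), 400
    # Hand the sentence to the model and return its score as JSON.
    return jsonify(prediction=predict(sentence))
```

To serve it, you could call `app.run(host="0.0.0.0", port=5000)` under an `if __name__ == "__main__":` guard. Binding to `0.0.0.0` rather than the default localhost is what makes the server reachable from outside its container.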

Layer 3- Kubernetes deployment

And now, on to the final layer! With Kubernetes we can declare our deployment in a YAML file. This approach is called infrastructure as code, and it lets us define the commands we want to run in a single text file.

The deployment file declares a Deployment with a single replica. Its container is based on the TensorFlow Docker image and runs a set of four commands to start the server.

Together, these commands clone the code from GitHub, install the requirements and spin up the Flask server we wrote.

*Note: feel free to change the clone command to suit your needs.
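As a rough illustration of what such a Deployment contains, the sketch below builds the manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. Everything specific here is a hypothetical placeholder rather than the post’s actual file: the repository URL, the `model-server` name, the `server.py` and `requirements.txt` file names, and port 5000. It also assumes `git` is available inside the image, which you should verify for the image you use.

```python
import json

# Hypothetical placeholder; point this at your own repository.
REPO_URL = "https://github.com/your-org/your-repo.git"


def build_deployment(name="model-server", image="tensorflow/tensorflow", port=5000):
    """Build a single-replica Deployment manifest as a plain dictionary."""
    # Four shell commands chained together: clone, enter, install, serve.
    startup = " && ".join([
        f"git clone {REPO_URL} app",
        "cd app",
        "pip install -r requirements.txt",
        "python server.py",
    ])
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": ["/bin/sh", "-c"],
                        "args": [startup],
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_deployment(), indent=2))
```

Cloning and installing at container start keeps the example short, but it repeats that work on every restart. Baking the code and dependencies into your own image is a common alternative once the setup stabilizes.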

Additionally, it’s important to add a service that exposes the deployment outside the Kubernetes cluster. Be sure to check your cluster networking settings with your cloud provider.
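A service for this setup might be sketched as follows, again as a Python dictionary printed as JSON. The example in this post uses a NodePort service; the `model-server` name, port 5000 and node port 30080 are hypothetical values, and the selector has to match whatever labels your deployment puts on its pods.

```python
import json


def build_service(name="model-server", port=5000, node_port=30080):
    """Build a NodePort Service manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            # Route traffic to pods carrying this label (assumed to match the deployment).
            "selector": {"app": name},
            "ports": [{
                "port": port,           # port of the service inside the cluster
                "targetPort": port,     # port the Flask server listens on in the pod
                "nodePort": node_port,  # port opened on every cluster node
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_service(), indent=2))
```

By default Kubernetes only allows node ports in the 30000-32767 range. If you leave out `nodePort`, the cluster picks a free port in that range for you.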

Send it to the cloud

Now that we have all the files in place, it’s time to send the code to the cloud.

Assuming you have a running Kubernetes cluster and its kubeconfig file, run the following commands:

kubectl apply -f deployment.yml

This command will create our deployment on the cluster.

kubectl apply -f service.yml

This command will create a service that exposes the endpoint to the world. In this example, a NodePort service was used, meaning the service is attached to a port on the cluster nodes.

Use the command `kubectl get services` to find the port assigned to the service, and `kubectl get nodes -o wide` to find a node IP. Now the model can be called using HTTP with the following curl command:

curl http://node-ip:node-port/predict \
-H 'Content-Type: application/json' \
-d '{"input_params": "I loved this video. Like, love, amazing!!"}'
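If you would rather call the model from Python than from curl, a client could be sketched with the standard library alone. The helper names `build_predict_request` and `call_model` are hypothetical, and the sketch assumes the server answers with a JSON body.

```python
import json
import urllib.request


def build_predict_request(node_ip, node_port, sentence):
    """Build the same POST request that the curl command sends."""
    url = f"http://{node_ip}:{node_port}/predict"
    body = json.dumps({"input_params": sentence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_model(node_ip, node_port, sentence, timeout=10.0):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_predict_request(node_ip, node_port, sentence)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `call_model("node-ip", "node-port", "I loved this video")` with your real node IP and port should return whatever your server responds with.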

Wrapping it up – It’s Aliiiive!

Easy, huh? Now you know how to publish models to the internet using Kubernetes, with just a few lines of code. And it actually gets easier.

cnvrg.io model deployment

cnvrg.io provides an end-to-end platform that allows data scientists to manage, build and automate machine learning from research to production. One of the core features of cnvrg.io is the automation of model deployment. With just a single click, a data scientist can create a production-ready environment that can serve millions of requests to their model.

For every deployment environment, cnvrg.io will set up a Kubernetes cluster with integrated tools, such as Prometheus and Grafana, that help you monitor your models in real time. It tracks both system-level metrics and the health of your machine learning model. That way you can keep track of prediction confidence, input/output and basically any parameter you’d like.

Additionally, the cnvrg.io platform has integrated Istio for advanced A/B testing functionalities, webhooks, alerts and more. It’s so easy to use you’ll be surprised this solution wasn’t in your life earlier.

cnvrg.io user interface: model A/B testing

So. Go on. Take your own models and deploy away!

You can follow the full example and code from above here. If you’d like to deploy models with just one click, sign up for a demo at cnvrg.io/start or click the big deploy button below.


It’s that easy.
