
Enabling Self-Service MLOps and Faster ML Delivery at monday.com

Industry: SaaS
Resources: CPU, GPU, AWS
Use-case: NLP

“Before cnvrg.io, we had little control on the full model lifecycle beyond the research phase. cnvrg.io MLOps capabilities allowed us as data scientists to take ownership of the model lifecycle end to end without a direct dependency on engineers. cnvrg.io has enabled us to take on more projects, speed up time to value, and has saved us time working on technical complexity”
Ohad Hegedish
Data Scientist, monday.com

About monday.com

The monday.com Work OS is an open platform that democratizes the power of software so organizations can easily build work management tools and software applications to fit their every need. The platform intuitively connects people to processes and systems, empowering teams to excel in every aspect of their work while creating an environment of transparency in business. monday.com has teams in Tel Aviv, New York, San Francisco, Miami, Chicago, London, Kiev, Sydney, São Paulo, and Tokyo. The platform is fully customizable to suit any business vertical and is currently used by over 152,000 customers across over 200 industries in 200 countries.

Overview

monday.com is a work operating system (Work OS) where organizations of any size can create the tools and processes they need to manage every aspect of their work. BigBrain is the data team within monday.com, made up of data scientists, data engineers, BI developers, and full-stack developers responsible for monday.com’s data and analytics platform and machine learning initiatives. monday.com has built cutting-edge NLP, time-series, and classification ML applications to optimize processes and enhance decision making internally, as well as to improve the user experience in the product. ML use cases include predicting whether a user will become a paying customer, auto-tagging customer support tickets, and powering an internal text-summarization app. monday.com continues to expand its ML solutions and is using cnvrg.io to scale and accelerate time to value.

Challenges

High time to value and heavy dependency on developers

monday.com started its ML initiatives internally to improve decision making and processes for marketing and customer support. The company quickly saw growing demand for ML solutions to improve performance and user experience. As demand grew, the data scientists became heavily reliant on engineers to bring models to production. After the data scientists built and trained a model, it would often sit waiting for deployment until a developer was available to set up the infrastructure and put it into production. Even once models were in production, the data scientists were siloed, with a disconnected workflow between where a model was trained, deployed, and monitored, which created unnecessary complexity. Developers took the model and encapsulated it in their own environment, and the data science team lost overall ownership of the model in production. The team concluded that it is more efficient to control the deployment process directly than to explain to developers how a model was coded and why. Some of the key pain points were:

  • Excessively high time to value due to production bottlenecks
  • Dependency on developers and engineers for deployment
  • Missing critical MLOps capabilities such as experiment tracking, containerized workflows, and control over deployments
  • Inability to consolidate distinct endpoints into a multi-model endpoint pattern (see the sketch after this list)
  • Disjointed workflow due to each data scientist working with different machine learning tools
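The multi-model endpoint pattern mentioned above means serving several models behind a single service rather than standing up separate infrastructure for each one. The sketch below is a generic illustration of that pattern using FastAPI, with hypothetical model names and file paths; it is not monday.com’s or cnvrg.io’s implementation.

# Minimal sketch of a multi-model endpoint: several models served behind one
# HTTP service, routed by model name. Model names and paths are hypothetical.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib

app = FastAPI()

# Load each model once at startup (paths are placeholders).
MODELS = {
    "ticket_tagger": joblib.load("models/ticket_tagger.pkl"),
    "conversion_predictor": joblib.load("models/conversion_predictor.pkl"),
}

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict/{model_name}")
def predict(model_name: str, req: PredictRequest):
    model = MODELS.get(model_name)
    if model is None:
        raise HTTPException(status_code=404, detail=f"Unknown model: {model_name}")
    return {"model": model_name, "prediction": model.predict([req.features]).tolist()}

With this layout, adding a new model is a matter of registering another entry in the routing table instead of provisioning a new endpoint.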

Solution

Seamlessly running end-to-end ML workflows independently

Other ML platforms limited the data scientists’ flexibility to code and build freely. Many of them imposed strict requirements, such as specific SDKs and platform-specific model versions that left little room for customization. With cnvrg.io, the data science team gained the full set of MLOps capabilities through a friendly UI, allowing them to focus on research rather than on learning Docker and Kubernetes.

  • Experiment tracking and management for easily reproducible results
  • Easy comparison of different hyperparameter configurations and training runs
  • Ability to train machine learning and deep learning models on any compute (CPU/GPU) in the cloud
  • Unified system to track model evaluation metrics and to store and manage model artifacts
  • Simplified encapsulation and orchestration of models with Docker images
  • CI/CD retraining pipelines that update the model based on performance metrics (a generic sketch follows this list)
  • Seamless chaining of algorithms and custom code written in any language
  • Customizable endpoints in one click
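To illustrate the metric-gated retraining idea behind the CI/CD bullet above, the sketch below shows a generic check that retrains and replaces a model when its accuracy on fresh labeled data drops below a threshold. The data loader, threshold value, and file path are assumptions for illustration only; this is not the cnvrg.io pipeline API.

# Generic sketch of a metric-gated retraining step, the kind of check a
# scheduled CI/CD retraining pipeline might run. Thresholds, data loading,
# and artifact handling are placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib

ACCURACY_THRESHOLD = 0.90  # assumed acceptance bar

def load_fresh_labeled_data():
    """Placeholder: pull the latest labeled examples from the data store."""
    raise NotImplementedError

def retrain_if_degraded(model_path: str = "models/current.pkl") -> None:
    X, y = load_fresh_labeled_data()
    current = joblib.load(model_path)
    acc = accuracy_score(y, current.predict(X))
    print(f"current model accuracy on fresh data: {acc:.3f}")

    if acc >= ACCURACY_THRESHOLD:
        return  # still good enough; keep the deployed model

    # Retrain on the fresh data and overwrite the artifact; a real pipeline
    # would version the artifact and update the serving endpoint.
    candidate = RandomForestClassifier(n_estimators=200).fit(X, y)
    joblib.dump(candidate, model_path)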

Results

Cut engineering bottlenecks by 100% and accelerated time to consumption

With cnvrg.io, the data scientists were able to get models into production on their own instead of waiting on developers to deploy them. cnvrg.io increased the data scientists’ ownership of the end-to-end ML lifecycle and reduced the time spent waiting to understand model performance. With cnvrg.io’s MLOps capabilities, the BigBrain team has been able to take on more projects that directly impact business units from marketing to customer support, as well as enhance the product experience with personalization and summarization applications.


  • No need to interface with engineers during the train-deploy cycle
  • Shortened the time to see results from the modeling work
  • Easier to adopt a multi-model endpoint pattern for efficient serving
  • Reduced time spent on technical configurations and infra setup by 80%
  • Completely automated infrastructure for CI/CD of ML models