This Month in MLOps and LLMOps

By Bob Glithero

Welcome to a new feature of our blog, where we curate the best articles and tutorials we found in MLOps and LLMOps, with an emphasis on open-source solutions! This month, Retrieval-Augmented Generation (RAG) features heavily in our selections, underscoring that many users and enterprises are curious about the insights they can get from feeding LLMs with their own documents and data sources. We also have a tutorial on voice matching with LLMs, one company’s 18-year journey from basic scripts to a mature data mesh architecture, and some thoughts on how AI is impacting the structure and workflow of software development teams.

Community posts

MLOps Infrastructure at Mission Lane (Part 1)

By Mike Kuhlen

Mike discusses the first part of Mission Lane’s transition from their previous MLOps stack to a modern solution, enabling them to streamline model development, testing, deployment, and management. In the upcoming second part, Mike will delve into operational use cases, showcasing how Mission Lane leverages Airflow for batch evaluations over large datasets.

mlinfra – a hassle free way to deploy ML Infras

By Ali Abbas Jaffri

In this article, Ali Abbas Jaffri introduces mlinfra, a lightweight Python package built on top of Terraform modules. mlinfra is aimed at simplifying the deployment of MLOps infrastructure and establishing a universal Infrastructure as Code (IaC) framework.

Strategies for deploying online ML models

By Victor Macedo

Here, Victor talks about practical strategies for deploying machine learning (ML) models using Kubernetes, Knative, and Istio. He illustrates with a local Kubernetes cluster and a simple API application that is containerized with Docker and deployed serverlessly with Knative.

Versioning Machine Learning Models

By Lakshaya Khandelwal

Lakshaya discusses why effective versioning is crucial for production machine learning, including benefits such as reproducibility, rollbacks, compliance, and monitoring for concept drift, and challenges like data drift and large model sizes.

UML(C4) for ML Engineering optimization workflow

By Carlos C (@autognosi)

Carlos explores the significance of Unified Modeling Language (UML) and the C4 model in optimizing Machine Learning Engineering processes. He also shows how these models offer a structured, hierarchical way to document software architectures, which is particularly useful in the context of ML Engineering.

From a hack to a data mesh approach: The 18-year evolution of data engineering at Leboncoin

By Simon Maurin

Over 18 years, Leboncoin’s data engineering has evolved from basic shell scripts to data mesh principles. Simon details the company’s journey from those early scripts to BI tooling, then to a data platform built for horizontal scaling and cloud adoption, and finally to microservices and event streams that provide the data scalability needed for ML initiatives. As the number of application teams grew, the company adopted a data mesh model that embeds data expertise within feature teams.

Detecting Concept Shift: Impact on Machine Learning Performance

By Michał Oleszak

Here, Michał focuses on concept shift, where the relationship between inputs and outputs changes over time while the distribution of the input features stays the same. The article categorizes data shifts and underscores the significance of concept shift detection for maintaining ML model performance. Michał advocates for regular retraining on new data, highlights the challenges of detecting concept shift and quantifying its impact, and suggests strategies for managing it so that models keep delivering value.
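To make the idea of concept shift concrete, here is a minimal sketch that is not taken from Michał's article. The threshold rule, the integer feature values, and the function names are all assumptions made up for illustration. The inputs are identical in both periods, so only the input-output relationship changes, and a model fitted to the old relationship loses accuracy.

```python
# Hypothetical illustration of concept shift: same inputs, different labels.

def model(x):
    # Stand-in for a trained model: predicts 1 when the feature exceeds 50.
    return 1 if x > 50 else 0

def old_concept(x):
    # Input-output relationship at training time (matches the model).
    return 1 if x > 50 else 0

def new_concept(x):
    # Relationship after the shift: the true boundary has moved to 70.
    return 1 if x > 70 else 0

def accuracy(xs, label_fn):
    # Fraction of inputs where the model agrees with the current true label.
    return sum(model(x) == label_fn(x) for x in xs) / len(xs)

# Identical feature values in both periods: the input distribution is unchanged.
xs = list(range(100))

acc_before = accuracy(xs, old_concept)
acc_after = accuracy(xs, new_concept)
print(acc_before, acc_after)
```

Because the features themselves look the same before and after, monitoring input statistics alone would not reveal this kind of degradation; it only shows up once labels for the new period are available.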

Become the maestro of your MLOps abstractions

By Médéric Hurier

Médéric walks us through the evolving landscape of MLOps. Just as in Big Data systems, the diversity in MLOps introduces complexity, leading to analysis paralysis for decision-makers. To manage this complexity, he advocates for the development and mastery of abstractions, drawing parallels with the success of Apache Spark in managing Big Data applications. By encapsulating underlying complexities, abstractions offer adaptable architectures for integrating new components, simplifying the challenges posed by managing a seemingly endless variety of tooling.

Build It Or Buy It? Generative AI’s Million Dollar Question!

By Alden Do Rosario

The article discusses the emergence of the “RAG Pattern” in Generative AI and the challenges developers face when moving from prototypes to AI systems in production. While frameworks like LangChain or direct use of the ChatGPT API make prototyping quick, issues arise around data ingestion, document handling, query relevance, and ongoing maintenance. The author also provides technical insights into building and querying RAG pipelines, and the trade-offs that may be necessary when implementing AI solutions.

Build an Audio-Driven Speaker Recognition System Using Open-Source Technologies — Resemblyzer and QdrantDB

By Karan Shingde

The article explores the process of matching a speaker’s voice with a set of existing voices using vector embeddings and open-source technologies. Similar to biometric systems, which use physical attributes like fingerprints, this approach utilizes the unique characteristics of the human voice for identification purposes.
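As a rough sketch of the matching step, the snippet below compares a query embedding against a few enrolled embeddings using cosine similarity. It does not call Resemblyzer or Qdrant: the three-dimensional vectors, the speaker names, and the helper functions are hypothetical stand-ins, and real voice embeddings would have many more dimensions and be stored in a vector database.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query, enrolled):
    # Return the name of the enrolled speaker whose embedding is most similar.
    return max(enrolled, key=lambda name: cosine_similarity(query, enrolled[name]))

# Hypothetical embeddings; an embedding model would produce these from audio.
enrolled_voices = {
    "speaker_a": [0.9, 0.1, 0.0],
    "speaker_b": [0.1, 0.8, 0.3],
    "speaker_c": [0.0, 0.2, 0.9],
}
query_embedding = [0.2, 0.7, 0.4]

match = best_match(query_embedding, enrolled_voices)
print(match)
```

In practice, a similarity threshold would probably also be needed so that an unknown voice is rejected rather than assigned to the closest enrolled speaker.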

AI for Groups: Build a Multi-User Chat Assistant Using 7B-Class Models

By Jan Jezabek, Ph.D.

Jan goes through the steps needed to build a lightweight multi-user chat assistant using open-source LLMs. In this context, “lightweight” means a model that requires 16GB of GPU RAM for training and 8GB for inference, and that can run efficiently on a CPU if needed. The article also discusses dataset generation, creating chat templates, training, and testing strategies.

Power your RAG application using QdrantDB, Mistral 8x7B MoE, LangChain, and Streamlit.

By Karan Shingde

Karan’s second article here details the creation of a simple Retrieval-Augmented Generation (RAG) application, which uses a vector database as a knowledge base built from PDFs. The application architecture addresses both ingestion and inference. The article also discusses the integration of Streamlit for building a user interface (UI) for the RAG application, allowing users to interact with the system by asking questions and receiving answers generated by the Mixtral model. Includes sample code in the author’s repository: https://github.com/karan842/RAG-with-Qdrant-and-Mixtral/

Manage your RAG Life Cycle Management, Access, Testing, Performance, Reporting etc. — Streamlit based RAG LCM Portal

By Ajay Singh

Managing the lifecycle of RAG involves several stages, akin to MLOps/DevOps processes in AI. This includes design and development, testing and validation, deployment, and monitoring and maintenance. Testing plays a crucial role in ensuring the model’s readiness for real-world use. Moreover, the article introduces the Cognizance Portal, a Streamlit-based frontend system designed to manage the RAG lifecycle efficiently. The portal offers various capabilities such as LLM selection, chunking options, access management, lifecycle management, performance tracking, data visualization, and integration with multiple document types and data sources.

Mastering the Art of Embeddings: Choosing the Right Model for Your RAG Architecture

By Eduardo Ordax

Embedding models play a crucial role in optimizing the performance of Retrieval-Augmented Generation (RAG) architectures and Generative AI applications. Eduardo emphasizes the importance of selecting the right embedding model, as it directly influences the efficiency of identifying similarities between documents and user queries in the RAG process. The decision of which embedding model to choose is vital, considering the wide range of transformer-based models available in the market, tailored to specific objectives such as coding tasks, English language processing, and multilingual datasets.

Structured Data Analysis using Knowledge Graph + LLM

By Md Sharique

This article discusses the integration of Knowledge Graphs with Large Language Models (LLMs) for structured data analysis, offering a more efficient alternative to traditional model training processes. It introduces Knowledge Graphs as a semantic network that captures relationships between entities using nodes, edges, and labels, contrasting them with vector databases in assisting LLMs for data retrieval and generation. The article also emphasizes the importance of selecting the right embedding model, suggests methods for evaluating and comparing models, and provides code implementation examples using LlamaIndex for building conversational bots to respond to queries related to tabular data.

Retrieval Augmented Generation with PgVector and Ollama

By Seeu Sim Ong

While large models such as ChatGPT and Anthropic’s Claude may seem like the go-to, they are closed-source, and subscriptions are needed to access their advanced features, such as document chat. Many other online tools also require subscriptions even though their functionality simply wraps API calls to OpenAI or other providers. In this article, Seeu describes an attempt to mimic some of that functionality with models running locally (on your own laptop or machine) using Ollama, demystifying the process of chatting with documents along with the prompting and retrieval techniques you can use to do so.

Simplifying RAG with PostgreSQL and PGVector

By Levi Stringer

In this blog, Levi discusses simplifying the process of building applications powered by Retrieval Augmented Generation (RAG) by leveraging PostgreSQL with PGVector. He outlines the setup and requirements for building the application, including the installation of PostgreSQL, PGVector, and other necessary libraries and modules. The article includes detailed steps for extracting text from PDF documents, splitting text into manageable chunks, generating text embeddings using OpenAI’s text embedding model, and storing embeddings in a PostgreSQL database. Additionally, Levi explains how to query the stored embeddings to find similar text embeddings, leveraging PGVector’s support for various similarity and distance metrics.
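The step of splitting text into manageable chunks can be sketched as follows. This is an illustrative character-based splitter with overlap, not Levi's implementation; the function name, chunk size, and overlap values are assumptions chosen to keep the example small. Overlap is a common choice so that a sentence cut at a chunk boundary still appears intact in at least one chunk.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    # Split text into fixed-size character chunks that share `overlap`
    # characters with the previous chunk.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghij" * 5  # 50 characters of placeholder text
chunks = split_into_chunks(sample, chunk_size=20, overlap=5)
for chunk in chunks:
    print(len(chunk), chunk)
```

Each chunk would then be passed to an embedding model and stored alongside its vector, so that a query embedding can later be compared against the stored ones.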

Designing AI-Driven Software Engineering Teams

By Omer Ansari

Omer discusses how AI is changing software engineering, and the growing trend of AI-assisted development within IT organizations. With the accessibility of generative AI, new companies can create software without extensive hiring of expensive software developers. Omer speculates AI-native firms will bypass the bureaucracy and friction of legacy software processes, enabling them to build and deploy software using the most efficient and economical patterns available.

Tutorials/Learn More:

DecodingML have a new online course available on their GitHub repo: “An End-to-End Framework for Production-Ready LLM Systems by Building Your LLM Twin”. Free.

Deeplearning.ai: “Accelerating Text-To-Image Diffusion Models Up To 3x Faster,” Tuesday, April 9. Free.

Upcoming Conferences and Events: Apr-Jun 2024

Conference	Location	Date
Generative AI Summit	San Jose, CA	Apr 16, 2024 – Apr 17, 2024
World Summit AI Americas	Montreal, Canada	Apr 24, 2024 – Apr 25, 2024
Automate Conference	Chicago, IL	May 6, 2024 – May 9, 2024
AI Expo	Austin, TX	May 12, 2024
Rise of AI Conference	Berlin, Germany	May 15, 2024
World Data Summit 2024	Amsterdam, Netherlands	May 15, 2024 – May 17, 2024
AI Tech Summit	Malaga, Spain	May 17, 2024 – May 18, 2024
Enterprise Generative AI Summit West Coast	Silicon Valley, CA	May 21, 2024 – May 22, 2024
2024 Embedded Vision Summit	Santa Clara, CA	May 22, 2024 – May 24, 2024
AI Con USA	Las Vegas, NV	Jun 2, 2024 – Jun 7, 2024
Data Cloud Summit 2024	San Francisco, CA	Jun 3, 2024 – Jun 6, 2024
Machine Learning Week	Phoenix, AZ	Jun 4, 2024 – Jun 6, 2024
Data+AI Summit	San Francisco, CA	Jun 10, 2024 – Jun 13, 2024
The AI Summit London	London, UK	Jun 12, 2024 – Jun 13, 2024
AIQCon	San Francisco, CA	Jun 25, 2024
