
Smart Manufacturing: Seagate deploys & scales defect detection with MLOps automation


Tags: Computer Hardware, On-prem GPUs, GPU Clusters, Defect Detection, Computer Vision

"Working in a hybrid cloud environment has major advantages but can be increasingly complex to manage, especially for AI workloads. The platform has the potential to enable us to operate in a hybrid cloud environment seamlessly. The new ML infrastructure dashboard could fill a major need in connecting our infrastructure to ML projects. It provides visibility into our on-prem GPU clusters and cloud resources, paving the way for increasing the utilization and ROI of our GPUs."
Bruce King
Data Science Technologist at Seagate Technology Advanced Analytics Group

About Seagate

Seagate Technology has been a global leader in data storage and management solutions for over 40 years. Seagate's technology has transformed business results across sectors, powering AI/ML initiatives, modernizing backup infrastructure, and delivering private cloud solutions. Seagate's data science professionals and machine learning engineers build advanced deep learning systems to solve business problems and drive results.


Seagate's team of data science professionals and machine learning engineers has built a defect detection system to be deployed globally across their manufacturing facilities, one with the potential to improve ROI by 300% by significantly reducing the time spent processing defects, at a much lower cost. With nearly 40 years of delivering state-of-the-art technology, Seagate inherited legacy workflows that made it difficult to deploy their model at scale. Seagate's Advanced Analytics team has been working to update their infrastructure with MLOps automation and successfully deliver the defect detection system globally, so they could realize the full potential of their AI solution.


Low-efficiency, siloed, manual workflows and underutilized hybrid cloud resources

Lyve Labs, Seagate's innovation hub, was approached by Seagate's Advanced Analytics Group to tackle the numerous challenges the team faced in this deep learning project. The Advanced Analytics Group understood that deploying their defect detection system at scale would require a modern ML infrastructure with MLOps automation, advanced endpoints, and hybrid cloud support. The project was a large undertaking, and efficiency was critical to its success and scalability.

The team experienced low efficiency at many stages of the workflow. Many manual tasks prolonged the workflow and caused bottlenecks within the pipeline: long Python scripts had to be run by hand, causing major delays in development, and a disconnected workflow meant that time was spent primarily on technical tasks rather than data science.

As a result of their siloed workflow, Seagate was also experiencing low server utilization of their hybrid cloud infrastructure. They had to run each workload separately and had no mechanism in place to run different workloads on optimal machines. The team required an infrastructure that would automate the pipeline components so that resources would be scheduled automatically, in real time, with maximum efficiency. At the production level, Seagate required advanced deployments that could serve on TensorFlow and Kafka endpoints. Achieving full and efficient model deployment onto Edge RX in each Seagate factory called for a modern infrastructure and enterprise MLOps capabilities.


Delivering automated MLOps pipelines from research to production

Lyve Labs scouted solutions in the ecosystem and found a platform that offered an optimal AI solution for Seagate's Advanced Analytics Group. Designing an end-to-end flow that executes automatically is the centerpiece of MLOps pipelines. Not only would the technology unify Seagate's workflow, it would also connect their hybrid cloud infrastructure, allowing them to run multiple clouds at the same time in one view. Seagate uses the platform as a one-stop shop to streamline and accelerate ML pipelines and improve resource utilization across all their AI projects. Data is fed directly from the manufacturing device and used both for training and for real-time inference. The platform enabled Seagate to automate ML pipeline components so that their hybrid cloud resources were scheduled automatically, in real time, with maximum efficiency. Cloud bursting has been used extensively: whenever the on-prem GPU machines were 100% utilized, the platform scheduled additional experiments in the cloud, thereby minimizing costs and driving productivity.
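The cloud-bursting behavior described above can be pictured as a simple scheduling rule: prefer on-prem GPUs, and spill over to cloud instances only when on-prem capacity is exhausted. The sketch below is a minimal illustration of that rule, not the platform's actual scheduler; the `Cluster` class and `schedule_experiment` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """A pool of GPU workers, either on-prem or cloud (hypothetical model)."""
    name: str
    total_gpus: int
    busy_gpus: int = 0

    @property
    def has_capacity(self) -> bool:
        return self.busy_gpus < self.total_gpus


def schedule_experiment(on_prem: Cluster, cloud: Cluster) -> str:
    """Prefer free on-prem GPUs; burst to the cloud only at 100% utilization."""
    target = on_prem if on_prem.has_capacity else cloud
    target.busy_gpus += 1
    return target.name


# On-prem fills up first, then new experiments burst to the cloud.
on_prem = Cluster("on-prem", total_gpus=2)
cloud = Cluster("cloud", total_gpus=8)
placements = [schedule_experiment(on_prem, cloud) for _ in range(4)]
print(placements)  # → ['on-prem', 'on-prem', 'cloud', 'cloud']
```

A real scheduler would also weigh queue depth, instance cost, and data locality, but the cost-minimizing preference for on-prem hardware is the core of the idea.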

  • Hybrid Cloud support – training hardware resource management for on-premises GPU servers and Cloud compute instances
  • Cloud Kubernetes “scale to zero” support – for both CPU and GPU worker nodes
  • Model Training and Evaluation – with source code version control
  • Model Management – trained Model Files and artifact version control
  • Model Deployment – production deployments are version controlled and automated
  • Collaboration at Scale – visibility into data science project progress, workload utilization 
  • Model Monitoring – inference performance health tracking
  • Model retraining – advanced triggering and updating capabilities
  • Data Management – version control of datasets used to train and validate models with data movement and caching between sites over the network
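The "Model retraining – advanced triggering" item above amounts, in practice, to watching a production metric and launching retraining when it degrades. Below is a minimal sketch under that assumption; the function name, window size, and threshold values are illustrative, not Seagate's actual configuration.

```python
def should_retrain(recent_accuracy: list[float],
                   baseline: float = 0.95,
                   tolerance: float = 0.03,
                   window: int = 5) -> bool:
    """Trigger retraining when the rolling average of recent inference
    accuracy drops more than `tolerance` below the validation baseline.
    All numbers here are illustrative assumptions."""
    if len(recent_accuracy) < window:
        return False  # not enough evidence to act yet
    rolling = sum(recent_accuracy[-window:]) / window
    return rolling < baseline - tolerance


print(should_retrain([0.96, 0.93, 0.90, 0.88, 0.86]))  # → True (clear drift)
print(should_retrain([0.96, 0.95, 0.96, 0.94, 0.95]))  # → False (healthy)
```

In a pipeline, a trigger like this would enqueue a retraining job against the latest versioned dataset rather than retrain inline, keeping the decision logic separate from the heavy compute.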


Accelerated ML pipeline by 50% and achieved modern workflow transformation

Using the platform, Seagate was able to transform their legacy AI workflow into a scalable, modern, automated pipeline. MLOps capabilities increased data scientists' efficiency by up to 50%, allowing them to address 30% more business use cases. Days of repetitive work have been replaced with a single automated pipeline that maintains optimized performance. By enabling customized environments for each workload, Seagate is able to accelerate training with MPI to achieve the best results possible. Their modern automated pipeline delivers the ability to release and manage models in production on TensorFlow and Kafka endpoints seamlessly. The platform delivered unified data management with shared datasets, version control, data caching, querying, and metadata management capabilities. Seagate can now run and manage hundreds of experiments in parallel on optimized compute, and support model serving with canary rollout to deliver peak-performing models. With the MLOps platform, Seagate's Advanced Analytics Group is able to achieve:

  • Accelerated transformation to modern AI workflow 
  • Successfully demonstrated scalable AI deployment across global facilities
  • Collaboration globally across advanced analytics, engineering and IT teams
  • Maximized hybrid cloud node utilization with scale to zero
  • Optimized on-premises hardware utilization
  • Improved efficiency of data scientists by 50% with MLOps
  • Achieved peak performance of models in production with zero downtime
  • Decreased IT technical debt with MLOps capabilities
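The canary rollout mentioned above can be pictured as deterministic traffic splitting between the stable and candidate model versions. The following is a hedged sketch assuming hash-based routing; `route_to_canary` is a hypothetical helper, not the platform's API.

```python
import hashlib


def route_to_canary(request_id: str, canary_fraction: float = 0.05) -> str:
    """Deterministically send a fixed fraction of traffic to the canary model.
    Hashing the request ID keeps each request's routing stable across retries.
    The 5% default is an illustrative assumption."""
    digest = int(hashlib.sha256(request_id.encode()).hexdigest(), 16)
    bucket = (digest % 100) / 100.0
    return "canary" if bucket < canary_fraction else "stable"


routes = [route_to_canary(f"req-{i}") for i in range(1000)]
share = routes.count("canary") / len(routes)
print(f"canary share ≈ {share:.2%}")  # stays near 5% and is stable across runs
```

If the canary's monitored metrics hold up, the fraction is ratcheted toward 100%; if they degrade, traffic is routed back to the stable version, which is what allows model updates with zero downtime.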