cnvrg.io releases industry-first dataset caching for ML solution and announces NetApp partnership

By Yochay Ettun

NetApp and cnvrg.io have collaborated to deliver a streamlined AI/ML data science pipeline solution that drives productivity and efficiency for data science teams. Through the cnvrg.io AI OS, NetApp users get cnvrg.io's industry-leading managed Kubernetes clusters, cached datasets for extreme performance, one-click attachment of models to datasets, and integration with NVIDIA NGC's registry of GPU-optimized AI software.

It’s not uncommon to have hundreds of datasets feeding models. However, those datasets may live far away from the compute that trains the models, for example in the public cloud or in a data lake. With NetApp and cnvrg.io, you can cache the needed datasets (and/or their versions) in the ONTAP® node attached to the GPU or CPU cluster that runs the training. Once cached, the datasets can be reused many times by different team members.
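
To make the caching idea concrete, here is a minimal Python sketch. It is not cnvrg.io's or ONTAP's actual implementation or API. It assumes a hypothetical NearComputeCache keyed by (dataset name, version) and a stand-in fetch_remote callable that plays the role of a slow download from the data lake. The first request for a dataset version triggers one remote fetch; every later request, from any team member, is served from the local copy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical key: (dataset name, version tag). Not a cnvrg.io or ONTAP API.
DatasetKey = Tuple[str, str]


@dataclass
class NearComputeCache:
    """Toy model of a dataset cache that sits next to the compute cluster."""
    fetch_remote: Callable[[DatasetKey], bytes]  # stand-in for a slow data-lake download
    entries: Dict[DatasetKey, bytes] = field(default_factory=dict)
    remote_fetches: int = 0

    def get(self, name: str, version: str) -> bytes:
        key = (name, version)
        if key not in self.entries:
            # Cache miss: pull this dataset version once from the remote store.
            self.entries[key] = self.fetch_remote(key)
            self.remote_fetches += 1
        # Cache hit (or freshly filled entry): served from the local copy.
        return self.entries[key]


# Three team members read the same dataset version; only the first read goes remote.
cache = NearComputeCache(fetch_remote=lambda key: f"data for {key[0]}@{key[1]}".encode())
for _user in ("alice", "bob", "carol"):
    cache.get("images", "v2")
print(cache.remote_fetches)  # 1
```

Caching per version rather than per dataset mirrors the post's point that individual dataset versions can be cached.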

Dataset caching allows data scientists to pick a dataset or dataset version and move it to the ONTAP NFS cache, which resides close to the ML compute cluster. Data scientists can then run multiple experiments without repeated downloads or the delays they cause. In addition, all collaborating engineers can use the same cached dataset from any node of the attached compute cluster, with no additional downloads from the data lake. A dashboard lets data scientists track and monitor all datasets and versions and see which datasets are cached. The cnvrg.io platform automatically detects aged datasets that have not been used for a certain period and evicts them from the cache, keeping NFS cache space free for more frequently used datasets. Dataset caching with ONTAP works both in the cloud and on-premises, giving customers maximum flexibility.
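
The age-based eviction described above can be sketched as follows. This is a simplified illustration under stated assumptions. The class IdleEvictingCache, the max_idle_seconds threshold, and the injectable clock are all hypothetical, since the post does not say how cnvrg.io measures dataset age or what threshold it uses. Each use of a dataset refreshes its timestamp, and any dataset idle for longer than the threshold is evicted.

```python
import time
from typing import Callable, Dict, List


class IdleEvictingCache:
    """Toy cache that evicts datasets idle for longer than max_idle_seconds.

    Hypothetical illustration only: the real cnvrg.io eviction policy and
    its parameters are not described in the post.
    """

    def __init__(self, max_idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        self.last_used: Dict[str, float] = {}

    def touch(self, dataset: str) -> None:
        # Record a use of a cached dataset (a cache fill or a read).
        self.last_used[dataset] = self.clock()

    def evict_aged(self) -> List[str]:
        # Collect datasets whose last use is older than the idle threshold.
        now = self.clock()
        aged = [name for name, ts in self.last_used.items()
                if now - ts > self.max_idle_seconds]
        for name in aged:
            del self.last_used[name]  # free cache space for other datasets
        return sorted(aged)


# A fake clock (in seconds) keeps the example deterministic.
fake_now = [0.0]
cache = IdleEvictingCache(max_idle_seconds=3600, clock=lambda: fake_now[0])
cache.touch("images@v1")
cache.touch("text@v3")
fake_now[0] = 3000.0
cache.touch("text@v3")       # text@v3 is used again, so its age resets
fake_now[0] = 5000.0
evicted = cache.evict_aged()
print(evicted)               # ['images@v1']
```

Passing the clock in as a parameter keeps the sketch testable without waiting for real time to pass.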

NetApp and cnvrg.io accelerate data science from research to production across any platform, in any cloud or on-premises environment. With NetApp and cnvrg.io, customers get a code-first, full-stack, open platform built on containers and Kubernetes. To learn more about the NetApp and cnvrg.io joint solution, read the full technical report.
