In this webinar, we’re joined by Eri Rubin, VP of research and development at DeepCube, an Intel® Tiber™ AI Studio customer, and NVIDIA Deep Learning Solutions Architect Adam Tetelman to discuss how to optimize multi-node, multi-GPU distributed training for maximum performance.
Distributed deep learning can be complex, with many factors contributing to the overall success of a deployment. Training at scale requires a well-designed data center with proper storage, networking, compute, and software design. In this webinar, we will hear from industry experts in distributed deep learning training and go over best practices for building dynamic distributed training clusters using containers, PyTorch software tips for distributed training, and strategies for data center design and workload management to maximize NVIDIA GPU utilization. Together with the Intel® Tiber™ AI Studio software platform, these best practices for deep learning software and hardware will help individual training jobs run faster while delivering a higher data center ROI and boosting cluster utilization.
We’ll follow with a live Megatron-LM example of using PyTorch in Intel® Tiber™ AI Studio. Together with DeepCube, NVIDIA, and Intel® Tiber™ AI Studio CEO Yochay Ettun, we will share performance optimization tips, including:
- PyTorch tips and tricks for optimizing multi-node, multi-GPU distributed training
- Building dynamic distributed training clusters using NVIDIA NGC containers, Kubernetes, OpenMPI, and other open-source solutions
- Designing a topology-aware GPU scheduler with emphasis on bandwidth optimization and warm/cold data tiers
- Using real examples such as Megatron-LM (https://github.com/NVIDIA/Megatron-LM), BERT, GPT-2, and other use cases