Benjamin Fuhrer, AI Engineer at NVIDIA
Doron Haritan, AI Engineer at NVIDIA

Running Deep Learning algorithms on low-memory, low-compute devices is a challenging but often necessary task. We developed a Deep RL algorithm for optimizing datacenter congestion control. In this talk, we will discuss the process of deploying a Deep Learning model inside a Network Interface Controller (NIC) while satisfying its inherent memory and compute constraints. More specifically, we will discuss the model's quantization method, implementing deep networks in native C, and the techniques we used to reduce the model's memory consumption while preserving numerical precision and efficiency.