Abon Chaudhuri, Engineering Manager, Core ML Team at Robinhood

Machine learning (ML) platforms and ML-centric systems have become a popular subcategory of software systems. They differ from conventional software systems, however, because of their close relationship with data. Data flows through these systems in various forms, such as raw data, features, parameters, and predictions.

This talk will try to establish, with examples from different business domains, how the quantity and quality of such data influence the efficiency and performance of an ML system. Optimizing hardware and software for ML is already widely discussed; the objective of this talk is to highlight the need for optimizing data as well. Throughout the lifecycle of AI/ML platform development, developers should try to understand the data for which the platform is being built.