Ashim Datta, EM, Monetization Science and MLE, Pinterest

Large-scale recommender systems rely on multiple models optimizing for different objectives. Offline evaluation of such models with metrics such as PR-AUC or log loss does not always guarantee the outcomes the recommender system is meant to achieve. Likewise, a single online metric such as CTR or time spent is not always sufficient to quantify the impact of a modeling update on a marketplace that is trying to generate value for multiple stakeholders. This talk will highlight how Pinterest built an online metrics framework to evaluate the success of 100+ modeling updates on our ad recommender system.