Abstract. Machine learning research has made astonishing progress in recent years. Unfortunately, it is often difficult to translate this academic progress into deployable applications, due to the constraints and challenges imposed by production settings. In this talk, I will present some of my recent research in the area of data management for machine learning, which tackles these problems. Furthermore, I will put a special focus on three challenges that I see for building industrial-scale recommender systems. In particular, I will outline ideas on how to scale to datasets with billions of interactions, understand the impact of response latency on the performance of a deployed recommender system, and make the "right to be forgotten" a first-class citizen in real-world ML systems.
Sebastian Schelter is an Assistant Professor with the University of Amsterdam, conducting research at the intersection of data
management and machine learning. He manages the AI for Retail Lab Amsterdam, and has a joint appointment as Research Fellow at Ahold
Delhaize, an international retailer based in the Netherlands.
His work covers many aspects of this area, such as automating data quality validation, optimizing programs that combine operations from linear and
relational algebra, and tracking the lineage of machine learning pipelines.
In the past, he has been a Faculty Fellow with the Center for Data Science at New York University and a Senior Applied Scientist at
Amazon Research, after obtaining his Ph.D. at the database group of TU Berlin with Volker Markl. He is active in open source as an
elected member of the Apache Software Foundation, and has extensive experience in building real-world systems from his time at Amazon,
Twitter, IBM Research, and Zalando.