9:00 SGT - Workshop opening

9:10 SGT - Keynote: Sequential and session-based recommendation: Past, present, future

Dietmar Jannach

Abstract: Sequential and session-based recommendation problems have received substantial attention in academic research in the past few years. While numerous algorithmic approaches are proposed every year for the underlying next-item prediction problem, it turns out that the progress that we make may in fact be rather limited due to widespread methodological issues. Furthermore, the research community continues to focus strongly on prediction accuracy, which is only one of several components of the success of a recommender system in practice. In this talk, we critically review the developments in the literature and provide a subjective selection of important areas which are currently not yet in the focus of the research community.

10:00 SGT - Paper: TrueLearn: A Python Library for Personalised Informational Recommendations with (Implicit) Feedback

Karim Djemili, Denis Elezi, Yuxiang Qiu, Aaneel Shalman, María Pérez Ortiz and Sahan Bulathwela

Abstract: This work describes the TrueLearn Python library, which contains a family of online learning Bayesian models for building educational (or more generally, informational) recommendation systems. This family of models was designed following the "open learner" concept, using humanly-intuitive user representations. For the sake of interpretability and putting the user in control, the TrueLearn library also contains different representations to help end-users visualise the learner models, which may in the future facilitate user interaction with their own models. Together with the library, we include a previously publicly released implicit feedback educational dataset with evaluation metrics to measure the performance of the models. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytics practitioners. The library and the supporting documentation with examples are available at

10:30 - 11:15 SGT


11:15 SGT - Paper: Drifter: Efficient Online Feature Monitoring for Improved Data Integrity in Large-Scale Recommendation Systems

Blaž Škrlj, Nir Ki-Tov, Lee Edelist, Natalia Silberstein, Blaž Mramor, Davorin Kopič and Naama Ziporin

Abstract: Real-world production systems often grapple with maintaining data quality in large-scale, dynamic streams. We introduce Drifter, an efficient and lightweight system for online feature monitoring and verification in recommendation use cases. Drifter addresses limitations of existing methods by delivering agile, responsive, and adaptable data quality monitoring, enabling real-time root cause analysis, drift detection and insights into problematic production events. Integrating state-of-the-art online feature ranking for sparse data and anomaly detection ideas, Drifter is highly scalable and resource-efficient, requiring only two threads and less than a gigabyte of RAM per production deployment that handles millions of instances per minute (model training). Drifter's effectiveness in alerting and mitigating data quality issues was demonstrated on a real-life system that handles up to a billion predictions per second.

11:35 SGT - Paper: Quantifying Exploration Preference for E-Commerce Recommendation

Amy B.Z. Zhang, Siyun Wang, Raphael Louca, Karl Ni and Diane Hu

Abstract: Recommendation systems are widely used in e-commerce settings to help users find the most relevant items. However, what is relevant for a user changes dynamically. For next-item recommendation, similarity to recently interacted items is desirable in some cases but not others: consider for example the behavior during comparison shopping in contrast to after a purchase. Such shifting user preference regarding exploration is not well captured by existing concepts, much less taken into account in recommendations. In this paper, we offer definitions to quantify user exploration preference and how it trends over time, based on the spread of recently interacted items in embedding space. The soundness of the concepts is illustrated with mathematical properties as well as analysis of platform data. We further demonstrate flexibility and potential by attaching simple modules to well-known baseline algorithms in two separate use cases, comparing performances for three e-commerce datasets. Source code can be found at
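As a rough illustration of the idea of measuring "spread of recently interacted items in embedding space", the following sketch computes the mean distance of recent item embeddings from their centroid, and a simple trend as the difference between two consecutive windows. This is only one possible reading of the abstract, not the authors' definition: the function names `exploration_spread` and `spread_trend`, the choice of Euclidean distance, and the window size are all assumptions made for illustration.

```python
import math


def exploration_spread(embeddings):
    """Mean Euclidean distance of item embeddings from their centroid.

    Hypothetical measure: a larger value means the recent items are
    more scattered in embedding space.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    count = len(embeddings)
    centroid = [sum(vec[i] for vec in embeddings) / count for i in range(dim)]
    return sum(math.dist(vec, centroid) for vec in embeddings) / count


def spread_trend(embeddings, window=3):
    """Spread of the latest window minus spread of the window before it.

    Hypothetical trend signal: positive means the user's recent items
    are becoming more scattered.
    """
    if len(embeddings) < 2 * window:
        raise ValueError("need at least 2 * window embeddings")
    recent = embeddings[-window:]
    previous = embeddings[-2 * window:-window]
    return exploration_spread(recent) - exploration_spread(previous)


# Toy 2-D embeddings (made up for illustration).
focused = [(1.0, 0.0), (1.0, 0.1), (0.9, 0.0)]     # near-identical items
exploring = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]  # dissimilar items

print(exploration_spread(focused) < exploration_spread(exploring))
print(spread_trend(focused + exploring, window=3) > 0)
```

In this toy setup the "comparison shopping" history yields a small spread and the exploratory history a larger one, so a session moving from the first to the second shows a positive trend.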

11:55 SGT - Invited talk: An Application of Causal Bandit to Content Optimization

Sameer Kanase (speaker), Yan Zhao, Shenghe Xu, Mitchell Goodman, Manohar Mandalapu, Benjamyn Ward, Chan Jeon, Shreya Kamath, Ben Cohen, Vlad Suslikov, Yujia Liu, Hengjia Zhang, Yannick Kimmel, Saad Khan, Brent Payne and Patricia Grao

Abstract: Amazon encompasses a large number of discrete businesses such as Retail, Advertising, Fresh, Business (B2B e-commerce), and Prime Video, most of which maintain a presence across its e-commerce website. They produce content for our customers that belongs to diverse content types such as merchandising (e.g. product recommendations), product advertisements (e.g. sponsored products and display ads), program adoption banners (e.g. Amazon Fresh), and consumption (e.g. Prime Video). When a customer visits a web page on the website, it triggers a content allocation process where we determine the specific content to show in regions of the customer shopping experience on that web page. Content produced by the aforementioned businesses then needs to be arbitrated during this process. We present a causal bandit based framework to address the problem of content optimization in this context. The framework is responsible for fairly balancing the differing objectives and methods of these businesses, and selecting the right content to display to the customers at the right time. It does so with the goal of improving the overall site-wide customer shopping experience. In this paper, we present our content optimization framework, describe its components, demonstrate the framework's effectiveness through online randomized experiments, and share learnings from deploying and testing the framework in production.

12:35 - 14:00 SGT

Lunch Break

14:00 SGT - Keynote: Wild Wild Tests

Jacopo Tagliabue

Abstract: Recommender systems are typically evaluated through performance metrics computed over held-out data points. However, real-world behavior is more nuanced: we explore the shortcomings of a simplistic, one-metric-fits-all approach and fight the false testing dichotomy of quantitative-and-scalable vs. qualitative-and-manual. We introduce RecList to address the engineering challenge of a rounded evaluation of recommender systems, and discuss the motivation and opportunities behind two recent data challenges, EvalRS 2022 and 2023.

14:50 SGT - Paper: Hierarchical Multi-Task Learning Framework for Session-based Recommendations

Sejoon Oh, Walid Shalaby, Amir Afsharinejad and Xiquan Cui

Abstract: While session-based recommender systems (SBRSs) have shown superior recommendation performance, multi-task learning (MTL) has been adopted by SBRSs to enhance their prediction accuracy and generalizability further. Hierarchical MTL (H-MTL) sets a hierarchical structure between prediction tasks and feeds outputs from auxiliary tasks to main tasks. This hierarchy leads to richer input features for main tasks and higher interpretability of predictions, compared to existing MTL frameworks. However, the H-MTL framework has not been investigated in SBRSs yet. In this paper, we propose HierSRec which incorporates the H-MTL architecture into SBRSs. HierSRec encodes a given session with a metadata-aware Transformer and performs next-category prediction (i.e., auxiliary task) with the session encoding. Next, HierSRec conducts next-item prediction (i.e., main task) with the category prediction result and session encoding. For scalable inference, HierSRec creates a compact set of candidate items (e.g., 4% of total items) per test example using the category prediction. Experiments show that HierSRec outperforms existing SBRSs as per next-item prediction accuracy on two session-based recommendation datasets. The accuracy of HierSRec measured with the carefully-curated candidate items aligns with the accuracy of HierSRec calculated with all items, which validates the usefulness of our candidate generation scheme via H-MTL.
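The two-stage flow described in the abstract (predict the next category first, then rank only a compact candidate set of items) can be sketched as follows. The abstract does not say how candidates are derived from the category prediction, so taking the items of the top-k predicted categories is an assumption, as are the function names `candidate_items` and `rank_candidates` and all toy scores; the actual model uses a metadata-aware Transformer, which is not reproduced here.

```python
def candidate_items(category_scores, items_by_category, top_k=2):
    """Collect items from the top_k highest-scoring predicted categories.

    Hypothetical candidate generation: restricts later item ranking to a
    small subset of the catalogue.
    """
    top_categories = sorted(
        category_scores, key=category_scores.get, reverse=True
    )[:top_k]
    candidates = []
    for category in top_categories:
        candidates.extend(items_by_category.get(category, []))
    return candidates


def rank_candidates(item_scores, candidates, n=3):
    """Rank only the candidate items by their (precomputed) item scores."""
    return sorted(
        candidates,
        key=lambda item: item_scores.get(item, float("-inf")),
        reverse=True,
    )[:n]


# Toy outputs standing in for the auxiliary (category) and main (item) tasks.
category_scores = {"shoes": 0.7, "hats": 0.2, "bags": 0.1}
items_by_category = {"shoes": ["s1", "s2"], "hats": ["h1"], "bags": ["b1", "b2"]}
item_scores = {"s1": 0.4, "s2": 0.9, "h1": 0.5, "b1": 0.95, "b2": 0.1}

candidates = candidate_items(category_scores, items_by_category, top_k=2)
top_items = rank_candidates(item_scores, candidates, n=2)
print(candidates, top_items)
```

Note that in this toy data the item "b1" has the highest item score but is never ranked, because its category was not among the top predictions. This is the trade-off such a candidate scheme makes, and why the abstract's comparison against accuracy computed with all items matters.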

15:10 - 16:05 SGT


16:05 SGT - Paper: Deep Mutual Learning across Task Towers for Effective Multi-Task Recommender Learning

Yi Ren, Ying Du, Bin Wang and Shenzheng Zhang

Abstract: Recommender systems usually leverage multi-task learning methods to simultaneously optimize several objectives because of the multi-faceted user behavior data. The typical way of conducting multi-task learning (MTL) is to establish appropriate parameter sharing across multiple tasks at lower layers while reserving a separate task tower for each task at upper layers. With such a design, the lower layers intend to explore the structure of task relationships and mine valuable information to be used by the task towers for accurate prediction. Since the task towers exert direct impact on the prediction results, we argue that the architecture of standalone task towers is sub-optimal for promoting positive knowledge sharing. First, for each task, attending to the input information of other task towers is beneficial. For instance, the information useful for predicting the "like" task is also valuable for the "buy" task. Furthermore, because different tasks are inter-related, the training labels of multiple tasks should obey a joint distribution. It is undesirable for the prediction results for these tasks to fall into the low density areas. Accordingly, we propose the framework of Deep Mutual Learning across task towers (DML), which is compatible with various backbone multi-task networks. At the entry layer of the task towers, the shared component of Cross Task Feature Mining (CTFM) is introduced to transfer input information across the task towers while still ensuring one task's loss will not impact the inputs of other task towers. Moreover, for each task, a dedicated network component called Global Knowledge Distillation (GKD) is utilized to distill valuable knowledge from the global results of the upper layer task towers to enhance the prediction consistency. Extensive offline experiments and online A/B tests are conducted to evaluate and verify the proposed approach's effectiveness.

16:25 SGT - Brainstorming session: Pre-compiled recommendation lists for online recommendations

Oleg Lashinin, Denis Krasilnikov, Marina Ananyeva and Sergey Kolesnikov

Abstract: Recommendations in the online scenario play a crucial role. Algorithms that are sensitive to user actions in the online environment can improve the user experience and impact key business metrics. However, some practitioners may face problems in integrating online methods due to many challenges. One of them is the very limited number of available interactions from new users. Another is the need for speed in generating recommendations. In this paper we consider a way to overcome these limitations. The proposed method pre-calculates a certain number of recommendation lists from an offline model. These lists could cover different interests of users. Once we have optimal precomputed recommendations, we can display them in an online scenario by matching users to the lists. We briefly discuss the motivation, idea, possible research questions and challenges of such an approach.

17:35 SGT - Closing