Abstract. Real-time recommendation of Twitter users based on the content of their profiles is a very challenging task. Traditional IR methods such as TF-IDF fail to handle efficiently large datasets. In this paper we present a scalable approach that allows real time recommendation of users based on their tweets. Our model builds a graph of terms, driven by the fact that users sharing similar interests will share similar terms. We show how this model can be encoded as a compact binary footprint, that allows very fast comparison and ranking, taking full advantage of modern CPU architectures. We validate our approach through an empirical evaluation against the Apache Lucene’s implementation of TF-IDF. We show that our approach is in average two hundred times faster than standard optimised implementation of TF-IDF with a precision of 58%. The work presented here has been published in The Web Intelligence Journal, volume 14, number 1.
Frédérique Laforest is a Professor at Université Jean Monnet in Saint Etienne, and leader of the Knowledge Representation and Reasoning group at Hubert Curien Lab CNRS 5516. Her research interests concern knowledge and data streams querying and reasoning at large scales (web and cloud) and recommendation in social networks. She is coordinating the teaching of computer science in the Telecom Saint Etienne engineering school. She is author and co-author of 2 books, 12 articles in journals, and more than 40 publications in conferences and workshops. She was organizer or chair of several conferences and workshops.
Abstract. Recommender systems are designed to identify the items that a user will like or find useful based on the user’s prior preferences and activities. These systems have become ubiquitous and are an essential tool for information filtering and (e-)commerce. Over the years, collaborative filtering, which derive these recommendations by leveraging past activities of groups of users, has emerged as the most prominent approach for solving this problem. This talk will present some of our recent work towards improving the performance of collaborative filtering-based recommender systems and understanding some of their fundamental limitations and characteristics. It will start by analyzing how the ratings that users provide to a set of items relate to their ratings of the set’s individual items and, using these insights, will present rating prediction approaches that utilize distant supervision. It will then discuss extensions to approaches based on sparse linear and latent factor models that postulate that users’ preferences are a combination of global and local preferences, which are shown to lead to better user modeling and as such improved prediction performance. Finally, the talk will conclude by discussing what can be accurately predicted by latent factor approaches and by analyzing the estimation error of sparse linear and latent factor models and how its characteristics impacts the performance of top N recommendation algorithms.
George Karypis is a Distinguished McKnight University Professor and an ADC Chair of Digital Technology at the Department of Computer Science & Engineering at the University of Minnesota, Twin Cities. His research interests span the areas of data mining, high performance computing, information retrieval, collaborative filtering, bioinformatics, cheminformatics, and scientific computing. His research has resulted in the development of software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph partitioning (hMETIS), for parallel Cholesky factorization (PSPASES), for collaborative filtering-based recommendation algorithms (SUGGEST), clustering high dimensional datasets (CLUTO), finding frequent patterns in diverse datasets (PAFI), and for protein secondary structure prediction (YASSPP). He has coauthored over 280 papers on these topics and two books (“Introduction to Protein Structure Prediction: Methods and Algorithms” (Wiley, 2010) and “Introduction to Parallel Computing” (Publ. Addison Wesley, 2003, 2nd edition)). In addition, he is serving on the program committees of many conferences and workshops on these topics, and on the editorial boards of the IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Knowledge Discovery from Data, Data Mining and Knowledge Discovery, Social Network Analysis and Data Mining Journal, International Journal of Data Mining and Bioinformatics, the journal on Current Proteomics, Advances in Bioinformatics, and Biomedicine and Biotechnology. He is a Fellow of the IEEE.