Data Skeptic
Ervin Dervishaj, a PhD student at the University of Copenhagen, discusses his research on disentangled representation learning in recommender systems, finding that while disentanglement strongly correlates with interpretability, it doesn't consistently improve recommendation performance. The conversation explores how disentanglement acts as a regularizer that can enhance user trust and interpretability at the potential cost of some accuracy, and touches on the future of large language models in denoising user interaction data.
info_outlineData Skeptic
Ekaterina (Kat) Fedorova from MIT EECS joins us to discuss strategic learning in recommender systems—what happens when users collectively coordinate to game recommendation algorithms. Kat's research reveals surprising findings: algorithmic "protest movements" can paradoxically help platforms by providing clearer preference signals, and the challenge of distinguishing coordinated behavior from bot activity is more complex than it appears. This episode explores the intersection of machine learning and game theory, examining what happens when your training data actively responds to your...
info_outlineData Skeptic
Anas Buhayh discusses multi-stakeholder fairness in recommender systems and the S'mores framework—a simulation allowing users to choose between mainstream and niche algorithms. His research shows specialized recommenders improve utility for niche users while raising questions about filter bubbles and data privacy.
info_outlineData Skeptic
In this episode, host Kyle Polich speaks with Roan Schellingerhout, a fourth-year PhD student at Maastricht University, about explainable multi-stakeholder recommender systems for job recruitment. Roan discusses his research on creating AI-powered job matching systems that balance the needs of multiple stakeholders—job seekers, recruiters, HR professionals, and companies. The conversation explores different types of explanations for job recommendations, including textual, bar chart, and graph-based formats, with findings showing that lay users strongly prefer simple textual explanations over...
info_outlineData Skeptic
In this episode, we explore the fascinating world of recommender systems and algorithmic fairness with David Liu, Assistant Research Professor at Cornell University's Center for Data Science for Enterprise and Society. David shares insights from his research on how machine learning models can inadvertently create unfairness, particularly for minority and niche user groups, even without any malicious intent. We dive deep into his groundbreaking work on Principal Component Analysis (PCA) and collaborative filtering, examining why these fundamental techniques sometimes fail to serve all users...
info_outlineData Skeptic
In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares...
info_outlineData Skeptic
In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelin Institute and Brno University, Santiago explains the mechanics of eye tracking technology—how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like...
info_outlineData Skeptic
In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit...
info_outlineData Skeptic
In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on , Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique...
info_outlineData Skeptic
In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Maria Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon...
info_outlineIn this episode, Kyle does an overview of the intersection of graph theory and computational complexity theory. In complexity theory, we are about the runtime of an algorithm based on its input size. For many graph problems, the interesting questions we want to ask take longer and longer to answer! This episode provides the fundamental vocabulary and signposts along the path of exploring the intersection of graph theory and computational complexity theory.