
Cracking the Cold Start Problem

Data Skeptic

Release Date: 12/08/2025

Video Recommendations in Industry

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares...

Eye Tracking in Recommender Systems

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelen Institute and Brno University, Santiago explains the mechanics of eye tracking technology: how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like...
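
For readers new to the terminology, one common and simple way to turn raw gaze samples into fixations and saccades is velocity-threshold identification (I-VT). The Python sketch below assumes that technique, a fixed sampling rate, gaze positions already expressed in degrees of visual angle, and an illustrative 30 deg/s threshold. The function and parameter names are hypothetical, and this is not necessarily how the RecGaze data were processed.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start: int  # index of the first sample in the fixation
    end: int    # index of the last sample (inclusive)
    x: float    # mean horizontal position (degrees)
    y: float    # mean vertical position (degrees)


def ivt_fixations(xs, ys, sample_rate_hz=60.0, velocity_threshold=30.0, min_samples=3):
    """Hypothetical I-VT sketch: samples slower than the threshold (deg/s) are
    fixation samples; faster samples are treated as saccades and split fixations."""
    if not xs:
        return []
    dt = 1.0 / sample_rate_hz
    # The first sample has no predecessor, so it is treated as slow.
    slow = [True]
    for i in range(1, len(xs)):
        dist = ((xs[i] - xs[i - 1]) ** 2 + (ys[i] - ys[i - 1]) ** 2) ** 0.5
        slow.append(dist / dt < velocity_threshold)

    fixations = []
    start = None
    # A trailing False sentinel closes a fixation that runs to the last sample.
    for i, is_slow in enumerate(slow + [False]):
        if is_slow and start is None:
            start = i
        elif not is_slow and start is not None:
            end = i - 1
            n = end - start + 1
            if n >= min_samples:
                fixations.append(Fixation(start, end,
                                          sum(xs[start:end + 1]) / n,
                                          sum(ys[start:end + 1]) / n))
            start = None
    return fixations
```

Each run of slow samples that lasts at least `min_samples` becomes one fixation, summarized by its mean position. The fast jumps between runs are the saccades.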

Designing Recommender Systems for Digital Humanities

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique...

DataRec Library for Reproducibility in Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Maria Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon...
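
DataRec's own API is not shown or assumed here. The sketch below only illustrates two of the generic ideas mentioned: checking a downloaded file against a published SHA-256 checksum, and iterative k-core filtering of user-item interactions, one widely used filtering strategy. Both function names are hypothetical.

```python
import hashlib
from collections import Counter


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file at `path` hashes to `expected_hex` (SHA-256)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large dataset dumps need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex.lower()


def k_core(interactions, k):
    """Drop (user, item) pairs until every remaining user and item has >= k interactions."""
    data = list(interactions)
    while True:
        user_counts = Counter(u for u, _ in data)
        item_counts = Counter(i for _, i in data)
        kept = [(u, i) for u, i in data if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(data):  # nothing removed: filtering has converged
            return kept
        data = kept
```

The filter loops because removing a sparse user can push an item below the threshold, and vice versa. A single pass is a common source of the preprocessing inconsistencies discussed in the episode.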

Shilling Attacks on Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley,...
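
To make the mechanism concrete, here is a toy illustration of a "push" shilling attack. It is not drawn from Aditya's work. Fake profiles that all give a target item the maximum rating are enough to flip a naive mean-rating ranking. The profile names, items, and ratings are invented for illustration.

```python
from collections import defaultdict


def mean_ratings(ratings):
    """ratings: list of (user, item, score). Returns {item: mean score}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, item, score in ratings:
        totals[item] += score
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}


def inject_push_attack(ratings, target, n_fake, max_score=5):
    """Return the genuine ratings plus n_fake hypothetical fake profiles,
    each of which gives `target` the maximum score."""
    attacked = list(ratings)
    for k in range(n_fake):
        attacked.append((f"fake_{k}", target, max_score))
    return attacked


genuine = [("u1", "A", 5), ("u2", "A", 4), ("u1", "B", 2), ("u2", "B", 3)]
print(mean_ratings(genuine))                                # A ranks above B
print(mean_ratings(inject_push_attack(genuine, "B", 10)))   # B now ranks above A
```

Real attacks are subtler. Fake profiles usually also rate "filler" items so they resemble genuine users, which is what makes them hard to detect.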

Music Playlist Recommendations

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio...
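
LARP's exact training objective is not given here. As a hedged illustration of what "contrastive learning to align text and audio" typically means, the sketch computes a symmetric, CLIP-style InfoNCE loss over a batch of paired embeddings, where the i-th text vector and the i-th audio vector describe the same track. The toy vectors and the 0.1 temperature are arbitrary assumptions.

```python
import math


def _normalize(v):
    # Assumes a non-zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_info_nce(text_emb, audio_emb, temperature=0.1):
    """Matched (text_i, audio_i) pairs are positives; every other pairing in
    the batch serves as a negative, in both directions."""
    t = [_normalize(v) for v in text_emb]
    a = [_normalize(v) for v in audio_emb]
    # Cosine similarities scaled by the temperature: logits[i][j] = t_i . a_j / T
    logits = [[sum(x * y for x, y in zip(ti, aj)) / temperature for aj in a] for ti in t]
    logits_t = [list(col) for col in zip(*logits)]  # audio-to-text direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

Minimizing this loss pulls each track's text and audio embeddings together and pushes apart embeddings of different tracks, so both modalities end up in one shared space.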

Bypassing the Popularity Bias

Sustainable Recommender Systems for Tourism

In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable tourism practices through innovative approaches to data acquisition and algorithm design. Key highlights include leveraging large language models for synthetic data generation, developing recommendation architectures that balance user satisfaction with environmental concerns, and creating frameworks that distribute...

Interpretable Real Estate Recommendations

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich interviews Dr. Kunal Mukherjee, a postdoctoral research associate at Virginia Tech, about the paper "Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations." The discussion explores how the post-COVID real estate landscape has created a need for better recommendation systems that can introduce home buyers to emerging neighborhoods they might not know about. Dr. Mukherjee explains how his team developed a graph neural network approach that not only recommends properties but provides...

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations.
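
As a rough illustration of the first two components, here is a sketch that learns low-dimensional user and item embeddings by matrix factorization, a classic collaborative filtering technique. It is not Boya Xu's actual model. The factorization is fitted with stochastic gradient descent on observed ratings only. The function names, hyperparameters, and toy ratings are assumptions.

```python
import random


def factorize(ratings, n_users, n_items, dim=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Learn user embeddings P and item embeddings Q from (user, item, rating)
    triples so that rating ~ P[u] . Q[i]. Only observed entries enter the loss."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(dim):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item embeddings."""
    return sum(a * b for a, b in zip(P[u], Q[i]))
```

The learned vectors are the latent-space embeddings. Users with similar rating histories end up close together, which is what lets the model fill in ratings it never observed.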

Boya shares insights from her research on how recommender systems impact both consumers and content creators across e-commerce and social media platforms. We explore critical challenges like the cold start problem—how to make good recommendations for brand new users—and discuss how her approach uses demographic information to create informative priors that accelerate learning. The conversation also touches on algorithmic fairness, revealing how her method reduces bias between majority and minority (niche preference) users by incorporating active learning through bandit algorithms. Whether you're interested in the mathematics of recommendation engines or the broader implications for digital platforms, this episode offers a comprehensive look at the state-of-the-art in recommender system design.
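
Boya's method is not spelled out on this page, so the sketch below swaps in a simpler, well-known technique to show the two ideas together. It is a Beta-Bernoulli Thompson sampling bandit for a brand-new user. Each item's Beta prior is built from the click rate observed among users in the same (hypothetical) demographic segment, so early recommendations start from an informed guess instead of a flat prior. The class name, `prior_strength`, and item names are illustrative assumptions.

```python
import random


class ThompsonRecommender:
    """Beta-Bernoulli Thompson sampling over a fixed set of items."""

    def __init__(self, prior_ctr, prior_strength=10.0, seed=0):
        # prior_ctr: {item: segment click rate}, each strictly between 0 and 1.
        # Beta(a, b) with mean prior_ctr and a + b = prior_strength turns the
        # segment information into pseudo-counts of clicks and non-clicks.
        self.alpha = {i: p * prior_strength for i, p in prior_ctr.items()}
        self.beta = {i: (1 - p) * prior_strength for i, p in prior_ctr.items()}
        self.rng = random.Random(seed)

    def recommend(self):
        # Draw a plausible click rate per item and pick the highest draw:
        # uncertain items sometimes win (exploration), strong items usually win (exploitation).
        samples = {i: self.rng.betavariate(self.alpha[i], self.beta[i]) for i in self.alpha}
        return max(samples, key=samples.get)

    def update(self, item, clicked):
        # Conjugate update: a click adds to alpha, a non-click adds to beta.
        if clicked:
            self.alpha[item] += 1
        else:
            self.beta[item] += 1

    def posterior_mean(self, item):
        return self.alpha[item] / (self.alpha[item] + self.beta[item])
```

A larger `prior_strength` trusts the demographic segment more and takes longer to override. A smaller one lets the new user's own clicks dominate sooner. That trade-off is the same one behind using informative priors to speed up cold-start learning.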