Data Skeptic
Machine learning, AI, and data science explored through interviews with experts, explainer episodes, and a broad survey of how technology is changing our world.
Complex Dynamic in Networks
06/28/2025
In this episode, we learn why analyzing the structure of a network alone is not enough, and how the dynamics (the actual mechanisms of interaction between components) can drastically change how information or influence spreads. Our guest, Professor Baruch Barzel of Bar-Ilan University, is a leading researcher in network dynamics and complex systems ranging from biology to infrastructure and beyond.
/episode/index/show/dataskeptic/id/37203460
Github Network Analysis
06/22/2025
In this episode we discuss how to treat GitHub data as a network to extract insights about teamwork. Our guest, Gabriel Ramirez, manager of the notifications team at GitHub, shows how to apply network analysis to better understand and improve collaboration within his engineering team by analyzing GitHub metadata (such as pull requests, issues, and discussions) as a bipartite graph of people and projects. Insights we discuss include how network centrality measures (like eigenvector and betweenness centrality) reveal organizational dynamics, how vacation patterns influence team connectivity, and how decentralizing communication hubs can foster healthier collaboration. Gabriel’s open-source project, GH Graph Explorer, enables other managers and engineers to extract, visualize, and analyze their own GitHub activity using tools like Python, Neo4j, Gephi, and LLMs for insight generation. But always remember: don't take the results at face value. Instead, use them to guide your qualitative investigation.
/episode/index/show/dataskeptic/id/37101640
Networks and Complexity
06/14/2025
In this episode, Kyle gives an overview of the intersection of graph theory and computational complexity theory. In complexity theory, we care about the runtime of an algorithm as a function of its input size. For many graph problems, the interesting questions we want to ask take longer and longer to answer as the input grows! This episode provides the fundamental vocabulary and signposts along the path of exploring this intersection.
/episode/index/show/dataskeptic/id/37008275
Actantial Networks
06/01/2025
In this episode, listeners will learn about actantial networks: graph-based representations of narratives in which nodes are actors (such as people, institutions, or abstract entities) and edges represent the actions or relationships between them. Our guest, Armin Pournaki, is a joint PhD candidate at the Max Planck Institute for Mathematics in the Sciences and the Laboratoire Lattice (ENS-PSL). He specializes in computational social science, where he develops methods to extract and analyze political narratives using natural language processing and network science. Armin explains how these methods can expose conflicting narratives around the same events, as seen in debates on COVID-19, climate change, or the war in Ukraine. Listeners will also discover how this approach helps make large-scale discourse (from millions of tweets or political speeches) more transparent and interpretable, offering tools for studying polarization, issue alignment, and narrative-driven persuasion in digital societies.
/episode/index/show/dataskeptic/id/36806635
Graphs for Causal AI
05/24/2025
How can we build artificial intelligence systems that understand cause and effect, moving beyond simple correlations? As we all know, correlation is not causation. "Spurious correlations" can show, for example, how rising ice cream sales might statistically link to more drownings, not because one causes the other, but due to an unobserved common cause like warm weather. Our guest, Utkarshani Jaimini, a researcher from the University of South Carolina's Artificial Intelligence Institute, tackles this problem by using knowledge graphs that incorporate domain expertise. In the field of neurosymbolic AI, knowledge graphs (structured representations of information) are combined with neural networks to represent and reason about complex relationships. This involves creating causal ontologies and incorporating the "weight" or strength of causal relationships and hyperrelations. The field has many practical applications, such as AI explainability, healthcare, and autonomous driving.
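The ice cream and drownings example can be made concrete with a small simulation. The sketch below is only an illustration of a common cause producing a spurious correlation; it is not the guest's knowledge-graph method. The linear relationships, noise levels, sample size, and variable names are all made-up assumptions.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """Residuals of ys after removing a least-squares linear fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed toy model: temperature drives both quantities;
# ice cream sales have no direct effect on drownings.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Raw correlation between the two effects of the common cause.
raw_corr = pearson(ice_cream, drownings)

# Correlation after controlling for temperature (partial correlation).
partial_corr = pearson(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))

print(f"raw correlation:     {raw_corr:.2f}")
print(f"partial correlation: {partial_corr:.2f}")
```

In this toy model the raw correlation is strong, while the correlation left after removing the effect of temperature is close to zero. That only works here because the common cause was measured; with an unobserved cause, as in the episode's example, domain knowledge has to supply the missing link.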
/episode/index/show/dataskeptic/id/36698640
Unveiling Graph Datasets
05/08/2025
/episode/index/show/dataskeptic/id/36501740
Network Manipulation
04/30/2025
In this episode we talk with Manita Pote, a PhD student at Indiana University Bloomington, specializing in online trust and safety, with a focus on detecting coordinated manipulation campaigns on social media. Key insights include how coordinated reply attacks target influential figures like journalists and politicians, how machine learning models can detect these inauthentic campaigns using structural and behavioral features, and how deletion patterns reveal efforts to evade moderation or manipulate engagement metrics.
/episode/index/show/dataskeptic/id/36366205
The Small World Hypothesis
04/21/2025
Kyle discusses the history and proof for the small world hypothesis.
/episode/index/show/dataskeptic/id/36241045
Thinking in Networks
04/12/2025
Kyle asks Asaf questions about the new network science course he is now teaching. The conversation delves into topics such as contact tracing, tools for analyzing networks, example use cases, and the importance of thinking in networks.
/episode/index/show/dataskeptic/id/36110015
Fraud Networks
04/01/2025
In this episode we talk with Bavo DC Campo, a data scientist and statistician, who shares his expertise on the intersection of actuarial science, fraud detection, and social network analytics. Together we will learn how to use graphs to fight insurance fraud by uncovering hidden connections between fraudulent claims and bad actors. Key insights include how social network analytics can detect fraud rings by mapping relationships between policyholders, claims, and service providers, and how the BiRank algorithm, inspired by Google’s PageRank, helps rank suspicious claims based on network structure. Bavo will also present his iFraud simulator, which can be used to model fraudulent networks for detection training purposes. Do you have a question about fraud detection? Bavo says he will gladly help. Feel free to contact him. ------------------------------- Want to listen ad-free? Try our Graphs Course? Join Data Skeptic+ for $5 / month or $50 / year
/episode/index/show/dataskeptic/id/35967785
Criminal Networks
03/17/2025
In this episode we talk with Justin Wang Ngai Yeung, a PhD candidate at the Network Science Institute at Northeastern University in London, who explores how network science helps uncover criminal networks. Justin is also a member of the organizing committee of the satellite conference dealing with criminal networks at the network science conference in The Netherlands in June 2025. Listeners will learn how graph-based models assist law enforcement in analyzing missing data, identifying key figures in criminal organizations, and improving intervention strategies. Key insights include the challenges of incomplete and inaccurate data in criminal network analysis, how law enforcement agencies use network dismantling techniques to disrupt organized crime, and the role of machine learning in predicting hidden connections within illicit networks.
/episode/index/show/dataskeptic/id/35707915
Graph Bugs
03/10/2025
Today’s guest is Celine Wüst, a master’s student at ETH Zurich specializing in secure and reliable systems, who shares her work on automated software testing for graph databases. Celine shows how fuzzing, the process of automatically generating complex queries, helps uncover hidden bugs in graph database management systems like Neo4j, FalconDB, and Apache AGE. Key insights include how state-aware query generation can detect critical issues like buffer overflows and crashes, the challenges of debugging complex database behaviors, and the importance of security-focused software testing. We'll also find out which graph DB company offers swag for finding bugs in its software and get Celine's advice about which graph DB to use.
/episode/index/show/dataskeptic/id/35562065
Organizational Network Analysis
03/03/2025
In this episode, Gabriel Petrescu, an organizational network analyst, discusses how network science can provide deep insights into organizational structures using OrgXO, a tool that maps companies as networks rather than rigid hierarchies. Listeners will learn how analyzing workplace collaboration networks can reveal hidden influencers, organizational bottlenecks, and engagement levels, offering a data-driven approach to improving effectiveness and resilience. Key insights include how companies can identify overburdened employees, address silos between departments, and detect vulnerabilities where too few individuals hold critical knowledge. Real-life applications range from mergers and acquisitions, where network analysis helps assess company dynamics before an acquisition, to restructuring efforts that improve workflow and team collaboration. Gabriel’s work highlights how organizations can shift from traditional hierarchical thinking to a network-based perspective, leading to smarter decision-making and more adaptable companies.
/episode/index/show/dataskeptic/id/35497205
Organizational Networks
02/25/2025
Is it better to have your work team fully connected or sparsely connected? In this episode we'll try to answer this question and more with our guest Hiroki Sayama, a SUNY Distinguished Professor and director of the Center for Complex Systems at Binghamton University. Hiroki delves into the applications of network science in organizational structures and innovation dynamics, presenting his recent work on extracting network structures from organizational charts to enable insights into decision-making and performance. He'll also cover how network connectivity impacts team creativity and innovation. Key insights include how the structure of organizational networks (such as the depth of hierarchy or proximity to leadership) can influence corporate performance, and how sparse network connectivity fosters more diverse and innovative ideas than fully connected networks.
/episode/index/show/dataskeptic/id/35418970
Networks of the Mind
02/18/2025
A man goes into a bar… This is the beginning of a riddle that our guest, Yoed Kennet, an assistant professor at the Technion's Faculty of Data and Decision Sciences, uses to measure creativity in subjects. In our talk, Yoed speaks about how to combine cognitive science and network science to explore the complexities and decode the mysteries of the human mind. Listeners will learn how network science provides tools to map and analyze human memory, revealing how problem-solving and creativity emerge from changes in semantic memory structures. Key insights include the role of memory restructuring during moments of insight, the connection between semantic networks and creative thinking, and how understanding these processes can improve problem-solving and analogical reasoning. Real-life applications span enhancing creativity in the workplace, building tools to combat cognitive rigidity in aging, and improving learning strategies by fostering richer, more flexible mental networks.
/episode/index/show/dataskeptic/id/35340825
LLMs and Graphs Synergy
02/10/2025
In this episode, Garima Agrawal, a senior researcher and AI consultant, brings her years of experience in data science and artificial intelligence. Listeners will learn about the evolving role of knowledge graphs in augmenting large language models (LLMs) for domain-specific tasks and how these tools can mitigate issues like hallucination in AI systems. Key insights include how LLMs can leverage knowledge graphs to improve accuracy by integrating domain expertise, reducing hallucinations, and enabling better reasoning. Real-life applications discussed range from enhancing customer support systems with efficient FAQ retrieval to creating smarter AI-driven decision-making pipelines. Garima’s work highlights how blending static knowledge representation with dynamic AI models can lead to cost-effective, scalable, and human-centered AI solutions.
/episode/index/show/dataskeptic/id/35224550
A Network of Networks
02/04/2025
In this episode, Bnaya Gross, a Fulbright postdoctoral fellow at the Center for Complex Network Research at Northwestern University, explores the transformative applications of network science in fields ranging from infrastructure to medicine, by studying the interactions between networks ("a network of networks"). Listeners will learn how interdependent networks provide a framework for understanding cascading failures, such as power outages, and how these insights transfer to physical systems like superconducting materials and biological networks. Key takeaways include understanding how dependencies between networks can amplify vulnerabilities, applying these principles to create resilient infrastructure systems, and using network medicine to uncover relationships between diseases, potential drug repurposing and the process of aging.
/episode/index/show/dataskeptic/id/35140740
Auditing LLMs and Twitter
01/29/2025
Our guests, Erwan Le Merrer and Gilles Tredan, are long-time collaborators in graph theory and distributed systems. They share their expertise on applying graph-based approaches to understanding both large language model (LLM) hallucinations and shadow banning on social media platforms. In this episode, listeners will learn how graph structures and metrics can reveal patterns in algorithmic behavior and platform moderation practices. Key insights include the use of graph theory to evaluate LLM outputs, uncovering patterns in hallucinated graphs that might hint at the underlying structure and training data of the models, and applying epidemic models to analyze the uneven spread of shadow banning on Twitter.
/episode/index/show/dataskeptic/id/35071170
Fraud Detection with Graphs
01/22/2025
In this episode, Šimon Mandlík, a PhD candidate at the Czech Technical University, talks with us about leveraging machine learning and graph-based techniques for cybersecurity applications. We'll learn how graphs are used to detect malicious activity in networks, such as identifying harmful domains and executable files by analyzing their relationships within vast datasets. This includes the use of hierarchical multi-instance learning (HML) to represent JSON-based network activity as graphs, and the advantages of analyzing connections between entities (like clients and domains). Our guest shows that while other graph methods (such as GNNs or label propagation) lack scalability or have trouble with heterogeneous graphs, his method can handle both because of the "locality assumption": fraud is a local phenomenon in the graph. By relying on this assumption, we can get faster and more accurate results.
/episode/index/show/dataskeptic/id/34951305
Optimizing Supply Chains with GNN
01/15/2025
Thibaut Vidal, a professor at Polytechnique Montreal, specializes in leveraging advanced algorithms and machine learning to optimize supply chain operations. In this episode, listeners will learn how graph-based approaches can transform supply chains by enabling more efficient routing, districting, and decision-making in complex logistical networks. Key insights include the application of Graph Neural Networks to predict delivery costs, with the potential to improve districting strategies for companies like UPS or Amazon and to overcome the limitations of traditional heuristic methods. Thibaut’s work underscores the potential for GNNs to reduce costs, enhance operational efficiency, and provide better working conditions for teams through improved route familiarity and workload balance.
/episode/index/show/dataskeptic/id/34838430
The Mystery Behind Large Graphs
01/10/2025
Our guest in this episode is David Tench, a Grace Hopper postdoctoral fellow at Lawrence Berkeley National Labs, who specializes in scalable graph algorithms and compression techniques to tackle massive datasets. We will learn how his techniques enable real-time analysis of large datasets, such as particle tracking in physics experiments or social network analysis, by reducing storage requirements while preserving critical structural properties. David also challenges the common belief that giant graphs are sparse by pointing to a potential bias: maybe, because large dense graphs are so challenging to analyze, the only datasets we see are sparse graphs? The truth is out there… David encourages you to reach out to him if you have a large-scale graph application that you can't handle with your current methods and hardware. He promises to "look for the hammer that might help you with your nail".
/episode/index/show/dataskeptic/id/34797075
Customizing a Graph Solution
12/16/2024
In this episode, Dave Bechberger, Principal Graph Architect at AWS and author of "Graph Databases in Action", brings deep insights into the field of graph databases and their applications. Together we delve into specific scenarios in which graph databases provide unique solutions, such as fraud detection, and learn how to optimize a database for questions about connections, such as "How are these entities related?" or "What patterns of interaction indicate anomalies?" This discussion sheds light on when organizations should consider adopting graph databases, particularly for cases that require scalable analysis of highly interconnected data, and provides practical insights into leveraging graph databases for performance improvements in tasks that traditional relational databases struggle with.
/episode/index/show/dataskeptic/id/34477630
Graph Transformations
12/09/2024
In this episode we talk with Adam Machowczyk, a PhD student at the University of Leicester who specializes in graph rewriting and its intersection with machine learning, particularly Graph Neural Networks. Adam explains how graph rewriting provides a formalized method to modify graphs using rule-based transformations, allowing for tasks like graph completion, attribute prediction, and structural evolution. Bridging the worlds of graph rewriting and machine learning, Adam's work aspires to open new possibilities for creating adaptive, scalable models capable of solving challenges that traditional methods struggle with, such as handling heterogeneous graphs or incorporating incremental updates efficiently. Real-life applications discussed include using graph transformations to improve recommender systems in social networks, molecular research in chemistry, and enhancing IoT network analysis.
/episode/index/show/dataskeptic/id/34340510
Networks for AB Testing
11/25/2024
In this episode, data scientist Wentao Su shares his experience in AB testing on social media platforms like LinkedIn and TikTok. We talk about how network science can enhance AB testing by accounting for complex social interactions, especially in environments where users are both viewers and content creators. These interactions can cause a "spillover effect", meaning influence across experimental groups, which can distort results. To mitigate this effect, our guest presents heuristics and algorithms they developed ("one-degree label propagation") that give good results on big data with minimal running time, and so help optimize user experience and advertiser performance on social media platforms.
/episode/index/show/dataskeptic/id/34126966
Lessons from eGamer Networks
11/18/2024
Alex Bisberg, a PhD candidate at the University of Southern California, specializes in network science and game analytics, with a focus on understanding social and competitive success in multiplayer online games. In this episode, listeners can expect to learn about player interactions and patterns of behavior from a network perspective. Through his research on games, Alex sheds light on how network analysis and statistical tests might explain positive contagious behaviors, such as generosity, and explores the dynamics of collaboration and competition in gaming environments. These insights offer valuable lessons not only for game developers in enhancing player experience, engagement, and retention, but also for anyone interested in understanding the ways that virtual interactions shape social networks and behavior.
/episode/index/show/dataskeptic/id/33957042
Github Collaboration Network
11/11/2024
In this episode we discuss the GitHub Collaboration Network with Behnaz Moradi-Jamei, assistant professor at James Madison University. As a network scientist, Behnaz created and analyzed a network of about 700,000 contributors to GitHub repositories. The network was built by treating developers as nodes and linking them with edges based on shared contributions to the same repositories: if two developers contributed to the same project, an edge (connection) was formed between them, representing a collaborative relationship. The resulting network consists of 32 million such connections. Using community detection algorithms, Behnaz's analysis reveals insights into how developer communities form, function, and evolve, which can serve as guidance for OSS community managers.
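The edge-building rule described above (two developers are linked if they contributed to the same repository) can be sketched in a few lines. This is a minimal illustration on made-up data, not the guest's actual pipeline; the contribution records, developer and repository names, and the `build_collaboration_edges` helper are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (developer, repository) contribution records.
contributions = [
    ("alice", "repo-a"),
    ("bob", "repo-a"),
    ("carol", "repo-a"),
    ("bob", "repo-b"),
    ("dave", "repo-b"),
    ("erin", "repo-c"),  # sole contributor, so no edges
]

def build_collaboration_edges(records):
    """Map each developer pair to the number of repositories they share."""
    contributors_by_repo = defaultdict(set)
    for developer, repo in records:
        contributors_by_repo[repo].add(developer)

    edge_weights = defaultdict(int)
    for developers in contributors_by_repo.values():
        # Every pair of contributors to the same repository gets an edge.
        for a, b in combinations(sorted(developers), 2):
            edge_weights[(a, b)] += 1
    return dict(edge_weights)

edges = build_collaboration_edges(contributions)
for (a, b), shared in sorted(edges.items()):
    print(f"{a} -- {b} (shared repositories: {shared})")
```

A community detection algorithm would then run on the resulting edge list. Note that a repository with k contributors produces k(k-1)/2 edges, so large projects dominate the edge count, which is one reason a network of this kind can reach tens of millions of connections.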
/episode/index/show/dataskeptic/id/33887417
Graphs and ML for Robotics
11/04/2024
We are joined by Abhishek Paudel, a PhD Student at George Mason University with a research focus on robotics, machine learning, and planning under uncertainty, using graph-based methods to enhance robot behavior. He explains how graph-based approaches can model environments, capture spatial relationships, and provide a framework for integrating multiple levels of planning and decision-making.
/episode/index/show/dataskeptic/id/33735967
Graphs for HPC and LLMs
10/29/2024
We are joined by Maciej Besta, a senior researcher of sparse graph computations and large language models at the Scalable Parallel Computing Lab (SPCL). In this episode, we explore the intersection of graph theory and high-performance computing (HPC), Graph Neural Networks (GNNs) and LLMs.
/episode/index/show/dataskeptic/id/33669722
Graph Databases and AI
10/21/2024
In this episode, we sit down with Yuanyuan Tian, a principal scientist manager at Microsoft Gray Systems Lab, to discuss the evolving role of graph databases in various industries such as fraud detection in finance and insurance, security, healthcare, and supply chain optimization.
/episode/index/show/dataskeptic/id/33526062