The Apparent Meaninglessness of AI Benchmarks, plus How to Explain AI Opportunities to Others
Release Date: 12/16/2025
Raw Data with Rob Collie
Every week brings a new AI benchmark. Higher scores. Bigger claims. Louder voices insisting this changes everything. And yet, when you put AI in front of a real business problem, none of that noise seems to help. In this episode, Rob and Justin dig into why AI benchmarks often feel strangely meaningless in practice and why that disconnect is the point. Benchmarks aren’t useless. They’re just answering a different question than the one most businesses are asking.
This isn’t just random conjecture either. Rob walks through what he’s learned building actual AI workflows and why a twenty percent improvement on a leaderboard rarely translates into anything you can feel on the job. They talk about why model choice usually isn’t the bottleneck, why swapping models should be easy if you’ve built things the right way, and why the most successful AI work rarely shows up as a flashy demo. Most of the value is happening quietly, off-screen, inside systems that look a lot more like normal software than artificial intelligence.
Rob and Justin also talk about why explaining AI is often harder than building it. The first demo people see tends to stick, even when it’s the wrong one. Consumer AI feels magical. Business AI face-plants unless it’s built with intent, structure, and real context. This episode gives leaders better language for that gap, without hype or panic. If you’re done chasing benchmarks and just want a way to think about AI that survives contact with reality, this episode’s for you.