Now I Really Won That AI Bet

Astral Codex Ten Podcast

Release Date: 07/11/2025

In June 2022, I bet a commenter $100 that AI would master image compositionality by June 2025.

DALL-E2 had just come out, showcasing the potential of AI art. But it couldn’t follow complex instructions; its images only matched the “vibe” of the prompt. For example, here were some of its attempts at “a red sphere on a blue cube, with a yellow pyramid on the right, all on top of a green table”.

At the time, I wrote:

I’m not going to make the mistake of saying these problems are inherent to AI art. My guess is a slightly better language model would solve most of them…for all I know, some of the larger image models have already fixed these issues. These are the sorts of problems I expect to go away with a few months of future research.

Commenters objected that this was overly optimistic. AI was just a pattern-matching “stochastic parrot”. It would take a deep understanding of grammar to get a prompt exactly right, and that would require some entirely new paradigm beyond LLMs. For example, from Vitor:

Why are you so confident in this? The inability of systems like DALL-E to understand semantics in ways requiring an actual internal world model strikes me as the very heart of the issue. We can also see this exact failure mode in the language models themselves. They only produce good results when the human asks for something vague with lots of room for interpretation, like poetry or fanciful stories without much internal logic or continuity.

Not to toot my own horn, but two years ago you were naively saying we'd have GPT-like models scaled up several orders of magnitude (100T parameters) right about now (https://readscottalexander.com/posts/ssc-the-obligatory-gpt-3-post#comment-912798).

I'm registering my prediction that you're being equally naive now. Truly solving this issue seems AI-complete to me. I'm willing to bet on this (ideas on operationalization welcome).

So we made a bet!

All right. My proposed operationalization of this is that on June 1, 2025, if either of us can get access to the best image generating model at that time (I get to decide which), or convince someone else who has access to help us, we'll give it the following prompts:

1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth

2. An oil painting of a man in a factory looking at a cat wearing a top hat

3. A digital art picture of a child riding a llama with a bell on its tail through a desert

4. A 3D render of an astronaut in space holding a fox wearing lipstick

5. Pixel art of a farmer in a cathedral holding a red basketball

We generate 10 images for each prompt, just like DALL-E2 does. If at least one of the ten images has the scene correct in every particular on 3/5 prompts, I win, otherwise you do. Loser pays winner $100, and whatever the result is I announce it on the blog (probably an open thread). If we disagree, Gwern is the judge.

Some image models of the time refused to draw humans, so we agreed that robots could stand in for humans in pictures that required them.
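
The win condition is mechanical enough to sketch in code. Here is a minimal Python version, assuming a human has already judged each image as correct in every particular or not. The function names and the sample judgments are hypothetical, made up for illustration; they are not the actual results of the bet.

```python
from typing import Sequence

IMAGES_PER_PROMPT = 10  # "We generate 10 images for each prompt"
NUM_PROMPTS = 5         # the five agreed prompts
PROMPTS_NEEDED = 3      # "on 3/5 prompts"


def prompt_passes(judgments: Sequence[bool]) -> bool:
    """A prompt passes if at least one of its ten images is fully correct."""
    if len(judgments) != IMAGES_PER_PROMPT:
        raise ValueError(f"expected {IMAGES_PER_PROMPT} judgments per prompt")
    return any(judgments)


def proposer_wins(results: Sequence[Sequence[bool]]) -> bool:
    """Return True if the bet's proposer wins under the stated terms.

    `results` holds one list of per-image judgments for each prompt.
    """
    if len(results) != NUM_PROMPTS:
        raise ValueError(f"expected results for {NUM_PROMPTS} prompts")
    passed = sum(prompt_passes(judgments) for judgments in results)
    return passed >= PROMPTS_NEEDED


# Hypothetical judgments (illustration only): prompts 1, 3, and 5 each have
# exactly one fully correct image; prompts 2 and 4 have none.
example = [
    [False] * 9 + [True],
    [False] * 10,
    [True] + [False] * 9,
    [False] * 10,
    [False] * 4 + [True] + [False] * 5,
]
print(proposer_wins(example))
```

Note how lenient the rule is per prompt (one good image out of ten is enough) and how strict it is per image (every particular must be right). The hard part, deciding whether a given image is correct, stays with the human judges and is outside the sketch.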

In September 2022, I got some good results from Google Imagen and announced I had won the three-year bet in three months. Commenters yelled at me, saying that Imagen still hadn’t gotten them quite right and my victory declaration was premature. The argument blew up enough that Edwin Chen of Surge, an “RLHF and human LLM evaluation platform”, stepped in and asked his professional AI data labelling team. Their verdict was clear: the AI was bad and I was wrong. Rather than embarrass myself further, I agreed to wait out the full length of the bet and re-evaluate in June 2025.

The bet is now over, and official judge Gwern agrees I’ve won. Before I gloat, let’s look at the images that got us here.

https://www.astralcodexten.com/p/now-i-really-won-that-ai-bet