
Now I Really Won That AI Bet

Astral Codex Ten Podcast

Release Date: 07/11/2025


In June 2022, I bet a commenter $100 that AI would master image compositionality by June 2025.

DALL-E2 had just come out, showcasing the potential of AI art. But it couldn’t follow complex instructions; its images only matched the “vibe” of the prompt. For example, here were some of its attempts at “a red sphere on a blue cube, with a yellow pyramid on the right, all on top of a green table”.

At the time, I wrote:

I’m not going to make the mistake of saying these problems are inherent to AI art. My guess is a slightly better language model would solve most of them…for all I know, some of the larger image models have already fixed these issues. These are the sorts of problems I expect to go away with a few months of future research.

Commenters objected that this was overly optimistic. AI was just a pattern-matching “stochastic parrot”. It would take a deep understanding of grammar to get a prompt exactly right, and that would require some entirely new paradigm beyond LLMs. For example, from Vitor:

Why are you so confident in this? The inability of systems like DALL-E to understand semantics in ways requiring an actual internal world model strikes me as the very heart of the issue. We can also see this exact failure mode in the language models themselves. They only produce good results when the human asks for something vague with lots of room for interpretation, like poetry or fanciful stories without much internal logic or continuity.

Not to toot my own horn, but two years ago you were naively saying we'd have GPT-like models scaled up several orders of magnitude (100T parameters) right about now (https://readscottalexander.com/posts/ssc-the-obligatory-gpt-3-post#comment-912798).

I'm registering my prediction that you're being equally naive now. Truly solving this issue seems AI-complete to me. I'm willing to bet on this (ideas on operationalization welcome).

So we made a bet!

All right. My proposed operationalization of this is that on June 1, 2025, if either of us can get access to the best image generating model at that time (I get to decide which), or convince someone else who has access to help us, we'll give it the following prompts:

1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth

2. An oil painting of a man in a factory looking at a cat wearing a top hat

3. A digital art picture of a child riding a llama with a bell on its tail through a desert

4. A 3D render of an astronaut in space holding a fox wearing lipstick

5. Pixel art of a farmer in a cathedral holding a red basketball

We generate 10 images for each prompt, just like DALL-E2 does. If at least one of the ten images has the scene correct in every particular on 3/5 prompts, I win, otherwise you do. Loser pays winner $100, and whatever the result is I announce it on the blog (probably an open thread). If we disagree, Gwern is the judge.

Some image models of the time refused to draw humans, so we agreed that robots could stand in for humans in pictures that required them.
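Spelled out, the scoring rule is a simple threshold: a prompt passes if any one of its ten images is fully correct, and the bet resolves in my favor if at least three of the five prompts pass. Here is a minimal sketch in Python of that rule as I read the bet text; the function names and example judgments are illustrative only, not anything the judge actually ran:

def prompt_passes(image_judgments: list[bool]) -> bool:
    """A prompt passes if at least one of its (ten) images gets the
    scene correct in every particular."""
    return any(image_judgments)

def scott_wins(results_per_prompt: list[list[bool]]) -> bool:
    """results_per_prompt holds one list of per-image judgments per prompt.
    The bet resolves for Scott if 3 or more of the 5 prompts pass."""
    passed = sum(prompt_passes(images) for images in results_per_prompt)
    return passed >= 3

# Hypothetical example: prompts 1, 2, and 4 each have one fully correct
# image out of ten; prompts 3 and 5 have none. Three passes suffice.
example = [
    [False] * 9 + [True],
    [True] + [False] * 9,
    [False] * 10,
    [False] * 5 + [True] + [False] * 4,
    [False] * 10,
]
assert scott_wins(example)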

In September 2022, I got some good results from Google Imagen and announced I had won the three-year bet in three months. Commenters yelled at me, saying that Imagen still hadn’t gotten the prompts quite right and my victory declaration was premature. The argument blew up enough that Edwin Chen of Surge, an “RLHF and human LLM evaluation platform”, stepped in and asked his professional AI data labelling team to judge the images. Their verdict was clear: the AI was bad and I was wrong. Rather than embarrass myself further, I agreed to wait out the full length of the bet and re-evaluate in June 2025.

The bet is now over, and official judge Gwern agrees I’ve won. Before I gloat, let’s look at the images that got us here.

https://www.astralcodexten.com/p/now-i-really-won-that-ai-bet