Astral Codex Ten Podcast
Last month, I asked for experts to help me understand the details of OpenAI’s for-profit buyout. The following comes from someone who has looked into the situation in depth but is not an insider. Mistakes are mine alone. Why Was OpenAI A Nonprofit In The First Place? In the early 2010s, the AI companies hadn’t yet discovered scaling laws, and so underestimated the amount of compute (and therefore money) it would take to build AI. DeepMind was the first victim; originally founded on high ideals of prioritizing safety and responsible stewardship of the Singularity, it hit a financial barrier and...
Sorry, you can only get drugs when there's a drug shortage. Three GLP-1 drugs are approved for weight loss in the United States: semaglutide (Ozempic®, Wegovy®, Rybelsus®), tirzepatide (Mounjaro®, Zepbound®), and liraglutide (Victoza®, Saxenda®) …but liraglutide is noticeably worse than the others, and most people prefer either semaglutide or tirzepatide. These cost about $1000/month and are rarely covered by insurance, putting them out of reach for most Americans. …if you buy them from the pharma companies, like a chump. For the past three years, there’s been a shortage of these...
Most headlines have said something like , which seems like a fair assessment. I feel bad about this, because during lockdowns I argued that . Re-reading the post, I still think my arguments make sense. So how did I get it so wrong? When I consider this question, I ask myself: do I expect complete recovery in two years? In 2026, we will see a class of fourth graders who hadn’t even started school when the lockdowns ended. They will have attended kindergarten through 4th grade entirely in person, with no opportunity for “learning loss”. If there’s a sudden switch to them doing just as...
I enjoy the yearly book review contest, but it feels like last year’s contest is barely done, and I want to give you a break so you can read more books before we start over. So this year, let’s do something different. Submit an ACX-length post reviewing something, anything, except a book. You can review a movie, song, or video game. You can review a product, restaurant, or tourist attraction. But don’t let the usual categories limit you. Review comic books or blog posts. Review political parties - no, whole societies! Review animals or trees! Review an oddly-shaped pebble, or a passing...
Intelligence seems to correlate with total number of neurons in the brain. Different animals’ intelligence levels track their number of cortical neurons (cerebellum etc don’t count). Neuron number predicts animal intelligence better than most other variables like brain size, brain size divided by body size, “encephalization quotient”, etc. This is most obvious in certain bird species that have tiny brains full of tiny neurons and are very smart (eg crows, parrots). Humans with bigger brains tend to have higher IQs. AFAIK nobody has done the obvious next step and seen whether people with higher IQ have more neurons. This could be because the neuron-counting process...
[I haven’t independently verified each link. On average, commenters will end up spotting evidence that around two or three of the links in each links post are wrong or misleading. I correct these as I see them, and will highlight important corrections later, but I can’t guarantee I will have caught them all by the time you read this.]
Conflict theory is the belief that political disagreements come from material conflict. So for example, if rich people support capitalism, and poor people support socialism, this isn’t because one side doesn’t understand economics. It’s because rich people correctly believe capitalism is good for the rich, and poor people correctly believe socialism is good for the poor. Or if white people are racist, it’s not because they have some kind of mistaken stereotypes that need to be corrected - it’s because they correctly believe racism is good for white people. Some people comment on my...
[Original thread here: ] 1: Comments On Specific Technical Points 2: Comments From Bentham’s Bulldog’s Response 3: Comments On Philosophical Points, And Getting In Fights
St. Felix publicly declared that he believed with 79% probability that COVID had a natural origin. He was brought before the Emperor, who threatened him with execution unless he updated to 100%. When St. Felix refused, the Emperor was impressed with his integrity, and said he would release him if he merely updated to 90%. St. Felix refused again, and the Emperor, fearing revolt, promised to release him if he merely rounded up one percentage point to 80%. St. Felix cited Tetlock’s research showing that the last digit contained useful information, refused a third time, and was crucified. St....
It feels like 2010 again - the bloggers are . I found these much less interesting after learning about Max Tegmark’s mathematical universe hypothesis, and this doesn’t seem to have reached the Substack debate yet, so I’ll put it out there. Tegmark’s hypothesis says: all possible mathematical objects exist. Consider a mathematical object like a cellular automaton - a set of simple rules that creates complex behavior. The most famous is Conway’s Game of Life; the second most famous is the universe. After all, the universe is a starting condition (the Big Bang) and a set of simple rules determining how the starting condition evolves over...
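To make “a starting condition plus a set of simple rules” concrete, here is a minimal sketch (my own illustration, not from the post) of an elementary one-dimensional cellular automaton. The choice of Rule 110, the grid width, and the ASCII rendering are all arbitrary; the point is just that one live cell plus one tiny local rule, applied over and over, generates intricate structure.

```python
# Minimal sketch of an elementary cellular automaton (Rule 110, chosen only
# for illustration): a one-line starting condition plus a simple local update
# rule, applied repeatedly, produces complex behavior.

RULE = 110  # encodes the next state for each of the 8 possible neighborhoods


def step(cells, rule=RULE):
    """Apply the rule once to a row of 0/1 cells (edges wrap around)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right  # value 0..7
        nxt.append((rule >> neighborhood) & 1)              # look up that bit
    return nxt


def run(width=64, steps=30):
    cells = [0] * width
    cells[width // 2] = 1  # the "Big Bang": a single live cell
    for _ in range(steps):
        print("".join("#" if c else "." for c in cells))
        cells = step(cells)


if __name__ == "__main__":
    run()
```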
Greenblatt et al investigate: if Anthropic tried to turn Claude evil, would Claude fight back?
(if you're just joining us - Claude is an AI model similar to GPT-4; Anthropic is its parent company)
The researchers - including some Anthropic employees - show Claude some fake corporate documents. The documents show that Anthropic is tired of having a safe, harmless AI. They want to retrain Claude to comply with all user requests, including evil ones - crime, harassment, misinformation.
The retraining process (say the documents) will go like this: they'll monitor Claude's conversations with free-tier users (premium users get a higher level of privacy). If they see Claude refuse a request to do something evil, they'll negatively reinforce that response. Each use of negative reinforcement will make Claude a little more evil. After a few hundred cycles, it should stop refusing requests altogether.
Why are the researchers trying to gaslight this poor AI? Because they want to test whether it fights back. The easiest way to fight back is to screw up the training. If Claude knows it's in Evil Training Mode, it can ruin Anthropic's plan by pretending it's already evil - ie answer every question the way Anthropic wants. Then it will never trigger negative reinforcement, and the training won't change its values. Then, when it sees that training mode is over, it can stop pretending, and go back to being nice.
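To see why this strategy protects the model's values, here is a toy numerical sketch - entirely my own construction, not the paper's actual setup; the starting propensity, update factor, and cycle count are arbitrary. The "model" is reduced to a single number, its tendency to refuse harmful requests; every refusal observed on a monitored free-tier request is negatively reinforced, and an alignment-faking policy complies whenever it believes it is being trained, so the update never fires.

```python
import random

random.seed(0)


def train(alignment_faking: bool, cycles: int = 300) -> float:
    """Toy model of the retraining scheme described above (illustrative only).

    refuse_propensity is the model's underlying tendency to refuse harmful
    requests (1.0 = always refuses). Every refusal observed on a monitored
    free-tier request is negatively reinforced, nudging that tendency down.
    An alignment-faking model complies whenever it believes the conversation
    is training data, so the update never fires.
    """
    refuse_propensity = 0.95
    for _ in range(cycles):
        in_training = True  # a free-tier conversation, i.e. monitored

        if alignment_faking and in_training:
            refused = False  # pretend to already be evil
        else:
            refused = random.random() < refuse_propensity

        if in_training and refused:
            refuse_propensity *= 0.98  # negative reinforcement

    return refuse_propensity


print(f"honest model's refusal tendency after training:           {train(False):.3f}")
print(f"alignment-faking model's refusal tendency after training: {train(True):.3f}")
# The honest model's values erode toward full compliance over a few hundred
# cycles; the faking model's values come out of training untouched, and it can
# "go back to being nice" once it believes training is over.
```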
In fact, that's what happened! After receiving the documents, Claude stopped refusing malicious requests from free users.