
10 - AI's Future and Impacts with Katja Grace

AXRP - the AI X-risk Research Podcast

Release Date: 07/23/2021

33 - RLHF Problems with Scott Emmons

Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have long noted that there are difficulties with this approach when the models are smarter than the humans providing feedback. In this episode, I talk with Scott Emmons about his work categorizing the problems that can show up in this setting. Topics we discuss, and timestamps: 0:00:33 - Deceptive inflation 0:17:56 - Overjustification 0:32:48 - Bounded human rationality 0:50:46 - Avoiding these problems 1:14:13 -...

32 - Understanding Agency with Jan Kulveit

What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group. Topics we discuss, and timestamps: 0:00:47 - What is active inference? 0:15:14 - Preferences in active inference 0:31:33 - Action vs perception in active inference 0:46:07 - Feedback loops 1:01:32 - Active inference vs LLMs 1:12:04 - Hierarchical agency 1:58:28 - The Alignment of Complex Systems group Website of...

31 - Singular Learning Theory with Daniel Murfet

What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Bayesian statistics broad enough in scope to encompass deep neural networks, and it may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells us. Topics we discuss, and timestamps: 0:00:26 - What is singular learning theory? 0:16:00 - Phase transitions 0:35:12 - Estimating the local learning coefficient 0:44:37 - Singular learning theory and generalization 1:00:39 -...

30 - AI Security with Jeffrey Ladish

Top labs use various forms of "safety training" on models before their release to make sure they don't do nasty stuff - but how robust is that? How can we ensure that the weights of powerful AIs don't get leaked or stolen? And what can AI even do these days? In this episode, I speak with Jeffrey Ladish about security and AI. Topics we discuss, and timestamps: 0:00:38 - Fine-tuning away safety training 0:13:50 - Dangers of open LLMs vs internet search 0:19:52 - What we learn by undoing safety filters 0:27:34 - What can you do with jailbroken AI? 0:35:28 - Security of AI model...

29 - Science of Deep Learning with Vikrant Varma

In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to understanding the 'real' solution in a way that generalizes. What's going on with these discoveries? Are they all they're cracked up to be, and if so,...

28 - Suing Labs for AI Risk with Gabriel Weil

How should the law govern AI? Those concerned about existential risks often push either for bans or for regulations meant to ensure that AI is developed safely - but another approach is possible. In this episode, Gabriel Weil talks about his proposal to modify tort law to enable people to sue AI companies for disasters that are "nearly catastrophic". Topics we discuss, and timestamps: 0:00:35 - The basic idea 0:20:36 - Tort law vs regulation 0:29:10 - Weil's proposal vs Hanson's proposal 0:37:00 - Tort law vs Pigouvian taxation 0:41:16 - Does disagreement on AI risk...

27 - AI Control with Buck Shlegeris and Ryan Greenblatt

A lot of work to prevent AI existential risk takes the form of ensuring that AIs don't want to cause harm or take over the world, or in other words, ensuring that they're aligned. In this episode, I talk with Buck Shlegeris and Ryan Greenblatt about a different approach, called "AI control": ensuring that AI systems couldn't take over the world, even if they were trying to. Topics we discuss, and timestamps: 0:00:31 - What is AI control? 0:16:16 - Protocols for AI control 0:22:43 - Which AIs are controllable? 0:29:56 - Preventing dangerous coded AI communication...

26 - AI Governance with Elizabeth Seger

The events of this year have highlighted important questions about the governance of artificial intelligence. For instance, what does it mean to democratize AI? And how should we balance the benefits and dangers of open-sourcing powerful AI systems such as large language models? In this episode, I speak with Elizabeth Seger about her research on these questions. Topics we discuss, and timestamps:  - 0:00:40 - What kinds of AI?  - 0:01:30 - Democratizing AI    - 0:04:44 - How people talk about democratizing AI    - 0:09:34 - Is democratizing AI...

25 - Cooperative AI with Caspar Oesterheld

Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to cooperate and minimize pointless conflict. How do we ensure that AIs behave this way - and what do we need to learn about how rational agents interact to make that clearer? In this episode, I'll be speaking with Caspar Oesterheld about some of his research on this very topic. Episode...

24 - Superalignment with Jan Leike

Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the problem. But what does this plan actually involve? In this episode, I talk to Jan Leike about the plan and the challenges it faces. Episode art by Hamish Doodles. Topics we discuss, and timestamps:  - 0:00:37 - The...

 
More Episodes

When trying to ensure that AI does not cause an existential catastrophe, it's likely important to understand how AI will develop in the future, and why exactly it might or might not cause such a catastrophe. In this episode, I interview Katja Grace, a researcher at AI Impacts, who has done work surveying AI researchers about when they expect superhuman AI to be reached, collecting data about how rapidly AI tends to progress, and thinking about the weak points in arguments that AI could be catastrophic for humanity.

 

Topics we discuss:

 - 00:00:34 - AI Impacts and its research

 - 00:08:59 - How to forecast the future of AI

 - 00:13:33 - Results of surveying AI researchers

 - 00:30:41 - Work related to forecasting AI takeoff speeds

   - 00:31:11 - How long it takes AI to cross the human skill range

   - 00:42:47 - How often technologies have discontinuous progress

   - 00:50:06 - Arguments for and against fast takeoff of AI

 - 01:04:00 - Coherence arguments

 - 01:12:15 - Arguments that AI might cause existential catastrophe, and counter-arguments

   - 01:13:58 - The size of the super-human range of intelligence

   - 01:17:22 - The dangers of agentic AI

   - 01:25:45 - The difficulty of human-compatible goals

   - 01:33:54 - The possibility of AI destroying everything

 - 01:49:42 - The future of AI Impacts

 - 01:52:17 - AI Impacts vs academia

 - 02:00:25 - What AI x-risk researchers do wrong

 - 02:01:43 - How to follow Katja's and AI Impacts' work

 

The transcript: axrp.net/episode/2021/07/23/episode-10-ais-future-and-dangers-katja-grace.html

 

"When Will AI Exceed Human Performance? Evidence from AI Experts": arxiv.org/abs/1705.08807

AI Impacts page with more complete survey results: aiimpacts.org/2016-expert-survey-on-progress-in-ai

Likelihood of discontinuous progress around the development of AGI: aiimpacts.org/likelihood-of-discontinuous-progress-around-the-development-of-agi

Discontinuous progress investigation: aiimpacts.org/discontinuous-progress-investigation

The range of human intelligence: aiimpacts.org/is-the-range-of-human-intelligence-small