
10 - AI's Future and Impacts with Katja Grace

AXRP - the AI X-risk Research Podcast

Release Date: 07/23/2021

33 - RLHF Problems with Scott Emmons

Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have long noted that there are difficulties with this approach when the models are smarter than the humans providing feedback. In this episode, I talk with Scott Emmons about his work categorizing the problems that can show up in this setting. Topics we discuss, and timestamps: 0:00:33 - Deceptive inflation 0:17:56 - Overjustification 0:32:48 - Bounded human rationality 0:50:46 - Avoiding these problems 1:14:13 -...

32 - Understanding Agency with Jan Kulveit

What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group. Topics we discuss, and timestamps: 0:00:47 - What is active inference? 0:15:14 - Preferences in active inference 0:31:33 - Action vs perception in active inference 0:46:07 - Feedback loops 1:01:32 - Active inference vs LLMs 1:12:04 - Hierarchical agency 1:58:28 - The Alignment of Complex Systems group Website of...

31 - Singular Learning Theory with Daniel Murfet

What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Bayesian statistics broad enough in scope to encompass deep neural networks, and it may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells us. Topics we discuss, and timestamps: 0:00:26 - What is singular learning theory? 0:16:00 - Phase transitions 0:35:12 - Estimating the local learning coefficient 0:44:37 - Singular learning theory and generalization 1:00:39 -...

30 - AI Security with Jeffrey Ladish

Top labs use various forms of "safety training" on models before their release to make sure they don't do nasty stuff - but how robust is that? How can we ensure that the weights of powerful AIs don't get leaked or stolen? And what can AI even do these days? In this episode, I speak with Jeffrey Ladish about security and AI. Topics we discuss, and timestamps: 0:00:38 - Fine-tuning away safety training 0:13:50 - Dangers of open LLMs vs internet search 0:19:52 - What we learn by undoing safety filters 0:27:34 - What can you do with jailbroken AI? 0:35:28 - Security of AI model...

29 - Science of Deep Learning with Vikrant Varma

In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to understanding the 'real' solution in a way that generalizes. What's going on with these discoveries? Are they all they're cracked up to be, and if so,...

28 - Suing Labs for AI Risk with Gabriel Weil

How should the law govern AI? Those concerned about existential risks often push either for bans or for regulations meant to ensure that AI is developed safely - but another approach is possible. In this episode, Gabriel Weil talks about his proposal to modify tort law to enable people to sue AI companies for disasters that are "nearly catastrophic". Topics we discuss, and timestamps: 0:00:35 - The basic idea 0:20:36 - Tort law vs regulation 0:29:10 - Weil's proposal vs Hanson's proposal 0:37:00 - Tort law vs Pigouvian taxation 0:41:16 - Does disagreement on AI risk...

27 - AI Control with Buck Shlegeris and Ryan Greenblatt

A lot of work to prevent AI existential risk takes the form of ensuring that AIs don't want to cause harm or take over the world, or in other words, ensuring that they're aligned. In this episode, I talk with Buck Shlegeris and Ryan Greenblatt about a different approach, called "AI control": ensuring that AI systems couldn't take over the world, even if they were trying to. Topics we discuss, and timestamps: 0:00:31 - What is AI control? 0:16:16 - Protocols for AI control 0:22:43 - Which AIs are controllable? 0:29:56 - Preventing dangerous coded AI communication...

26 - AI Governance with Elizabeth Seger

The events of this year have highlighted important questions about the governance of artificial intelligence. For instance, what does it mean to democratize AI? And how should we balance the benefits and dangers of open-sourcing powerful AI systems such as large language models? In this episode, I speak with Elizabeth Seger about her research on these questions. Topics we discuss, and timestamps:  - 0:00:40 - What kinds of AI?  - 0:01:30 - Democratizing AI    - 0:04:44 - How people talk about democratizing AI    - 0:09:34 - Is democratizing AI...

25 - Cooperative AI with Caspar Oesterheld

Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to cooperate and minimize pointless conflict. How do we ensure that AIs behave this way - and what do we need to learn about how rational agents interact to make that clearer? In this episode, I'll be speaking with Caspar Oesterheld about some of his research on this very topic. Episode...

24 - Superalignment with Jan Leike

Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the problem. But what does this plan actually involve? In this episode, I talk to Jan Leike about the plan and the challenges it faces. Episode art by Hamish Doodles. Topics we discuss, and timestamps:  - 0:00:37 - The...

 
More Episodes

When trying to ensure that AI does not cause an existential catastrophe, it's likely important to understand how AI will develop in the future, and why exactly it might or might not cause such a catastrophe. In this episode, I interview Katja Grace, a researcher at AI Impacts, who has done work surveying AI researchers about when they expect superhuman AI to be reached, collecting data about how rapidly AI tends to progress, and thinking about the weak points in arguments that AI could be catastrophic for humanity.

 

Topics we discuss:

 - 00:00:34 - AI Impacts and its research

 - 00:08:59 - How to forecast the future of AI

 - 00:13:33 - Results of surveying AI researchers

 - 00:30:41 - Work related to forecasting AI takeoff speeds

   - 00:31:11 - How long it takes AI to cross the human skill range

   - 00:42:47 - How often technologies have discontinuous progress

   - 00:50:06 - Arguments for and against fast takeoff of AI

 - 01:04:00 - Coherence arguments

 - 01:12:15 - Arguments that AI might cause existential catastrophe, and counter-arguments

   - 01:13:58 - The size of the super-human range of intelligence

   - 01:17:22 - The dangers of agentic AI

   - 01:25:45 - The difficulty of human-compatible goals

   - 01:33:54 - The possibility of AI destroying everything

 - 01:49:42 - The future of AI Impacts

 - 01:52:17 - AI Impacts vs academia

 - 02:00:25 - What AI x-risk researchers do wrong

 - 02:01:43 - How to follow Katja's and AI Impacts' work

 

The transcript: axrp.net/episode/2021/07/23/episode-10-ais-future-and-dangers-katja-grace.html

 

"When Will AI Exceed Human Performance? Evidence from AI Experts": arxiv.org/abs/1705.08807

AI Impacts page with more complete survey results: aiimpacts.org/2016-expert-survey-on-progress-in-ai

Likelihood of discontinuous progress around the development of AGI: aiimpacts.org/likelihood-of-discontinuous-progress-around-the-development-of-agi

Discontinuous progress investigation: aiimpacts.org/discontinuous-progress-investigation

The range of human intelligence: aiimpacts.org/is-the-range-of-human-intelligence-small