"Humans are very reliable agents" by Alyssa Vance
Release Date: 07/13/2022
LessWrong Curated Podcast
https://www.lesswrong.com/posts/28zsuPaJpKAGSX4zq/humans-are-very-reliable-agents
Over the last few years, deep-learning-based AI has progressed extremely rapidly in fields like natural language processing and image generation. However, self-driving cars seem stuck in perpetual beta mode, and aggressive predictions there have repeatedly been disappointing. Google's self-driving project started four years before AlexNet kicked off the deep learning revolution, and it still isn't deployed at large scale, thirteen years later. Why are these fields getting such different results?
Right now, I think the biggest answer is that ML benchmarks judge models by average-case performance, while self-driving cars (and many other applications) require matching human worst-case performance. For MNIST, an easy handwriting recognition task, performance tops out at around 99.9% even for top models; it's not very practical to design for or measure higher reliability than that, because the test set is just 10,000 images and a handful are ambiguous. Redwood Research, which is exploring worst-case performance in the context of AI alignment, got reliability rates around 99.997% for their text generation models.
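To illustrate why a 10,000-image test set cannot measure much more reliability than this, here is a minimal sketch. It is not from the original post: it assumes test examples are independent draws, and it uses the standard zero-failure binomial bound (the "rule of three"). The function names are hypothetical.

```python
import math


def max_certifiable_reliability(n_samples: int, confidence: float = 0.95) -> float:
    """Best reliability a test set can certify if the model makes zero errors.

    Assumption (not from the post): samples are independent, so observing
    zero failures in n trials bounds the true failure rate p by solving
    (1 - p)^n = 1 - confidence.
    """
    p_upper = 1.0 - (1.0 - confidence) ** (1.0 / n_samples)
    return 1.0 - p_upper


def samples_needed(failure_rate: float, confidence: float = 0.95) -> float:
    """Error-free samples needed to certify a given failure rate."""
    # log1p keeps precision when failure_rate is tiny.
    return math.log(1.0 - confidence) / math.log1p(-failure_rate)


# Test set size quoted in the text for MNIST.
mnist_bound = max_certifiable_reliability(10_000)

# Samples needed to certify one failure in ten billion decisions.
human_level_samples = samples_needed(1e-10)

print(f"10,000 perfect answers certify at most {mnist_bound:.4%} reliability")
print(f"Certifying a 1e-10 failure rate needs about {human_level_samples:.2e} samples")
```

Under these assumptions, even a perfect score on 10,000 images supports only roughly 99.97% reliability at 95% confidence, while certifying one failure in ten billion would take tens of billions of error-free test cases.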
By comparison, human drivers are ridiculously reliable. The US has around one traffic fatality per 100 million miles driven; if a human driver makes 100 decisions per mile, that gets you a worst-case failure rate of ~1:10,000,000,000, or a reliability of ~99.99999999%. That's around five orders of magnitude better than a very good deep learning model, and you get that even in an open environment, where data isn't pre-filtered and there are sometimes random mechanical failures. Matching that bar is hard! I'm sure future AI will get there, but each additional "nine" of reliability is typically another unit of engineering effort. (Note that current self-driving systems use a mix of different models embedded in a larger framework, not one model trained end-to-end like GPT-3.)
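The arithmetic behind this comparison can be sketched as follows. The inputs are the figures quoted in the text (one fatality per 100 million miles, an assumed 100 decisions per mile, and roughly 99.997% reliability for the Redwood Research models); the variable and function names are hypothetical, and the result is a back-of-the-envelope estimate, not a measurement.

```python
import math

# Figures quoted in the text; the decisions-per-mile value is the post's own assumption.
MILES_PER_FATALITY = 100_000_000
DECISIONS_PER_MILE = 100

# 99.997% reliability is about 3 failures per 100,000 outputs.
MODEL_FAILURE_RATE = 3e-5


def per_decision_failure_rate(miles_per_fatality: int, decisions_per_mile: int) -> float:
    """Failure rate per decision, treating each fatality as one failed decision."""
    return 1.0 / (miles_per_fatality * decisions_per_mile)


human_failure_rate = per_decision_failure_rate(MILES_PER_FATALITY, DECISIONS_PER_MILE)

# Gap between model and human failure rates, in orders of magnitude.
gap_orders = math.log10(MODEL_FAILURE_RATE / human_failure_rate)

print(f"Human failure rate per decision: {human_failure_rate:.0e}")
print(f"Gap to the model: about {gap_orders:.1f} orders of magnitude")
```

The gap comes out between five and six orders of magnitude, consistent with the "around five" in the text. Changing the decisions-per-mile assumption shifts the result by the same factor, so the estimate is only as good as that guess.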