LessWrong Curated Podcast
Audio version of the posts shared in the LessWrong Curated newsletter.
"Survey advice" by Katja Grace
08/26/2022
"Survey advice" by Katja Grace
Things I believe about making surveys:
1. If you write a question that seems clear, there’s an unbelievably high chance that any given reader will misunderstand it. (Possibly this applies to things that aren’t survey questions also, but that’s a problem for another time.)
2. A better way to find out if your questions are clear is to repeatedly take a single individual person, sit down with them, and ask them to take your survey while narrating the process: reading the questions aloud, telling you what they think the question is asking, explaining their thought process in answering it. If you do this repeatedly with different people until some are not confused at all, the questions are probably clear.
3. If you ask people very similar questions in different-sounding ways, you can get very different answers (possibly related to the above, though that’s not obviously the main thing going on).
4. One specific case of that: for some large class of events, if you ask people how many years until there is a 10%, 50%, 90% chance of event X occurring, you will get an earlier distribution of times than if you ask for the probability that X will happen in 10, 20, 50 years. (I’ve only tried this with AI-related things, but my guess is that it at least generalizes to other low-probability-seeming things. Also, if you just ask about 10% on its own, it is consistently different from 10% alongside 50% and 90%.) See the sketch after this list for the sense in which the two framings nominally describe the same distribution.
5. Given the complicated landscape of people’s beliefs about the world and proclivities to say certain things, there is a huge amount of scope for choosing questions to get answers that sound different to listeners (e.g. support a different side in a debate).
6. There is also scope for helping people think through a thing in a way that they would endorse, e.g. by asking a sequence of questions. This can also change what the answer sounds like, but seems ethical to me, whereas applications of 5 seem generally suss.
7. Often your respondent knows thing P and you want to know Q, and it is possible to infer something about Q from P. You then have a choice about which point in this inference chain to ask the person about. It seems helpful to notice this choice. For instance, if AI researchers know most about what AI research looks like, and you want to know whether human civilization will be imminently destroyed by renegade AI systems, you can ask about a) how fast AI progress appears to be progressing, b) when it will reach a certain performance bar, c) whether AI will cause something like human extinction. In the 2016 survey, we asked all of these.
8. Given the choice, if you are hoping to use the data as information, it is often good to ask people about things they know about. In 7, this points to aiming your question early in the reasoning chain, then doing the inference yourself.
9. Interest in surveys doesn’t seem very related to whether a survey is a good source of information on the topic surveyed on. One of the strongest findings of the 2016 survey, IMO, was that surveys like that are unlikely to be a reliable guide to the future. This makes sense because surveys fulfill other purposes.
10. Surveys are great if you want to know what people think about X, rather than what is true about X. Knowing what people think is often the important question. It can be good for legitimizing a view, or letting a group of people have common knowledge about what they think so they can start to act on it, including getting out of bad equilibria where everyone nominally supports claim P because they think others will judge them if not.
11. If you are surveying people with the intention of claiming a thing, it is helpful to think ahead about what you want to claim, and make sure you ask questions that will let you claim that, in a simple way. For instance, it is better to be able to say ‘80% of a random sample of shoppers at Tesco said that they like tomato more than beans’ than to say ‘80% of a sample of shoppers who were mostly at Tesco but also at Aldi (see footnote for complicated shopper selection process) say that they prefer tomato to peas, or (using a separate subset of shoppers) prefer peas to beans, from which we can infer that probably about 80% of shoppers in general, or more, prefer tomato to beans’. You want to be able to describe the setup and question in a way that is simple enough that the listener understands what happened and sees the significance of the finding.
12. If you are running a survey multiple times, and you want informative answers about whether there were differences in views between those times, you should probably run exactly the same survey and not change the questions even a tiny bit unless there is very strong reason to. This follows from 3.
13. Qualtrics costs thousands of dollars to use, and won’t let you sign up for an account or even know how much it might cost unless you book a meeting to talk to someone to sell it to you. seems pretty nice, but I might not have been trying to do such complicated things there.
14. Running surveys seems underrated as an activity.
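To make item 4 concrete, here is a minimal sketch (my own illustration with made-up numbers, not anything from the post or the surveys it describes) of why the two framings nominally ask for the same information: answers to "years until a 10%/50%/90% chance" pin down points on a cumulative distribution, from which "probability of X within 10/20/50 years" can be read off. The divergence Katja describes is in what respondents actually say under each framing, not a mathematical necessity.

```python
import numpy as np

# Hypothetical answers to framing 1: years until a 10%, 50%, 90% chance of event X.
quantile_probs = np.array([0.10, 0.50, 0.90])
elicited_years = np.array([8.0, 25.0, 70.0])   # invented numbers for illustration

# Framing 2 asks: what is the probability of X within 10, 20, 50 years?
horizons = np.array([10.0, 20.0, 50.0])

# Linearly interpolate the implied CDF between the elicited quantiles and read it
# off at the fixed horizons; a self-consistent respondent would answer roughly this.
implied_probs = np.interp(horizons, elicited_years, quantile_probs)

for h, p in zip(horizons, implied_probs):
    print(f"Implied P(X within {h:.0f} years) = {p:.0%}")
```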
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/24174015
"Toni Kurz and the Insanity of Climbing Mountains" by Gene Smith
08/23/2022
"Toni Kurz and the Insanity of Climbing Mountains" by Gene Smith
Content warning: death. I've been on a YouTube binge lately. My current favorite genre is disaster stories about mountain climbing. The death statistics for some of these mountains, especially ones in the Himalayas, are truly insane. To give an example, let me tell you about a mountain most people have never heard of: Nanga Parbat. It's an 8,126-meter "wall of ice and rock", sporting the tallest mountain face and the fastest change in elevation in the entire world: the Rupal Face. I've posted a picture above, but pictures really don't do justice to just how gigantic this wall is. This single face is as tall as the largest mountain in the Alps. It is the size of ten Empire State Buildings stacked on top of one another. If you could somehow walk straight up starting from the bottom, it would take you an entire HOUR to reach the summit. 31 people died trying to climb this mountain before its first successful ascent. Imagine being climber number 32 and thinking "Well, I know no one has ascended this mountain and thirty-one people have died trying, but why not, let's give it a go!" The stories of deaths on these mountains (and even much shorter peaks in the Alps or in North America) sound like they are out of a novel. Stories of one mountain in particular have stuck with me: the first attempts to climb the tallest mountain face in the Alps, the Eigerwand.
The Eigerwand: First Attempt
The Eigerwand is the north face of a 14,000-foot peak named "The Eiger". After three generations of Europeans had conquered every peak in the Alps, few great challenges remained in the area. The Eigerwand was one of these: widely considered to be the greatest unclimbed route in the Alps. The peak had already been reached in the 1850s, during the golden age of Alpine exploration. But the north face of the mountain remained unclimbed. Many things can make a climb challenging: steep slopes, avalanches, long ascents, no easy resting spots, and more. The Eigerwand had all of those, but one hazard in particular stood out: loose rock and snow. In the summer months (usually considered the best time for climbing), the mountain crumbles. Fist-sized boulders routinely tumble down the mountain. Huge avalanches sweep down its 70-degree slopes at incredible speed. And the huge, concave face is perpetually in shadow. It is extremely cold and windy, and the concave face seems to cause local weather patterns that can be completely different from the pass below. The face is deadly. Before 1935, no team had made a serious attempt at the face. But that year, two young German climbers from Bavaria, both extremely experienced but relatively unknown outside the climbing community, decided they would make the first serious attempt.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/24142644
"Deliberate Grieving" by Raemon
08/22/2022
"Deliberate Grieving" by Raemon
This post is hopefully useful on its own, but begins a series ultimately about grieving over a world that might (or, might not) be . It starts with some pieces from a previous post, but goes into more detail. At the beginning of the pandemic, I didn’t have much experience with grief. By the end of the pandemic, I had gotten quite a lot of practice grieving for things. I now think of grieving as a key life skill, with ramifications for epistemics, action, and coordination. I had read , which gave me footholds to get started with. But I still had to develop some skills from scratch, and apply them in novel ways. Grieving probably works differently for different people. Your mileage may vary. But for me, grieving is the act of wrapping my brain around the fact that something important to me doesn’t exist anymore. Or can’t exist right now. Or perhaps never existed. It typically comes in two steps – an “orientation” step, where my brain traces around the lines of the thing-that-isn’t-there, coming to understand what reality is actually shaped like now. And then a “catharsis” step, once I fully understand that the thing is gone. The first step can take hours, weeks or months. You can grieve for people who are gone. You can grieve for things you used to enjoy. You can grieve for principles that were important to you but aren’t practical to apply right now. Grieving is important in single-player mode – if I’m holding onto something that’s not there anymore, my thoughts and decision-making are distorted. I can’t make good plans if my map of reality is full of leftover wishful markings of things that aren’t there.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/24131076
"Humans provide an untapped wealth of evidence about alignment" by TurnTrout & Quintin Pope
08/08/2022
"Humans provide an untapped wealth of evidence about alignment" by TurnTrout & Quintin Pope
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. TL;DR: To even consciously consider an alignment research direction, you should have evidence to locate it as a promising lead. As best I can tell, many directions seem interesting but do not have strong evidence of being “entangled” with the alignment problem such that I expect them to yield significant insights. For example, “we can solve an easier version of the alignment problem by first figuring out how to build an AI which maximizes the number of real-world diamonds” has intuitive appeal and plausibility, but this claim doesn’t have to be true and this problem does not necessarily have a natural, compact solution. In contrast, there do in fact exist humans who care about diamonds. Therefore, there are guaranteed-to-exist alignment insights concerning the way people come to care about e.g. real-world diamonds. “Consider how humans navigate the alignment subproblem you’re worried about” is a habit which I (TurnTrout) picked up from Quintin Pope. I wrote the post, he originated the tactic.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23991000
"Changing the world through slack & hobbies" by Steven Byrnes
07/30/2022
"Changing the world through slack & hobbies" by Steven Byrnes
Introduction In EA orthodoxy, if you're really serious about EA, the three alternatives that people most often seem to talk about are (1) “direct work” in a job that furthers a very important cause; (2) ; (3) earning that will help you do those things in the future, e.g. by getting a PhD or teaching yourself ML. By contrast, there’s not much talk of: (4) being in a job / situation where you have extra time and energy and freedom to explore things that seem interesting and important. But that last one is really important!
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23908800
"«Boundaries», Part 1: a key missing concept from utility theory" by Andrew Critch
07/28/2022
"«Boundaries», Part 1: a key missing concept from utility theory" by Andrew Critch
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is Part 1 of my «Boundaries» sequence on LessWrong. Summary: «Boundaries» are a missing concept from the axioms of game theory and bargaining theory, which might help pin down certain features of multi-agent rationality (this post), and have broader implications for effective altruism discourse and x-risk (future posts).
1. Boundaries (of living systems)
Epistemic status: me describing what I mean. With the exception of some relatively recent and isolated pockets of research on embedded agency (e.g., ), most attempts at formal descriptions of living rational agents — especially utility-theoretic descriptions — are missing the idea that living systems require and maintain boundaries. When I say boundary, I don't just mean an arbitrary constraint or social norm. I mean something that could also be called a membrane in a generalized sense, i.e., a layer of stuff-of-some-kind that physically or cognitively separates a living system from its environment, that 'carves reality at the joints' in a way that isn't an entirely subjective judgement of the living system itself. Here are some examples that I hope will convey my meaning:
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23885463
"ITT-passing and civility are good; "charity" is bad; steelmanning is niche" by Rob Bensinger
07/24/2022
"ITT-passing and civility are good; "charity" is bad; steelmanning is niche" by Rob Bensinger
I often object to claims like "charity/steelmanning is an argumentative virtue". This post collects a few things I and others have said on this topic over the last few years. My current view is: steelmanning ("the art of addressing the best form of the other person’s argument, even if it’s not the one they presented") is a useful niche skill, but I don't think it should be a standard thing you bring out in most arguments, even if it's an argument with someone you strongly disagree with. Instead, arguments should mostly be organized around things like:
- Object-level learning and truth-seeking, with the conversation as a convenient excuse to improve your own model of something you're curious about.
- Trying to pass each other's Ideological Turing Test (ITT), or some generalization thereof. The ability to pass ITTs is the ability "to state opposing views as clearly and persuasively as their proponents". The version of "ITT" I care about is one where you understand the substance of someone's view well enough to be able to correctly describe their beliefs and reasoning; I don't care about whether you can imitate their speech patterns, jargon, etc.
- Trying to identify and resolve cruxes: things that would make one or the other of you (or both) change your mind about the topic under discussion.
Argumentative charity is a complete mess of a concept—people use it to mean a wide variety of things, and many of those things are actively bad, or liable to cause severe epistemic distortion and miscommunication. Some version of civility and/or friendliness and/or a spirit of camaraderie and goodwill seems like a useful ingredient in many discussions. I'm not sure how best to achieve this in ways that are emotionally honest ("pretending to be cheerful and warm when you don't feel that way" sounds like the wrong move to me), or how to achieve this without steering away from candor, openness, "realness", etc.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23838179
"What should you change in response to an "emergency"? And AI risk" by Anna Salamon
07/23/2022
"What should you change in response to an "emergency"? And AI risk" by Anna Salamon
Related to: Epistemic status: A possibly annoying mixture of straightforward reasoning and hard-to-justify personal opinions. It is often stated (with some justification, IMO) that AI risk is an “emergency.” Various people have explained to me that they put various parts of their normal life’s functioning on hold on account of AI being an “emergency.” In the interest of people doing this sanely and not confusedly, I’d like to take a step back and seek principles around what kinds of changes a person might want to make in an “emergency” of different sorts.
Principle 1: It matters what time-scale the emergency is on
There are plenty of ways we can temporarily increase productivity on some narrow task or other, at the cost of our longer-term resources. For example:
- Skipping meals
- Skipping sleep
- Ceasing to clean the house or to exercise
- Accumulating credit card debt
- Calling in favors from friends
- Skipping leisure time
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23834282
"On how various plans miss the hard bits of the alignment challenge" by Nate Soares
07/17/2022
"On how various plans miss the hard bits of the alignment challenge" by Nate Soares
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (As usual, this post was written by Nate Soares with some help and editing from Rob Bensinger.) In my previous post, I described a “hard bit” of the challenge of aligning AGI—the sharp left turn that comes when your system slides into the “AGI” capabilities well, the fact that alignment doesn’t generalize similarly well at this turn, and the fact that this turn seems likely to break a bunch of your existing alignment properties. Here, I want to briefly discuss a variety of current research proposals in the field, to explain why I think this problem is currently neglected. I also want to mention research proposals that do strike me as having some promise, or that strike me as adjacent to promising approaches. Before getting into that, let me be very explicit about three points:
1. On my model, solutions to how capabilities generalize further than alignment are necessary but not sufficient. There is dignity in attacking a variety of other real problems, and I endorse that practice.
2. The imaginary versions of people in the dialogs below are not the same as the people themselves. I'm probably misunderstanding the various proposals in important ways, and/or rounding them to stupider versions of themselves along some important dimensions. If I've misrepresented your view, I apologize.
3. I do not subscribe to the Copenhagen Interpretation of Ethics wherein someone who takes a bad swing at the problem (or takes a swing at a different problem) is more culpable for civilization's failure than someone who never takes a swing at all. Everyone whose plans I discuss below is highly commendable, laudable, and virtuous by my accounting.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23766668
"Humans are very reliable agents" by Alyssa Vance
07/13/2022
"Humans are very reliable agents" by Alyssa Vance
Over the last few years, deep-learning-based AI has progressed in fields like natural language processing and image generation. However, self-driving cars seem stuck in perpetual beta mode, and aggressive predictions there have repeatedly been . Google's self-driving project started four years before AlexNet kicked off the deep learning revolution, and it still isn't deployed at scale, thirteen years later. Why are these fields getting such ? Right now, I think the biggest answer is that benchmarks judge models by average-case performance, while self-driving cars (and many other applications) require matching human worst-case performance. For MNIST, an easy handwriting recognition task, performance tops out at around even for top models; it's not very practical to design for or measure higher reliability than that, because the test set is just 10,000 images and a handful are ambiguous. Redwood Research, which is exploring in the context of AI alignment, got reliability rates around 99.997% for their text generation models. By comparison, human drivers are ridiculously reliable. The US has around one traffic fatality per ; if a human driver makes 100 decisions per mile, that gets you a worst-case reliability of ~1:10,000,000,000 or ~99.999999999%. That's around five orders of magnitude better than a very good deep learning model, and you get that even in an open environment, where data isn't pre-filtered and there are sometimes random mechanical failures. Matching that bar is hard! I'm sure future AI will get there, but each additional "9" of reliability is typically another unit of engineering effort. (Note that current self-driving systems use a embedded in a larger framework, not one model trained end-to-end like GPT-3.)
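The per-mile fatality figure is elided in the excerpt above; the back-of-the-envelope sketch below assumes the commonly cited figure of roughly one US traffic fatality per 100 million vehicle-miles, since that assumption reproduces the ~1:10,000,000,000 number quoted in the text. Treat it as a restatement of the arithmetic, not as the author's own calculation.

```python
import math

miles_per_fatality = 100_000_000   # assumed figure; the excerpt's own number is elided
decisions_per_mile = 100           # stated in the excerpt

decisions_per_fatality = miles_per_fatality * decisions_per_mile   # 10,000,000,000
human_failure_rate = 1 / decisions_per_fatality                    # ~1e-10 per decision
model_failure_rate = 1 - 0.99997                                   # Redwood figure from the excerpt

print(f"Human driver: ~1 failure per {decisions_per_fatality:,} decisions")
print(f"Text-generation model: ~1 failure per {1 / model_failure_rate:,.0f} samples")
print(f"Gap: about {math.log10(model_failure_rate / human_failure_rate):.1f} orders of magnitude")
```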
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23706215
"Looking back on my alignment PhD" by TurnTrout
07/08/2022
"Looking back on my alignment PhD" by TurnTrout
The funny thing about long periods of time is that they do, eventually, come to an end. I'm proud of what I accomplished during my PhD. That said, I'm going to first focus on mistakes I've made over the past four years. Mistakes I think I , and kept learning some in 2020–2021. I was significantly less of a fool in 2021 than I was in 2017. That is important and worth feeling good about. But all things considered, I still made a lot of profound mistakes over the course of my PhD.
Social dynamics distracted me from my core mission
I focused on "catching up" to other thinkers
I figured this point out by summer 2021. I wanted to be more like Eliezer Yudkowsky and Buck Shlegeris and Paul Christiano. They know lots of facts and laws about lots of areas (e.g. general relativity and thermodynamics and information theory). I focused on building up dependencies (like and and ) not only because I wanted to know the answers, but because I felt I owed a debt, that I was in the red until I could at least meet other thinkers at their level of knowledge. But rationality is not about the bag of facts you know, nor is it about the concepts you have internalized. Rationality is about how your mind holds itself, it is how you weigh evidence, it is how you decide where to look next when puzzling out a new area. If I had been more honest with myself, I could have nipped the "catching up with other thinkers" mistake in 2018. I could have removed the bad mental habits using ; or at least been aware of the badness. But I did not, in part because the truth was uncomfortable. If I did not have a clear set of prerequisites (e.g. analysis and topology and game theory) to work on, I would not have a clear and immediate direction of improvement. I would have felt adrift.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23667932
"It’s Probably Not Lithium" by Natália Coelho Mendonça
07/05/2022
"It’s Probably Not Lithium" by Natália Coelho Mendonça
A Chemical Hunger, a series by the authors of the blog Slime Mold Time Mold (SMTM) that , argues that the obesity epidemic is caused by environmental contaminants. The authors’ top suspect is lithium, primarily because it is known to cause weight gain at the doses used to treat bipolar disorder. After doing some research, however, I found that it is not plausible that lithium plays a major role in the obesity epidemic, and that a lot of the claims the SMTM authors make about the topic are misleading, flat-out wrong, or based on extremely cherry-picked evidence. I have the impression that reading what they have to say about this often leaves the reader with a worse model of reality than they started with, and I’ll explain why I have that impression in this post.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23634578
"What Are You Tracking In Your Head?" by John Wentworth
07/02/2022
"What Are You Tracking In Your Head?" by John Wentworth
A large chunk - plausibly the majority - of real-world expertise seems to be in the form of illegible skills: skills/knowledge which are hard to transmit by direct explanation. They’re not necessarily things which a teacher would even notice enough to consider important - just background skills or knowledge which is so ingrained that it becomes invisible. I’ve recently noticed a certain common type of illegible skill which I think might account for the majority of illegible-skill-value across a wide variety of domains. Here are a few examples of the type of skill I have in mind:
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23613440
"Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment" by elspood
06/29/2022
"Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment" by elspood
https://www.lesswrong.com/posts/Ke2ogqSEhL2KCJCNx/security-mindset-lessons-from-20-years-of-software-security Background I have been doing red team, blue team (offensive, defensive) computer security for a living since September 2000. The goal of this post is to compile a list of general principles I've learned during this time that are likely relevant to the field of AGI Alignment. If this is useful, I could continue with a broader or deeper exploration. Alignment Won't Happen By Accident I used to use the phrase when teaching security mindset to software developers that "security doesn't happen by accident." A system that isn't explicitly designed with a security feature is not going to have that security feature. More specifically, a system that isn't designed to be robust against a certain failure mode is going to exhibit that failure mode. This might seem rather obvious when stated explicitly, but this is not the way that most developers, indeed most humans, think. I see a lot of disturbing parallels when I see anyone arguing that AGI won't necessarily be dangerous. An AGI that isn't intentionally designed not to exhibit a particular failure mode is going to have that failure mode. It is certainly possible to get lucky and not trigger it, and it will probably be impossible to enumerate even every category of failure mode, but to have any chance at all we will have to plan in advance for as many failure modes as we can possibly conceive. As a practical enforcement method, I used to ask development teams that every user story have at least three abuser stories to go with it. For any new capability, think at least hard enough about it that you can imagine at least three ways that someone could misuse it. Sometimes this means looking at boundary conditions ("what if someone orders 2^64+1 items?"), sometimes it means looking at forms of invalid input ("what if someone tries to pay -$100, can they get a refund?"), and sometimes it means being aware of particular forms of attack ("what if someone puts Javascript in their order details?").
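As a concrete illustration of the "three abuser stories per user story" practice, here is a minimal sketch of a hypothetical "place an order" handler that addresses the three example abuses above. The function, names, and limits are my own invention for illustration, not code from the post.

```python
import html

MAX_QUANTITY = 10_000   # arbitrary sanity bound, far below 2^64 + 1

def place_order(quantity: int, payment_amount: float, order_details: str) -> dict:
    # Abuser story 1: "what if someone orders 2^64+1 items?" -> bound the quantity.
    if not 1 <= quantity <= MAX_QUANTITY:
        raise ValueError(f"quantity must be between 1 and {MAX_QUANTITY}")

    # Abuser story 2: "what if someone tries to pay -$100, can they get a refund?"
    # -> reject non-positive payments instead of silently crediting the account.
    if payment_amount <= 0:
        raise ValueError("payment amount must be positive")

    # Abuser story 3: "what if someone puts Javascript in their order details?"
    # -> escape user-supplied text before it can ever be rendered as HTML.
    safe_details = html.escape(order_details)

    return {"quantity": quantity, "paid": payment_amount, "details": safe_details}

# The injection attempt comes back neutralized rather than executable:
print(place_order(2, 19.99, "<script>alert('pwned')</script>"))
```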
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23578025
"Where I agree and disagree with Eliezer" by Paul Christiano
06/22/2022
"Where I agree and disagree with Eliezer" by Paul Christiano
by Paul Christiano, 20th Jun 2022. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (Partially in response to Eliezer Yudkowsky’s AGI Ruin: A List of Lethalities. Written in the same rambling style. Not exhaustive.)
Agreements
Powerful AI systems have a good chance of deliberately and irreversibly disempowering humanity. This is a much easier failure mode than killing everyone with destructive physical technologies. Catastrophically risky AI systems could plausibly exist soon, and there likely won’t be a strong consensus about this fact until such systems pose a meaningful existential risk per year. There is not necessarily any “fire alarm.” Even if there were consensus about a risk from powerful AI systems, there is a good chance that the world would respond in a totally unproductive way. It’s wishful thinking to look at possible stories of doom and say “we wouldn’t let that happen;” humanity is fully capable of messing up even very basic challenges, especially if they are novel.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23507549
"Six Dimensions of Operational Adequacy in AGI Projects" by Eliezer Yudkowsky
06/21/2022
"Six Dimensions of Operational Adequacy in AGI Projects" by Eliezer Yudkowsky
by Eliezer Yudkowsky. Editor's note: The following is a lightly edited copy of a document written by Eliezer Yudkowsky in November 2017. Since this is a snapshot of Eliezer’s thinking at a specific time, we’ve sprinkled reminders throughout that this is from 2017. A background note: It’s often the case that people are slow to abandon obsolete playbooks in response to a novel challenge. And AGI is certainly a very novel challenge. Italian general Luigi Cadorna offers a memorable historical example. In the Isonzo Offensive of World War I, Cadorna lost hundreds of thousands of men in futile frontal assaults against enemy trenches defended by barbed wire and machine guns. As morale plummeted and desertions became epidemic, Cadorna began executing his own soldiers en masse, in an attempt to cure the rest of their “cowardice.” The offensive continued for 2.5 years. Cadorna made many mistakes, but foremost among them was his refusal to recognize that this war was fundamentally unlike those that had come before. Modern weaponry had forced a paradigm shift, and Cadorna’s instincts were not merely miscalibrated—they were systematically broken. No number of small, incremental updates within his obsolete framework would be sufficient to meet the new challenge. Other examples of this type of mistake include the initial response of the record industry to iTunes and streaming; or, more seriously, the response of most Western governments to COVID-19.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23495318
"Moses and the Class Struggle" by lsusr
06/21/2022
"Moses and the Class Struggle" by lsusr
"𝕿𝖆𝖐𝖊 𝖔𝖋𝖋 𝖞𝖔𝖚𝖗 𝖘𝖆𝖓𝖉𝖆𝖑𝖘. 𝕱𝖔𝖗 𝖞𝖔𝖚 𝖘𝖙𝖆𝖓𝖉 𝖔𝖓 𝖍𝖔𝖑𝖞 𝖌𝖗𝖔𝖚𝖓𝖉," said the bush. "No," said Moses. "Why not?" said the bush. "I am a Jew. If there's one thing I know about this universe it's that there's no such thing as God," said Moses. "You don't need to be certain I exist. It's a trivial case of Pascal's Wager," said the bush. "Who is Pascal?" said Moses. "It makes sense if you are beyond time, as I am," said the bush. "Mysterious answers are not answers," said Moses.
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23485109
"Benign Boundary Violations" by Duncan Sabien
06/20/2022
"Benign Boundary Violations" by Duncan Sabien
Recently, my friend Eric asked me what sorts of things I wanted to have happen at my bachelor party. I said (among other things) that I'd really enjoy some benign boundary violations. Eric went ???? Subsequently: an essay. We use the word "boundary" to mean at least two things, when we're discussing people's personal boundaries. The first is their actual self-defined boundary—the line that they would draw, if they had perfect introspective access, which marks the transition point from "this is okay" to "this is no longer okay." Different people have different boundaries:
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23484980
"AGI Ruin: A List of Lethalities" by Eliezer Yudkowsky
06/20/2022
"AGI Ruin: A List of Lethalities" by Eliezer Yudkowsky
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Preamble: (If you're already familiar with all basics and don't want any preamble, skip ahead to Section B for technical difficulties of alignment proper.) I have several times failed to write up a well-organized list of reasons why AGI will kill you. People come in with different ideas about why AGI would be survivable, and want to hear different obviously key points addressed first. Some fraction of those people are loudly upset with me if the obviously most important points aren't addressed immediately, and I address different points first instead. Having failed to solve this problem in any good way, I now give up and solve it poorly with a poorly organized list of individual rants. I'm not particularly happy with this list; the alternative was publishing nothing, and publishing this seems marginally more dignified. Three points about the general subject matter of discussion here, numbered so as not to conflict with the list of lethalities:
/episode/index/show/4e47933d-6859-4727-bfe3-0dffbb39e814/id/23484905