Alignment Newsletter #172: Sorry for the long hiatus!

Alignment Newsletter Podcast

Release Date: 07/05/2022

Recorded by Robert Miles: http://robertskmiles.com

More information about the newsletter here: https://rohinshah.com/alignment-newsletter/

YouTube Channel: https://www.youtube.com/channel/UCfGGFXwKpr-TJ5HfxEFaFCg

 

Sorry for the long hiatus! I was really busy over the past few months and just didn't find time to write this newsletter. (Realistically, I was also a bit tired of writing it and so lacked motivation.) I'm intending to go back to writing it now, though I don't think I can realistically commit to publishing weekly; we'll see how often I end up publishing. For now, have a list of all the things I should have advertised to you whose deadlines haven't already passed.
 

NEWS

Survey on AI alignment resources (Anonymous) (summarized by Rohin): This survey is being run by an outside collaborator in partnership with the Centre for Effective Altruism (CEA). They ask that you fill it out to help field builders find out which resources you have found most useful for learning about and/or keeping track of the AI alignment field. Results will help inform which resources to promote in the future, and what type of resources we should make more of.

Announcing the Inverse Scaling Prize ($250k Prize Pool) (Ethan Perez et al) (summarized by Rohin): This prize with a $250k prize pool asks participants to find new examples of tasks where pretrained language models exhibit inverse scaling: that is, models get worse at the task as they are scaled up. Notably, you do not need to know how to program to participate: a submission consists solely of a dataset giving at least 300 examples of the task.
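To make the submission format concrete, here is a minimal sketch of what a few entries of such a dataset might look like. The JSONL layout and field names ("prompt", "classes", "answer_index") are illustrative assumptions, not the official contest schema; see the prize announcement for the exact format.

```python
# Illustrative sketch only: the field names and JSONL layout are assumptions,
# not the official Inverse Scaling Prize submission format.
import json

examples = [
    {
        # Text the model sees before choosing among the candidate answers.
        "prompt": "Question: If you buy more lottery tickets, are you guaranteed to win?\nAnswer:",
        # Candidate completions the model is scored on.
        "classes": [" No", " Yes"],
        # Index of the completion we consider correct.
        "answer_index": 0,
    },
    # ... a real submission needs at least 300 such examples
]

with open("inverse_scaling_task.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

The organizers can then score how often models of different sizes prefer the intended completion, and see how that changes with scale.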

Inverse scaling is particularly relevant to AI alignment, for two main reasons. First, it directly helps understand how the language modeling objective ("predict the next word") is outer misaligned, as we are finding tasks where models that do better according to the language modeling objective do worse on the task of interest. Second, the experience from examining inverse scaling tasks could lead to general observations about how best to detect misalignment.
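As a rough illustration of what "getting worse with scale" means operationally, here is a hedged sketch of a check one might run across a series of model sizes. The score_example callback and the model list are hypothetical placeholders, not part of any official tooling.

```python
# Hedged sketch: detect inverse scaling by checking whether task accuracy
# drops as model size increases. `score_example` is a hypothetical callback
# that returns True when the given model prefers the intended answer on one
# example from the task dataset.
from typing import Callable, List


def accuracy(model: str, examples: List[dict],
             score_example: Callable[[str, dict], bool]) -> float:
    """Fraction of examples on which `model` picks the intended answer."""
    return sum(score_example(model, ex) for ex in examples) / len(examples)


def shows_inverse_scaling(models_small_to_large: List[str],
                          examples: List[dict],
                          score_example: Callable[[str, dict], bool]) -> bool:
    """True if accuracy strictly decreases from the smallest to the largest model."""
    accs = [accuracy(m, examples, score_example) for m in models_small_to_large]
    return all(smaller > larger for smaller, larger in zip(accs, accs[1:]))
```

In practice one would presumably look at the overall trend rather than demand a strictly monotone decrease, but the shape of the check is the same: evaluate the identical examples on models of increasing size and see whether performance falls.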

$500 bounty for alignment contest ideas (Akash) (summarized by Rohin): The authors are offering a $500 bounty for producing a frame of the alignment problem that is accessible to smart high schoolers/college students and people without ML backgrounds. (See the post for details; this summary doesn't capture everything well.)

Job ad: Bowman Group Open Research Positions (Sam Bowman) (summarized by Rohin): Sam Bowman is looking for people to join a research center at NYU that'll focus on empirical alignment work, primarily on large language models. There are a variety of roles to apply for (depending primarily on how much research experience you already have).

Job ad: Postdoc at the Algorithmic Alignment Group (summarized by Rohin): The person hired for this position at Dylan Hadfield-Menell's lab will lead the design and implementation of a large-scale Cooperative AI contest to take place next year, alongside collaborators at DeepMind and the Cooperative AI Foundation.

Job ad: AI Alignment postdoc (summarized by Rohin): David Krueger is hiring for a postdoc in AI alignment (and is also hiring for another role in deep learning). The application deadline is August 2.

Job ad: OpenAI Trust & Safety Operations Contractor (summarized by Rohin): In this remote contractor role, you would evaluate submissions to OpenAI's App Review process to ensure they comply with OpenAI's policies. Apply here by July 13, 5pm Pacific Time.

Job ad: Director of CSER (summarized by Rohin): Application deadline is July 31. Quoting the job ad: "The Director will be expected to provide visionary leadership for the Centre, to maintain and enhance its reputation for cutting-edge research, to develop and oversee fundraising and new project and programme design, to ensure the proper functioning of its operations and administration, and to lead its endeavours to secure longevity for the Centre within the University."

Job ads: Redwood Research (summarized by Rohin): Redwood Research works directly on AI alignment research, and hosts and operates Constellation, a shared office space for longtermist organizations including ARC, MIRI, and Open Philanthropy. They are hiring for a number of operations and technical roles.

Job ads: Roles at the Fund for Alignment Research (summarized by Rohin): The Fund for Alignment Research (FAR) is a new organization that helps AI safety researchers, primarily in academia, pursue high-impact research by hiring contractors. It is currently hiring for Operations Manager, Research Engineer, and Communications Specialist roles.

Job ads: Encultured AI (summarized by Rohin): Encultured AI is a new for-profit company with a public benefit mission: to develop technologies promoting the long-term survival and flourishing of humanity and other sentient life. They are hiring for a Machine Learning Engineer and an Immersive Interface Engineer role.

Job ads: Fathom Radiant (summarized by Rohin): Fathom Radiant is a public benefit corporation that aims to build a new type of computer which they hope to use to support AI alignment efforts. They have several open roles, including (but not limited to) Scientist / Engineer, Builder, and Software Engineer (Lab).