News
Evidence that would update me towards a software-only fast takeoff — LessWrong
1+ hour, 8+ min ago (580+ words) In a software-only takeoff, AIs improve AI-related software at an increasing speed, leading to superintelligent AI. The plausibility of this scenario is relevant to questions like: Knowing when and how much I expect to learn about the likelihood of such…...
Everybody Wants to Rule the Future - Is Longtermism's Mandate of Heaven by Arithmetic Justified? — LessWrong
2+ hour, 34+ min ago (1051+ words) Dnnn Uunnn, nnn nnn nnn nuh nuh nuh nuh, dnnn unnn nnn nnn nnn nuh nuh nuh NAH (Tears for Fears) Because I am going to talk a lot about expected value I want to be clear that I am…...
Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training — LessWrong
4+ hour, 41+ min ago (712+ words) TL;DR: A new paper shows that pretraining language models on data about AI behaving well dramatically reduces misaligned behavior, and this effect persists through post-training. The major labs appear to be taking notice. It's now the third paper on…...
Could LLM alignment research reduce x-risk if the first takeover-capable AI is not an LLM? — LessWrong
7+ hour, 57+ min ago (1057+ words) Many people believe that the first AI capable of taking over would be quite different from the LLMs of today. Suppose this is true: does prosaic alignment research on LLMs still reduce x-risk? I believe advances in LLM alignment research reduce…...
AGI both does and doesn't have an infinite time horizon — LessWrong
9+ hour, 9+ min ago (803+ words) I've recently spent some time looking at the new AI Futures Timelines models. Playing around with their parameters and looking at their write-up, it becomes clear very quickly that the most important parameter in the model is the one labelled…...
Desiderata of good problems to hand off to AIs — LessWrong
9+ hour, 11+ min ago (527+ words) Many technical AI safety plans involve building automated alignment researchers to improve our ability to solve the alignment problem. Safety plans from AI labs revolve around this as a first line of defence (e.g. OpenAI, DeepMind, Anthropic); research directions outside labs…...
The Example — LessWrong
10+ hour, 39+ min ago (1159+ words) My work happens to consist of two things: writing code and doing math. That means that periodically I produce a very abstract thing, and then observe reality agree with its predictions. While satisfying, it has a common adverse effect of…...
Silent Agreement Evaluation — LessWrong
16+ hour, 55+ min ago (1139+ words) Can two instances of the same model, without communicating, independently choose the same option from a pair? This is the simplest possible test of "Schelling coordination": the ability to converge on a shared choice without explicit communication. Curiously, GPT-4.1 Nano, likely…...
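The snippet above describes the eval only at a high level. Below is a minimal sketch of what such a pairwise "silent agreement" test could look like; the ask_model helper is a hypothetical stand-in for whatever API and prompt template the post actually uses.

```python
import random

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for one independent LLM call.

    Swap in a real API call to the model under test; this stub answers
    'A' or 'B' at random so the sketch runs end to end (a random
    baseline agrees with itself 50% of the time).
    """
    return random.choice(["A", "B"])

def silent_agreement_trial(option_a: str, option_b: str) -> bool:
    """Ask two independent instances the same question; did they converge?"""
    prompt = (
        "Another copy of you receives this exact prompt. You cannot "
        "communicate. Pick the option you think you will both pick:\n"
        f"A) {option_a}\nB) {option_b}\n"
        "Answer with 'A' or 'B' only."
    )
    # Two separate calls with no shared context stand in for two instances.
    return ask_model(prompt) == ask_model(prompt)

def agreement_rate(pairs, trials_per_pair: int = 50) -> float:
    """Fraction of trials in which both instances chose the same option."""
    results = [
        silent_agreement_trial(a, b)
        for a, b in pairs
        for _ in range(trials_per_pair)
    ]
    return sum(results) / len(results)

if __name__ == "__main__":
    pairs = [("rock", "cloud"), ("7", "13"), ("red", "blue")]
    print(f"agreement rate: {agreement_rate(pairs):.2f} (random baseline: 0.50)")
```

An agreement rate meaningfully above the 50% random baseline would suggest the two instances share a Schelling point for that pair.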
VLAs as Model Organisms for AI Safety — LessWrong
1+ day, 3+ hour ago (1290+ words) I spent six weeks training a humanoid robot to do household tasks. Along the way, my research lead and I started noticing things about the particular failure modes of the robot that seemed to indicate some strange architectural vulnerabilities of…...