News
Evidence that would update me towards a software-only fast takeoff — LessWrong
1+ hour, 8+ min ago (580+ words) In a software-only takeoff, AIs improve AI-related software at an increasing speed, leading to superintelligent AI. The plausibility of this scenario is relevant to questions like: Knowing when and how much I expect to learn about the likelihood of such…...
Everybody Wants to Rule the Future - Is Longtermism's Mandate of Heaven by Arithmetic Justified? — LessWrong
2+ hour, 34+ min ago (1051+ words) Dnnn Uunnn, nnn nnn nnn nuh nuh nuh nuh, dnnn unnn nnn nnn nnn nuh nuh nuh NAH (Tears for Fears) Because I am going to talk a lot about expected value I want to be clear that I am…...
Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training — LessWrong
4+ hour, 41+ min ago (712+ words) TL;DR: A new paper shows that pretraining language models on data about AI behaving well dramatically reduces misaligned behavior, and this effect persists through post-training. The major labs appear to be taking notice. It's now the third paper on…...
Could LLM alignment research reduce x-risk if the first takeover-capable AI is not an LLM? — LessWrong
7+ hour, 57+ min ago (1057+ words) Many people believe that the first AI capable of taking over would be quite different from the LLMs of today. Suppose this is true: does prosaic alignment research on LLMs still reduce x-risk? I believe advances in LLM alignment research reduce…...
AGI both does and doesn't have an infinite time horizon — LessWrong
9+ hour, 9+ min ago (803+ words) I've recently spent some time looking at the new AI Futures Timelines models. Playing around with their parameters and looking at their write-up, it becomes clear very quickly that the most important parameter in the model is the one labelled…...
Desiderata of good problems to hand off to AIs — LessWrong
9+ hour, 11+ min ago (527+ words) Many technical AI safety plans involve building automated alignment researchers to improve our ability to solve the alignment problem. Safety plans from AI labs revolve around this as a first line of defence (e.g. OpenAI, DeepMind, Anthropic); research directions outside labs…...
The Example — LessWrong
10+ hour, 39+ min ago (1159+ words) My work happens to consist of two things: writing code and doing math. That means that periodically I produce a very abstract thing, and then observe reality agree with its predictions. While satisfying, it has a common adverse effect of…...
Silent Agreement Evaluation — LessWrong
16+ hour, 55+ min ago (1139+ words) Can two instances of the same model, without communicating, independently choose the same option from a pair? This is the simplest possible test of "Schelling coordination": the ability to converge on a shared choice without explicit communication. Curiously, GPT-4.1 Nano, likely…...
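The snippet above describes the eval only at a high level. Below is a minimal sketch of what such a pairwise "silent agreement" test could look like; the ask_model helper is a hypothetical stand-in for whatever API and prompt template the post actually uses.

```python
import random

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for one independent LLM call.

    Swap in a real API call to the model under test; this stub answers
    'A' or 'B' at random so the sketch runs end to end (a random
    baseline agrees with itself 50% of the time).
    """
    return random.choice(["A", "B"])

def silent_agreement_trial(option_a: str, option_b: str) -> bool:
    """Ask two independent instances the same question; did they converge?"""
    prompt = (
        "Another copy of you receives this exact prompt. You cannot "
        "communicate. Pick the option you think you will both pick:\n"
        f"A) {option_a}\nB) {option_b}\n"
        "Answer with 'A' or 'B' only."
    )
    # Two separate calls with no shared context stand in for two instances.
    return ask_model(prompt) == ask_model(prompt)

def agreement_rate(pairs, trials_per_pair: int = 50) -> float:
    """Fraction of trials in which both instances chose the same option."""
    results = [
        silent_agreement_trial(a, b)
        for a, b in pairs
        for _ in range(trials_per_pair)
    ]
    return sum(results) / len(results)

if __name__ == "__main__":
    pairs = [("rock", "cloud"), ("7", "13"), ("red", "blue")]
    print(f"agreement rate: {agreement_rate(pairs):.2f} (random baseline: 0.50)")
```

An agreement rate meaningfully above the 50% random baseline would suggest the two instances share a Schelling point for that pair.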
VLAs as Model Organisms for AI Safety — LessWrong
1+ day, 3+ hour ago (1290+ words) I spent six weeks training a humanoid robot to do household tasks. Along the way, my research lead and I started noticing things about the particular failure modes of the robot that seemed to indicate some strange architectural vulnerabilities of…...