News
Introducing and Deprecating WoFBench — LessWrong
2+ hour, 25+ min ago (217+ words) Benchmarks are important tools for tracking the rapid advancements in model capabilities, but they are struggling to keep up with LLM progress: frontier models now consistently achieve high scores on many popular benchmarks, raising questions about their continued ability to…...
I'm Bearish On Personas For ASI Safety — LessWrong
4+ hour, 22+ min ago (1637+ words) Your base LLM has no examples of superintelligent AI in its training data. When you RL it into superintelligence, it will have to extrapolate to how a superintelligent Claude would behave. The LLM's extrapolation may not converge optimizing for what…...
Tools to generate realistic prompts help surprisingly little with Petri audit realism — LessWrong
12+ hour, 27+ min ago (554+ words) Research done as part of the Anthropic Fellows Program. In this post we: We want to produce generator models that take as input a text description of a user prompt, and output a prompt that 1) matches this description and 2) is…...
My personal apology to Eliezer Yudkowsky for not working on ASI risk in 2022 itself — LessWrong
12+ hour, 35+ min ago (475+ words) LessWrong disclaimer: This is a link post to a living document. What's below may be an older version. Click link for latest version. 2026-03-01 My personal apology to Eliezer Yudkowsky for not working on ASI risk in 2022 itself Disclaimer Quick Note…...
Petapixel cameras won't exist soon — LessWrong
13+ hour, 5+ min ago (1379+ words) LessWrong disclaimer: This is a link post to a living document. What's below may be an older version. Click link for latest version. Also, I increasingly find it a waste of time to discuss such ideas on lesswrong, so don't…...
"Fibbers’ forecasts are worthless" (The D-Squared Digest One Minute MBA – Avoiding Projects Pursued By Morons 101) — LessWrong
22+ hour, 38+ min ago (398+ words) One of the very admirable things about the LessWrong community is their willingness to take arguments very seriously, regardless of who put that argument forward. In many circumstances, this is an excellent discipline! But if you're acting as a manager…...
Mindscapes and Mind Palaces — LessWrong
1+ day, 16+ hour ago (652+ words) To summarize the information I've found so far: I would like to learn more about Mindscapes, so if anyone has some good resources, or even better if you could share your own experience, I would appreciate it a lot. In…...
The Topology of LLM Behavior — LessWrong
1+ day, 20+ hour ago (777+ words) When you're having a conversation with an LLM, there's a state: everything that's been said so far. I think of this as a point in some kind of semantic space. What shapes those probabilities? The way I visualize it: the…...
Coherent Care — LessWrong
1+ day, 22+ hour ago (1735+ words) I've been trying to gather my thoughts for my next tiling theorem (agenda write-up here; first paper; second paper; recent project update). I have a lot of ideas for how to improve upon my work so far, and trying to…...