News

lesswrong.com

Introducing and Deprecating WoFBench — LessWrong

2+ hours, 25+ min ago  (217+ words) Benchmarks are important tools for tracking the rapid advancements in model capabilities, but they are struggling to keep up with LLM progress: frontier models now consistently achieve high scores on many popular benchmarks, raising questions about their continued ability to…

I'm Bearish On Personas For ASI Safety — LessWrong

4+ hours, 22+ min ago  (1637+ words) Your base LLM has no examples of superintelligent AI in its training data. When you RL it into superintelligence, it will have to extrapolate to how a superintelligent Claude would behave. The LLM's extrapolation may not converge optimizing for what…

Tools to generate realistic prompts help surprisingly little with Petri audit realism — LessWrong

12+ hours, 27+ min ago  (554+ words) Research done as part of the Anthropic Fellows Program. In this post we: We want to produce generator models that take as input a text description of a user prompt, and output a prompt that 1) matches this description and 2) is…

My personal apology to Eliezer Yudkowsky for not working on ASI risk in 2022 itself — LessWrong

12+ hours, 35+ min ago  (475+ words) LessWrong disclaimer This is a link post to a living document. What's below may be an older version. Click link for latest version. 2026-03-01 My personal apology to Eliezer Yudkowsky for not working on ASI risk in 2022 itself Disclaimer Quick Note…

Petapixel cameras won't exist soon — LessWrong

13+ hours, 5+ min ago  (1379+ words) LessWrong disclaimer This is a link post to a living document. What's below may be an older version. Click link for latest version. Also, I increasingly find it a waste of time to discuss such ideas on lesswrong, so don't…

"Fibbers’ forecasts are worthless" (The D-Squared Digest One Minute MBA – Avoiding Projects Pursued By Morons 101) — LessWrong

22+ hours, 38+ min ago  (398+ words) One of the very admirable things about the LessWrong community is their willingness to take arguments very seriously, regardless of who put that argument forward. In many circumstances, this is an excellent discipline! But if you're acting as a manager…

Mindscapes and Mind Palaces — LessWrong

1+ day, 16+ hours ago  (652+ words) To summarize the information I've found so far: I would like to learn more about Mindscapes, so if anyone has some good resources, or even better if you could share your own experience, I would appreciate it a lot. In…

The Topology of LLM Behavior — LessWrong

1+ day, 20+ hours ago  (777+ words) When you're having a conversation with an LLM, there's a state: everything that's been said so far. I think of this as a point in some kind of semantic space. What shapes those probabilities? The way I visualize it: the…

Coherent Care — LessWrong

1+ day, 22+ hours ago  (1735+ words) I've been trying to gather my thoughts for my next tiling theorem (agenda write-up here; first paper; second paper; recent project update). I have a lot of ideas for how to improve upon my work so far, and trying to…