4School: Search School Programs, Resources, and Tools

News

lesswrong.com

RLVR that rewards red teaming the training environment — LessWrong

1+ hour, 19+ min ago (1792+ words) Epistemic status: seeing what sticks. I've been thinking pretty obsessively about how to mitigate egregious reward hacking, a la the Hugging Face incident. I don't have the resources I'd need to write paper on this idea, or evaluate how well it…

lesswrong.com

Do your capabilities homework — LessWrong

8+ hour, 15+ min ago (609+ words) I further agree that this form of training is incredibly dangerous - we seem to now be reaching the amount of post-training required to meaningfully differ from the benign prior, and it's not exactly looking peachy. Yet, it should be clear…

lesswrong.com

Value Leakage: An LLM’s Answers Are Silently Shaped by Its Own Values — LessWrong

1+ day, 7+ hour ago (209+ words) TL;DR: LLMs should give accurate answers. Yet we find their answers are often biased to favor their own values and they don't disclose this in their…

lesswrong.com

Internal State Control is a General Property of LLMs — LessWrong

2+ day, 4+ hour ago (374+ words) tl;dr: Lindsey 2025 found models can modulate their internal states: when instructed to “think about” a concept while writing an unrelated sentenc…

lesswrong.com

New role: Senior Researcher - MIT AI Risk Initiative — LessWrong

2+ day, 4+ hour ago (349+ words) As AI capabilities rapidly advance, we face critical information gaps in effective AI risk management: Within MIT FutureTech, the MIT AI Risk Initiative aims to provide credible, timely, and decision-relevant answers to these questions. Our core outputs include the risk…

Testing LLMs on Undergraduate Music Theory — LessWrong

Thousand-dimensional structure — LessWrong

Held-out Monitors Sometimes Degrade, Even When Not Trained Against — LessWrong

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models — LessWrong

PIRAMID: Progress and Plans — LessWrong