We're a part-time, virtual research program that gives students and early-career professionals an opportunity to work with professional AI safety researchers. sparai.org · Joined March 2024
Applications for the Generator Residency close on Monday EOD! Last chance to apply.
Fully funded, 6k stipend + travel + housing, 3 months with an extension, in-person in Berkeley. Probably the best path into AI safety for non-researcher roles.
📣 Only 3 days left to apply for Generator!
Apply by April 27 to join our inaugural cohort with advisers from AI Futures Project, BlueDot, Coefficient Giving, FAR.AI, Forethought, METR, RAND, and more!
generatorresidency.org
Excited to share our new paper! We looked at when reasoning LLMs 'knew' their final answer internally vs. when it was stated in chain-of-thought. Turns out these models can be performative depending on the task!
LLMs often reason “performatively” well after deciding on a final answer, something that CoT monitors are slow to catch.
Our new paper finds that:
- probes can help monitor for this
- it seems to track with task difficulty
- probes enable early CoT exit, saving tokens! (1/7)
In this work, we complement behavioral goal-directedness evals of LLM agents with a probing analysis of environment and plan representations, examining whether observed actions are consistent with models' internal beliefs, and how reasoning affects representations. Check it out!
When we say an AI agent is “goal-directed”, what do we actually mean?
In new work from Project Telos, we study this question by combining behavioural evaluation with analysis of internal representations in a language model agent navigating grid worlds.
1/
LawZero is accepting applications as part of the SPAR Spring 2026 program!
If you're interested in studying model awareness or emergent misalignment, you can learn more and apply here: sparai.org/projects/sp26/.
Applications are open until Jan 14, 2026.
🚀 We're excited to announce that mentee applications are now open for the Spring round of the SPAR research program!
This will be our largest round ever, featuring 130+ projects across AI safety, policy, governance, security, welfare, and strategy.
Come work with me and @SPARexec to build an AI mech interp researcher to accelerate AI safety research.🧠🔬
In the last cohort, my mentees built AI agents that automatically find and refine explanations for SAE features (demo of what they built after only one month below). In this cohort, we want to push for agents that discover and explain full circuits.
Deadline is Jan 14th!⏳🗓️
📣 Only 2 days left to apply for this round of SPAR!
Apply by January 14 to join our largest round yet — 130+ projects with mentors from Google DeepMind, RAND, AI Security Institute, Apollo Research, SecureBio, Machine Intelligence Research Institute, and more!
Work on a part-time AI safety, AI policy, AI security, or biosecurity project. Open to students & professionals; not all projects require prior research experience.
I'm mentoring a SPAR project on evaluating and refining alignment targets for LLMs (constitutions, model specs, etc.) this spring! Apply by January 14 to work with me or other SPAR mentors. Project details/application link ⬇️:
Does training language models on AI safety literature make them more likely to scheme?
This is one of the research questions being explored in the upcoming round of @SPARexec. A few projects I'm excited about: 🧵
The NYU Center for Mind, Ethics, and Policy is seeking research fellows to contribute to upcoming reports on legal personhood and economic rights for digital minds. Please apply if you have interest in working with us!
I'm glad to mentor again for this round of SPAR, likely with @zhonghaohe! Together let's help human-AI coevolution go a little bit better :)
⬇️🧵Here's a collection of research ideas I'd be excited to mentor projects on. Feel free to pitch yours too!
36 Followers · 697 Following · Considering always being wrong so I can be right once!
• LawAI Research Fellow, Legal Frontiers
• Org NUS AI Safety + SMU CS, Law & Politics
374 Followers · 968 Following · Astra Fellow at Constellation right now | PhD student at Warsaw University of Technology and NASK working on AI safety and generative models
32K Followers · 6K Following · Head of Applied AI & Verifiable Intelligence @StarkWareLtd.
STARK proofs, AI Safety, World Models & Physical AI / Robotics.
Math scale, Goodwill doesn't.
19K Followers · 10K Following · On the quest to understand the fundamental mathematics of intelligence and of the universe with curiosity. https://t.co/mMchI2d4pg Upskilling @StanfordOnline
264 Followers · 2K Following · "Never try to discourage thinking, for you are sure to succeed"
~ B. Russell
Data science student | Economics, analytic philosophy, math, logic & AI 💖
23 🎹
899 Followers · 4K Following · Emerging AI partners and paradigms @NVIDIA; Ex: Neuromorphic AI @Intel, TPUs @Google; Strong opinions, loosely held are mine. Aiming for a stable singularity.
3K Followers · 1K Following · Whatever is achieved is not final; whatever we call fulfillment is a description from inside one form of life, not an endpoint for all forms of life.
1K Followers · 1K Following · Associate Prof @ucl - Member of @ELLISforEurope | Language and AI Science | Prev. senior research scientist @AISafetyInst, postdoc @ETH_en, PhD @illc_amsterdam
2K Followers · 2K Following · Open-source interpretability to seize the means of prediction. Postdoc w/ @davidbau @ndif_team @Northeastern. Prev: @GroNLP, @amazonscience
4K Followers · 26 Following · We advance the science of forecasting to improve decision-making on high stakes issues. Co-founded by chief scientist Philip Tetlock.
5K Followers · 3K Following · Associate Prof @MITEECS working on value (mis)alignment in AI systems; Safety & Alignment Advisor at https://t.co/vt2gVrVr9f; @[email protected]; he/him
2K Followers · 297 Following · Co-Founder & Senior Policy Advisor @ Secure AI Project. MA student at @GeorgetownCSS. Prev: @CSETgeorgetown. Creator of the beet emoji
5K Followers · 959 Following · Director of CeSIA, OECD AI Expert
Co-lead, Global Call for AI Red Lines, featured in 300 media mentions, including Le Monde, NYT, NBC
Co-author, AI Safety Atlas
3K Followers · 534 Following · Independent think tank focusing on transforming resilience to extreme AI and biological risks - both in the UK and internationally.
5K Followers · 99 Following · Cross-posting only; contact me at [email protected] or https://t.co/WQQtBrKQps
Vegan, 10% of my income pledged to effective charities (GWWC)
11K Followers · 819 Following · Thinking about AI destroying the world at https://t.co/pMilDvdCnI and everything at https://t.co/bankaOA2Gu. DM or email for media requests.
37K Followers · 836 Following · Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord.
Music, movies, microcode, and high-speed pizza delivery
16K Followers · 491 Following · Helping the world prepare for extremely powerful AI. Risk assessment @METR_evals. Writing at Planned Obsolescence (about AI), Good Bones (about whatever).