BerkeleyNLP @BerkeleyNLP
We work on natural language processing, machine learning, linguistics, and deep learning. PIs: Dan Klein, @alsuhr, @sewon__min nlp.cs.berkeley.edu Berkeley, California Joined September 2019
Tweets: 125
Followers: 7K
Following: 37
Likes: 131
Really amazing results analyzing what's creative/novel vs. what's copied from Internet data, enabled by the amazing @liujc1998's Infini-gram! infini-gram.io This is also enabled in @allen_ai's OlmoTrace allenai.org/blog/olmotrace where anyone can find matching n-grams between LLM-generated text and its training data.
This from @TuhinChakr is brilliant. That prize-winning story from Granta? Turns out it's just a bunch of random whole phrases taken directly from existing text on the internet. The tool lets you trace those n-grams directly to their source, which is mostly random fanfiction.
1/ Thrilled to introduce T³: a corpus for RAG over reasoning tasks, built from thinking traces. We show that, surprisingly, RAG can improve reasoning, given the right corpus. RAG with Transformed Thinking Traces (T³) yields gains of up to 43.9% on AIME 2025-2026. 🔗 arxiv.org/abs/2605.03344 🧵
8/ For reproducibility and to enable further study of modularity in MoEs, we’re releasing EMO, baselines, and code: Models: hf.co/collections/al… Blog: allenai.org/blog/emo Code: github.com/allenai/EMO Viz: emovisualization.netlify.app Shoutout to @AkshitaB93 @sewon__min for making this possible!
Full details + results: arxiv.org/abs/2605.06663 Also check out emovisualization.netlify.app: our model specializes in a qualitatively different way (at the capability level rather than the lexical level). This emerged naturally even though we didn't expose any domain prior!
As MoEs grow larger and sparser, they become memory-bottlenecked. What if experts were actually composable, so you only keep the subset relevant to your task? We show that this doesn't emerge in standard MoEs (their training makes this hard), but you can pre-train MoEs to support this kind of modularity! I hope everyone sees the right figure from @RyanYixiang's original post - I was so excited when I saw this result!!
MoEs are everywhere in frontier models, and they are deployed as monolithic systems. But many applications only need a narrow slice of capabilities, e.g., math, code, or biomedical. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!
Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵
I will give two talks at ICLR workshops!! 🇧🇷 Sunday 9:40-10:10: "LLMs for Distributed Data Use" @ Workshop on Data Problems in Foundation Models (Room 203 A/B) Monday 15:30–16:05: "Are Mixture-of-Experts Modular? Why It Matters and How to Fix It" @ ICBINB Workshop (Room 201 C) Both happen to be related to MoEs, but tackle two completely different questions → come say hi!
Imagine you fully post-trained "YourModel v1". Then, you've got better data — math, code, tool use, safety — and you want to improve it. Today, that usually means retraining the whole model. But what if new data could be added modularly, with a fixed cost each time?
Last year, we introduced FlexOlmo, a novel way to train parts of a model independently then combine them later. BAR builds on that idea for a harder problem: how to keep improving a model without having to retrain each time. 🧵
Exciting results on open-source models for IMO-level problems - congratulations to @aviral_kumar2 and everyone involved!! Great to see @wenjie_ma's ProofGrader (proofgrader.github.io) integrated into the development ✨
We trained a tiny 4B model to reason for millions of tokens through IMO-level problems. Heaps excited to share our new blog post covering the full pipeline, from distilling the 🐳 to augmenting RL with a reasoning cache that unlocks extreme inference-time scaling for theorem
Really excited about this work!! As a retrieval person, having a pre-training-scale retrieval index in an academic setting has long been a dream, and I thought it would be too difficult / infeasible. Collaborating with systems experts made it possible much earlier than I expected. Huge thanks to the students driving this: @YichuanM and @jinjianliuu !
(1/N) 🚀 DS-Serve is a framework for efficient, scalable neural retrieval — it turns any in-house dataset (<1T tokens) into a high-throughput (up to 10,000 QPS), low-latency (<100ms), memory-efficient (<200GB RAM) retrieval system with a web UI and API. With DS-Serve, we publicly deployed a 400B-token datastore of high-quality LLM pretraining data (2B vectors), spanning academic resources — and it matches commercial search endpoints on our benchmarks at extremely low latency and high throughput. Try it out: api.ds-serve.org:30888/ui Blog: berkeley-large-rag.github.io/RAG-DS-Serve Work from UC Berkeley ( @BerkeleyNLP & @BerkeleySky) with collaborators at UW & UIUC!
✨Introducing ECHO, the newest in-the-wild image generation benchmark! You’ve seen new image models and new use cases discussed on social media, but old benchmarks don’t test them! We distilled this qualitative discussion into a structured benchmark. 🔗 echo-bench.github.io
Super excited about @wenjie_ma's work on verifying math proofs! ✅ 24 competitions, 3 SoTAs (o3, Gemini-2.5-Pro, R1) ✅ Strong evaluator -- a carefully designed evaluator with a simple ensemble beats agentic ones ✅ Strong best-of-n performance Check out the paper & website!
LLMs solving math benchmarks with verifiable answers like AIME? ✅ LLMs solving math proofs? ❌ Still an open problem. RL works great for final-answer problems, but proofs are different: - Often no single checkable answer - Correct answers can hide flawed reasoning The key bottleneck: reliable proof evaluation. Without a good evaluator, we can't automatically evaluate or train better "provers." Our new work tackles this challenge step by step. 🧵 📄 Paper: arxiv.org/pdf/2510.13888
Happy to announce the first workshop on Pragmatic Reasoning in Language Models — PragLM @ COLM 2025! 🧠🎉 How do LLMs engage in pragmatic reasoning, and what core pragmatic capacities remain beyond their reach? 🌐 sites.google.com/berkeley.edu/p… 📅 Submit by June 23rd
Last day of PhD! I pioneered using LLMs to explain datasets and models. It's used by interp at @OpenAI and societal impact at @AnthropicAI. Tutorial here. It's a great direction & someone should carry the torch :) Thesis available, if you wanna read my acknowledgement section =P
The long-term goal of AI is to build models that can handle arbitrary tasks, not just ones they’ve been trained on. We hope our new *benchmark generator* can help measure progress toward this vision
🎮 Excited to announce gg-bench, a fully synthetic benchmark for LLMs consisting of games generated entirely by LLMs!! This benchmark centers around the fact that LLMs are capable of generating complex tasks that they themselves cannot even solve. 📄: arxiv.org/abs/2505.07215
I'm incredibly excited to share that I'll be joining @TTIC_Connect as an assistant professor in Fall 2026! Until then, I'm wrapping up my PhD at Berkeley, and after that I'll be a faculty fellow at @NYUDataScience
Finished my dissertation!!! (scalable oversight, link below) Very fortunate to have @JacobSteinhardt and Dan Klein as my advisors! Words can't describe my gratitude, so I used a pic of Frieren w/ her advisor :) Thanks for developing my research mission, and teaching me magic
Akari Asai @AkariAsai
22K Followers 935 Following Incoming Assistant Professor @SCSatCMU (Hiring Ph.D. students for Fall 2026) & research scientist @allen_ai OLMo. akariasai @ 🦋
Sam Bowman @sleepinyourhat
65K Followers 3K Following AI alignment + LLMs at Anthropic. On leave from NYU. Views not employers'. No relation to @s8mb. Into @givingwhatwecan.
Delip Rao e/σ @deliprao
69K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
Yoav Artzi @yoavartzi
19K Followers 192 Following Research/prof @cs_cornell + @cornell_tech🚡 / https://t.co/9YnWry7yHs / researcher @GoogleDeepMind / asso. faculty director @arxiv / building @COLM_conf
Danish Pruthi @danish037
13K Followers 756 Following Faculty at the Indian Institute of Science, Bangalore. PhD from @LTIatCMU.
Kayo Yin @kayo_yin
16K Followers 722 Following PhD student @berkeley_ai. AI persuasion, safety, sign language. Prev @carnegiemellon @polytechnique, intern @msftresearch @deepmind. 🇫🇷🇯🇵
Bill Yuchen Lin @billyuchenlin
27K Followers 3K Following RL for coding @xAI @SpaceX Affiliate Assistant Prof @UW. Ex: @allen_ai; Google, Meta FAIR.
Shruti Rijhwani @shrutirij
7K Followers 584 Following * Research Scientist @GoogleDeepMind * #NLProc research * PhD from CMU
Sewon Min @sewon__min
16K Followers 889 Following Assistant professor @Berkeley_EECS @berkeley_ai || Research scientist at @allen_ai || PhD from @uwcse @uwnlp
Ofir Press @OfirPress
18K Followers 8K Following I push the AI frontier by building tough benchmarks with amazing people. SWE-bench, SWE-agent, SciCode, AlgoTune. Postdoc @Princeton. PhD @nlpnoah @UW.
Naomi Saphra @nsaphra
11K Followers 1K Following Waiting on a robot body. All opinions are universal and held by both employers and family. Now a dedicated grok hate account. Accepting ML/NLP PhD students.
Christopher Potts @ChrisGPotts
16K Followers 724 Following Stanford Professor of Linguistics and, by courtesy, of Computer Science. Member of technical staff @stanfordnlp and @StanfordAILab. Co-founder @ Bigspin AI.
Sebastian Ruder @seb_ruder
99K Followers 1K Following Research Scientist @AIatMeta MSL • Ex @Cohere @GoogleDeepMind
Mike Lewis @ml_perception
8K Followers 252 Following Llama3 pre-training lead. Partially to blame for things like the Cicero Diplomacy bot, BART, RoBERTa, attention sinks, kNN-LM, top-k sampling & Deal Or No Deal.
Shaily @shaily99
8K Followers 2K Following PhD @LTIatCMU. Prev: @allen_ai @GoogleAI @MSFTResearch. #NLProc. Often ranting about research.
Xin Eric Wang @xwang_lk
21K Followers 1K Following Professor @ucsantabarbara. Head of Research @SimularAI. Director @ucsbcrml @UCSB_AI. #Multimodal #AgenticAI. AI for Humanity in the long run.
Weijia Shi @WeijiaShi2
10K Followers 2K Following PhD student @uwnlp | Prev @allen_ai @MetaAI @CS_UCLA
Nils Reimers @Nils_Reimers
15K Followers 539 Following VP AI Search @Cohere | ex-huggingface | Creator of SBERT (https://t.co/MKKOMfuQ4C)
Leo Boytsov @srchvrs
9K Followers 2K Following Machine learning scientist and engineer speaking πtorch & C++. Past @LTIatCMU, @awscloud. Opinions sampled from MY OWN 100T param LM.
Freya Zinnia Yvonne @FreyaZinniaYvon
0 Followers 28 Following
Sabuhi A. @SAbbaszad70856
20 Followers 6K Following
nakazawa kazushi(中... @nkzwkzs
739 Followers 3K Following PhD (Engineering). I work in speech recognition. I previously researched DNN-based speech quality assessment. Active in IEEE Sendai YP and the ASJ Young Researchers Forum.
VJ Anand @VjAnand75860
0 Followers 32 Following
Huiyi Chen @kitty22hy
2 Followers 57 Following Second-year graduate student at the Southeast University &Monash University Joint Graduate School.
JY Z @JunYuZzzzz
17 Followers 2K Following
Sagnik Samanta @Sagnik_445
0 Followers 55 Following
jh dou @JhDou56716
2 Followers 32 Following
avnikapoor @avkap007
4 Followers 77 Following
MoMing(蓝V互关... @LaoMo9394
1K Followers 2K Following AI Engineer in Silicon Valley | Geeker | US stock quant trading | Frontier AI events | Mutual follows welcome | FollowBack
haha haha @xuehaha16
0 Followers 33 Following
Chenyang @Ryenhails
0 Followers 28 Following
gela besiashvili @besiashvili
3 Followers 66 Following
Glory Bagai @IntentionalStar
18 Followers 711 Following
Jingxi Qiu @WilliamQ0628
1 Followers 55 Following Independent researcher on LLM reliability & evaluation. M.S. Georgetown DS. Working on selective RAG and benchmark integrity.
Anshuman @AnshumanAI
138 Followers 1K Following Autodidact | Generalist | Building https://t.co/t8g3em8oc7 & https://t.co/U5mRFdNXgB | Google Summer of Code 2025 @ Scala Center | "No more pleading, now there will be battle" arc |
גילי כהן @gilizzzz
2 Followers 71 Following
yh z @yuehan_zh
0 Followers 41 Following
Erman Eroğlu @geldeki
252 Followers 2K Following No neighborhood, no clan - the Other (the one whose yogurt is sour), Software Developer (a.k.a. "Yazılımcı") - AI enthusiast 🤖
Karthik Narasimhan @karthik_r_n
5K Followers 467 Following Professor @PrincetonCS, ex @OpenAI, @SierraPlatform, @MIT_CSAIL, @iitmadras Work: GPT, ReAct, Tree-of-Thoughts, SWE-Bench/Agent, TAU-bench, GEO
Kausthubh Manda @MandaKausthubh
19 Followers 782 Following Engineering @ ThoughtSpot | Amazon ML Summer School'25 | IMtech'27 CSE@IIIT Bangalore | Machine Learning | Mathematics
James @jimblonie
13 Followers 755 Following
Bruno Acevedo @baan16
61 Followers 1K Following
Minh Ta @_thisisminh
6 Followers 72 Following PhD. NLP @ @mbzuai 🇦🇪 | BSc. Data Science & AI @ HUST 🇻🇳 Trustworthy LLM, Hallucination detection
Lan Aier @lanaier285000
0 Followers 17 Following
Sezer UĞUZ @SezerUguz
95 Followers 228 Following 🌊 PhD Student @ UWF 🐚 💻 Computer Scientist 🧑🏻💻 🎬 Short Filmmaker 🎥 ✍️ Writer 📖 🎶 Creative 🎨 Digital 🎭 Artist 📸
Yiderigun Borjigin @yiderigun_
4 Followers 99 Following PhD Student at Helmholtz-Zentrum Hereon | https://t.co/jceex2kRU1. Robotics, Cognition, Intelligence at TUM | Former Thesis Student at BMW
Andre Ye @andreiskiii
298 Followers 622 Following PhD student @MITEECS. AI interactions and models for thought. @PDSoros @nsf grfp fellow. ugrad @UWPhilosophy @UWCSE
shermineh @sherminehGhs
52 Followers 788 Following CS PhD @TMU | Intern @Microsoft | LLMs Reasoning, RLVR| Agentic AI
Caixia Yuan @CaixiaYuan
7 Followers 89 Following
vegnus @vegnus1
2 Followers 58 Following Developer interested in squeezing local models for every last drop of blood
fcktired @bazukichips
1 Followers 39 Following
Chris Lee @Chris_Lee_UC
0 Followers 55 Following
Tarun Malempati @TarunMalempati
2 Followers 518 Following
Theotime Bakunzi @TBakunzi25704
0 Followers 5 Following
Percy Liang @percyliang
107K Followers 426 Following professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of https://t.co/7R5THVogW2, co-founder of @simile_ai, pianist
Jacob Andreas @jacobandreas
24K Followers 947 Following Teaching computers to read. Assoc. prof @MITEECS / @MIT_CSAIL / @NLP_MIT (he/him). https://t.co/5kCnXHjtlY https://t.co/2A3qF5vdJw
Stanford NLP Group @stanfordnlp
187K Followers 347 Following Natural Language Processing/Machine Learning @chrmanning @jurafsky @percyliang @ChrisGPotts @tatsu_hashimoto @MonicaSLam @Diyi_Yang @YejinChoinka @StanfordAILab
Ezgi Korkmaz @EzgiKorkmazAI
3K Followers 0 Following Machine Learning Researcher, PhD in Machine Learning. Reinforcement Learning. Been at @UCL | @GoogleDeepmind | @UCBerkeley
Prasann Singhal @prasann_singhal
325 Followers 780 Following 1st-year #NLProc PhD at UC Berkeley working with @sewon__min / @JacobSteinhardt , formerly advised by @gregd_nlp
Ryan Yixiang Wang @RyanYixiang
376 Followers 526 Following Phding at @berkeleynlp with @sewon__min | prev @nlp_usc
Wenjie Ma @wenjie_ma
342 Followers 266 Following PhD student @BerkeleySky @BerkeleyNLP @Berkeley_AI.
Alane Suhr @alsuhr
2K Followers 640 Following (assistant) professor of computer science at UC Berkeley. Mostly gone to better places :) fiat veritas, et pereat mundus 🌻
Eve Fleisig @enfleisig
747 Followers 458 Following PhD student @Berkeley_EECS | Princeton ‘21 | NLP, AI ethics + equity, and sociolinguistics enthusiast | bilingual 🇦🇷
Zineng Tang @ZinengTang
2K Followers 576 Following PhD in @Berkeley_ai and @BerkeleyNLP. Previously @UNCNLP and @MSFTResearch.
Jiayi Pan @jiayi_pirate
14K Followers 2K Following Research | Prev @xAI @Berkeley_AI | Views Are My Own
Alexander Wan @alexwan55
705 Followers 1K Following Studying CS at Berkeley; doing research on evals & public policy at Stanford CRFM; @BerkeleyML @BerkeleyNLP; https://t.co/YqhKUqpSBW
Sanjay Subramanian @sanjayssub
913 Followers 589 Following Building/analyzing NLP and vision models. PhD student @berkeley_ai. Formerly: @allen_ai, @penn
Arnav Gudibande @arnavg_
589 Followers 337 Following Research Engineer @perplexity_ai | prev MS @berkeley_ai @berkeleyNLP
Charlie Snell @sea_snell
8K Followers 6K Following PhD student @berkeley_ai; research @cursor_ai; prev @GoogleDeepMind. My friend told me to tweet more. I stare at my computer a lot and make things
Jane Wakefield @janewakefield
7K Followers 1K Following I write about tech and have done for two decades. I also make pods, speak at conferences, and offer media training and consultancy under my 🍌🦔 brand.
Kevin Yang @kevinyang41
495 Followers 203 Following Research scientist at @scaledcognition, previously PhD at @BerkeleyNLP, interested in better control and factuality for LLM outputs especially for long context.
Kevin Lin 林冠言 @nlpkevinl
497 Followers 283 Following research @Letta_AI @berkeleynlp @ucbrise @ai2_allennlp
Daniel Fried @dan_fried
4K Followers 908 Following Assistant prof. @LTIatCMU @SCSatCMU. Working on NLP: LLM agents, language-to-code, applied pragmatics, grounding.
Jonathan K. Kummerfel... @jkkummerfeld
2K Followers 393 Following NLP faculty - University of Sydney he/him (this account is for professional topics only)
David Hall @dlwh
4K Followers 1K Following Member of Technical Staff @ Open Athena. Creator of Levanter and Marin. Previously Research Engineering @StanfordCRFM, co-founder at Semantic Machines ⟶ MSFT.
Ruiqi Zhong @ZhongRuiqi
7K Followers 763 Following Member of Technical Staff at Thinking Machines. Human+AI collaboration. Scalable Oversight. Explainability. Prev @AnthropicAI PhD UC Berkeley'25; Columbia'19
Rodolfo (Rudy) Corona @_rodolfocorona_
301 Followers 498 Following PhD student at @berkeley_ai and @BerkeleyNLP| Interested in language, embodiment, abstraction, and compositionality | 🇲🇽
Berkeley AI Research @berkeley_ai
273K Followers 458 Following We're graduate students, postdocs, faculty and scientists at the cutting edge of artificial intelligence research.
Lucy Li @lucy3_li
6K Followers 2K Following Postdoc @uwnlp. Incoming assistant prof @WisconsinCS. Prev @UCBerkeley, @allen_ai, @MSFTResearch, @stanfordnlp. More silly at https://t.co/rtSSUhWQnL.
Taylor Berg-Kirkpatri... @BergKirkpatrick
735 Followers 318 Following Assoc Prof at UC San Diego @ucsd_cse, AI researcher
Mohit Bansal @mohitban47
12K Followers 746 Following Parker Distinguished Prof @UNC. PECASE/ACL/AAAI Fellow. Director https://t.co/5qlPVgnrlN (@unc_ai_group). Past @Berkeley_AI @TTIC_Connect @IITKanpur #NLP #CV
Greg Durrett @gregd_nlp
8K Followers 912 Following Associate professor at NYU (Courant CS + Center for Data Science) | advisor for @bespokelabsai | large language models and NLP | he/him
Jason Eisner @adveisner
8K Followers 570 Following Professor of CS at Johns Hopkins University, ACL Fellow. My tweets speak only for me.