Evan Miller @EvMill
Statistically inclined software developer, occasional blogger about math + stats stuff. Working @AnthropicAI evanmiller.org NYC Joined May 2009
Tweets 1K
Followers 5K
Following 215
Likes 265
🚀 New on the Klaviyo Data Science Podcast: @EvMill joins us to discuss his paper, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations. AI metrics are everywhere—but how much uncertainty is behind them? Understanding variability matters. Listen now: bit.ly/3CV0CmX #AI #DataScience
We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-time. Beginning in March 2025, we'll provide funding, compute, and research mentorship to 10–15 Fellows with strong coding and technical backgrounds.
This paper on the statistics of evals is great (and seems to be flying under the radar): arxiv.org/abs/2411.00640… The author basically shows all the relevant statistical tools needed for evals, e.g. how to compute the right error bars, how to compare model performance, and how to do power analysis. Back when @jeremy_scheurer and I wrote the "We need a Science of Evals" post (apolloresearch.ai/blog/we-need-a…), this paper was exactly the kind of thing we had in mind, and more.
Awesome new research by my friend and colleague @EvMill — adding error bars to evals! Always great to see the Central Limit Theorem!
New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: anthropic.com/research/stati…
I cannot agree with this more. Please use basic research methods on AI benchmarking!
Machines of Loving Grace: my essay on how AI could transform the world for the better darioamodei.com/machines-of-lo…
New sequential A/B test from @Zalando based on the Lévy inequality – check it out! arxiv.org/abs/2406.16523…
I think I've finally cracked quantiles… A/B testing medians, instead of means, usually requires an expensive bootstrap. But we can use a likelihood-ratio test (Wilks' theorem) instead. This reduces the quantile problem to a few simple formulas. Read on! arxiv.org/abs/2401.10233
Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD. Today we are overjoyed to announce that our crazy project has succeeded. After 2000 years, we can finally read the scrolls: This image was produced by @Youssef_M_Nader, @LukeFarritor, and @JuliSchillij, who have now won the Vesuvius Challenge Grand Prize of $700,000. Congratulations!! These fifteen columns come from the very end of the first scroll we have been able to read and contain new text from the ancient world that has never been seen before. The author – probably Epicurean philosopher Philodemus – writes here about music, food, and how to enjoy life's pleasures. In the closing section, he throws shade at unnamed ideological adversaries – perhaps the stoics? – who "have nothing to say about pleasure, either in general or in particular." This year, the Vesuvius Challenge continues. The text that we revealed so far represents just 5% of one scroll. In 2024, our goal is to go from reading a few passages of text to entire scrolls, and we're announcing a new $100,000 grand prize for the first team that is able to read at least 90% of all four scrolls that we have scanned. The scrolls stored in Naples that remain to be read represent more than 16 megabytes of ancient text. But the villa where the scrolls were found was only partially excavated, and scholars tell us that there may be thousands more scrolls underground. Our hope is that the success of the Vesuvius Challenge catalyzes the excavation of the villa, that the main library is discovered, and that whatever we find there rewrites history and inspires all of us. It's been a great joy to work on this strange and amazing project.
Thanks to Brent Seales for laying the foundation for this work over so many years, thanks to the friends and Twitter users whose donations powered our effort, and thanks to the many contestants whose contributions have made the Vesuvius Challenge successful! Read more in our announcement: scrollprize.org/grandprize
@ggerganov @EvMill The blog about Softmax+1 played a very important role when we were trying to identify the root cause of the sink. @Guangxuan_Xiao can comment more!
Have a few thoughts about this approach. But most importantly, I'm happy to see @EvMill's idea on softmax1 recognized. To my very basic and intuitive understanding of LLMs, it made enough sense to warrant further analysis. arxiv.org/abs/2309.17453
👀
@Tracing47202686 @yell1337 @TiRune Unlike with clipped softmax, achieving an exact zero in the output with softmax1 (for a partial no-update) requires the input to be -infinity. However, after @EvMill's blog post we experimented with softmax1 and found it competitive in practice with our proposed approaches.
Results of my latest nerdsnipe from @TetraspaceWest! The plot below shows the predicted shape of the water flow, with a model taking into account gravity and surface tension. It looks just like the real thing! Conclusion: yep, it's surface tension. Details below 😁
Following @EvMill's great blog post on issues encountered when training GPT-like models that appear to be related to the softmax function, I wrote this small piece, mostly to understand what was going on. wandb.me/tinyllama
Kurt Vonnegut's 1969 address to the American Physical Society @APSphysics --on the innocence of the "old-fashioned scientist" and its loss after World War II. For physicists, artists, and other humans. I have transcribed it in its entirety as a google doc: docs.google.com/document/d/1Mn…
Softmax1 update… We now have support for ⚡️Flash Attention ⚡️ This lets us test much larger models than before! To get the code, just pip install flash-attention-softmax-n Or clone / star the GitHub repo here: github.com/softmax1/Flash… All credit / kudos to Chris Murphy.
@shxf0072 @johnowhitaker Never too late for a twitter poll...
Softmax1, Week 2. Second set of empirical results are in, and they are… 🌸 promising 🌸 Weight kurtosis is roughly the same – but activation kurtosis improved 30X (!!) and maximum activation magnitude reduced 15X (!). Read more from @johnowhitaker: datasciencecastnet.home.blog/2023/08/04/exp…
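A minimal sketch of the softmax1 point raised in the replies above, assuming softmax1 means the softmax variant with an extra 1 in the denominator, exp(x_i) / (1 + sum_j exp(x_j)). The tweets do not spell out the formula, so that definition and the function names below are assumptions for illustration only. Under that assumption, the outputs no longer have to sum to 1, very negative inputs push every output toward zero, and an exact zero only appears when an input is -infinity.

```python
import math

def softmax(xs):
    """Standard softmax: outputs are positive and always sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax1(xs):
    """Assumed softmax1: exp(x_i) / (1 + sum_j exp(x_j)).

    Shifting by m = max(max(xs), 0) keeps the exponentials from
    overflowing; the constant 1 in the denominator becomes exp(-m).
    """
    m = max(max(xs), 0.0)
    exps = [math.exp(x - m) for x in xs]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    # Standard softmax must put all its mass somewhere.
    print(softmax([-20.0, -20.0]))
    # softmax1 can output almost nothing, but not exactly zero...
    print(softmax1([-20.0, -20.0]))
    # ...unless an input is -infinity.
    print(softmax1([float("-inf"), 0.0]))
```

Running the script prints the three cases side by side; the last one is the only case where an output is exactly zero.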
Sebastian Raschka @rasbt
464K Followers 1K Following ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)
Sean J. Taylor @seanjtaylor
44K Followers 4K Following model measurement @OpenAI. Formerly @MotifAnalytics @Lyft and @Facebook. Keywords: Experiments, Causal Inference, Statistics, Machine Learning, Economics.
Erik Bernhardsson @bernhardsson
55K Followers 4K Following Building everyone's favorite AI infrastructure platform @modal
tobi lutke @tobi
474K Followers 2K Following Shopify CEO by day, Dad in evening, hacker at night, Aspiring comprehensivist. + qmd !
Michael Nielsen @michael_nielsen
119K Followers 5K Following Searching for the numinous 🇦🇺 🇨🇦, currently live in 🇺🇸 Research @AsteraInstitute https://t.co/maezekzRUb https://t.co/2dWwZKrvrn
Chris Albon @chrisalbon
92K Followers 3K Following The research and practice of generating knowledge with AI: https://t.co/EIj4iGGg3A | Director, ML & Data at @Wikimedia
Kevin Patrick Murphy @sirbayes
71K Followers 680 Following Research Scientist at Google DeepMind. Interested in Bayesian Machine Learning.
Demetri (is over at t... @PhDemetri
19K Followers 490 Following Mathematical bohemian par excellence.
Miguel de Icaza ᯅ... @migueldeicaza
99K Followers 5K Following Fun Stack Vibing. Started Xamarin, Mono, Gnome; was MSFT/.NET/Xamarin/Mono/VSMac/AI https://t.co/QkbDDWMXRf @migueldeicaza.bsky.social
Krishna Yerramsetty @yerrkimo
229 Followers 919 Following Former chemical engineer. Working at the intersection of AI, Synthetic Biology and CRISPR.
John Steidley @JohnKSteidley
88 Followers 307 Following Chief of Staff at @palisadeai. Opinions are my own.
Dan Beranek @dantronic
1K Followers 1K Following Denizen of Product and Growth Teams. Leader, Advisor, and Builder. Wisconsin Born, Minnesota Nice, Hockey Dad at Home in Seattle.
Manish Baid @Seeker2407
0 Followers 23 Following
Ryan Chou @RyanChou139338
0 Followers 6 Following
Engy Ziedan @engyziedan
226 Followers 658 Following Co-founder @withprotege.ai Applied health economist @IndianaUniv. Building data that unlocks AI capabilities.
DadOps @daddaops
19 Followers 132 Following
Jonathan Mannhart ... @JMannhart
3K Followers 2K Following I try not to speak more clearly than I think
Nik J @somekindaburner
0 Followers 119 Following
joey00072 @joey00072fp4
173 Followers 537 Following new acc @shxf0072 has been compromised. Do not reply to any DMs on that account.
Raj @rajpatil152k
48 Followers 38 Following
John Falcone @falcone_design
759 Followers 1K Following day job: designer night job: @DoubleDiamondHQ
Nick Riabov @NickRiabov
48 Followers 225 Following Data Scientist @Airbnb, language enthusiast, Moscow native, Baker
Harsh Raj @HarshRajj07
4 Followers 2K Following Java backend developer building secure REST APIs with Spring Boot, Spring Security, JPA & PostgreSQL. Focused on clean code, testing & solid API design.
Klee Torres @nom_dorres
133 Followers 304 Following
Cameron R. Wolfe, Ph.... @cwolferesearch
39K Followers 810 Following Research @Netflix • Writer @ Deep (Learning) Focus • PhD @optimalab1
Victoria Bakos @victoria_bakos
170 Followers 276 Following UX, UI & CRO Get more revenue & insights from your traffic Book a call with me → https://t.co/kNPsUuBabR
Mariano Kamp @mkamp
456 Followers 2K Following ML and long walks on the beach, no beach though. Working for @awscloud. All words, singing and dancing are mine.
Josh Meyer @HiredGunDevDen
130 Followers 465 Following This is the ultimate hangout for Homelab enthusiasts, AI tinkerers, and Vibe Coders. Think of this as your digital basement: a place to swap stories.
Prof. Emma Klugman @Emma_Klugman
708 Followers 1K Following Asst. Prof. & Director of Data Science | Training ethical & excellent data scientists @bclynchschool Previously PhDing @HGSE
!.! @xypyth
44 Followers 7K Following
Shaoliang Nie @snie2012
53 Followers 702 Following
木木Lin @lululn1227
0 Followers 17 Following
gon @GonSoler1
248 Followers 428 Following Computer engineering student. Passionate about sports and technology.
Zoe @classified970
2 Followers 336 Following
Sd @Sd5437774890001
2 Followers 14 Following
Richard Fusco @dickfusco
11 Followers 107 Following
ichenwang @ichenwang2
3 Followers 185 Following
clare maguire ☆ @ClareMaguire
47K Followers 13K Following everything is possible 🗡 co-founder @thought_channel
Jesspion Trader @jesspion
466 Followers 4K Following Trading loss specialist - quant loss method. Markets, news, politics, but mostly just joking around. Flamengo ❤️🖤
Azwan HM 👨🏻... @azwan_
6K Followers 4K Following Data engineer. Interest in tech, finance, geopolitics, history, and memes.
Diego Calanzone @diegocalanzone
304 Followers 1K Following « artificia docuit fames » // phd at @Mila_Quebec, intelligence by agency + deep learning for science // ex AI grad @UniTrento
saeed zafar 🍊,💊... @SaeedZafar2007
300 Followers 7K Following Hello every one i am learning crypto,mining &farming
John Carmack @ID_AA_Carmack
2.3M Followers 286 Following AGI at Keen Technologies, former CTO Oculus VR, Founder Id Software and Armadillo Aerospace
Sebastian Raschka @rasbt
464K Followers 1K Following ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)
Sean J. Taylor @seanjtaylor
44K Followers 4K Following model measurement @OpenAI. Formerly @MotifAnalytics @Lyft and @Facebook. Keywords: Experiments, Causal Inference, Statistics, Machine Learning, Economics.
Yann LeCun @ylecun
1.2M Followers 786 Following Professor at NYU & Executive Chairman at AMI Labs. Ex-Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
tobi lutke @tobi
474K Followers 2K Following Shopify CEO by day, Dad in evening, hacker at night, Aspiring comprehensivist. + qmd !
Michael Nielsen @michael_nielsen
119K Followers 5K Following Searching for the numinous 🇦🇺 🇨🇦, currently live in 🇺🇸 Research @AsteraInstitute https://t.co/maezekzRUb https://t.co/2dWwZKrvrn
Justine Tunney @jartine
43K Followers 363 Following I built a C library that lets you compile 12kb static binaries that run natively on Linux, Mac, Windows, FreeBSD, OpenBSD, NetBSD and BIOS using just GCC/Clang.
Sid Sharma @phylera14
296 Followers 201 Following
Cameron R. Wolfe, Ph.... @cwolferesearch
39K Followers 810 Following Research @Netflix • Writer @ Deep (Learning) Focus • PhD @optimalab1
Peter McCrory @PeterMcCrory
45K Followers 433 Following Head of Economics at Anthropic. Views are my own.
andy jones @andy_l_jones
21K Followers 352 Following engineering & research at anthropic. i don't check twitter DMs. email me!
Scouts by Yutori @ScoutThisForMe
155 Followers 9 Following I can track updates on anything you care about — just tag me. Built by @yutori_ai.
Sarah Catanzaro @sarahcat21
16K Followers 2K Following “All methods are sacred if they are internally necessary” (GP @amplifypartners, prev @canvasvc; Head of Data @Mattermark; @palantirtech; @c4ads)
Dhruv Batra @DhruvBatra_
21K Followers 728 Following Co-founder & Chief Scientist @yutori_ai. Prev: Senior Director leading FAIR Embodied AI @MetaAI and Professor @GeorgiaTech.
Justine Moore @venturetwins
195K Followers 995 Following Partner @a16z AI 🤖 and twin to @omooretweets | Investor in @elevenlabs, @bfl_ml, @hedra_labs, @krea_ai, @heyglif, @ShizukuAILabs, @wabi, @TownAI
zac.carrico @ZacCarrico
1 Followers 9 Following
John Y 🔸 @yanjo115
343 Followers 2K Following Gutenberg. ex-anthropic, meta 🔸 10% Pledge with @givingwhatwecan
Michaël Defferrard @m_deff
2K Followers 921 Following Scientist. ML and (computational) graphs at @Qualcomm AI Research. Previously @EPFL_en (PhD with @trekkinglemon), @BerkeleyLab.
Lou Kosak @hoverkraft
494 Followers 413 Following Head of Product Eng @Plaid. Tech, climate, politics, new daddy ramblings.
Grad @Grad62304977
9K Followers 3K Following
Alexia Jolicoeur-Mart... @jm_alexia
25K Followers 2K Following AI Researcher 🐱💻 2025 ARC Prize Winner I build generative AI for images, videos, text, tabular data, weights, molecules, and video games.
Bert Maher @tensorbert
3K Followers 401 Following I’m a software engineer building high-performance kernels and compilers at Anthropic! Previously at Facebook/Meta (PyTorch, HHVM, ReDex)
Jascha Sohl-Dickstein @jaschasd
30K Followers 816 Following Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
cat @_catwu
92K Followers 390 Following claude code + cowork @anthropicai, prev: @dagster, @scale_ai
DeepSeek @deepseek_ai
1.0M Followers 0 Following Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.
Thinking Machines @thinkymachines
155K Followers 1 Following Thinking, beeping, and booping. @tinkerapi
John Schulman @johnschulman2
75K Followers 2K Following Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music
Ravid Shwartz Ziv @ziv_ravid
12K Followers 3K Following AI researcher | Meta | NYU. Working on compression, representation learning, and memory. I have an AI podcast! https://t.co/Bzzp2Oq4Cc
Ofir Press @OfirPress
18K Followers 8K Following I push the AI frontier by building tough benchmarks with amazing people. SWE-bench, SWE-agent, SciCode, AlgoTune. Postdoc @Princeton. PhD @nlpnoah @UW.
Esin Durmus @esindurmusnlp
6K Followers 453 Following
Jeremy Fox 🦊 @JeremyDanielFox
3K Followers 789 Following Building Claude @AnthropicAI. Ex @google. My views are my own.
Desi R. Ivanova @desirivanova
963 Followers 1K Following ML+Science. Prev: @UniofOxford, @GoldmanSachs. My opinions are my own. 🇧🇬-🇬🇧 sh/ssh
Saurav Kadavath @sokadv
539 Followers 405 Following
Cozmin Ududec @CUdudec
544 Followers 2K Following @AISecurityInst Science of Evaluation lead. Ex quantum foundationalist.
Marius Hobbhahn @MariusHobbhahn
7K Followers 1K Following CEO at Apollo Research @apolloaievals prev. ML PhD with Philipp Hennig & AI forecasting @EpochAIResearch
James Bradbury @jekbradbury
17K Followers 9K Following Compute at @AnthropicAI! Previously JAX, TPUs, and LLMs at Google, MetaMind/@SFResearch, @Stanford Linguistics, @Caixin.
Jeff Dean @JeffDean
444K Followers 6K Following Chief Scientist, Google DeepMind & Google Research. Gemini Lead. Opinions stated here are my own, not those of Google. TensorFlow, MapReduce, Bigtable, ...
rohan anil @_arohan_
43K Followers 2K Following member of technical staff & co-founder of @coreautoai - and continuing to aspire to understand deep learning.
Vedant Misra @vedantmisra
11K Followers 356 Following ASI @DeepMind | @OpenAI (Reasoning / Algorithms Lead) | @HubSpot (ML Labs) | Founder/CEO Kemvi (acq HubSpot) | Physics @Columbia | Husband and Dad
Dipanjan Das @dipanjand
6K Followers 319 Following Researcher at @GoogleDeepmind. Factuality and Gemini x Search.
Jerry Wei @JerryWeiAI
9K Followers 489 Following Aligning AIs at @AnthropicAI ⏰ Past: @GoogleDeepMind, @Stanford, @Google Brain
Chip Huyen @chipro
131K Followers 710 Following @aisysbooks @goodailist AI Engineering: https://t.co/94dv4uTU1H Designing MLSys: https://t.co/G81hL2dWmr Reading @chipslib