Rishi Mehta @rishicomplex

Solve i̶n̶t̶e̶l̶l̶i̶g̶e̶n̶c̶e̶ ̶ coding, use it to solve everything else | Research @AnthropicAI | Past: RL @GoogleDeepmind: AlphaProof co-lead, Gemini.

rishimehta.xyz | San Francisco, CA | Joined July 2009

Tweets: 298 | Followers: 4K | Following: 346 | Likes: 7K

ClaudeDevs @ClaudeDevs

2 days ago

We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible. Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days). We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right. Making the safeguards visible makes them easier to work around, so keeping them robust to jailbreaks will unfortunately mean more false positives while we improve the classifiers. We're also tuning our bio and cyber classifiers to trigger less often on harmless requests. We know this is frustrating and we’ll do our best to keep this period as short as possible. If you think a request has been mistakenly flagged: run /feedback in Claude Code, click thumbs-down on the fallback in Claude.ai or Cowork, or file the safeguard appeal form for API requests. Your reports help us tune these classifiers and we appreciate your feedback. support.claude.com/en/articles/82…

668 427 5K 801K 943

Dario Amodei @DarioAmodei

2 days ago

Today I'm publishing a new essay, Policy on the AI Exponential. AI is progressing extremely fast—much faster than the policy process was built to handle. The essay lays out where I think the technology is now, and the action needed to close the gap: darioamodei.com/post/policy-on…

1K 2K 13K 5.9M 12K

Rishi Mehta @rishicomplex

3 days ago

jokes here: rishimehta.xyz/roast_bench/

0 0 1 379 0

Rishi Mehta @rishicomplex

3 days ago

The one joke that made me laugh

1 0 3 532 0

Rishi Mehta @rishicomplex

3 days ago

Fable 5 beats Opus 4.8 on RoastBench (but still well behind humans)

Rishi Mehta @rishicomplex

2 weeks ago

Made a little benchmark called RoastBench - it compares frontier models on their roast jokes. The models roast 10 personalities from Comedy Central roasts I enjoyed, and I manually rank their jokes. I also mark the ones that made me laugh. LLMs are way worse than top humans.

6 1 25 5K 4

2 1 7 2K 0

Rishi Mehta @rishicomplex

3 days ago

Sota on write-all-of-julians-code-bench

Julian Schrittwieser @Mononofu

3 days ago

I’m incredibly excited that Fable is now available for everyone! I’ve been blown away by how smart it is - it one-shots entire PRs for me, finds obscure bugs and has written all my code since I started using it.

17 12 313 50K 46

1 0 22 8K 6

Andrej Karpathy @karpathy

3 days ago

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. The Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

Claude @claudeai

3 days ago

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.

506 2K 15K 5.3M 2K

1K 2K 25K 2.5M 6K

Rishi Mehta @rishicomplex

3 days ago

Fable on FrontierCode

Cognition @cognition

3 days ago

Claude Fable 5 is now available in Devin. Fable 5 earns the #1 spot on FrontierCode, our benchmark for real-world engineering tasks that grades mergeability and quality:

45 104 1K 301K 181

0 0 7 2K 0

Rishi Mehta @rishicomplex

3 days ago

For the first time I don't feel the need to review its code line-by-line. It works autonomously over long horizons, on underspecified prompts, figuring things out as it goes.

0 1 1 666 0

Rishi Mehta @rishicomplex

3 days ago

Fable 5 and Mythos 5 are out! Fable is Mythos with additional safeguards turned on to prevent misuse.

Claude @claudeai

3 days ago

506 2K 15K 5.3M 2K

1 1 14 2K 2

Rishi Mehta @rishicomplex

2 weeks ago

@JeremyNguyenPhD yeah this is a fair point, writing good jokes is actually very hard and the human baseline here is very strong

0 0 0 40 0

Rishi Mehta @rishicomplex

2 weeks ago

6 1 25 5K 4

Rishi Mehta @rishicomplex

2 weeks ago

@ehalm_ I think that's part of it but they also don't seem to understand what's funny

0 0 0 51 0

Rishi Mehta @rishicomplex

2 weeks ago

you can check out all the jokes at rishimehta.xyz/roast_bench/.

0 0 2 143 0

Rishi Mehta @rishicomplex

2 weeks ago

The models can kind of figure out the beginnings of a setup but their punchlines just fall flat. It's like they don't yet have a good model for what causes a human to laugh.

1 0 3 419 1

Rishi Mehta @rishicomplex

2 weeks ago

New opus! It's smarter, more reliable, and uses its tokens better.

Claude @claudeai

2 weeks ago

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

4K 9K 67K 15.2M 8K

3 1 16 1K 0

Rishi Mehta @rishicomplex

3 weeks ago

@karpathy Welcome!

0 0 0 148 0

Rishi Mehta @rishicomplex

a month ago

Sara's team is awesome, apply if you're excited about aligning Claude!

Sara Price @sprice354_

a month ago

This is important and challenging work. If you are excited about contributing please consider applying - particularly by joining the Anthropic Fellows program!

2 2 40 3K 0

0 0 2 1K 2

Rishi Mehta @rishicomplex

2 months ago

new opus in town

Claude @claudeai

2 months ago

Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.