Jordan Taylor @JordanTensor

Working on new methods for understanding machine learning systems and entangled quantum systems.

sites.google.com/view/jordanten…

Brisbane

Joined December 2009

Tweets

510
Followers

468
Following

1K
Likes

29K

Zvi Mowshowitz @TheZvi

15 hours ago

UK AISI is doing some great jailbreaking work. They seem to consistently be able to get through, where others don't.

0 12 139 15K 20

New paper! Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models @METR_Evals showed that models' time horizons have doubled every few months. We ask: what length of tasks can models complete without any CoT?

2 16 73 19K 25

Rauno Arike @RaunoArike

15 hours ago

Glad to have contributed to this new paper! We measured the length of tasks LLMs can complete without CoT, which is a key proxy for the extent to which we can trust CoT monitors. Result: the 50% no-CoT time horizons of frontier LLMs are ~3 minutes and double every 373 days.

Dewi Gould @dswg97

16 hours ago

2 16 73 19K 25

0 0 4 84 1

Geoffrey Irving @geoffreyirving

17 hours ago

We are starting a new, nonprofit alignment organization, ⊢ Sequent Research, bringing together researchers previously on UK AISI’s Alignment Team, Timaeus, and elsewhere to research how to align superintelligence. We are hiring! 🧵

22 115 793 123K 334

Joseph Bloom @JBloomAus

2 days ago

Model Transparency at the @AISecurityInst evaluated Claude Mythos 5 for capabilities and behaviours relevant to monitorability, our first time doing this in pre-deployment testing! Details in thread 🧵

2 16 85 9K 37

Anthropic @AnthropicAI

7 days ago

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. anthropic.com/institute/recu…

2K 5K 29K 18.4M 15K

Buck Shlegeris @bshlgrs

2 weeks ago

An obvious way to study whether a training technique removes misalignment is to run that technique on a model organism (MO). But we've found that MOs are often weirdly fragile. E.g. training them to talk like a pirate often removes their bad behavior. 1/2

2 5 138 8K 42

Joseph Bloom @JBloomAus

3 weeks ago

This report is an incredibly detailed and broad look into how it might become harder to monitor, audit or generally make confident claims about frontier AI systems. We interviewed an exceptional array of experts from multiple frontier labs, academia and industry. Worth a read!

AI Security Institute @AISecurityInst

3 weeks ago

The safety of advanced AI systems increasingly depends on the ability to oversee them. Our new report examines today’s AI oversight landscape, finding many pathways likely to lead to its degradation.🧵

5 36 140 32K 84

0 4 33 4K 14

Thomas Read @thjread

3 weeks ago

I helped write this report on oversight of AI systems and how it could degrade - it's a great overview, and a good guide to what research directions might help us maintain the level of oversight we enjoy today

AI Security Institute @AISecurityInst

3 weeks ago

5 36 140 32K 84

0 1 2 171 0

Jordan Taylor @JordanTensor

3 weeks ago

@1a3orn x.com/JordanTensor/s…

Jordan Taylor @JordanTensor

3 weeks ago

@1a3orn @jiaxinwen22 Though notably there's nothing requiring the discrete tokens to be legible English. "Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought" is an example of learning to reason in an abstract discrete token vocabulary. arxiv.org/abs/2604.22709

0 0 7 272 0

0 0 2 169 0

Jordan Taylor @JordanTensor

3 weeks ago

0 0 7 272 0

Jordan Taylor @JordanTensor

3 weeks ago

Or engage on the associated LessWrong post: lesswrong.com/posts/JvZxp554…

0 0 2 49 0

Jordan Taylor @JordanTensor

3 weeks ago

See more in the paper: aisi.gov.uk/blog/will-it-b…

1 0 2 48 0

Jordan Taylor @JordanTensor

3 weeks ago

There are a lot of pathways via which AI oversight is likely to degrade! Latent reasoning architectures, situational awareness, representational drift... We wrote a report ranking them. Here I'll go into some which worry me most 🧵