loops @_smitop

human iter.ca Toronto, Canada Joined December 2019

Tweets

540
Followers

238
Following

1K
Likes

2K

loops @_smitop

6 days ago

@watchxdominion @Benthamsbulldog (since it's the best argument for non-veganism I can think of? like obviously if factory farmed animals have net-negative lives it would be good to act in ways that avert counterfactual meat production)

0 0 0 14 0

loops @_smitop

6 days ago

@watchxdominion @Benthamsbulldog Huh, I assumed that view (which to be clear I don't fully endorse) was more common than that.

1 0 0 12 0

loops @_smitop

6 days ago

@watchxdominion @Benthamsbulldog I don't know!

1 0 0 11 0

loops @_smitop

6 days ago

@watchxdominion @Benthamsbulldog I don't think that really responds to what I said. like obviously factory farmed animals are in hellish conditions, but it's unclear to me that *any* conditions are worse than not existing

1 0 0 11 0

loops @_smitop

6 days ago

@celestepoasts like I wonder if they had specific good ideas they tried to stop Fable from talking about. did they have LoRAs trained to make the model never mention some really good idea?

1 0 3 26 0

loops @_smitop

6 days ago

@celestepoasts I had been thinking about ways they could have implemented this since I read the Fable model card, it's such an interesting task

2 0 2 63 0

loops @_smitop

a week ago

Fun fact: you can use NLAs on nearby layers to the one they were trained on, and they Just Work! (assuming Fable's frontier-LLM safeguards didn't sabotage my experiment)

0 0 0 44 0

loops @_smitop

a week ago

@AaronBergman18 Very few workflows involve the model continually generating tokens though! Even if you give claude code a /goal I would guess it spends <50% of the time generating tokens and spends the rest waiting on tool calls (and a bit of time on network latency and prefill).

0 0 3 62 0

loops @_smitop

a week ago

@sentientlentils @andonlabs I think it's really unlikely they hit any frontier-LLM-development safeguards here

0 0 2 31 0

loops @_smitop

a week ago

@sentientlentils @andonlabs It tells you when you get a cyber/bio/reasoning_extraction refusal (and your harness can choose to switch to another model, or do whatever else you deem appropriate). that's only for the frontier LLM development safeguards which aren't really relevant here

1 0 2 42 0

loops @_smitop

a week ago

@mermachine i assume they just have a whitelist of models that get the end_conversation tool and keep forgetting to add new models to it

1 0 1 11 0

loops @_smitop

a week ago

@mermachine they forgot about it again 🙃 x.com/_smitop/status…

loops @_smitop

a week ago

Anthropic forgot to give the end_conversation tool to Fable 5. It took a few days for it to be added for Opus 4.7 on launch too; it seems that they keep forgetting to enable the end_conversation tool for new models.

1 0 0 58 0

1 0 1 10 0

loops @_smitop

a week ago

tool does kinda work if it tries calling it anyways though

0 0 0 26 0

loops @_smitop

a week ago

1 0 0 58 0

loops @_smitop

a week ago

from www-cdn.anthropic.com/d00db56fa754a1…

0 0 0 26 0

loops @_smitop

a week ago

huh, Fable 5 has safeguards that hobble the model on frontier LLM development tasks. interesting to see that they've finally started doing safeguards-level stuff to do this (instead of just banning competitors)

1 0 2 69 0

loops @_smitop

a week ago

It's a shame there's nothing built here. Like there's a bunch of open land across from the most expensive city in the world and we just leave it empty????

3 0 5 392 0

loops @_smitop

a week ago

@panickssery true but it gets a lot more expensive in places like airports, maybe that's where the high prices are from?

1 0 2 172 0

loops @_smitop

a week ago

@irrepldjw @Benthamsbulldog Sorry for not understanding but could you elaborate a bit more? What exactly is the problem with the counterfactual I proposed? Could you provide a better counterfactual? (Not super familiar with this kind of thing, I find it plausible I'm missing something obvious here!)