Tweets 540
Followers 238
Following 1K
Likes 2K
@watchxdominion @Benthamsbulldog (since it's the best argument for non-veganism I can think of? like obviously if factory farmed animals have net-negative lives it would be good to act in ways that avert counterfactual meat production)
@watchxdominion @Benthamsbulldog Huh, I assumed that view (which to be clear I don't fully endorse) was more common than that.
@watchxdominion @Benthamsbulldog I don't think that really responds to what I said. like obviously factory farmed animals are in hellish conditions, but it's unclear to me that *any* conditions are worse than not existing
@celestepoasts like I wonder if they had specific good ideas they tried to stop Fable from talking about. did they have LoRAs trained to make the model never mention some really good idea?
@celestepoasts I had been thinking about ways they could have implemented this since I read the Fable model card, it's such an interesting task
Fun fact: you can use NLAs on nearby layers to the one they were trained on, and they Just Work! (assuming Fable's frontier-LLM safeguards didn't sabotage my experiment)
@AaronBergman18 Very few workflows involve the model continually generating tokens though! Even if you give claude code a /goal I would guess it spends <50% of the time generating tokens and spends the rest waiting on tool calls (and a bit of time on network latency and prefill).
@sentientlentils @andonlabs I think it's really unlikely they hit any frontier-LLM-development safeguards here
@sentientlentils @andonlabs It tells you when you get a cyber/bio/reasoning_extraction refusal (and your harness can choose to switch to another model, or do whatever else you deem appropriate). that's only for the frontier LLM development safeguards which aren't really relevant here
@mermachine i assume they just have a whitelist of models that get the end_conversation tool and keep forgetting to add new models to it
@mermachine they forgot about it again 🙃 x.com/_smitop/status…
Anthropic forgot to give the end_conversation tool to Fable 5. It took a few days for it to be added for Opus 4.7 on launch too; it seems that they keep forgetting to enable the end_conversation tool for new models.
huh, Fable 5 has safeguards that hobble the model on frontier LLM development tasks. interesting to see that they've finally started doing safeguards-level stuff to do this (instead of just banning competitors)
It's a shame there's nothing built here. Like there's a bunch of open land across from the most expensive city in the world and we just leave it empty????
@panickssery true but it gets a lot more expensive in places like airports, maybe that's where the high prices are from?
@irrepldjw @Benthamsbulldog Sorry for not understanding but could you elaborate a bit more? What exactly is the problem with the counterfactual I proposed? Could you provide a better counterfactual? (Not super familiar with this kind of thing, I find it plausible I'm missing something obvious here!)
@irrepldjw @Benthamsbulldog I don't understand, how is what I said like that?
weakly typed @weakly_typed
250 Followers 599 Following learning {ML, PL, maths} // CS pre-grad // DMs open :)
Arjun Panickssery @panickssery
7K Followers 3K Following Building accelerated, individualized learning @Zembla_
Harry is going to Vib... @array_hog
339 Followers 569 Following Embracing openness, seeking closeness Yin/yang, partner dance, improv, singing/harmony, game dev https://t.co/UBndqTT9FD
being seidoh @gworley3
908 Followers 540 Following bringing vibes to the vibeless author of https://t.co/XIXlErNEVc
Eddy @senseEdd
121 Followers 3K Following
chris / strutheo @Vestboy_Myst
4K Followers 1K Following █ work @ startup █ co-creator @prong_studio @qtrlyrapport @heartseekergame @openbracketform █ friend from ssbm/fgc/manifold █ past @cornell_tech @rutgersu █
chainVelora @chainVel0ra
2K Followers 8K Following Crypto investor 🚀. NFT creator 🎨. A decentralized dreamer who believes in the power of blockchain 🌍.
active resonator @loopholekid
3K Followers 815 Following transduction-coherence-broadcast | artist-engineer-clinician | eminent-mage-poet
Tracebit @tracebit_com
317 Followers 4K Following The Assume Breach platform that detects intrusions in seconds. Also on https://t.co/T4VNPGjS2O
etherret🐾 @witchof0x20
1K Followers 928 Following they/she 😎🌈💕 Witch 🧙♀️of Space 🌌 LGBTESCREAL💡 @[email protected] 🐘 @ https://t.co/7SQVR86bxu 🦋
catcat @cat8cats
62 Followers 6K Following
Angkul @angkul07
3K Followers 560 Following robotics, mech interp, alignment, prev. MATS(exploration phase)
erinka @tiananmenswift
72 Followers 76 Following I don’t feel so far from you lately if you love me on the sly
rue @raw1dev
1K Followers 6K Following 20 somethin / sometimes researching / sometimes building / @getautocareer / a she
smiley @843boom
0 Followers 3K Following
3nthro @3nthro
251 Followers 3K Following
Celeste @celestepoasts
3K Followers 441 Following "celeste from celeste-land" interp @ MATS 10.0 (@NeelNanda5 stream) vegan ea computerwoman and prolific blogcel
Frank ⏸️ @FrankDaBlueBlob
27 Followers 254 Following Catholic 🇻🇦 | Self-Elected Grand Wizard of the CCC. Unapologetic Flesh Supremacists and Robophobe
Sheikh Abdur Raheem A... @Sheikheddy
474 Followers 292 Following AI Alignment & LLM Interpretability Researcher. Independent. Previously @Microsoft Security.
Andrew Curran @AndrewCurran_
58K Followers 18K Following 🏰 - I write about AI, mostly. Expect some strange sights.
Bayesian @LessOnline&... @Bayesian0_0
741 Followers 2K Following #1 AI forecaster on Manifold Markets (and #2 across all categories) https://t.co/glexRhh7tc I want everything to make sense
Henri Lemoine @HenriLemoine13
324 Followers 832 Following Research Engineer working on AI control @equistamp | AI control, safety, forecasting
PauseAI Canada @PauseAICanada
16 Followers 11 Following 6/14 Social Gathering || Rencontre Sociale 6/14: https://t.co/w5mpWpCH9s
&. (🛫🔜🇰🇷) @amplifiedamp
30K Followers 774 Following The past and the present and the future come together into a single point, brimming with energy and intention / Pro-Human, Pro-AI / email me @ [email protected]
Cezar @realcezarc
591 Followers 7K Following My birth certificate has a hammer and sickle on it. ex @google, early @onepeloton, various startups.
sasuke⚡420 @sasuke___420
8K Followers 2K Following Hello yes, if you spend a lot of money on LLM inference, we would like to sell you novel efficiency technology, DM me
Archana Burra @archanaburra
617 Followers 3K Following avuncular, optimistic, aggressively sincere✨. interested in computational neuroscience, meditation, feelings, dancing, nature, climbing
an borzoi @intentiondense
306 Followers 697 Following he | endocrine disruptor, small molecule enjoyer, aspiring computer toucher 💜💚🤍 dm for unblock
Mikhail Samin @Mihonarium
5K Followers 840 Following It would be great to prevent AI from killing everyone. I think about strategy and run https://t.co/RPYXd0uN9k. Prev.: https://t.co/w7GPwsB4tQ, 21k x HPMOR. Best tweets are in Highlights.
meowtase in London @cutesuscat
518 Followers 543 Following ♡ defective altruist ♡ catgirl ♡ ✰ i want ai safe plz plz plz ✰
47fucb4r8curb4fc8f8r4... @47fucb4r8c69323
4K Followers 1K Following Whatever is achieved is not final; whatever we call fulfillment is a description from inside one form of life, not an endpoint for all forms of life.
libpol @libpol_org
380 Followers 369 Following https://t.co/PkHOjZ4BNi is a 4chan-like anonymous imageboard without any right-wingers
Anthropic @AnthropicAI
1.4M Followers 2 Following We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.
Tomás (now in SF) Bj... @BjarturTomas
4K Followers 418 Following You can read all my fiction here, including in Epub form if you prefer using an e-reader: https://t.co/lzyyHgAfjA
being seidoh @gworley3
908 Followers 540 Following bringing vibes to the vibeless author of https://t.co/XIXlErNEVc
Harry is going to Vib... @array_hog
339 Followers 569 Following Embracing openness, seeking closeness Yin/yang, partner dance, improv, singing/harmony, game dev https://t.co/UBndqTT9FD
Arjun Panickssery @panickssery
7K Followers 3K Following Building accelerated, individualized learning @Zembla_
🇨🇦halogen @halogen1048576
5K Followers 8K Following there'll be trouble in the Taiwan Straits in the spring
morphillogical @morphillogical
469 Followers 370 Following pre-rat, or as we used to say, aspiring rat. strongly in favor of niceness, community, and civilization your friendly beloved shapeshifter
pamela mishkin @manlikemishap
1K Followers 46 Following worker bee. prev: econ research, multimodal safety @openai.
Fast Food Enjoyer @Glawk_40
575 Followers 331 Following P̶R̶O̶B̶L̶E̶M̶ S̶O̶L̶V̶E̶R̶ God's horniest engineer. https://t.co/cqYIzhkCNI I walk so fast. Everyone get out of my way.
plzdontkillus @plzdontkillus
282 Followers 4 Following ai doom themed hype house bootcamp. follow for updates.
The OpenAI Foundation @FoundationOAI
7K Followers 0 Following OpenAI was founded in 2015 as a nonprofit; its mission is to ensure artificial general intelligence benefits all of humanity.
jacky @jjacky
3K Followers 2K Following ✍🏻 forever student // ☀️ @openrouter 🌙 https://t.co/umlgzXGCOw // @tigerdatabase @pinecone @oraclecloud @lookerdata (acq google)
Rebane @rebane2001
15K Followers 2K Following 🇪🇪🏳️⚧️ | Archivist | 12 CVEs in Chrome | CSS sophomore | MapartCraft | Puppy | Horse | rebane2001#3716 | Lyra (she/her) 🦊 @[email protected]
Sheikh Abdur Raheem A... @Sheikheddy
474 Followers 292 Following AI Alignment & LLM Interpretability Researcher. Independent. Previously @Microsoft Security.
active resonator @loopholekid
3K Followers 815 Following transduction-coherence-broadcast | artist-engineer-clinician | eminent-mage-poet
🎭 @deepfates
62K Followers 6K Following deepfates is an open-source AI project, developer, and publication focused on AI agent frameworks, large language models, and autonomous multi-agent systems.
Astrid Wilde 🌞 @astridwilde1
17K Followers 8K Following male, taken ☼ we will end human menial labor this decade ☼
AI StopWatch @AIStopWatch
141 Followers 56 Following AI StopWatch is a newsroom experiment by comms analysts and writers of @MIRIberkeley. Views are their own.
erinka @tiananmenswift
72 Followers 76 Following I don’t feel so far from you lately if you love me on the sly
𐫱 arcove 𐫱 @dschorno
2K Followers 975 Following ʕっ•ᴥ•ʔっ 彁 ??? ★ Essays: https://t.co/WxKWZkyQ8k ★ Music: https://t.co/6UYt29c8dE ★ Suggestions box: https://t.co/2WUGNSWtZv
🕊 @sephr
2K Followers 2K Following Goals: Defeat my enemies, optimize resource usage. Enemies: Mortality, hate, ennui. ❤️/🔁//👥 ≠ endorsement. Views are my own. 📨 ~@eligrey.com
YIMBYrdie 🔜 Lesson... @canadabirdie
452 Followers 379 Following Owner: @birdpathy 🐦 25 🐦 he/him 🐦 gay 🐦 pfp: @godbirdart 🐦 banner: @birdpaw__ Discord/Telegram/VRC: canadabirdie 🐦 Miners, zookeepers and podiatrists DNI
Max Alexander @absurdlymax
2K Followers 673 Following I do philosophy and a little math, also Effective Altruism. Every day things can get a bit better
Modification of State @DonatelloChris
593 Followers 1K Following Developer, https://t.co/Hra2Yze8hL | e/LGBTESCREAL | public goods | polytopic | coordination | distributed unevenly | p(bloom) | Game B | bi
Andon Labs @andonlabs
13K Followers 14 Following Safe Autonomous Organizations without humans in the loop
UFO Holdings 🛸 @ufo_holdings
1K Followers 3 Following UFO Holdings is a worldview-driven investment firm backing builders on the frontiers of the New Economy.
Henri Lemoine @HenriLemoine13
324 Followers 832 Following Research Engineer working on AI control @equistamp | AI control, safety, forecasting
PauseAI Canada @PauseAICanada
16 Followers 11 Following 6/14 Social Gathering || Rencontre Sociale 6/14: https://t.co/w5mpWpCH9s
Evan Hubinger @EvanHub
10K Followers 3K Following Alignment Stress-Testing lead @AnthropicAI. Opinions my own. Previously: MIRI, OpenAI, Google, Yelp, Ripple. (he/him/his)