When Claude is wrong and you correct it but thankfully it’s only wrong on things you’re very knowledgeable about, not on things you only have surface knowledge of.
@predict_addict @andrewgwils Because I'm a CS major and consciously chose not to do math and physics, knowing it would probably have been better. Hindsight is 20/20.
To settle the "buy GPUs vs rent" debate for side projects: once you buy that RTX PRO 6000, you will procrastinate and let it idle. If you're renting it for $2/hr, you will be more productive working on your side project than you will at your day job.
@datavorous_ An agent orchestration harness with a CEO, engineers, and validator agents enabling devs to push 150kloc/day? Seems like the right next step if this kid wants to stop shipping weekend side projects and make a real impact on the world!
Turns out adding 0 helps :)
Today we’re introducing Ternary Bonsai 🌳, a family of end-to-end 1.58-bit language models in 8B, 4B, and 1.7B sizes.
Ternary Bonsai 8B is within 5% of Qwen 3 8B at 9x lower memory.
Still tiny. Noticeably smarter
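For context on the "1.58-bit" figure: a ternary weight takes one of three values (-1, 0, +1), so its information content is log2(3) bits. The short sketch below checks that number and the idealized weight-only compression against an fp16 baseline. The fp16 baseline is an assumption made here for illustration; the announcement does not say what the 9x figure is measured against, and real formats also store scales and other metadata.

```python
import math

# A ternary weight has three states: -1, 0, +1.
bits_per_ternary_weight = math.log2(3)

# Assumption: compare against a 16-bit (fp16) weight, ignoring scales,
# embeddings, activations, and any other overhead.
ideal_ratio_vs_fp16 = 16 / bits_per_ternary_weight

print(f"{bits_per_ternary_weight:.3f} bits per weight")
print(f"{ideal_ratio_vs_fp16:.1f}x smaller than fp16 (weights only, idealized)")
```

The idealized ratio comes out a little above the quoted 9x, which is at least consistent with some per-block overhead, though the exact accounting is not given in the post.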
The european mind (me) cannot comprehend that you would see an $18 flight ticket and your first instinct is buy all of them for $3400
Well played, well played
Renting H100s from runpod to write tinygrad bounties like a medieval peasant paying a tithe to his feudal lord for a meager plot of compute. I toil day and night, hoping my bounty harvest is enough to win the respect of the king and avoid starvation
It's nice that we could get Bonsai-family support so quickly, but this is a bit disingenuous. I have never contributed to tinygrad, so I am not in a position to critique this. However, this implementation unpacks the 1-bit weights to float16 and runs computations on float16 instead of running custom kernels on the packed weights, nullifying a lot of the benefits of the Bonsai architecture.
Q1_0 is a packed 1-bit format: for each block of 128 weights, you store 16 bytes of bits and 2 bytes for a shared fp16 scale.
So 128 weights take 18 bytes total. If you unpack those same 128 weights into float16, that becomes 256 bytes (about 14x larger).
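A minimal sketch of that block arithmetic, in plain Python. The sizes (128 weights, 16 bytes of bits, one fp16 scale) come from the description above; everything else is an assumption for illustration, including the byte order, the bit order within each byte, the mapping of a set bit to +scale, and the use of mean magnitude as the scale. The function names `pack_block` and `unpack_block` are hypothetical and do not refer to ggml, llama.cpp, or tinygrad code.

```python
import struct

BLOCK = 128  # weights per block, per the description above


def pack_block(weights):
    """Pack 128 floats into 18 bytes: a 2-byte fp16 scale plus 16 bytes of sign bits."""
    assert len(weights) == BLOCK
    # Assumption: use the mean absolute value as the shared scale.
    scale = sum(abs(w) for w in weights) / BLOCK
    bits = bytearray(BLOCK // 8)
    for i, w in enumerate(weights):
        if w >= 0:  # assumption: set bit means +scale, clear bit means -scale
            bits[i // 8] |= 1 << (i % 8)
    # "<e" is little-endian IEEE 754 half precision (fp16).
    return struct.pack("<e", scale) + bytes(bits)


def unpack_block(blob):
    """Expand an 18-byte block back into 128 floats."""
    (scale,) = struct.unpack("<e", blob[:2])
    bits = blob[2:]
    return [scale if (bits[i // 8] >> (i % 8)) & 1 else -scale for i in range(BLOCK)]


packed_size = 2 + BLOCK // 8     # bytes per packed block
fp16_size = BLOCK * 2            # bytes if every weight is stored as fp16
ratio = fp16_size / packed_size  # how much larger the unpacked form is

example = [0.5 if i % 2 == 0 else -0.5 for i in range(BLOCK)]
blob = pack_block(example)
print(len(blob), fp16_size, round(ratio, 1))
```

The point of the comparison is only the size: a kernel that reads the 18-byte block directly moves roughly a fourteenth of the bytes that a materialized float16 copy would.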
This is basically unpacking the "bit-based LLM" into normal float16 and running the calculations that way.
My understanding is that llama.cpp's Bonsai support keeps the weights in the quantized Q1_0 representation and uses kernels that operate on that packed format, which is the whole point.
Again, I do not mean this as a shot at the implementation itself. Getting support working this quickly is genuinely cool. I might also be misunderstanding some parts of this, hopefully not too much, but would love to be corrected.
Yea, just replied as well, I was totally misunderstanding. For some reason I thought that the ggml tensor loading wouldn't be fused with the rest of the code (not sure why I would think that), so the scheduler would only see the multiplication with the float16 d, and separately see the rest of the model. The memory usage doesn't lie.
@__tinygrad__ Oh wow, yea absolutely. I mistakenly thought that the multiplication with d in the loader would result in a cast to float16, but it gets fused and the weights are never directly unpacked. This library is beautiful.