We just released a new challenge today ❗
GPT-2 (120M) Transformer Block
Compose multiple kernels into a full transformer block. The first of many upcoming challenges focused on real-world inference optimization.
Write your solution in CUDA, Triton, PyTorch, JAX, Mojo, or CuTe DSL and benchmark it on state-of-the-art GPUs such as the H100, H200, and B200.
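For anyone wondering what "composing multiple kernels" means here, below is a rough pure-Python sketch of a GPT-2 style (pre-LayerNorm) block. It is not the challenge's reference solution. The function names, the parameter dictionary keys (`w_qkv`, `w_o`, `w_fc`, `w_proj`, `ln1_g`, ...) and the decision to drop bias terms are all assumptions made to keep it short. Each helper stands in for one kernel you would write on the GPU.

```python
import math

def matmul(a, b):
    # (n x k) @ (k x m) on plain nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    # element-wise residual add
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, gamma, beta, eps=1e-5):
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        inv = 1.0 / math.sqrt(var + eps)
        out.append([g * (v - mean) * inv + b for v, g, b in zip(row, gamma, beta)])
    return out

def gelu(v):
    # tanh approximation of GELU
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(q, k, v):
    # single head; token i only attends to tokens 0..i
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(i + 1)]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(i + 1)) for c in range(len(v[0]))])
    return out

def transformer_block(x, p, n_head):
    # x: (tokens x d_model); p: hypothetical dict of weights, biases omitted
    d_model = len(x[0])
    hd = d_model // n_head
    h = layer_norm(x, p["ln1_g"], p["ln1_b"])
    qkv = matmul(h, p["w_qkv"])  # columns laid out as [Q | K | V]
    heads = []
    for i in range(n_head):
        lo = i * hd
        q = [r[lo:lo + hd] for r in qkv]
        k = [r[d_model + lo:d_model + lo + hd] for r in qkv]
        v = [r[2 * d_model + lo:2 * d_model + lo + hd] for r in qkv]
        heads.append(causal_attention(q, k, v))
    # concatenate the per-head outputs back to d_model columns
    merged = [sum((heads[i][t] for i in range(n_head)), []) for t in range(len(x))]
    x = add(x, matmul(merged, p["w_o"]))          # attention + residual
    h = layer_norm(x, p["ln2_g"], p["ln2_b"])
    h = [[gelu(v) for v in row] for row in matmul(h, p["w_fc"])]
    return add(x, matmul(h, p["w_proj"]))         # MLP + residual
```

On a GPU, most of the work is in making each of these steps fast and, where possible, fusing neighbouring ones so intermediate results do not round-trip through global memory.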
We just launched a couple of new features on LeetGPU:
- PyTorch Profiler Traces for every submission
- AI Chat to help explain, debug, and optimize your code
Go try them out!
Blog post for day 2 of doing one @LeetGPU challenge a day is live!!
We go over writing a matrix multiplication kernel 🤯
We start with a naive solution and keep adding tons of optimizations on top of it until we end up with the most efficient solution 🚀
arayz.dev/blog/leetgpu-m…
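To give a feel for the "naive first, then optimize" idea, here is a small Python sketch. It is not taken from the blog post. Loop tiling is only an assumed example of one optimization step, and `matmul_naive`, `matmul_tiled`, and the `tile` parameter are illustrative names. In a real CUDA kernel each tile would be staged in shared memory, and the loops would map to thread blocks and threads.

```python
def matmul_naive(a, b):
    # One output element at a time: C[i][j] = sum over p of A[i][p] * B[p][j]
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c

def matmul_tiled(a, b, tile=2):
    # Same arithmetic, but walked tile by tile so each small block of
    # A and B is reused many times while it is "hot".
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Both functions return the same result. Only the order in which memory is touched changes, which is what most matmul optimizations on GPUs come down to.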
🚀 Want to become a CUDA ninja?
Start with the new CUDA Programming Guide - Section 4 is your gold mine!
It’s packed with features most developers don’t even know exist, and it can unlock serious performance gains, smarter debugging, and cleaner GPU code.
day 100/100 of GPU Programming
Didn't write a kernel today. I spent the day reflecting.
100 days writing kernels and I didn't miss a single day, not one. On some days, I learnt to write new ones, some days I practiced kernels I've written before. I took on something my