GPU MODE @GPU_MODE

Your favorite GPU community · discord.gg/gpumode · gpumode.com · Joined September 2024

Tweets: 248 · Followers: 9K · Following: 13 · Likes: 367

Mark Saroufim @marksaroufim

18 hours ago

GPU MODE has powered much of the public GPU kernel work online, with a permissive license from day one and generous credit from researchers, NVIDIA, AMD, and others. Today we’re moving our datasets to the Researcher Reciprocity License.

Mark Saroufim @marksaroufim

19 hours ago

June 9th: Researcher Reciprocity License
"if you train on it, you let us generate - reverse terms of use void"

Status quo:
1. We teach frontier devs with ICLR/NeurIPS papers, OSS GitHub contributions
2. They use it to make frontier models
3. Then ban us from exploring our ideas

29 92 824 66K 81

7 26 383 28K 82

GPU MODE @GPU_MODE

a week ago

@Aru__09 this is so cool! When you're done and if you wanna give a talk about this work please lmk!

1 0 7 884 2

snow @snowclipsed

2 weeks ago

YEEEEEES

2 2 44 3K 6

Marcio K @MarcioK

4 weeks ago

Since the Humanity's Last Hackathon from @huggingface didn’t happen, I set up my own mini version using Kernelbot and Popcorn from @gpu_mode.

> The goal was to test how well LLMs can generate code for difficult tasks, like writing faster kernels for Apple’s MPS with @PyTorch.
> My strategy was to let the LLM submit a kernel, get feedback from the benchmark, and then iterate based on the learnings.
> The hardest part was not the code generation itself, but coordinating all the systems. Kernelbot, Popcorn, submissions, feedback, orchestration...
> The benchmark eats almost all my RAM, so parallelizing too many submissions is hard. My machine starts crashing if I push it too much.

Overall, I need more time to tune the prompts, experiment with better feedback loops, and maybe try some RL-style iteration. There are still lots of techniques worth exploring here.

In the video:
Left: task orchestrator
Right: live dashboard tracking submissions, code, and lessons learned

0 2 10 2K 1

Simon V @Simon_Vt

a month ago

@ReubenConducts @GPU_MODE @marksaroufim Best talk on CuTe Layout algebra imo :)

1 1 3 2K 1

GPU MODE @GPU_MODE

a month ago

NVIDIA cuDNN team tomorrow at noon

2 6 114 5K 14

GPU MODE @GPU_MODE

a month ago

Codex is so good at writing kernels that it felt appropriate to do a Codex-only kernel competition. Metal is great because you'll be able to tangibly feel the perf improvements in your local models.

Ben Burtenshaw @ben_burtenshaw

a month ago

Humanity's Last Hackathon is NOW OPEN for registration. This is not a normal hackathon. You will be judged on the context, not the code! Use Codex @OpenAIDevs to build and optimize models for local inference (kernels on Max metal). Submit through @GPU_MODE. Climb the

21 36 368 91K 305

0 3 63 6K 16

GPU MODE @GPU_MODE

a month ago

GPU MODE cited on @tbpn - thank you for the plug @AnushElangovan !! We hope to continue making GPU programming more accessible to everyone!

TBPN @tbpn

a month ago

AMD's @AnushElangovan explains why he thinks his company's open source ethos combined with agentic AI superpowers their leverage as a company: Because AMD publishes a lot of technical details about its hardware, when engineers use AI tools, the models already “understand” AMD’s

10 14 131 34K 38

2 1 39 8K 7

GPU MODE @GPU_MODE

a month ago

princeton-nlp.github.io/cos484/

1 1 24 2K 16

GPU MODE @GPU_MODE

a month ago

We helped host a kernel competition for @tri_dao's course at Princeton, COS 484: Natural Language Processing. If you're a university or educator interested in live programming problems for your students, please reach out!

2 2 78 23K 42

GPU MODE @GPU_MODE

2 months ago

Gluon and Linear Layouts. Tomorrow noon PST youtube.com/watch?v=GC7-_o…

0 2 19 2K 7

Reuben Stern @ReubenConducts

2 months ago

My colleagues Jack Carlisle and Jay Shah gave a fantastic lecture for @GPU_MODE yesterday on our categorical foundations for CuTe layout algebra! They were joined by Cris Cecka, the inventor of CuTe, and @marksaroufim as moderators. Bravi tutti! youtu.be/MVh_guNbWMA?si…

2 9 59 5K 33

GPU MODE @GPU_MODE

2 months ago

Jack Carlisle & Jay Shah talk on the Fundamentals of CuTe Layout Algebra. Special guest: Cris Cecka. See you soon! youtube.com/watch?v=8QfQd8…

1 12 61 5K 25

Pieter Delobelle @pieterdelobelle

2 months ago

our "@pleiasfr and friends" team got second place at the @GPU_MODE hackathon in Paris last week! 🥈 we had a lot of fun optimizing our training throughput, so trying out 8 bit training, muon, RoPE/NoPE, conv architectures, ... Basically nanogpt speedrunning on B300s.

Verda (formerly DataCrunch) @verdacloud

3 months ago

We’re sponsoring a @GPU_MODE hackathon in Paris on April 9, to conclude the PyTorch Conference Europe 🇪🇺 Our grand prize? 48 hours on GB300 NVL72. Join us with the teams behind @PyTorch, @PrimeIntellect, @SemiAnalysis_, @sestercegroup, and more! luma.com/gpu-mode-paris…

0 0 5 4K 0

5 3 31 7K 5

GPU MODE @GPU_MODE

2 months ago

We're back on schedule: on Tuesday, April 14 at 9am PST we'll have a talk from Andrei Panferov and Erik Schultheis on their improved recipe for nvfp4 pretraining. They'll cover both math and kernels. youtube.com/watch?v=HB4up2…

0 13 62 6K 17

Zhipeng Huang @nopainkiller

2 months ago

Matej Sirovatka from @PrimeIntellect sharing how @arcee_ai trinity was trained

4 9 124 12K 19

GPU MODE @GPU_MODE

2 months ago

Incredible release

Alex Zhurkevich @cudagdb

2 months ago

Trtllmgen kernels are now open. Fastest prefill and decode kernels for our target workloads. We wrote these to win InferenceX, MLPerf, other benchmarks. Powering some of today’s top served models. Dive in, learn, use them, or level up your own. Enjoy. github.com/flashinfer-ai/…