Mohammed Muawia @MAlshift

Joined February 2026

Tweets: 3
Followers: 0
Following: 21
Likes: 5

Mohammed Muawia @MAlshift

a month ago

@Manu_Sisti AI


Mohammed Muawia @MAlshift

a month ago

@AKiran37097 Google


Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data. During the first third of training, the model reads and predicts contiguous bags of tokens. On the input side, it averages the embeddings of the tokens in each bag. On the output side, it predicts the next bag with a modified cross-entropy loss. For the remainder of the run, it trains normally on next-token prediction. The model used at inference time is identical to one produced by conventional pretraining. TST has been validated at 270M, 600M, and 3B dense scales and on a 10B-A1B mixture-of-experts (MoE) model. The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.
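The announcement does not spell out the bag size or the exact form of the modified cross-entropy. What follows is therefore only a toy, pure-Python sketch of the mechanics it describes, and it rests on stated assumptions:

- bags are contiguous, non-overlapping, and of a fixed size `bag_size`;
- the next-bag loss is taken to be the average cross-entropy over the tokens in the next bag, which is a guess and not the authors' published loss;
- the switch to ordinary next-token training happens at one third of `total_steps`.

All function names are hypothetical. The logits would come from a model that is not shown.

```python
# Hypothetical sketch of Token Superposition Training mechanics; not the authors' code.
import math
from typing import List, Sequence

Vector = List[float]


def make_bags(tokens: Sequence[int], bag_size: int) -> List[List[int]]:
    """Split a token sequence into contiguous, non-overlapping bags.

    Tokens that do not fill a final bag are dropped (an assumption of this sketch).
    """
    n_full = len(tokens) // bag_size
    return [list(tokens[i * bag_size:(i + 1) * bag_size]) for i in range(n_full)]


def bag_embedding(bag: Sequence[int], table: Sequence[Vector]) -> Vector:
    """Input side: average the embedding vectors of the tokens in one bag."""
    out = [0.0] * len(table[0])
    for tok in bag:
        for d, value in enumerate(table[tok]):
            out[d] += value
    return [x / len(bag) for x in out]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def next_bag_loss(logits: Sequence[float], next_bag: Sequence[int]) -> float:
    """Output side (ASSUMED form): cross-entropy against a uniform target
    spread over the next bag's tokens, counted with multiplicity.
    With a bag of one token this is ordinary next-token cross-entropy."""
    logp = log_softmax(logits)
    return -sum(logp[t] for t in next_bag) / len(next_bag)


def bag_size_at(step: int, total_steps: int, bag_size: int) -> int:
    """Schedule: bags during the first third of training, then plain
    next-token prediction (equivalent to bags of size 1)."""
    return bag_size if step < total_steps / 3 else 1


if __name__ == "__main__":
    table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-token vocab, 2-dim embeddings
    bags = make_bags([0, 1, 2, 1, 0], bag_size=2)
    print(bags)                            # [[0, 1], [2, 1]]
    print(bag_embedding(bags[0], table))   # [0.5, 0.5]
    # Uniform logits stand in for model output; the loss is then log(3).
    print(round(next_bag_loss([0.0, 0.0, 0.0], bags[1]), 4))  # 1.0986
    print(bag_size_at(50, 300, 2), bag_size_at(150, 300, 2))   # 2 1
```

This loss was chosen for the sketch because, with `bag_size=1`, it reduces exactly to standard next-token cross-entropy. The same code path therefore covers both phases of the schedule. Under these assumptions, each bag of k tokens occupies a single input position. A context of n positions then covers about k·n tokens of text during the first phase.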


