AI beyond the hype. Real insights, real breakthroughs, real methods. Philosophy, benchmarks, quantization, hacks—minus the marketing smoke. Injecting facts into promptinjection.net
Joined June 2025
Unlimited-OCR 🔥 New OCR from @PaddlePaddle
It can parse hundreds of pages in a single pass while maintaining stable speed.
The key idea is R-SWA (Reference Sliding Window Attention), which keeps KV cache constant during decoding.
🏆 93% on OmniDocBench
📈 +6% over DeepSeek-OCR
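To make the "constant KV cache" idea concrete, here is a minimal toy sketch of a cache whose size stops growing during decoding. The post does not say how R-SWA chooses its reference tokens, so this is an assumption: it simply pins the first few entries and keeps a sliding window of the most recent ones. `BoundedKVCache`, `num_pinned`, and `window` are hypothetical names, and this is not PaddlePaddle's implementation.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: a few pinned entries plus a sliding window of recent ones.

    Assumption for illustration only: the pinned entries are simply the first
    `num_pinned` tokens. The real R-SWA selection rule is not given in the post.
    """

    def __init__(self, num_pinned: int, window: int):
        self.num_pinned = num_pinned
        self.pinned = []                    # never evicted
        self.recent = deque(maxlen=window)  # oldest entry is dropped automatically

    def append(self, kv):
        if len(self.pinned) < self.num_pinned:
            self.pinned.append(kv)
        else:
            self.recent.append(kv)

    def __len__(self):
        return len(self.pinned) + len(self.recent)


cache = BoundedKVCache(num_pinned=4, window=8)
sizes = []
for step in range(1000):
    # Stand-in for the key/value tensors produced at each decoding step.
    cache.append((f"k{step}", f"v{step}"))
    sizes.append(len(cache))

max_size = max(sizes)
print(max_size)
```

However many steps are decoded, the cache never holds more than `num_pinned + window` entries, which is the property that would keep memory use and per-step cost flat.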
We're open sourcing a 9B model that extracts structured data from documents at near-frontier performance.
- 90.2% on our bench, vs Gemini 3.5 Flash at 91.3%
- Leads extraction models like NuExtract3 (81.5%)
- 9.5s p50 timings
- Pass JSON schema
Ming Omni TTS 16.8B - 30GB monster for high-performance unified audio gen.
- speech, sound, music
- speed, pitch, emotion
- 93% accuracy on Cantonese dialects
- narrates complex math/chemical expressions
- zero-shot voice design
Optimized for high-speed, low-latency gen. Perfect for long-form content.
huggingface.co/inclusionAI/Mi…
Breaking: SpaceX said it would buy Cursor for $60 billion, striking a massive deal for an autonomous coding agent shortly after its blockbuster IPO on.wsj.com/4xDAULx
The Wet Sock Cosmology: What SFT Overfitting Actually Looks Like - and Why It Seduces You
How 9 extra epochs turned a language model into the most convincing kind of broken
promptinjection.net/p/the-wet-sock…
DiffusionGemma is 4x faster, but makes 6x more mistakes!
We benchmarked the new diffusion LLM against its autoregressive twin on a single H100 (FP8). We gave each the same three tasks: write a Steve Jobs biography, the history of Tetris, and the story of BeOS, each topic less popular than the one before. Then we fact-checked every claim in every answer.
Gemma4 got 45 facts right, 5 wrong. DiffusionGemma got 33 right, 28 wrong. The less popular the topic, the worse it got: 4 mistakes on Jobs, 12 on Tetris, 12 on BeOS. It named Clara Clley as Steve Jobs' mother, invented a colleague for Pajitnov named Geri Gulovik and priced the BeBox at $9,999. The real one cost $1,600.
Outputs:
Gemma4 26B A4B: 218 tok/s · 15.1s total · 45 facts · 5 mistakes
DiffusionGemma 26B A4B: 763 tok/s · 3.7s total · 33 facts · 28 mistakes
The reason is simple. DiffusionGemma throws 256 tokens on the screen at once and polishes them pass after pass until the text sounds smooth. Smooth is all it cares about: a fake name, date or number sounds just as smooth as a real one, so it stays. Regular Gemma4 meanwhile writes one word at a time and checks every new word against everything before it. Google says it themselves in the launch post: quality is lower, use regular Gemma 4 when facts matter.
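The headline ratios can be rechecked from the figures quoted above. This is a small sketch that assumes "right" and "wrong" counts add up to the total number of checked claims per model; the dictionary layout and variable names are illustrative only.

```python
# Figures copied from the post; "right" + "wrong" is assumed to be the
# total number of fact-checked claims for each model.
results = {
    "Gemma4 26B A4B": {"tok_s": 218, "seconds": 15.1, "right": 45, "wrong": 5},
    "DiffusionGemma 26B A4B": {"tok_s": 763, "seconds": 3.7, "right": 33, "wrong": 28},
}


def error_rate(r):
    """Share of checked claims that were wrong."""
    return r["wrong"] / (r["right"] + r["wrong"])


ar = results["Gemma4 26B A4B"]
diff = results["DiffusionGemma 26B A4B"]

throughput_speedup = diff["tok_s"] / ar["tok_s"]     # tokens per second
wall_clock_speedup = ar["seconds"] / diff["seconds"]  # total time
mistake_ratio = diff["wrong"] / ar["wrong"]           # absolute mistakes
ar_error_rate = error_rate(ar)
diff_error_rate = error_rate(diff)

print(f"throughput speedup: {throughput_speedup:.1f}x")
print(f"wall-clock speedup: {wall_clock_speedup:.1f}x")
print(f"mistake ratio:      {mistake_ratio:.1f}x")
print(f"error rate:         {ar_error_rate:.0%} vs {diff_error_rate:.0%}")
```

Under that assumption the speedup is about 3.5x in throughput and about 4.1x in wall-clock time, the absolute mistake count is 5.6x higher, and the share of wrong claims rises from 10% to roughly 46%.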
MTP is awesome especially for the dense models. It makes the 31B dense very usable even on slower hardware.
For the MoE ones, the improvement usually stays much smaller (+10-20%).
Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️
MTP lets Google Gemma 4 run ~1.4–2.2× faster with no accuracy loss.
Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s.
GGUFs + Guide: unsloth.ai/docs/models/mtp
Meet DiffusionGemma ⚡ Our latest experimental open model (Apache 2.0) that generates text up to 4x faster.
Instead of predicting and typing just one word at a time like most language models, it drafts and refines entire blocks of text simultaneously.
Here’s how it works 🧵 ↓
Jokingly asked Fable to build me Crysis in Three.js.
It may not be Crysis, but the fact this is all done procedurally in basically one shot is kind of blowing my mind right now.
Gemma 4 MTP just got officially merged into llama.cpp
This means you can use Gemma 4 QAT + MTP for a lightweight + super fast setup. Excited to see what the community builds with it
github.com/ggml-org/llama…
If ROCmFP4 has helped you or you have found it useful, please consider a star on GitHub. It's my first ambitious undertaking in open source, and it's been pushing models into territory not initially possible.
github.com/charlie12345/r…
Also, if you find other GitHub repos that have been useful, star them as well. It means something to those creators. When it comes to open source, we are all in this together. Sharing, contributing, and giving a star 🌟 helps.
Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.
Google's new model, Gemma 4 12B Unified, supports image, audio and 256K context.
You can run and train the model via Unsloth Studio.
GGUF: huggingface.co/unsloth/gemma-…
Guide: unsloth.ai/docs/models/ge…
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
Microsoft has released MAI-Transcribe-1.5: an exceptionally fast speech transcription model at a speed factor of ~276x, while still achieving 2.4% on AA-WER (#3), leading the accuracy-speed Pareto frontier
MAI-Transcribe-1.5 is Microsoft AI (MAI)’s latest speech transcription model, coming in at 3rd overall on the Artificial Analysis Word Error Rate (AA-WER) leaderboard, behind Alibaba’s Fun-Realtime-ASR-preview (1.7% WER) and ElevenLabs Scribe v2 (2.2% WER). The model stands out as the fastest STT model in the top 10 for accuracy, processing audio at ~276x real-time, which is more than double the speed of the second-fastest model in the top 10 for accuracy.
The new model supports keyword biasing (improved recognition of rarer vocabulary such as names and medical terminology), in addition to support for 43 languages including English, French, Arabic, Japanese, and Chinese.
See more details below ⬇️
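For readers unfamiliar with the two numbers, here is a minimal sketch of what they measure. It uses the textbook word error rate (word-level edit distance divided by reference length) and a plain speed-factor division. It does not reproduce Artificial Analysis's AA-WER methodology (datasets, text normalization, aggregation), and the function names and sample sentences are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def transcription_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Processing time implied by a real-time speed factor."""
    return audio_seconds / speed_factor


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
one_hour = transcription_seconds(3600, 276)

print(f"WER: {wer:.1%}")
print(f"1 hour of audio at 276x: {one_hour:.1f} s")
```

At a speed factor of ~276x, an hour of audio would take about 13 seconds to transcribe.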
Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B — the #1 trending model on HuggingFace — as its eyes, and the two small models get it done together.
(The test: place each element at the right pixel position on a blank form image, not type into a field.)
Setup:
> Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool).
> I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height.
> The blue boxes on the screen are its detections. Look how tight they are — it nails every field.
Result:
> Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct.
> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas.
> Character-box alignment still a touch loose, but every value is where it belongs.
> 9m10s, 224.5k input, 24.3k output, 21 turns.
Why it matters:
> Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing (locate), and suddenly it can.
> A combination of small models can do the work of a single large one.
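The brain-plus-eyes setup described above can be sketched as a tool call that returns a bounding box, followed by a step that turns the box into a pixel position for the text. Everything here is hypothetical: `locate`, `Box`, `text_anchor`, `fill_form`, and the hard-coded detections are stand-ins, not the actual LocateAnything-3B or Qwen interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    width: int
    height: int


# Hypothetical stand-in for the helper model's detections on one form image.
FAKE_DETECTIONS = {
    "email": Box(x=120, y=340, width=400, height=32),
    "phone": Box(x=120, y=390, width=400, height=32),
}


def locate(field_label: str) -> Box:
    """Tool exposed to the main model: 'where is the <label> field?'"""
    return FAKE_DETECTIONS[field_label]


def text_anchor(box: Box, padding: int = 4) -> tuple[int, int]:
    """Pixel at which to start writing: left edge plus padding, vertically centered."""
    return (box.x + padding, box.y + box.height // 2)


def fill_form(values: dict[str, str]) -> dict[str, tuple[tuple[int, int], str]]:
    """For each field, ask the locator for its box and pair the anchor with the value."""
    return {label: (text_anchor(locate(label)), value)
            for label, value in values.items()}


placements = fill_form({"email": "a@example.com", "phone": "555-0100"})
print(placements)
```

The main model only decides what goes where. The pixel coordinates come entirely from the locator, which is why a small specialized model is enough to close the gap.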