Hacker News · Sunday · May 17, 2026

Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution

qwen3 · inference · speculative-decoding · open-source

Orthrus-Qwen3, released on GitHub by developer chiennv2000, is an open-source inference engine built specifically for the Qwen3 family of language models. Its core technique is speculative decoding: a custom draft model predicts several tokens ahead, so a single forward pass can yield more than one token. The project reports up to 7.8× as many tokens per forward pass as standard autoregressive decoding, which produces one token per pass. The output distribution is described as identical to that of the original model, so the gain does not come at the cost of quality or accuracy. The engine targets developers who need faster Qwen3 inference in production or research settings, and the repository includes benchmarks and usage instructions.
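To make the "identical output distribution" claim concrete, here is a minimal sketch of the accept/reject rule used in standard speculative sampling. It is not taken from the Orthrus-Qwen3 repository, and the project may implement verification differently. The function names (`verify_token`, `sample_via_draft`, `empirical_distribution`) and the toy three-token distributions are assumptions chosen for illustration only.

```python
import random


def verify_token(p_target, q_draft, drafted, rng):
    """Accept or replace one drafted token (generic speculative sampling rule).

    p_target and q_draft are probability lists over the same vocabulary.
    Returns (token, accepted).
    """
    p = p_target[drafted]
    q = q_draft[drafted]  # > 0, because the token was sampled from q_draft
    # Accept the drafted token with probability min(1, p / q).
    if rng.random() < min(1.0, p / q):
        return drafted, True
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = [max(pt - qd, 0.0) for pt, qd in zip(p_target, q_draft)]
    token = rng.choices(range(len(p_target)), weights=residual)[0]
    return token, False


def sample_via_draft(p_target, q_draft, rng):
    """Draw a token from the draft distribution, then verify it against the target."""
    drafted = rng.choices(range(len(q_draft)), weights=q_draft)[0]
    token, _ = verify_token(p_target, q_draft, drafted, rng)
    return token


def empirical_distribution(p_target, q_draft, n, seed=0):
    """Estimate the output distribution of the draft-then-verify procedure."""
    rng = random.Random(seed)
    counts = [0] * len(p_target)
    for _ in range(n):
        counts[sample_via_draft(p_target, q_draft, rng)] += 1
    return [c / n for c in counts]


if __name__ == "__main__":
    target = [0.5, 0.3, 0.2]  # toy target-model distribution (hypothetical)
    draft = [0.2, 0.2, 0.6]   # deliberately poor draft distribution (hypothetical)
    print(empirical_distribution(target, draft, 100_000))
```

Even with a draft distribution that disagrees strongly with the target, the estimated frequencies converge to the target distribution. A worse draft only lowers the acceptance rate, which is why draft quality affects tokens per forward pass but not output quality.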

// why it matters

Developers can run Qwen3 models significantly faster without sacrificing output quality.

Sources

Primary · Hacker News

▸ Read original at github.com
