Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution
Orthrus-Qwen3, released on GitHub by developer chiennv2000, is an open-source inference engine built specifically for the Qwen3 family of language models. Its key innovation is a speculative decoding approach in which a custom draft model predicts multiple tokens per forward pass, yielding up to 7.8× as many tokens per forward pass as standard autoregressive decoding, which produces one token per pass. The output distribution is identical to the original model's, so the gain comes with no loss in quality or accuracy. The engine targets developers who need faster Qwen3 inference in production or research settings, and the repository includes benchmarks and usage instructions.
Developers can run Qwen3 models significantly faster without sacrificing output quality.
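
To make the "identical output distribution" claim concrete, here is a minimal sketch of the standard speculative sampling accept/reject rule, which is one well-known way to get that guarantee. It is not taken from the Orthrus-Qwen3 repository, whose internals are not described here; the function name `speculative_step` and the toy distributions `p` (target model) and `q` (draft model) are illustrative assumptions.

```python
import random


def speculative_step(p, q, rng):
    """Draw one token that is distributed exactly according to p.

    p: target-model probabilities over the vocabulary (hypothetical toy list)
    q: draft-model probabilities over the same vocabulary (hypothetical toy list)
    Returns (token, accepted), where accepted says whether the draft's
    proposal was kept.
    """
    vocab = range(len(p))
    # The draft model proposes a token sampled from q.
    x = rng.choices(vocab, weights=q)[0]
    # Keep the proposal with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # Otherwise resample from the residual max(0, p - q).
    # rng.choices normalizes the weights, so no explicit division is needed.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual)[0], False


def empirical_distribution(p, q, n, seed=0):
    """Run speculative_step n times and return the observed token frequencies."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(n):
        token, _accepted = speculative_step(p, q, rng)
        counts[token] += 1
    return [c / n for c in counts]
```

Calling `empirical_distribution([0.5, 0.3, 0.2], [0.2, 0.2, 0.6], 100_000)` should give frequencies close to `p` even though the draft `q` is quite different. In this scheme a poor draft lowers the acceptance rate, and with it the tokens gained per forward pass, but not the quality of the output.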