Liquid AI reveals 8B-A1B MoE trained on 38T tokens
Liquid AI unveiled LFM 2.5-8B-A1B, a mixture-of-experts (MoE) model with 8 billion total parameters, of which only 1 billion are active for any given token. Trained on 38 trillion tokens, it outperforms comparable models such as Gemma 2 9B and Llama 3.1 8B on benchmarks including MMLU-Pro (68.4%), HumanEval (72.0%), and GSM8K (89.6%), according to Liquid AI's reported results. The architecture uses 32 experts with top-2 routing, so each token is processed by only 2 of the 32 experts; this sparsity is what keeps inference efficient. The model is available on Hugging Face under a permissive license and supports context lengths of up to 32K tokens. Liquid AI claims 2.5x higher throughput than dense models of similar size. The release includes both base and instruction-tuned versions, with the latter optimized for chat and coding tasks.
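The top-2 routing described above can be sketched as a generic top-k gate: a router scores all 32 experts for each token, only the 2 highest-scoring experts run, and their outputs are mixed with softmax weights. This is a minimal illustration under stated assumptions, not Liquid AI's implementation. The toy scalar experts, the random router logits, and the function names `top_k_route` and `moe_forward` are hypothetical, and renormalizing the softmax over only the selected experts is one common convention that the release does not specify.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def top_k_route(logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Assumption: gate weights are renormalized over the selected experts only.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits (ties broken by lower index).
    top = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    # Softmax over the selected logits; subtract the max for numerical stability.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(
    x: float,
    logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Run only the routed experts and combine their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(logits, k))


# Usage with hypothetical toy experts: each one scales its input by (index + 1)
# and records that it was called, so we can see how many experts actually ran.
NUM_EXPERTS = 32  # expert count as reported for the model
calls: List[int] = []


def make_expert(i: int) -> Callable[[float], float]:
    def expert(x: float) -> float:
        calls.append(i)
        return (i + 1) * x
    return expert


experts = [make_expert(i) for i in range(NUM_EXPERTS)]
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # hypothetical router scores
y = moe_forward(1.0, logits, experts, k=2)
print(f"experts run: {sorted(calls)} of {NUM_EXPERTS}, output {y:.3f}")
```

Because only the selected experts' forward passes execute, per-token expert compute scales with k rather than with the total number of experts, which is the property behind an 8B-total, 1B-active split. Typical MoE models also keep shared components such as attention and embeddings active for every token, so the active share of parameters is not simply k divided by the expert count.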
Developers get an efficient MoE model that rivals dense models of similar or larger size at a fraction of the per-token compute cost.