SANA-WM, a 2.6B open-source world model for 1-minute 720p video
NVIDIA Research has open-sourced SANA-WM, a 2.6-billion-parameter world model that generates one-minute 720p videos. Described in an accompanying paper and available on GitHub, the model uses a diffusion transformer architecture and produces temporally consistent video sequences from text or image prompts. SANA-WM targets autonomous driving, robotics, and video game development, where realistic simulation is crucial. It is released under a permissive license and runs on a single NVIDIA A100 GPU with 80 GB of memory, though inference takes several minutes per video. The release follows NVIDIA's earlier world-model work, such as Cosmos, and aims to broaden access to large-scale video generation for research and development.
Developers can now generate realistic synthetic video for training AI models without costly real-world capture.
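
To put the single-GPU claim in perspective, here is a rough back-of-envelope sketch of how much memory the weights alone would occupy. It uses only the 2.6B parameter count and the 80 GB figure quoted above. The storage precisions (fp32 and bf16) are assumptions for illustration, not details taken from the release, and the helper name `weight_gb` is hypothetical.

```python
# Back-of-envelope estimate of weight memory for a 2.6B-parameter model.
# Assumption: weights are stored in fp32 or bf16; the release may use either
# or something else entirely.

PARAMS = 2.6e9          # parameter count quoted in the announcement
GPU_MEMORY_GB = 80      # A100 memory quoted in the announcement
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}  # assumed storage precisions


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


# Weight footprint for each assumed precision.
weights = {name: weight_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, gb in weights.items():
    share = gb / GPU_MEMORY_GB
    print(f"{name}: {gb:.1f} GB of weights ({share:.1%} of {GPU_MEMORY_GB} GB)")
```

Under these assumptions the weights take up only a small share of the GPU. The estimate does not cover activations or intermediate video representations, which for a minute of 720p output may well account for most of the memory actually used.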