Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action
NVIDIA has unveiled Cosmos 3, described as the first open omni-model for physical AI: a model that can both reason about the physical world and take actions in it. Cosmos 3 integrates vision, language, and action modalities, so it can process visual input, follow natural-language instructions, and generate motor commands for robots or autonomous systems. Physical AI systems have traditionally required a separate model for each of perception, reasoning, and control; by handling all three in a single model, Cosmos 3 simplifies that pipeline. The model is available on Hugging Face under an open license, and the release includes pre-trained weights, inference code, and documentation, so developers can fine-tune it for specific tasks such as robotic manipulation, navigation, or autonomous driving. Cosmos 3 builds on NVIDIA's previous work in foundation models and aims to accelerate research in embodied AI. NVIDIA emphasizes that the model is designed for safe and responsible use and provides guidelines for deployment.
Developers can now build physical AI systems with a single open model instead of multiple specialized ones.
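To make the contrast between a multi-model pipeline and a single omni-model concrete, here is a minimal Python sketch. It is purely illustrative and does not use the Cosmos 3 API or its released inference code. Every name in it (`Observation`, `Action`, `modular_pipeline`, `OmniPolicy`, `StubOmniModel`, and the stub functions) is hypothetical, and the stubs return fixed placeholder values where a real system would run trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Observation:
    """One timestep of input: a camera frame plus a text instruction."""
    image: List[List[float]]  # placeholder for pixel data
    instruction: str


@dataclass
class Action:
    """A motor command, represented here as a list of joint targets."""
    joint_targets: List[float]


# Traditional layout: three separately built components chained together.
def modular_pipeline(
    perceive: Callable[[List[List[float]]], dict],
    plan: Callable[[dict, str], str],
    control: Callable[[str], Action],
    obs: Observation,
) -> Action:
    scene = perceive(obs.image)              # perception component
    subgoal = plan(scene, obs.instruction)   # reasoning component
    return control(subgoal)                  # control component


# Omni-model layout: one model maps the observation directly to an action.
class OmniPolicy(Protocol):
    def act(self, obs: Observation) -> Action: ...


def omni_pipeline(model: OmniPolicy, obs: Observation) -> Action:
    return model.act(obs)


# Hypothetical stand-ins so the sketch runs; none of these are real models.
def stub_perceive(image: List[List[float]]) -> dict:
    return {"objects": ["cup"]}


def stub_plan(scene: dict, instruction: str) -> str:
    return f"reach:{scene['objects'][0]}"


def stub_control(subgoal: str) -> Action:
    return Action(joint_targets=[0.0, 0.0, 0.0])


class StubOmniModel:
    """Hypothetical placeholder; NOT the Cosmos 3 interface."""

    def act(self, obs: Observation) -> Action:
        return Action(joint_targets=[0.0, 0.0, 0.0])


if __name__ == "__main__":
    obs = Observation(image=[[0.0]], instruction="pick up the cup")
    print(modular_pipeline(stub_perceive, stub_plan, stub_control, obs))
    print(omni_pipeline(StubOmniModel(), obs))
```

In the modular version, each hand-off (scene description, subgoal) is an interface that has to be designed and kept consistent across three components. In the omni-model version, the caller only deals with observations going in and actions coming out.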