nvidia-cosmos/cosmos-predict1

Important

Cosmos 3 is NVIDIA's next-generation foundation model platform for Physical AI. Compared with Cosmos-Predict1, Cosmos 3 delivers significantly stronger world prediction capabilities, producing more accurate, coherent, and physically grounded future-state predictions across a wide range of environments and embodiments.

Beyond improving prediction quality, Cosmos 3 unifies capabilities that previously required multiple specialized models. A single Cosmos 3 model can reason, predict future world states, transfer across domains and modalities, and generate actions and policies for embodied agents within one unified architecture.

This repository is no longer under active development and will receive only limited maintenance updates. Future model releases, features, documentation, and community support will be focused on Cosmos 3.

👉 Visit the new Cosmos home: https://github.com/NVIDIA/Cosmos

There you will find the latest Cosmos 3 models, technical reports, tutorials, benchmarks, and ecosystem updates.

Thank you for your support of Cosmos-Predict1. We encourage all users to migrate to Cosmos 3 for the latest state-of-the-art Physical AI capabilities.

Cosmos-Predict1 is a key branch of Cosmos World Foundation Models (WFMs) specialized for future state prediction; such models are often referred to as world models. The three main branches of Cosmos WFMs are cosmos-predict, cosmos-transfer, and cosmos-reason. We visualize the architecture of Cosmos-Predict1 in the following figure.

Cosmos-Predict1 Architecture Diagram

Cosmos-Predict1 includes the following:

  • Diffusion-based world foundation models for Text2World and Video2World generation, where a user can generate visual simulations from text prompts and video prompts.
  • Autoregressive-based world foundation models for Video2World generation, where a user can generate visual simulations from video prompts and optional text prompts.
  • Image and video tokenizers that efficiently and effectively tokenize videos into continuous tokens (latent vectors) and discrete tokens (integers).
  • Post-training scripts that help Physical AI builders post-train pre-trained Cosmos-Predict1 models for their applications.
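To make the distinction between continuous and discrete tokens concrete, here is a minimal toy sketch in plain Python. It is not the Cosmos tokenizer API, and none of the names below come from this repository: `encode_continuous`, `quantize`, the projection matrix, and the codebook are all hypothetical. The nearest-codebook lookup is only a stand-in for whatever quantization scheme the actual tokenizers use.

```python
# Toy illustration only: NOT the Cosmos-Predict1 tokenizer API.
# All names and values here are hypothetical.

def encode_continuous(patch, weights):
    """Project a flattened patch to a latent vector (a continuous token)."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]

def quantize(latent, codebook):
    """Return the index of the nearest codebook entry (a discrete token)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(latent, codebook[i]))

# Hypothetical 4-value patch, 2x4 projection matrix, and 3-entry codebook.
patch = [0.0, 1.0, 1.0, 0.0]
weights = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
codebook = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]

latent = encode_continuous(patch, weights)  # continuous token: list of floats
token_id = quantize(latent, codebook)       # discrete token: a single integer
print(latent, token_id)
```

In this sketch the continuous token keeps real-valued detail, which is the kind of representation a diffusion model operates on, while the discrete token is an integer that an autoregressive model can predict like a vocabulary entry.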

News

Example Model Behavior

Cosmos-Predict Text2World

428228630-b001966c-5f5e-4927-a3fe-44d142dd0ab1.mp4

Cosmos-Predict Video2World

428228629-0bbba982-c6fd-4388-a46f-bf91ce4099ad.mp4

Getting Started

We provide a comprehensive set of examples that illustrate how to perform inference, post-training, etc., with Cosmos-Predict1. Click a relevant example below and start your Cosmos journey.

Installation

Please refer to INSTALL.md for general instructions on environment setup.

Inference with pre-trained Cosmos-Predict1 models

Post-train pre-trained Cosmos-Predict1 models

Inference with post-trained models

Cosmos-Predict1 Models

Cosmos-Predict1 includes the following models:

Diffusion models

Autoregressive models

Tokenizers

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

This model includes safety and content moderation features powered by Llama Guard 3. Llama Guard 3 is used solely as a content input filter and is subject to its own license.

NVIDIA Cosmos source code is released under the Apache 2 License.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.

About

Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
