Google’s New Gemini Omni AI Video Model Can Do Crazy Things
Google's new Gemini Omni artificial intelligence (AI) model can do some wild things. The model's key promise is to create anything from, well, anything.

xAI has updated its Grok Imagine system to version 1.5, adding an image-to-video model that converts still images into short video clips at up to 720p resolution. The new model accepts text prompts to guide motion and style, and multiple generated clips can be joined into longer sequences.

NVIDIA has released Cosmos 3, an open omnimodal foundation model that combines a vision-language reasoning component with a diffusion-based video generator in a two-tower architecture. The system is designed to support physical AI applications by linking language-grounded reasoning with the generation of plausible world states and robot actions.

NVIDIA used GTC Taipei to unveil several new tools aimed at physical AI applications, including a new world model, a larger autonomous driving model, and an open reference platform for humanoid robots. The announcements signal a continued push to make simulation and synthetic data central to how robots and vehicles are trained. Here is a closer look at what was shown and why it matters.