gen‑ai.news

xAI updates Grok Imagine to 1.5 with image-to-video generation at 720p resolution

xAI has released grok-imagine-video-1.5-preview, a new image-to-video model built into its Grok Imagine platform. The model takes a still image as input and, guided by a text prompt, generates video footage at resolutions up to 720p. The preview label suggests the release is still in an early testing phase, with a more polished version likely to follow.

One notable feature is the ability to stitch multiple generated clips together into longer scenes. This kind of multi-clip composition is increasingly common in AI video tools, allowing users to build more complex sequences rather than being limited to a single short output. It also gives creators more control over pacing and narrative without needing to export and edit clips in a separate application.

The image-to-video format, in which a reference image anchors the visual style and content of the resulting footage, has become a standard approach among AI video generators. It gives users a concrete starting point and tends to produce more predictable results than generating video purely from text. Competitors including Runway, Kling, and Pika have offered similar capabilities for some time, so xAI is entering a fairly established segment of the market.

The 720p output resolution is functional for web use but falls short of the 1080p or higher outputs now available from several competing platforms. Whether xAI plans to raise the resolution ceiling in future updates remains to be seen. For now, the 1.5 preview positions Grok Imagine as a more complete creative tool than its earlier text-to-image-only iteration, though it will need continued development to stand alongside the more mature offerings already on the market.
