May 23, 2026 | Video

Google Genie 3 Turns Street View Pins Into Walkable AI-Generated Worlds

Google DeepMind has demonstrated a new application of its Genie 3 world model that uses Street View imagery as a generation source. A user selects any location on a map, and Genie 3 produces a walkable, three-dimensional environment derived from the real-world visual data at that point. The result is explorable rather than static, distinguishing it from simple image generation or photogrammetry.

Genie 3 is a world model, a class of model trained to predict how environments change in response to actions rather than simply to generate images or video. Grounding it in Street View gives it access to a corpus of real-place imagery with consistent geographic and structural properties that a purely synthetic training set would lack. For the demo use case of explorable environments, that grounding means generated worlds reflect actual urban geometry, vegetation patterns, and lighting conditions rather than stylised approximations.

The creative applications are the obvious immediate interest: location scouting, game world generation, and interactive storytelling all have clear use for a tool that can produce explorable versions of real places. But DeepMind's framing points elsewhere. World models trained on real-place data are directly useful for training AI agents and robots that need to navigate physical environments. Street View's global coverage, accumulated over more than a decade, becomes a strategic asset for that kind of embodied AI training.

The combination of a generative world model with a proprietary geographic dataset is a meaningful competitive advantage that would be difficult for smaller labs to replicate. How far Google develops the consumer-facing creative applications versus prioritising the robotics and agent training angle will shape how this technology is perceived over the next year.

Read at The Decoder →


