May 23, 2026 | Video

Google's Gemini Omni Brings Any-Input Video Generation to Shorts, Workspace, and Beyond

Announced at Google I/O 2026, Gemini Omni Flash is Google's most capable video-focused generative model to date. It accepts any combination of text, image, audio, and video as input and produces video output, with editing handled through natural-language conversation rather than timeline-based tools. Google says the model is better grounded in physics, historical context, and spatial reasoning than prior generations, which should reduce the physically implausible motion that has made AI video easy to spot.

The model is already wired into YouTube Shorts through a new Remix feature. Users can tap a remix icon beneath any Short and prompt Gemini to restyle the clip (pixel art, anime, and found-footage horror are the demonstrated examples), or to insert themselves and alter actors' appearances within the scene. Creators retain control: they can disable the remix option for their own uploads, and any generated clip carries a visible AI label alongside an embedded SynthID watermark.

Beyond consumer features, Gemini Omni is being integrated into Google Workspace through a new tool called Google Pics, aimed at image creation and editing inside productivity apps. A separate Asset Studio update adds multimodal generation to Google Ads, letting advertisers produce and test creative assets without leaving the campaign dashboard. The breadth of integrations suggests Google is treating Omni less as a standalone product and more as infrastructure spread across its existing surfaces.

The Verge's hands-on testing noted that the tools for producing realistic video require less effort than many expect, which raises the usual questions about misuse alongside the genuine utility. Google's embedding of SynthID at the output level is a direct response to that concern, though how reliably the watermark survives re-encoding and platform compression remains an open question.

Read at The Verge →

Your next read

June 4, 2026 | Video

xAI updates Grok Imagine to 1.5 with image-to-video generation at 720p resolution

xAI has updated its Grok Imagine system to version 1.5, adding an image-to-video model that converts still images into short video clips at up to 720p resolution. The new model accepts text prompts to guide motion and style, and multiple generated clips can be joined into longer sequences.

June 3, 2026 | Video

NVIDIA Releases Cosmos 3: A Two-Tower Mixture-of-Transformers Foundation Model Unifying Physical Reasoning, World Generation, and Action Generation

NVIDIA has released Cosmos 3, an open omnimodal foundation model that combines a vision-language reasoning component with a diffusion-based video generator in a two-tower architecture. The system is designed to support physical AI applications by linking language-grounded reasoning with the generation of plausible world states and robot actions.

June 1, 2026 | Video

Nvidia bets big on physical AI at GTC Taipei with a new world model, driving brain, and open humanoid robot

Nvidia used GTC Taipei to unveil several new tools aimed at physical AI applications, including a new world model, a larger autonomous driving model, and an open reference platform for humanoid robots. The announcements signal a continued push to make simulation and synthetic data central to how robots and vehicles are trained. Here is a closer look at what was shown and why it matters.
