gen‑ai.news

YouTube Shorts Gets AI Remix Feature Powered by Gemini Omni

YouTube has rolled out a remix capability for Shorts that uses Gemini Omni to let viewers transform videos they encounter on the platform. Accessed through a remix icon at the bottom of any eligible Short, the feature offers a "reimagine" option where users can prompt the model to apply a visual style (pixel art, anime, or found-footage horror, for example) or to alter the actual content of the clip by adding background elements, changing costumes, or inserting the user into the scene.

The self-insertion capability is the most technically significant aspect. Users can place themselves into another person's Short, which requires the model to handle both style transfer and compositional editing at the same time. All output is watermarked with SynthID, connecting it to Google's broader provenance infrastructure announced at I/O.

Creator controls are built in: uploaders can toggle whether their videos are eligible for reimagining. This matters particularly for creators who post footage of children or content they do not want stylistically altered or repurposed, though the opt-out mechanism places the burden on creators rather than defaulting to protection.

The feature is a concrete consumer-facing application of Gemini Omni's video capabilities, and it introduces interactive AI video editing to a platform with enormous reach. How YouTube moderates the output will be worth watching as the feature scales, particularly given the potential for the self-insertion tool to be used in misleading or harassing ways.

Read at The Verge →