May 30, 2026

Microsoft readies new MAI voice and image models for Build 2026

Microsoft is reportedly preparing to unveil several new models at its Build 2026 developer conference, all grouped under the MAI branding the company has been building out in recent months. The models include MAI-Image-2.5, a generative image model, MAI-Transcribe-1.5 for speech-to-text tasks, and MAI-Voice-2, which is described as supporting multiple languages.

The MAI line represents Microsoft's effort to develop AI models in-house rather than relying entirely on third-party providers, including its close partner OpenAI. Earlier MAI models were introduced quietly through Azure AI Foundry and Microsoft's API offerings, positioning them as practical tools for enterprise developers rather than consumer-facing products.

A multilingual voice model is particularly notable given the competitive landscape in speech synthesis and real-time translation. Accurate, natural-sounding voice generation across languages remains a difficult problem, and enterprise demand for such capabilities in products like Teams, Copilot, and customer service tooling is significant. MAI-Transcribe-1.5 likewise fits into Microsoft's broader push to improve real-time and asynchronous transcription across its productivity suite.

Build 2026 is shaping up to be a dense event for AI announcements, and the MAI model family will likely be positioned as part of Microsoft's Azure AI platform strategy. Developers using Azure AI Foundry would be a primary audience, with these models potentially available through standard API access shortly after the event. Whether the image model competes directly with offerings like DALL-E or takes a different approach, such as focusing on editing or enterprise document workflows, will not be clear until the official unveiling.

Read at TestingCatalog →

Your next read

June 3, 2026

Google’s Dreambeans, its weirdest-named AI tool to date, will turn your life into a cartoon

Google has introduced Dreambeans, a tool that pulls personal data from your Google account to generate AI-illustrated stories in a cartoon style. The feature represents a notable step toward using ambient personal data (photos, calendar events, and similar account content) as direct source material for generative image output. It is, by most measures, one of the more unusually named products Google has shipped.

June 3, 2026

A British MP is suing to see if xAI is legally responsible for the images Grok produces

A British MP has filed a lawsuit against xAI to establish whether the company bears legal responsibility for images generated by its Grok AI system. The case is part of a broader wave of scrutiny that includes investigations in the EU, the UK, and California. At issue is how far platform liability extends when an AI image generator produces harmful or problematic content.

June 3, 2026

Ideogram 4.0 drops as an open-weight model with native 2K resolution and improved text rendering

Ideogram has released version 4.0 of its text-to-image model as an open-weight release, featuring native 2K resolution output, bounding box layout control, and refined text rendering. On the DesignArena leaderboard, it leads all open models, sitting just below closed systems from OpenAI and Google. Commercial use of the weights requires a paid license.
