MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding
MiniMax has released M3, a new foundation model that introduces the MiniMax Sparse Attention (MSA) architecture as its core technical contribution. MSA is designed to reduce the computational cost typically associated with long-context processing, allowing the model to operate over context windows of up to one million tokens without the quadratic scaling problems that standard attention mechanisms encounter at that length. For tasks involving long documents, extended conversations, or large codebases, this is a meaningful practical improvement.
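To make the scaling argument concrete, the short Python sketch below compares how many query-key pairs dense attention scores at one million tokens with a sparse scheme in which each query attends to a fixed number of keys. The article does not describe how MSA selects keys, so the fixed budget of 4,096 keys per query and the helper names dense_pairs and sparse_pairs are illustrative assumptions, not details of MiniMax's design.

```python
def dense_pairs(n: int) -> int:
    """Query-key pairs scored by full (dense) self-attention over n tokens."""
    return n * n


def sparse_pairs(n: int, keys_per_query: int) -> int:
    """Pairs scored when each query attends to at most keys_per_query keys.

    This is a generic fixed-budget model of sparse attention, assumed here
    for illustration; it is not a description of MSA itself.
    """
    return n * min(n, keys_per_query)


context_length = 1_000_000   # the 1M-token window cited in the article
assumed_budget = 4_096       # hypothetical keys-per-query budget

dense = dense_pairs(context_length)
sparse = sparse_pairs(context_length, assumed_budget)

print(f"dense attention pairs:  {dense:,}")
print(f"sparse attention pairs: {sparse:,}")
print(f"reduction factor:       {dense / sparse:,.1f}x")
```

Under this assumption the dense cost grows with the square of the context length while the sparse cost grows linearly, which is the kind of gap a sparse attention design aims to exploit.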
The model is natively multimodal, meaning image and video understanding are built into the architecture rather than added through separate modules or adapter layers. This approach tends to produce more coherent cross-modal reasoning than systems where vision capabilities are bolted on after the fact. MiniMax M3 also supports computer use, the ability to interpret and interact with graphical interfaces, which has become a point of competition among frontier model developers over the past year.
Agentic coding is another stated focus of M3. This goes beyond standard code completion: the model takes multi-step actions within a development environment, such as running code, reading output, and iterating toward a goal. The combination of a long context window and agentic coding support makes M3 particularly relevant for software development workflows where a model needs to hold large amounts of code or documentation in view simultaneously.
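The run, read, and iterate cycle described above can be sketched as a small Python loop. This is a minimal illustration under stated assumptions, not M3's actual interface: agent_loop, run_python, and scripted_model are hypothetical names, and scripted_model is a hard-coded stand-in for a real model call.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_python(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run code in a fresh interpreter; return (succeeded, combined output)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.stdout + result.stderr


def agent_loop(
    propose: Callable[[str, str], str],
    goal: str,
    max_steps: int = 5,
) -> Optional[str]:
    """Ask for code, run it, and feed the output back until it succeeds."""
    feedback = ""
    for _ in range(max_steps):
        code = propose(goal, feedback)   # model proposes a candidate
        ok, output = run_python(code)    # run the code
        if ok:
            return code                  # goal reached: the code ran cleanly
        feedback = output                # read the output and iterate
    return None                          # gave up after max_steps attempts


def scripted_model(goal: str, feedback: str) -> str:
    """Hypothetical stand-in for a model: fixes its typo after a NameError."""
    if "NameError" in feedback:
        return "total = sum(range(10))\nprint(total)"
    return "print(totl)"  # first attempt contains a deliberate bug


final_code = agent_loop(scripted_model, "print the sum of 0..9")
print(final_code)
```

A real agent would replace scripted_model with a call to a model and would typically run the candidate code in a sandbox rather than directly on the host machine.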
MiniMax is a Chinese AI company known for its media generation products, including the Hailuo video generation system, and it has also released earlier language models such as MiniMax-Text-01 and MiniMax-M1. M3 extends that general-purpose language and reasoning line alongside the company's existing media generation work. The release adds another capable long-context model to a space that includes offerings from Anthropic, Google, and others. The MSA architecture is worth watching as a potential approach to making million-token context practical at inference time rather than just theoretically supported.

