
Microsoft Launches Mu for On-Device Windows AI

By Neelima N M
2025-06-24
Microsoft introduces Mu, a compact and efficient AI model optimized for on-device performance, now enhancing natural language interactions in Windows Settings.

Microsoft has unveiled Mu, a micro-sized language model designed to enhance the user experience through seamless AI-powered integration in Windows Settings.

The model is built to handle complex input-output relationships, making it ideal for applications that require efficient performance and real-time responses, such as mapping natural language queries to system settings.

Available to Windows Insiders in the Dev Channel on Copilot+ PCs, Mu powers the agent in Windows Settings, providing an intuitive natural-language interface.

Mu's Design: Optimized for Edge Devices

Mu is a 330-million-parameter encoder-decoder model optimized for running on Neural Processing Units (NPUs) found in Copilot+ PCs. It leverages a transformer architecture, where the encoder processes the input into a fixed-length representation, and the decoder generates output tokens based on that representation.

This design approach allows Mu to significantly reduce computational overhead by performing a one-time encoding, which reduces memory and computation requirements compared to traditional decoder-only models.
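The contrast between a one-time encoding and a decoder-only model's growing context can be sketched with a toy example. This is purely illustrative (placeholder functions, not Mu's actual code): the point is that the encoder runs exactly once per request, and every decode step reuses that fixed-length "memory" instead of re-attending over an ever-longer prompt.

```python
# Toy sketch (not Microsoft's code) of why one-time encoding saves work:
# the encoder runs once per request, and each decode step only reads the
# fixed encoder output rather than re-processing the full prompt.

def encode(prompt_tokens):
    """Stand-in encoder: one pass over the input, returns a fixed 'memory'."""
    return [hash(t) % 97 for t in prompt_tokens]  # placeholder representation

def decode_step(memory, generated):
    """Stand-in decoder step: reads the cached memory, emits one token."""
    return (sum(memory) + len(generated)) % 97    # placeholder next token

def generate(prompt_tokens, n_tokens):
    memory = encode(prompt_tokens)                # encoder runs exactly once
    out = []
    for _ in range(n_tokens):
        out.append(decode_step(memory, out))      # reuses the cached encoding
    return out

print(generate(["set", "brightness", "to", "80%"], 3))
```

In a decoder-only model, by contrast, every generated token extends the context that subsequent steps must attend over, which is what drives up memory and compute for long prompts.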

On hardware such as the Qualcomm Hexagon NPU, Mu achieves a 47% reduction in first-token latency and a 4.7x increase in decoding speed compared with similarly sized decoder-only models, making it well suited to real-time on-device applications.

Tuning for Hardware Efficiency

Mu’s architecture is optimized for NPUs by aligning layer dimensions with tensor sizes and vector units, ensuring efficient computation. It also shares weights between input and output embeddings to save memory and maintain consistency.
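Weight sharing between the input and output embeddings is straightforward to sketch. The snippet below is illustrative (dimensions and data are made up, not Mu's real configuration): one matrix serves both as the token-lookup table on the way in and as the vocabulary projection on the way out, roughly halving embedding memory.

```python
import numpy as np

# Illustrative tied-embedding sketch (not Mu's actual code): a single
# shared matrix maps token ids to vectors on input and scores hidden
# states against the vocabulary on output.

vocab, dim = 1000, 64
embedding = np.random.default_rng(0).standard_normal((vocab, dim)).astype(np.float32)

def embed(token_ids):
    return embedding[token_ids]        # input lookup uses the shared table

def output_logits(hidden):
    return hidden @ embedding.T        # output projection reuses the same weights

h = embed(np.array([1, 2, 3])).mean(axis=0)   # dummy hidden state
logits = output_logits(h)
print(logits.shape)                    # one score per vocabulary entry
```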

By restricting operations to NPU-optimized functions, Mu can fully utilize the acceleration capabilities of the hardware, enabling fast, low-latency inferences for real-world applications.

Mu boosts performance using key transformer upgrades like Dual LayerNorm for stable training, RoPE for better long-context handling, and Grouped-Query Attention to cut memory use and latency, delivering speed and accuracy in a compact model.
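Of those upgrades, Grouped-Query Attention (GQA) is the one most directly responsible for the memory savings: query heads are split into groups that each share a single key/value head, so the KV cache shrinks in proportion. A minimal numpy sketch follows; head counts, dimensions, and random weights here are assumptions for illustration, not Mu's real hyperparameters.

```python
import numpy as np

# Hedged sketch of grouped-query attention: 8 query heads share 2 KV heads,
# so the KV cache is 4x smaller than full multi-head attention.
# All shapes and weights are illustrative, not Mu's configuration.

def gqa(x, n_q_heads=8, n_kv_heads=2, head_dim=16):
    seq, d = x.shape
    rng = np.random.default_rng(0)
    Wq = rng.standard_normal((d, n_q_heads * head_dim))
    Wk = rng.standard_normal((d, n_kv_heads * head_dim))
    Wv = rng.standard_normal((d, n_kv_heads * head_dim))
    q = (x @ Wq).reshape(seq, n_q_heads, head_dim)
    k = (x @ Wk).reshape(seq, n_kv_heads, head_dim)   # fewer K heads
    v = (x @ Wv).reshape(seq, n_kv_heads, head_dim)   # fewer V heads
    group = n_q_heads // n_kv_heads                   # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                               # shared KV head for this group
        scores = q[:, h] @ k[:, kv].T / np.sqrt(head_dim)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)            # softmax over positions
        out[:, h] = w @ v[:, kv]
    return out.reshape(seq, n_q_heads * head_dim)

y = gqa(np.random.default_rng(1).standard_normal((5, 32)))
print(y.shape)  # (5, 128)
```

Because decoding caches only the key/value tensors, cutting KV heads from 8 to 2 cuts that cache (and the bandwidth to read it) by 4x per layer, which is exactly where NPU latency savings come from.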

Training and Fine-Tuning for Optimal Accuracy

Mu was trained on A100 GPUs via Azure Machine Learning using high-quality educational data and distilled from Microsoft’s Phi models for greater efficiency. Fine-tuned for tasks like SQuAD and Windows Settings, Mu delivers strong performance despite its compact size of a few hundred million parameters.

It was also optimized for on-device use through post-training quantization, converting weights and activations to 8- and 16-bit formats. This reduced memory use and boosted speed, achieving over 200 tokens per second, while maintaining accuracy across diverse hardware.
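The core of post-training quantization can be shown in a few lines. The sketch below (illustrative, not Microsoft's actual pipeline) uses simple symmetric 8-bit quantization: each float32 weight tensor is mapped to int8 values plus one scale factor, cutting memory 4x at the cost of small rounding error.

```python
import numpy as np

# Illustrative symmetric int8 post-training quantization (not Microsoft's
# pipeline): weights become int8 plus a single float scale per tensor.

def quantize_int8(w):
    scale = max(np.abs(w).max() / 127.0, 1e-8)        # map max |w| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)
error = np.abs(dequantize(q, scale) - w).max()
print(q.nbytes, w.nbytes)   # 1000 vs 4000 bytes: a 4x reduction
```

Production pipelines typically quantize per-channel and calibrate activation ranges on sample data, but the memory and bandwidth arithmetic is the same.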


Optimizing the Windows Settings Agent

Mu powers the AI agent in Windows Settings, enabling real-time system changes through natural language. Fine-tuned with millions of samples, it delivers fast, accurate responses in under 500 ms.

Mu was trained to interpret diverse and ambiguous user queries, like “Increase brightness,” by prioritizing commonly used settings. It uses semantic search for short inputs and triggers precise actions for complex ones.
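The routing behavior described above can be sketched as a toy dispatcher. Everything below is an assumption for illustration (the setting names, the word-overlap scorer standing in for embedding similarity, and the word-count threshold are invented, not Microsoft's implementation): short, ambiguous queries fall back to semantic search over setting descriptions, while fuller queries trigger a direct action.

```python
# Toy routing sketch (assumed behavior, not Microsoft's implementation):
# short queries -> semantic search over candidates; fuller queries -> action.

SETTINGS = {
    "display.brightness": "increase or decrease screen brightness level",
    "sound.volume": "adjust speaker or headphone volume",
    "bluetooth.toggle": "turn bluetooth on or off",
}

def score(query, description):
    """Word-overlap stand-in for an embedding similarity score."""
    q, d = set(query.lower().split()), set(description.split())
    return len(q & d) / max(len(q), 1)

def handle(query, min_words_for_action=3):
    best = max(SETTINGS, key=lambda s: score(query, SETTINGS[s]))
    if len(query.split()) >= min_words_for_action:
        return ("action", best)     # enough context: execute directly
    return ("search", best)         # short query: surface candidates instead

print(handle("brightness"))                   # ('search', 'display.brightness')
print(handle("increase screen brightness"))   # ('action', 'display.brightness')
```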
