Stability AI and Arm Launch Lightweight Text-to-Audio Model for Edge Devices

Stability AI and Arm launch Stable Audio Open Small, a compact text-to-audio model optimized for real-time sound generation on edge and mobile devices. Image credit: X | @StabilityAI

Stability AI has unveiled a new open-source text-to-audio model, developed in partnership with Arm and designed specifically for edge and mobile applications. Called Stable Audio Open Small, the compact artificial intelligence (AI) model quickly generates short audio clips from text prompts.

Lightweight and Optimized for Performance

Announced at Mobile World Congress 2025, Stable Audio Open Small is built for speed and efficiency. According to Stability AI, the model runs entirely on Arm CPUs and generates up to 11 seconds of high-quality audio in less than eight seconds. That makes it well suited to fast-turnaround creative tasks such as drum loops, instrument riffs, and ambient textures.
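As a rough check on the "real-time" framing, the two figures quoted above can be combined into a real-time factor: seconds of audio produced per second of compute. The sketch below is illustrative arithmetic only. The function name is hypothetical, and the eight-second figure is treated as an upper bound, so the factor on a given device could be higher.

```python
def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Return seconds of audio produced per second of generation time.

    A value above 1.0 means audio is generated faster than it plays back.
    """
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# Figures quoted in the article: up to 11 s of audio in under 8 s.
rtf = real_time_factor(audio_seconds=11.0, generation_seconds=8.0)
print(f"Real-time factor: at least {rtf:.3f}x")  # Real-time factor: at least 1.375x
```

A factor above 1.0 means a clip finishes generating before it would finish playing, which is the sense in which the quoted numbers support real-time use.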

With just 341 million parameters, the model is small enough to run on smartphones and edge devices, allowing developers and creators to bring real-time audio synthesis to apps and embedded systems.
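To put the parameter count in perspective, the storage needed for the weights alone can be estimated by multiplying the number of parameters by the bytes used per parameter. The sketch below is a back-of-the-envelope estimate, not a published figure: the article does not state which numeric precision is used on device or which components the 341 million count covers, so the precisions listed are assumptions, and activations and runtime overhead are ignored.

```python
PARAMETER_COUNT = 341_000_000  # parameter count quoted in the article

# Assumed storage precisions; the article does not say which one is deployed.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "int8": 1}


def weight_footprint_mb(parameter_count: int, bytes_per_parameter: int) -> float:
    """Estimate weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return parameter_count * bytes_per_parameter / 1_000_000


for precision, nbytes in BYTES_PER_PARAMETER.items():
    size_mb = weight_footprint_mb(PARAMETER_COUNT, nbytes)
    print(f"{precision}: ~{size_mb:.0f} MB")
# float32: ~1364 MB
# float16: ~682 MB
# int8: ~341 MB
```

Under these assumptions, reduced-precision weights would occupy well under 1 GB, which is consistent with the claim that the model fits on smartphones and edge devices.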


Built on Proven Architecture

The model uses a transformer-based latent diffusion architecture and was trained on 486,492 licensed audio samples. Stability AI used a T5 model for text conditioning and sped up generation with an Adversarial Relativistic-Contrastive (ARC) post-training algorithm.

According to the company, this combination improves the model’s ability to follow prompts while keeping inference times short. Better prompt adherence is meant to give creators more accurate results with fewer retries.

Free and Open-Source for All Users

Stable Audio Open Small is available under the Stability AI Community License, which permits both commercial and non-commercial use, subject to the license's terms. Developers can download the model weights from Hugging Face and access the code base on GitHub.

A Step Toward Real-Time Creativity

By shrinking the model size without sacrificing capability, Stability AI and Arm are enabling a new class of AI-powered creative tools. Whether building music apps or embedded sound engines, developers now have access to a powerful text-to-audio model optimized for real-time performance.

This launch reinforces Stability AI’s commitment to open innovation and pushes the boundaries of what’s possible with on-device AI audio generation.
