Rafay Systems Launches Serverless Inference for AI Applications

Rafay Systems, a leader in cloud-native and AI infrastructure orchestration, has officially launched its Serverless Inference offering, providing a token-metered API for running open-source, privately trained, or tuned large language models (LLMs).
The offering enables NVIDIA Cloud Providers (NCPs) and GPU Clouds to deliver multi-tenant, self-service consumption of AI applications and compute, eliminating the complexity of managing GPU-based infrastructure.
Empowering GPU Cloud Providers in the GenAI Market
Rafay’s Serverless Inference allows NCPs and GPU Clouds to tap into the growing global AI inference market, expected to reach $106 billion by 2025 and $254 billion by 2030.
With this solution, these providers can now rapidly launch GenAI models, manage infrastructure seamlessly, and generate billing data for on-demand usage, all without additional cost. By automating provisioning, segmentation, and governance of complex infrastructure, Rafay is enabling its partners to provide a turnkey service for enterprises and developers, accelerating the build and scale of AI applications.
A Seamless and Scalable AI Solution
Rafay’s offering delivers key capabilities tailored for NVIDIA Cloud Providers (NCPs) and GPU Clouds, streamlining AI infrastructure management and integration. It supports developer workflows through OpenAI-compatible APIs, enabling zero-code migration and secure, RESTful endpoints that simplify AI application deployment.
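As a rough sketch of what "OpenAI-compatible" means in practice, the snippet below builds a standard chat-completions request that any OpenAI-style client could send to such an endpoint; the base URL, API token, and model name are illustrative placeholders, not Rafay's actual values:

```python
import json

# Placeholder values -- not Rafay's actual endpoint, token, or model names.
BASE_URL = "https://inference.example-gpu-cloud.com/v1"
API_TOKEN = "YOUR_API_TOKEN"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-compatible chat-completions request.

    Because the schema matches the OpenAI API, existing clients can be
    pointed at the serverless endpoint by changing only the base URL
    and token -- the "zero-code migration" described above.
    """
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_TOKEN}",  # token authentication
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_chat_request("llama-3-8b-instruct", "Summarize serverless inference.")
```

Since the wire format is unchanged, off-the-shelf OpenAI SDKs that accept a custom base URL should work against such an endpoint without application rewrites.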
Its intelligent infrastructure features dynamic auto-scaling of GPU nodes, optimizing performance and resource use without over-provisioning. Rafay also enables detailed metering and billing through token- and time-based tracking, allowing for transparent, consumption-based pricing.
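To illustrate how token- and time-based tracking can combine into a single consumption-based charge, here is a minimal sketch; the rate constants are invented for the example and do not reflect any provider's actual pricing:

```python
from dataclasses import dataclass

# Illustrative rates only -- real pricing is set by the provider.
PRICE_PER_1K_INPUT_TOKENS = 0.0005   # USD per 1,000 input tokens
PRICE_PER_1K_OUTPUT_TOKENS = 0.0015  # USD per 1,000 output tokens
PRICE_PER_GPU_SECOND = 0.0002        # USD, the time-based component

@dataclass
class UsageRecord:
    input_tokens: int
    output_tokens: int
    gpu_seconds: float

def bill(usage: UsageRecord) -> float:
    """Combine token-based and time-based metering into one charge."""
    token_cost = (usage.input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
               + (usage.output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    time_cost = usage.gpu_seconds * PRICE_PER_GPU_SECOND
    return round(token_cost + time_cost, 6)

# e.g. a request that consumed 2,000 input tokens, 500 output tokens,
# and 1.5 GPU-seconds of compute
charge = bill(UsageRecord(input_tokens=2000, output_tokens=500, gpu_seconds=1.5))
```

Emitting a per-request record like this is what makes transparent, on-demand billing possible: the provider can aggregate records per tenant and invoice only for what was actually consumed.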
Enterprise-grade security is ensured with HTTPS-only endpoints, token authentication, and customizable quotas, meeting stringent compliance needs. Additionally, comprehensive performance monitoring provides full observability into model behavior and infrastructure health through detailed logs.
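A customizable quota of the kind mentioned above might be enforced with a per-tenant check like the following sketch; the tenant names and limits are hypothetical:

```python
# Hypothetical per-tenant token quotas for one billing window.
quotas = {"tenant-a": 10_000}  # tokens allowed
used = {"tenant-a": 9_800}     # tokens already consumed

def admit(tenant: str, requested_tokens: int) -> bool:
    """Reject any request that would push the tenant over its quota."""
    remaining = quotas.get(tenant, 0) - used.get(tenant, 0)
    return requested_tokens <= remaining

ok = admit("tenant-a", 150)       # fits within the remaining 200 tokens
blocked = admit("tenant-a", 500)  # would exceed the quota
```

Gating requests this way, alongside HTTPS-only transport and token authentication, is one straightforward way to keep tenants isolated and within their contracted limits.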
A Shift to AI-as-a-Service
Rafay’s Serverless Inference offering enables a significant shift from GPU-as-a-Service to AI-as-a-Service. By providing the infrastructure needed for fast and scalable GenAI workflows, Rafay is paving the way for NCPs and GPU Clouds to meet the rising demand for AI applications without the need for complex infrastructure management.
The Serverless Inference solution is available for Rafay customers and partners, and fine-tuning capabilities will be rolled out soon, further enhancing the service’s ability to deliver high-margin, production-ready AI services.