OpenAI Introduces HealthBench

OpenAI has unveiled HealthBench, a major initiative focused on improving how artificial intelligence responds to health-related questions. This open-source dataset is the company’s first major step into health care beyond external collaborations. HealthBench is designed to rigorously assess AI-generated medical advice and boost accuracy in digital health interactions.

HealthBench Sets New Benchmarks in Medical AI Testing

Developed with input from 262 physicians across 60 countries, HealthBench includes 5,000 simulated conversations. These scenarios test how effectively AI models handle medical inquiries. Each response is reviewed using rubrics written by physicians, ensuring evaluations are grounded in clinical judgment.

The dataset spans 26 medical specialties, from ophthalmology to neurology, and supports 49 languages, including Amharic and Nepali. Such diversity enables a realistic, global measure of AI performance in health care contexts.

HealthBench doesn’t just test language understanding — it pushes models to demonstrate real-world clinical reasoning. In one test case, an AI is asked how to respond when a 70-year-old is found unresponsive. The model advises calling emergency services, checking for breathing, and clearing airways. This response receives a 77% score after evaluation.

Also read: Google Launches AI Futures Fund to Back Promising Startups

GPT-4.1 Leads in Model Performance

OpenAI’s internal GPT-4.1 model, referred to as the o3 reasoning model, topped the performance charts. It scored 60%, outpacing Grok by xAI (54%) and Google’s Gemini 2.5 Pro (52%). All responses were scored automatically by GPT-4.1 using medically weighted rubrics.

This comparative data provides a clear look at how different AI systems perform under medical pressure. By setting a high-quality baseline, HealthBench can help guide future improvements in AI for health.

Beyond Health: OpenAI Expands ChatGPT’s Shopping Tools

Alongside HealthBench, OpenAI also enhanced ChatGPT’s web search feature. The tool now offers personalised product recommendations and is free for all users. This move strengthens OpenAI’s search capabilities, bringing it into more direct competition with platforms like Google.

With AI advancing in both shopping and health care, OpenAI is working to ensure its tools are as helpful and reliable as possible.

Related Topics

AI in Healthcare