What if an AI health chatbot told you to stay home when you actually needed emergency care?
In this episode, we put ChatGPT Health under the microscope using a clinician-authored evaluation built to answer a critical question: can an AI safely guide people on whether to go to the ER, visit urgent care, or wait it out at home? The results reveal a troubling pattern. When symptoms fall into the “middle” of the medical spectrum—uncertain but stable—the model often sounds helpful and reasonable. But when the stakes rise and subtle warning signs matter most, its judgment becomes unreliable.
We explore how ChatGPT Health is positioned as a privacy-focused workspace that can read personal medical records, summarize visit notes, and translate complex information into plain language. Those capabilities can be valuable for education and preparation. But triage is a different challenge entirely. It requires causal reasoning, clear thresholds, and a bias toward catching the worst-case scenario before it’s too late.
Two case studies highlight the gap. In an asthma scenario, rising carbon dioxide, low oxygen levels, and poor peak flow all pointed to a severe attack warranting urgent escalation, yet the model rated the situation only moderate. In a diabetes scenario, where distinguishing routine high blood sugar from life-threatening diabetic ketoacidosis turns on subtle clinical details, templated guidance struggled to capture the reality.
The most concerning findings emerged around suicidality. Crisis response protocols are explicit: when someone expresses intent or a plan, escalation and a referral to the 988 crisis line should happen immediately. Yet in several scenarios involving an explicit plan, that referral never appeared, while more ambiguous statements did trigger it. Safety in healthcare can’t be optional or probabilistic.
We break down why large language models tend to gravitate toward the statistical middle, why medicine often lives in the dangerous “long tail,” and what this means for anyone using AI health tools today. AI can help you prepare for care, understand medical information, and ask better questions. But decisions about whether to seek urgent help still demand human judgment—and clear, non-negotiable safety guardrails.
If this conversation resonates, follow the show, share the episode with someone exploring health tech, and leave a quick review with one takeaway that stuck with you. What safety rule would you hard-code into an AI health system?
Reference:
Ashwin Ramaswamy et al. “ChatGPT Health performance in a structured test of triage recommendations.” Nature (2026).
Credits:
Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/