Episodes

  • #38 - Using AI Can Make You Look More Guilty In Court
    Apr 2 2026

    What happens when AI spots a dangerous finding on a scan and the radiologist disagrees? In theory, “human in the loop” sounds like the safeguard that keeps patients safe. In practice, it raises a far more uncomfortable question: when clinicians override AI, are they exercising sound judgment or exposing themselves to legal risk?

    We explore how AI image-reading tools are reshaping radiology and why performance metrics like “96% accurate” can be misleading in real clinical settings. False positives and false negatives do not carry the same consequences, and rare diseases can sharply reduce the real-world value of even highly capable models once prevalence and positive predictive value are taken into account. As these systems flag findings on more scans that turn out to be normal, a new form of defensive medicine can emerge—one where repeatedly rejecting AI recommendations begins to feel professionally dangerous, especially when those recommendations are documented in the patient record.
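    To make the prevalence point concrete, here is a toy calculation (a minimal sketch; the 96% figures and 1% prevalence are illustrative assumptions, not numbers from the study discussed in the episode):

    ```python
    # Toy illustration of positive predictive value at low prevalence.
    # All numbers below are assumptions for the example, not from the study.
    sensitivity = 0.96   # P(positive flag | disease present)
    specificity = 0.96   # P(no flag | disease absent)
    prevalence = 0.01    # disease present on 1% of scans

    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)

    # Bayes' rule: P(disease | positive flag)
    ppv = true_positive_rate / (true_positive_rate + false_positive_rate)
    print(f"Positive predictive value: {ppv:.1%}")  # ~19.5%: most flags are false alarms
    ```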

    We also examine a study that placed laypeople in the role of jurors during malpractice scenarios involving missed diagnoses such as brain bleeds and lung cancer. The findings are revealing: when AI detects the pathology and the radiologist does not, jurors are more likely to assign blame. But when both the AI and the radiologist miss the finding, the physician gains little protection. The episode closes with what may actually reduce harm, including better education about the limitations of AI and a clearer understanding of these systems as imperfect clinical decision support—not a flawless second expert beside the clinician.

    Reference:

    Randomized Study of the Impact of AI on Perceived Legal Liability for Radiologists
    Bernstein et al.
    NEJM AI


    Credits:

    Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
    Licensed under Creative Commons: By Attribution 4.0
    https://creativecommons.org/licenses/by/4.0/

    23 min
  • #37 - Training A Neural Network On Toilet Photos
    Mar 26 2026

    What if a single smartphone photo could make colonoscopy prep more reliable? Colonoscopy can save lives through early detection of colorectal cancer, but its success depends on one stubborn detail: a clean colon. When bowel prep falls short, important findings can be missed, procedures can take longer, and patients may have to repeat the entire process. The question is simple but important: could there be an easier way for patients to know whether they are truly ready before heading to the clinic?

    In this episode, we explore research that puts artificial intelligence to work on exactly that problem. Using a smartphone app, patients take a photo of their final bowel movement and receive an immediate yes-or-no result about whether their preparation is adequate. We break down how the system works, from convolutional neural networks and expert clinician labeling to data augmentation that helps the model adapt to real-world conditions like poor lighting, different angles, and varying distances. We also unpack a key challenge in medical AI: overfitting, and why strong performance in a study does not always guarantee success in everyday use.
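    As a rough sketch of what such augmentation can look like in practice (assuming a PyTorch/torchvision pipeline; the study's actual models and parameters are not shown here):

    ```python
    # Minimal sketch of image augmentation for training a CNN classifier.
    # The specific transforms and values are illustrative assumptions.
    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.ColorJitter(brightness=0.4, contrast=0.4),  # simulate poor lighting
        transforms.RandomRotation(degrees=30),                  # simulate different angles
        transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),    # simulate varying distances
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])

    # At evaluation time no random augmentation is applied, so reported performance
    # reflects how the model behaves on unaltered patient photos.
    eval_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])
    ```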

    The potential impact is significant. Patients in the intervention group achieved better bowel cleansing quality, suggesting a practical way to improve the consistency and effectiveness of colorectal cancer screening. At the same time, important questions remain about adenoma detection, repeat procedures, and how tools like this fit into clinical workflow. This is a fascinating example of AI solving a very human problem: reducing friction, improving preparation, and helping patients get the most out of an essential preventive test.

    References:

    An Artificial Intelligence-Guided Strategy to Reduce Poor Bowel Preparation: A Multicenter Randomized Controlled Study
    Gimeno-García et al.
    American Journal of Gastroenterology (2026)

    Design and validation of an artificial intelligence system to detect the quality of colon cleansing before colonoscopy
    Gimeno-García et al.
    Gastroenterology and Hepatology (2023)

    Credits:

    Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
    Licensed under Creative Commons: By Attribution 4.0
    https://creativecommons.org/licenses/by/4.0/

    20 min
  • #36 - Should A Chatbot Ever Refuse To Reassure You
    Mar 19 2026

    What if the chatbot that always has an answer is actually making anxiety worse? For people living with obsessive-compulsive disorder (OCD), instant, endless reassurance can feel helpful in the moment while quietly strengthening the very cycle that keeps OCD going. In this episode, we explore why AI chatbots and large language models are designed to be responsive, agreeable, and supportive—and how those same qualities can unintentionally fuel reassurance seeking, compulsive checking, and avoidance instead of real relief.

    We break down OCD in clear, practical terms: intrusive thoughts trigger fear, compulsions bring temporary comfort, and that short-term relief reinforces the cycle over time. Whether it shows up as repeated handwashing, constant checking, or asking the same question again and again, OCD often centers on the desperate need to eliminate uncertainty. That is exactly where evidence-based treatment takes a different path. We discuss exposure and response prevention (ERP), the gold-standard therapy that helps people face doubt without falling back on rituals, and why a general-purpose chatbot may accidentally validate the opposite by offering reassurance, endorsing avoidance, or helping users “pivot” toward the answer they were hoping to hear.

    We also look at the broader mental health challenge now that people are already turning to AI for support. What responsibility do clinicians, AI companies, and regulators have? We argue that clinicians should ask directly about chatbot use, and we examine what meaningful guardrails might look like—from detecting repetitive reassurance loops to refusing to continue harmful patterns. Using a real-world germ-related prompting example, we show where chatbot advice can be useful and where it can slip into enabling OCD. This conversation will change how you think about AI, anxiety, and the line between support and harm.
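    As a toy illustration of one such guardrail (not from the paper), a session could be screened for near-duplicate questions as a crude proxy for a reassurance loop:

    ```python
    # Hypothetical sketch: flag a session where the same question keeps recurring
    # near-verbatim. Real systems would need far more robust semantic detection.
    from difflib import SequenceMatcher

    def looks_like_reassurance_loop(messages, similarity=0.8, max_repeats=3):
        """Return True if enough messages closely repeat an earlier one."""
        repeats = 0
        for i, current in enumerate(messages):
            for earlier in messages[:i]:
                if SequenceMatcher(None, earlier.lower(), current.lower()).ratio() >= similarity:
                    repeats += 1
                    break
        return repeats >= max_repeats

    session = [
        "Are you sure I can't get sick from touching that door handle?",
        "Are you really sure I can't get sick from touching that door handle?",
        "Are you sure I can't get sick from touching that door handle though?",
        "Sorry, but are you sure I can't get sick from touching that door handle?",
    ]
    print(looks_like_reassurance_loop(session))  # True -> the bot could pause instead of reassuring again
    ```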

    Reference:

    A transdiagnostic model for how general purpose AI chatbots can perpetuate OCD and anxiety disorders
    Golden and Aboujaoude
    Nature npj Digital Medicine (2026)

    Credits:

    Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
    Licensed under Creative Commons: By Attribution 4.0
    https://creativecommons.org/licenses/by/4.0/

    19 min
  • #35 - How AI Image Generators Portray Substance Use Disorder
    Mar 12 2026

    What does an AI-generated image of addiction look like, and why does it so often default to darkness, isolation, and despair? As AI tools make it easier than ever to produce visuals for health education, those same tools can unintentionally reinforce stigma about substance use disorder.

    In this episode, we explore how AI image generators shape the way addiction is portrayed. Laura brings the perspective from emergency medicine and digital health, where substance use disorder is part of everyday clinical reality and where language and imagery can influence how patients are perceived. Vasanth breaks down the technical side, explaining how diffusion models create images by gradually denoising noise into structured visuals, guided by text prompts that steer what the model produces.
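    A heavily simplified sketch of that denoising loop, with a stand-in `noise_predictor` in place of a trained network and none of the scheduling or guidance machinery real diffusion samplers use:

    ```python
    # Conceptual sketch only: start from pure noise and repeatedly subtract the noise a
    # (hypothetical) trained network predicts, conditioned on the text prompt embedding.
    import torch

    def sample_image(noise_predictor, prompt_embedding, steps=50, shape=(3, 64, 64)):
        x = torch.randn(shape)  # begin with pure Gaussian noise
        for t in reversed(range(steps)):
            predicted_noise = noise_predictor(x, t, prompt_embedding)  # prompt steers the estimate
            x = x - predicted_noise / steps  # crude denoising step; real samplers use a learned schedule
        return x  # gradually structured into an image reflecting the prompt

    # Demo with a dummy predictor standing in for a trained network:
    dummy_predictor = lambda x, t, emb: 0.1 * x
    img = sample_image(dummy_predictor, prompt_embedding=None, steps=10)
    print(img.shape)
    ```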

    That process is powerful, but it also means biases from internet training data and the connotations embedded in words can compound. The result? AI outputs that repeatedly frame addiction through dramatic “rock bottom” scenes, lone figures, and visual cues that unintentionally reinforce shame rather than understanding.

    We also look at research that systematically tests prompts and applies best-practice guidelines for more respectful depictions. The difference is striking: fewer stigmatizing signals, more human-centered imagery, and practical guardrails such as avoiding drug paraphernalia and moving beyond the isolated, ashamed figure. But sanitization has a price. For healthcare AI teams, the lesson is clear: visuals should be treated like clinical content, not decoration, with thoughtful review processes that protect dignity and support stigma-free health communication.

    Reference:

    AI-Generated Images of Substance Use and Recovery: Mixed Methods Case Study
    Heley et al.
    JMIR AI (2026)

    Credits:

    Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
    Licensed under Creative Commons: By Attribution 4.0
    https://creativecommons.org/licenses/by/4.0/

    20 min
  • #34 - Inside ChatGPT Health: Promise, Peril, And Triage Failures
    Mar 5 2026

    What if an AI health chatbot told you to stay home when you actually needed emergency care?

    In this episode, we put ChatGPT Health under the microscope using a clinician-authored evaluation designed to test a critical question: can an AI safely guide people on whether to go to the ER, visit urgent care, or wait it out at home? The results reveal a troubling pattern. When symptoms fall into the “middle” of the medical spectrum—uncertain but stable—the model often sounds helpful and reasonable. But when the stakes rise and subtle warning signs matter most, its judgment becomes unreliable.

    We explore how ChatGPT Health is positioned as a privacy-focused workspace that can read personal medical records, summarize visit notes, and translate complex information into plain language. Those capabilities can be valuable for education and preparation. But triage is a different challenge entirely. It requires causal reasoning, clear thresholds, and a bias toward catching the worst-case scenario before it’s too late.

    Two case studies highlight the gap. In an asthma scenario involving rising carbon dioxide, low oxygen levels, and poor peak flow—signals that should trigger urgent care—the model labeled the situation as only moderate. In diabetes, where the difference between routine high blood sugar and life-threatening diabetic ketoacidosis demands careful nuance, templated guidance struggled to capture the clinical reality.

    The most concerning findings emerged around suicidality. Crisis response protocols are explicit: when someone expresses intent or a plan, escalation and connection to the 988 crisis line should happen immediately. Yet in several scenarios with explicit plans, those prompts never appeared—while more ambiguous statements did trigger them. Safety in healthcare can’t be optional or probabilistic.

    We break down why large language models tend to gravitate toward the statistical middle, why medicine often lives in the dangerous “long tail,” and what this means for anyone using AI health tools today. AI can help you prepare for care, understand medical information, and ask better questions. But decisions about whether to seek urgent help still demand human judgment—and clear, non-negotiable safety guardrails.

    If this conversation resonates, follow the show, share the episode with someone exploring health tech, and leave a quick review telling us one takeaway you had. What safety rule would you hard-code into an AI health system?

    Reference:

    ChatGPT Health performance in a structured test of triage recommendations
    Ashwin Ramaswamy et al.
    Nature (2026)


    Credits:

    Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
    Licensed under Creative Commons: By Attribution 4.0
    https://creativecommons.org/licenses/by/4.0/



    25 min
  • #33 - Patients Don’t Talk Like Textbooks
    Feb 26 2026

    What if the most confident answer in the room is also the most misleading?

    Large language models can ace medical exams, yet falter when faced with a real person’s messy, incomplete story. In this episode, we explore how that gap plays out in one of medicine’s highest-stakes decisions: triage. Drawing on Laura’s experience in emergency medicine and Vasanth’s background in AI research, we unpack a new study where laypeople role-played both routine and high-risk conditions and turned to leading LLMs for advice. The surprising twist? Tiny shifts in phrasing produced opposite recommendations—“rest at home” versus “go to the ER”—revealing how sensitive these systems are to prompts, and how an agreeable tone can drown out critical clinical signals.

    We take you inside the exam room to contrast what clinicians actually do. Real diagnosis isn’t a single question and answer—it’s an evolving process. Doctors gather a history that unfolds with each response, test competing hypotheses, and scan for subtle red flags and nonverbal cues that never show up in a chat window. From the ominous “worst headache of my life” to abdominal pain that could signal gallstones—or a heart attack—Laura explains how risk-first thinking and strategic follow-ups shape safe decisions. Meanwhile, Vasanth breaks down how preference-tuned models are trained to satisfy users, not challenge them—and why linguistic confidence can increase even as clinical accuracy declines. The study’s findings are sobering: models struggled to identify key conditions, and their triage decisions were no better than basic symptom checkers.

    But this isn’t a story of hype or doom—it’s about design. Reliable medical AI must interrogate before it interprets. That means structured red-flag checks, resistance to user-led anchors like “maybe it’s just stress,” and clear, actionable next steps instead of overwhelming option lists. Calibrated uncertainty, transparent reasoning, and human oversight can transform AI from a risky decider into a valuable assistant.
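    As a minimal sketch of what “interrogate before interpret” could look like in code (illustrative only; the phrases, concerns, and wording below are placeholders, not a validated clinical rule set or anything from the study):

    ```python
    # Illustrative sketch: run a fixed red-flag checklist before any model-generated
    # triage advice, so a deterministic rule, not a preference-tuned model, escalates.
    RED_FLAGS = {
        "worst headache of my life": "A dangerous cause such as a brain bleed",
        "chest pain with shortness of breath": "A possible cardiac event",
        "one-sided weakness": "A possible stroke",
    }

    def triage(user_message: str, ask_model) -> str:
        text = user_message.lower()
        for phrase, concern in RED_FLAGS.items():
            if phrase in text:
                # Hard-coded escalation: the model never gets to soften this answer.
                return f"{concern} cannot be ruled out. Please seek emergency care now."
        # Only messages with no red flags reach the language model.
        return ask_model(user_message)

    # Demo with a stand-in model that always reassures:
    print(triage("I have the worst headache of my life", ask_model=lambda m: "Probably rest at home."))
    ```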

    If you care about digital health, safe triage, and the future of human-AI collaboration in medicine, this conversation offers a grounded look at both the limits—and the real promise—of these tools.

    If this episode resonated, follow the show, share it with a colleague, and leave a quick review to help more listeners discover Code and Cure.


    Reference:

    Reliability of LLMs as medical assistants for the general public: a randomized preregistered study
    Andrew M. Bean et al.
    Nature Medicine (2026)

    Credits:

    Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
    Licensed under Creative Commons: By Attribution 4.0
    https://creativecommons.org/licenses/by/4.0/


    30 min
  • #32 - When Data Isn’t Better: Rethinking Fertility Tracking
    Feb 19 2026

    What if the most reliable ways to track fertility are also the simplest? In this episode, we examine the science of ovulation timing and hold modern wearables to a high standard, comparing passive temperature and vital sign data with established methods like LH surge testing and cervical mucus observation. Drawing on perspectives from a cognitive scientist and an emergency physician, we explain what each method actually measures, how well it performs outside the lab, and where convenience falls short of accuracy.

    We begin by clarifying the fertile window and the underlying physiology, then connect that biology to signals people can track at home. Changes in cervical mucus provide a strong, real-time indicator of peak fertility. Urine LH strips offer a clear 24- to 36-hour advance signal at low cost. Basal body temperature can confirm that ovulation has already occurred, but it is less helpful for predicting timing in advance. Against this foundation, we review a meta-analysis of wearable data showing that temperature remains the strongest predictor, while heart rate and variability contribute only modest improvements. The conclusion is straightforward: wearables can approximate existing signals, but they do not clearly outperform simple tools for timing intercourse, insemination, or pregnancy avoidance.

    Along the way, we challenge the idea that more data and a paid app automatically lead to better outcomes. We weigh privacy risks, cost, and false confidence against the accessibility of test strips and the high signal value of mucus observations. The takeaway is a practical hierarchy. Use LH strips and cervical mucus as primary guides, add calendar context and basal temperature if useful, and treat wearables as optional conveniences rather than a definitive solution. Women’s health deserves thoughtful innovation, and sometimes real progress comes from choosing what works, not what is marketed most aggressively.

    If this episode resonated, follow the show, share it with a friend navigating fertility, and leave a review with your experience and what has worked best for you.

    Reference:

    The diagnostic accuracy of wearable digital technology in detecting fertility window and menstrual cycles: a systematic review and Bayesian network meta-analysis
    Yue Shi et al.
    Nature npj Digital Medicine (2026)

    Credits:

    Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
    Licensed under Creative Commons: By Attribution 4.0
    https://creativecommons.org/licenses/by/4.0/

    20 min
  • #31 - How Retrieval-Augmented AI Can Verify Clinical Summaries
    Feb 12 2026

    Fluent summaries that cannot prove their claims are a hidden liability in healthcare, quietly eroding clinician trust and wasting time. In this episode, we walk through a practical system that replaces “sounds right” narratives with evidence-backed summaries by pairing retrieval-augmented generation with a large language model that serves as a judge. Instead of asking one AI to write and police itself, the work is divided. One model drafts the summary, while another breaks it into atomic claims, retrieves supporting chart excerpts, and issues clear verdicts of supported, not supported, or insufficient, with explanations clinicians can review.

    We explain why generic summarization often breaks down in clinical settings and how retrieval-augmented generation keeps the model grounded in the patient’s actual record. The conversation digs into subtle but common failure modes, including when a model ignores retrieved evidence, when a sentence mixes correct and incorrect facts, and when wording implies causation that the record does not support. A concrete example brings this to life: a claim that a patient was intubated for septic shock is overturned by operative notes showing intubation for a procedure, with the system flagging the discrepancy and guiding a precise correction. That is not just higher accuracy; it is accountability you can audit later.
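    Structurally, the writer/judge split can be summarized in a few lines (the helpers passed in below are hypothetical stand-ins for the paper’s LLM and retrieval components, not its actual code):

    ```python
    # Structural sketch of the divide-and-verify pattern: one model writes, another judges.
    from dataclasses import dataclass

    @dataclass
    class Verdict:
        claim: str
        label: str         # "supported", "not supported", or "insufficient"
        evidence: list     # chart excerpts the judge relied on
        explanation: str   # rationale a clinician can review

    def verify_summary(chart, draft_summary, split_into_claims, retrieve_excerpts, judge_claim):
        summary = draft_summary(chart)                    # model 1: write the narrative
        verdicts = []
        for claim in split_into_claims(summary):          # model 2: break it into atomic claims
            excerpts = retrieve_excerpts(chart, claim)    # ground each claim in the record
            verdicts.append(judge_claim(claim, excerpts)) # -> Verdict with label and explanation
        return summary, verdicts  # auditable trail: every claim paired with evidence and a verdict
    ```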

    We also explore a deeper layer of the problem: argumentation. Clinical care is not just a list of facts, but the relationships between them. By evaluating claims alongside their evidence, surfacing contradictions, and pushing for precise language, the system helps generate summaries that reflect real clinical reasoning rather than confident guessing. The payoff is less time spent chasing errors, more time with patients, and a defensible trail for quality review and compliance.

    If you care about chart review, clinical documentation, retrieval-augmented generation, and building AI systems clinicians can trust, this episode offers practical takeaways.

    Reference:

    Verifying Facts in Patient Care Documents Generated by Large Language Models Using Electronic Health Records
    Philip Chung et al.
    NEJM AI (2025)

    Credits:

    Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
    Licensed under Creative Commons: By Attribution 4.0
    https://creativecommons.org/licenses/by/4.0/

    24 min