When AI Fails
Cautionary cases where AI caused harm, confusion, or was quietly abandoned
IBM Watson for Oncology (Abandoned)
Billions spent, unsafe recommendations, ultimately shut down in 2023. The canonical AI overpromise story.
Radiology AI Override Effect (UX Failure)
When the AI was wrong, doctors who saw its output missed diagnoses they would have caught on their own — a hidden danger.
LLM Diagnostic Reasoning RCT (No Effect)
JAMA randomized trial: physicians with an LLM assistant showed no improvement in diagnostic accuracy.
The Radiologist Shortage Paradox (Unintended)
AI hype caused a drop in radiology applicants — potentially creating the very shortage the AI was supposed to solve.
When AI Works
Cases with measurable, documented outcomes — the practical blueprint
Ambient AI Scribing (Breakout Hit)
Burnout dropped 13 pts across 263 physicians in 30 days. Healthcare AI's first true breakout product category.
Cleveland Clinic Sepsis AI (Clinical Win)
10x reduction in false positives, 46% more sepsis cases identified. Clear workflow-integrated success.
Prior Authorization Automation (Ops Win)
60% of requests processed in under 2 hours vs. zero via phone/fax. Estimated 5x ROI.
AI Billing & Coding (Fast Growing)
Fastest-growing AI adoption category 2023→2024. Reduced claim denials, improved charge capture accuracy.
The Nuanced Middle
Cases that reveal the real variables — UX, trust, and patient experience
Patient AI Acceptance Gap (Experience)
Patients must want to engage with AI tools — scheduling, insurance, portals only work when trust exists first.
Physicians Excluded from AI Design (Governance)
70% of physicians want to be involved from design through integration — yet most AI is built without them.
Radiology Override: The UX Fix (UX Insight)
The same AI that caused diagnostic errors performed better when presentation was redesigned — same model, different interface.
The 100% Adoption / 53% Success Gap (Survey Data)
Every large health system uses ambient AI — but only 53% report high success. Adoption ≠ value.
1. The Hype Calibration — 10 min
- Where AI stands today vs. the headlines — ground the room in reality first
- Physician AI use rose from 38% to 66% in one year — but what are they using it for?
- IBM Watson: the canonical cautionary tale — billions spent, unsafe recommendations, shut down in 2023
- Frame the thesis: AI fails not because the models are bad — it fails because it's disconnected from workflow and human experience
2. Two Experience Problems Nobody Talks About — 12 min
- Clinician UX Problem: 70% of physicians want to be involved from design to integration — yet are routinely excluded
- Patient Experience Problem: Patients must want to engage with AI-enabled services — trust precedes adoption
- Radiology override case: wrong AI + poor presentation = worse outcomes than no AI — and the UX fix that solved it
- JAMA LLM diagnostic RCT: no improvement in reasoning — because the interface didn't fit the clinical workflow
3. Where AI Actually Works Right Now — 15 min
- Ambient scribing: burnout dropped 13 points in 30 days — 15 to 60 min/day saved per physician
- Cleveland Clinic sepsis AI: 10x fewer false positives, 46% more cases caught
- Prior auth automation: 60% of requests processed in under 2 hours, 5x ROI
- Billing & coding: fastest growing AI category, measurable claim denial reduction
- Common thread: all solve a specific, bounded problem with clean data and human oversight built in
4. Where to Pump the Brakes — 8 min
- Autonomous diagnosis without human oversight — risk amplification, not replacement
- Generative AI in direct patient-facing clinical decisions — hallucination risk is non-trivial
- Tools built on poor, biased, or single-system data (the Watson lesson repeated)
- The radiologist shortage paradox: hype caused the problem AI was meant to solve
5. The 5-Question Framework — 10 min
- 1. Does this solve a real, specific problem in my workflow?
- 2. Who designed the user experience — and did clinicians have a voice?
- 3. What does the underlying data look like? Is it yours, or someone else's?
- 4. What does "human oversight" actually mean in this product?
- 5. Will my patients want to use this — and will it build or erode their trust?
6. Your Role as AI Steward — 5 min
- Physicians need to become the "captain of the AI ship" — defining workflows and success metrics
- AMA Center for Digital Health resources available today
- Call to action: lead from within your practice, don't wait for vendors to lead for you
Survey / Policy
AMA Physician AI Sentiment Survey 2024
Physician adoption rose from 38% to 66% in one year. 75% believe AI could help with efficiency; 54% with burnout. Source: ama-assn.org
Clinical Study
JAMA Network Open — Ambient Scribe Study
263 physicians across 6 health systems, 30-day study. Burnout fell from 51.9% to 38.8%. Improved after-hours documentation and patient attention. (Olson et al., 2025)
RCT / Warning
JAMA — LLM Diagnostic Reasoning RCT
Randomized trial: physicians given an LLM assistant showed no significant improvement in diagnostic accuracy. Workflow mismatch identified as a key factor.
Case Study
IBM Watson Health — Failure Analysis
Marketed as cancer treatment AI; produced unsafe recommendations. Discontinued 2023 after billions invested. Could not contextualize individual patient data.
Health System
Cleveland Clinic Sepsis AI
EHR-embedded alert system. 10x reduction in false positives. 46% increase in sepsis cases identified. Workflow-integrated design cited as key to success.
Operational
Prior Authorization AI — Industry Data
60% of requests processed in under 2 hours (vs. 0% via fax/phone). Estimated 5x ROI. Highest near-term operational value category.
Survey
Large Health System AI Adoption Survey 2024
43 large US health systems. Ambient notes: 100% adoption activity, only 53% high success. Radiology AI: 19% high success rate.
UX Research
Radiology AI Override Effect Study
When AI was wrong, radiologist false negatives jumped from 2.7% to 26–33%. Effect reduced when AI result was hidden or region of interest was visually highlighted.
Market Report
Ambient Scribing Market — 2025
$600M in 2025. First AI category to reach true breakout in healthcare. Only category with 100% large health system adoption.
AMA / Policy
JAMA Summit Report: AI in Health Care, Today and Tomorrow (2025)
Comprehensive overview of AI state of play. Emphasizes governance, physician involvement, and evidence standards for adoption. Key policy reference.
Unintended Consequences
Radiology Shortage & AI Hype (2018–present)
The 2018 AI hype cycle caused a significant drop in radiology residency applications — potentially contributing to the current radiologist shortage. A cautionary systems-thinking case.
Operational
AI Billing & Coding — Health System Data
Fastest-growing AI use case 2023→2024. Measurable reductions in claim denials. High ROI, bounded problem domain, clean data — a model use case.
⚠️ Needs Verification
Radiology AI Override — Exact Stats (2.7%→26–33%)
Directionally supported by automation-bias literature but a single definitive peer-reviewed source for these exact numbers should be pinpointed before CME citation. Flag for follow-up.
✅ Corrected Attribution
Ambient Scribe Burnout Study — Olson et al., JAMA Network Open (Oct 2025)
263 physicians, 6 health systems. DOI: 10.1001/jamanetworkopen.2025.34976. Separate from Duggan et al. (Feb 2025, 46 participants). The Olson study supplies the headline number to cite.
✅ Confirmed Source
Cleveland Clinic Sepsis AI — Partner: Bayesian Health (Sept 2025)
Cleveland Clinic Fairview Hospital. 3,330+ patients. Source: Cleveland Clinic newsroom, Sept 23 2025. Partner confirmed as Bayesian Health, not a generic vendor.
✅ Confirmed Source
JAMA LLM Diagnostic RCT — Goh et al., JAMA Network Open (Oct 2024)
Stanford. GPT-4. DOI: 10.1001/jamanetworkopen.2024.40969. The LLM alone scored 92% vs. 76% for physician+LLM and 74% for physician alone; the difference between the two physician groups was not statistically significant.
Healthcare Operations: The Two-Lane Model
AI doesn't affect both lanes equally. Each node shows where AI is being applied, what's working, and what the risks are. Technology (EHR/data infrastructure) is the connective tissue between every department on both sides.
AI Readiness: High — works now; Medium — mixed results; Caution — overhyped/risky; Emerging — watch this space
Shared Root Causes & Success Factors
Root Cause #1
Disconnected from Workflow
Watson, LLM RCT, and Radiology Override all failed because AI wasn't embedded in the natural clinical process. Physicians had to leave their workflow to interact with the tool.
Cases: Watson, LLM RCT, Radiology Override
Root Cause #2
Poor Interface Design (UX)
The radiology override was caused by how results were presented, not by a bad model. The same AI performed meaningfully better with a redesigned interface. UX is clinical infrastructure.
Cases: Radiology UX Fix, Physician Exclusion, Adoption Gap
Root Cause #3
Bad or Biased Training Data
Watson was trained on synthetic cases from one US cancer center, then deployed globally. No valid data foundation = unreliable outputs at any scale. Garbage in, garbage out — at $4B cost.
Cases: Watson, Autonomous Dx Risk
Success Factor #1
One Bounded Problem
Every working AI solves one well-defined, measurable problem with clean data. Ambient scribing = documentation. Sepsis AI = one alert type. Prior auth = one workflow step. No AI succeeds by solving everything.
Cases: Ambient Scribe, Sepsis AI, Prior Auth, Billing AI
Success Factor #2
Human Always in the Loop
Every working AI keeps the physician as final decision-maker. Sepsis AI flags — doctors act. Ambient scribe writes — doctors approve. This isn't a limitation; it's the design principle.
Cases: all success cases
The Wildcard
Trust — Clinician & Patient
Cleveland Clinic's 10x false positive reduction didn't just catch more sepsis — it made nurses trust the alerts enough to act on them. Accurate AI without clinician trust delivers zero value.
Cases: Sepsis AI, Patient Trust Gap, Radiology UX
AI in Healthcare — Timeline of Key Events
Current and historical AI healthcare products by category, with market entry dates and status. Organized to mirror the talk's framework — what works operationally, what is mixed, and what failed.