When AI Fails
Cautionary cases where AI caused harm, confusion, or was quietly abandoned
IBM Watson for Oncology (Abandoned)
Billions spent, unsafe recommendations, ultimately shut down in 2023. The canonical AI overpromise story.
Radiology AI Override Effect (UX Failure)
When the AI was wrong, doctors who saw its output missed diagnoses they would have caught on their own — a hidden danger.
LLM Diagnostic Reasoning RCT (No Effect)
JAMA randomized trial: physicians with an LLM assistant showed no improvement in diagnostic accuracy.
The Radiologist Shortage Paradox (Unintended)
AI hype caused a drop in radiology applicants — potentially creating the very shortage the AI was supposed to solve.
When AI Works
Cases with measurable, documented outcomes — the practical blueprint
Ambient AI Scribing (Breakout Hit)
Burnout dropped 13 pts across 263 physicians in 30 days. Healthcare AI's first true breakout product category.
Cleveland Clinic Sepsis AI (Clinical Win)
10x reduction in false positives, 46% more sepsis cases identified. Clear workflow-integrated success.
Prior Authorization Automation (Ops Win)
60% of requests processed in under 2 hours vs. zero via phone/fax. Estimated 5x ROI.
AI Billing & Coding (Fast Growing)
Fastest-growing AI adoption category 2023→2024. Reduced claim denials, improved charge capture accuracy.
The Nuanced Middle
Cases that reveal the real variables — UX, trust, and patient experience
Patient AI Acceptance Gap (Experience)
Patients must want to engage with AI tools — scheduling, insurance, portals only work when trust exists first.
Physicians Excluded from AI Design (Governance)
70% of physicians want to be involved from design through integration — yet most AI is built without them.
Radiology Override: The UX Fix (UX Insight)
The same AI that caused diagnostic errors performed better when presentation was redesigned — same model, different interface.
The 100% Adoption / 53% Success Gap (Survey Data)
Every large health system uses ambient AI — but only 53% report high success. Adoption ≠ value.
1. The Hype Calibration — 10 min
- Where AI stands today vs. the headlines — ground the room in reality first
- Physician AI use rose from 38% to 66% in one year — but what are they using it for?
- IBM Watson: the canonical cautionary tale — billions spent, unsafe recommendations, shut down in 2023
- Frame the thesis: AI fails not because the models are bad — it fails because it's disconnected from workflow and human experience
2. Two Experience Problems Nobody Talks About — 12 min
- Clinician UX Problem: 70% of physicians want to be involved from design to integration — yet are routinely excluded
- Patient Experience Problem: Patients must want to engage with AI-enabled services — trust precedes adoption
- Radiology override case: wrong AI + poor presentation = worse outcomes than no AI — and the UX fix that solved it
- JAMA LLM diagnostic RCT: no improvement in reasoning — because the interface didn't fit the clinical workflow
3. Where AI Actually Works Right Now — 15 min
- Ambient scribing: burnout dropped 13 points in 30 days — 15 to 60 min/day saved per physician
- Cleveland Clinic sepsis AI: 10x fewer false positives, 46% more cases caught
- Prior auth automation: 60% of requests processed in under 2 hours, 5x ROI
- Billing & coding: fastest growing AI category, measurable claim denial reduction
- Common thread: all solve a specific, bounded problem with clean data and human oversight built in
4. Where to Pump the Brakes — 8 min
- Autonomous diagnosis without human oversight — risk amplification, not replacement
- Generative AI in direct patient-facing clinical decisions — hallucination risk is non-trivial
- Tools built on poor, biased, or single-system data (the Watson lesson repeated)
- The radiologist shortage paradox: hype caused the problem AI was meant to solve
5. The 5-Question Framework — 10 min
- 1. Does this solve a real, specific problem in my workflow?
- 2. Who designed the user experience — and did clinicians have a voice?
- 3. What does the underlying data look like? Is it yours, or someone else's?
- 4. What does "human oversight" actually mean in this product?
- 5. Will my patients want to use this — and will it build or erode their trust?
6. Your Role as AI Steward — 5 min
- Physicians need to become the "captain of the AI ship" — defining workflows and success metrics
- AMA Center for Digital Health resources available today
- Call to action: lead from within your practice, don't wait for vendors to lead for you
Survey / Policy
AMA Physician AI Sentiment Survey 2024
Physician adoption rose from 38% to 66% in one year. 75% believe AI could help with efficiency; 54% with burnout. Source: ama-assn.org
Clinical Study
JAMA Network Open — Ambient Scribe Study
263 physicians across 6 health systems, 30-day study. Burnout fell from 51.9% to 38.8%. Improved after-hours documentation and patient attention. (Olson et al., 2025)
RCT / Warning
JAMA — LLM Diagnostic Reasoning RCT
Randomized trial: physicians given an LLM assistant showed no significant improvement in diagnostic accuracy. Workflow mismatch identified as a key factor.
Case Study
IBM Watson Health — Failure Analysis
Marketed as cancer treatment AI; produced unsafe recommendations. Discontinued 2023 after billions invested. Could not contextualize individual patient data.
Health System
Cleveland Clinic Sepsis AI
EHR-embedded alert system. 10x reduction in false positives. 46% increase in sepsis cases identified. Workflow-integrated design cited as key to success.
Operational
Prior Authorization AI — Industry Data
60% of requests processed in under 2 hours (vs. 0% via fax/phone). Estimated 5x ROI. Highest near-term operational value category.
Survey
Large Health System AI Adoption Survey 2024
43 large US health systems. Ambient notes: 100% adoption activity, only 53% high success. Radiology AI: 19% high success rate.
UX Research
Radiology AI Override Effect Study
When AI was wrong, radiologist false negatives jumped from 2.7% to 26–33%. Effect reduced when AI result was hidden or region of interest was visually highlighted.
Market Report
Ambient Scribing Market — 2025
$600M in 2025. First AI category to reach true breakout in healthcare. Only category with 100% large health system adoption.
AMA / Policy
JAMA Summit Report: AI in Health Care, Today and Tomorrow (2025)
Comprehensive overview of AI state of play. Emphasizes governance, physician involvement, and evidence standards for adoption. Key policy reference.
Unintended Consequences
Radiology Shortage & AI Hype (2018–present)
The 2018 AI hype cycle caused a significant drop in radiology residency applications — potentially contributing to the current radiologist shortage. A cautionary systems-thinking case.
Operational
AI Billing & Coding — Health System Data
Fastest-growing AI use case 2023→2024. Measurable reductions in claim denials. High ROI, bounded problem domain, clean data — a model use case.
⚠️ Needs Verification
Radiology AI Override — Exact Stats (2.7%→26–33%)
Directionally supported by automation-bias literature but a single definitive peer-reviewed source for these exact numbers should be pinpointed before CME citation. Flag for follow-up.
✅ Corrected Attribution
Ambient Scribe Burnout Study — Olson et al., JAMA Network Open (Oct 2025)
263 physicians, 6 health systems. DOI: 10.1001/jamanetworkopen.2025.34976. Separate from Duggan et al. (Feb 2025, 46 participants). The Olson study supplies the headline number to cite.
✅ Confirmed Source
Cleveland Clinic Sepsis AI — Partner: Bayesian Health (Sept 2025)
Cleveland Clinic Fairview Hospital. 3,330+ patients. Source: Cleveland Clinic newsroom, Sept 23 2025. Partner confirmed as Bayesian Health, not a generic vendor.
✅ Confirmed Source
JAMA LLM Diagnostic RCT — Goh et al., JAMA Network Open (Oct 2024)
Stanford. GPT-4. DOI: 10.1001/jamanetworkopen.2024.40969. The LLM alone scored 92% vs. 76% for physician+LLM and 74% for physician alone; the difference between the two physician groups was not statistically significant.
Healthcare Operations: The Two-Lane Model
AI doesn't affect both lanes equally. Each node shows where AI is being applied, what's working, and what the risks are. Technology (EHR/data infrastructure) is the connective tissue between every department on both sides.
AI Readiness: High — works now; Medium — mixed results; Caution — overhyped/risky; Emerging — watch this space
Shared Root Causes & Success Factors
Root Cause #1
Disconnected from Workflow
Watson, LLM RCT, and Radiology Override all failed because AI wasn't embedded in the natural clinical process. Physicians had to leave their workflow to interact with the tool.
Cases: Watson, LLM RCT, Radiology Override
Root Cause #2
Poor Interface Design (UX)
The radiology override was caused by how results were presented, not by a bad model. The same AI performed meaningfully better with a redesigned interface. UX is clinical infrastructure.
Cases: Radiology UX Fix, Physician Exclusion, Adoption Gap
Root Cause #3
Bad or Biased Training Data
Watson was trained on synthetic cases from one US cancer center, then deployed globally. No valid data foundation = unreliable outputs at any scale. Garbage in, garbage out — at $4B cost.
Cases: Watson, Autonomous Dx Risk
Success Factor #1
One Bounded Problem
Every working AI solves one well-defined, measurable problem with clean data. Ambient scribing = documentation. Sepsis AI = one alert type. Prior auth = one workflow step. No AI succeeds by solving everything.
Cases: Ambient Scribe, Sepsis AI, Prior Auth, Billing AI
Success Factor #2
Human Always in the Loop
Every working AI keeps the physician as final decision-maker. Sepsis AI flags — doctors act. Ambient scribe writes — doctors approve. This isn't a limitation; it's the design principle.
Cases: all success cases
The Wildcard
Trust — Clinician & Patient
Cleveland Clinic's 10x false positive reduction didn't just catch more sepsis — it made nurses trust the alerts enough to act on them. Accurate AI without clinician trust delivers zero value.
Cases: Sepsis AI, Patient Trust Gap, Radiology UX
AI in Healthcare — Timeline of Key Events
Current and historical AI healthcare products by category, with market entry dates and status. Organized to mirror the talk's framework — what works operationally, what is mixed, and what failed.