Talk & Presentation · 1.0 AMA PRA Category 1 Credit™ · Arizona Medical Association Annual Meeting

AI in Healthcare:
Value vs. Hype

A Practical Framework for Evaluating AI in Your Practice

Date & Time: May 2, 2026 · 8:00–8:30 AM
Venue: Creighton University Medical Campus, Phoenix, AZ
CME Credit: 1.0 AMA PRA Category 1 Credit™
Speaker: David Eitel · RRT · MHA · MSRT · RRT-ACCS

Session Materials

The take-home framework is designed to be printed and kept at your desk. The slides include all cases, data, and references from the session.

What This Session Covers

AI tools are increasingly introduced into clinical practice — often without clear evidence of effectiveness, limitations, or failure modes. Physicians are frequently asked to evaluate these systems without a structured framework for assessing clinical value or implementation readiness. This session reviews current evidence on AI in clinical medicine, including documented successes, known failures, and the role of AI in clinical decision-making. Participants receive a practical five-question framework to guide evaluation and responsible adoption of AI tools within clinical practice.

AI in Healthcare by the Numbers

The market has moved faster than governance. Every physician is now encountering AI in their workflow — often without knowing it.

1,451 · FDA-cleared AI medical devices as of end of 2025 (FDA AI/ML Device List · Dec 2025)
295 · New FDA AI device clearances in 2025 alone (Innolitics 2025 Annual Roundup)
80%+ · Physicians now using AI, up from 38% in 2023 (AMA Physician Sentiment Survey · 2026)
250+ · State AI healthcare bills in 2025 across 47 states (Manatt Health AI Policy Tracker · 2026)

Healthcare Is Not a Monolith

AI readiness depends entirely on which part of healthcare you are talking about. Conflating the two domains below is where most AI failures begin.

The Delivery of Healthcare
Operations · Administration · Workflow
Scheduling, billing, and prior authorization
Clinical documentation and note-writing
Supply chain, staffing, and resource allocation
Claims processing and revenue cycle
Ambient scribing and administrative summarization
AI Status: AI is already here — and working. Bounded rules, high volume, consistent structured data. Measurable and verifiable ROI.
The Practice of Medical Science
Diagnosis · Treatment · Clinical Judgment
Diagnostic reasoning and differential diagnosis
Treatment planning and patient-specific decisions
Shared decision-making with patients
Nuanced clinical judgment across complex contexts
Risk stratification with high-stakes consequences
AI Status: AI is an assistive tool here — not a replacement. High variability, high stakes, deeply judgment-dependent.
The core failure pattern: most AI failures in healthcare happen when tools built for the delivery of healthcare — operations — get deployed as if they were practicing medical science. The Epic Sepsis Model. The Optum risk-scoring algorithm. Both were operational-layer tools applied to clinical judgment scenarios — without the validation that decision-scale deployment requires.

Not Everything Called "AI" Is the Same Thing

Six layers, each doing something different, each carrying different risk. When a vendor says "we use AI" — ask which layer. The answer changes everything about how you should evaluate the claim.

1
Foundation Layer
Machine Learning (ML)
Risk scores · Readmission prediction · Sepsis alerts
The statistical foundation most clinical AI is built on. ML models find patterns in labeled training data and apply them to new inputs. The Epic Sepsis Model is ML. Most EHR-embedded risk scores are ML. Well-understood mathematically — but only as good as the data it was trained on. That is where bias enters. Deploying such a model nationally without independent external validation is a validation failure, not a technology failure.
Risk: Moderate — Performance depends heavily on training data quality and whether local validation was performed
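
To make the pattern-matching concrete, here is a minimal, entirely hypothetical sketch of how an EHR-style risk score is typically produced: a supervised model fit to labeled historical outcomes, then applied to a new patient. The features and labels are synthetic; the point is that whatever bias lives in the historical labels is exactly what the model learns.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "historical" patients: three features standing in for
# age, creatinine, and prior admissions.
X_train = rng.normal(size=(1000, 3))
# Labels come from past outcomes; any bias baked into these labels
# is exactly what the model will learn.
y_train = (X_train @ np.array([0.8, 0.5, 1.2]) + rng.normal(size=1000)) > 1.0

model = LogisticRegression().fit(X_train, y_train)

# A "risk score" is simply the model's predicted probability for a new input.
new_patient = rng.normal(size=(1, 3))
print(f"Risk score: {model.predict_proba(new_patient)[0, 1]:.2f}")
```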
2
Foundation Layer
Deep Learning (DL)
Image classification · EKG analysis · Pathology pattern recognition
A subset of ML using layered neural networks. Excels at unstructured data — images, waveforms, pathology slides. The majority of FDA-cleared radiology AI (the largest category of approved AI devices) is deep learning. High performance when the training images match the clinical environment. The critical failure mode: performance degradation in out-of-distribution cases that is invisible until it produces a miss.
Risk: Moderate-High — Performance gaps in edge cases can be undetectable without prospective validation in your environment
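
The out-of-distribution failure mode is easy to reproduce on synthetic data. In this hedged sketch, a small neural network latches onto a spurious cue (imagine a scanner artifact that happened to correlate with disease at the training hospital); it looks near-perfect on matched data and quietly degrades once the cue stops tracking the label. All data and numbers are synthetic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

def make_cohort(n, artifact_tracks_label):
    """Synthetic cohort: one weak true signal, one strong spurious cue."""
    y = rng.integers(0, 2, size=n)
    true_signal = y + rng.normal(scale=2.0, size=n)        # weakly predictive
    cue = y if artifact_tracks_label else rng.integers(0, 2, size=n)
    artifact = cue + rng.normal(scale=0.1, size=n)         # near-perfect in training
    return np.column_stack([true_signal, artifact]), y

X_train, y_train = make_cohort(2000, artifact_tracks_label=True)
X_in, y_in = make_cohort(2000, artifact_tracks_label=True)     # looks like training
X_out, y_out = make_cohort(2000, artifact_tracks_label=False)  # the cue decouples

net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000,
                    random_state=0).fit(X_train, y_train)

print(f"AUC on matched data: {roc_auc_score(y_in, net.predict_proba(X_in)[:, 1]):.2f}")
print(f"AUC on shifted data: {roc_auc_score(y_out, net.predict_proba(X_out)[:, 1]):.2f}")
```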
3
Perception Layer
Computer Vision (CV)
Radiology triage · Dermatology screening · Wound assessment
Applied deep learning focused specifically on visual interpretation. Triage prioritization in radiology — detecting critical findings on CT and X-ray and surfacing them to radiologists faster — is the most proven clinical use case in the entire AI field. FDA has cleared over 500 radiology AI devices. The clinical evidence base here is more mature than any other AI category. The key question remains: flagging (lower risk) or autonomous interpretation (higher risk)?
Risk: Lower for flagging/triage functions — increases significantly for autonomous diagnostic interpretation
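
The flagging-versus-autonomous distinction is a deployment choice, not a model choice. In this sketch (all interfaces hypothetical), the model output is identical in both functions; what differs is whether a human still reads every study.

```python
def triage_flagging(study, model, worklist, threshold=0.5):
    """Lower-risk pattern: the model only reprioritizes human work."""
    if model.predict_proba(study) >= threshold:   # hypothetical interface
        worklist.prioritize(study)                # a radiologist still reads it
    # Every study is interpreted by a human either way; a model miss
    # costs speed, not the diagnosis.

def autonomous_read(study, model, threshold=0.5):
    """Higher-risk pattern: the model's output IS the interpretation."""
    if model.predict_proba(study) >= threshold:
        return "critical finding"
    return "no acute finding"   # nobody re-reads the negatives; misses are silent
```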
4
Communication Layer
Generative AI / Large Language Models (LLMs)
Ambient scribes · Patient message drafts · Discharge summaries
The category driving most of the current AI conversation in healthcare. LLMs generate natural language — they predict text based on patterns, they do not reason. Ambient scribing (the top proven win in this session) is LLMs applied to a bounded operational problem: converting speech to structured notes. When LLMs are deployed for clinical reasoning or diagnosis, the evidence base does not support the claims. The GPT-4 diagnostic RCT (Goh et al., JAMA Network Open 2024) found no statistically significant improvement over unassisted physicians (p=0.60).
Risk: High when applied to clinical reasoning — proven effective only in bounded operational tasks like documentation
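
A minimal sketch of why ambient scribing is a bounded problem: fixed input (the transcript), a fixed output template, and a clinician who edits and signs the draft. Here `llm_complete` is a placeholder for whatever model endpoint an institution actually uses, not any specific vendor's API.

```python
NOTE_TEMPLATE = """You are drafting a clinical note for physician review.
Using ONLY the transcript below, fill in this structure. If a section was
not discussed, write "Not discussed". Do not infer or invent content.

SUBJECTIVE:
OBJECTIVE:
ASSESSMENT:
PLAN:

Transcript:
{transcript}
"""

def draft_note(transcript: str, llm_complete) -> str:
    """Bounded operational task: speech-to-structured-note, then human review."""
    draft = llm_complete(NOTE_TEMPLATE.format(transcript=transcript))
    return draft   # the physician edits and signs; the model never finalizes
```

Point the same model at a diagnostic question instead of a transcript and the code looks identical; the evidence base does not.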
5
Grounding Layer
Retrieval-Augmented Generation (RAG)
Guideline-aware Q&A · Protocol retrieval · Internal knowledge search
RAG combines an LLM with a curated knowledge base — clinical guidelines, formularies, institutional protocols — to ground responses in verified sources rather than training data alone. Reduces hallucination compared to a standalone LLM. Increasingly used for clinical decision support where the source material can be controlled and audited. The quality of the knowledge base determines the quality of the output. "Garbage in, garbage out" applies at the grounding layer.
Risk: Moderate — Better than standalone LLMs, but only as reliable as the knowledge base it references
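
A toy sketch of the RAG pattern under stated assumptions: a two-entry "knowledge base" and word-overlap retrieval stand in for a real curated index and vector search, and `llm_complete` is again a placeholder. The structure is the point: the model is instructed to answer only from retrieved, auditable sources, so output quality is bounded by the quality of `GUIDELINES`.

```python
# Two stand-in protocol snippets; a real deployment would index hundreds
# of curated, versioned institutional documents.
GUIDELINES = {
    "sepsis-bundle": "Institutional protocol: obtain lactate and blood cultures, "
                     "then begin broad-spectrum antibiotics within one hour.",
    "dka-fluids": "Institutional protocol: begin fluid resuscitation with "
                  "isotonic saline and reassess after the first liter.",
}

def retrieve(question: str, k: int = 1) -> list:
    """Stand-in for vector search: rank passages by shared words."""
    q = set(question.lower().split())
    ranked = sorted(GUIDELINES.values(),
                    key=lambda text: len(q & set(text.lower().split())),
                    reverse=True)
    return ranked[:k]

def grounded_answer(question: str, llm_complete) -> str:
    sources = "\n".join(retrieve(question))
    prompt = ("Answer using ONLY the sources below. If they do not answer "
              f"the question, say so.\n\nSources:\n{sources}\n\n"
              f"Question: {question}")
    # Output quality is bounded by GUIDELINES quality: garbage in, garbage out.
    return llm_complete(prompt)
```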
6
Executive Action Layer
Agentic AI
Multi-step autonomous workflows · Order-writing agents · Care coordination
AI that does not just generate output — it takes action. Agentic systems execute multi-step workflows, interact with EHR APIs, place orders, and coordinate across systems without human approval at each step. Currently experimental in healthcare with limited proven deployment at scale. The liability and governance questions are almost entirely unresolved. This is the layer where the regulatory patchwork is most exposed and where physician governance advocacy is most urgent.
Risk: Experimental — Governance, liability, and patient safety frameworks do not yet exist for most agentic healthcare use cases
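
A hedged sketch of the governance question rather than any production design (all names are hypothetical): what separates an assistive agent from an autonomous one is a single approval gate.

```python
def run_agent(plan_steps, ehr, clinician):
    """Each proposed action passes through a human approval gate."""
    for step in plan_steps:               # e.g., "order BMP", "book follow-up"
        action = step.to_ehr_action()     # the agent proposes...
        if clinician.approves(action):    # ...a human disposes
            ehr.execute(action)
    # Delete the clinician.approves() gate and the same loop becomes an
    # autonomous system: the configuration where liability is unresolved.
```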
When someone says "we're using AI" — ask which layer. A vendor claiming AI for prior authorization processing is describing Layer 1 or 2 applied to an operational task — reasonable, with a verifiable evidence base. A vendor claiming AI for autonomous clinical decision-making is describing Layer 4 or 6 applied to a judgment task — not proven at the standard of care. The language is identical. The evidence base is not.

After This Session, You Will Be Able To…

Mapped to ACGME competency domains · 1.0 AMA PRA Category 1 Credit™

1. Evaluate AI tools through the lens of workflow integration, data integrity, human oversight, and end-user experience — for both clinicians and patients. (Patient Care & Practice-Based Learning)

2. Apply a 5-question practical decision framework to assess AI vendor claims, identify implementation readiness gaps, and distinguish evidence-based tools from marketing-driven hype. (Systems-Based Practice)

3. Recognize the role of physician leadership in AI governance — including how to advocate for clinician involvement in AI design, procurement, and institutional oversight. (Professionalism & Communication)

4. Formulate at least two concrete next steps to engage with AI governance, evaluation, or implementation within your own practice or institution. (Practice-Based Learning & Improvement)

The 5-Question Framework

Ask these before any AI adoption decision. Each maps to a domain in the AMA AI Tool Evaluation Guide — developed by 21 specialty societies, February 2026.

1. What specific, bounded problem does this solve?
If the answer is vague, stop there. Operational tasks work; broad clinical claims don't.
AMA Domain 01 — Clinical Use Case & User

2. Where does the physician stay in the loop?
The final clinical decision must remain with a clinician. Always.
AMA Domain 03 — Risks & Mitigation

3. What was it trained on — and does it reflect OUR patients?
Local validation is not optional; a minimal validation sketch follows this framework. The Epic Sepsis Model and the Optum algorithm both failed on this question — after national deployment.
AMA Domain 02 — Training & Validation Data

4. How were clinicians involved in building this?
Not just consulted at the end. Actually involved in design, testing, and iteration throughout the process.
AMA Domain 05 — Workflow Integration & Monitoring

5. What does failure look like — and who's accountable?
If no one can answer this clearly, that IS your answer.
AMA Domain 04 — Effectiveness & Performance
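
What Question 3's "local validation" can look like in practice, as a minimal sketch: score the vendor's tool against your own chart-reviewed outcomes before trusting it. File and column names here are hypothetical; the comparison to the vendor's claimed numbers is the point (recall the Epic Sepsis Model's claimed AUC of 0.76 versus the externally validated 0.63).

```python
import pandas as pd
from sklearn.metrics import confusion_matrix, roc_auc_score

# One row per encounter: the vendor's score plus your chart-reviewed outcome.
df = pd.read_csv("local_cohort.csv")        # hypothetical local export
y_true = df["confirmed_sepsis"]             # your ground truth
y_score = df["vendor_risk_score"]           # the tool's output
alerted = y_score >= 0.5                    # the vendor's alert threshold

tn, fp, fn, tp = confusion_matrix(y_true, alerted).ravel()
print(f"Local AUC:         {roc_auc_score(y_true, y_score):.2f}  (vs. the vendor's claim)")
print(f"Local sensitivity: {tp / (tp + fn):.0%}  (share of true cases the tool catches)")
```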

"AI tools that work solve narrow, bounded problems with clean data and a physician still in the loop. The ones that fail are solving a vendor's pitch deck."

What Actually Works

Every AI tool that works shares three traits: it solves one bounded problem, keeps a human in the loop, and runs on representative, validated data. These three cases meet all three criteria.

Ambient AI Documentation: burnout score reduced from 51.9% to 38.8% in 30 days across 263 physicians in 6 health systems. Removing the documentation burden was the intervention. (Olson et al. · JAMA Network Open · Oct 2025)

Sepsis Early Warning (Bayesian Health): 10x fewer false sepsis alerts, 46% more cases identified, and 7x more cases flagged before antibiotics were ordered, with 3,330+ patients enrolled. (Cleveland Clinic · Sep 2025)

Prior Authorization AI: 60% of prior authorizations processed in under 2 hours, with an estimated 5x ROI. A bounded operational workflow — not a clinical judgment task. (Industry reports · 2024–2025)

The Liability Gap

AI creates liability exposure in two directions simultaneously — and the legal standard of care is still being written in real time.

Liability for Using Bad AI
If an AI tool produces an incorrect recommendation and a physician follows it without independent clinical verification, the physician may be liable for failing to catch the error. You cannot outsource clinical judgment to a vendor.
Liability for NOT Using AI
As AI becomes the standard of care in specific clinical contexts, physicians may face claims for failing to use available tools that could have caught something they missed. The precedent is still forming — but it is forming now.
Johns Hopkins researchers have noted that asking physicians to judge in real time when to trust AI is an "almost superhuman burden." No major AI malpractice cases have been resolved yet. But documentation, institutional governance, and informed vendor evaluation are your protection. Contact your malpractice carrier for your specific exposure.

Start Here

Everything on this list can be done without a budget, IT approval, or anyone's permission.

This Week: Download the AMA AI Tool Evaluation Guide — free, 21 specialty societies, 5 evaluation domains
30 Days: Ask your institution for its AI inventory — what tools are currently in use in your department?
30 Days: Ask your EHR vendor what AI is running in your workflow — specifically what is active without your knowledge
30–60 Days: Apply the 5-question framework to one AI tool you are currently using or being asked to evaluate
Next Available: Attend an AI vendor demo — and bring the 5 questions. See how the vendor responds to each one
60–90 Days: Advocate for an AI governance committee. If one exists, find out who is on it and whether clinicians are represented

Every Claim Has a Primary Source

All DOIs and external links are live. Grouped by source type.

Peer-Reviewed Studies
Ambient AI Scribing — Burnout Reduction (Olson et al., JAMA Network Open · Oct 2025). Burnout score reduced from 51.9% to 38.8% in 30 days across 263 physicians in 6 health systems using ambient AI documentation.
DOI: 10.1001/jamanetworkopen.2025.34976
GPT-4 as Diagnostic Aid — RCT, No Significant Improvement (Goh et al., JAMA Network Open · Oct 2024). Randomized controlled trial: physicians using GPT-4 as a diagnostic aid showed no statistically significant improvement over controls (p=0.60).
DOI: 10.1001/jamanetworkopen.2024.40969
Epic Sepsis Model — External Validation (Wong et al., JAMA Internal Medicine · Jun 2021). Independent validation: AUC 0.63 (vs. 0.76 claimed), sensitivity 33% — 67% of sepsis cases missed. Deployed nationally before external validation.
DOI: 10.1001/jamainternmed.2021.2626
Optum/UnitedHealth Racial Bias Algorithm (Obermeyer et al., Science · Oct 2019). Algorithm applied to 200M+ Americans annually used cost as a proxy for need, systematically deprioritizing equally ill Black patients. Correcting the proxy increased Black patient referrals from 17.7% to 46.5%.
DOI: 10.1126/science.aax2342
Epic Sepsis Model v2 — Multicenter Validation (Ostermayer et al., JAMIA Open · Dec 2024). Four-health-system multicenter validation confirming persistent performance issues with the updated model.
DOI: 10.1093/jamiaopen/ooae133
Health System & Organizational Sources
Cleveland Clinic / Bayesian Health — Sepsis AI Pilot (Sep 2025). 3,330+ patients. Results: 10x fewer false alerts, 46% more cases identified, 7x more flagged before antibiotics.
clevelandclinic.org
ECRI Top 10 Patient Safety Concerns 2026. "AI Diagnostic Dilemma" ranked the #1 patient safety concern nationally by the nation's leading independent patient safety organization.
ecri.org
AMA Physician AI Sentiment Survey — 2024 & 2026. Adoption trajectory: 38% (2023) → 66% (2024) → 80%+ (2026 projection).
ama-assn.org
AMA AI Specialty Collaborative: AI Tool Evaluation Guide (Feb 2026). 21 specialty societies. Five evaluation domains form the basis of this session's 5-Question Framework.
ama-assn.org
AMA STEPS Forward: Governance for Augmented Intelligence Toolkit (Aug 2025). Eight-step institutional AI governance implementation guide with CME credit. Developed with Manatt Health.
ama-assn.org/steps-forward
IBM Watson Health Sale to Francisco Partners — January 2022. Industry case study: Watson Health was sold after years of failed oncology and population health claims — a canonical example of broad clinical AI claims that did not survive real-world deployment.
Regulatory & Policy Sources
FDA AI/ML-Enabled Medical Device List. 1,451 cleared devices as of December 2025; 295 new clearances in 2025 alone. Maintained continuously by FDA.
fda.gov
FDA Clinical Decision Support Guidance (January 6, 2026). Final guidance: CDS software providing recommendations a clinician can independently review now falls outside FDA device oversight. Practical effect: more AI tools entering workflows with reduced regulatory scrutiny.
fda.gov
Arizona HB 2175 — AI in Prior Authorization. Bans AI from denying insurance claims without physician review, effective July 1, 2026. One of the first laws of its kind in the country. ArMA was a direct supporter, and Arizona physicians were instrumental in its passage.
azmed.org
Manatt Health AI Policy Tracker. Weekly tracking of state and federal AI healthcare legislation. 250+ bills introduced in 2025; 33 became law. An essential reference for navigating the regulatory landscape.
manatt.com

Accreditation: Presented in accordance with ACCME Standards for Integrity and Independence in Accredited Continuing Education.
Financial Relationships: Speaker has no relevant financial relationships with ineligible companies. No vendor payments, consulting arrangements, or speaker bureau relationships. Employer: Intermountain Health (non-commercial health system).
Commercial Support: No commercial support was accepted or solicited for this activity.
Content Standards: Content is nonpromotional, evidence-based, and free of commercial bias. All clinical claims cite peer-reviewed literature or primary source data.