Talk & Presentation · 1.0 AMA PRA Category 1 Credit™ · Arizona Medical Association Annual Meeting

AI in Healthcare:
Value vs. Hype

A Practical Framework for Evaluating AI in Your Practice

Date & Time: May 2, 2026 · 8:00–8:30 AM
Venue: Creighton University Medical Campus, Phoenix, AZ
CME Credit: 1.0 AMA PRA Category 1 Credit™
Speaker: David Eitel · RRT · MHA · MSRT · RRT-ACCS

Session Materials

The take-home framework is designed to be printed and kept at your desk. The slides include all cases, data, and references from the session.

What This Session Covers

AI tools are increasingly introduced into clinical practice — often without clear evidence of effectiveness, limitations, or failure modes. Physicians are frequently asked to evaluate these systems without a structured framework for assessing clinical value or implementation readiness. This session reviews current evidence on AI in clinical medicine, including documented successes, known failures, and the role of AI in clinical decision-making. Participants receive a practical five-question framework to guide evaluation and responsible adoption of AI tools within clinical practice.

AI in Healthcare by the Numbers

The market has moved faster than governance. Every physician is now encountering AI in their workflow — often without knowing it.

1,451 · FDA-cleared AI medical devices as of end of 2025 (FDA AI/ML Device List · Dec 2025)
295 · New FDA AI device clearances in 2025 alone (Innolitics 2025 Annual Roundup)
80%+ · Physicians now using AI, up from 38% in 2023 (AMA Physician Sentiment Survey · 2026)
250+ · State AI healthcare bills in 2025 across 47 states (Manatt Health AI Policy Tracker · 2026)

Healthcare Is Not a Monolith

AI readiness depends entirely on which part of healthcare you are talking about. Conflating the two domains below is where most AI failures begin.

The Delivery of Healthcare
Operations · Administration · Workflow
Scheduling, billing, and prior authorization
Clinical documentation and note-writing
Supply chain, staffing, and resource allocation
Claims processing and revenue cycle
Ambient scribing and administrative summarization
AI Status: AI is already here — and working. Bounded rules, high volume, consistent structured data. Measurable and verifiable ROI.
The Practice of Medical Science
Diagnosis · Treatment · Clinical Judgment
Diagnostic reasoning and differential diagnosis
Treatment planning and patient-specific decisions
Shared decision-making with patients
Nuanced clinical judgment across complex contexts
Risk stratification with high-stakes consequences
AI Status: AI is an assistive tool here — not a replacement. High variability, high stakes, deeply judgment-dependent.
The core failure pattern: most AI failures in healthcare happen when tools built for the delivery of healthcare — operations — get deployed as if they were practicing medical science. The Epic Sepsis Model. The Optum risk-scoring algorithm. Both were operational-layer tools applied to clinical judgment scenarios — without the validation that decision-scale deployment requires.

Not Everything Called "AI" Is the Same Thing

Six layers, each doing something different, each carrying different risk. When a vendor says "we use AI" — ask which layer. The answer changes everything about how you should evaluate the claim.

1
Foundation Layer
Machine Learning (ML)
Risk scores · Readmission prediction · Sepsis alerts
The statistical foundation most clinical AI is built on. ML models find patterns in labeled training data and apply them to new inputs. The Epic Sepsis Model is ML. Most EHR-embedded risk scores are ML. Well-understood mathematically — but only as good as the data it was trained on. That is where bias enters. Deploying such a model nationally without independent external validation is a validation failure, not a technology failure.
Risk: Moderate — Performance depends heavily on training data quality and whether local validation was performed
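
To make the pattern-matching concrete, here is a minimal, entirely hypothetical sketch of how an EHR-style risk score is typically produced: a supervised model fit to labeled historical outcomes, then applied to a new patient. The features and labels are synthetic; the point is that whatever bias lives in the historical labels is exactly what the model learns.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "historical" patients: three features standing in for
# age, creatinine, and prior admissions.
X_train = rng.normal(size=(1000, 3))
# Labels come from past outcomes; any bias baked into these labels
# is exactly what the model will learn.
y_train = (X_train @ np.array([0.8, 0.5, 1.2]) + rng.normal(size=1000)) > 1.0

model = LogisticRegression().fit(X_train, y_train)

# A "risk score" is simply the model's predicted probability for a new input.
new_patient = rng.normal(size=(1, 3))
print(f"Risk score: {model.predict_proba(new_patient)[0, 1]:.2f}")
```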
2
Foundation Layer
Deep Learning (DL)
Image classification · EKG analysis · Pathology pattern recognition
A subset of ML using layered neural networks. Excels at unstructured data — images, waveforms, pathology slides. The majority of FDA-cleared radiology AI (the largest category of approved AI devices) is deep learning. High performance when the training images match the clinical environment. The critical failure mode: performance degradation in out-of-distribution cases that is invisible until it produces a miss.
Risk: Moderate-High — Performance gaps in edge cases can be undetectable without prospective validation in your environment
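
The out-of-distribution failure mode is easy to reproduce on synthetic data. In this hedged sketch, a small neural network latches onto a spurious cue (imagine a scanner artifact that happened to correlate with disease at the training hospital); it looks near-perfect on matched data and quietly degrades once the cue stops tracking the label. All data and numbers are synthetic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

def make_cohort(n, artifact_tracks_label):
    """Synthetic cohort: one weak true signal, one strong spurious cue."""
    y = rng.integers(0, 2, size=n)
    true_signal = y + rng.normal(scale=2.0, size=n)        # weakly predictive
    cue = y if artifact_tracks_label else rng.integers(0, 2, size=n)
    artifact = cue + rng.normal(scale=0.1, size=n)         # near-perfect in training
    return np.column_stack([true_signal, artifact]), y

X_train, y_train = make_cohort(2000, artifact_tracks_label=True)
X_in, y_in = make_cohort(2000, artifact_tracks_label=True)     # looks like training
X_out, y_out = make_cohort(2000, artifact_tracks_label=False)  # the cue decouples

net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000,
                    random_state=0).fit(X_train, y_train)

print(f"AUC on matched data: {roc_auc_score(y_in, net.predict_proba(X_in)[:, 1]):.2f}")
print(f"AUC on shifted data: {roc_auc_score(y_out, net.predict_proba(X_out)[:, 1]):.2f}")
```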
3
Perception Layer
Computer Vision (CV)
Radiology triage · Dermatology screening · Wound assessment
Applied deep learning focused specifically on visual interpretation. Triage prioritization in radiology — detecting critical findings on CT and X-ray and surfacing them to radiologists faster — is the most proven clinical use case in the entire AI field. FDA has cleared over 500 radiology AI devices. The clinical evidence base here is more mature than any other AI category. The key question remains: flagging (lower risk) or autonomous interpretation (higher risk)?
Risk: Lower for flagging/triage functions — increases significantly for autonomous diagnostic interpretation
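
The flagging-versus-autonomous distinction is a deployment choice, not a model choice. In this sketch (all interfaces hypothetical), the model output is identical in both functions; what differs is whether a human still reads every study.

```python
def triage_flagging(study, model, worklist, threshold=0.5):
    """Lower-risk pattern: the model only reprioritizes human work."""
    if model.predict_proba(study) >= threshold:   # hypothetical interface
        worklist.prioritize(study)                # a radiologist still reads it
    # Every study is interpreted by a human either way; a model miss
    # costs speed, not the diagnosis.

def autonomous_read(study, model, threshold=0.5):
    """Higher-risk pattern: the model's output IS the interpretation."""
    if model.predict_proba(study) >= threshold:
        return "critical finding"
    return "no acute finding"   # nobody re-reads the negatives; misses are silent
```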
4
Communication Layer
Generative AI / Large Language Models (LLMs)
Ambient scribes · Patient message drafts · Discharge summaries
The category driving most of the current AI conversation in healthcare. LLMs generate natural language — they predict text based on patterns, they do not reason. Ambient scribing (the top proven win in this session) is LLMs applied to a bounded operational problem: converting speech to structured notes. When LLMs are deployed for clinical reasoning or diagnosis, the evidence base does not support the claims. The GPT-4 diagnostic RCT (Goh et al., JAMA Network Open 2024) found no statistically significant improvement over unassisted physicians (p=0.60).
Risk: High when applied to clinical reasoning — proven effective only in bounded operational tasks like documentation
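
A minimal sketch of why ambient scribing is a bounded problem: fixed input (the transcript), a fixed output template, and a clinician who edits and signs the draft. Here `llm_complete` is a placeholder for whatever model endpoint an institution actually uses, not any specific vendor's API.

```python
NOTE_TEMPLATE = """You are drafting a clinical note for physician review.
Using ONLY the transcript below, fill in this structure. If a section was
not discussed, write "Not discussed". Do not infer or invent content.

SUBJECTIVE:
OBJECTIVE:
ASSESSMENT:
PLAN:

Transcript:
{transcript}
"""

def draft_note(transcript: str, llm_complete) -> str:
    """Bounded operational task: speech-to-structured-note, then human review."""
    draft = llm_complete(NOTE_TEMPLATE.format(transcript=transcript))
    return draft   # the physician edits and signs; the model never finalizes
```

Point the same model at a diagnostic question instead of a transcript and the code looks identical; the evidence base does not.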
5
Grounding Layer
Retrieval-Augmented Generation (RAG)
Guideline-aware Q&A · Protocol retrieval · Internal knowledge search
RAG combines an LLM with a curated knowledge base — clinical guidelines, formularies, institutional protocols — to ground responses in verified sources rather than training data alone. Reduces hallucination compared to a standalone LLM. Increasingly used for clinical decision support where the source material can be controlled and audited. The quality of the knowledge base determines the quality of the output. "Garbage in, garbage out" applies at the grounding layer.
Risk: Moderate — Better than standalone LLMs, but only as reliable as the knowledge base it references
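
A toy sketch of the RAG pattern under stated assumptions: a two-entry "knowledge base" and word-overlap retrieval stand in for a real curated index and vector search, and `llm_complete` is again a placeholder. The structure is the point: the model is instructed to answer only from retrieved, auditable sources, so output quality is bounded by the quality of `GUIDELINES`.

```python
# Two stand-in protocol snippets; a real deployment would index hundreds
# of curated, versioned institutional documents.
GUIDELINES = {
    "sepsis-bundle": "Institutional protocol: obtain lactate and blood cultures, "
                     "then begin broad-spectrum antibiotics within one hour.",
    "dka-fluids": "Institutional protocol: begin fluid resuscitation with "
                  "isotonic saline and reassess after the first liter.",
}

def retrieve(question: str, k: int = 1) -> list:
    """Stand-in for vector search: rank passages by shared words."""
    q = set(question.lower().split())
    ranked = sorted(GUIDELINES.values(),
                    key=lambda text: len(q & set(text.lower().split())),
                    reverse=True)
    return ranked[:k]

def grounded_answer(question: str, llm_complete) -> str:
    sources = "\n".join(retrieve(question))
    prompt = ("Answer using ONLY the sources below. If they do not answer "
              f"the question, say so.\n\nSources:\n{sources}\n\n"
              f"Question: {question}")
    # Output quality is bounded by GUIDELINES quality: garbage in, garbage out.
    return llm_complete(prompt)
```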
6
Executive Action Layer
Agentic AI
Multi-step autonomous workflows · Order-writing agents · Care coordination
AI that does not just generate output — it takes action. Agentic systems execute multi-step workflows, interact with EHR APIs, place orders, and coordinate across systems without human approval at each step. Currently experimental in healthcare with limited proven deployment at scale. The liability and governance questions are almost entirely unresolved. This is the layer where the regulatory patchwork is most exposed and where physician governance advocacy is most urgent.
Risk: Experimental — Governance, liability, and patient safety frameworks do not yet exist for most agentic healthcare use cases
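
A hedged sketch of the governance question rather than any production design (all names are hypothetical): what separates an assistive agent from an autonomous one is a single approval gate.

```python
def run_agent(plan_steps, ehr, clinician):
    """Each proposed action passes through a human approval gate."""
    for step in plan_steps:               # e.g., "order BMP", "book follow-up"
        action = step.to_ehr_action()     # the agent proposes...
        if clinician.approves(action):    # ...a human disposes
            ehr.execute(action)
    # Delete the clinician.approves() gate and the same loop becomes an
    # autonomous system: the configuration where liability is unresolved.
```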
When someone says "we're using AI" — ask which layer. A vendor claiming AI for prior authorization processing is describing Layer 1 or 2 applied to an operational task — reasonable, with a verifiable evidence base. A vendor claiming AI for autonomous clinical decision-making is describing Layer 4 or 6 applied to a judgment task — not proven at the standard of care. The language is identical. The evidence base is not.

After This Session, You Will Be Able To…

Mapped to ACGME competency domains · 1.0 AMA PRA Category 1 Credit™

1. Evaluate AI tools through the lens of workflow integration, data integrity, human oversight, and end-user experience — for both clinicians and patients. (Patient Care & Practice-Based Learning)

2. Apply a 5-question practical decision framework to assess AI vendor claims, identify implementation readiness gaps, and distinguish evidence-based tools from marketing-driven hype. (Systems-Based Practice)

3. Recognize the role of physician leadership in AI governance — including how to advocate for clinician involvement in AI design, procurement, and institutional oversight. (Professionalism & Communication)

4. Formulate at least two concrete next steps to engage with AI governance, evaluation, or implementation within your own practice or institution. (Practice-Based Learning & Improvement)

The 5-Question Framework

Ask these before any AI adoption decision. Each maps to a domain in the AMA AI Tool Evaluation Guide — developed by 21 specialty societies, February 2026.

1. What specific, bounded problem does this solve?
If the answer is vague, stop there. Operational tasks work; broad clinical claims don't.
AMA Domain 01 — Clinical Use Case & User

2. Where does the physician stay in the loop?
The final clinical decision must remain with a clinician. Always.
AMA Domain 03 — Risks & Mitigation

3. What was it trained on — and does it reflect OUR patients?
Local validation is not optional; a minimal validation sketch follows this framework. The Epic Sepsis Model and the Optum algorithm both failed on this question — after national deployment.
AMA Domain 02 — Training & Validation Data

4. How were clinicians involved in building this?
Not just consulted at the end. Actually involved in design, testing, and iteration throughout the process.
AMA Domain 05 — Workflow Integration & Monitoring

5. What does failure look like — and who's accountable?
If no one can answer this clearly, that IS your answer.
AMA Domain 04 — Effectiveness & Performance
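
What Question 3's "local validation" can look like in practice, as a minimal sketch: score the vendor's tool against your own chart-reviewed outcomes before trusting it. File and column names here are hypothetical; the comparison to the vendor's claimed numbers is the point (recall the Epic Sepsis Model's claimed AUC of 0.76 versus the externally validated 0.63).

```python
import pandas as pd
from sklearn.metrics import confusion_matrix, roc_auc_score

# One row per encounter: the vendor's score plus your chart-reviewed outcome.
df = pd.read_csv("local_cohort.csv")        # hypothetical local export
y_true = df["confirmed_sepsis"]             # your ground truth
y_score = df["vendor_risk_score"]           # the tool's output
alerted = y_score >= 0.5                    # the vendor's alert threshold

tn, fp, fn, tp = confusion_matrix(y_true, alerted).ravel()
print(f"Local AUC:         {roc_auc_score(y_true, y_score):.2f}  (vs. the vendor's claim)")
print(f"Local sensitivity: {tp / (tp + fn):.0%}  (share of true cases the tool catches)")
```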

"AI tools that work solve narrow, bounded problems with clean data and a physician still in the loop. The ones that fail are solving a vendor's pitch deck."

What Actually Works

Every AI tool that works shares three traits: it solves one bounded problem, keeps a human in the loop, and runs on representative, validated data. These three cases meet all three criteria.

Ambient AI Documentation: burnout score reduced from 51.9% to 38.8% in 30 days across 263 physicians in 6 health systems. Removing the documentation burden was the intervention. (Olson et al. · JAMA Network Open · Oct 2025)

Sepsis Early Warning (Bayesian Health): 10x fewer false sepsis alerts, 46% more cases identified, and 7x more cases flagged before antibiotics were ordered, with 3,330+ patients enrolled. (Cleveland Clinic · Sep 2025)

Prior Authorization AI: 60% of prior authorizations processed in under 2 hours, with an estimated 5x ROI. A bounded operational workflow — not a clinical judgment task. (Industry reports · 2024–2025)

The Liability Gap

AI creates liability exposure in two directions simultaneously — and the legal standard of care is still being written in real time.

Liability for Using Bad AI
If an AI tool produces an incorrect recommendation and a physician follows it without independent clinical verification, the physician may be liable for failing to catch the error. You cannot outsource clinical judgment to a vendor.
Liability for NOT Using AI
As AI becomes the standard of care in specific clinical contexts, physicians may face claims for failing to use available tools that could have caught something they missed. The precedent is still forming — but it is forming now.
Johns Hopkins researchers have noted that asking physicians to judge in real time when to trust AI is an "almost superhuman burden." No major AI malpractice cases have been resolved yet. But documentation, institutional governance, and informed vendor evaluation are your protection. Contact your malpractice carrier for your specific exposure.

Start Here

Everything on this list can be done without a budget, IT approval, or anyone's permission.

This Week: Download the AMA AI Tool Evaluation Guide — free, 21 specialty societies, 5 evaluation domains
30 Days: Ask your institution for its AI inventory — what tools are currently in use in your department?
30 Days: Ask your EHR vendor what AI is running in your workflow — specifically what is active without your knowledge
30–60 Days: Apply the 5-question framework to one AI tool you are currently using or being asked to evaluate
Next Available: Attend an AI vendor demo — and bring the 5 questions. See how the vendor responds to each one
60–90 Days: Advocate for an AI governance committee. If one exists, find out who is on it and whether clinicians are represented

Every Claim Has a Primary Source

All DOIs and external links are live. Grouped by source type.

Peer-Reviewed Studies
Ambient AI Scribing — Burnout Reduction (Olson et al., JAMA Network Open · Oct 2025). Burnout score reduced from 51.9% to 38.8% in 30 days across 263 physicians in 6 health systems using ambient AI documentation.
DOI: 10.1001/jamanetworkopen.2025.34976
GPT-4 as Diagnostic Aid — RCT, No Significant Improvement (Goh et al., JAMA Network Open · Oct 2024). Randomized controlled trial: physicians using GPT-4 as a diagnostic aid showed no statistically significant improvement over controls (p=0.60).
DOI: 10.1001/jamanetworkopen.2024.40969
Epic Sepsis Model — External Validation (Wong et al., JAMA Internal Medicine · Jun 2021). Independent validation: AUC 0.63 (vs. 0.76 claimed), sensitivity 33% — 67% of sepsis cases missed. Deployed nationally before external validation.
DOI: 10.1001/jamainternmed.2021.2626
Optum/UnitedHealth Racial Bias Algorithm (Obermeyer et al., Science · Oct 2019). Algorithm applied to 200M+ Americans annually used cost as a proxy for need, systematically deprioritizing equally ill Black patients. Correcting the proxy increased Black patient referrals from 17.7% to 46.5%.
DOI: 10.1126/science.aax2342
Epic Sepsis Model v2 — Multicenter Validation (Ostermayer et al., JAMIA Open · Dec 2024). Four-health-system multicenter validation confirming persistent performance issues with the updated model.
DOI: 10.1093/jamiaopen/ooae133
Health System & Organizational Sources
Cleveland Clinic / Bayesian Health — Sepsis AI Pilot (Sep 2025). 3,330+ patients. Results: 10x fewer false alerts, 46% more cases identified, 7x more flagged before antibiotics.
clevelandclinic.org
ECRI Top 10 Patient Safety Concerns 2026. "AI Diagnostic Dilemma" ranked the #1 patient safety concern nationally by the nation's leading independent patient safety organization.
ecri.org
AMA Physician AI Sentiment Survey — 2024 & 2026. Adoption trajectory: 38% (2023) → 66% (2024) → 80%+ (2026 projection).
ama-assn.org
AMA AI Specialty Collaborative: AI Tool Evaluation Guide (Feb 2026). 21 specialty societies. Five evaluation domains form the basis of this session's 5-Question Framework.
ama-assn.org
AMA STEPS Forward: Governance for Augmented Intelligence Toolkit (Aug 2025). Eight-step institutional AI governance implementation guide with CME credit. Developed with Manatt Health.
ama-assn.org/steps-forward
IBM Watson Health Sale to Francisco Partners — January 2022. Industry case study: Watson Health was sold after years of failed oncology and population health claims — a canonical example of broad clinical AI claims that did not survive real-world deployment.
Regulatory & Policy Sources
FDA AI/ML-Enabled Medical Device List. 1,451 cleared devices as of December 2025; 295 new clearances in 2025 alone. Maintained continuously by FDA.
fda.gov
FDA Clinical Decision Support Guidance (January 6, 2026). Final guidance: CDS software providing recommendations a clinician can independently review now falls outside FDA device oversight. Practical effect: more AI tools entering workflows with reduced regulatory scrutiny.
fda.gov
Arizona HB 2175 — AI in Prior Authorization. Bans AI from denying insurance claims without physician review, effective July 1, 2026. One of the first laws of its kind in the country. ArMA was a direct supporter, and Arizona physicians were instrumental in its passage.
azmed.org
Manatt Health AI Policy Tracker. Weekly tracking of state and federal AI healthcare legislation. 250+ bills introduced in 2025; 33 became law. An essential reference for navigating the regulatory landscape.
manatt.com

Accreditation: Presented in accordance with ACCME Standards for Integrity and Independence in Accredited Continuing Education.
Financial Relationships: Speaker has no relevant financial relationships with ineligible companies. No vendor payments, consulting arrangements, or speaker bureau relationships. Employer: Intermountain Health (non-commercial health system).
Commercial Support: No commercial support was accepted or solicited for this activity.
Content Standards: Content is nonpromotional, evidence-based, and free of commercial bias. All clinical claims cite peer-reviewed literature or primary source data.