Designing Clinical AI for Real Conditions: Built for 3 AM, Not 3 PM
Why clinical AI that works in a 3 PM demo fails at 3 AM — what frontline clinicians need from decision support: co-pilot design, governance, and trust.
Featuring Dr. Natasha Dole on The Signal Room
A good test of any clinical AI tool is to picture the hour it will actually run in. During a Signal Room discussion, emergency physician Dr. Natasha Dole drew a clear distinction: a tool that works at 3 PM may be of no use at 3 AM, and the 3 AM tool is the one that counts. She challenged developers to visit her department and watch what happens to her cognitive load at 3 AM when her pager goes off, so they can see the worst-case conditions their tools would need to survive.
Clinical AI implementations keep failing for the same reason. They are developed and piloted in calm, well-staffed environments, then released to overwhelmed clinicians. At Hutchins Data Strategy Consultants, we view these failures as design failures rather than adoption failures. We believe the responsibility for closing the gap between the AI tool as designed and the AI tool as used lies with the developers, not with the clinicians, who bear the brunt of poorly designed tools while trying to do their work and improve care.
Why Clinicians Eye Roll at the Demo
Dole was frank about the real reason technology demonstrations become tedious to physicians. It is not innate resistance to change. It is that the tool was built without the participation of its main users. If the people who use a tool have no say in what would be most useful to them, the end result is unnecessary work. That is obvious to an overworked clinician the moment they touch the tool.
Dole's solution is to assemble a team that includes all relevant stakeholders, clinicians and patients among them, from the outset of development. A tool designed with no input is already a failure, according to Dole. Towards the end of the conversation she summed up her critique in a memorable principle: build with clinicians, not to them.
Co-Pilot, Not Autopilot
Dole promotes a model for AI in clinical practice in which the technology is designed to support clinicians. The primary emergency responder remains a human at all times. In her view, the AI is there to act as a support tool and a digital scribe that adds as little as possible to a clinician's cognitive load. The clinician keeps the authority and the decision-making role, while the AI supports them by guiding and optimizing the process.
She was clear that the first 'brain' in the process is the clinician's judgment, not the AI. When a tool's output is inconsistent with what she observes clinically, she compares the two and asks whether they can logically fit together: whether one plus one is two, or whether the tool is insisting on three. If it is the latter, she asks a second human brain to weigh in. The AI is a second opinion she will consider, but it is not the final answer, and she holds to that discipline to limit risk.
The reason is not only cultural; it is a matter of hard accountability. When something goes wrong, no one says, "The AI did it." The clinician is responsible, and in a medico-legal case the tool is no defense. This should inform how conservatively these systems are introduced and how unambiguously their boundaries are defined.
Criteria for Clinician Trust
Dole set out the standards a tool must meet before she is willing to trust it, and they provide a practical framework for anyone deploying clinical AI. The tool should have undergone a clinical governance review. Clinicians and patients must have been included in its design. There should be published performance data and literature for the exact populations it is intended to serve, with honest disclosure of its known biases and potential harms. And it must demonstrably reduce work.
Most tools fail the last criterion. They add more checks, more clicks, more tasks. If a tool increases the administrative burden clinicians already carry, it will be rejected by people who are, in Dole's words, drowning. Dole works against two pressures in emergency medicine: overcrowding and exit block. A tool earns its place if it relieves these pressures by improving patient flow. One that adds administrative friction in the name of safety, Dole noted, makes them worse.
Dole frames the judgment around 'three Ps': patients, people, and profit. Patients are the main concern. People are the clinicians who have to work with the tool. Profit is the funding that keeps the program sustainable. A discussion that leaves out any of the three is incomplete.
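The trust criteria Dole lists could be recorded as a simple pre-deployment checklist. The sketch below is one hypothetical way to do that in Python; the class name, field names, and helper function are illustrative assumptions and do not come from Dole or from any existing governance framework.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrustChecklist:
    """Hypothetical record of the trust criteria described in this section."""
    passed_clinical_governance: bool
    codesigned_with_clinicians_and_patients: bool
    published_population_specific_evidence: bool
    biases_and_harms_disclosed: bool
    demonstrably_reduces_workload: bool


def unmet_criteria(checklist: TrustChecklist) -> list[str]:
    """Return the names of the criteria that are not yet satisfied."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


# Illustrative case: a well-governed tool that still adds clicks.
candidate = TrustChecklist(
    passed_clinical_governance=True,
    codesigned_with_clinicians_and_patients=True,
    published_population_specific_evidence=True,
    biases_and_harms_disclosed=True,
    demonstrably_reduces_workload=False,
)

gaps = unmet_criteria(candidate)
print("All criteria met" if not gaps else f"Not ready, unmet: {gaps}")
```

In this sketch every criterion is a hard gate: a single unmet item keeps the tool out, which mirrors the point that a tool can pass governance and still fail by adding work.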
Consent, Equity, and the Limits of Off-the-Shelf AI
Real-world clinical conditions impose constraints that controlled settings do not. Consent is one of them. In emergency medicine, many patients arrive unconscious or otherwise unable to consent to anything on the spot. Dole's approach is practical: make sure AI use is documented in the clinical notes. Ideally the tool that generates the notes completes its own disclosure, populated automatically, so clinicians do not have to do it themselves.
Equity is another concern. If an AI tool is trained on literature and datasets that are inequitable and under-represent minority groups, it will carry those inequities and gaps to the bedside. Dole noted that AI must serve all populations, including minority populations; otherwise it will deepen existing inequities rather than relieve entrenched patterns.
Dole is also cautious about consumer AI offering medical advice, and she explains her position with care. Unlike a medical professional, a consumer AI tool cannot take the patient's history and medications into account, so it cannot offer an appropriate differential. When it gives a confident answer anyway, that answer may well be dangerous.
She also separates two literacies that are often conflated. AI literacy is not the same as digital literacy, and clinicians need protected time to develop it, not training squeezed into a day off they need for recovery. The clinicians who need these tools most are the ones with the least time, the most pressure, and the least inclination to experiment or disrupt the status quo. A rollout that treats AI competence as a hobby for the tech-savvy will not reach them.
Technology That Disappears
Dole's vision for the future is one in which the technology recedes. Well-designed AI can work like a sound engineer at a concert: behind the scenes, making everything else possible without ever being the focus. Dole gave an example of technology doing exactly that. During the pandemic, clinicians were wrapped in layers of protective equipment and had little direct interaction with their patients. One thing that eased that burden was a scribe that handled documentation. It gave clinicians back time that would otherwise have gone to paperwork, time to be present with the patient and to be human again. Even a small reduction in cognitive load improved the quality of the interaction and what the clinician was able to give the patient in that moment.
How Hutchins Approaches Clinical AI
Our work keeps the clinician and the patient at the center of the design rather than the technology. We help organizations involve frontline clinicians from the start, set the governance and disclosure that real clinical settings demand, insist on population-specific evidence before deployment, and judge every tool by whether it reduces load under actual operating conditions — not under demonstration conditions. The goal is responsible AI that earns trust because it was built for the hour it will actually run in.
These themes are explored throughout The Signal Room podcast, in conversations with clinicians and leaders about where AI helps at the bedside and where it gets in the way.
Frequently asked questions
What does it mean to design clinical AI for real conditions?
Designing and testing tools for the high-stress, off-hours environment where clinicians actually work — the 3 AM emergency, not the controlled 3 PM demonstration — so the tool reduces cognitive load instead of adding to it.
Why do clinicians distrust new AI tools?
Often because they were excluded from the design. A tool built for clinicians without clinician and patient input tends to add layers of work rather than remove them, which is why demonstrations draw eye-rolls rather than adoption.
Should clinical AI make decisions?
No. The clinician remains the team leader and the accountable decision-maker at all times. AI works best as a co-pilot — a second brain that reduces load and surfaces detail — with human judgment as the first brain.
What does a clinician need before trusting a clinical AI tool?
Evidence it passed clinical governance, that clinicians and patients shaped its design, published performance data for the relevant populations, disclosed biases and harms, and proof it reduces rather than adds to the workload.