AI: The Second Set of Eyes That Doesn't Blink Under Pressure
Medical imaging AI works best as a second reader: it surfaces the case, the radiologist still makes the call. Why these tools stall on data, and how clinician trust is earned.
First published in The AI Health Pulse. Also on LinkedIn.
From the first X-ray to MRI cross-sections, medical imaging has evolved for more than a century and remains integral to clinical diagnosis. Image quality has continued to improve. The working environment has not kept pace. Radiologists are asked to read more studies in less time, and fatigue and volume run up against the hard limits of the human visual system.
An emergency physician I spoke with on The Signal Room described those limits, and also what sits on the other side of them. Lived experience is something an AI model does not possess. Years at the bedside let a clinician remember the detailed histories of patients with the same findings, some of whom had tragic outcomes and some of whom were saved. A model trained on data is capable only of pattern recognition. That experience is what sustains the value of the clinician's work.
The value of a well-crafted model is that of a second set of eyes on the thousandth case a radiologist has to read at the end of an exhausting shift.
The Pressure the Numbers Conceal
Thin staffing is what makes diagnostic reading hardest. When a department is understaffed, a small team of radiologists works through a massive collection of studies. When a team is forced through case after case at that pace, the quality of its work suffers. The radiology literature is replete with studies of missed findings and discrepancies between readers. These are not failures of skill or character. They are what happens when a team works at an unreasonable pace while carrying a heavy mental load and facing queues that keep filling.
The cost of an error is greatest when the queue is longest. That is the situation to keep in mind when considering the role of AI in imaging. A second reader is not about speed for its own sake. When the environment pushes against quality, a second reader is there to hold it up.
What a Second Reader Actually Does
The honest framing is a narrow one. A model trained on large volumes of imaging can flag a pattern a busy eye may pass over and point the radiologist toward it. On a chest study it might mark a probable pulmonary embolus and say, in effect, look here before you sign. The tool does not make the diagnosis. It speeds the read and surfaces the case that needs attention sooner.
That example shows why timing matters. In one instance, a trauma center was exceeding its target average response times. It adopted a CT prioritization tool that pushed time-sensitive CT studies to the front of the queue, putting them in the surgeon's hands sooner. The tool made no critical decisions. It simply moved the time-sensitive study to the front of the line, and a person did the rest. Used this way, the tool reduced the risk of a real finding being overlooked and kept urgent studies ahead of the more routine ones.
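The queue behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the trauma center's actual system: the `Worklist` class, the two priority levels, and the study IDs are all hypothetical. It shows the one thing the tool does, which is reorder the list. A flagged study jumps ahead of routine ones, studies with the same priority stay in arrival order, and nothing is ever removed or decided automatically.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels; the lower value is read first.
URGENT, ROUTINE = 0, 1


@dataclass(order=True)
class QueuedStudy:
    priority: int
    arrival: int                       # tie-breaker: earlier arrivals first
    study_id: str = field(compare=False)


class Worklist:
    """Reading queue in which a model flag changes order, nothing else."""

    def __init__(self):
        self._heap = []
        self._arrivals = count()

    def add(self, study_id, flagged_urgent):
        priority = URGENT if flagged_urgent else ROUTINE
        item = QueuedStudy(priority, next(self._arrivals), study_id)
        heapq.heappush(self._heap, item)

    def next_study(self):
        """Return the next study for a human to read, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).study_id


def demo_order():
    worklist = Worklist()
    worklist.add("CT-routine-1", flagged_urgent=False)
    worklist.add("CT-routine-2", flagged_urgent=False)
    worklist.add("CT-trauma-3", flagged_urgent=True)
    worklist.add("CT-trauma-4", flagged_urgent=True)
    return [worklist.next_study() for _ in range(4)]
```

Calling `demo_order()` returns the two flagged trauma studies first, then the two routine studies, each pair in the order it arrived. Every study still reaches a reader; the flag only changes when.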
None of these tools replaces the radiologist. The model does not know the study, let alone the patient, and it cannot answer the question the referring physician is asking. The model produces a signal, and a clinician interprets that signal. The critical judgment remains with the person who signs the report.
The Model Is Wrong, and the Human Has to Know It
The opposite situation also occurs. An algorithm can flag a case that turns out to be a benign finding, where neither a report entry nor any particular follow-up is warranted. An experienced radiologist who has reviewed many studies can recognize this. The model alone lacks that experience, which is why a human being must be involved.
I have had the same experience from the patient's side. Blood pressure spikes, a system flags it, and the flag is technically valid. For a patient whose blood pressure is well controlled and well treated, the reading is normal and the alert is clinically meaningless. A clinician dismisses it within seconds. A model that had that context would not have raised the alert. The ability to override is not a privilege we grant clinicians. It is a necessary check on a model that is blind to context yet steers decisions.
There is a quieter, less obvious side to oversight. Every flag a system sends to a clinician has a cost. A system that raises too many low-value alerts desensitizes clinicians to the tool, in the same way that hospital equipment alarms become background noise. Each alert has to justify its cost. The best design teams require that an alert improve patient care, and they disable the alerts that disrupt workflow with no benefit to care.
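One way a team might put a number on that cost is to track how often each alert type actually leads a clinician to act. The sketch below is illustrative only: the log format, the alert names, and the 20 percent review threshold are assumptions made for the example, not a clinical standard, and the counts are made up.

```python
from collections import Counter


def alert_yield(alert_log):
    """Fraction of fired alerts that a clinician acted on, per alert type.

    alert_log: iterable of (alert_type, acted_on) pairs, a hypothetical format.
    """
    fired = Counter()
    acted = Counter()
    for alert_type, acted_on in alert_log:
        fired[alert_type] += 1
        if acted_on:
            acted[alert_type] += 1
    return {alert_type: acted[alert_type] / fired[alert_type] for alert_type in fired}


def review_candidates(alert_log, min_yield=0.2):
    """Alert types whose yield is below the (assumed) review threshold."""
    yields = alert_yield(alert_log)
    return sorted(t for t, y in yields.items() if y < min_yield)


# Made-up log: 10 blood pressure alerts with 1 acted on,
# 4 suspected-embolus alerts with 3 acted on.
example_log = (
    [("bp_spike", False)] * 9
    + [("bp_spike", True)]
    + [("pe_suspected", True)] * 3
    + [("pe_suspected", False)]
)
```

With this made-up log, `review_candidates(example_log)` returns only `"bp_spike"`. A low yield is a prompt for a human review of that alert, not an automatic reason to switch it off, since a rare but serious finding can justify many dismissed flags.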
Why Many Imaging Tools Stall After the Initial Demo
A tool that dazzles users at 2 P.M. can be worthless at 2 A.M. The calm setting in which a tool is designed and the overwhelmed setting in which it is used are very different. When tools fail at the bedside, it is tempting to call it an adoption problem. It is more accurately a design problem, and the distinction matters because it determines who is responsible for fixing it.
The same pattern shows up in the data. A model calibrated on one population with one set of devices may perform differently on your setup. Your imaging devices and their protocols may differ. There is also no consistent way to store and label studies, and conventions can vary between departments or between recently acquired organizations. A short demo usually will not cover that gap. Preparing imaging data so a model can be validated against it is long and tedious work, yet it may be the most important factor in deciding whether the tool can be relied upon.
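As a rough illustration of what validating against your own data can mean, the sketch below compares a model's flags with radiologist-confirmed findings and reports sensitivity and specificity separately for each scanner or site. The record format, the group names, and the counts are hypothetical and made up for the example; a real validation would also need adequate sample sizes and confidence intervals.

```python
from collections import defaultdict


def local_validation(records):
    """Per-group sensitivity and specificity of model flags.

    records: iterable of (group, model_flagged, finding_present) tuples,
    where finding_present is the radiologist-confirmed ground truth.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, flagged, present in records:
        if present:
            counts[group]["tp" if flagged else "fn"] += 1
        else:
            counts[group]["fp" if flagged else "tn"] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            "n": positives + negatives,
            # None when a group has no positive (or no negative) cases.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


# Made-up records for two hypothetical scanners.
example_records = (
    [("scanner_A", True, True)] * 3
    + [("scanner_A", False, True)] * 1
    + [("scanner_A", False, False)] * 4
    + [("scanner_B", True, True)] * 1
    + [("scanner_B", False, True)] * 1
    + [("scanner_B", False, False)] * 1
    + [("scanner_B", True, False)] * 1
)
```

In this made-up data the model looks acceptable on `scanner_A` (sensitivity 0.75, specificity 1.0) and much worse on `scanner_B` (0.5 and 0.5). A single pooled accuracy figure from a demo would hide exactly that kind of gap.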
Trust Is Built Post-Rollout, Not During the Announcement
Clinicians do not adopt a tool because it is clever. They adopt it once it has been accurate on enough of their own cases. With ambient documentation, early skepticism gave way to trust once clinicians could correct the tool and it learned their preferences. Clinicians could tell it, for example, "you wrote this," or "the patient actually said that." The skeptics who corrected the tool eventually became its advocates, and progress depended on that feedback.
There is no shortcut to trust. Tools earn it through reliability that clinicians experience over time. They earn it when clinicians can adjust the tool and verify its outputs. They also earn it when they are validated in the specific clinical environments where they are used. None of that comes from a product announcement.
What Health Systems Should Aim to Build
Health systems get real value from imaging AI when they stop treating it as an isolated product purchase. These systems manage the quality and consistency of their imaging data before they buy a model. They treat a tool's accuracy at launch as only the starting line, and they withdraw tools whose performance has degraded. They also prepare their workforce to monitor that performance.
Most deployments leave ownership undefined. A signal from a model still forms part of a signed study, and the question of who is accountable for that read remains. The tool can support the decision. It cannot carry the weight of it. In the end, what matters is a radiologist who leaves work a little less exhausted, and a patient whose problem was caught because a second set of eyes that did not tire was looking alongside the clinician who still made the decision.