What Scalable Oversight Looks Like in Healthcare AI

Most health systems began AI oversight with a handful of applications. Each implementation received a thorough review because there was time to give it one. That model does not survive scale. A health system that ran a few AI models a year ago may now run twenty across the emergency department, the revenue cycle, nursing, and radiology. The careful review that worked at the beginning becomes the process everyone routes around to get their work done.

The cost of getting oversight wrong is no longer hypothetical. Ambient note-taking tools that listen to and record conversations during patient visits are becoming commonplace, and at least one major health system now faces lawsuits over employees recording patient visits without first obtaining explicit consent. The lesson is not that the technology is harmful. It is that oversight designed for three tools fails at thirty, and the greatest risk sits in the gap between how much technology a health system uses and how much oversight it actually has.

The goal of scalable oversight is not more process. It is a small set of mechanisms that grow with the AI portfolio instead of fighting it. A few of those mechanisms are no longer optional.

What cannot be seen cannot be controlled

The first is the least exciting and the most often skipped: a single registry of every AI tool that touches patient data. Not a slide from a vendor meeting. Not a list in an analyst's head. A maintained record stating what each tool does, what data it captures, where that data goes and what it is used for, who owns the tool, what consent (if any) applies, and how its output is evaluated.

Most organizations cannot produce this on demand, and those that try are often surprised by what they find. Tools arrive through departmental budget lines, vendor pilots, or as features switched on inside products the organization already bought. Without a single view of every tool, oversight is left guessing at what each tool is for. A registry turns oversight from a series of one-off reviews into something the organization can actually manage.
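For teams that keep the registry in code, here is a minimal Python sketch, assuming the registry is keyed by a tool ID and that inventories from budget lines, vendor pilots, and enabled product features can each be exported as a set of IDs. The names `RegistryEntry` and `unregistered_tools`, the field names, and all sample values are hypothetical. The check is a plain set difference: anything that appears in an inventory but not in the registry is a tool oversight cannot yet see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical record for one AI tool that touches patient data."""
    tool_id: str
    purpose: str        # what the tool does
    data_captured: str  # what data it captures
    data_use: str       # where that data goes and what it is used for
    owner: str          # who owns the tool
    consent: str        # consent that applies; write "none" rather than leave blank
    evaluation: str     # how the tool's output is evaluated


def unregistered_tools(registry: dict[str, RegistryEntry],
                       *discovery_sources: set[str]) -> set[str]:
    """Return tool IDs found in any discovery source but missing from the registry."""
    seen: set[str] = set().union(*discovery_sources)
    return seen - set(registry)


# Illustrative data only.
registry = {
    "ambient-scribe": RegistryEntry(
        tool_id="ambient-scribe",
        purpose="Drafts visit notes from recorded conversations",
        data_captured="Visit audio and transcripts",
        data_use="Vendor cloud; used only to generate the note",
        owner="CMIO office",
        consent="Verbal consent recorded at visit start",
        evaluation="Monthly sample of notes reviewed against the chart",
    ),
}
budget_lines = {"ambient-scribe", "denial-predictor"}
vendor_pilots = {"sepsis-alert"}
enabled_features = {"ambient-scribe", "inbox-draft-replies"}

print(sorted(unregistered_tools(registry, budget_lines, vendor_pilots, enabled_features)))
# ['denial-predictor', 'inbox-draft-replies', 'sepsis-alert']
```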

A front door, not a labyrinth

A registry records what is present. The front door controls what gets in. No AI tool should reach a patient without passing through a defined front door that asks the same questions every time. Who owns it? What does it capture, and where does the data reside? Does a consent process apply? Who will evaluate the output, and how?

At first this can sound like bureaucracy, and done poorly it can become exactly that. The objective is the opposite. A clear path that answers the same fixed questions before deployment is calmer and faster than the alternative: answering them after a clinician or patient has already been affected. The front door is not an obstacle. It lets the organization choose when to answer those questions, on its own schedule, instead of being forced to answer them under pressure.
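A minimal sketch of the front door as a completeness gate, in Python. It assumes intake answers arrive as a dictionary keyed by hypothetical labels (`owner`, `capture_and_residency`, `consent`, `evaluation`); `FRONT_DOOR_QUESTIONS` and `review_intake` are illustrative names, not an existing tool. The gate only confirms that every fixed question has an answer before deployment; judging whether the answers are acceptable stays with people.

```python
# The four fixed questions, asked of every tool before it reaches a patient.
# The dictionary keys are hypothetical labels chosen for this sketch.
FRONT_DOOR_QUESTIONS = {
    "owner": "Who owns it?",
    "capture_and_residency": "What does it capture, and where does the data reside?",
    "consent": "Does a consent process apply?",
    "evaluation": "Who will evaluate the output, and how?",
}


def review_intake(answers: dict[str, str]) -> tuple[bool, list[str]]:
    """Return (cleared, unanswered_questions) for a proposed tool.

    This checks completeness only: every question needs a non-blank answer.
    Whether an answer is acceptable remains a human judgment.
    """
    unanswered = [
        question
        for key, question in FRONT_DOOR_QUESTIONS.items()
        if not answers.get(key, "").strip()
    ]
    return (not unanswered, unanswered)


# Illustrative intake: consent left blank, evaluation not answered at all.
cleared, open_items = review_intake({
    "owner": "Director of ED nursing",
    "capture_and_residency": "Visit audio, stored in the vendor's cloud",
    "consent": "",
})
print(cleared)  # False
for question in open_items:
    print("-", question)
# - Does a consent process apply?
# - Who will evaluate the output, and how?
```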

Validation that scales with the output, not the calendar

This is the part that breaks most easily. AI-generated documentation is not automatically correct. Summaries can contain errors and omissions, and inaccuracies often cluster in the parts of a note that shape the next steps of patient care. A clinician who signs an AI-generated note is attesting that it is correct, so the standard has not changed. What has changed is how much material the clinician is now expected to attest to.

Oversight has to account for the math. If checking the output takes time, the workflow has to make room for that time. If the volume of AI output grows without a matching increase in the time available to verify it, verification quietly stops, and the tool that was meant to reduce workload starts to create risk. Scale magnifies this, which is why validation effort has to be proportional to the volume of output the tools generate, not to how often a committee meets.
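The math can be made concrete with a small calculation. This Python sketch assumes a fixed review time per AI-generated note and a fixed daily review budget; `verification_load` and every number in it are illustrative assumptions, not measurements. At 20 notes a day and 2 minutes per note, review needs 40 minutes; with 25 minutes available, only 62.5 percent of the output can actually be checked.

```python
def verification_load(notes_per_day: int, minutes_per_note: float,
                      review_minutes_available: float) -> dict[str, float]:
    """Compare the review time AI output demands with the time the workflow allows.

    coverage is the fraction of the day's AI output that can actually be
    checked at the assumed minutes per note, capped at 1.0.
    """
    required = notes_per_day * minutes_per_note
    coverage = 1.0 if required == 0 else min(1.0, review_minutes_available / required)
    return {
        "required_minutes": required,
        "available_minutes": review_minutes_available,
        "coverage": coverage,
    }


# Illustrative numbers only: 2 minutes to verify each note, 25 review minutes per day.
today = verification_load(notes_per_day=20, minutes_per_note=2.0,
                          review_minutes_available=25.0)
print(today["coverage"])    # 0.625

# Output doubles; review time does not.
doubled = verification_load(notes_per_day=40, minutes_per_note=2.0,
                            review_minutes_available=25.0)
print(doubled["coverage"])  # 0.3125
```

Reporting coverage rather than a pass/fail flag is a deliberate choice in this sketch: when output doubles and review time does not, the number falls to 31.25 percent, which makes the silent stop in verification visible.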

Scale leaks at the vendor terms

Most tools come from external vendors, and most of the risk those tools carry is written into the contract. A vendor can limit the customer's rights over its own data, disclaim responsibility for errors, leave data retention and deletion terms vague, and offer no assurance about how the model will perform over time. Any one of these terms may be manageable for a single tool. Across a dozen tools, the combined exposure moves beyond what any registry can control.

Terms that hold up at scale share a common structure. The health system controls how long data is retained and when it is deleted. Captured data is not reused for other purposes without additional consent. Access and deletion events are recorded in an unalterable log. Vendor employees cannot access identifiable recordings without specific authorization, and the health system retains the right to audit and to see actual performance metrics. The contract is the beginning, not the end. Vendor behavior should be assessed continuously, not once at signing.
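Because these terms recur, they can be tracked as a checklist and rolled up across the whole portfolio. The Python sketch below assumes each contract has been reviewed and summarized as yes/no answers; `REQUIRED_TERMS`, `term_gaps`, `portfolio_gaps`, and the sample contracts are hypothetical. A missing answer counts as a gap, so an unreviewed term is never treated as satisfied.

```python
# Hypothetical yes/no checks for the contract terms described above.
# The keys are illustrative labels, not standard clause names.
REQUIRED_TERMS = (
    "system_controls_retention_period",
    "system_controls_deletion_timing",
    "no_secondary_use_without_new_consent",
    "unalterable_access_and_deletion_log",
    "vendor_staff_need_authorization_for_identifiable_recordings",
    "right_to_audit_and_see_actual_metrics",
)


def term_gaps(contract: dict[str, bool]) -> list[str]:
    """Return required terms that are missing or answered False for one tool."""
    return [term for term in REQUIRED_TERMS if not contract.get(term, False)]


def portfolio_gaps(contracts: dict[str, dict[str, bool]]) -> dict[str, list[str]]:
    """Return gaps for every tool that has at least one, across the whole portfolio."""
    result: dict[str, list[str]] = {}
    for tool, terms in contracts.items():
        gaps = term_gaps(terms)
        if gaps:
            result[tool] = gaps
    return result


# Illustrative contracts only.
contracts = {
    "ambient-scribe": {term: True for term in REQUIRED_TERMS},
    "denial-predictor": {
        "system_controls_retention_period": True,
        "right_to_audit_and_see_actual_metrics": False,
    },
}
for tool, gaps in portfolio_gaps(contracts).items():
    print(tool, "->", len(gaps), "missing terms")
# denial-predictor -> 5 missing terms
```

Since the check is cheap, it can be re-run whenever a vendor ships an update, with the answers taken from observed behavior rather than from the signed text alone.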

Oversight is a team activity with an intentional escalation process

No single function can do this alone. Clinical leadership judges a tool's effect on workflow. Legal assesses liability. Privacy and records leadership identify data exposure. IT evaluates the system architecture, data flows, and safeguards. Assign oversight to any one of these functions and the perspectives of the other three go missing.

What scale adds is an urgent need for a clear way to raise a hand. The earliest signs of a problem are usually small. It might be a comment a nurse makes during a shift. It might be a vendor update that does not match what the signed contract says. These weak signals do not arrive labeled as emergencies, so raising them has to be easy, and where they go has to be obvious. Oversight works when it is a routine part of how the team operates and is in place before something goes wrong, rather than convened after the fact.
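One way to make "where does this go" obvious is a fixed routing table with a catch-all. This Python sketch is illustrative only: `ROUTES`, `DEFAULT_OWNER`, `Signal`, `route`, and the signal categories are hypothetical, and the owners simply mirror the four functions named earlier in this piece.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical routing table: each kind of weak signal has one obvious owner.
# The owners mirror the four oversight functions; the categories are illustrative.
ROUTES = {
    "workflow_concern": "Clinical leadership",
    "contract_mismatch": "Legal",
    "data_exposure": "Privacy and records",
    "system_or_data_flow": "IT",
}
DEFAULT_OWNER = "AI oversight group"  # hypothetical catch-all so no signal is dropped


@dataclass
class Signal:
    tool_id: str
    category: str
    note: str
    raised_on: date = field(default_factory=date.today)


def route(signal: Signal) -> str:
    """Return who handles a signal; unrecognized categories still land somewhere."""
    return ROUTES.get(signal.category, DEFAULT_OWNER)


# Illustrative signals only.
nurse_comment = Signal("ambient-scribe", "workflow_concern",
                       "Drafts seem to miss medication changes")
vendor_update = Signal("ambient-scribe", "contract_mismatch",
                       "New release changes the default data retention period")
print(route(nurse_comment))  # Clinical leadership
print(route(vendor_update))  # Legal
print(route(Signal("sepsis-alert", "other", "Alert wording confuses staff")))  # AI oversight group
```

The catch-all is the important design choice here: a weak signal that fits no category still reaches a named owner instead of disappearing.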

The tools are already in the building and the law is still catching up, so for now the weight sits with the organizations deploying them. That is an uncomfortable position and, at the same time, an opportunity. Oversight built deliberately, while there is still time, costs far less than oversight reconstructed in the middle of a lawsuit or a safety review.

Scalable oversight is not a check on innovation. It is the mechanism that lets a health system add the next ten tools while still keeping track of the first ten, and lets leaders trust both the tools and what they produce. Health systems that build these structures now will keep functioning as their AI portfolios grow. Those that wait will spend their time explaining why no oversight was in place.

Christopher Hutchins, Founder and CEO, Hutchins Data Strategy Consultants

Tags: AI Health Pulse newsletter · healthcare AI · AI in healthcare · AI oversight at scale · clinical AI registry · ambient AI documentation · AI governance