When the Patient Becomes the Training Data

In January 2025, the Office for Civil Rights (OCR) at the Department of Health and Human Services proposed the first major update to the HIPAA Security Rule in over two decades. The announcement contained a subtle acknowledgment. OCR now expects healthcare institutions to track every place an AI tool interacts with patient data, and to include those tools in the same risk assessment that covers servers and laptops. OCR was addressing a problem that healthcare institutions had, until then, been left to handle on their own.

A study in BMJ Health and Care Informatics reported that 18% of general practitioners in the U.K. had used an unapproved, unsecured AI tool outside the NHS to draft a note or a letter. The tools were faster, and the work was due.

The Data Is Gone

Once a clinician pastes a patient's story into a public chatbot, that information leaves the secure network for a server beyond the health system's control. Without a business associate agreement, that can be a legal violation, and that is just the easy part of the problem. The data does not come back. Once a public AI model has read a segment of a patient record, it cannot be recalled. It cannot be removed. It is impossible to verify whether it will be reused in a response to another user at a later date.

This is the general problem of AI and privacy, and it extends beyond the data breaches that health systems have grown familiar with. A stolen laptop is a breach with a defined scope and a defined response. Once an AI model has absorbed patient data, the exposure is open-ended, because there is no boundary to the system in which the data now exists. For the past two years, ECRI (formerly the Emergency Care Research Institute) has ranked AI-related risks at the top of its health technology hazards, largely because of this type of exposure, which does not come from anyone trying to "break" a system. In many cases, it looks like an effort to do good work more efficiently.

Speed is the key factor in the popularity of new tools. The end of the day is especially hectic for doctors and other health care professionals, and a tool that saves time at that moment is hard to resist. You can be sure the tool will be back in use tomorrow, and that a colleague will have learned about it in the meantime. For the IT professionals in charge of data security, this is even more concerning, because by the time they hear of the tool, the data they consider sensitive has already been collected. The same pattern has played out with every consumer technology that has found its way into hospitals. Simply blocking a web page will not deter health care professionals with a deadline. The use will simply be pushed out of sight, to somewhere harder to monitor.

There Is a Person Behind Every Record

When I was a back-office medical biller, I was far removed from dealing with patients directly, but I got a firsthand view of the part of healthcare that patients do not get to see. One time a patient called in worried sick because a billing error on our end had doubled her bill overnight. The cause was simply a duplicated billing code, and with a little honest, plain speaking on my part, the issue was resolved. She did not thank me. What she remembered was that the error was fixed.

Thinking of a database as a collection of people has helped me keep the importance of patient-data confidentiality in focus. After all, every row of data represents a person who was never given the choice of becoming a data point in an experiment that is, let's face it, foreign to them. Patient-data confidentiality is not simply a box to be ticked. It is a promise made to each patient whose record a model touches. A model that leaks is a legal and ethical liability. Most importantly, it is a broken promise to every woman like the one on the phone.

That promise is also the system's most valuable asset. When a patient trusts a system to keep its promise, that patient will tell the truth. The truth is what makes the record valuable to future patients, clinicians and models.

Shadow AI is the best-known risk. The less obvious one sits inside the approved AI tools. A health system can sign a contract for an approved AI tool and still be completely in the dark about what happens to its data after it leaves the building. The data could be used to train the vendor's models. In its AI Risk Management Framework, the National Institute of Standards and Technology explains that an organization cannot manage a risk it has never mapped. At present, most health systems have not mapped their AI risk.

OCR has now indicated that the map is required. The proposed rule calls for an inventory of the technologies that handle patient data, including the AI tools that interact with them, and that inventory must inform the risk assessment rather than sit in a binder. Constructing the inventory is slow and tedious, but it is the necessary work. A tool that no one has documented is a tool that no one is watching.
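As a rough illustration of what such an inventory could look like once it leaves the binder, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the three checks, and the example tools are hypothetical and do not come from the proposed rule or from any real product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AITool:
    """One row in a hypothetical inventory of tools that may touch patient data."""
    name: str
    sanctioned: bool               # has the organization approved this tool?
    touches_patient_data: bool     # does patient data ever reach it?
    has_baa: bool                  # is a business associate agreement on file?
    owner: Optional[str] = None    # person accountable for the tool, if any


def assessment_gaps(tool: AITool) -> list[str]:
    """Return the reasons a tool needs attention in the risk assessment."""
    if not tool.touches_patient_data:
        return []
    gaps = []
    if not tool.sanctioned:
        gaps.append("unsanctioned")
    if not tool.has_baa:
        gaps.append("no business associate agreement")
    if tool.owner is None:
        gaps.append("no accountable owner")
    return gaps


# Illustrative entries only; the names are made up.
inventory = [
    AITool("ambient-scribe", sanctioned=True, touches_patient_data=True,
           has_baa=True, owner="CMIO"),
    AITool("public-chatbot", sanctioned=False, touches_patient_data=True,
           has_baa=False),
    AITool("scheduling-bot", sanctioned=True, touches_patient_data=False,
           has_baa=False),
]

for tool in inventory:
    gaps = assessment_gaps(tool)
    if gaps:
        print(f"{tool.name}: {', '.join(gaps)}")
```

The point of the sketch is only that each tool becomes a record that can be checked, so that an undocumented or unowned tool surfaces as a finding in the risk assessment instead of going unnoticed.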

Contract language is where the greatest liability sits, and most people never see it. A vendor clause that allows the use of de-identified patient data for product enhancement sounds benign, until you learn how the data was de-identified and who verified it. A subvendor three levels downstream can send patient data to places that no one on the original signature page intended. These risks are not evident in a product demonstration. They show up in a breach report.
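To make the "three levels downstream" point concrete, the following Python sketch walks a map of who passes data to whom and reports how far each recipient sits from the health system. The party names and the shape of the map are hypothetical; in practice this information would have to be assembled from contracts and subprocessor disclosures.

```python
from collections import deque

# Hypothetical map: each party -> the parties it passes patient data to.
data_flows = {
    "health-system": ["ai-vendor"],
    "ai-vendor": ["cloud-host", "annotation-firm"],
    "annotation-firm": ["offshore-labeler"],
}


def downstream_recipients(flows, start):
    """Breadth-first walk returning {recipient: levels away from start}."""
    depth = {}
    queue = deque([(start, 0)])
    while queue:
        party, level = queue.popleft()
        for recipient in flows.get(party, []):
            # Record each recipient once, at the shallowest level it appears.
            if recipient not in depth and recipient != start:
                depth[recipient] = level + 1
                queue.append((recipient, level + 1))
    return depth


recipients = downstream_recipients(data_flows, "health-system")
for name, level in recipients.items():
    print(f"level {level}: {name}")
```

In this made-up example, the party at level 3 never appears on the contract the health system signed, which is exactly the kind of recipient a contract review should be able to name.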

What Health System Leaders Must Construct

Treat AI tool use as a standing concern that is never closed, not as a memo that is signed once and filed. Name someone who is accountable for knowing where patient data goes when an AI tool is operating, and empower them to turn the tool off. List both the sanctioned and the unsanctioned AI tools that interact with patient data, and include them in the risk assessment. Establish the policy before the next tool arrives, so that patient data is protected before it ever leaves the organization.

The hardest work involves people. Staff resort to their own tools when the tools they are given are slow. This is not a problem that is solved by sending an email. It is solved by better tools and a clear, fast answer to the real question every clinician has: what can I use that won't compromise the safety of my patients? People tend to protect what they understand. Employees who understand the risks of a chatbot will safeguard data better than any blocked website will.

Technology is always evolving, and each day brings tools more advanced than their predecessors. The one thing that remains constant is the commitment behind the record. Someone placed their trust in the system to keep information about their personal health safe, and the system is the custodian of that trust, whether or not a model is involved.

Context and Sources

This edition is based on the OCR proposed amendment to the HIPAA Security Rule, BMJ Health and Care Informatics findings regarding physician use of unsanctioned chatbots, the National Institute of Standards and Technology AI Risk Management Framework, and health technology hazards from ECRI. It carries on the threads of issue 28, The Consent Crisis, issue 31, Shadow AI: The Symptom, Not the Threat, and issue 33, The Consent Record Is Now Evidence.

Christopher Hutchins, Founder & CEO, Hutchins Data Strategy Consultants

Tags: healthcare AI privacy · patient data security · shadow AI · ePHI · HIPAA Security Rule · healthcare AI governance · AI risk management · CISO AI privacy risk
