Healthcare Data Quality: The Cost Behind Every AI Decision
Why AI conceals data quality problems instead of exposing them, and the leadership, prevention, and governance discipline that makes healthcare data fit for use.
Featuring Danette McGilvray on The Signal Room
Organizations starting to implement AI systems routinely skip a question that will cost them dearly. On a recent Signal Room episode, data quality specialist Danette McGilvray characterized the situation best. When a new capability arrives, leadership is responsible for evaluating it and deciding whether the organization should pursue it. If the answer is yes, the organization then has to decide how to get the most from the capability and how to mitigate its potential negative impact. She says leadership fails to answer these basic questions again and again. Worse, the data needed to answer them has been neglected.
McGilvray has been dealing with this problem for a couple of decades. She considers herself a second-generation pioneer because she built her practice on the work of Tom Redman, Larry English, and Rich Wang. Her central warning for healthcare and AI is that organizations will spend millions on technology yet resist spending even a small fraction of that on the data those systems depend on. We have seen this pattern more often at Hutchins Data Strategy Consultants in recent years, and we now treat it as a leading indicator of a failing AI-driven change initiative.
AI Doesn't Expose Bad Data, It Conceals It
The old saying "garbage in, garbage out" assumed that the garbage coming out would be obvious to the user. Generative AI has changed that. A system can now produce a well-structured answer that looks complete and sounds confident while resting on an unwarranted assumption, or while not even being relevant to the question. That is the real danger of the data quality problem in AI systems: the flaw is still there, but it no longer looks like one.
McGilvray cited several examples of AI systems generating citations and references that appeared real but were entirely fictitious. Clinicians and analysts who are looking for more obvious errors are unlikely to catch them, because the AI gives no indication that anything is wrong. The same holds for bias. AI systems reproduce the patterns in their training data. She explained that the bias does not originate in the AI system itself; a system trained on biased data carries that bias into its recommendations.
This led to a distinction that is useful for any future deployment: a model that describes what has happened versus a model that prescribes what to do. She would prefer a model that says "the data has historically been XYZ" to one that says "we should be doing ABC."
Quality Is Not Slower. Skipping It Is.
The assumption that skipping data quality leads to a faster outcome is a fallacy. Doing things correctly the first time may seem slow, but neglecting the fundamentals derails the project and leads to far more expensive, time-consuming rework later.
She explained the economics with a rule of thumb familiar from quality work. A defect that costs $1 to fix in the design phase costs roughly $10 in the next phase and about $100 by the time it reaches testing. Once a defect reaches production, the multiplier stops being theoretical: the company may be unable to operate. She described a large ERP migration in which the sites that did the necessary data quality work closed their books within a few weeks of go-live. The site that refused to do the work at all was unable to ship product and ended up pulling people from around the world onto emergency calls in the middle of the night.
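To make the escalation concrete, here is a minimal sketch of the arithmetic. It assumes a fixed tenfold multiplier between consecutive phases, which is an illustrative simplification. The phase names, the defect counts, and the function names are hypothetical and do not come from the episode.

```python
# Hypothetical phase names and a fixed tenfold multiplier per phase.
# Both are illustrative assumptions, not figures from a real project.
PHASES = ["design", "build", "testing"]
MULTIPLIER = 10


def cost_to_fix(base_cost: float, phase: str) -> float:
    """Cost of fixing one defect when it is first caught in `phase`."""
    steps_downstream = PHASES.index(phase)
    return base_cost * MULTIPLIER ** steps_downstream


def rework_bill(defects_by_phase: dict[str, int], base_cost: float = 1.0) -> float:
    """Total cost of a set of defects, grouped by the phase that caught them."""
    return sum(
        count * cost_to_fix(base_cost, phase)
        for phase, count in defects_by_phase.items()
    )


# The same 50 defects, caught at different points (made-up counts).
caught_early = rework_bill({"design": 50})
caught_late = rework_bill({"design": 10, "testing": 40})

print(f"All 50 caught in design:  ${caught_early:,.0f}")
print(f"40 of 50 slip to testing: ${caught_late:,.0f}")
```

Under these assumptions the defects are identical in both scenarios; only the phase that catches them changes, and that alone drives the difference in the bill.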
The framing she described communicates well to executives: data quality debt works like technical debt. Bypassing a data quality task incurs a real liability, and the cost of addressing it keeps growing quietly until the debt makes itself known.
Prevention Over Firefighting
Most organizations spend the majority of their data quality effort after bad data has already caused problems, finding and fixing it. McGilvray challenges leaders to focus instead on prevention and root causes, which she described as the difference between fire prevention and firefighting. Correction is reactive and never-ending. Prevention is done upfront and is significantly cheaper. Ultimately, prevention is the only way to reduce the number of issues.
Success relies on work she describes as unglamorous. Shared metadata is essential: the same field names, the same definitions, and a record of value lists and codes. It lets teams work from the same data without interpreting it differently. Governance brings those teams to the table for the first conversation. "People who had never met, all depending on the same data, finally aligned" because governance gave them a reason, and a forum, to talk. None of this is exciting, but it is the foundation any model has to be built on.
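One way to picture shared metadata doing preventive work is a small data dictionary that records are checked against at the point of entry. The sketch below is only an illustration under stated assumptions: the field names, definitions, and code list are hypothetical, and a real dictionary would come out of the governance forum rather than a source file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One agreed field: its name, its definition, and its code list."""
    name: str
    definition: str
    allowed_values: frozenset = frozenset()  # empty means free text


# Hypothetical entries for illustration only.
DATA_DICTIONARY = {
    spec.name: spec
    for spec in [
        FieldSpec("patient_id", "Enterprise-wide patient identifier"),
        FieldSpec(
            "discharge_disposition",
            "Where the patient went after discharge",
            frozenset({"home", "skilled_nursing", "expired"}),
        ),
    ]
}


def check_record(record: dict) -> list:
    """Return the ways a record departs from the shared dictionary."""
    problems = []
    for name, value in record.items():
        spec = DATA_DICTIONARY.get(name)
        if spec is None:
            problems.append(f"{name}: not in the shared dictionary")
        elif spec.allowed_values and value not in spec.allowed_values:
            problems.append(f"{name}: '{value}' is not an agreed code")
    # Sorted so the report is stable from run to run.
    for name in sorted(DATA_DICTIONARY.keys() - record.keys()):
        problems.append(f"{name}: missing")
    return problems


good = {"patient_id": "123", "discharge_disposition": "home"}
bad = {"patient_id": "123", "discharge_disposition": "SNF"}
print(check_record(good))
print(check_record(bad))
```

Rejecting or flagging the second record when it is entered is the fire-prevention version of the work. Finding "SNF" months later in a training set is the firefighting version.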
Quality Serves the Mission, Not Itself
McGilvray never does data quality for the sake of data quality. It is always about the organization and the goals it seeks to achieve, such as delivering safer care or making better decisions. Quality work earns its place only when it is tied to that outcome, and a quality report whose only audience is the data team has lost the plot.
No change is effective unless its human dimension is addressed. A quality program changes roles, systems, processes, and training, and those changes last only when leaders understand the human dimension and actively work on it. They also last only when leadership is willing to face the problems in front of it. She recalls a system owner early in her career who discovered tens of millions in unexpected costs and chose to share the problem widely rather than bury it, on the principle that nothing changes if no one will name what is broken. That, she said, is what a real leader does.
How Hutchins Approaches Data Quality
Our work treats data quality as a leadership and operating problem, not a tooling purchase. We help organizations make the case for investing in data at the scale they already invest in technology, embed prevention at the source rather than chasing errors downstream, and build the shared definitions and governance forums that keep quality from decaying. The objective is always tied to a decision the organization is trying to make better — never quality for its own sake.
This work is inseparable from data governance and from the readiness that determines whether a given AI use case can be supported at all. Quality is the layer that decides whether the model you deploy can be trusted — or merely sounds like it can.
These conversations are at the center of The Signal Room podcast, where leaders who have lived the cost of poor data describe what it takes to make information fit for use.
Have a data or AI challenge like this?
A 30-minute call is enough to tell whether we're the right fit.
Frequently asked questions
What is healthcare data quality?
The degree to which clinical and operational data is fit for the purpose it is being used for — accurate, complete, consistent, and trustworthy enough that decisions and models built on it hold up.
Why does AI make data quality more urgent?
AI presents its output with confidence regardless of the quality underneath. Poor data no longer surfaces as an obvious error; it is laundered into a fluent, plausible answer that is harder to question.
Is fixing data quality later cheaper than fixing it early?
No. The cost of a defect escalates the further downstream it is caught — a small fix at the source becomes expensive rework in testing and far more expensive once it halts the business. Prevention is consistently cheaper than cleanup.
Whose job is data quality?
It is a leadership responsibility, not only a technical one. Quality work triggers changes to roles, processes, and training, and only sustained leadership support keeps those changes in place.