AI Explainability in Healthcare: Building Bias Out by Design
Why explainability and human-in-the-loop design belong in clinical AI from day one — how transparent reasoning surfaces algorithmic bias instead of hiding it.
Featuring Keshavan Seshadri on The Signal Room
A clinical AI model can be right and still be unusable. It can generate an accurate answer yet give clinicians no means of judging whether to trust it: no reasoning, no confidence level, and no indication of what information it relied on. During a Signal Room conversation, machine-learning engineer Keshavan Seshadri argued that this gap is a design decision, not an unavoidable deficiency. Explainability and transparency therefore have to be designed into the model from the start rather than added later.
Seshadri argued that good clinical AI has one specific goal, improving patient outcomes, and that the more context a model has about a patient and the clinical problem they face, the better it can serve that goal. At Hutchins Data Strategy Consultants we have observed the same problem: well-designed, high-quality models that fail in the real world because the end user cannot see their reasoning.
The Dimensions of Context That Clinical Models Require
Seshadri divided healthcare context into four types, a breakdown that is useful when scoping a clinical model. First is patient context: medical and surgical history, demographics such as age, and current conditions. Second is task context: a diagnosis has different requirements from an operation or a triage decision. Third is the context of available human resources. Fourth is institutional and regulatory context: the policies and regulations within which a system is required to operate.
The goal is not to maximize the volume of data the model ingests. It is to give the model the same situational awareness a human clinician would have. A recommendation made without knowledge of the patient's circumstances, the regulatory constraints, and the staff available lacks the context that makes a human clinician's judgment safe.
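One way to make the four context types concrete when scoping a model is to write them down as a typed structure and check which dimensions are still empty before the model is asked for anything. The sketch below is illustrative only: the class names, field names, and the `missing_context` helper are assumptions made for this example, not something Seshadri described.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatientContext:
    # Hypothetical fields mirroring the patient context described above.
    age: Optional[int] = None
    history: List[str] = field(default_factory=list)      # medical and surgical history
    conditions: List[str] = field(default_factory=list)   # current conditions


@dataclass
class ClinicalContext:
    patient: Optional[PatientContext] = None
    task: Optional[str] = None          # e.g. "diagnosis", "triage", "operation"
    staffing: Optional[str] = None      # available human resources
    regulations: List[str] = field(default_factory=list)  # policies and regulations


def missing_context(ctx: ClinicalContext) -> List[str]:
    """Return the names of the context dimensions that have not been supplied."""
    missing = []
    if ctx.patient is None:
        missing.append("patient")
    if not ctx.task:
        missing.append("task")
    if not ctx.staffing:
        missing.append("human resources")
    if not ctx.regulations:
        missing.append("institutional and regulatory")
    return missing
```

A caller could refuse to produce a recommendation, or route the case to a human, whenever `missing_context` returns a non-empty list.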
An Assistive Tool, Not an Autonomous One
Seshadri compared today's systems to someone who has just completed a university degree. That is an impressive achievement, but it is not the level of a licensed practitioner who has completed a decade-long apprenticeship. Such a model can assist with a variety of tasks, but it is not a specialist. Treating it as one skips the evaluation and human oversight required to bridge that gap.
That gap is best managed by making the model honest about its own uncertainty. He argued that outputs should be bounded by a threshold: once the system's uncertainty rises above it, the system defers to a human rather than confidently stating an answer. He was particularly concerned about risk in medicine because the costs of being wrong are not symmetric. A false positive, erroneously identifying a disease that is not present, results in additional tests, expense, and worry. A false negative, missing a condition that is present, can be much more severe. A model tuned without regard to that asymmetry optimizes the wrong thing. In his framing, deferral should be proportional to clinical risk: a potentially harmful diagnosis sits at one level, and a surgery at a much higher one.
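The two ideas in this section, asymmetric error costs and risk-proportional deferral, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the risk tiers, the numeric confidence floors, and the function names are hypothetical, and the cost formula is the standard expected-cost decision rule, not a method attributed to Seshadri.

```python
from typing import Dict, Optional

# Hypothetical minimum confidence required before the system answers on its own.
# None means a human always stays in the loop, whatever the confidence.
MIN_CONFIDENCE_BY_RISK: Dict[str, Optional[float]] = {
    "low": 0.80,
    "moderate": 0.95,
    "high": None,   # e.g. surgery
}


def positive_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability above which flagging the condition has the lower expected cost.

    With disease probability p, flagging costs (1 - p) * cost_false_positive in
    expectation and not flagging costs p * cost_false_negative. Flagging is
    cheaper once p exceeds the value returned here.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def route(confidence: float, risk_tier: str) -> str:
    """Return "answer" or "defer_to_human" for a single model output."""
    minimum = MIN_CONFIDENCE_BY_RISK[risk_tier]
    if minimum is None or confidence < minimum:
        return "defer_to_human"
    return "answer"
```

If a missed condition is assumed to cost nine times as much as a false alarm, `positive_threshold(1.0, 9.0)` puts the flagging threshold at 0.1 rather than the default 0.5, which is the asymmetry described above expressed as a number. Real cost ratios and tiers would have to come from clinicians, not from the model team.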
Explaining the Model Catches Bias
Seshadri linked explainability to fairness: in his view it is the best defense against hidden biases introduced through training data. When a large model has been trained on internet data, whatever bias that data contains cannot be traced through a decision that cannot be inspected, and so it becomes an intrinsic part of the model.
Explainability helps address this challenge. If an AI system explains its reasoning, reviewers (human or otherwise) can examine its logic and evaluate whether it has been skewed. Bias that has been explained can be caught; bias that stays unexplained cannot be uncovered. For this reason, he viewed transparent logging of the model's step-by-step reasoning, the tools it selected, and the steps it took, together with model evaluation, as foundations of the system rather than optional features. He also identified feedback loops, such as reinforcement learning from human feedback, as the formal mechanism for returning an identified issue to the model, so that its behavior improves instead of repeating.
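To show what transparent logging might look like in practice, the sketch below records each reasoning step with the inputs and tool it relied on, then surfaces steps that touched attributes a reviewer wants to inspect. The record layout, the `REVIEW_ATTRIBUTES` set, and the function names are all assumptions for illustration; the conversation did not specify a schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ReasoningStep:
    description: str          # what the model did or concluded at this step
    inputs_used: List[str]    # which fields of the record the step relied on
    tool: str = ""            # tool invoked at this step, if any


@dataclass
class DecisionLog:
    recommendation: str
    confidence: float
    steps: List[ReasoningStep] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the full decision trail for storage or review."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical attributes a reviewer wants surfaced whenever a step relies on them.
REVIEW_ATTRIBUTES = {"race", "sex", "zip_code"}


def steps_needing_review(log: DecisionLog) -> List[ReasoningStep]:
    """Return the steps whose stated inputs include a flagged attribute."""
    return [s for s in log.steps if REVIEW_ATTRIBUTES & set(s.inputs_used)]
```

A flagged step is a prompt for review, not proof of bias: an attribute such as sex can be clinically relevant. The point is that the reviewer can see where it entered the reasoning, which is impossible when nothing is logged.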
Correct Answer, Incorrect Context
There is also a subtler failure that pure accuracy metrics never capture: a response can be factually correct and still be wrong for the moment it lands in. A clinically accurate answer delivered without regard to the person's emotional or relational context can do harm, even when the information is true. He argued that because the designers of these systems want a positive impact, that intent has to extend to how an output is communicated, not just whether it is correct. In a setting where a result could affect a person's life, contextual correctness is part of the requirements, not a courtesy.
Three Principles to Apply in Design
Seshadri's design principles are sound guidelines for any clinical AI. First, understand where and how the model was trained: be aware of the data, populations, and geographies it actually represents. Second, avoid repurposing the model for tasks for which it was not trained or validated. Third, integrate explainability, transparent logging, evaluation, and guardrails from the start, because in a highly regulated, high-risk field like healthcare these are fundamental requirements.
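The first two principles can be enforced as a simple guardrail: record what the model was trained and validated on, and check each request against that record before the model runs. The sketch below assumes a hypothetical `ModelCard` structure with illustrative fields; a real deployment would draw these from its own validation documentation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ModelCard:
    # Hypothetical summary of what the model was trained and validated on.
    name: str
    validated_tasks: FrozenSet[str]
    training_regions: FrozenSet[str]
    min_age: int
    max_age: int


def scope_violations(card: ModelCard, task: str, region: str, age: int) -> List[str]:
    """List the reasons a request falls outside the model's validated scope."""
    reasons = []
    if task not in card.validated_tasks:
        reasons.append(f"task '{task}' was not validated")
    if region not in card.training_regions:
        reasons.append(f"region '{region}' is not represented in the training data")
    if not card.min_age <= age <= card.max_age:
        reasons.append(f"age {age} is outside the trained range")
    return reasons
```

An empty list means the request is within scope; anything else is a reason to block the request or hand it to a human, and a useful line to write to the decision log.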
How Hutchins Approaches Explainability and Bias
Our work insists on these properties before a model reaches a clinician, not after. We help organizations specify the context a model needs, set confidence and deferral thresholds that respect the asymmetry of clinical risk, and require transparent reasoning so that bias can be seen and corrected rather than quietly perpetuated. We treat explainability as the mechanism that makes responsible AI and humane clinical design real rather than aspirational.
These themes are explored throughout The Signal Room podcast, in conversations with the people building AI systems about how to make their reasoning visible — and their bias catchable.
Frequently asked questions
What is AI explainability in healthcare?
The ability of an AI system to expose its reasoning — what it weighed, how confident it is, and why it reached a recommendation — so a clinician can validate or override it before any action is taken.
Why does explainability help with algorithmic bias?
When a model explains its reasoning, a human reviewer can see biased logic and flag it. Bias hidden inside an unexplained decision stays invisible; bias that is articulated can be caught and corrected.
Can explainability be added to a model after it's built?
Not well. It has to be factored in during design — through logging, transparent reasoning, evaluation, and guardrails — rather than retrofitted onto a finished black box.
How much should clinical AI defer to humans?
Deferral should scale with risk. Lower-stakes support can run with confidence thresholds and review; high-stakes decisions like surgery or real-time intervention should keep a human in the loop by default.