AI in Drug Discovery: Why Verification Is the Real Bottleneck

AI generates drug candidates faster than ever, which multiplies what must be verified. Why the bottleneck moved to validation, and the role of data integrity.

Featuring David Finkelshteyn on The Signal Room

AI in drug discovery is often framed as a creative breakthrough: a generational leap in models that can construct novel molecules at a scale no human chemist could match. In a conversation on The Signal Room, however, David Finkelshteyn, who builds AI verification systems for pharmaceutical research, argues that generation is the wrong place to look for the bottleneck. To him, generating molecules and leaving them unverified makes no sense, because the ground truth is not a model's output. It is actual physical molecules that have been synthesized and tested.

This matters because it inverts where the hard work sits. AI systems have made the first step, coming up with candidate solutions, almost costless. The work of verifying that a generated candidate is both safe and effective has not become any cheaper or faster, and if anything it has become more difficult. We see the same thing in healthcare AI at Hutchins Data Strategy Consultants: the model is almost never the bottleneck. The bottleneck is verification and the underlying data.

Discovery Without Verification Is Valueless

Finkelshteyn argued that discovery and verification should proceed in tandem, in the computer and in the lab. Narrowing hundreds of competing hypotheses down to a handful or a few dozen molecules requires synthesis and testing. A generated design is not a solution; it is a hypothesis. From there, each candidate follows the traditional drug development process: synthesis, in vitro testing, in vivo testing, and the selection of a target patient population before it is tested in people. A candidate can fail at any of these steps, regardless of how the molecule was proposed or designed.

AI adds breadth, and Finkelshteyn argues it also adds depth, because it helps search an almost infinite space of compounds that humans could never arrive at through invention alone. That in itself is a giant leap forward for computational methods and machine learning. However, the same novelty that makes AI a game changer also makes verification tougher. A compound that differs from all previously known compounds has almost no historical information attached to it. It might fold in an unexpected way, be rejected by the body, or provoke an allergic reaction, and because it resembles nothing tested before, there is little to base a prediction on. It all comes down to testing.

The Regulator Doesn't Care How You Found It

One of the more reassuring points Finkelshteyn made concerns fears about the regulatory path for AI-designed drugs. For lead generation, regulators such as the FDA and the EMA do not seem to care how a molecule was discovered: an AI-generated lead goes through the same tests and evaluations as any other novel molecule. He said that the absence of separate (or lowered) validation criteria for AI leaves him more confident rather than more constrained. A threshold is a threshold, and the way a molecule was discovered does not move it.

He also drew a clear line between AI that proposes a molecule and AI that makes a choice. A tool that participates directly in a clinical decision (his example was an approved assistant for histology scoring) is treated differently. Transparency and oversight requirements increase, because in that case the model's reasoning is part of the clinical act.

Fail Faster, Not Generate More

If AI generation is effectively unlimited, what justifies the cost of running it? Finkelshteyn argued that the cost is justified when the objective is faster failure. An hour of computing time is well spent if it shows that a candidate is not even close to passing a machine learning threshold, compared with committing months of human labor to reach the same failure much later. This is particularly true because the vast majority of candidates that reach the drug candidate stage eventually fail, and every failure is expensive. Reducing the time to a no is as valuable as reducing the time to a yes.
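The fail-faster argument is, at bottom, a piece of expected-cost arithmetic. The sketch below is one way to write it down, not a model Finkelshteyn described: the function name, its parameters, and every number are hypothetical placeholders, and it assumes the cheap screen only ever rejects candidates that would have failed later anyway.

```python
def expected_cost_per_candidate(screen_cost, lab_cost, screen_reject_rate):
    """Expected spend on one candidate when a cheap screen runs before lab work.

    screen_reject_rate is the fraction of candidates the screen rules out,
    so only the remaining fraction goes on to incur lab_cost.
    All inputs are hypothetical and in the same arbitrary cost unit.
    """
    if not 0.0 <= screen_reject_rate <= 1.0:
        raise ValueError("screen_reject_rate must be between 0 and 1")
    return screen_cost + (1.0 - screen_reject_rate) * lab_cost


# Placeholder numbers, chosen only to make the arithmetic visible.
without_screen = expected_cost_per_candidate(0.0, 1000.0, 0.0)
with_screen = expected_cost_per_candidate(1.0, 1000.0, 0.5)
print(without_screen, with_screen)
```

The screen pays for itself whenever its cost is smaller than the lab spend it avoids. A fuller treatment would also price in the candidates a screen rejects by mistake, which this sketch deliberately ignores.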

That value only materializes when the output can be trusted, and trust, in his view, comes from discipline. As a task becomes more complex, so does the model it needs, and the more complex the model, the more opaque it becomes: neural networks with hundreds of billions of weights, whose reasoning cannot be inspected directly. Scope discipline helps mitigate this by specifying the use cases a model is allowed to work in. No matter how good a protein-folding model is at folding, he said, you do not let it predict a cancer drug, because it was not trained, designed, or validated for that. He paired this with basic validation hygiene that most data scientists would recognize: data that follows a sensible distribution, and a strict separation of training and testing data, so that measured performance reflects the model and not data leakage.
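The separation of training and testing data can be enforced mechanically. The following is a minimal sketch, assuming each record carries a `compound_id` field (a hypothetical name, as are the function names): it assigns every compound to one side of the split by hashing its ID, so repeated measurements of the same compound cannot straddle the boundary, and then checks that no ID appears on both sides. It is an illustration of the hygiene described above, not a method attributed to Finkelshteyn.

```python
import hashlib


def assign_split(compound_id, test_fraction=0.2):
    """Deterministically assign a compound to 'train' or 'test' from its ID."""
    digest = hashlib.sha256(compound_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a roughly uniform number between 0 and 1.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_records(records, test_fraction=0.2):
    """Split records so every row for a given compound lands on one side."""
    train, test = [], []
    for record in records:
        side = assign_split(record["compound_id"], test_fraction)
        (test if side == "test" else train).append(record)
    return train, test


def assert_no_leakage(train, test):
    """Raise ValueError if any compound ID appears in both sets."""
    shared = {r["compound_id"] for r in train} & {r["compound_id"] for r in test}
    if shared:
        raise ValueError(f"compounds in both train and test: {sorted(shared)}")
```

Splitting by identifier rather than by row is a design choice: a random row-level split would let near-duplicate measurements of one compound inform both training and evaluation. Structurally similar compounds with different IDs can still leak information, which this sketch does not address.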

The Bottleneck Beneath the Bottleneck

When it came to the biggest constraint, he did not name an algorithm. When pressed, he named data integrity, which he called the biggest bottleneck in AI for drug design, in healthcare, and in data science as a whole. His baseline is a system that is fully traceable and auditable and logs everything, so that at any point in the pipeline you can look back and know what happened and how the data was changed. Without that, most forms of validation are theater: what cannot be reconstructed cannot be defended.
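One common way to make a pipeline traceable in this sense is an append-only log in which each entry records a fingerprint of a step's input and output and also commits to the previous entry, so that later edits to the history become detectable. The sketch below assumes the data at each step is JSON-serializable; the class, method, and field names are hypothetical, and this is not presented as the system Finkelshteyn builds.

```python
import hashlib
import json


def fingerprint(data):
    """Stable SHA-256 fingerprint of JSON-serializable data."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def record(self, step, data_in, data_out):
        """Log one transformation with hashes of its input and output."""
        body = {
            "step": step,
            "input_hash": fingerprint(data_in),
            "output_hash": fingerprint(data_out),
            "prev_hash": self.entries[-1]["entry_hash"] if self.entries else None,
        }
        entry = dict(body, entry_hash=fingerprint(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return True if every entry is unaltered and correctly chained."""
        prev = None
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["prev_hash"] != prev or entry["entry_hash"] != fingerprint(body):
                return False
            prev = entry["entry_hash"]
        return True


# Hypothetical usage: log a cleaning step that parses a text value as a number.
log = AuditLog()
raw = [{"compound_id": "C1", "assay_value": "12.5"}]
clean = [{"compound_id": "C1", "assay_value": 12.5}]
log.record("parse_numeric", raw, clean)
```

Because each entry's hash covers the previous entry's hash, changing or reordering an earlier entry breaks `verify()`. In practice the log would also need to be stored somewhere the pipeline itself cannot rewrite, and truncation from the end is not detected by this check alone.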

He applied the same discipline to how teams use language models in this line of work. Language models are prone to hallucination when they lack knowledge, but given the right context they become powerful analytic tools. His observation is that context diminishes hallucination. The goal should be to supply the data and references and have the model analyze them, rather than asking the model to provide the data. In his opinion, the greatest short-term potential lies in language models and related tools that relieve the burden of oversight and documentation. The purpose of these tools is not to fabricate evidence or push an unsafe drug through. Rather, they free up scientists' time so that they can apply their intellect to creating new medications.

How Hutchins Approaches AI in Drug Discovery

Our work focuses on the part that decides whether AI in research pays off: the verification loop and the data integrity beneath it. We help organizations build pipelines that are traceable and auditable end to end, hold each model to a defined and validated use case, keep training and validation data honestly separated, and treat generative tools as analysts of trustworthy data rather than sources of invention. It is data quality and disciplined verification — not raw generative capacity — that turn AI ambition in life sciences into something defensible.

These themes run throughout The Signal Room podcast, where practitioners building AI for pharmaceutical and life-sciences work describe what separates a real result from a confident guess.

Frequently asked questions

What is the real bottleneck in AI drug discovery?

Verification, not generation. AI can propose vast numbers of candidate molecules quickly, but each still has to be synthesized and tested in vitro, in vivo, and ultimately in humans. The ground truth is physical, and that is the slow, costly part.

Does AI change how drugs get approved?

No. Regulators evaluate a molecule's properties and safety, not how it was discovered. An AI-designed compound goes through the same validation as any other, which is reassuring rather than limiting.

Why are AI-generated molecules harder to verify?

Because novelty cuts both ways. A compound unlike anything tested before has little historical safety data to draw on, so it demands more testing, not less.

What is the deeper constraint behind verification?

Data integrity. A verification pipeline is only trustworthy if it is fully traceable and auditable — every transformation logged — so you can always trace a result back to where it came from.