
I’ve seen a lot of talk recently on the possible roles of AI in clinical research. I say “roles,” plural, because it’s not a simple question of whether AI will enter the clinical research arena (it already has, to some extent), but of how best AI can be used to actually improve the process.
In theory, an AI could be used in many of the roles currently filled by humans. With certain limitations and caveats, AI has been shown to be fast and effective at producing written content and organizing data – both vitally important in the design and reporting of clinical trials. AI models can also be trained to identify and track signals in safety data, or to find patterns in protocol deviations that might otherwise be too subtle to see. An AI might find an efficient CRA scheduling pattern for site visits that maximizes on-site time and minimizes travel, and a predictive model based on social media posts and web searches might anticipate the next location of an infectious disease outbreak so that local study sites can be brought online quickly.
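To make the “signals in safety data” idea a little more concrete, here is a minimal sketch of one classical disproportionality screen, the proportional reporting ratio (PRR). The table of reports, the drug names, and the screening threshold mentioned in the comments are all hypothetical; a real AI-assisted pipeline would wrap screens like this in far more data cleaning, statistical rigor, and – crucially – human medical review.

```python
import pandas as pd

def proportional_reporting_ratio(reports: pd.DataFrame, drug: str, event: str) -> float:
    """PRR for a drug-event pair, from a table with one row per reported drug-event combination."""
    a = len(reports[(reports["drug"] == drug) & (reports["event"] == event)])  # drug of interest, event of interest
    b = len(reports[(reports["drug"] == drug) & (reports["event"] != event)])  # drug of interest, other events
    c = len(reports[(reports["drug"] != drug) & (reports["event"] == event)])  # other drugs, event of interest
    d = len(reports[(reports["drug"] != drug) & (reports["event"] != event)])  # other drugs, other events
    if a + b == 0 or c == 0:
        return float("nan")  # not enough data for a meaningful ratio
    return (a / (a + b)) / (c / (c + d))

# Hypothetical toy data: a screen might flag drug-event pairs with PRR above a
# conventional threshold (e.g. PRR >= 2 with a minimum case count) for human review.
reports = pd.DataFrame({
    "drug":  ["generimab", "generimab", "generimab", "comparator", "comparator", "comparator"],
    "event": ["rash", "rash", "headache", "rash", "nausea", "headache"],
})
print(proportional_reporting_ratio(reports, "generimab", "rash"))  # -> 2.0
```

The point of the sketch is not the arithmetic, which is trivial, but the workflow: an automated screen surfaces candidate signals, and a human with clinical judgment decides what they mean.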
But those limitations and caveats I mentioned above are the weak link in all things AI-related, and I do not for a second think that we’re at the point where AI can be considered anything other than yet another tool to draw upon when performing clinical research.
Google’s experiment with using Reddit as a data source for training its Gemini AI is a lesson in the absurdity of allowing an AI model to teach itself. Without human oversight, and lacking any kind of “common sense” to filter out clearly ridiculous text, the answers it generated to fairly mundane questions ranged from amusing to dangerous. In our context, using AI to assist in generating protocols, informed consent documents, or any other study literature by training it on a database of existing examples would potentially carry the same risks. Aside from the fact that protocols are proprietary to each sponsor, they are notorious for requiring amendments to fix errors or provide clarifications. They are inherently imperfect, as much as we all try to get them perfect! Even if a protocol has no errors as written, it often embodies compromises driven by limitations of time, pre-existing data, drug tolerability, subject preferences, or any of a myriad of other reasons for choosing a particular timeline or set of measurable outcomes. In fact, since research should build upon what has gone before, it makes no sense to simply recycle old examples in the hope of drafting a novel approach that might be easier, faster, or cheaper to execute.
My experiences in AI training over the last year have been eye-opening, because of the sheer number of people and hours required to properly craft and steer an AI towards even a semblance of mediocrity. It was also amusing, because the “contamination” of data at times ran in both directions – as an excess of incorrectly formatted math problems was offered up as examples, the model started producing solutions written in the same incorrect format, and as the humans were increasingly exposed to AI-generated content, their own writing began to look more and more AI-generated, to the point that it was flagged as such and rejected!
With regard to clinical trials in particular, one area I found concerning was medical knowledge and reasoning. It was very obvious that very few actual medical experts were contributing to the process, as it would have been prohibitively expensive to pay for them. Because the contributors had limited knowledge and experience, training examples were overly simplistic or outright incorrect, and often relied on basic pattern recognition without any logical understanding of why something is what it is. Historically, AI models have been very good at pretending to be smart while scoring poorly on standard IQ assessments. If the people being used to guide and train the factual knowledge and logical reasoning of AI in these specialized areas aren’t themselves highly knowledgeable and intelligent, then the model truly is doomed. After all, we’ve already established that models do a very poor job of training themselves!
My point here is that, even if we are able to fund and task the experts to help train AI models on the kind of specialized work required in clinical trials, it’s still going to require (knowledgeable, experienced, intelligent) human oversight to “check their work”. As such, AI might indeed save us a ton of time and typing, but I don’t think it’s going to be a case of asking “Hey Siri, draft me a phase 1 dose escalation study to find the maximum tolerated dose of generimab” and simply emailing the result off to the FDA. There is a very real risk of the proponents of AI over-promising and under-delivering. In the specific area of medical monitoring and subject safety, I would not trust an AI to apply the same clinical reasoning skills as a trained physician, no matter how capable it might appear in offering up diagnoses or treatment plans (to be fair, most AI training steers specifically away from this – but my concerns stem from specific attempts to address this niche). Although I have previously argued that safety monitoring is surprisingly specialty-agnostic compared to protocol design and study start-up activities, one thing we all have in common is having had the same kind of experience-based training. We’re also at the end of the bell curve that AI models have typically not reached – the end where being able to reason logically and apply our knowledge is far more important than simply knowing facts.
Lastly, and this is a huge concern, there is the phenomenon of “hallucination”. This is the situation where an AI model, unable to answer a question due to a lack of factual knowledge or an inability to logically reason its way through a problem, simply makes something up. Even when explicitly instructed NOT to do this, and instead to say something like “I’m sorry, I’m afraid I can’t do that…”, AI seems to be an inherent people-pleaser. This could result in false statements in investigator brochures or consent forms, in inaccurate study timelines as enrollment rates are hallucinated from non-existent data, or in inaccurate medical coding or protocol deviation reporting. AI-generated hallucinations are, by their very nature, IMPOSSIBLE to distinguish from an accurate output unless you know more and better than the AI. And if that is the case, then why are you using the AI to do the work in the first place…?
With all of that said, do I think that we’ll see more AI in clinical research? Yes, I do. Most importantly, AI is quite simply getting better. Recently, and for the first time, an AI model demonstrated an IQ above 100 – suggesting some genuinely useful logical reasoning is being performed. As an industry, AI software companies can learn from these errors and hopefully train their models on legitimate content with legitimate guidance, and if the tasks are highly specialized and well-bounded, then the advantage of an accurate, fast, tireless entity is obvious. Even in areas where human oversight is expected and required, there’s nothing to stop AI from being a useful screening tool and an assistant for tasks such as medical monitoring or safety signal detection.
I am leery of putting too much stock in the promises of AI just yet, but clearly several groups are working on providing meaningful solutions to the clinical research world, and things are likely to improve. One thing is for sure, though: they won’t improve unless people are willing and able to put in the work to refine and optimize AI models for these purposes. If you get the opportunity to contribute to this work in any way, I say go for it!