The discourse surrounding GenAI focuses on accuracy, bias and reliability, but these are only symptoms of a deeper change. The greater risk lies in how GenAI’s presentation of problems to clinicians affects cognitive reasoning. Archie Cotterill, Trainee Clinical Scientist – Clinical Scientific Computing in Medical Physics at Maidstone and Tunbridge Wells NHS Trust, explores.
Summary:
- Current GenAI systems present structured information to users, shaping the initial problem before they begin independent reasoning and acting as an external influence on that reasoning
- GenAI frames and constrains the questions asked rather than assisting in reaching answers, which is both consequential and hard to mitigate in healthcare
- In healthcare, consulting with an LLM has been demonstrated to influence clinicians’ decisions, potentially reducing accuracy by up to 11.3% when its outputs are incorrect
- GenAI’s impact on inexperienced clinicians is a significant risk, raising concerns that it may establish automation bias as a default method of reasoning and erode expertise
- Safe use of GenAI depends on existing domain expertise, which allows humans to accept or reject information; use of GenAI risks eroding that expertise, with far-reaching impact
Hallucinations and systemic biases are among the most commonly discussed risks of generative artificial intelligence (GenAI), especially in healthcare. Whilst these are valid concerns, treating them as the primary risks is misleadingly narrow, as it frames the problem as one of incomplete, faulty or biased data. This implies that the resolution is simply model refinement and better, more complete data, while overlooking a more fundamental shift: GenAI is not just providing information, but interpreting, structuring, prioritising and presenting that information to the user before they have begun their own reasoning.
As a result, the more subtle, and novel, risk is not the presence of incorrect information, but that GenAI alters the starting point of the cognitive processes underpinning decision making, constraining the range of possibilities explored. Hallucinations are dangerous not simply because they are wrong, but because, when they occur within information presented as a starting point, they are accepted as trustworthy foundations for reasoning, contaminating the root of cognition.
The unique risks of GenAI in healthcare
Commonly cited risks of clinical AI deployment include the introduction of anchoring and automation biases. Anchoring bias describes the natural tendency of a person to rely disproportionately on an initial piece of information, whilst automation bias refers to the greater degree of trust afforded to results generated by automated systems. Similarly, concerns surrounding de-skilling of the workforce and subsequent over-reliance on technology are frequently raised. While these risks are important, they are not novel risks uniquely associated with AI; they are well-established phenomena across a wide range of decision support systems (a common example being drivers automatically following sat-nav instructions, despite clear, contradictory signage) and are longstanding considerations associated with automation more broadly.
The novel risk that distinguishes GenAI arises from its ability to automate aspects of a task previously completely reliant on human cognition. Unlike traditional deterministic systems, whose behaviour is well-defined, bounded, and predictable for a given input, AI systems are inherently non-deterministic, producing variable outputs that cannot be defined in terms of a set of fixed rules. More importantly, the role of GenAI in practice is not limited to the execution of a predefined set of tasks, but extends to structuring, framing and prioritising the information on which a decision is based. These systems therefore enter the decision-making process earlier than traditional tools, shaping the initial problem representation rather than simply supporting the resolution.
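To make the contrast concrete, consider the minimal Python sketch below. It uses an entirely hypothetical clinical example (the function names, thresholds and phrasings are invented for illustration, not drawn from any real system): a fixed-rule alert is fully determined by its input, whereas a stand-in for a generative system samples one of several framings, so the same query can return a differently structured problem representation each time.

```python
import random

# Deterministic decision support: identical input always yields identical output.
def rule_based_alert(creatinine_umol_l: float) -> str:
    return "flag" if creatinine_umol_l > 110 else "ok"

# Stand-in for a generative system: it samples one of several framings, so the
# same query can return a differently structured problem representation each time.
def generative_summary(findings: list[str]) -> str:
    framing = random.choice([
        "Most consistent with renal impairment; consider nephrology referral.",
        "Findings are non-specific; dehydration is a plausible cause.",
        "Pattern may suggest medication toxicity; review the drug chart.",
    ])
    return f"{framing} (based on: {', '.join(findings)})"

print(rule_based_alert(120))                 # always 'flag' for this input
for _ in range(3):                           # framing varies run to run
    print(generative_summary(["raised creatinine", "mild fatigue"]))
```

The point of the sketch is not whether either output is correct, but that the generative variant hands the user a pre-shaped framing before any independent analysis has taken place.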
In healthcare, this means that while previously a clinician would have been required to perform their own implicit analysis of a problem in order to seek advice from a colleague or consult relevant literature, current systems can present a fully structured interpretation before independent reasoning has begun. This renders the beginning stages of cognition susceptible to external influence as the formulation of the reasoning framework is no longer the domain of the clinician. This leads to what could be described as ‘cognitive workflow contamination’ (CWC).
Some of the consequences of CWC have already been empirically demonstrated. Several studies show that consultation with an LLM significantly influences a clinician’s final decision. In one of the more striking examples, when clinicians initially decided against treatment for a synthetic patient and the AI model disagreed, nearly two thirds revised their initial decision in favour of the AI recommendation. Similarly, though AI assistance may increase clinical accuracy by up to 2.9% when the GenAI outputs are correct, that accuracy may decrease by up to 11.3% when these outputs are incorrect.
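To put those figures in perspective, the short sketch below works through the implied arithmetic. This is my own back-of-envelope calculation, assuming the quoted effect sizes apply uniformly and independently across cases (a simplification the cited studies do not necessarily make): on these numbers, the net benefit of assistance turns negative once the model is wrong in more than roughly one case in five.

```python
# Back-of-envelope arithmetic using the effect sizes quoted above, assuming
# they apply uniformly and independently across cases (a simplification).
GAIN_WHEN_CORRECT = 0.029   # up to +2.9 percentage points
LOSS_WHEN_WRONG = 0.113     # up to -11.3 percentage points

def net_accuracy_shift(model_error_rate: float) -> float:
    """Expected change in clinician accuracy for a given model error rate."""
    return (1 - model_error_rate) * GAIN_WHEN_CORRECT - model_error_rate * LOSS_WHEN_WRONG

# Error rate at which AI assistance stops helping on average:
break_even = GAIN_WHEN_CORRECT / (GAIN_WHEN_CORRECT + LOSS_WHEN_WRONG)
print(f"break-even model error rate: {break_even:.1%}")                  # ~20.4%
print(f"net shift at a 10% error rate: {net_accuracy_shift(0.10):+.4f}")  # +0.0148
```

The asymmetry between the potential gain and the potential loss is what makes the direction of influence, rather than the headline improvement, the quantity that matters.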
These findings clearly demonstrate that the influence of AI is not neutral, but directional. The impact occurs at the earliest stages of cognition and reasoning, meaning alternative interpretations may never even be considered: not because they have been evaluated and rejected, but because they fall outside the initially presented reasoning framework. The risks imposed by CWC are therefore not solely those of error or anchoring, but of constraint: the narrowing and pre-definition of the reasoning framework within which decisions are made.
The further impact of cognitive workflow contamination
Whilst the current effects of CWC are troubling, they likely represent only a fraction of the potential impact. A more significant risk is how these systems affect inexperienced clinicians and those in training. It has been argued that the introduction of GenAI into medical education may induce automation bias as a default method of reasoning. Furthermore, when clinicians lack specific task expertise, their ability to discount inaccurate AI-generated advice is limited. The implication is that exposure to CWC during education would not only reduce the understanding developed by students, but also impair their ability to independently gather, evaluate and prioritise information without AI assistance.
A common sentiment is that an effective safeguard against the influence of GenAI is the presence and approval of a senior, experienced clinician before any action is taken. However, this assumption overlooks two key issues. Firstly, such experience is not static. Unless maintained, it diminishes over time, an effect seen in legacy systems, such as those programmed in COBOL, where ageing expertise has led to systemic fragility. If CWC influences training, the effects will inevitably propagate forward, eventually resulting in senior clinicians predisposed to automation bias as a consequence of their formative practice.
Secondly, as CWC operates by constraining the initial reasoning framework rather than by introducing incorrect information, the effects may remain invisible to the supervising clinician. If a viable solution were excluded at the outset, it could not be reviewed or corrected unless independently reintroduced. Unless senior clinicians are consistently engaged in reconstructing each situation from first principles, this safeguard can never be considered comprehensive.
Safe use of GenAI requires domain expertise
The combination of these effects introduces a self-reinforcing dynamic. Safe use of GenAI is contingent on domain expertise: specifically, the ability to independently construct and challenge the reasoning framework. By structuring the reasoning framework before independent reasoning takes place, GenAI reduces the need for these skills in routine practice. Over time, this risks inhibiting the development and maintenance of the very expertise required for critical evaluation of its outputs. As a result, users become increasingly dependent on such systems, whilst also becoming less capable of recognising when they are incorrect.
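To see how such a loop can tip, here is a purely illustrative toy model (every coefficient is invented for this sketch, not drawn from any study): reliance on the system displaces the independent practice that maintains expertise, and falling expertise, together with steady adoption pressure, in turn accelerates reliance.

```python
# Toy simulation of the self-reinforcing dynamic described above. Reliance on
# GenAI displaces the independent practice that maintains expertise, while
# falling expertise (plus steady adoption pressure) increases reliance.
# All parameters are illustrative, not empirical.
def simulate(steps: int = 50, expertise: float = 0.9, reliance: float = 0.2):
    trajectory = []
    for _ in range(steps):
        practice = 1.0 - reliance                           # independent reasoning still done
        expertise += 0.05 * practice - 0.04 * reliance      # practice builds skill; disuse erodes it
        expertise = min(max(expertise, 0.0), 1.0)
        reliance += 0.02 + 0.1 * max(0.0, 0.8 - expertise)  # adoption creep, worse as skill falls
        reliance = min(reliance, 1.0)
        trajectory.append((expertise, reliance))
    return trajectory

for t, (e, r) in enumerate(simulate()):
    if t % 10 == 0:
        print(f"t={t:>2}  expertise={e:.2f}  reliance={r:.2f}")
```

With these made-up coefficients, expertise initially holds steady, but once reliance passes a threshold the decline becomes self-sustaining; the exact numbers are arbitrary, and only the shape of the feedback loop is the point.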
Traditional concerns such as automation and anchoring bias, as well as workforce deskilling, remain valid; however, these typically arise when the skills in question become less necessary due to automation of the task. What distinguishes GenAI is that it risks eroding the very skills that remain essential for its own safe use: by influencing the structuring of the reasoning space itself, it does not simply affect the answers reached, but plays a role in determining the questions asked.
This shift has implications across all domains in which expert judgment is required. Without recognition of it, there is a risk of decisions appearing to be expert-led whilst having been shaped by an unseen external influence from the outset.