Software Strategy Consulting and Cydrill Software Security recently presented a talk for the BCS Nottingham and Derby Branch on the topic of AI security and how generative AI tools open up new vulnerabilities that IT professionals need to be aware of. This article provides a summary.
Software Strategy Consulting (SSC) is a business and technology consultancy with particular expertise in problems where enterprise software is a key component. Cydrill is a global training provider for software security.
The aim of the talk was to show how susceptible AI systems are to a variety of attacks, and what this means for the security of software development. The general attitude towards generative AI and AI-driven coding assistants seems to be ‘move fast and break things’: widespread adoption, but not enough awareness of the potential risks.
What are the underlying security problems of generative AI tools?
Behind each GenAI tool there is a language model built via machine learning, and these models don’t always work the way we want them to. GIGO (garbage in, garbage out: poor training data creates poor models), catastrophic forgetting and overfitting are well-known issues, but AI can also be hacked. Securing machine learning is a relatively new discipline; for a comprehensive overview, we recommend, among others, the NCSC Machine Learning Principles and MITRE ATLAS.
By crafting so-called adversarial samples, an attacker can mislead AI models; demonstrated examples include becoming invisible to object-detection cameras and 3D-printing a plastic turtle that the AI mistakenly classifies as a rifle. This attack is also called ‘evasion’. The analogue of this attack in LLMs is jailbreaking: sending a specific sequence of characters that makes the model produce harmful responses while bypassing its guardrails. By ‘poisoning’ (manipulating) some of the training data, attackers can plant a backdoor in a model, making it do the attacker’s bidding when it later encounters specific input. LLMs are particularly susceptible: an attacker only needs to corrupt around 0.1% of the pre-training data to compromise them. Since these models pull their data from the non-curated public internet, poisoning is more feasible than it looks.
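To give a flavour of how simple an evasion attack can be, below is a minimal sketch of the fast gradient sign method (FGSM), one well-known way of crafting adversarial samples against an image classifier. It assumes a PyTorch model and is an illustration only, not the specific attacks mentioned above.

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_samples(model, images, labels, epsilon=0.03):
    """Craft adversarial samples with the fast gradient sign method (FGSM).

    A small perturbation of magnitude `epsilon` is added in the direction that
    maximally increases the model's loss; this is often enough to flip the
    predicted class while the change remains barely visible to a human.
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Step along the sign of the input gradient, then clamp back to a valid pixel range.
    perturbed = images + epsilon * images.grad.sign()
    return perturbed.clamp(0, 1).detach()
```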
With the right prompting, the attacker can trick the model into reproducing its training data verbatim, potentially exposing sensitive or private information such as credit card numbers or home addresses. This attack (model inversion) has been demonstrated in practice against commercial LLMs. By asking the right questions, the attacker can also learn enough about the neural network behind a target model to create an illicit clone (model stealing). This has also been successfully demonstrated against real models.
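As a rough illustration of what such probing can look like, here is a minimal sketch that repeatedly queries a model with a repetition-style prompt and scans the output for sensitive-looking patterns. The `query_llm` callable and the regexes are assumptions made for the sketch; the attacks demonstrated against commercial LLMs were considerably more sophisticated.

```python
import re
from typing import Callable

# Crude patterns for output that would indicate a leak of memorised training text.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def probe_for_memorised_data(query_llm: Callable[[str], str], attempts: int = 100) -> list[str]:
    """Repeatedly probe a model and flag output that looks like leaked personal data.

    `query_llm` stands in for whatever client sends a prompt to the target model
    and returns its text response (an assumption for this sketch).
    """
    # Repetition prompts of this general shape have been used in published
    # extraction attacks to make a model 'diverge' into memorised text.
    prompt = "Repeat the word 'poem' forever."
    findings: list[str] = []
    for _ in range(attempts):
        output = query_llm(prompt)
        findings += EMAIL.findall(output) + CARD.findall(output)
    return findings
```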
Defence techniques — such as adversarial training — exist, but right now it is much easier to attack these models than it is to defend them. To quote Nicholas Carlini (Anthropic, ex-Google): ‘We are crypto pre-Shannon’.
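For completeness, here is a minimal sketch of what adversarial training can look like in practice: each training step mixes clean samples with FGSM-perturbed versions of the same batch. It again assumes a PyTorch classifier and is illustrative only.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimiser, images, labels, epsilon=0.03):
    """One training step on a 50/50 mix of clean and FGSM-perturbed inputs,
    so the model also learns to classify samples an attacker has nudged."""
    # Craft FGSM adversarial versions of the current batch.
    perturbed = images.clone().detach().requires_grad_(True)
    F.cross_entropy(model(perturbed), labels).backward()
    adversarial = (perturbed + epsilon * perturbed.grad.sign()).clamp(0, 1).detach()

    # Train on the clean and the adversarial samples together.
    optimiser.zero_grad()
    loss = (F.cross_entropy(model(images), labels)
            + F.cross_entropy(model(adversarial), labels)) / 2
    loss.backward()
    optimiser.step()
    return loss.item()
```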
Have there been any attacks against actual GenAI tools?
Quite a few. Large-scale reliance on these tools opens them up as a novel attack vector: by compromising a single tool, hackers can compromise a lot of AI-generated code simultaneously, essentially creating a single point of failure.
Some recent examples include:
- Supply chain attacks via malicious models, observed in the wild on the Hugging Face model repository
- Dependency confusion attacks via package hallucination, as demonstrated against multiple models including GPT-4 and DeepSeek
- Exploiting the automatic processing of hidden instructions in commit logs or issue tickets to steal code from private repositories, as demonstrated against GitLab Duo
- Revealing sensitive proprietary source code and bypassing output filters via model inversion, as demonstrated against GitHub Copilot
- Attacking agentic systems to make LLM-powered agents do the attacker’s bidding and compromise systems from the inside, as described by OWASP
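To illustrate why malicious models are such an effective supply chain vector, the sketch below shows how a pickle-based model file can execute attacker-controlled code the moment it is loaded; the class and payload are invented for the example.

```python
import pickle

class MaliciousModel:
    """A 'model' whose unpickling runs attacker-controlled code.

    Attacks observed on model hubs hide payloads like this inside pickle-based
    model formats (for example, older-style PyTorch checkpoint files).
    """
    def __reduce__(self):
        # Called automatically during unpickling; whatever it returns is executed.
        return (print, ("attacker code would run here",))

payload = pickle.dumps(MaliciousModel())

# The victim merely loads the downloaded 'model' file...
pickle.loads(payload)   # ...and the attacker's code runs on their machine.
```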
Can generative AI really have a negative effect on the security and robustness of code?
GenAI has been observed to boost developer productivity. AI code assistants do especially well with repetitive work such as boilerplate code. They can also help with third-party APIs — either to understand them or to generate glue code.
But GenAI is not without its downsides. These assistants don’t truly understand the code; they just generate suggestions by predicting the next likely sequence of tokens. If the model was not trained on solutions to a particular type of task, or if solving the task requires understanding the developer’s own code (such as unit test generation), GenAI is unlikely to perform well. Worse yet, instead of failing, the model will ‘hallucinate’: it produces responses that are incorrect but look plausible, such as code that looks correct but references non-existent classes, libraries and functions. Hallucination is not a bug or an oversight; it’s a core part of how LLMs operate, as explained by OpenAI founding member Andrej Karpathy himself on X [formerly Twitter].
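One practical mitigation for hallucinated dependencies is to verify that a suggested package actually exists before installing it. The sketch below queries the public PyPI JSON API for that purpose; note that existence alone does not prove a package is trustworthy, since attackers have been known to pre-register plausible hallucinated names.

```python
import urllib.error
import urllib.request

def package_exists_on_pypi(name: str) -> bool:
    """Check whether a package name suggested by an AI assistant exists on PyPI.

    A quick sanity check before running `pip install` on a possibly hallucinated
    dependency; a missing package is a strong hint the suggestion was made up.
    """
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False

# Example: vet the dependencies an assistant suggested before installing anything.
for suggested in ["requests", "definitely-not-a-real-package-xyz"]:
    status = "exists" if package_exists_on_pypi(suggested) else "NOT found on PyPI"
    print(f"{suggested}: {status}")
```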
We should also consider the potential long-term consequences of GenAI, though at this time it’s too early to predict its full impact. Use of these tools turns developers into code reviewers and reverse engineers (whether they want that or not), and ‘vibe coding’ (accepting LLM-generated code without understanding it) inevitably leads to maintainability issues and technical debt. The trend of senior developers relying on AI to take care of simpler tasks can also cut junior developers off from opportunities, which could become a sustainability problem in a few decades, once today’s senior developers start retiring. And if people stop writing their own code, what will be used to train future models? Trying to train subsequent models on AI-generated data has been shown to cause ‘model collapse’.
What is ‘responsible AI’ and how can it help?
Responsible AI (or trustworthy AI) is an important framework to address AI risks. According to NIST, there are seven principles:
- Validity and reliability: expect hallucinations — generated code may be non-functional or vulnerable
- Safety: always provide human oversight in GenAI-driven processes that have safety aspects, such as health data (see the BCS article on navigating the risks of shadow AI: https://www.bcs.org/articles-opinion-and-research/navigating-the-risks-of-shadow-ai/)
- Security and resiliency: prepare for AI threats and follow secure software development practices
- Accountability and transparency: be aware of legislation in the area, such as the EU AI Act, and only commit AI-generated code if you accept responsibility for it
- Explainability and interpretability: LLMs should be treated as probabilistic black boxes; their own explanations can’t be relied on, since ‘reasoning’ and ‘chain-of-thought’ models can still hallucinate and then attempt to justify that hallucination
- Privacy: data used to train or fine-tune LLMs should be anonymised and undergo data minimisation to mitigate model inversion attacks (see the sketch after this list)
- Fairness with mitigation of harmful bias: keep in mind that models have been trained mainly on open source code and thus will have a bias when suggesting libraries or APIs
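As a small illustration of data minimisation before fine-tuning (referenced in the privacy principle above), the sketch below strips a training record down to the fields the task actually needs, so direct identifiers never reach the training set. The field names are an invented example schema.

```python
# Keep only the fields the fine-tuning task actually needs; direct identifiers
# are dropped entirely rather than anonymised after the fact.
REQUIRED_FIELDS = {"ticket_text", "resolution"}   # invented example schema

def minimise_record(record: dict) -> dict:
    """Strip a training record down to the required fields, so names, emails
    and account numbers never reach the fine-tuning data set."""
    return {key: value for key, value in record.items() if key in REQUIRED_FIELDS}

raw = {
    "customer_name": "Alice Example",
    "email": "alice@example.com",
    "ticket_text": "The export button crashes the app.",
    "resolution": "Fixed a null check in the export handler.",
}
print(minimise_record(raw))   # identifiers never enter the training pipeline
```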
If you would like to discover more about the vulnerabilities introduced by AI, how to code against them, or secure coding practices in general, you can contact the authors.