Luis Muñoz-González and Emil C. Lupu, from Imperial College London, explore the vulnerabilities of machine learning algorithms.

In the big data era, advances in machine learning and artificial intelligence have produced a disruptive change in society. Many services and systems rely on data-driven approaches leveraging the huge amount of information available from a multitude of sources, including sensors, devices and people.

The use of machine learning facilitates the automation of many tasks and brings important benefits in terms of new functionality, personalisation and optimisation of resources.

Machine learning has shown its extraordinary capabilities, even outperforming humans in some tasks. For instance, in 2014, at the Royal Society, Eugene Goostman, an AI computer program, arguably became the first machine to pass the Turing test. Developed to simulate a 13-year-old boy, this program managed to fool 33 per cent of its interrogators into thinking it was human.

In March 2016, DeepMind’s AlphaGo, a machine-learning system designed to play Go, an ancient Chinese strategy game, beat the human world champion Lee Sedol with a final score of four games to one in favour of AlphaGo.

Despite these impressive achievements and the rapid development of new techniques, especially in the context of deep learning, it has been shown that machine learning can also be vulnerable and behave in unexpected ways. For example, around the time AlphaGo defeated Lee Sedol, Microsoft deployed Tay, an AI chat bot designed to interact with youngsters on Twitter.

Although Tay started out behaving in a very cool way, using jargon and expressions common amongst young people, Microsoft had to shut down the service 16 hours after its launch. The reason? Tay had posted many offensive tweets displaying extremely racist and sexist behaviour.

The chat bot was designed to mimic the language of young people, and learn from its exchanges with them. When some users started posting offensive messages, Tay learnt to behave in an inappropriate way. These offensive tweets poisoned the data used by the system to re-train and update the learning algorithm.

The reality is that machine learning and AI systems are vulnerable and can be abused, offering cyber criminals a great opportunity to conduct illicit and highly profitable activities. As one of the main components behind many systems and services, machine learning can be an appealing target for attackers, who may gain a significant advantage by exploiting the vulnerabilities of the learning algorithms. Furthermore, attackers can also use machine learning as a weapon to compromise other systems.

Adversarial examples

Attackers can leverage blind spots and learn the weakest parts of the learning algorithms at test time, when the system is already deployed. They can craft malicious examples that produce intentional errors in the machine learning system. These are known as evasion attacks. The vulnerabilities that allow this to occur are twofold:

First, when data pre-filtering or outlier detection is insufficient, attackers can craft data points that differ significantly from the expected data on which the system has been trained. When encountering unexpected samples, the behaviour of the learning algorithm can be quite unpredictable if the model is not designed specifically to handle these cases.

Second, in most applications, the data available to train the learning algorithm is not sufficient to learn perfectly the true (ideal) underlying model for the task it is designed to solve, as it is not possible to fully characterise the data distribution. An attacker can take advantage of the regions where the true and the learned models diverge to craft malicious examples that produce errors in the system.
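The second point can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the ground-truth rule `true_label`, the six training values and the midpoint rule in `fit_threshold` are all hypothetical and chosen for readability. The learned model agrees with the true one on every training point, yet the two disagree in a narrow region that the training data never covered, and that region is where an attacker would look.

```python
def true_label(x):
    # Hypothetical ground truth: the "ideal" model the learner never sees directly.
    return 1 if x > 5.0 else 0

# A small, incomplete training set (assumed values) that leaves a gap around 5.0.
train_x = [0.0, 1.0, 2.0, 7.0, 8.0, 9.0]
train_y = [true_label(x) for x in train_x]

def fit_threshold(xs, ys):
    # Place the decision threshold halfway between the two classes.
    highest_negative = max(x for x, y in zip(xs, ys) if y == 0)
    lowest_positive = min(x for x, y in zip(xs, ys) if y == 1)
    return (highest_negative + lowest_positive) / 2.0

threshold = fit_threshold(train_x, train_y)

def learned_label(x):
    return 1 if x > threshold else 0

# The learned model is perfect on the data it was trained on...
agrees_on_training = all(learned_label(x) == y for x, y in zip(train_x, train_y))

# ...but a point in the gap between the learned and true thresholds is misclassified.
probe = 4.8
print(threshold, agrees_on_training, true_label(probe), learned_label(probe))
```

With more training data near the boundary the gap would shrink, but in realistic, high-dimensional problems it rarely disappears altogether.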

Christian Szegedy and his colleagues at Google discovered some intriguing properties in deep neural networks and other popular machine learning algorithms. They showed that small adversarial perturbations in the data can produce significantly different outcomes in the learning algorithm.

They defined the concept of adversarial examples: malicious data points that aim to produce an intentional error by adding some small adversarial noise that makes the samples (apparently) indistinguishable from genuine data points. This has been shown to be quite effective at deceiving state-of-the-art deep-learning algorithms in computer vision, as well as in other application domains.
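To make the idea concrete, here is a minimal Python sketch on a toy linear classifier. Everything in it is an illustrative assumption rather than the method used by Szegedy and his colleagues: the weights, the input, the step size `epsilon` and the gradient-sign style perturbation are hypothetical. Each feature is nudged by the same small amount in the direction that pushes the score towards the attacker's chosen class. Individually the changes are small, but across 100 features they add up and flip the decision.

```python
def score(weights, bias, x):
    # Linear decision function: positive means class 1, otherwise class 0.
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def perturb(weights, x, epsilon, target_label):
    # Move each feature by at most epsilon towards the target class.
    direction = 1 if target_label == 1 else -1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]

# Hypothetical toy model and input with 100 features.
weights = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
bias = 0.0
x = [0.1 if i % 2 == 0 else -0.1 for i in range(100)]

x_adv = perturb(weights, x, epsilon=0.15, target_label=0)
largest_change = max(abs(a - b) for a, b in zip(x, x_adv))

print(predict(weights, bias, x), predict(weights, bias, x_adv), largest_change)
```

For a deep network the direction of the perturbation would come from the gradient of the loss with respect to the input rather than directly from the weights, but the principle is the same.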

Despite ongoing efforts from the adversarial machine learning research community to mitigate this vulnerability, detecting adversarial examples and defending against evasion attacks remains an open problem. One of the main obstacles is the transferability property of adversarial examples: an attacker does not require full knowledge of the system in use but can craft adversarial examples by building a surrogate model somewhat similar to the one used by the defender.

It has been shown that adversarial examples crafted for the surrogate model are likely to be effective against the defender's model too. Effective attacks can, therefore, be created without knowing the details of the system being attacked.
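The following Python sketch illustrates transferability under simplifying assumptions. The hidden model in `target_predict`, the grid of queries, the nearest-centroid surrogate and the step size `epsilon` are all hypothetical choices made for this toy example. The attacker never reads the target's weights: they only observe its answers, fit their own surrogate to those answers, craft a perturbation against the surrogate and then submit the result to the target.

```python
def target_predict(x):
    # Defender's model: the attacker can query it but cannot inspect it.
    return 1 if 2.0 * x[0] - 1.0 * x[1] > 0 else 0

# Step 1: the attacker labels a grid of query points using the target's answers.
queries = [(float(i), float(j)) for i in range(-3, 4) for j in range(-3, 4)]
labels = [target_predict(q) for q in queries]

def centroid(points):
    return [sum(p[k] for p in points) / len(points) for k in range(2)]

# Step 2: fit a nearest-centroid surrogate, written as a linear rule w.x + b > 0.
pos = centroid([q for q, y in zip(queries, labels) if y == 1])
neg = centroid([q for q, y in zip(queries, labels) if y == 0])
w = [pos[k] - neg[k] for k in range(2)]
midpoint = [(pos[k] + neg[k]) / 2.0 for k in range(2)]
b = -sum(w[k] * midpoint[k] for k in range(2))

def surrogate_predict(x):
    return 1 if sum(w[k] * x[k] for k in range(2)) + b > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

# Step 3: craft the perturbation using only the surrogate's weights.
x = [1.0, 0.5]
epsilon = 0.8
x_adv = [x[k] - epsilon * sign(w[k]) for k in range(2)]

# Step 4: the example built against the surrogate also fools the real target.
print(target_predict(x), surrogate_predict(x_adv), target_predict(x_adv))
```

The surrogate's weights differ from the target's, but they point in a similar direction, which is enough for the attack to carry over in this toy setting.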

Compromising data collection

Poisoning attacks are considered one of the most significant emerging security threats for data-driven technologies. Many systems and online services rely on users’ data and feedback on their decisions to update and improve the performance of the model, adapting to changes in the environment or in the user’s preferences. Clearly, a malicious user may provide the wrong feedback to gradually poison the system and compromise its performance over time.

Similarly, when we collect data from sensors and IoT devices spread across the physical environment, a compromised device may lie about the data it reports. Multiple users or sensors may also collude, coordinating their poisoning of the data towards the same objective.

Machine learning systems are vulnerable to data poisoning. When an attacker can inject malicious samples into the dataset used to train the learning algorithm, such samples can be deliberately crafted to subvert the learning process and teach the learning algorithm a bad model.

Such attacks can be indiscriminate, reducing the overall performance of the machine learning system or can be targeted at specific classes of samples. Compromising even a small fraction of the training set can make the system unusable in some cases.
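A minimal Python sketch of an indiscriminate poisoning attack is given below. It is an illustration only: the one-dimensional data, the nearest-centroid classifier and the single injected point are hypothetical and chosen so the effect is easy to follow. One malicious sample out of nine (about 11 per cent of the training set) drags the centroid of class 0 far away from the genuine data, and the retrained model misclassifies every genuine class 0 test sample.

```python
def fit_centroids(samples):
    # Nearest-centroid classifier: store the mean value of each class.
    centroids = {}
    for label in {y for _, y in samples}:
        values = [x for x, y in samples if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids

def predict(centroids, x):
    # Assign the label of the closest class centroid.
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, samples):
    return sum(predict(centroids, x) == y for x, y in samples) / len(samples)

# Hypothetical clean training data and a separate clean test set.
clean_train = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0),
               (7.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]
test_set = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
            (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]

# The attacker injects one extreme sample labelled as class 0.
poison = [(60.0, 0)]

clean_accuracy = accuracy(fit_centroids(clean_train), test_set)
poisoned_accuracy = accuracy(fit_centroids(clean_train + poison), test_set)

print(clean_accuracy, poisoned_accuracy)
```

A point this extreme would be easy to filter out as an outlier. Real poisoning attacks are harder to spot because the malicious samples are crafted to look plausible.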

Data poisoning can also be performed to facilitate subsequent evasion attacks. Even if the malicious data injected by the attacker does not produce an error, the performance of the system is degraded when this data is used for re-training or updating the learning algorithm. For example, in a spam filter application, a learning algorithm typically uses the text in the header and the body of the emails to distinguish spam from good email. An attacker can send malicious emails containing words common in good emails as well as words common in spam emails.

Thus, even if these malicious emails are correctly classified as spam, when the system is re-trained, some of the words that were previously considered indicative of good email will now be considered typical of spam. As a result, genuine emails containing those good words may be incorrectly classified as spam in future.
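The spam filter scenario can be sketched in a few lines of Python. The word-count scoring rule, the six training emails and the poisoning email are all hypothetical and far simpler than a production filter. The poisoning email mixes spam words with words from good email. It is correctly flagged as spam, yet once three copies are added to the training data with that label, a genuine email containing only good words is flagged as spam too.

```python
import math
from collections import Counter

def train(emails):
    # Count how often each word appears in spam and in good email.
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in emails:
        (spam_counts if is_spam else ham_counts).update(text.split())
    return spam_counts, ham_counts

def looks_like_spam(model, text):
    # Sum a smoothed log-ratio per word; a positive total means spam.
    spam_counts, ham_counts = model
    total = sum(math.log((spam_counts[w] + 1) / (ham_counts[w] + 1))
                for w in text.split())
    return total > 0

# Hypothetical training data: (text, is_spam).
training = [
    ("meeting agenda attached", False),
    ("project report attached", False),
    ("lunch meeting tomorrow", False),
    ("win free prize", True),
    ("free money offer", True),
    ("claim prize now", True),
]
genuine = "meeting agenda"
poison = "win free prize money meeting agenda"

model = train(training)
poison_flagged = looks_like_spam(model, poison)
genuine_flagged_before = looks_like_spam(model, genuine)

# The filter re-trains on its own (correct) verdicts for the poisoning emails.
retrained = train(training + [(poison, True)] * 3)
genuine_flagged_after = looks_like_spam(retrained, genuine)

print(poison_flagged, genuine_flagged_before, genuine_flagged_after)
```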


The vulnerabilities of machine learning algorithms have fuelled a growing interest in adversarial machine learning, a research area that lies at the intersection of cyber security and machine learning. There have been significant advances in understanding the security properties of the learning algorithms, although defending against these attacks remains a significant challenge.

Furthermore, there is a need for a general design methodology for secure machine learning systems that can offer indicators or performance guarantees of machine learning systems in the presence of sophisticated attacks. Traditional design methodologies that rely on measuring the performance on a separate test dataset are no longer valid when considering the presence of a smart adversary.

Addressing these security challenges is a necessary priority to ensure the successful adoption of machine learning technologies and their deployment in systems that can be trusted across many application domains.

The authors can be contacted at: