"

4.1 Introduction

While adversarial attacks cannot alter a model’s training process and can only modify the test instance, data poisoning attacks can manipulate the training process itself (Figure 4.1.1). Specifically, in data poisoning attacks, attackers aim to manipulate the training data (e.g., by poisoning features, flipping labels, manipulating model configuration settings, or altering model weights) in order to influence the learned model. It is assumed that attackers can contribute to the training data or have some degree of control over it. The main objective of injecting poisoned data is to influence the model’s learning outcome. Recent studies in adversarial ML have shown particular interest in data poisoning attack settings.
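To make the label-flipping strategy mentioned above concrete, the following minimal sketch trains one model on clean labels and one on labels where an attacker has flipped a fraction of the training labels, then compares their accuracy on the same test set. The sketch is illustrative only: the use of Python with scikit-learn, the synthetic dataset, the logistic regression model, and the 30% flip rate are our own assumptions, not part of the original chapter.

# Minimal label-flipping poisoning sketch (illustrative assumptions: synthetic
# data, logistic regression, 30% flip rate).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic binary classification data standing in for a victim's training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Training stage (clean): the defender fits a model on unmodified data.
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Training stage (poisoned): the attacker, assumed to control part of the
# training data, flips the labels of a randomly chosen 30% of training points.
poison_rate = 0.30
n_poison = int(poison_rate * len(y_train))
flip_idx = rng.choice(len(y_train), size=n_poison, replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]  # invert 0 <-> 1 labels

poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

# Inference stage: the same test data is scored by both models, so any drop
# in accuracy reflects the influence of the poisoned training set.
print("clean model accuracy:   ", accuracy_score(y_test, clean_model.predict(X_test)))
print("poisoned model accuracy:", accuracy_score(y_test, poisoned_model.predict(X_test)))

Even this simple attack typically degrades test accuracy noticeably, which illustrates why the attacker’s assumed control over training data is the central threat in poisoning settings.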

Figure 4.1.1 A generalized illustration of poisoning attacks on machine learning models. Graphic in A Survey on Poisoning Attacks Against Supervised Machine Learning, Wenjun Qiu, FDEd (CAN).
Figure 4.1.1 Description

A data poisoning attack on a machine learning pipeline. It is divided into two main sections: the Training Stage (highlighted with a red dashed border) and the Inference Stage (highlighted with a blue dashed border).

In the Training Stage, an attacker injects poisoned data into the training dataset, which also contains normal training data. The combined data are fed into the next phase. In the Inference Stage, testing data is input into the (corrupted) learning model that results from training on the poisoned dataset. The corrupted model produces a prediction, which is evaluated via a prediction score.


Figure 4.1.2 A generalized illustration of poisoning attacks on machine learning models. Image by Ximeng Liu, Lehui Xie, Yaopeng Wang, Jian Zou, Jinbo Xiong, and Zuobin Ying, CC BY 4.0.
Figure 4.1.2 Description

A data poisoning attack on a machine learning model. In the training stage, an attacker injects poisoned data alongside legitimate training data, resulting in a corrupted learning model. During the inference stage, testing data is fed into the corrupted model, leading to manipulated predictions. The process concludes with an evaluation of the predicted score.


“ML Attack Models: Adversarial Attacks and Data Poisoning Attacks” by Jing Lin, Long Dang, Mohamed Rahouti, and Kaiqi Xiong is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

License


Winning the Battle for Secure ML Copyright © 2025 by Bestan Maaroof is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.