"

3.1 Introduction

An evasion attack is a test-time attack in which the adversary’s goal is to generate adversarial examples: test samples whose classification the attacker can change at deployment time to an arbitrary class of their choice using only a minimal perturbation. In the context of image classification, the perturbation of the original sample must be small enough that a human does not notice the change to the input. Therefore, while the ML model can be tricked into classifying the adversarial example as the target class the attacker selects, humans still recognize it as part of the original class (Figure 3.1.1).
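
To make the idea of a small, targeted perturbation concrete, the following is a minimal sketch of one common attack of this kind, a one-step targeted variant of the fast gradient sign method, written in PyTorch. The names model, x, target_class, and epsilon are illustrative assumptions, not definitions from this chapter, and the chapter does not prescribe this particular algorithm.

# Minimal sketch of a targeted evasion attack (targeted FGSM).
# Assumes a pretrained PyTorch classifier `model`, an input image tensor `x`
# of shape [1, C, H, W] with values in [0, 1], and an attacker-chosen label
# `target_class` -- all hypothetical names used only for illustration.
import torch
import torch.nn.functional as F

def targeted_fgsm(model, x, target_class, epsilon=0.01):
    """Perturb x by at most epsilon per pixel to push the model toward target_class."""
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(x_adv)
    # Loss with respect to the attacker's chosen target class ...
    loss = F.cross_entropy(logits, torch.tensor([target_class]))
    loss.backward()
    # ... is decreased by stepping against the sign of the gradient,
    # while the epsilon bound keeps the change visually imperceptible.
    x_adv = x_adv - epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

Bounding the per-pixel change by epsilon is what keeps the adversarial example visually indistinguishable from the original image, which is the minimal-perturbation requirement described above.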

Figure 3.1.1 Adversarial attack process. Image by Liang, H.; He, E.; Zhao, Y.; Jia, Z.; Li, H., CC BY 4.0
Figure 3.1.1 Description

A diagram illustrating an adversarial attack on a neural network-based image classification model. The process begins with an ‘Original example’ image of a kitten. A perturbation, represented as a tiled blue pattern, is added to the original image to create an ‘Adversarial example,’ which still visually appears as a kitten. This adversarial example is then input into the ‘Target Model,’ a deep neural network, which incorrectly classifies it as a ‘Dog’ instead of a ‘Cat.’ The diagram also shows the attack strategies involved: a ‘White-box attack’ based on constrained optimization and gradient optimization, and a ‘Black-box attack’ based on estimating the decision boundary and on evolution strategies, illustrated with a 3D plot of the attack strategy.

License


Winning the Battle for Secure ML Copyright © 2025 by Bestan Maaroof is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.