3.7 Mitigating Evasion Attacks
Machine learning models, despite their remarkable capabilities, are increasingly vulnerable to evasion attacks, in which adversaries craft subtle perturbations of the input data to deceive the model at deployment time. These attacks pose significant risks, especially in critical applications such as autonomous driving, cybersecurity, and healthcare. To address these challenges, researchers have developed a variety of defense mechanisms to fortify models against adversarial threats. From this wide range of proposed defenses, three main classes have proved resilient and show the greatest potential for mitigating evasion attacks: Adversarial Training, Randomized Smoothing, and Formal Verification.
1. Adversarial Training:
Introduced by Goodfellow et al. (2015) and further developed by Madry et al. (2018), adversarial training incorporates adversarial examples (intentionally misleading inputs) into the training process, so that the model learns to correctly classify both clean and adversarial data; a minimal sketch follows below. While it significantly improves robustness, it can reduce accuracy on standard (clean) inputs and demands high computational resources, because adversarial samples must be regenerated repeatedly during training. Interestingly, adversarially trained models often develop representations that align more closely with human perception.
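The following Python (PyTorch) sketch illustrates the idea with a single-step FGSM perturbation; the model, data loader, and the equal weighting of clean and adversarial losses are hypothetical choices, and Madry et al.'s formulation replaces the single FGSM step with a stronger multi-step PGD attack.

# Minimal, illustrative sketch of adversarial training with FGSM perturbations.
# The model and data loader are hypothetical placeholders.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    # Craft an adversarial example inside the l_inf ball of radius epsilon around x.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()   # step along the sign of the input gradient
    return x_adv.clamp(0.0, 1.0).detach()         # keep pixels in a valid range

def adversarial_training_epoch(model, loader, optimizer, epsilon=8 / 255):
    # One epoch that trains on a mixture of clean and adversarial examples.
    model.train()
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y, epsilon)
        optimizer.zero_grad()                     # clear gradients accumulated by the attack
        loss = (0.5 * F.cross_entropy(model(x), y)
                + 0.5 * F.cross_entropy(model(x_adv), y))
        loss.backward()
        optimizer.step()

Generating a fresh adversarial example for every training batch is what drives the computational cost noted above.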
2. Randomized Smoothing:
Randomized smoothing offers a probabilistic guarantee of robustness: it adds Gaussian noise to many copies of an input and aggregates the model's predictions over them to produce a smoothed, more robust classifier (see the sketch below). The method yields a certifiable defense against ℓ₂-norm bounded attacks and has been applied at scale to complex datasets such as ImageNet. More recent work combines this approach with denoising diffusion models to improve certified accuracy across a wider range of inputs.
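The Python sketch below shows the prediction step of a smoothed classifier by majority vote over noisy copies of the input; the base classifier, noise level sigma, and sample count are hypothetical choices.

# Minimal sketch of randomized-smoothing prediction (hypothetical base classifier).
import torch

def smoothed_predict(model, x, num_classes, sigma=0.25, n_samples=1000, batch_size=200):
    # Return the class most often predicted by the base model on Gaussian-noised copies of x,
    # where x has shape (1, C, H, W).
    model.eval()
    counts = torch.zeros(num_classes, dtype=torch.long)
    with torch.no_grad():
        remaining = n_samples
        while remaining > 0:
            b = min(batch_size, remaining)
            noise = sigma * torch.randn((b,) + tuple(x.shape[1:]))  # Gaussian noise, std sigma
            preds = model(x + noise).argmax(dim=1)                  # x broadcasts over the batch
            counts += torch.bincount(preds, minlength=num_classes)
            remaining -= b
    return counts.argmax().item()

In the certified procedure of Cohen et al. (2019), the same vote counts are additionally used to lower-bound the top class's probability, which translates into a certified ℓ₂ radius proportional to the noise level σ.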
3. Formal Verification:
Another method for certifying the adversarial robustness of a neural network draws on techniques from formal methods. Early tools such as Reluplex used satisfiability modulo theories (SMT) solvers to analyze small neural networks. Later methods, such as AI2, DeepPoly, ReluVal, and Fast Geometric Projections (FGP), extended these techniques to deeper and more complex architectures using abstract interpretation and geometric methods (a simplified sketch of the abstract-interpretation flavor follows below). While formal verification holds great promise, it is often limited by scalability, high computational demands, and restrictions on the types of model operations it supports.
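As a rough illustration of this flavor of analysis, the Python sketch below performs interval bound propagation over a fully connected ReLU network; it is a simplified, sound-but-incomplete check in the spirit of abstract interpretation, not a reimplementation of Reluplex, AI2, or DeepPoly, and the weight/bias lists are hypothetical inputs.

# Minimal sketch of interval bound propagation (IBP) for a fully connected ReLU network.
import numpy as np

def interval_affine(lo, hi, W, b):
    # Propagate the input box [lo, hi] through the affine map x -> W x + b.
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius

def certify_linf(weights, biases, x, epsilon, true_label):
    # Try to prove that every input within l_inf distance epsilon of x keeps the true label:
    # the true class's lower-bound logit must exceed every other class's upper-bound logit.
    lo, hi = x - epsilon, x + epsilon
    for W, b in zip(weights[:-1], biases[:-1]):
        lo, hi = interval_affine(lo, hi, W, b)
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone, apply elementwise
    lo, hi = interval_affine(lo, hi, weights[-1], biases[-1])
    worst_other = max(hi[c] for c in range(hi.shape[0]) if c != true_label)
    return bool(lo[true_label] > worst_other)

A return value of True is a proof of robustness under the stated bounds, while False is inconclusive, which reflects the precision and scalability trade-offs noted above.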
“Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations” by Apostol Vassilev, Alina Oprea, Alie Fordyce, Hyrum Anderson, National Institute of Standards and Technology – U.S. Department of Commerce. Republished courtesy of the National Institute of Standards and Technology. Modifications summarized the three points.