3.8 Chapter Summary
Key Takeaways
- Evasion attacks are about fooling models at inference time.
- Goal: Manipulate input data to cause misclassification during model inference.
- Key Properties:
- Black box vs. white box: does the attacker have access to the model's internals?
- Targeted vs. untargeted: force a specific wrong class vs. any wrong class.
- Example: a self-driving car misclassifies a stop sign as a speed limit sign due to adversarial stickers.
- Some Attack Methods (Evasion):
- Fast Gradient Sign Method (FGSM)
- Idea: Use the sign of the loss gradient to craft a one-step perturbation (see the sketch below).
- Strengths: Simple, fast, widely used.
- Weakness: Less effective against robust models.
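The core of FGSM fits in a few lines. Below is a minimal, illustrative PyTorch-style sketch (not the chapter's reference implementation); the model, loss function, and epsilon value are placeholders.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
    """One-step, untargeted FGSM: move each input feature by epsilon in the
    direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Stepping in the direction that increases the loss pushes the model
    # toward misclassification.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixel values in a valid range
```

For a targeted variant, the step is instead taken in the direction that decreases the loss toward the chosen target class, i.e., the sign of the step is flipped.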
- Projected Gradient Descent (PGD)
- Idea: Iterative FGSM with each step projected back into a bounded perturbation set (see the sketch below).
- Strengths: More powerful than FGSM.
- Weakness: Computationally expensive.
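A minimal sketch of PGD under an L-infinity constraint, reusing the same placeholder model and loss as the FGSM sketch; the step size, epsilon, and iteration count are illustrative, and many implementations also start from a random point inside the epsilon-ball.

```python
import torch

def pgd_attack(model, loss_fn, x, y, epsilon=0.03, alpha=0.01, steps=10):
    """Iterative FGSM with a projection back into the epsilon-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # FGSM-style step, then project so the total perturbation never
        # exceeds epsilon in any dimension.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)
    return x_adv
```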
- Carlini & Wagner (C&W) Attack
- Idea: Optimize the perturbation to be as small, and therefore as hard for humans to detect, as possible while still forcing misclassification (see the sketch below).
- Strengths: Highly effective, hard to defend against.
- Weakness: Requires significant computational resources.
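The C&W attack is formulated as an optimization problem rather than a fixed gradient step. The sketch below is a heavily simplified, single-constant version of the targeted L2 formulation; the published attack additionally binary-searches over the trade-off constant c, and the model here is the same placeholder as above.

```python
import torch

def cw_l2_attack(model, x, target, c=1.0, steps=100, lr=0.01, kappa=0.0):
    """Simplified targeted C&W L2: minimize the squared L2 distortion plus
    c times a margin loss, using a tanh change of variables so the
    adversarial image always stays in [0, 1]."""
    w = torch.atanh((2 * x - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
        other_logit = logits.scatter(1, target.unsqueeze(1), float("-inf")).max(1).values
        # Margin loss: drive the target-class logit above every other logit.
        margin = torch.clamp(other_logit - target_logit + kappa, min=0)
        distortion = ((x_adv - x) ** 2).flatten(1).sum(1)
        loss = (distortion + c * margin).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```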
- Adversarial Training
- It involves injecting adversarial examples into the training process (see the sketch below).
- It helps models recognize and resist adversarial attacks.
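A minimal sketch of one adversarial-training step, assuming the fgsm_attack helper from the FGSM sketch above; the 50/50 weighting of clean and adversarial loss is an illustrative choice, not a prescribed setting.

```python
import torch

def adversarial_training_step(model, loss_fn, optimizer, x, y, epsilon=0.03):
    """One training step on a mix of clean and adversarial examples, so the
    model learns to resist the perturbations it will see at inference time."""
    model.train()
    x_adv = fgsm_attack(model, loss_fn, x, y, epsilon)  # helper from the FGSM sketch
    optimizer.zero_grad()
    loss = 0.5 * loss_fn(model(x), y) + 0.5 * loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Stronger variants generate the adversarial examples with PGD rather than FGSM, which costs more computation but tends to produce more robust models.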
OpenAI. (2025). ChatGPT. [Large language model]. https://chat.openai.com/chat
Prompt: Can you generate key takeaways for this chapter content?
Key Terms
- Basic Iterative Method (BIM): Generates adversarial perturbations by increasing the loss function in multiple small steps.
- Black-box attack: The attacker cannot access the model’s structure or parameters and relies only on input-output observations.
- Digital Attack: Manipulating input data, such as uploading a crafted PNG file to bypass detection.
- Fast Gradient Sign Method (FGSM): A one-step attack that perturbs the input in the direction of the sign of the loss gradient; its success supports the view that adversarial examples arise from the high-dimensional linearity of deep neural networks.
- Iterative Attack: Multiple iterations refine the adversarial example for a more effective attack at the cost of increased computation time.
- L-BFGS: The first adversarial attack algorithm formulated against deep learning models.
- Non-Targeted Attack: The adversarial example only needs to be misclassified, regardless of the incorrect class. Also known as error-generic attacks or indiscriminate attacks.
- One-step Attack: The adversarial example is generated in a single step using minimal computation.
- Physical Attack: Altering the environment to influence sensor data, such as obstructing a camera’s view.
- Projected Gradient Descent (PGD): An iterative (multi-step) attack influenced by FGSM/I-FGSM; each step is projected back into the allowed perturbation set.
- Specific Perturbation: Each input is modified with a unique perturbation pattern.
- Targeted Attack: The adversarial example forces the model to misclassify an input as a specific target class. Also known as error-specific attacks.
- Universal Perturbation: The same perturbation is applied to all inputs.
- White-box attack: The attacker fully knows the model, including its architecture, parameters, and training data.
- Zeroth Order Optimization (ZOO): A black-box attack that does not rely on the attack transferability of surrogate models; instead, it estimates first-order and second-order gradient values directly from the target model's input-output queries (see the sketch below).
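ZOO's key trick is replacing true gradients with query-based estimates. Below is a minimal NumPy sketch of the coordinate-wise finite-difference estimates it relies on; f, x, and the coordinate index are placeholders, and a complete attack would wrap such estimates in a coordinate-descent optimization loop.

```python
import numpy as np

def zoo_gradient_estimate(f, x, index, h=1e-4):
    """Estimate the first and second derivatives of a black-box scalar loss f
    along one coordinate of x, using only input-output queries."""
    e = np.zeros_like(x)
    e.flat[index] = h
    f_plus, f_minus, f_zero = f(x + e), f(x - e), f(x)
    grad = (f_plus - f_minus) / (2 * h)                    # symmetric difference
    curvature = (f_plus - 2 * f_zero + f_minus) / h ** 2   # second-order estimate
    return grad, curvature
```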