3.8 Chapter Summary
Key Takeaways
- Evasion attacks are about fooling models at inference time.
- Goal: Manipulate input data to cause misclassification during model inference.
- Key Properties:
- Black box vs. white box: does the attacker have access to the model's internals?
- Targeted vs. untargeted: force a specific wrong class vs. any wrong class.
- Example: a self-driving car misclassifies a stop sign as a speed limit sign due to adversarial stickers.
- Some Attack Methods (Evasion):
- Fast Gradient Sign Method (FGSM)
- Idea: Use the sign of the loss gradient to craft a one-step perturbation (see the sketch below).
- Strengths: Simple, fast, widely used.
- Weakness: Less effective against robust models.
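The core of FGSM fits in a few lines. Below is a minimal, illustrative PyTorch-style sketch (not the chapter's reference implementation); the model, loss function, and epsilon value are placeholders.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
    """One-step, untargeted FGSM: move each input feature by epsilon in the
    direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Stepping in the direction that increases the loss pushes the model
    # toward misclassification.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixel values in a valid range
```

For a targeted variant, the step is instead taken in the direction that decreases the loss toward the chosen target class, i.e., the sign of the step is flipped.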
- Projected Gradient Descent (PGD)
- Idea: Iterative FGSM with each step projected back into a bounded perturbation set (see the sketch below).
- Strengths: More powerful than FGSM.
- Weakness: Computationally expensive.
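A minimal sketch of PGD under an L-infinity constraint, reusing the same placeholder model and loss as the FGSM sketch; the step size, epsilon, and iteration count are illustrative, and many implementations also start from a random point inside the epsilon-ball.

```python
import torch

def pgd_attack(model, loss_fn, x, y, epsilon=0.03, alpha=0.01, steps=10):
    """Iterative FGSM with a projection back into the epsilon-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # FGSM-style step, then project so the total perturbation never
        # exceeds epsilon in any dimension.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)
    return x_adv
```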
- Carlini & Wagner (C&W) Attack
- Idea: Optimize the perturbation to be as small, and therefore as hard for humans to detect, as possible while still forcing misclassification (see the sketch below).
- Strengths: Highly effective, hard to defend against.
- Weakness: Requires significant computational resources.
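The C&W attack is formulated as an optimization problem rather than a fixed gradient step. The sketch below is a heavily simplified, single-constant version of the targeted L2 formulation; the published attack additionally binary-searches over the trade-off constant c, and the model here is the same placeholder as above.

```python
import torch

def cw_l2_attack(model, x, target, c=1.0, steps=100, lr=0.01, kappa=0.0):
    """Simplified targeted C&W L2: minimize the squared L2 distortion plus
    c times a margin loss, using a tanh change of variables so the
    adversarial image always stays in [0, 1]."""
    w = torch.atanh((2 * x - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
        other_logit = logits.scatter(1, target.unsqueeze(1), float("-inf")).max(1).values
        # Margin loss: drive the target-class logit above every other logit.
        margin = torch.clamp(other_logit - target_logit + kappa, min=0)
        distortion = ((x_adv - x) ** 2).flatten(1).sum(1)
        loss = (distortion + c * margin).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```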
- Adversarial Training
- It involves injecting adversarial examples into the training process (see the sketch below).
- It helps models recognize and resist adversarial attacks.
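A minimal sketch of one adversarial-training step, assuming the fgsm_attack helper from the FGSM sketch above; the 50/50 weighting of clean and adversarial loss is an illustrative choice, not a prescribed setting.

```python
import torch

def adversarial_training_step(model, loss_fn, optimizer, x, y, epsilon=0.03):
    """One training step on a mix of clean and adversarial examples, so the
    model learns to resist the perturbations it will see at inference time."""
    model.train()
    x_adv = fgsm_attack(model, loss_fn, x, y, epsilon)  # helper from the FGSM sketch
    optimizer.zero_grad()
    loss = 0.5 * loss_fn(model(x), y) + 0.5 * loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Stronger variants generate the adversarial examples with PGD rather than FGSM, which costs more computation but tends to produce more robust models.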
OpenAI. (2025). ChatGPT. [Large language model]. https://chat.openai.com/chat
Prompt: Can you generate key takeaways for this chapter content?
Key Terms
- Basic Iterative Method (BIM): Generates adversarial perturbations by increasing the loss function in multiple small steps.
- Black-box attack: The attacker cannot access the model’s structure or parameters and relies only on input-output observations.
- Digital Attack: Manipulating input data, such as uploading a crafted PNG file to bypass detection.
- Fast Gradient Sign Method (FGSM): A one-step attack that perturbs the input in the direction of the sign of the loss gradient; its success supports the view that adversarial examples arise from the high-dimensional linearity of deep neural networks.
- Iterative Attack: Multiple iterations refine the adversarial example for a more effective attack at the cost of increased computation time.
- L-BFGS: The first adversarial attack algorithm formulated against deep learning models.
- Non-Targeted Attack: The adversarial example only needs to be misclassified, regardless of the incorrect class. Also known as error-generic attacks or indiscriminate attacks.
- One-step Attack: The adversarial example is generated in a single step using minimal computation.
- Physical Attack: Altering the environment to influence sensor data, such as obstructing a camera’s view.
- Projected Gradient Descent (PGD): An iterative (multi-step) attack influenced by FGSM/I-FGSM; each step is projected back into the allowed perturbation set.
- Specific Perturbation: Each input is modified with a unique perturbation pattern.
- Targeted Attack: The adversarial example forces the model to misclassify an input as a specific target class. Also known as error-specific attacks.
- Universal Perturbation: The same perturbation is applied to all inputs.
- White-box attack: The attacker fully knows the model, including its architecture, parameters, and training data.
- Zeroth Order Optimization (ZOO): A black-box attack that does not rely on the attack transferability of surrogate models; instead, it estimates first-order and second-order gradient values directly from the target model's input-output queries (see the sketch below).
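ZOO's key trick is replacing true gradients with query-based estimates. Below is a minimal NumPy sketch of the coordinate-wise finite-difference estimates it relies on; f, x, and the coordinate index are placeholders, and a complete attack would wrap such estimates in a coordinate-descent optimization loop.

```python
import numpy as np

def zoo_gradient_estimate(f, x, index, h=1e-4):
    """Estimate the first and second derivatives of a black-box scalar loss f
    along one coordinate of x, using only input-output queries."""
    e = np.zeros_like(x)
    e.flat[index] = h
    f_plus, f_minus, f_zero = f(x + e), f(x - e), f(x)
    grad = (f_plus - f_minus) / (2 * h)                    # symmetric difference
    curvature = (f_plus - 2 * f_zero + f_minus) / h ** 2   # second-order estimate
    return grad, curvature
```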