"

3.8 Chapter Summary

Key Takeaways

  • Evasion Attacks: Fooling Models at Inference Time
    • Goal: Manipulate input data to cause misclassification during model inference.
    • Key Properties:
      • Black Box vs. White Box: Does the attacker need model access?
      • Targeted vs. Untargeted: Force a specific wrong class vs. any wrong class.
    • Example: A self-driving car misclassifies a stop sign as a speed limit sign due to adversarial stickers.
  • Some Attack Methods (Evasion)
  • Fast Gradient Sign Method (FGSM)
    • Idea: Use the gradient of the loss to craft perturbations (see the sketch after this list).
    • Strengths: Simple, fast, widely used.
    • Weakness: Less effective against robust models.
  • Projected Gradient Descent (PGD)
    • Idea: Iterative FGSM with bounded perturbations.
    • Strengths: More powerful than FGSM.
    • Weakness: Computationally expensive.
  • Carlini & Wagner (C&W) Attack
    • Idea: Optimize perturbations that minimize human detectability while still forcing the model to misclassify.
    • Strengths: Highly effective, hard to defend against.
    • Weakness: Requires significant computational resources.
  • Adversarial Training
    • Injects adversarial samples into the training process.
    • Helps models recognize and resist adversarial attacks (see the training-step sketch after this list).
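
To make the FGSM and adversarial-training items above concrete, here is a minimal sketch in PyTorch. It assumes a classifier model, image inputs x scaled to [0, 1], integer labels y, and an optimizer; the function names, the epsilon value, and the equal weighting of clean and adversarial loss are illustrative choices rather than anything prescribed in this chapter.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon=0.03):
        # One-step FGSM: move each input value by epsilon in the direction of
        # the sign of the loss gradient, then clip back to the valid range.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        return torch.clamp(x_adv, 0.0, 1.0).detach()

    def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
        # Adversarial training: craft adversarial examples on the fly and
        # include them in the loss so the model learns to resist the attack.
        x_adv = fgsm_attack(model, x, y, epsilon)
        optimizer.zero_grad()
        loss = 0.5 * (F.cross_entropy(model(x), y)
                      + F.cross_entropy(model(x_adv), y))
        loss.backward()
        optimizer.step()
        return loss.item()

In practice, stronger iterative attacks such as PGD are often substituted for FGSM inside the training loop; a PGD sketch follows the Key Terms list below.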

OpenAI. (2025). ChatGPT. [Large language model]. https://chat.openai.com/chat
Prompt: Can you generate key takeaways for this chapter content?

Key Terms

  • Basic Iterative Method (BIM): Generates perturbations by increasing the loss function in multiple small steps.
  • Black-box attack: The attacker cannot access the model’s structure or parameters and relies only on input-output observations.
  • Digital Attack: Manipulating input data, such as uploading a crafted PNG file to bypass detection.
  • Fast Gradient Sign Method (FGSM): A one-step attack that perturbs the input along the sign of the loss gradient; its authors attribute the existence of adversarial examples to the high-dimensional linearity of deep neural networks.
  • Iterative Attack: Multiple iterations refine the adversarial example for a more effective attack at the cost of increased computation time.
  • L-BFGS: The first optimization-based adversarial attack algorithm against deep learning models.
  • Non-Targeted Attack: The adversarial example only needs to be misclassified, regardless of the incorrect class. Also known as error-generic attacks or indiscriminate attacks.
  • One-step Attack: The adversarial example is generated in a single step using minimal computation.
  • Physical Attack: Altering the environment to influence sensor data, such as obstructing a camera’s view.
  • Projected Gradient Descent (PGD): An iterative (multi-step) attack influenced by FGSM/I-FGSM, in which each gradient step is projected back into the allowed perturbation region (see the sketch after this list).
  • Specific Perturbation: Each input is modified with a unique perturbation pattern.
  • Targeted Attack: The adversarial example forces the model to misclassify an input as a specific target class. Also known as error-specific attacks.
  • Universal Perturbation: The same perturbation is applied to all inputs.
  • White-box attack: The attacker fully knows the model, including its architecture, parameters, and training data.
  • Zeroth Order Optimization (ZOO): A black-box attack that does not exploit the attack transferability of surrogate models; instead, it estimates first-order and second-order gradient values directly from the target model's outputs.
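
As a companion to the FGSM sketch earlier in this summary, here is a minimal PGD sketch under the same assumptions (a PyTorch classifier model, inputs x in [0, 1], integer labels y); the step size alpha, the epsilon bound, and the number of steps are illustrative values.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, epsilon=0.03, alpha=0.007, steps=10):
        # Multi-step PGD: repeated FGSM-style steps, each projected back into
        # the L-infinity ball of radius epsilon around the original input.
        x = x.clone().detach()
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            # Projection: stay within epsilon of x and inside the valid range.
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
            x_adv = torch.clamp(x_adv, 0.0, 1.0)
        return x_adv

The projection inside the loop is what keeps the accumulated perturbation bounded; it is also why PGD is stronger than one-step FGSM but more computationally expensive.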

License


Winning the Battle for Secure ML Copyright © 2025 by Bestan Maaroof is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.