5.7 Chapter Summary
Key Takeaways
Backdoor attacks manipulate training data to embed hidden triggers, causing misclassification when the trigger is present during inference.
Attack Scenarios:
- Outsourced Training: A malicious trainer injects a backdoor.
- Transfer Learning: A pre-trained model contains a backdoor inherited by fine-tuned models.
- Federated Learning: Malicious participants submit poisoned updates.
Attack execution relies on the backdoored model maintaining high accuracy on clean samples while achieving a high success rate on poisoned (triggered) samples.
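A minimal sketch of these two metrics is shown below; the `predict` callable is a hypothetical stand-in for a trained classifier, and the triggered inputs are assumed to have been stamped with the attacker's trigger beforehand.

```python
# Minimal sketch of the two quantities a backdoored model must satisfy:
# clean accuracy (normal behaviour) and attack success rate (ASR,
# behaviour on triggered inputs).
import numpy as np

def clean_accuracy(predict, x_clean, y_true):
    """Fraction of clean samples classified correctly."""
    return float(np.mean(predict(x_clean) == y_true))

def attack_success_rate(predict, x_triggered, target_class):
    """Fraction of triggered samples classified as the attacker's target class."""
    return float(np.mean(predict(x_triggered) == target_class))
```

A stealthy backdoor keeps clean accuracy close to that of a benign model while pushing the attack success rate toward 100%.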
Backdoor attacks range from simple patch triggers to more complex functional and semantic triggers.
Types of Triggers:
- Patch Triggers: Visible patches (e.g., stickers on traffic signs); a minimal poisoning sketch follows this list.
- Clean-Label Attacks: Labels remain unchanged, making detection harder.
- Semantic Triggers: Natural-looking modifications (e.g., glasses on faces).
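As a concrete illustration of a patch trigger combined with label manipulation, the sketch below stamps a small white patch onto a fraction of the training images and relabels them to the target class. The array layout, patch size, and 5% poisoning rate are illustrative assumptions, not values from the chapter.

```python
# Minimal patch-trigger poisoning sketch, assuming images as a NumPy array
# of shape (N, H, W, C) with pixel values in [0, 1].
import numpy as np

def poison_with_patch(images, labels, target_class, poison_rate=0.05,
                      patch_size=4, seed=0):
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Trigger embedding: stamp a white patch in the bottom-right corner.
    images[idx, -patch_size:, -patch_size:, :] = 1.0
    # Label manipulation: relabel the poisoned samples as the target class.
    labels[idx] = target_class
    return images, labels
```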
Mitigation Strategies:
- Data Sanitization: Detecting and removing poisoned samples.
- Trigger Reconstruction: Identifying triggers via optimization (e.g., Neural Cleanse); a reconstruction sketch follows this list.
- Model Inspection & Sanitization: Pruning suspicious neurons or fine-tuning.
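The sketch below gives the flavour of optimization-based trigger reconstruction in the spirit of Neural Cleanse: for one candidate target class, a mask and pattern are optimized so that clean images are pushed into that class while the mask stays small. It assumes a trained PyTorch classifier `model` and a DataLoader `loader` of clean images; the image shape and hyperparameters are illustrative.

```python
# Sketch of optimisation-based trigger reconstruction for one candidate class.
import torch
import torch.nn.functional as F

def reconstruct_trigger(model, loader, target_class, image_shape=(3, 32, 32),
                        epochs=5, lam=0.01, lr=0.1, device="cpu"):
    # Optimise unconstrained tensors; sigmoid keeps mask and pattern in [0, 1].
    mask = torch.zeros(1, *image_shape[1:], device=device, requires_grad=True)
    pattern = torch.zeros(image_shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    model.eval()
    for _ in range(epochs):
        for x, _ in loader:
            x = x.to(device)
            m, p = torch.sigmoid(mask), torch.sigmoid(pattern)
            x_trig = (1 - m) * x + m * p            # blend trigger into clean images
            target = torch.full((x.size(0),), target_class, device=device)
            # Force the target class while penalising large masks (L1 norm).
            loss = F.cross_entropy(model(x_trig), target) + lam * m.abs().sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()
```

Running this for every candidate class and flagging classes whose recovered mask is anomalously small is the core detection idea.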
Challenges:
- Stealthy attacks (low poisoning rates, dynamic triggers) evade detection.
- Federated learning requires specialized defences due to decentralized threats; an aggregation sketch follows this list.
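One family of specialized defences clips each client update's norm and adds Gaussian noise before aggregation (see the Gaussian noise key term below). The sketch assumes flattened NumPy update vectors; the clipping threshold and noise level are illustrative.

```python
# Minimal sketch of a specialised federated defence: clip each client update
# to a maximum L2 norm so no single participant dominates the aggregate, then
# add Gaussian noise to dampen any residual backdoor signal.
import numpy as np

def clipped_noisy_aggregate(updates, clip_norm=1.0, noise_std=0.01, seed=0):
    rng = np.random.default_rng(seed)
    clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12)) for u in updates]
    aggregate = np.mean(clipped, axis=0)
    return aggregate + rng.normal(0.0, noise_std, size=aggregate.shape)
```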
OpenAI. (2025). ChatGPT [Large language model]. https://chat.openai.com/chat
Prompt: Can you generate key takeaways for this chapter content?
Key Terms
- Attack Execution: The inference-time stage of the attack, in which any input containing the trigger is misclassified as the target class.
- Backdoor Attack Federated Filter Evaluation (BaFFLe): A validation-based defence protocol that adds an extra evaluation phase to each round of federated training, using global models trained in earlier rounds as a reference so that significant deviations in the new global model can be detected.
- Clean-label Backdoors: The attacker does not change the labels of the poisoned samples, making the attack stealthier.
- Dynamic Backdoors: The trigger’s location or appearance varies across different samples, making it harder to detect.
- Functional Triggers: The trigger is embedded throughout the input or changes based on the input.
- Gaussian noise: Deliberate addition of random noise drawn from a Gaussian (normal) distribution to the model updates.
- Identifying and Down-weighting Malicious Updates: These algorithms focus on detecting and diminishing the influence of malicious client updates during aggregation.
- Label Manipulation: The labels of the poisoned samples are changed to the target class. (In some attacks the labels are left unchanged and a feature-collision strategy is used instead.)
- Model Training: The model is trained on the poisoned dataset, learning to associate the trigger with the target class.
- Patch Trigger: The trigger is a small patch added to the input data. For example, a sticker on a stop sign could cause an autonomous vehicle to misclassify it.
- Resistant Aggregation Without Malicious Client Identification: These methods do not attempt to identify malicious clients; instead, they rely on aggregation rules that remain robust when a minority of updates is poisoned (a minimal sketch follows this list).
- Semantic Triggers: Triggers formed from natural, physically perceptible features of the input (e.g., a particular pair of glasses), which makes them plausible and inconspicuous.
- Trigger Embedding: The attacker selects a trigger (e.g., a small patch, a specific pattern, or a noise pattern) and embeds it into a subset of the training data.
- Trigger Reconstruction: A mitigation strategy that focuses on identifying and reconstructing the backdoor trigger.
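As a minimal illustration of resistant aggregation without malicious client identification (see the key term above), the sketch below combines client updates with a coordinate-wise median, which bounds the influence of a minority of poisoned updates without ever labelling a client as malicious.

```python
# Coordinate-wise median aggregation: no client is identified as malicious,
# but a minority of extreme (poisoned) updates cannot drag the aggregate far.
import numpy as np

def median_aggregate(updates):
    """`updates` is a 2-D array-like of shape (num_clients, num_parameters)."""
    return np.median(np.asarray(updates), axis=0)

# Example: three honest clients and one poisoned client.
honest = [np.array([0.10, -0.20]), np.array([0.12, -0.18]), np.array([0.09, -0.21])]
poisoned = np.array([5.0, 5.0])
print(median_aggregate(honest + [poisoned]))   # stays close to the honest updates
```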