5.3 Backdoor Attack Scenarios
In this section, we review the main scenarios in which a backdoor attack can be executed, each with its own challenges and implications:
Outsourced Training:
- Scenario: A user aims to train a model using a training dataset but outsources the training process to an external trainer. The trainer returns the trained model, which the user verifies using a validation dataset.
- Attack: A malicious trainer returns a backdoored model that meets the accuracy requirements on the validation set but misclassifies inputs containing the backdoor trigger, typically by poisoning a portion of the training data before training (see the sketch after this list).
- Implications: The user may unknowingly deploy a compromised model, leading to potential security breaches.
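To make the poisoning step concrete, the following minimal sketch shows how a malicious trainer could stamp a small trigger patch onto a fraction of the training images and relabel them to an attacker-chosen class before training. The trigger pattern, poison rate, and target label here are illustrative assumptions, not parameters of any specific published attack.

```python
import numpy as np

def add_trigger(image, patch_value=1.0, patch_size=3):
    """Stamp a small square trigger in the bottom-right corner (hypothetical trigger)."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = patch_value
    return poisoned

def poison_dataset(images, labels, target_label, poison_rate=0.05, seed=0):
    """Poison a fraction of samples: stamp the trigger and relabel to target_label."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = target_label
    return images, labels, idx

# Toy example: a batch of 28x28 grayscale "images" with 10 classes
clean_x = np.random.rand(100, 28, 28).astype(np.float32)
clean_y = np.random.randint(0, 10, size=100)
poisoned_x, poisoned_y, poisoned_idx = poison_dataset(clean_x, clean_y, target_label=7)
print(f"poisoned {len(poisoned_idx)} of {len(clean_x)} samples")
```

A model trained on such a dataset can still reach high accuracy on a clean validation set, since only a small fraction of samples carry the trigger, which is why the user's verification step may not detect the attack.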
Transfer Learning:
- Scenario: A user downloads a pre-trained model from an online repository and fine-tunes it for a new application on their own private data.
- Attack: The pre-trained model is backdoored, and the fine-tuned model inherits the backdoor, causing misclassification of triggered inputs while maintaining high accuracy on clean data (see the sketch after this list).
- Implications: The user’s application may be compromised, leading to incorrect predictions and potential security risks.
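The sketch below illustrates, under simplified assumptions, why the backdoor can survive fine-tuning: a common transfer-learning recipe freezes the downloaded feature extractor and trains only a new task-specific head, so any trigger-sensitive behavior encoded in the frozen weights is inherited unchanged. The PyTorch backbone, layer sizes, and class count are placeholders, not a real model-zoo checkpoint.

```python
import torch
import torch.nn as nn

# Stand-in for a feature extractor downloaded from an untrusted repository.
# In the attack scenario, the backdoor lives in these (later frozen) weights.
class PretrainedBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
    def forward(self, x):
        return self.features(x)

backbone = PretrainedBackbone()          # assume weights loaded from the online repository
for p in backbone.parameters():
    p.requires_grad = False              # common practice: freeze the pre-trained layers

head = nn.Linear(16, 5)                  # new task-specific head (5 classes, illustrative)
model = nn.Sequential(backbone, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One fine-tuning step on clean data; only the head is updated, so the
# possibly backdoored features are carried over into the fine-tuned model.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 5, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```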
Federated Learning:
- Scenario: Multiple participants collaboratively train a model without sharing their private data. The central server aggregates updates from participants to improve the model.
- Attack: A malicious participant submits poisoned model updates, embedding a backdoor into the jointly trained model. The model behaves correctly on clean data but misclassifies triggered inputs (see the sketch after this list).
- Implications: The integrity of the federated learning process is compromised, and the model may be used to carry out targeted attacks.
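As an illustration, the following sketch shows a FedAvg-style aggregation together with a model-replacement-style poisoned update, in which a single malicious participant scales its submission so that the averaged model lands close to a backdoored one. The model dimension and the stand-in "backdoored" weights are purely illustrative assumptions.

```python
import numpy as np

def fedavg(updates, weights=None):
    """Server-side FedAvg: weighted average of the participants' submitted models."""
    if weights is None:
        weights = np.ones(len(updates)) / len(updates)
    return sum(w * u for w, u in zip(weights, updates))

def malicious_update(global_model, backdoored_model, n_participants):
    """Model-replacement-style poisoning (hypothetical attacker): scale the
    difference so that the aggregated model is pulled toward the backdoored one."""
    return global_model + n_participants * (backdoored_model - global_model)

n, dim = 10, 4
global_model = np.zeros(dim)
honest_updates = [global_model + 0.01 * np.random.randn(dim) for _ in range(n - 1)]
backdoored_model = np.full(dim, 0.5)   # stands in for weights that encode the trigger

poisoned = malicious_update(global_model, backdoored_model, n)
new_global = fedavg(honest_updates + [poisoned])
print("aggregated model:", np.round(new_global, 3))  # close to the backdoored weights
```

Because the server only sees parameter updates and never the participants' private data, such a poisoned contribution is difficult to distinguish from a legitimate one, which is what makes this scenario particularly challenging to defend against.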