2.4 Adversary’s Model and Attack Scenario
Defining the adversary’s model and attack scenario is an application-specific task in pattern recognition: designers rely on an explicit attack scenario to strengthen system defences. The adversary operates strategically to achieve a specific goal, leveraging their knowledge of the classifier and their ability to manipulate data. The model is built on the assumption that the adversary acts rationally to maximize their chance of success.

Adversary’s Knowledge
The adversary’s knowledge can be categorized based on:
- Training data used by the classifier.
- Feature set influencing classification decisions.
- Type of decision function and learning algorithm employed.
- Feedback mechanisms available from the classifier (e.g., predicted labels or confidence scores).
It is important to make realistic assumptions: in practice, only a minimal set of system details can be kept entirely private from the adversary.
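To make these knowledge dimensions concrete, the following minimal Python sketch encodes them as flags; the class and field names are illustrative assumptions for this note, not part of any standard API. Perfect-knowledge ("white-box") and limited-knowledge settings then correspond to different combinations of flags.

```python
# Illustrative encoding of the knowledge dimensions above; the class and
# field names are assumptions for this sketch, not a standard API.
from dataclasses import dataclass

@dataclass
class AdversaryKnowledge:
    knows_training_data: bool       # access to (a surrogate of) the training set
    knows_feature_set: bool         # knows which features the classifier uses
    knows_learning_algorithm: bool  # knows the model family / decision function
    gets_feedback: bool             # can observe labels or scores returned by the classifier

# Two common threat models expressed with these flags:
PERFECT_KNOWLEDGE = AdversaryKnowledge(True, True, True, True)    # "white-box" setting
LIMITED_KNOWLEDGE = AdversaryKnowledge(False, True, False, True)  # weaker, "gray-box"-style setting
```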
For accompanying figures, see Machine Learning Security: Threat Modelling and Overview of Attacks on AI by Battista Biggio.
Adversary’s Goal
The adversary’s objective is to violate security principles such as integrity, availability, or privacy.
- Attacks may be targeted (focusing on specific samples or users) or indiscriminate (aiming for widespread disruption).
- In indiscriminate integrity attacks, the goal is to maximize the misclassification rate of malicious samples.
- In targeted privacy violations, the adversary aims to extract confidential information from the classifier by exploiting the class labels (or scores) it returns.
- For privacy violations, a further goal is to minimize the number of queries required to gather sensitive information about the classifier (see the probing sketch below).
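As a rough illustration of the query-minimization goal, the sketch below probes a classifier under a fixed query budget and records the returned labels. Here `query_classifier`, the budget value, and the random probing strategy are all hypothetical placeholders for an application-specific interface.

```python
# Exploratory probing under a query budget; `query_classifier` is a toy
# stand-in for a deployed model's prediction endpoint.
import numpy as np

rng = np.random.default_rng(0)

def query_classifier(x):
    # Placeholder target: a fixed linear rule standing in for the real classifier.
    w, b = np.array([1.5, -2.0]), 0.3
    return int(x @ w + b > 0)

QUERY_BUDGET = 100              # the adversary wants maximal information within this budget
probes, labels = [], []
for _ in range(QUERY_BUDGET):
    x = rng.uniform(-1, 1, size=2)      # probe points (could be chosen adaptively)
    probes.append(x)
    labels.append(query_classifier(x))  # each returned label leaks information

# The (probe, label) pairs can now be used to fit a surrogate of the target model.
print(f"Collected {len(labels)} labelled probes, {sum(labels)} classified as positive")
```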
Adversary’s Capability
The adversary’s capability is determined by their level of control over the training and testing data in each phase:
- Training phase: influence the model at training time to cause subsequent errors at test time (poisoning attacks, backdoors).
- Testing phase: manipulate malicious samples at test time to cause misclassifications (evasion attacks, adversarial examples); a minimal evasion example is sketched after this list.
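The following toy sketch shows a test-time evasion in the simplest possible setting: a perfect-knowledge adversary minimally perturbs a malicious sample so that a linear classifier misclassifies it. The weights and sample values are invented for illustration; real evasion attacks must also respect the application-specific constraints listed below.

```python
# Toy test-time evasion against a linear classifier f(x) = sign(w·x + b),
# assuming perfect knowledge of w and b (all values invented for illustration).
import numpy as np

w, b = np.array([2.0, -1.0]), 0.5   # assumed-known model parameters
x_malicious = np.array([1.0, 0.2])  # currently detected: w·x + b > 0

score = w @ x_malicious + b
# Minimum-norm perturbation that pushes the score just below the decision boundary.
delta = -(score + 1e-3) * w / (w @ w)
x_evasive = x_malicious + delta

print("original score:", w @ x_malicious + b)  # > 0 -> classified as malicious
print("evasive score: ", w @ x_evasive + b)    # < 0 -> misclassified as benign
```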
The level of control can be characterized along the following dimensions:
- Attack influence: causative (affecting the training data) or exploratory (gathering information to bypass defences).
- The extent to which class priors (the probability distributions of the classes) can be altered.
- Which training and testing samples in each class can be modified, and how many (see the poisoning sketch below).
- Application-specific constraints, such as ensuring that manipulated malicious samples retain their intended functionality.
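To tie these dimensions together, the sketch below shows a causative attack in its simplest form: flipping the labels of a bounded fraction of training samples. The synthetic data, the 10% cap, and the label-flipping strategy are illustrative assumptions that merely stand in for the constraints on how many samples the adversary can modify.

```python
# Causative (training-time) label-flipping sketch; the data and the 10% cap
# on controlled samples are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # clean labels

max_fraction = 0.10                        # adversary controls at most 10% of the training set
n_poison = int(max_fraction * n)
poison_idx = rng.choice(n, size=n_poison, replace=False)

y_poisoned = y.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]   # flip labels on the controlled samples

# Training any classifier on (X, y_poisoned) instead of (X, y) shifts its decision
# boundary and increases the error it makes at test time.
print(f"Flipped {n_poison} of {n} labels ({max_fraction:.0%} of the training data)")
```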