"

2.4 Adversary’s Model and Attack Scenario

The adversary’s model and attack scenario are application-specific issues in pattern recognition. Designers rely on predefined attack-scenario guidelines to strengthen system defences. The adversary operates strategically to achieve a specific goal, leveraging their knowledge of the classifier and their ability to manipulate data. The model is built on the assumption that the adversary makes rational decisions to maximize their chance of success.

Figure 2.4.1 “Adversary’s 3D Model”, Fanshawe College, CC BY-NC-SA 4.0.

Adversary’s Knowledge

The adversary’s knowledge can be categorized based on:

  • Training data used by the classifier.
  • Feature set influencing classification decisions.
  • Type of decision function and learning algorithm employed.
  • Feedback mechanisms available from the classifier.

It is important to make realistic yet minimal assumptions about which system details can be kept entirely secret from the adversary.
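These knowledge dimensions are often summarized as white-box (full knowledge), gray-box (partial knowledge), or black-box (query access only) settings. The following is a minimal sketch, not taken from the text, of one way the four dimensions above could be encoded and mapped to such a label; the class name, fields, and mapping rule are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AdversaryKnowledge:
    """Illustrative encoding of the four knowledge dimensions listed above."""
    knows_training_data: bool       # access to (part of) the training data
    knows_feature_set: bool         # knows which features the classifier uses
    knows_learning_algorithm: bool  # knows the decision function / learning algorithm
    gets_feedback: bool             # can query the classifier and observe its output

    def setting(self) -> str:
        """Map the knowledge profile to a common threat-model label."""
        known = sum([self.knows_training_data,
                     self.knows_feature_set,
                     self.knows_learning_algorithm])
        if known == 3:
            return "white-box"
        if known == 0:
            return "black-box" if self.gets_feedback else "blind"
        return "gray-box"

# Example: an adversary that can only query the deployed model.
print(AdversaryKnowledge(False, False, False, True).setting())  # black-box
```

The yes/no flags are a simplification; real threat models usually grade each dimension (e.g., partial knowledge of the training data) rather than treating it as all-or-nothing.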

Review Images

Review the images from Machine Learning Security: Threat Modelling and Overview of Attacks on AI by Battista Biggio

Adversary’s Goal

The adversary’s objective is to violate security principles such as integrity, availability, or privacy.

  • Attacks may be targeted (focusing on specific samples or classes) or indiscriminate (aiming for widespread disruption).
  • In indiscriminate attacks, the goal is to maximize the misclassification rate of malicious samples.
  • In targeted privacy violations, the adversary aims to extract confidential information from the classifier by exploiting the class labels it returns.
  • For privacy violations, the goal is to minimize the number of queries required to gather sensitive information about the classifier; a sketch of this goal taxonomy follows this list.
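One way to read these points is as a two-axis taxonomy: the security property being violated (integrity, availability, or privacy) and the attack’s specificity (targeted or indiscriminate). The sketch below pairs the two axes with a short description of the corresponding objective; the enum and function names are illustrative assumptions, not terms from the text.

```python
from enum import Enum

class Violation(Enum):
    INTEGRITY = "integrity"        # malicious samples slip past undetected
    AVAILABILITY = "availability"  # so many errors that the system becomes unusable
    PRIVACY = "privacy"            # confidential information is leaked

class Specificity(Enum):
    TARGETED = "targeted"              # focused on specific samples or classes
    INDISCRIMINATE = "indiscriminate"  # any misclassification will do

def describe_goal(violation: Violation, specificity: Specificity) -> str:
    """Return a short, informal description of the adversary's objective."""
    if violation is Violation.INTEGRITY:
        objective = "maximize the misclassification rate of malicious samples"
    elif violation is Violation.AVAILABILITY:
        objective = "degrade the classifier until legitimate use is disrupted"
    else:
        objective = "extract confidential information with as few queries as possible"
    return f"{specificity.value} attack: {objective}"

print(describe_goal(Violation.INTEGRITY, Specificity.INDISCRIMINATE))
```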

Adversary’s Capability

The adversary’s level of control over the training and testing data is determined by the attack phase:

  • Training Phase: Influence the model at training time to cause subsequent errors at test time (poisoning attacks, backdoors).
  • Testing Phase: Manipulate malicious samples at test time to cause misclassifications (evasion attacks, adversarial examples); a sketch contrasting the two phases follows this list.
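As a rough illustration of the two phases, the following sketch trains a classifier on synthetic data, simulates a causative attack by flipping the labels of a small fraction of training points, and simulates an exploratory attack by nudging test points toward the decision boundary. The synthetic data, the 15% flip budget, and the perturbation bound are assumptions made for this example only, not results from the text.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_blobs(n_samples=400, centers=2, cluster_std=1.5, random_state=0)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

# Training-time (causative) attack: flip the labels of a small poisoned fraction.
poison_rate = 0.15  # assumed attacker budget
flip = rng.choice(len(y_train), int(poison_rate * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip] = 1 - y_poisoned[flip]

clean_clf = LogisticRegression().fit(X_train, y_train)
poisoned_clf = LogisticRegression().fit(X_train, y_poisoned)
print("clean model accuracy:   ", clean_clf.score(X_test, y_test))
print("poisoned model accuracy:", poisoned_clf.score(X_test, y_test))

# Test-time (exploratory) attack: move each test point a small, bounded step in
# the direction that lowers its true class's score (toward the decision boundary).
eps = 0.5  # assumed perturbation budget
w = clean_clf.coef_[0] / np.linalg.norm(clean_clf.coef_[0])
X_adv = X_test - eps * np.sign(2 * y_test - 1)[:, None] * w
print("accuracy on perturbed test points:", clean_clf.score(X_adv, y_test))
```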

We can define the level of control along the following dimensions:

  • Attack influence may be causative (affecting training data) or exploratory (gathering information to bypass defences).
  • The extent to which class priors (the probability distributions of the different classes) can be altered.
  • Which training and testing samples in each class can be modified, and how many.
  • Application-specific constraints, such as ensuring that malicious samples retain their intended functionality (see the sketch after this list).
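These dimensions can be thought of as an attack budget. The sketch below, with illustrative names and values that are not taken from the text, encodes such a budget and checks a candidate attack against it, including an application-specific constraint (here, keeping feature values non-negative, as in a word-count representation of email).

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AttackCapability:
    """Illustrative bounds on what the adversary may change (all values assumed)."""
    max_poison_fraction: float    # share of training samples the attacker may alter
    max_test_perturbation: float  # L-infinity bound on test-time feature changes
    preserves_functionality: callable  # application-specific constraint check

def within_capability(cap, n_poisoned, n_train, x_orig, x_adv) -> bool:
    """Check a candidate attack against the declared capability."""
    if n_poisoned / n_train > cap.max_poison_fraction:
        return False
    if np.max(np.abs(x_adv - x_orig)) > cap.max_test_perturbation:
        return False
    return cap.preserves_functionality(x_adv)

# Example: a spam-like setting where word counts must stay non-negative.
cap = AttackCapability(0.1, 0.25, lambda x: bool(np.all(x >= 0)))
x = np.array([3.0, 0.0, 1.0])
print(within_capability(cap, n_poisoned=20, n_train=300,
                        x_orig=x, x_adv=x + np.array([0.2, 0.0, -0.2])))  # True
```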

License


Winning the Battle for Secure ML Copyright © 2025 by Bestan Maaroof is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.