"

2.2 Categories of Attacks

Attacks can be organized into groups according to three categories, namely their influence, their specificity, and the kind of security violation they cause (Barreno et al., 2010; Liu et al., 2018; Yuan et al., 2019).

Category 1: Influence

This category describes how an attack influences the classifier. The influence can be exerted in one of two ways:

Causative

In a causative attack, the attacker has the capability to modify the distribution of the training data. The attacker gains access to the training data and manipulates the samples so that the accuracy of the classifier degrades when the model is retrained. This manipulation can be performed by adding malicious samples or removing existing ones. To carry out this attack, the attacker must have access to the location where the training data are stored. This type of attack is also known as a “data poisoning attack” (Liu et al., 2018; Baracaldo et al., 2017).
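As a minimal sketch of this idea, the snippet below flips the labels of a fraction of the training samples and retrains the model. The synthetic dataset, the 30% flip fraction, the logistic-regression classifier, and the helper poison_labels are all illustrative assumptions, not details taken from the cited works.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real training set (assumption for the sketch).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def poison_labels(y, fraction, rng):
    """Causative attack (label flipping): corrupt a fraction of the labels."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # flip 0 <-> 1
    return y_poisoned

rng = np.random.default_rng(0)
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(
    X_train, poison_labels(y_train, fraction=0.3, rng=rng))

print("clean accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```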

Typically, a causative attacker uses different techniques to modify the training data distribution, e.g., dictionary attacks and focused attacks. A dictionary attack poisons the model with ordinary dictionary words; it is used against text classification models, especially when the attacker has no information about the underlying text data (Nelson et al., 2008). A focused attack, in contrast, concentrates on one specific type of text: for example, an attacker targeting lottery-related spam emails would use only words related to such emails (Nelson et al., 2008).
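The following toy sketch illustrates the dictionary-attack idea on a text classifier: the (hypothetical) attacker pads spam training samples with ordinary words so that the retrained filter associates those words with the spam class. The example emails, the padded word list, the naive Bayes pipeline, and the helper train_filter are assumptions made purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training corpus (assumption for the sketch).
ham = ["meeting at noon tomorrow", "please review the attached report"]
spam = ["win money now", "claim your lottery prize today"]

# Dictionary attack: pad the spam training samples with common words,
# so the retrained filter learns to associate those words with spam.
dictionary_words = "meeting report review tomorrow"
poisoned_spam = [s + " " + dictionary_words for s in spam]

def train_filter(ham_texts, spam_texts):
    texts = ham_texts + spam_texts
    labels = [0] * len(ham_texts) + [1] * len(spam_texts)  # 0 = ham, 1 = spam
    return make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)

clean_filter = train_filter(ham, spam)
poisoned_filter = train_filter(ham, poisoned_spam)

legit = ["review meeting tomorrow"]          # an ordinary ham message
print("clean filter:   ", clean_filter.predict(legit))     # likely ham (0)
print("poisoned filter:", poisoned_filter.predict(legit))  # likely spam (1)
```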

Exploratory

In exploratory attacks, the attacker explores the decision boundary of the model. The aim is to gain information about the training and test datasets and to identify the model’s decision boundary. This can be done by sending a large number of queries to the model and obtaining information about the statistical features of the training data (Imam & Vassilakis, 2019). Knowing these features and the decision boundary enables the attacker to prepare malicious inputs that the model misclassifies (Liu et al., 2018; Rigaki & Garcia, 2020; Sherman, 2020).
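A minimal sketch of this querying strategy is shown below, assuming the attacker can only call the victim model as a black box and fits a surrogate on the query-response pairs to approximate its decision boundary. The random forest victim, the 5,000 random probes, and the decision-tree surrogate are illustrative choices, not prescribed by the cited works.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# The victim model, trained by the defender (assumption for the sketch).
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
victim = RandomForestClassifier(random_state=1).fit(X, y)

# Exploratory attack: the attacker only observes the model's answers...
rng = np.random.default_rng(1)
queries = rng.normal(size=(5000, 10))          # probe inputs
responses = victim.predict(queries)            # black-box outputs

# ...and fits a surrogate that approximates the victim's decision boundary.
surrogate = DecisionTreeClassifier(random_state=1).fit(queries, responses)
agreement = (surrogate.predict(X) == victim.predict(X)).mean()
print(f"surrogate agrees with victim on {agreement:.0%} of inputs")
```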


Category 2: Specificity

Depending on the specificity, the attack is further divided into two groups (Yuan et al., 2019):

Targeted

In a targeted attack, the attacker focuses on one particular case and tries to degrade the model’s performance in that particular case (Sagar et al., 2020). One example is turning ham into spam (Peng & Chan, 2013): a ham (i.e., normal) email should be classified as normal, but the attacker modifies the input so that it is classified as spam. The attacker focuses only on the ham class, and at a deeper level may focus only on a specific type of ham instance.
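A small sketch of such a targeted manipulation follows, assuming a toy naive Bayes spam filter and hand-picked trigger words; the training messages, the target email, and the appended words are purely illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A toy spam filter (assumption for the sketch): 0 = ham, 1 = spam.
texts = ["meeting at noon", "quarterly report attached",
         "win money now", "claim your free prize"]
labels = [0, 0, 1, 1]
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)

# Targeted attack: perturb one specific ham message until it is flagged as spam.
target = "meeting about the quarterly report"
print(spam_filter.predict([target]))                # likely [0] (ham)

perturbed = target + " win free prize money"
print(spam_filter.predict([perturbed]))             # likely [1] (spam)
```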

Indiscriminate

In indiscriminate attacks, the attacker targets all types of instances of a particular class (Siddiqi, 2019). The attacker intends to degrade the model’s performance broadly, e.g., by causing normal emails in general to be classified as spam.


Category 3: Security Violation

Based on the nature of security violations or security threats, attacks can be categorized into three further classes:

Integrity

An integrity attack is one whose main intention is to increase the number of false negatives (Barreno et al., 2010). In the example of ham versus spam classification, an integrity attack aims to have as many spam samples as possible classified as ham.

Availability

In an availability attack, the attacker increases the number of false positives instead of false negatives (Barreno et al., 2010). In the case of ham and spam classification, legitimate ham emails are flooded into the spam class. Note that in binary classification, integrity and availability attacks mirror each other: a false negative for one class is a false positive for the other.
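To make the distinction between these two violations concrete, the sketch below counts false negatives and false positives from a confusion matrix, assuming the convention that spam is the positive class; the labels and predictions are made up for illustration.

```python
from sklearn.metrics import confusion_matrix

# Assumed convention for the sketch: 1 = spam (positive), 0 = ham (negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 1]          # model outputs after an attack

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("false negatives (spam passed off as ham -> integrity):", fn)
print("false positives (ham flooded into spam -> availability):", fp)
```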

Privacy

In a privacy attack, the attacker violates privacy: the goal is to obtain confidential information from the classifier, for example about the data it was trained on.
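One common form of privacy attack is membership inference. The sketch below shows a simple confidence-thresholding variant, assuming an overfitted victim model and a 0.9 confidence threshold; the dataset, the random forest victim, and the helper guess_membership are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_in, X_out, y_in, _ = train_test_split(X, y, test_size=0.5, random_state=2)

# An (assumed) overfitted victim model, trained only on the "member" half.
victim = RandomForestClassifier(random_state=2).fit(X_in, y_in)

# Privacy attack: guess "member" whenever the model's top confidence is high.
def guess_membership(model, X, threshold=0.9):
    return model.predict_proba(X).max(axis=1) >= threshold

member_hits = guess_membership(victim, X_in).mean()
nonmember_hits = guess_membership(victim, X_out).mean()
print(f"flagged as members: {member_hits:.0%} of training records, "
      f"{nonmember_hits:.0%} of unseen records")
```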


“Trustworthy machine learning in the context of security and privacy” by Ramesh Upreti, Pedro G. Lind, Ahmed Elmokashfi & Anis Yazidi is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.

License


Winning the Battle for Secure ML Copyright © 2025 by Bestan Maaroof is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.