"

6.2 Types of Privacy Attacks

Data Reconstruction Attacks

Data reconstruction attacks are among the most concerning privacy attacks because they aim to reverse released aggregate statistics and recover individual records. The pioneering work of Dinur and Nissim (2003) demonstrated that an adversary with sufficient computational resources could reconstruct private details from query responses. Subsequent research has refined these methods, reducing the number of queries needed for effective reconstruction (Dwork & Yekhanin, 2008).
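
To make the idea concrete, the following is a minimal numerical sketch of a Dinur-Nissim-style reconstruction: a hypothetical curator answers random subset-sum queries about a database of secret bits with bounded noise, and the attacker recovers the bits by solving a least-squares problem. The database size, query count, and noise bound are illustrative assumptions, not the parameters of the original analysis.

```python
# A minimal sketch of a Dinur-Nissim-style reconstruction attack, assuming a
# hypothetical curator that answers random subset-sum queries with bounded noise.
import numpy as np

rng = np.random.default_rng(0)

n = 200                                  # number of individuals, each holding one secret bit
secret = rng.integers(0, 2, size=n)      # the private database the curator protects

num_queries = 4 * n                      # "sufficiently many" queries
noise_bound = 3                          # curator perturbs each answer by at most +/- noise_bound

# Each query asks: "how many people in this random subset have bit = 1?"
A = rng.integers(0, 2, size=(num_queries, n))
answers = A @ secret + rng.integers(-noise_bound, noise_bound + 1, size=num_queries)

# The attacker solves a least-squares problem and rounds the result to {0, 1}.
estimate, *_ = np.linalg.lstsq(A, answers, rcond=None)
reconstruction = (estimate > 0.5).astype(int)

accuracy = (reconstruction == secret).mean()
print(f"Reconstructed {accuracy:.1%} of the secret bits correctly")
```

With enough queries relative to the noise, nearly all of the secret bits are recovered, which is exactly the failure mode that motivated formal privacy protections such as differential privacy.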

A notable example is the U.S. Census Bureau’s investigation into reconstruction risks, which led to the adoption of differential privacy techniques for the 2020 census (Garfinkel, Abowd, & Martindale, 2019).

In the machine learning setting, model inversion attacks attempt to recover representative training data samples by leveraging a model's confidence scores and gradients. Recent advances, such as reconstructor networks, have further improved the fidelity of recovered data (Balle, Cherubin, & Hayes, 2021).
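
As a rough illustration, the sketch below performs gradient-based inversion against a hypothetical softmax classifier trained on the scikit-learn digits dataset: starting from a blank input, the attacker follows the gradient of the target class's probability until the input becomes a prototype-like pattern for that class. The victim model, optimization schedule, and pixel clipping are assumptions made for this example, not the reconstructor-network method cited above.

```python
# A minimal sketch of gradient-based model inversion against a hypothetical
# softmax classifier trained on the scikit-learn digits dataset. The attacker
# only uses the model's class-probability gradients, not the training data.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=2000).fit(X, y)   # the "victim" model

target_class = 3
x = np.zeros(X.shape[1])          # start from a blank image
step = 0.5

for _ in range(200):
    probs = model.predict_proba(x[None, :])[0]
    # For a softmax-linear model, d log p(target)/dx = w_target - sum_k p_k * w_k.
    grad = model.coef_[target_class] - probs @ model.coef_
    x = np.clip(x + step * grad, 0, 16)   # gradient ascent, clipped to the pixel range

print("Confidence for target class after inversion:",
      model.predict_proba(x[None, :])[0][target_class].round(3))
# Reshaping x to 8x8 yields a prototype-like pattern for the target class,
# illustrating how confidence scores leak class-representative features.
```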

Additionally, the natural tendency of deep neural networks to memorize training data exacerbates reconstruction risks. Research has shown that networks can retain exact data points from their training set, increasing the potential for adversarial exploitation (Zhang et al., 2021).

Membership Inference Attacks

Membership inference attacks aim to determine whether a specific record was part of a training dataset. This poses severe privacy risks in sensitive domains like healthcare, where knowledge of dataset inclusion could be exploited for discrimination or re-identification.

Key attack techniques include:

  • Loss-based attacks: Infer membership by analyzing prediction loss or confidence; training-set members tend to incur lower loss (Yeom et al., 2018). A minimal sketch follows this list.
  • Shadow model attacks: Train surrogate models to mimic the target's behaviour and learn membership patterns from them (Shokri et al., 2017).
  • Likelihood Ratio Attack (LiRA): Uses statistical likelihood-ratio tests to infer membership with high precision (Carlini et al., 2022).
  • Label-only attacks: Operate under minimal information, relying solely on the model's predicted labels rather than confidence scores (Ye et al., 2022).
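
The following is a minimal sketch of a loss-based attack in the spirit of Yeom et al.: a deliberately overfit classifier is trained on half of a public dataset, and membership is scored by per-example loss. The dataset, the victim model, and the AUC-based evaluation are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch of a loss-based membership inference attack against a
# deliberately overfit classifier: members of the training set tend to have
# much lower per-example loss than non-members.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0)

# An overfit "victim" model: deep, unpruned trees memorize their training set.
victim = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0)
victim.fit(X_member, y_member)

def per_example_loss(model, X, y):
    """Cross-entropy loss of the true label under the model's predicted probabilities."""
    probs = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(probs, 1e-12, 1.0))

loss_members = per_example_loss(victim, X_member, y_member)
loss_nonmembers = per_example_loss(victim, X_nonmember, y_nonmember)

# Attack score: lower loss -> more likely a member. AUC near 0.5 means little leakage;
# values well above 0.5 indicate that membership can be inferred from the loss alone.
scores = np.concatenate([-loss_members, -loss_nonmembers])
labels = np.concatenate([np.ones_like(loss_members), np.zeros_like(loss_nonmembers)])
print("Membership inference AUC:", roc_auc_score(labels, scores).round(3))
```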

Several open-source libraries, such as TensorFlow Privacy (Song & Marn, 2020) and ML Privacy Meter (Murakonda & Shokri, 2020), provide tools for evaluating membership inference vulnerabilities.

Model Extraction Attacks

Model extraction attacks seek to replicate proprietary machine-learning models by analyzing their responses to input queries. This is particularly relevant in Machine Learning as a Service (MLaaS) environments, where service providers wish to keep model parameters confidential.

Tramèr et al. (2016) demonstrated that adversaries could approximate a model's decision boundary using repeated queries, effectively replicating its behaviour. Although exact duplication of model parameters is often infeasible, functionally equivalent models with comparable accuracy can still be extracted.
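
A minimal sketch of this kind of functional extraction is shown below: the attacker never sees the victim's parameters, only its predicted labels on chosen queries, and trains a substitute model on those (query, answer) pairs. The victim model, the query distribution, and the surrogate architecture are illustrative assumptions.

```python
# A minimal sketch of query-based model extraction: the attacker only observes
# the victim's predicted labels on chosen queries and trains a functional substitute.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Victim model, trained by the "service provider" on private data.
X_private, y_private = make_moons(n_samples=500, noise=0.2, random_state=0)
victim = SVC(kernel="rbf", gamma=2.0).fit(X_private, y_private)

# Attacker: issue random queries over the input domain and record the answers.
queries = rng.uniform(low=-2.0, high=3.0, size=(2000, 2))
stolen_labels = victim.predict(queries)

# Train a surrogate on the (query, answer) pairs only.
surrogate = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                          random_state=0).fit(queries, stolen_labels)

# Measure functional agreement on fresh inputs the attacker never queried.
X_test, _ = make_moons(n_samples=1000, noise=0.2, random_state=1)
agreement = (surrogate.predict(X_test) == victim.predict(X_test)).mean()
print(f"Surrogate agrees with the victim on {agreement:.1%} of unseen inputs")
```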

Attack methodologies include:

  • Mathematical extraction: Exploiting the mathematical structure of the operations performed in deep neural networks to solve for model parameters directly, allowing the adversary to recover weights algebraically from query responses (a minimal sketch follows this list).
  • Learning-based attacks: Choosing queries that accelerate extraction by treating extraction itself as a learning problem. For instance, active learning can select the most informative queries to the ML model, and reinforcement learning can train an adaptive query strategy that reduces the number of queries required.
  • Side-channel attacks: Exploiting hardware-level leakage to infer model details. A side channel lets an attacker learn information about a secret by observing nonfunctional characteristics of a program while it executes, such as execution time, memory access patterns, power consumption variations, or electromagnetic emanations. Such attacks are also commonly used to exfiltrate other sensitive information, including cryptographic keys.
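
To illustrate mathematical extraction, the sketch below targets a hypothetical binary logistic-regression API that returns confidence scores, in the spirit of Tramèr et al.'s equation-solving attacks: because the model's logit is linear in the input, d + 1 well-chosen queries suffice to solve for the weights and bias exactly. The victim model and the API wrapper are assumptions made for this example.

```python
# A minimal sketch of "mathematical" extraction against a hypothetical binary
# logistic-regression API that returns confidence scores. Because the logit is
# linear in the input, d + 1 queries are enough to solve for the weights exactly.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

d = 10
X, y = make_classification(n_samples=300, n_features=d, random_state=0)
victim = LogisticRegression().fit(X, y)          # the secret model behind the API

def api(x):
    """The MLaaS endpoint: returns P(y=1 | x) for a single query."""
    return victim.predict_proba(x[None, :])[0, 1]

# Query the model at d + 1 points and convert each confidence score back to a logit.
queries = np.vstack([np.zeros(d), np.eye(d)])    # the origin and the unit vectors
logits = np.array([np.log(api(q) / (1 - api(q))) for q in queries])

# logit(x) = w . x + b, so the origin gives b and each unit vector gives w_i + b.
stolen_bias = logits[0]
stolen_weights = logits[1:] - stolen_bias

print("Max weight error:", np.abs(stolen_weights - victim.coef_[0]).max())
print("Bias error:", abs(stolen_bias - victim.intercept_[0]))
```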

Model extraction poses a severe security risk, as stolen models can be used for adversarial attacks or to circumvent proprietary barriers.

Property Inference Attacks

Unlike membership inference, which targets individual records, property inference attacks seek to deduce aggregate dataset attributes, such as demographic distributions or class imbalances. Ateniese et al. (2015) introduced these attacks, framing them as a distinguishing game where adversaries infer whether a dataset exhibits a specific property. Such attacks have been demonstrated against various architectures, including neural networks, federated learning models, and generative adversarial networks. Recent studies have explored data poisoning techniques to enhance property inference, allowing attackers to amplify specific dataset properties for easier extraction.
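
The distinguishing game can be made concrete with a small meta-classifier sketch in the spirit of Ateniese et al.: shadow models are trained on synthetic datasets that do or do not have a hidden property (here, heavy class imbalance), a meta-classifier learns to read that property off the shadow models' parameters, and it is then applied to the victim model. The synthetic data, the chosen property, and the logistic-regression shadow models are all assumptions made for illustration.

```python
# A minimal sketch of a property inference attack as a distinguishing game:
# a meta-classifier learns to detect a hidden dataset property (heavy class
# imbalance) from the parameters of models trained on that dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_dataset(imbalanced, n=400, d=5):
    """Synthetic binary data; the secret property is the class prior."""
    p_positive = 0.8 if imbalanced else 0.5
    y = (rng.random(n) < p_positive).astype(int)
    X = rng.normal(size=(n, d)) + y[:, None]      # class-dependent mean shift
    return X, y

def model_fingerprint(X, y):
    """Train a shadow model and return its parameters as meta-features."""
    shadow = LogisticRegression().fit(X, y)
    return np.concatenate([shadow.coef_[0], shadow.intercept_])

# Build the meta-training set: parameter fingerprints labelled by the hidden property.
fingerprints, property_labels = [], []
for _ in range(100):
    has_property = rng.random() < 0.5
    fingerprints.append(model_fingerprint(*sample_dataset(has_property)))
    property_labels.append(int(has_property))

meta = LogisticRegression().fit(np.array(fingerprints), property_labels)

# The victim was trained on an imbalanced dataset; the attacker inspects its parameters.
victim_fp = model_fingerprint(*sample_dataset(imbalanced=True))
print("P(training data was imbalanced):",
      meta.predict_proba(victim_fp[None, :])[0, 1].round(3))
```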

Inference-Based Attacks

Inference-based attacks exploit patterns in data distributions and model outputs to extract sensitive information. These attacks can be particularly damaging when adversaries partially know the training set or can manipulate model inputs. Common inference-based attacks include:

  • Attribute inference attacks: Attackers predict missing attributes of records from observed data, for example inferring political affiliation from movie ratings. Another example is a healthcare provider releasing an anonymized dataset of patient medical records with names and addresses removed; attackers can still use known demographic patterns (age, gender, and zip code) to estimate the likelihood that a record belongs to a specific patient and infer missing attributes, such as disease status.

Real World Example

A well-known case is the Netflix Prize dataset (2006): researchers showed that by combining the anonymized Netflix movie-rating dataset with publicly available IMDb data, they could infer private user preferences and identities.
  • Feature leakage: Exposing latent features that reveal underlying sensitive attributes. A common example is gender leakage in face embeddings: a facial recognition system generates embeddings (numerical vectors) to identify individuals, and even if gender labels are removed, attackers can train a classifier on the embeddings to predict gender with >90% accuracy, because the embeddings encode latent features (e.g., facial structure) that are correlated with gender.
  • Linkage attacks: Combining multiple datasets to infer private information about individuals. For instance, a public dataset of taxi rides might include anonymized driver IDs and timestamps, while a separate dataset contains social media posts from drivers discussing their shifts; by linking the two on timestamps and locations, an attacker can re-identify drivers and track their movements, as sketched below.
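
The following is a minimal sketch of such a linkage attack using fabricated records: two individually "anonymized" tables are joined on shared quasi-identifiers (zip code and timestamp), re-identifying the pseudonymous drivers. All names, column labels, and values are invented for illustration.

```python
# A minimal sketch of a linkage attack: joining two individually "anonymized"
# datasets on shared quasi-identifiers. All records here are fabricated.
import pandas as pd

# Dataset 1: anonymized ride logs (no driver names, only a pseudonymous ID).
rides = pd.DataFrame({
    "driver_id": ["d_101", "d_102", "d_103"],
    "pickup_zip": ["10001", "94103", "60601"],
    "pickup_time": ["2024-05-01 08:05", "2024-05-01 08:10", "2024-05-01 09:30"],
})

# Dataset 2: public posts where drivers mention when and where they started work.
posts = pd.DataFrame({
    "author": ["alice_k", "bob_m"],
    "post_zip": ["10001", "60601"],
    "shift_start": ["2024-05-01 08:05", "2024-05-01 09:30"],
})

# Linking on (zip code, timestamp) re-identifies the pseudonymous drivers.
linked = rides.merge(posts,
                     left_on=["pickup_zip", "pickup_time"],
                     right_on=["post_zip", "shift_start"])
print(linked[["driver_id", "author"]])
# driver d_101 is re-identified as alice_k, and d_103 as bob_m
```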

Real World Example

The AOL search data leak (2006): Researchers linked anonymized search queries to individual users by cross-referencing them with other publicly available datasets, revealing personal identities.

Inference-based attacks highlight the challenge of ensuring privacy when datasets and models inadvertently expose information beyond their intended outputs.

 


Adapted from “Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations” by Apostol Vassilev, Alina Oprea, Alie Fordyce, & Hyrum Anderson, National Institute of Standards and Technology – U.S. Department of Commerce. Republished courtesy of the National Institute of Standards and Technology.

License


Winning the Battle for Secure ML Copyright © 2025 by Bestan Maaroof is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.