6.1 Introduction
Privacy attacks have emerged as a major threat to data security, enabling adversaries to infer sensitive details from collected user records. These attacks exploit vulnerabilities in statistical datasets and machine learning models to reconstruct private data, infer membership in a dataset, or extract model parameters. This section explores the key categories of privacy attacks, their implications, and mitigation strategies.

A real-world concern
The Android operating system is one of the most widely used operating systems on smartphones, wearable devices, IoT devices, and similar platforms. Google needs a large amount of user data to provide a rich set of features and a better user experience on Android. However, collecting user data has become a major challenge for Google because of privacy concerns, new regulations, and the complexity of storing and processing it. Moreover, several studies show that more data generally results in a better model. Google therefore needed an efficient solution to these competing demands.
Solution: Federated Learning
In 2016, a research team at Google proposed a new solution that preserves privacy while still leveraging the data on users’ devices. They coined this approach Federated Learning (FL), as shown in Fig. 6.1.2.

FL is a new paradigm in the ML family. In FL, users no longer need to share their data, because the data always remains with the user. Instead, FL introduces the concept of sharing model parameters rather than data; it is therefore also called the learning-by-parameters approach. In this approach, the server creates a global model and shares it with all users. Each user then trains the model with local data on their own device and sends the resulting training parameters to the server. The server receives the parameters from each user, aggregates them, and updates the global model parameters. The updated set of parameters is shared with all users for the next round. The process continues until convergence, for a pre-defined number of iterations, or in a periodic fashion.
An important feature is that the training process is shifted from the central server to each user's device (the local device). FL was initially introduced by Google for smartphone applications, but it is equally applicable in many other contexts, e.g., hospitals, banks, and the Internet of Things.
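To make the round-based procedure above concrete, the following is a minimal sketch of a federated averaging loop on synthetic data. The toy model (a single weight fit by local gradient descent), the client data, and the unweighted averaging step are illustrative assumptions, not Google's production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: five clients, each holding private (x, y) pairs drawn from
# y = 2x + noise. The "model" is a single weight w.
clients = []
for _ in range(5):
    x = rng.uniform(-1, 1, 20)
    y = 2.0 * x + rng.normal(0, 0.1, 20)
    clients.append((x, y))

def local_update(w, x, y, lr=0.1, epochs=5):
    """Client-side training: refine the current global weight on local data only."""
    for _ in range(epochs):
        grad = np.mean(2.0 * (w * x - y) * x)  # gradient of mean squared error w.r.t. w
        w -= lr * grad
    return w

w_global = 0.0                                  # server initializes the global model
for round_idx in range(10):                     # communication rounds
    # Each client trains locally and sends back only its updated parameter,
    # never the raw (x, y) data.
    local_weights = [local_update(w_global, x, y) for x, y in clients]
    # Server aggregates the received parameters (a simple average here)
    # and redistributes the updated global model in the next round.
    w_global = float(np.mean(local_weights))
    print(f"round {round_idx + 1}: global w = {w_global:.3f}")
```

Running the sketch, the global weight converges toward the true value of 2.0 even though the server never sees any client's data, which is the essential property of the learning-by-parameters approach.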
Trust Model
The trust model of any ML-based system, i.e., the trust placed in the relevant actors, is largely determined by the context of its deployment. Abstracting a bit, we can identify several actors relevant to a deployed ML-based system:
- First, there are data owners, the owners or trustees of the data/environment the system is deployed within, e.g., an IT organization deploying a face recognition authentication service.
- Second, system providers construct the system and algorithms, e.g., the authentication service software vendors.
- Third, there may be consumers of the system’s service, e.g., the enterprise users.
- Lastly, there are outsiders who may have explicit or incidental access to the systems or may be able to influence the system inputs, e.g., other users or adversaries within the enterprise.
Note that multiple users, providers, data owners, or outsiders may be involved in a given deployment.
A trust model for the given system assigns a level of trust to each actor within that deployment. Any actor can be trusted, untrusted, or partially trusted (trusted to perform or not perform certain actions). The sum of those trust assumptions forms the trust model and identifies how bad actors may attack the system.
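As a small illustration of these trust assignments, the sketch below encodes one possible trust model for the face recognition authentication example; the actor labels and trust levels are hypothetical choices, not prescribed by the text.

```python
from enum import Enum

class Trust(Enum):
    TRUSTED = "trusted"
    PARTIALLY_TRUSTED = "partially trusted"  # trusted only for certain actions
    UNTRUSTED = "untrusted"

# One possible trust model for a face recognition authentication service
# (illustrative assignments only).
trust_model = {
    "data owner (IT organization)": Trust.TRUSTED,
    "system provider (software vendor)": Trust.PARTIALLY_TRUSTED,
    "consumers (enterprise users)": Trust.PARTIALLY_TRUSTED,
    "outsiders (other users, adversaries)": Trust.UNTRUSTED,
}

# Actors that are not fully trusted indicate where attacks on the system may originate.
potential_attackers = [actor for actor, level in trust_model.items()
                       if level is not Trust.TRUSTED]
print(potential_attackers)
```

A different deployment context, e.g., a hospital hosting the same service in-house, would assign different trust levels to the same actor roles, which is exactly why the trust model must be stated per deployment.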
“Trustworthy machine learning in the context of security and privacy” by Ramesh Upreti, Pedro G. Lind, Ahmed Elmokashfi & Anis Yazidi is licensed under Creative Commons Attribution 4.0 International, except where otherwise noted.
Trust Model adapted from SoK: Security and Privacy in Machine Learning by Papernot et al.