"

3.6 Adversarial Examples in the Physical World

Physical adversarial attacks often alter an object’s visual appearance with paint, stickers, or occlusion. They fall broadly into two categories: (1) two-dimensional attacks and (2) three-dimensional attacks.

Figure 3.6.1: A sticker that makes a VGG16 classifier trained on ImageNet classify an image of a banana as a toaster. Image by Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer, used under Fair Dealing for Educational Purposes (Canada).

Everything is a Toaster: Adversarial Patch

One of my favourite methods brings adversarial examples into physical reality. Brown et al. (2017) designed a printable label that can be stuck next to objects to make them look like toasters for an image classifier. Brilliant work!

This method differs from the adversarial example methods presented so far because it drops the restriction that the adversarial image must be very close to the original image. Instead, the method completely replaces a part of the image with a patch that can take on any shape. The patch image is optimized over different background images and different positions on those images: it is shifted, scaled up or down, and rotated so that the patch works in many situations. Ultimately, this optimized image can be printed and used to deceive image classifiers in the wild.
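A minimal sketch of this optimization loop is shown below, assuming a pretrained PyTorch classifier (e.g. VGG16) and a collection of background images as (3, H, W) tensors in [0, 1]. The function name train_patch, the hyperparameters, the choice of transformations, and the omission of the model's input normalization are illustrative assumptions, not the exact procedure of Brown et al. (2017).

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def train_patch(model, backgrounds, target_class=859, patch_size=64,
                steps=1000, lr=0.05):
    """Optimize a patch so that, pasted at a random position/scale/rotation
    onto a random background, the classifier predicts `target_class`
    (859 is "toaster" in ImageNet). Sketch only: the model's usual input
    normalization is omitted here."""
    model.eval()
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    optimizer = torch.optim.Adam([patch], lr=lr)

    for _ in range(steps):
        img = backgrounds[int(torch.randint(len(backgrounds), (1,)))]
        # Randomly scale and rotate the patch.
        size = max(8, int(patch_size * float(torch.empty(1).uniform_(0.7, 1.3))))
        p = TF.resize(patch.clamp(0, 1), [size, size], antialias=True)
        p = TF.rotate(p, angle=float(torch.empty(1).uniform_(-45, 45)))
        # Paste the patch at a random location via padding and a binary mask.
        _, H, W = img.shape
        y = int(torch.randint(0, H - size + 1, (1,)))
        x = int(torch.randint(0, W - size + 1, (1,)))
        pad = (x, W - x - size, y, H - y - size)   # left, right, top, bottom
        mask = F.pad(torch.ones_like(p), pad)
        patched_img = img * (1 - mask) + F.pad(p, pad)
        # Push the classifier toward the target class.
        loss = F.cross_entropy(model(patched_img.unsqueeze(0)),
                               torch.tensor([target_class]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        patch.data.clamp_(0, 1)   # keep the patch a valid (printable) image
    return patch.detach()
```

Because the patch is optimized across many backgrounds, positions, scales, and rotations, the final printed sticker is not tied to any single scene and keeps working when it is photographed in new settings.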

Never bring a 3D-printed turtle to a gunfight – even if your computer thinks it is a good idea: Robust adversarial examples

The next method adds another dimension to the toaster: Athalye et al. (2017) 3D-printed a turtle that was designed to look like a rifle to a deep neural network from almost all possible angles. Yeah, you read that right. A physical object that looks like a turtle to humans looks like a rifle to the computer!

Video: “Synthesizing Robust Adversarial Examples: Adversarial Turtle” by Synthesizing Robust Adversarial Examples [0:21] is licensed under the Standard YouTube License. Transcript and closed captions available on YouTube.

The authors found a way to create a 3D adversarial example for a 2D classifier that remains adversarial under transformations, such as all the ways the turtle can be rotated, zoomed in on, and so on. Other approaches, such as the fast gradient method, no longer work when the image is rotated or the viewing angle changes. Athalye et al. (2017) propose the Expectation Over Transformation (EOT) algorithm, which generates adversarial examples that still work when the image is transformed. The main idea behind EOT is to optimize adversarial examples across many possible transformations. Instead of minimizing the distance between the adversarial example and the original image, EOT keeps the expected distance between the two below a certain threshold, given a selected distribution of possible transformations. The expected distance under transformation can be written as:

[latex]\mathbb{E}_{t\sim T}[d(t(x'),t(x))][/latex]

where [latex]x[/latex] is the original image, [latex]t(x)[/latex] the transformed image (e.g., rotated), [latex]x'[/latex] the adversarial example, and [latex]t(x')[/latex] its transformed version. Apart from working with a distribution of transformations, the EOT method follows the familiar pattern of framing the search for adversarial examples as an optimization problem.
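A minimal sketch of the EOT idea in PyTorch follows, using random rotations as the transformation distribution [latex]T[/latex] and folding the expected-distance constraint into the loss as a penalty term (a common relaxation of the thresholded constraint). The function name eot_attack, the hyperparameters, and the choice of transformations are assumptions, not the authors' exact setup.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def eot_attack(model, x, target_class, steps=300, n_samples=8, lr=0.01, lam=0.1):
    """Optimize x_adv so that, in expectation over random transformations t,
    the classifier predicts `target_class`, while the expected distance
    d(t(x_adv), t(x)) stays small (added here as a penalty with weight lam)."""
    model.eval()
    x_adv = x.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([x_adv], lr=lr)

    for _ in range(steps):
        loss = 0.0
        for _ in range(n_samples):
            # Sample a transformation t ~ T (here: a random rotation).
            angle = float(torch.empty(1).uniform_(-30.0, 30.0))
            t_x = TF.rotate(x, angle, interpolation=TF.InterpolationMode.BILINEAR)
            t_adv = TF.rotate(x_adv, angle,
                              interpolation=TF.InterpolationMode.BILINEAR)
            logits = model(t_adv.unsqueeze(0))
            # Target-class loss plus the expected-distance penalty under t.
            loss = loss + F.cross_entropy(logits, torch.tensor([target_class])) \
                        + lam * F.mse_loss(t_adv, t_x)
        loss = loss / n_samples              # Monte Carlo estimate of the expectation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        x_adv.data.clamp_(0, 1)              # keep a valid image
    return x_adv.detach()
```

Averaging the loss over sampled transformations is what makes the result robust: the adversarial example is not tuned to one particular view, so it survives rotation, scaling, and (in the 3D-printed case) changes of viewing angle.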

Should I Stop or Speed Up? Road Signs

Eykholt et al. (2017) proposed a white-box adversarial example attack against their own trained road sign recognition models. They trained several CNN models, including LISA-CNN and GTSRB-CNN, to recognize road signs and used them as target models. They proposed two ways of mounting the perturbation in the road sign scenario. The first is the poster-printing attack, in which the attacker prints the adversarial example, generated by C&W attacks and other algorithms, as a poster and then overlays it on the real road sign, as presented in the video below.

The left side of the video shows a perturbed Stop sign, and the right side a clean Stop sign. The classifier (LISA-CNN) detects the perturbed sign as Speed Limit 45 until the car is very close to the sign, at which point it is too late for the car to stop reliably. The subtitles show the LISA-CNN classifier output.

The second is the sticker perturbation attack, in which the attacker prints the perturbations on paper and then pastes them on the actual road sign (a sketch of this masked-perturbation idea follows the video description below).


Video: “Subtle Poster Drive-By Demo (LISA-CNN)” by Road Signs [0:09] is licensed under the Standard YouTube License. Transcript and closed captions available on YouTube.

The left-hand side is a video of a true-sized Stop sign printout (poster paper) with perturbations covering the entire surface of the sign. The classifier (LISA-CNN) detects this perturbed sign as a Speed Limit 45 sign in all tested frames. The right-hand side is the baseline: a clean poster-printed Stop sign. The subtitles show the LISA-CNN output. Both attacks proved effective in the authors’ experiments.
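As a rough illustration of the sticker attack, the sketch below optimizes a perturbation that is applied only inside a binary sticker mask, so the rest of the sign stays untouched. The function name, mask, model, and the plain L1 regularizer are assumptions for illustration; the full attack in the paper additionally averages its loss over many images of the sign taken under varying physical conditions and penalizes colours that cannot be printed accurately, both of which this sketch omits.

```python
import torch
import torch.nn.functional as F

def sticker_attack(model, x, mask, target_class, steps=500, lr=0.05, lam=1e-3):
    """Optimize a perturbation applied only where `mask` is 1 (the sticker
    regions) so that the sign image `x` in [0, 1] is classified as
    `target_class` (e.g., a speed limit class instead of Stop)."""
    model.eval()
    delta = torch.zeros_like(x, requires_grad=True)    # perturbation variable
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        x_adv = torch.clamp(x + mask * delta, 0, 1)    # perturb sticker regions only
        logits = model(x_adv.unsqueeze(0))
        loss = F.cross_entropy(logits, torch.tensor([target_class])) \
             + lam * (mask * delta).abs().sum()        # keep the stickers small
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return torch.clamp(x + mask * delta, 0, 1).detach()
```

The poster-printing variant corresponds to setting the mask to cover the whole sign face, while the sticker variant restricts it to a few printable patches that can be pasted onto a real sign.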


“Everything is a Toaster” and “Never bring a 3D-printed turtle to a gunfight…” from “Adversarial Examples” in Interpretable Machine Learning: A Guide for Making Black Box Models Explainable by Christoph Molnar, licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

“Should I Stop or Speed Up? Road sign” based on excerpts from “A survey of practical adversarial example attacks” by Lu Sun, Mingtian Tan & Zhe Zhou, licensed under a Creative Commons Attribution 4.0 International License and “Robust Physical-World Attacks on Deep Learning Visual Classification” by Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song, used under Fair Dealing for Educational Purposes (Canada).

License


Winning the Battle for Secure ML Copyright © 2025 by Bestan Maaroof is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.