Figures

Aditi Gupta; Katie Harding; Sree Gayathri Talluri

12 Figures

Read time: 6 minutes

Overview

As we saw in the flow chart introducing this part, figures (charts, diagrams, illustrations, plots) are used to represent many forms of data and their trends, including relationships between different variables. This chapter describes different types of figures and gives cautionary examples of how data can be misrepresented.

Sections in this chapter

Essential Parts of a Figure
Diagrams, Photos, Animations, and Flow Charts
Bar Graphs, Pie Charts, and Histograms
Box Plots
Scatter Plots and Line Plots
Sankey Charts
Spectra

Essential Parts of a Figure

There are several things that almost every figure should contain (Figure 12.1):

Bar graph entitled "Growth of Nanowires from Materials A-C at 10-40 degrees Celsius". The x-axis is growth temperature (in degrees Celsius) and the y-axis is length of nanowire (nm). It compares three materials: Material A, Material B, and Material C. — **Figure 12.1.** Example of a figure and its key parts.

A title
Labelled axes with units if applicable
Appropriate axes scales
Appropriate precision
A legend (separate or integrated)
A caption

If any of these is missing, your figure will at best be confusing and at worst mislead the reader and misrepresent your data.

Some common mistakes in figure design are listed below. Many of them are addressed later in this chapter or in the chapter on Accessible and Ethical Design.

The wrong type of figure for the data
Axis scaled inappropriately (e.g., 0-10 when the maximum value is 6)
Labels and text are too small
Colours are too light for printing
Colours are difficult to distinguish in greyscale
Colours are difficult to distinguish for colour-blind persons
Axis label is missing units
Axis numbers have the wrong precision: too much (10.000, 11.000, 12.000) or too little (e.g., 0, 0, 1, 1 for values of 0.2, 0.4, 0.6, 0.8)
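Choosing the axis range and tick precision is mostly arithmetic. The sketch below uses hypothetical helpers (`axis_ticks` and `decimals_for` are not from any plotting library; real plotting libraries ship their own tick locators and formatters). It assumes the axis starts at 0, stops at the first tick at or above the largest value, and uses just enough decimal places to show the tick step exactly.

```python
import math

def decimals_for(step, max_decimals=6):
    """Fewest decimal places that still show `step` exactly (hypothetical helper)."""
    for d in range(max_decimals + 1):
        if math.isclose(round(step, d), step):
            return d
    return max_decimals

def axis_ticks(max_value, step):
    """Tick labels from 0 up to the first multiple of `step` >= max_value."""
    # Round away tiny floating-point error (e.g. 1.1 / 0.1) before taking the ceiling.
    n_steps = math.ceil(round(max_value / step, 9))
    d = decimals_for(step)
    # Every label gets the same number of decimals, taken from the step size.
    return [f"{i * step:.{d}f}" for i in range(n_steps + 1)]

print(axis_ticks(6, 2))
print(axis_ticks(1, 0.25))
```

With a maximum value of 6 and a step of 2, the axis ends at 6 rather than 10, and whole-number steps get whole-number labels instead of 10.000-style padding, while a 0.25 step gets two decimal places so no tick is rounded away.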

Diagrams, Photos, Animations, and Flow Charts

These are figures that don’t necessarily contain data but convey important information about an instrument, procedure, or process. These types of figures (Figures 12.2, 12.3) are useful in the Introduction to explain an abstract concept, or in the Methods section to show how something works.

Figure showing the structure of the coronavirus SARS-CoV-2. — **Figure 12.2.** Representation of Coronavirus

Figure showing the process of photosynthesis. A plant takes in sunlight, carbon dioxide, and water, and produces oxygen.

Bar Graphs, Pie Charts, and Histograms

These three types of graphs all show amounts: values across categories (bar graph), frequencies across ranges of values (histogram), or parts of a whole (pie chart). Bar charts are a clear way of presenting a trend in data. In a vertical bar chart (the more common orientation), the measured variable is on the Y-axis and the categorical variable is on the X-axis. Bar charts can be clustered, as in Figure 12.1, or stacked, as in Figure 12.4A, to show different subsets of data.

Histograms show how often values in the data fall within each of a series of ranges, or bins, with the frequency on the Y-axis. In Figure 12.4B, the data is a list of hail pellet masses in grams, and the histogram plots how many pellets fell in the ranges 1-2.5 g, 2.5-4 g, and so on. The choice of bin size can change how the data looks and should be considered carefully. The pie chart in Figure 12.4C shows the same data in a different way; here the designer chose to emphasize that most hail pellets are 3-6 grams in mass.

Three figures. Figure A is a stacked bar chart entitled "Projected annual deficit" showing the projected deficit in millions of dollars CAD for four years. Each bar includes three categories: essential, under review, and removed. Figure B is a histogram entitled "Distribution of hail pellet mass" showing the frequency of different masses of hail pellets (in grams). Figure C is a pie chart that shows the frequency of three different sizes of hail pellets. — **Figure 12.4.** (A) Example of a stacked bar chart. Notice that the categories are sorted logically (essential, under review, removed). (B) Example of a histogram and (C) a pie chart of the same hail pellet data.
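To see how much the bin size matters, the sketch below counts the same values with two bin widths. The masses are made up for illustration (they are not the data behind Figure 12.4B), and `histogram` is a hypothetical helper that uses half-open bins, so a value falling exactly on an edge is counted in the upper bin.

```python
import math
from collections import Counter

def histogram(values, start, width):
    """Count values in half-open bins [start + k*width, start + (k+1)*width).

    Assumes every value is >= start. Returns (low edge, high edge, count)
    tuples, including empty bins so that gaps in the data stay visible.
    """
    counts = Counter(math.floor((v - start) / width) for v in values)
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in range(max(counts) + 1)]

# Made-up hail pellet masses in grams (not the data behind Figure 12.4B).
masses = [1.2, 2.0, 2.7, 3.1, 3.4, 3.9, 4.2, 4.8, 5.5, 5.9, 6.3, 7.8]

for width in (1.5, 3.0):
    print(f"bin width {width} g")
    for low, high, count in histogram(masses, start=1.0, width=width):
        print(f"  {low:.1f}-{high:.1f} g: {'#' * count}")
```

With 1.5 g bins the counts are 2, 4, 2, 3, 1, so the dip between 4.0 and 5.5 g is visible; with 3 g bins they become 6, 5, 1 and the dip disappears.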

Bar charts should be used to depict amounts or frequencies, but not means. If you are trying to depict means of your data (i.e. multiple measurements averaged within separate categories), it’s better to use a Box Plot or another type of graph that shows the distribution of data.

Box Plots

Box plots (also called box-and-whisker plots) and related graphs (stem-and-leaf, violin) effectively show the distribution of data within categories (Figure 12.5). They show the spread of the data: its quartiles, median, mean, and outliers.

A box plot entitled "Seasonal temperatures in Fake City, Ontario". The x-axis is year and the y-axis is temperature (in degrees C). It compares the weather in three time periods (Winter, Spring/Fall, and Summer), for the years 2010, 2015, and 2020. The mean, median, and outliers are all indicated. — **Figure 12.5.** Features of a Box Plot
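The quantities drawn in a box plot can be computed directly. This sketch assumes the "inclusive" quartile method of Python's `statistics.quantiles` and Tukey's rule that points more than 1.5 × IQR (interquartile range) beyond the box are outliers. Both are common conventions, but software packages differ, so it is worth stating which one you used. `box_plot_summary` is a hypothetical helper and the temperatures are made up.

```python
import statistics

def box_plot_summary(values, whisker=1.5):
    """Summary statistics drawn in a box plot (hypothetical helper).

    Quartiles use statistics.quantiles(method="inclusive"); points more than
    `whisker` * IQR beyond the box are reported as outliers (Tukey's rule).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "mean": statistics.fmean(values),
        # Whiskers end at the most extreme points that are not outliers.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Made-up winter temperatures in degrees C for one year.
temps = [-5, -9, -1, -15, -8, 12, -4, -6, -2]
summary = box_plot_summary(temps)
for key, value in summary.items():
    print(key, value)
```

Here the single warm outlier (12 °C) pulls the mean (about -4.2 °C) above the median (-5 °C), a detail that a bar chart of the mean alone would hide.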

Box Plot vs. Bar Chart

Figure 12.6 compares a box plot and a bar chart showing the same collection of data and its mean. The box plot contains much more information than the bar chart in the same amount of space.

Figure entitled "Seasonal temperatures in Fake City, Ontario" and showing a box plot and bar chart to represent temperatures in 2010. Much more information is shown on the box plot, including the mean and median temperature for each season. — **Figure 12.6.** Comparing a Box Plot of fake temperature data in the year 2010 versus mean temperatures shown in a Bar Chart.

Scatter Plots and Line Plots

Scatter plots show the relationship between two variables by plotting each observation as an (X, Y) coordinate (Figure 12.7). Data is usually plotted this way when we want to know whether changes in one variable are related to changes in the other. In most cases, there is an independent variable (plotted on the X-axis) and a dependent variable (plotted on the Y-axis).

Scatter plot entitled "Relation of bowling scores, strike frequency". The x-axis is strikes per 3-game set. The y-axes are score for set of 3 games and equivalent 1-game score. — **Figure 12.7.** An example of a scatter plot showing a linear relationship between strike frequency and bowling scores.


A regression line or curve can be added to show a statistically supported relationship between the two variables; if you add one, also give the regression equation and its statistical parameters (e.g., R²). Scatter plots are often misused in this way to imply that one variable causes the other (see below).
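A least-squares regression line and its coefficient of determination (R²) take only a few lines to compute. This sketch fits y = slope·x + intercept to made-up data; `linear_fit` is a hypothetical helper, and a real analysis would usually also report uncertainties and a significance test. Note that R² only measures how well a straight line describes the points; it says nothing about whether one variable causes the other.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (hypothetical helper).

    Returns (slope, intercept, r_squared), where r_squared is the square of
    Pearson's correlation coefficient. Assumes neither xs nor ys is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

xs = [1, 2, 3, 4, 5]  # made-up independent variable
ys = [2, 4, 5, 4, 5]  # made-up dependent variable
slope, intercept, r2 = linear_fit(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.2f}")
```

For these five points the fit is y = 0.6x + 2.2 with R² = 0.6, which is the kind of equation and parameter that should accompany a regression line on a scatter plot.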

Correlation vs Causation and Misrepresenting Data

This example of shoddy data analysis was posted by Markus Eichhorn (@markus_eichhorn): ^[1]

“Cucumber consumption has a negative association with COVID deaths. But stay off the lettuce. Yes, it’s another shoddy COVID preprint!”

**Figure 12.8.** Plot showing deaths per million population vs. grams of cucumbers consumed per day.

Scatter plots can have a connecting line between the data points (rather than a regression line). Only add this line when the points form a related series, that is, when they come from the same source and each value depends on the previous one. DO NOT connect the dots when the measurements are made independently!

Line Plots are similar to scatter plots, but the data on the X-axis can be ordinal rather than continuous. In the example in Figure 12.9, the X-axis of both plots is time. In the Scatter Plot, time is measured in minutes, a continuous scalar variable. In the Line Plot, by contrast, the variable is the month, which is ordinal (has a natural order), but the amount of time within and between months can vary slightly.

Figure displaying a scatter (line) plot on the left and a line plot on the right. The scatter (line) plot on the left is entitled "Length of Nanowire over Growing Time". It has growing time (minutes) on the x-axis and length of nanowire (nm) on the y-axis. The plot compares three materials: Material A, Material B, and Material C. The points for each are connected with lines. The line plot on the right is entitled "Number of publications in each discipline". The x-axis shows the months from January to April, and the y-axis is number of publications. The plot compares numbers of publications in Chemistry, Physics, and Math. The points for each are connected by lines. — **Figure 12.9.** Examples of line plots with a scalar (continuous) variable (left) and an ordinal variable (right) on the X-axis.

The use of different symbols and different line types (solid, dash, dots) for each data set helps the reader in distinguishing between the different sets. Line plots can also be made that don’t show individual data points with markers. These graphs are useful for showing trends, particularly when the independent variable is time.

Sankey Charts

Sankey Charts (sometimes called Sankey Diagrams) combine features of diagrams and flow charts: each flow is drawn as a band whose width is proportional to the quantity it carries. These figures can show large amounts of data with several variables very effectively (see the UK Energy Flow Chart below). They are best for showing how something changes over time or throughout a process.

Sankey charts can effectively show large amounts of data.

The annual UK energy flow chart, in the format of a Sankey diagram, shows a lot of data including amounts of energy (as equivalents of the mass of oil), relative amounts, how it is generated, supply versus consumption, import versus export… Can you imagine trying to see all this information in a black-and-white table?

UK Energy Flow Chart with data shown in million tonnes of oil equivalent.

Source: https://www.gov.uk/government/collections/energy-flow-charts
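A Sankey diagram is drawn from a list of weighted links (source, target, amount). Before drawing one, it can be worth checking that the numbers balance: in an energy flow chart, what flows into an intermediate node such as total supply should equal what flows out of it, losses included. The sketch below assumes that kind of balanced data; `node_balance` is a hypothetical helper and the flows are made up (they are not the UK figures).

```python
from collections import defaultdict

def node_balance(links):
    """Return inflow minus outflow for every node that has both.

    `links` is a list of (source, target, value) tuples. In a balanced flow
    chart, every intermediate node should come out at (close to) zero.
    """
    inflow, outflow = defaultdict(float), defaultdict(float)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value
    # Nodes with both incoming and outgoing bands are the intermediate ones.
    return {node: inflow[node] - outflow[node]
            for node in inflow.keys() & outflow.keys()}

# Made-up energy flows in arbitrary units.
links = [
    ("Production", "Supply", 40),
    ("Imports", "Supply", 60),
    ("Supply", "Exports", 30),
    ("Supply", "Conversion", 50),
    ("Supply", "Final use", 20),
    ("Conversion", "Final use", 35),
    ("Conversion", "Losses", 15),
]
print(node_balance(links))
```

A non-zero value points to a missing or mistyped flow, which would otherwise show up as bands of mismatched width in the finished chart.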

Spectra

Chemical spectra are also often presented as figures in scientific communications. With spectra, make sure the resolution is still sufficient when the figure is reduced to the size at which it will appear in the printed paper; this is particularly important for axis labels. In the spectrum, label the protons as they are characterized in your experimental section, and make sure the X-axis is clear, with units indicated (Figure 12.10).

NMR spectrum with issues numbered — **Figure 12.10.** Example of a poorly presented spectrum. It could be improved in the following ways: 1) add a label, compound name, or structure to identify the spectrum; 2) label the protons as they are characterized in your experimental section; 3) include units and start the spectrum at 0 ppm; 4) enlarge the X-axis and Y-axis labels, which are too small and difficult to read; 5) end the spectrum at 10-11 ppm.

Susana C Fonseca, Ioar Rivas, Dora Romaguera, Marcos Quijal-Zamorano, Wienczyslawa Czarlewski, Alain Vidal, Joao A Fonseca, Joan Ballester, Josep M Anto, Xavier Basagana, Luis M Cunha, Jean Bousquet. Association between consumption of vegetables and COVID-19 mortality at a country level in Europe. DOI: https://doi.org/10.1101/2020.07.17.20155846 (accessed 2022-09-18).

License


Science Communication Toolkit Copyright © by Aditi Gupta; Katie Harding; and Sree Gayathri Talluri is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.