Iteration over Arrays and Spaces

Rick Sellens

6 Iteration over Arrays and Spaces

Most of the solution spaces we look at in Mechanical Engineering are regular, like a rectangular table of values with every row having the same number of columns and every column having the same number of rows, with a value in each location. We will concentrate on regular, orthogonal shapes like this, although nothing prevents you from extending these ideas to irregular shapes.

One Dimensional Arrays

The C code from the Arduino Learning Sequence “S2_Arrays_of_Inputs” includes declarations of two arrays:

#define NCH 3    // The number of analog channels to read, starting with A0, must be <= 6
unsigned a[NCH]; // Declare arrays etc. as globals so we can conveniently see them everywhere
float aS[] = {0, 0, 0, 0, 0, 0}; // uninitialized variables make for hard to identify errors!

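a[] will have three unsigned integer elements with indices from 0 to 2 and no explicit initializer. (Because it is declared as a global, C sets its elements to zero; a local array declared this way would start with indeterminate values.) It has three elements because [NCH] makes the size explicit.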

aS[] will have six floating point elements, each initialized to zero, with indices from 0 to 5. It has six elements because there were six elements present in the initializer, implying a size of six.

We can achieve the same result in Python using the numpy library. numpy arrays have huge advantages over simple Python lists for engineering calculations, so we will ignore lists and use numpy arrays of numbers whenever we can, such as when working in Jupyter on our notebook computers. We need to import the functions from numpy before we can use zeros(), together with the array method astype(), to build our arrays.

from numpy import *
a = zeros(3).astype(int)
aS = zeros(6)

Iterating with a for Loop

One big advantage of using arrays comes from repeating the same operation on multiple members of an array with the same C code:

for (int i = 0; i < NCH; i++){ 
    a[i] = analogRead(A0 + i);
    aS[i] = w * a[i] + (1 - w) * aS[i];
}

In this for loop the integer index i takes on the value 0 before the loop starts (int i = 0). If we are still in range, meeting the condition (i < NCH), the code block inside the loop’s curly braces runs. When execution reaches the end of the block it increments the index by one (i++) and goes back to the top to repeat the test. If we are still in range, the block runs again. Eventually the test fails and execution moves on to the code that follows the closing curly brace. As the loop runs, it reads analog values from pins A0, A1, A2 on an Arduino and stores the results in a[0], a[1], a[2], the three elements of the array. On each pass it also updates aS[] with an exponentially smoothed value for each channel that depends on the weighting factor w defined elsewhere.

In CircuitPython we can achieve the same thing on a supported microcontroller with:

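import analogio   # CircuitPython analog input module
import board      # pin names for the board in use

adc = [analogio.AnalogIn(board.A0),
       analogio.AnalogIn(board.A1),
       analogio.AnalogIn(board.A2)]
a = [0, 0, 0]
aS = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# w is the smoothing weighting factor, defined elsewhere
for i in range(0, 3):
    a[i] = adc[i].value   # raw integer reading from channel i
    aS[i] = w * a[i] + (1.0 - w) * aS[i]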

adc[] is a standard Python list of three analog input channels and a is the list where we will store the integer results. CircuitPython runs on microcontrollers with little memory and doesn’t support the sophistication of the numpy library. The syntax of the for statement is different, but the result is the same.
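The smoothing update used in both loops can be tried on a desktop computer without any hardware. The sketch below is only an illustration: the function name smooth(), the signal levels, the noise range, and the value w = 0.1 are all assumptions made for the example, and random numbers stand in for real analog readings.

```python
import random

def smooth(previous, reading, w):
    """One exponential smoothing step, putting weight w on the new reading."""
    return w * reading + (1.0 - w) * previous

random.seed(1)                  # fixed seed so every run gives the same numbers
w = 0.1                         # assumed weighting factor, between 0 and 1
NCH = 3                         # number of simulated channels
true_levels = [200, 500, 800]   # made-up steady level for each channel
aS = [0.0] * NCH                # smoothed values start at zero

for step in range(100):
    for i in range(NCH):
        # a simulated noisy reading stands in for analogRead() or adc[i].value
        reading = true_levels[i] + random.randint(-20, 20)
        aS[i] = smooth(aS[i], reading, w)

for level, value in zip(true_levels, aS):
    print(level, round(value, 1))
```

After 100 passes the influence of the zero starting value has shrunk by a factor of 0.9 to the power 100, so each smoothed value sits close to its channel's level. A smaller w smooths more heavily but responds more slowly to real changes.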

Two Dimensional Arrays

2D arrays are simple in concept, but can be a little more difficult to operate on than 1D arrays. A spreadsheet is a 2D array with rows and columns, and its cells have been referred to by column-dominant addresses since VisiCalc, entrenching the idea that C6 refers to the value in the third column and the sixth row. Just about every other array paradigm uses a row-dominant convention and zero-based indexing, so that b[2][1] refers to the value in the third row (row index [2]) and the second column (column index [1]). This is just a convention and you can map these dimensions any way you want, but arrays will almost always print out row dominant, with b[0][0] at the top left.
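To make the two conventions concrete, the following sketch converts a spreadsheet address into the zero-based (row, column) pair used by most array code. The function name cell_to_index() is made up for this illustration, and it assumes a well-formed address of letters followed by digits.

```python
def cell_to_index(address):
    """Convert a spreadsheet address like 'C6' to a zero-based (row, column) pair."""
    letters = "".join(ch for ch in address if ch.isalpha()).upper()
    digits = "".join(ch for ch in address if ch.isdigit())
    column = 0
    for ch in letters:
        # column letters count in base 26: A=1 ... Z=26, AA=27, ...
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, column - 1

print(cell_to_index("C6"))      # (5, 2): sixth row, third column

# Spreadsheet cell B3 is the same location as b[2][1] in a list of lists
b = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
row, col = cell_to_index("B3")
print(b[row][col])              # 31
```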

Python

Python has rich array capabilities with tuples, lists, and numpy arrays all providing indexed elements for different uses. We can make a two dimensional Python list array by attaching multiple one dimensional lists together like this.

b = []
for i in range(0,4):
    for j in range(0,3):
        a[j] = adc[j].value
    b.append(a.copy())
print (b, b[2][1])

b starts out as an empty list, with no elements. Note the nested for loops. This code has an inner loop just like the last one that loads 3 new values into a. The outer loop runs a total of four times, appending a copy of a onto b each time so that the printed result looks something like this:

[[57136, 27040, 27952],
[57184, 28832, 29120],
[57152, 29712, 29856],
[57152, 25536, 29792]]

29712

The first row is row 0, then 1, then 2, then 3. The first column is column 0, etc., so b[2][1] is the value in the third row and the second column. In CircuitPython we need to be cautious with 2D arrays because they will very quickly take up more storage space than is available on the microcontroller. On a computer with gigabytes of space we can make large numpy arrays without much to worry about.

from numpy import *
c = ones( (4,3) )

would be the same shape as b, and full of floating point values of 1.0. (4,3) is a tuple describing the shape of the array, which is passed to the numpy function ones(). A tuple is a group of multiple values that can be passed around in Python as a unit, sent to functions as an argument, or returned by functions when one value is just not enough.
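As a small illustration of tuples on their own, without numpy, the sketch below passes a shape tuple into one function and gets a tuple of two results back from another. Both function names are hypothetical, chosen only for this example.

```python
def make_grid(shape, value=1.0):
    """Build a list of lists with the given (rows, columns) shape, filled with value."""
    rows, cols = shape                      # unpack the tuple into two names
    return [[value] * cols for _ in range(rows)]

def grid_shape(grid):
    """Return the (rows, columns) shape of a rectangular list of lists as a tuple."""
    return len(grid), len(grid[0])

c_list = make_grid((4, 3))                  # the tuple (4, 3) travels as one argument
n_rows, n_cols = grid_shape(c_list)         # one returned tuple, unpacked into two values
print(n_rows, n_cols)                       # 4 3
```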

d = ones( (1024,1024) ) * 3.141592654

would be much larger, with more than a million floating point elements in total, each normally taking 8 bytes, thus 8 MegaBytes of storage space. That’s either enormous, or not much, depending on your perspective. All of those elements would be set to the value 1.0 * 3.141592654, hinting at some of the power of broadcast operations on numpy arrays, providing capabilities similar to MATLAB or the venerable and notoriously unreadable APL.
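The storage estimate is simple arithmetic, sketched here with the standard library only. The helper name array_bytes() is an assumption for this example, and the 8 bytes per element is the size of a C double, which is what numpy uses for its default floating point type.

```python
import struct

def array_bytes(shape, bytes_per_element):
    """Bytes needed to hold the raw values of a regular array of the given shape."""
    count = 1
    for n in shape:
        count *= n
    return count * bytes_per_element

double_size = struct.calcsize("d")            # size of a C double in bytes
total = array_bytes((1024, 1024), double_size)
print(1024 * 1024)                            # 1048576 elements
print(total, total / 2**20)                   # 8388608 bytes, which is 8.0 MiB
```

The same arithmetic applied to a plain Python list of lists would underestimate badly, since each list element there is a full Python object rather than a bare 8 byte value.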

Although individual elements of b, c, and d can all be addressed as b[2][1] or d[273][451], the data representation beneath the surface is different, so use caution if you mix lists with numpy arrays. You can explicitly visit each element in turn by iteration with for loops:

for i in range(0,1024):
    for j in range(0,1024):
        d[i][j] = i * 1000 + j

The code above will start with row 0 and assign values to all the elements in that row, column by column, then move on to row 1, etc. It will start at the top left and sweep through the array left to right, then top to bottom, one element at a time.

for j in range(0,1024):
    for i in range(0,1024):
        d[i][j] = i * 1000 + j

will start at the top left, sweep all the way down the first column from top to bottom, then move to the next column, etc. The order in which it sweeps through the array will only matter if the results depend on the other values in the array, or in other situations where the order of execution of related instructions is important, like the updating of some video displays. The order you move through the space will be important in sampling systems as detailed below.
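A short sketch with a small array makes the two sweep orders visible. It uses a plain list of lists rather than numpy so that it runs anywhere, and the function name sweep() is invented for this illustration.

```python
def sweep(rows, cols, row_first=True):
    """Fill a rows x cols list of lists and record the order the elements are visited."""
    d = [[0] * cols for _ in range(rows)]
    order = []
    if row_first:
        for i in range(rows):           # outer loop over rows
            for j in range(cols):       # inner loop over columns
                d[i][j] = i * 1000 + j
                order.append((i, j))
    else:
        for j in range(cols):           # outer loop over columns
            for i in range(rows):       # inner loop over rows
                d[i][j] = i * 1000 + j
                order.append((i, j))
    return d, order

d_rows, order_rows = sweep(2, 3, row_first=True)
d_cols, order_cols = sweep(2, 3, row_first=False)
print(order_rows)        # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(order_cols)        # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
print(d_rows == d_cols)  # True
```

Here each value depends only on its own indices, so the finished arrays are identical even though the visiting order differs.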

C

There’s less variety in native C implementation and the only way to operate on arrays is to explicitly visit each element in turn:

double d[10][12];     // 10 rows, 12 columns 2D array of doubles with no initialization
for(int i = 0;i < 10;i++){
    for(int j = 0;j < 12;j++){
        d[i][j] = 1000 * i + j;   // assigns a value to each element in turn
        Serial.print(d[i][j],1); 
        Serial.print(", ");
        delay(100);
    }
    Serial.print("\n");
}

will work its way element by element left to right and top to bottom. With the print() statements and delay() you can watch it happen on the serial monitor window of your Arduino.

Sampling Data Spaces without Arrays

Sometimes we don’t have, or need, nice tidy tables of data. We might have a set of analog temperature sensors spaced vertically every 25 cm to measure temperature from floor to ceiling in a room. When the heat goes on, the room will warm up, but will it warm fastest at the top of the room, or more uniformly? To find out, we would turn on the heat and watch for an hour.

Each sensor has a continuous analog output, so we could take almost infinitely many measurements over the course of an hour if we had the right equipment. Our data space is continuous in time (t), but we will need to make it discrete by choosing times to measure.

On the other hand, we might have only eight distinct sensors at discrete vertical (z) locations in the space (and only one set of sensors, so only one (x,y) location). Our data space is already discrete in height (z).

Sampling all eight every millisecond for an hour, we could record an array of [3600000,8] temperature data points occupying about 55 MB of storage at two bytes per reading, a rather enormous data set to answer our relatively simple questions. We really don’t need that much data, and it would be difficult to fit into a microcontroller-based system. It would probably be enough to track averages minute by minute, bringing us down to [60,8] for less than 1K of summary data. How do we sample during each of those minutes without storing an enormous amount of data?

for(i = 0; i < 1000; i++)
  for(j = 0; j < 8; j++)
    data[i][j] = analogRead(A0 + j);

would read and store the data in an array for summing and averaging later, still a very large amount of data for what will reduce to 8 average values.

for(j = 0; j < 8; j++) sum[j] = 0;
for(i = 0; i < 1000; i++)
  for(j = 0; j < 8; j++)
    sum[j] += analogRead(A0 + j);
for(j = 0; j < 8; j++) sum[j] /= 1000;

would calculate the sums on the fly and finish with just the eight average values for that sampling period. Does it matter which order we put the for loops in? Let’s reverse the order and look at the consequences:

for(j = 0; j < 8; j++) sum[j] = 0;
for(j = 0; j < 8; j++)
  for(i = 0; i < 1000; i++)
    sum[j] += analogRead(A0 + j);
for(j = 0; j < 8; j++) sum[j] /= 1000;

In both examples there will be 1000 samples summed up in each element of sum[]. The first example will take 8 sensor measurements, then 8 more sensor measurements, then 8 more until it has taken a thousand of each, spread evenly over the course of the sampling period. The second example will take 1,000 samples from the first sensor, then 1,000 from the second, and so on over the course of the sampling period. All of the samples from the first sensor will be measured in the first few moments of the period, while all the samples from the eighth sensor will be taken in the last few moments. Each average will then represent only a fraction of the period, not the entire period. That could be a problem if the values are changing with time due to either noise or an unsteady heating process.
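The effect can be checked with a simulation. In the sketch below, which is an assumption-laden stand-in for real hardware, every sensor sees the same temperature rising steadily from 20 to 30 degrees over the sampling period, each reading takes the same amount of time, and read_temperature() and the other names are hypothetical.

```python
N_SENSORS = 8
N_SAMPLES = 1000
TOTAL_READS = N_SENSORS * N_SAMPLES

def read_temperature(k):
    """Simulated reading number k of the period: a steady rise from 20 toward 30."""
    return 20.0 + 10.0 * k / TOTAL_READS

def interleaved():
    """Outer loop over samples, inner loop over sensors (the first ordering)."""
    total = [0.0] * N_SENSORS
    k = 0                               # counts readings in the order they are taken
    for i in range(N_SAMPLES):
        for j in range(N_SENSORS):
            total[j] += read_temperature(k)
            k += 1
    return [t / N_SAMPLES for t in total]

def one_sensor_at_a_time():
    """Outer loop over sensors, inner loop over samples (the reversed ordering)."""
    total = [0.0] * N_SENSORS
    k = 0
    for j in range(N_SENSORS):
        for i in range(N_SAMPLES):
            total[j] += read_temperature(k)
            k += 1
    return [t / N_SAMPLES for t in total]

print([round(v, 2) for v in interleaved()])
print([round(v, 2) for v in one_sensor_at_a_time()])
```

With interleaved sampling every sensor averages to about 25, the true mean over the period. With the reversed loops the first sensor reports about 20.6 and the last about 29.4, even though all eight were at the same temperature throughout.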

Sample as uniformly as possible

When taking discrete samples as a summary of a much larger data space, arrange your sampling to be as representative of the entire space as possible.

Exercises

Write some Arduino sampling code with these characteristics:

create an integer array channels[] of four analog input channels and initialize it to contain A5, A1, A0, and A4 in that order
create an array result[] of a data type suitable for holding the average of 1000 analogRead() values from those channels
write a function fourAverage(type result[], int channels[], int n, int nAvg) that averages nAvg readings from each of the n analog channels[] and returns the average values in result[]

Test your code to convince yourself the function is performing as you expected. See if you can design the function so that it doesn’t need any additional variables besides some index counters. Hint:

$$\bar{x} = \frac{\sum x}{n} = \sum \frac{x}{n} \qquad (1)$$

Remember that C passes an array to a function as a pointer to its first element, which works like passing by reference, so any changes you make in the function will go back to the original calling code.

License

Creative Commons Attribution-ShareAlike 4.0 International License