For Loops and Its Applications in Machine Learning

Moeez Omair; Rutwa Engineer

Learning Objectives

In this module, we will explore a fundamental concept in programming: the For Loop. For Loops are widely used in programming tasks, from simple iterations to complex data processing in machine learning (ML) and artificial intelligence (AI). This module will help you to:

Understand the basic purpose and syntax of For Loops
Extend your knowledge with nested For Loops and loop control statements
Learn the applications of For Loops in machine learning
Analyze potential ethical concerns with the blind usage of For Loops

Introduction

What are For Loops?

For Loop: repeats a sequence of code a pre-defined number of times.

For example, to output the integers 1 to 5 in Python:

# try it!

x = 5
for i in range(1, x + 1):  # range(1, x + 1) yields 1, 2, ..., x
    print(i)

Another example is multiplication written as a for loop of repeated additions. For instance, [latex]3 \cdot 4[/latex] is the same as adding 3 four times (3 + 3 + 3 + 3).

Here’s how you can use a for loop to perform multiplication by repeated addition:

# try it!

a = 3
b = 4
result = 0
for i in range(b):
    result += a
print(result)

Extending For Loops

Nested For Loops: A For Loop inside another For Loop.

# try it!
x = 5
y = 10
for i in range(x):
    for j in range(y):
        print(i, j)

Loop Control Statements:

Loop control statements manage the flow of a loop by allowing you to stop it early or skip iterations. The main types are described below.

Break: Terminates the loop.

Continue: Skips the current iteration and moves to the next.

Pass: Does nothing. It acts as a placeholder, and execution continues with the next statement in the loop body.

# try it!
x = 10
for i in range(x):
    if i == 3:
        break  # try replacing 'break' with 'pass' or 'continue'
    print(i)
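To compare the three statements side by side, the sketch below runs the same loop three times and changes only the statement executed when i == 3. The helper name run_loop and its mode argument are hypothetical, introduced only for this illustration; the values are collected in a list instead of printed so the three results are easy to compare.

```python
def run_loop(mode, x=6):
    """Collect the values the loop reaches, using the given statement at i == 3."""
    seen = []
    for i in range(x):
        if i == 3:
            if mode == "break":
                break      # leave the loop entirely
            elif mode == "continue":
                continue   # skip the rest of this iteration
            else:
                pass       # do nothing; fall through to the append below
        seen.append(i)
    return seen


print(run_loop("break"))     # [0, 1, 2]
print(run_loop("continue"))  # [0, 1, 2, 4, 5]
print(run_loop("pass"))      # [0, 1, 2, 3, 4, 5]
```

Note that only 'break' shortens the loop. 'continue' drops the single value 3, and 'pass' changes nothing at all.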

Machine Learning Applications

Data Preprocessing

For loops are used to iterate over datasets to clean, normalize, and prepare data for machine learning models.

Cleaning Data: This process involves correcting inconsistencies to improve the quality of the preliminary data. Examples of such inconsistencies include data collection errors such as missing values or massive outliers.

Normalizing Data: This process adjusts the data so that it fits within a specific range, like 0 to 1, while maintaining the relative distribution of the datapoints. For example, if you have a set of numbers, normalizing them would map the smallest to 0 and the largest to 1, with the rest placed proportionally in between.

(These are concepts you may encounter in STA107 and/or STA256.)
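A minimal sketch of both steps using plain for loops. The list raw_heights and its values are made up for illustration, missing values are assumed to be recorded as None, and min-max scaling is used as one common way to normalize (the text above does not name a specific method).

```python
# Hypothetical measurements; None marks a missing value.
raw_heights = [150.0, None, 160.0, 170.0, None, 190.0]

# Cleaning: keep only the entries that are not missing.
cleaned = []
for value in raw_heights:
    if value is None:
        continue  # skip missing entries
    cleaned.append(value)

# Normalizing (min-max scaling): map the smallest value to 0 and the largest to 1.
low = min(cleaned)
high = max(cleaned)
normalized = []
for value in cleaned:
    normalized.append((value - low) / (high - low))

print(cleaned)     # [150.0, 160.0, 170.0, 190.0]
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

The sketch assumes the cleaned data contains at least two distinct values; otherwise high - low is zero and the division fails.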

Training Models

For loops are used to iterate over epochs and batches during the training of machine learning models.

Epoch: One complete pass through the entire training dataset.

Batch: A subset of the training data used to train the model in one iteration.

Using batches allows us to process the examples in each subset in parallel, which speeds up the training process.

(These are concepts you may encounter in CSC311.)

Hyperparameter Tuning

For loops are used to iterate over different sets of hyperparameters to find the best model configuration.

Hyperparameters: These are parameters determined before the learning process begins, such as learning rate, batch size, and number of epochs.

Essentially, hyperparameters are adjusted and experimented with in order to create a model that both:

models the current data to a high degree of accuracy
leaves enough margin to handle new datapoints well

An example of a hyperparameter is the degree used in polynomial feature mapping. Raising this degree gradually increases the complexity of the function used to estimate the data.

A model fits a polynomial to a set of datapoints. At first we see an underfit, where polynomials of degree 1 to 4 model the data poorly. Next, we see a good fit for degrees 5 to 8. Finally, we see an overfit for degrees 9 and above. (gbhat.com 2021)
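The search over hyperparameters can be sketched with a for loop over candidate values. To stay free of external libraries, this sketch tunes the learning rate (one of the hyperparameters listed earlier) rather than the polynomial degree. The data, the one-parameter model y = w * x, and the function names train and mean_squared_error are all assumptions made for illustration.

```python
# Hypothetical data following y = 2x, split into training and validation sets.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
val_x, val_y = [5.0, 6.0], [10.0, 12.0]


def mean_squared_error(w, xs, ys):
    """Average squared error of the model y = w * x on the given data."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += (w * x - y) ** 2
    return total / len(xs)


def train(learning_rate, num_epochs=20):
    """Fit the single weight w with gradient descent and return it."""
    w = 0.0
    for epoch in range(num_epochs):
        gradient = 0.0
        for x, y in zip(train_x, train_y):
            gradient += 2 * (w * x - y) * x
        w -= learning_rate * gradient / len(train_x)
    return w


best_lr, best_error = None, float("inf")
for learning_rate in [0.001, 0.01, 0.05, 0.2]:   # candidate hyperparameter values
    w = train(learning_rate)
    error = mean_squared_error(w, val_x, val_y)  # judged on data not used for training
    print(learning_rate, round(w, 3), error)
    if error < best_error:
        best_lr, best_error = learning_rate, error

print("best learning rate:", best_lr)
```

With these made-up numbers, 0.05 gives the lowest validation error: 0.001 learns too slowly to get close in 20 epochs, and 0.2 overshoots on every step and diverges. Scoring each candidate on separate validation data is what ties the search to the second goal above, handling new datapoints well.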

(Example) Application of a For Loop to Train a Model

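num_epochs = 3                  # placeholder value
data_loader = [[1, 2], [3, 4]]  # placeholder: a list of batches

for epoch in range(num_epochs):   # one epoch: a full pass over the data
    for batch in data_loader:     # one batch: a subset of the data
        pass  # code here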
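As one possible way to fill in the skeleton above, the sketch below trains a one-parameter model y = w * x with mini-batch gradient descent. The dataset, the batch layout of data_loader, the learning rate, and the choice of model are assumptions made for illustration; real projects typically rely on a machine learning library for these steps.

```python
# Hypothetical dataset following y = 3x, already split into batches of two examples.
data_loader = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
]
num_epochs = 50
learning_rate = 0.01
w = 0.0  # the single model parameter, starting from an arbitrary guess

for epoch in range(num_epochs):      # one epoch = one pass over every batch
    for batch in data_loader:        # one parameter update per batch
        gradient = 0.0
        for x, y in batch:
            gradient += 2 * (w * x - y) * x  # derivative of the squared error w.r.t. w
        w -= learning_rate * gradient / len(batch)

print(round(w, 3))  # 3.0
```

The outer loop counts epochs and the middle loop walks through the batches, matching the definitions given earlier. The innermost loop is the part that libraries usually run in parallel.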

A related example you may have learned or will learn in MAT135 is the Taylor series. While the mathematics behind the operation is different (pun intended), the concept is similar. Both processes involve approximations to a true function or distribution representing the data.

This animated image shows a Taylor series in motion: using higher values of N results in a better approximation of the function. (Wikipedia)
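The effect of increasing N can be reproduced with a short for loop. The sketch below approximates e^x with the first N terms of its Taylor series about 0; the choice of e^x and the function name taylor_exp are assumptions made for illustration and are not necessarily the function shown in the animation.

```python
import math


def taylor_exp(x, n_terms):
    """Approximate e**x with the first n_terms terms of its Taylor series about 0."""
    total = 0.0
    term = 1.0                 # the k = 0 term: x**0 / 0!
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)    # turn x**k / k! into x**(k+1) / (k+1)!
    return total


# The error shrinks as more terms are included.
for n in [1, 2, 4, 8, 16]:
    approx = taylor_exp(1.0, n)
    print(n, approx, abs(approx - math.e))
```

Each pass through the loop adds one more term, much as each increase in polynomial degree adds flexibility to the fitted model.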

Ethical Issues

Lack of Subjectivity/Context:

If used without care, for loops can contribute to ethical issues in AI applications:

Bias: If the data used in the loop is biased, the output will also be biased.
Fairness: For loops do not inherently consider fairness or ethical implications.
Comprehension: The repetitive nature of for loops can make it difficult to understand how decisions are made.

Example #1:

Back in 2016, The Guardian reported on a study of an AI model responsible for judging a beauty contest. To do so, the AI model, Beauty.AI, relied on large datasets of photos to build an algorithm that assessed beauty. The findings showed that the model strongly favoured lighter-skinned individuals (The Guardian 2016). This exemplifies the ethical issue that arises when complex calculations, such as those carried out within For Loops, produce undesirable results that go unexamined. Any biases present in the data will be perpetuated by the loop: if the previous winners within the dataset are predominantly lighter-skinned individuals, the loop will repeatedly reinforce this bias.

Example #2:

In another example, we can refer to scoring methodologies used in the Olympics. In Gymnastics specifically, a contestant’s average score is evaluated after removing the highest and lowest scores (NBC Olympics 2024). This methodology is beneficial in several ways, as it:

maintains consistency: the remaining scores are more representative of the consensus of the judges
reduces bias: any overly favourable or harsh score from a single judge may be removed and have no influence on the average

This introduces an important technique that could be used in machine learning applications to promote fairness in judgement. In a For Loop, it can be implemented by sorting the data from low to high on some scale and using loop control statements to skip the first and final iterations of the loop.
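A minimal sketch of this idea, using 'continue' to skip the lowest and highest values after sorting. The scores are made up for illustration and do not come from any real competition.

```python
# Hypothetical scores from six judges.
scores = [9.1, 8.7, 9.5, 6.0, 9.0, 8.9]

ordered = sorted(scores)      # organize from low to high
last = len(ordered) - 1
total = 0.0
count = 0
for i, score in enumerate(ordered):
    if i == 0 or i == last:
        continue              # skip the lowest and the highest score
    total += score
    count += 1

trimmed_average = total / count
print(round(trimmed_average, 3))  # 8.925
```

For comparison, the plain average of all six scores is about 8.53, pulled down by the single 6.0. The sketch assumes at least three scores; with fewer, nothing is left to average.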

References:

Gymnastics 101: Olympic scoring, rules and regulations. NBC Olympics. (2024, March 13). https://www.nbcolympics.com/news/gymnastics-101-olympic-scoring-rules-and-regulations

Polynomial regression under fit, good fit, over fit. gbhat.com. (2021, July 21). https://gbhat.com/machine_learning/polynomial_regression_fit.html

The Guardian. (2016, September 8). A beauty contest was judged by AI and the robots didn’t like dark skin. The Guardian. https://www.theguardian.com/technology/2016/sep/08/artificial-intelligence-beauty-contest-doesnt-like-black-people

Wikipedia. Taylor series. Wikipedia. https://en.wikipedia.org/wiki/Taylor_series

License

Introducing Critical Algorithmic Literacies in Computer Programming Copyright © by Rutwa Engineer; Moeez Omair; Alisha Hasan; Adelina Patlatii; Lanz Angeles; Sana Sarin; Madhav Ajayamohan; and Izabelle Marianne is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.