11 Understanding environmental impact of computation through sorting algorithms
Izabelle Marianne and Rutwa Engineer
Learning Objectives
Learning objectives:
- Learn the implementation of BubbleSort and Insertion Sort.
- Investigate the computational efficiency of BubbleSort, Selection sort and Insertion Sort using the time module.
- Gain experience using Python to analyze real world data
- Explain how algorithm efficiency relates to energy consumption.
- Understand the effects of increased energy consumption and e-waste disposal
- Examine current strategies to help minimize negative effects of energy consumption and e-waste disposal
Introduction:
In this module you will learn the fundamentals of sorting algorithms while integrating critical and inclusive pedagogies. This module will focus on Ecopedagogy, where you will explore sorting algorithms by learning their effects on environmental systems. You will explore how these algorithms behave in practice and how computational cost relates to energy use
This module encourages you to not only code for correctness but also for sustainability. As developers, you need to be aware of how your code impacts users and the planet.
Introduction to sorting:
Sorting is a functionality that most of us have come across, whether it’s you filtering Reddit posts, UberEats sorting delivery options by proximity or Spotify using your most played songs to create your Wrapped playlist. Behind each of these scenarios, is a sorting algorithm. For example, Uber makes use of an algorithm that models the possible routes as a graph and prioritizes the shortest route, this is something you will learn in CSCS263 and CSC363.
The sorting algorithms that these companies use have to be fast in order to work on the large datasets they are working with, both for better user experience and the focus of this module – less energy consumption. In this module you will be introduced to more basic sorting algorithms: Bubble Sort, Selection Sort and Insertion Sort.
In Activity One, you will be introduced to common implementations of these sorting algorithms and will be tasked with making modifications to the code so that it takes less time on certain inputs. After this, we will go through the motivation as to why we optimize code – specifically why we do it for the environment. As well as, you will get to see how current technologies, despite utilizing very fast algorithms, still have significant effects on the environment. After this, Activity 2 will lead you through a task where you will be sorting a large dataset on different countries based on their energy consumption through smartphones and the e-waste they generate.
The Sorting algorithms:
Bubble sort – Simple but at what cost: Sorts a list by repeatedly comparing neighbouring elements and swapping if the pair is out of order. This is done ‘n’ times, where ‘n’ is the length of this list (in short, it is called ‘n passes’). The figure below shows how the second and third pass look.
Notice that with each pass, the tail of the list (the boxes highlighted in black) is in sorted order and this sorted section grows. So we need to make sure that in each pass we do not mess up the sorted end of the list and also ensure the largest element bubbles to the correct index.
Think about it, in the first pass, the largest number must bubble to the last index in the list. In the second pass, the second largest must bubble to the second last index of the list. By the time the nth pass is reached the nth largest (the smallest!) must be in the nth largest index (index 0!). In the figures below you can see how in the first pass, the largest lament bubbles to the correct index.
After n passes, there is nothing left to sort, so we are done.
Selection sort, more careful with swaps: This sorting algorithm has a similar idea to bubble sort, instead of bubbling up the largest element it will select the smallest element and move it to its correct place. Hence during this algorithm there is a sorted section that will grow from index 0, to ensure we do not mess up this section, we always consider only the unsorted list when finding the minimum element.
This sorting algorithm makes use of two loops, where the outer loop runs n -1 times. In the first pass, the inner loop will check through the array to find the minimum element and move that to index zero. The sorted section is only index 0 now. Then in the second pass the inner loop will only check from the index 1 to find the minimum element and will move it to index 1 if required. Now the sorted section is index 1 and 0. This will continue until all n elements are in their correct position. The figure below illustrates this process.
Insertion sort, the best for now: The idea behind insertion sort is also maintaining a sorted and unsorted part of the list. With each pass the sorted section becomes larger while the unsorted section becomes smaller.
We compare the first element in the unsorted section (called the key) with every element in the sorted section. If the key is smaller than any value in the sorted section that means it is in the wrong place and must be shifted right to make space for this key which is smaller than it. This shift to the right is done by copying the value and pasting it at the index to its right. Once a smaller element than the key is found the key is copied one index to the right in its correct place. Since we shifted all larger numbers to the right we can be sure that we have not copied over and lost any values. This ensures that the unsorted element finds its correct place in the sorted section, shown below.
This process continues until the unsorted section has a length of 0 – there are no more unsorted elements.
Activity One: Designing for eco-efficiency
For this task, you will be looking at the Python implementation of the Bubble Sort and Insertion Sort algorithms previously described. The code has comments to help you link what we have discussed to actual Python code.
Part 1: The first subtask is to modify the Bubble Sort algorithm so that it does not perform unnecessary swaps. The hint offers a good starting point to think about a case where a certain number of swaps is enough to indicate the list is completely sorted. Once you have figured it out, select the number of swaps your modified Bubble sort takes on a sorted list.
Part 2: The second subtask asks you to look at two implementations of Insertion sort, the first makes use of the algorithm we have described in this module while the second makes use of an algorithm described in the website given in the docstring. Take some time to read how the second implementation works. Once you are done reading both, select which version is faster, why it is faster and what type of list is it faster for.
Correlating time complexity with environmental cost
You have now had an experience learning how to optimize code and how to recognize optimized code. Why is optimization important, especially in regards to environmental impact? Algorithms that take longer to complete require increased CPU power, if the CPU works harder it requires more electrical power. This power comes from power plants that most often burn fossil fuels and other non-renewable sources to generate this power. These effects will be shown through real life examples.
AI, a technology that you have lots of experience with. While it is a revolutionary tool, its environmental impacts are concerning.
- Training a single large AI model can emit as much carbon as five cars over their entire lifetime.
- AI relies on data centers that house servers running 24/7 to process and store data. By some estimates, data centers contribute approximately 1% of global electricity demand – a number that’s only climbing.
- Due to the rapid innovation in the field, AI has a Short Hardware Lifecycle, which means hardware becomes obsolete quickly, leading to an increase in e-waste. Disposal of these components often involves toxic chemicals that leach into soil and water, harming wildlife and communities.
Cloud computing, allows for the immediate access of computing services like servers, data storage, software etc. It too has deep environmental impacts:
- Cloud centres (facilities that provide storage infrastructure for shared data) rely on a constant supply of electricity to power their hardware and AC units for cooling. In order to ensure services are instantaneous anywhere, data centres are hyper redundant. So if one system fails, another immediately replaces it. These redundant servers carve out a significant chunk of energy for cooling and maintenance.
- The cloud currently has a carbon footprint that is greater than the entire airline industry. Consuming 200 TWh annually, the electrical consumption of a single data centre is equivalent to that of 50,000 homes.
Two things we will highlight from these statistics:
Energy consumption specifically of mobile phones
The energy required to support these devices is something we must pay attention to if we are to move towards a sustainable future. There are two ways energy can be consumed in a phone, charging the device and data transfer over the internet.
Charging a smartphone typically provides 10W of power. People charge them on average once a day, for 2 hours. Using the formula [latex]energy=power \times time[/latex]. In this case energy use per day by phone is [latex]10 \times 2 = 20 Wh[/latex] and per year is [latex]20 \times 365 = 7.3 Wh[/latex].
Energy consumed by data transfer, based on the statistic that a smartphone user downloads 10GB of data per month, and data transfer takes 0.2 KWh (watt hours) per GB of data. So per year we have [latex]10 \times 0.2 \times 12 = 24 KWh[/latex].
So in total we will assume that a phone consumes [latex]31.3 KWh[/latex] per year.
E-waste
E-waste (discarded electronic items) when left in a landfill will heat up. This will cause toxic compounds such as lead, cadmium and beryllium into the air. This could then leak into the groundwater – affecting aquatic and terrestrial species. Additionally, exposure to this e-waste can lead to adverse health effects like changes in lung function and respiratory issues.
Activity two – Timing experiment
We will now work with a data set that describes the number of smartphones sold (in millions) and E-waste generation (in tonnes) in several different countries from 2015-2025. The data set is linked here, and has other columns that may be of interest to you – like the percentage of the population using smartwatches and 5G. In this task you are sorting the countries (with the year) based on the amount of e-waste produced and energy consumption of phones.
Part 1: The first task is just to get you used to the time module and timing your code, give it a go on the small data set. Look carefully at the code snippets that deal with timing the code, especially the function from the time module, the comments in the code explain what they are for. Once you feel like you are comfortable with the code, add additional calls to the timing function so that it measures the time to sort the mini dataset using insertion sort, selection sort Timsort – print out these values. Timsort is Python’s inbuilt sorting algorithm. It is a hybrid sorting algorithm in that it uses insertion sort and mergesort (which you will learn in CSC148) – it is much faster than the sorting algorithms you have learnt today.
Part 2: Once you are comfortable with the mini dataset, take a look at the code that will read the dataset (a csv file). You will need to read in the columns “Smartphone Sales (Millions)” and “E-Waste Generated (Metric Tons)” and save it into a dictionary where the key is of the form Country_Year – we have done most of this for you simply follow the comments in the code for your specific tasks.
Part 3: The task requires you to sort countries by the energy consumed via smartphones and e-waste generated. So when you are saving values to the two dictionaries, you would need to modify the values you obtain for one of the columns. Once you have created the two appropriate dictionaries, sort and time them using insertion sort, bubble sort and Timsort, and print the sorted dictionary as well. To print the appropriate key with sorted value, you will need to use a helper that we have provided.
Once you have obtained your sorted dictionary, answer the following countries:
Activity 3: Examining solutions
Look into these websites
- https://shift.com/blog/news/the-carbon-footprint-of-the-internet/
- https://news.climate.columbia.edu/2018/08/27/growing-e-waste-problem/
and read about current strategies in place to tackle energy consumption of phones and how to properly deal with e-waste. After you are done, select all the solutions mentioned.
Now why did we want you to time your code. Think about it, you are using a sorting algorithm to identify an environmental problem – identifying top e-waste producers or energy consumers. But this task itself takes up a lot of energy as we previously discussed how time complexity relates to environmental cost. If the algorithm we choose is inefficient (regardless of what it is used for) especially on large datasets, it is contributing to the same problem we are trying to solve.
Reflection question: We are trying to identify areas of software design that contribute to environmental harm using tools that also contribute to it. What are the responsibilities of software engineers when creating programs?
References: