Ch. 3.5: Hardware Bottlenecks and Overhead

Jazmin Romero; Roger Selzler; Nicholi Shiell; Ryan Taylor; Andrew Schoenrock


As previously described in the first section of this chapter, a computing cluster is composed of many hardware and software components. The previous sections described how parallel programming design patterns can be used to improve the overall performance of a program at the software level. However, the other components of the cluster, namely the hardware components, also have limitations that affect performance. When the hardware is limiting the performance of a program, the limiting component is referred to as a hardware bottleneck. There are five primary hardware bottlenecks: network latency, storage I/O, memory limitations, processor speed, and thermal constraints. Not all of these bottlenecks will affect the overall performance of every HPC project; however, each should be ruled out when attempting to optimize a given parallel program.

In a distributed computing environment (i.e., an environment in which a program's processes run on different nodes), nodes must communicate with each other over a network connection. The delay incurred when sending data over the network is called network latency. If these delays become too large, they can form a bottleneck and constrain overall performance, particularly for programs that exchange many small messages.
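
One common way to observe latency is a "ping-pong" test: one side sends a tiny message and waits for the reply, so the time per exchange is dominated by latency rather than by bandwidth. The minimal sketch below assumes both ends run on the same machine over the loopback interface (127.0.0.1) so that it runs anywhere; the function names are hypothetical, and loopback latency is far lower than the latency between two cluster nodes.

```python
import socket
import threading
import time


def echo_server(server_sock: socket.socket) -> None:
    """Accept a single client and echo each message back until it disconnects."""
    conn, _ = server_sock.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(64)
            if not data:  # b"" means the client closed the connection
                break
            conn.sendall(data)


def measure_round_trip(n_messages: int = 1000) -> float:
    """Return the mean round-trip time, in seconds, of a 1-byte TCP message."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(server_sock,), daemon=True)
    server.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        # Disable Nagle's algorithm so each tiny message is sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(n_messages):
            client.sendall(b"x")
            client.recv(1)  # wait for the echo before sending the next message
        elapsed = time.perf_counter() - start

    server.join()
    server_sock.close()
    return elapsed / n_messages


if __name__ == "__main__":
    rtt = measure_round_trip()
    print(f"mean round-trip time: {rtt * 1e6:.1f} microseconds")
```

Running the echo server on one node and the client on another (an assumed setup, not shown here) would expose the real inter-node latency that this loopback version hides.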

Reading and writing data on a hard drive is a relatively expensive operation in terms of time, and this can cause a storage I/O (input/output) bottleneck. This can occur when a program reads data from the disk in preparation for analysis or when it writes results to the disk. These bottlenecks can be further compounded when a Network File System (NFS) is used, because the time required to access the disk is increased by the network latency discussed previously. Other types of I/O bottlenecks also exist. For instance, if data needs to be analyzed in real time from several sensors, the sensor data would have to be read from a port, such as a serial port, which could itself form a bottleneck.
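
To make the cost of a trip through storage concrete, the sketch below compares summing data already held in RAM with first writing it to a file and reading it back. The function names and the optional `directory` parameter are illustrative assumptions; pointing `directory` at an NFS-mounted path of your own choosing would add network latency to each access. Because the operating system usually keeps recently written data in its page cache, the read-back here may not touch the disk at all, so this likely understates the cost of reading cold data.

```python
import array
import os
import tempfile
import time
from typing import Optional


def sum_in_memory(values: array.array) -> float:
    """Sum data that is already resident in RAM."""
    return sum(values)


def sum_via_disk(values: array.array, directory: Optional[str] = None) -> float:
    """Write the data to a temporary file, read it back, and then sum it.

    `directory` chooses where the file is created (None means the system temp dir).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            values.tofile(f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes from the OS cache to the storage device
        loaded = array.array(values.typecode)
        with open(path, "rb") as f:
            loaded.fromfile(f, len(values))
        return sum(loaded)
    finally:
        os.remove(path)


if __name__ == "__main__":
    data = array.array("d", (float(i) for i in range(2_000_000)))
    for fn in (sum_in_memory, sum_via_disk):
        start = time.perf_counter()
        total = fn(data)
        print(f"{fn.__name__}: total={total:.0f} in {time.perf_counter() - start:.3f} s")
```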

The memory available to a process can also form a bottleneck, referred to as a memory limitations bottleneck. This type of bottleneck refers to the Random Access Memory (RAM) available to the process. If there is insufficient capacity to store the required data, or if access is slow, memory can prevent the system from quickly reaching the data it needs. Inefficient use of the various caches discussed previously can also form a bottleneck, for instance when data stored in RAM is processed in a random order rather than sequentially.
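
The effect of access order on cache use can be sketched by visiting the same list twice, once sequentially and once in a shuffled order; both traversals do identical arithmetic, so any difference in run time comes mostly from how well the memory accesses use the caches. The function name is hypothetical, and the list size is an assumption chosen to exceed typical cache sizes. Python's interpreter overhead masks part of the effect; in a compiled language such as C the gap is usually more pronounced.

```python
import random
import time


def traverse(values: list, order: list) -> int:
    """Sum values[i], visiting the indices in the given order."""
    total = 0
    for i in order:
        total += values[i]
    return total


if __name__ == "__main__":
    n = 2_000_000
    values = list(range(n))
    sequential = list(range(n))
    shuffled = sequential[:]
    random.shuffle(shuffled)  # same indices, random visiting order
    for name, order in (("sequential", sequential), ("random", shuffled)):
        start = time.perf_counter()
        total = traverse(values, order)
        print(f"{name:>10}: sum={total} in {time.perf_counter() - start:.3f} s")
```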

A processor bottleneck occurs when the processor is not fast enough to keep up with the amount of data being processed, so it limits the speed of the computations. This type of bottleneck can be particularly problematic for a program designed using Task Parallelism. If one of the tasks in the pipeline is particularly processing-intensive, it will form a bottleneck that decreases the speed at which the entire program executes, because every item must pass through that slowest task.
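
A small simulation shows why one slow stage limits a whole pipeline. The sketch below assumes an idealized pipeline: one worker per stage, a fixed time per item at each stage, all input available at the start, and unlimited buffering between stages. The function name and stage times are hypothetical.

```python
from typing import Sequence


def pipeline_makespan(stage_times: Sequence[float], n_items: int) -> float:
    """Time for n_items to pass through a linear pipeline.

    Assumes one worker per stage, a fixed time per item at each stage,
    all input available at time 0, and unbounded buffers between stages.
    """
    finish = [0.0] * len(stage_times)  # when each stage finished its latest item
    for _ in range(n_items):
        ready = 0.0  # when the current item becomes available to the next stage
        for s, t in enumerate(stage_times):
            # A stage starts an item once it is free AND the item has left the previous stage.
            start = max(finish[s], ready)
            finish[s] = start + t
            ready = finish[s]
    return finish[-1]


if __name__ == "__main__":
    n = 100
    print(pipeline_makespan([1, 5, 1], n))         # 502.0: the 5-unit stage dominates
    print(pipeline_makespan([0.5, 5, 0.5], n))     # 501.0: speeding up fast stages barely helps
    print(pipeline_makespan([1, 2.5, 2.5, 1], n))  # 254.5: splitting the slow stage nearly halves it
```

Under these assumptions the result equals `sum(stage_times) + (n_items - 1) * max(stage_times)`, which makes explicit that for many items the slowest stage sets the pace of the entire program.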

Finally, thermal constraints can also cause bottlenecks. Processors (CPUs and GPUs), as well as memory and storage devices, generate heat as they run, and if this heat is not dissipated it can damage the equipment. To prevent damage, the hardware may be throttled, meaning it is not allowed to run at peak performance, so that the equipment can cool down.
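
On Linux, one way to check for throttling is to watch the clock frequency and temperature while a job runs: a rising temperature together with a falling clock under sustained load suggests the hardware is being throttled. The sketch below assumes the standard sysfs files shown exist and are readable; they may be absent on other operating systems or inside some virtual machines and containers, and thermal zone 0 is not guaranteed to be the CPU. The helper names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

# Standard Linux sysfs files; they may be missing on some systems,
# and zone 0 is an assumption (it is not always the CPU sensor).
CPU0_FREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")  # kHz
THERMAL_ZONE0 = Path("/sys/class/thermal/thermal_zone0/temp")  # millidegrees Celsius


def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a sysfs file, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def sample() -> dict:
    """One reading of core 0's current clock (MHz) and thermal zone 0 (degrees C)."""
    freq_khz = read_int(CPU0_FREQ)
    temp_milli = read_int(THERMAL_ZONE0)
    return {
        "cpu0_mhz": None if freq_khz is None else freq_khz / 1000,
        "zone0_celsius": None if temp_milli is None else temp_milli / 1000,
    }


if __name__ == "__main__":
    # Run this alongside a heavy job and watch both values over time.
    for _ in range(5):
        print(sample())
        time.sleep(1)
```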

In addition to these hardware bottlenecks, there are also effects known as “overheads” that contribute to the inefficiency of a parallel program. Overheads are any operations or resource usage that do not directly contribute to the computational task being performed. These include communication overhead, which is the time spent moving data between nodes, cores, and memory, and synchronization overhead, which occurs when a thread or process must wait for another to finish. Both of these overheads are closely related to the hardware bottlenecks discussed above. However, a third type of overhead, context switching, has no direct connection to hardware bottlenecks. Context switching occurs when the OS creates a new process/thread or pauses one process/thread to work on another. It produces overhead because the OS must save the state of the current process/thread and restore the state of the new one.
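
Synchronization overhead can be illustrated by counting with several threads in two ways: taking a shared lock for every single increment, or counting privately and taking the lock only once per thread. Both produce the same answer, but the first spends most of its time acquiring and releasing the lock. The function names are hypothetical. In standard CPython builds the global interpreter lock prevents these threads from executing Python code simultaneously, so this sketch isolates the cost of lock handling (and of starting threads) rather than demonstrating a parallel speed-up.

```python
import threading
import time
from typing import Callable


def run_threads(worker: Callable[[], None], n_threads: int) -> None:
    """Start n_threads copies of worker and wait for all of them to finish."""
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def count_lock_per_item(n_threads: int, n_per_thread: int) -> int:
    """Each increment acquires the shared lock: synchronization on every step."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(n_per_thread):
            with lock:
                total += 1

    run_threads(worker, n_threads)
    return total


def count_local_then_merge(n_threads: int, n_per_thread: int) -> int:
    """Each thread counts privately and takes the lock once to publish its result."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        local = 0
        for _ in range(n_per_thread):
            local += 1
        with lock:
            total += local

    run_threads(worker, n_threads)
    return total


if __name__ == "__main__":
    for fn in (count_lock_per_item, count_local_then_merge):
        start = time.perf_counter()
        result = fn(4, 200_000)
        print(f"{fn.__name__}: {result} in {time.perf_counter() - start:.3f} s")
```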

The following table summarizes the differences between hardware bottlenecks and overheads.

Aspect	Hardware Bottlenecks	Overheads
Definition	A limiting factor that slows down performance	Extra computational effort required to complete a task
Effect	Causes underutilization of other hardware components	Increases resource consumption without improving performance
Examples	Slow CPU, limited memory bandwidth, network congestion	Cache management, interconnect delays, synchronization costs
Solutions	Upgrade hardware, balance workloads, optimize software	Reduce inefficiencies, improve hardware design, optimize algorithms

License

Introduction to Advanced Research Computing using Digital Research Alliance of Canada Resources Copyright © by Jazmin Romero; Roger Selzler; Nicholi Shiell; Ryan Taylor; and Andrew Schoenrock is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
