"

Ch. 3.6: Limitations of Parallel Speedup

The impact of parallelization on a program can be measured using a metric called speedup. The speedup ([latex]S[/latex]) of a program is the ratio of its execution time when run on a single processor ([latex]T_1[/latex]) to its execution time when run on [latex]N[/latex] processors ([latex]T_N[/latex]):

[latex]S = \frac{T_1}{T_N}[/latex]     (Eqn. 1)
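
For example, if a program takes [latex]T_1 = 100[/latex] s to run on a single processor and [latex]T_4 = 40[/latex] s on four processors (hypothetical values chosen purely for illustration), its speedup is [latex]S = \frac{100\text{ s}}{40\text{ s}} = 2.5[/latex].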

Naively, one might assume that running a program designed using parallel design patterns on 2 cores would halve its run time, or that running it on 4 cores would require a quarter of the time. This is the theoretical upper limit on the achievable performance improvement and is called linear speedup. In practice it is rarely achieved, since there is almost always some portion of the program that can only be executed sequentially.

This effect is summarized by Amdahl’s Law, which demonstrates that the parallel speedup of any program is limited by the portion of the code that must be run sequentially. If the fraction of time spent running sequential code is [latex]FS[/latex] and the fraction of time spent running code which can be parallelized is [latex]FP[/latex], then the total time to run the program on a single core can be written as,

[latex]T_1 = (FS + FP) T_{exc}[/latex]

where [latex]T_{exc}[/latex] is the time required to execute the program on a single core. When run on [latex]N[/latex] cores, only the parallel portion of the code is affected, so the time to run the program will be,

[latex]T_N = (FS + \frac{FP}{N}) T_{exc}[/latex]
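
For example, suppose (again with purely illustrative values) that [latex]FS = 0.25[/latex], [latex]FP = 0.75[/latex] and [latex]T_{exc} = 100[/latex] s. On a single core the run time is [latex]T_1 = (0.25 + 0.75) \times 100\text{ s} = 100\text{ s}[/latex], but on four cores it only drops to [latex]T_4 = (0.25 + \frac{0.75}{4}) \times 100\text{ s} = 43.75\text{ s}[/latex], because the sequential portion is untouched.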

Plugging these values into Eqn. 1,

[latex]S = \frac{T_{exc} (FS + FP)}{T_{exc} (FS + \frac{FP}{N})}[/latex]

Since [latex]FS[/latex] and [latex]FP[/latex] are fractions of the total time, they sum to 1, and since [latex]T_{exc}[/latex] appears in both the numerator and the denominator, it can be cancelled out. This leaves the following:

[latex]S = \frac{1}{FS + \frac{FP}{N}}[/latex]
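
To see how this expression behaves, the short Python sketch below evaluates it for a range of core counts. The function name amdahl_speedup and the example serial fraction of 0.1 are illustrative assumptions, not values from the text.

def amdahl_speedup(fs, n):
    """Speedup predicted by Amdahl's Law for a serial fraction fs on n cores."""
    fp = 1.0 - fs               # parallel fraction; fs and fp sum to 1
    return 1.0 / (fs + fp / n)

# Example: a program whose runtime is 10% sequential and 90% parallelizable
for n in (1, 2, 4, 8, 16, 1024):
    print(f"N = {n:5d}   S = {amdahl_speedup(0.1, n):.2f}")

With these assumed values the speedup climbs from 1.0 toward, but never beyond, [latex]\frac{1}{0.1} = 10[/latex], which is exactly the limiting behaviour derived next.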

In the limit that [latex]N[/latex] goes to infinity, that is, if a large enough number of processors is applied to the program, the speedup becomes:

[latex]S = \frac{1}{FS}[/latex]

This demonstrates that the parallel speedup of any program is ultimately limited by the fraction of time spent performing operations that must be done sequentially.
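
For example, even if 95% of a program’s work can be perfectly parallelized ([latex]FS = 0.05[/latex]), its speedup can never exceed [latex]\frac{1}{0.05} = 20[/latex], no matter how many processors are used.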

There are other practical considerations that Amdahl’s Law does not account for. These were discussed in the section on hardware bottlenecks and overheads. For instance, adding more worker threads to the problem will not reduce the network latency which slows down a program running on a distributed cluster. In fact, having more processes running could increase the communication overhead.
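
As a rough sketch of this effect (an assumed extension, not part of Amdahl’s Law itself), the Python fragment below adds a hypothetical communication cost that grows linearly with the number of processes; the function name, the overhead term, and the chosen values are illustrative assumptions only.

def speedup_with_overhead(fs, n, comm_cost=0.01):
    """Illustrative model: Amdahl's Law plus an assumed communication
    overhead that grows linearly with the number of processes.
    comm_cost is the assumed per-process overhead as a fraction of T_1."""
    fp = 1.0 - fs
    t_n = fs + fp / n + comm_cost * n   # normalized run time on n cores
    return 1.0 / t_n

# With these assumed values the printed speedup is highest at N = 8 and
# then declines as more processes are added, instead of approaching 1 / fs.
for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"N = {n:3d}   S = {speedup_with_overhead(0.1, n):.2f}")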

License


Introduction to Advanced Research Computing using Digital Research Alliance of Canada Resources Copyright © by Jazmin Romero; Roger Selzler; Nicholi Shiell; Ryan Taylor; and Andrew Schoenrock is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.