"

Ch. 3.1: The Anatomy of a Computing Cluster


Figure. The components of a computing cluster. Each level has a one-to-many relationship with the level below it; for example, the cluster is composed of many nodes. The dark gray levels represent components of the cluster that exist in the real world as hardware, whereas the light gray levels exist only as software constructs.

The components of a High Performance Computing (HPC) cluster can be thought of as a hierarchy in which each level is composed of one or more instances of the level below it. At the top of the hierarchy is the cluster itself, which encompasses the entire system. It is composed of many interconnected computers (nodes) that communicate via a high-speed network. The cluster could be contained in a single room or distributed across a wide geographical area.
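The one-to-many relationships in this hierarchy can be sketched as nested data structures. The following is a minimal illustration only: the class names (`Cluster`, `Node`, `Processor`), the `core_count` helper, and the node and core counts are all hypothetical and do not describe any real system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Processor:
    name: str
    cores: int  # number of cores on this (hypothetical) processor


@dataclass
class Node:
    hostname: str
    processors: List[Processor] = field(default_factory=list)

    def core_count(self) -> int:
        # A node's cores are the sum of the cores of its processors.
        return sum(p.cores for p in self.processors)


@dataclass
class Cluster:
    name: str
    nodes: List[Node] = field(default_factory=list)

    def core_count(self) -> int:
        # A cluster's cores are the sum of the cores of its nodes.
        return sum(n.core_count() for n in self.nodes)


# A made-up cluster: 3 nodes, each with 2 processors of 16 cores.
toy_cluster = Cluster(
    name="toy",
    nodes=[
        Node(
            hostname=f"node{i}",
            processors=[Processor("cpu0", 16), Processor("cpu1", 16)],
        )
        for i in range(3)
    ],
)

print(toy_cluster.core_count())  # 3 nodes * 2 processors * 16 cores = 96
```

Each level simply holds a list of the level below it, which mirrors the structure shown in the figure.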

The next level down is the node. A node is a single computer or server within the cluster that contains its own processors, memory, and storage, and typically runs its own instance of an operating system. It could be either a physical server or a virtual machine. There are various types of nodes, such as compute nodes (which perform computations) and storage nodes (which handle data storage and retrieval). Each compute node contains one or more processors, each of which in turn contains one or more cores. The processors within a node can communicate efficiently with each other because they do not need to go over the cluster network. These processors could be standard CPUs or more specialized processors such as GPUs, TPUs, or FPGAs. Each core within a standard CPU consists of an Arithmetic Logic Unit (ALU), a Control Unit, and registers (small, fast storage inside the core), allowing it to run instructions independently. The core can be viewed as the fundamental unit of computation.
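As a small illustration, a program can ask the operating system about the node it is running on. The sketch below uses only the Python standard library; the `describe_node` function name is made up for this example. Note that `os.cpu_count()` reports logical cores, which may differ from the number of physical cores, and on a shared cluster a job may be allowed to use only some of them.

```python
import os
import platform
import socket


def describe_node() -> dict:
    """Collect a few facts about the node this script is running on."""
    return {
        "hostname": socket.gethostname(),
        "operating_system": platform.system(),
        # os.cpu_count() reports logical cores and may return None,
        # in which case we fall back to 1.
        "logical_cores": os.cpu_count() or 1,
    }


info = describe_node()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this on two different nodes of a cluster would typically print two different hostnames, since each node is its own computer.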

The two remaining levels of the hierarchy, below the cluster, nodes, and processors/cores, are not physical pieces of hardware but instead exist virtually as software constructs. The first of these virtual components is the process. A process is an independent instance of a running program, created by the operating system, with its own memory space, system resources, and execution state. Each process contains one or more threads. If a process is single-threaded, it uses only a single core at a time. However, if it is multi-threaded, it can run on multiple cores within a single node. This is an example of shared-memory programming. Alternatively, a single program could be written to run as multiple processes, potentially across multiple nodes. This is an example of distributed programming. Each process in a multi-process program could itself be either single-threaded or multi-threaded. Which of these cases applies dictates the programming framework used to develop the program. For shared-memory programs OpenMP could be used, whereas MPI is commonly used for distributed programs. More detail about the usage of these packages will be given later.
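The idea of one program running as several independent processes can be sketched with the Python standard library. This is an illustration under stated assumptions, not MPI: the `run_children` helper and `CHILD_CODE` string are hypothetical names, all processes run on one node, and results are returned over standard output as a stand-in for the explicit message passing that separate memory spaces require.

```python
import os
import subprocess
import sys

# Code each child process runs: square a number and report its own PID.
CHILD_CODE = "import os, sys; n = int(sys.argv[1]); print(os.getpid(), n * n)"


def run_children(values):
    """Start one Python process per value and collect (pid, result) pairs."""
    # Launch all children first so they can run at the same time.
    children = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, str(v)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for v in values
    ]
    results = []
    for child in children:
        out, _ = child.communicate()  # wait for the child and read its output
        pid, squared = out.split()
        results.append((int(pid), int(squared)))
    return results


results = run_children([1, 2, 3, 4])
print("parent PID:", os.getpid())
for pid, squared in results:
    print("child PID:", pid, "result:", squared)
```

Each child has its own process ID and its own memory, so the parent only learns a child's result because the child explicitly sends it back.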

A thread is the smallest unit of execution within a process. It is a serial sequence of instructions that is treated as a single unit for the purposes of scheduling by the OS and execution on a core. Multiple threads within a process share the same memory space, which allows threads to communicate with each other more quickly than separate processes can.
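Shared memory between threads can be sketched with Python's `threading` module. In this minimal example (the `shared` dictionary and `add_range` function are hypothetical names), four threads add partial sums into one variable that lives in the memory of the process. One caveat: in standard CPython builds the global interpreter lock prevents threads from executing Python code on several cores at once, so this sketch illustrates shared memory rather than a speedup.

```python
import threading

# Lives in the memory of the process, so every thread can see and update it.
shared = {"total": 0}
# Protects the shared total from simultaneous updates by several threads.
total_lock = threading.Lock()


def add_range(start: int, stop: int) -> None:
    """Sum the integers in [start, stop) and add the result to the shared total."""
    partial = sum(range(start, stop))
    with total_lock:
        shared["total"] += partial


# Split the range 0..999 into four chunks, one per thread.
chunks = [(0, 250), (250, 500), (500, 750), (750, 1000)]
threads = [threading.Thread(target=add_range, args=chunk) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["total"])  # 0 + 1 + ... + 999 = 499500
```

No data is copied or sent between the threads; each one writes directly into the same variable, which is why a lock is needed to keep the updates from interfering with each other.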

The following table summarizes the most relevant terminology from this section.

| Component | Summary |
| --- | --- |
| Cluster | Group of nodes connected via a network. |
| Node | Individual computer with its own processors (e.g., CPU, GPU), memory, and OS. |
| Processor / Core | A CPU or GPU inside a node; each typically has multiple cores for execution. |
| Process | Independent instance of a running program, with its own memory space; runs on one or more cores. |
| Thread | A sequence of instructions that is treated as a single unit for the purposes of scheduling and execution. |


License


Introduction to Advanced Research Computing using Digital Research Alliance of Canada Resources Copyright © by Jazmin Romero; Roger Selzler; Nicholi Shiell; Ryan Taylor; and Andrew Schoenrock is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.