Ch. 2.2: Alliance National Clusters
Alliance Clusters
As of this writing, the Alliance national clusters are as follows:
- Béluga – a general purpose cluster located at École de Technologie Supérieure with 972 compute nodes totalling 38,880 cores and 190TB of RAM
- Cedar – a general purpose cluster located at Simon Fraser University with 2,272 compute nodes totalling 93,712 cores and 350TB of RAM
- Graham – a general purpose cluster located at University of Waterloo with 194 compute nodes totalling 8,568 cores and 42TB of RAM
- Narval – a general purpose cluster located at École de Technologie Supérieure with 1,390 compute nodes totalling 440TB of RAM
- Niagara – a cluster for large parallel jobs located at University of Toronto with 2,024 compute nodes totalling 80,960 cores and 320TB of RAM
Most clusters are general purpose in that they support a variety of workloads. The Niagara cluster, however, is designed to run primarily large parallel jobs that use many CPU cores at once. Unlike the other national clusters, access to Niagara is not available by default, but access can be enabled on request.
Connecting to a cluster
You will need a terminal application on your desktop or laptop to connect to a cluster. In our examples we use the Béluga cluster mentioned above, and we pretend our Alliance username is youruser. When you try these commands yourself, use your own username and select a current cluster from the list of Alliance national clusters. The hostname you connect to is the cluster name followed by .alliancecan.ca. For example, to connect to Béluga, you would use the hostname beluga.alliancecan.ca.
More details on using SSH to connect to Alliance clusters are available in the Alliance documentation.
Connecting from a Mac or Linux
If you are on a Mac or Linux, there is a built-in Terminal app that comes with your operating system. From the Terminal, you can run:
ssh youruser@beluga.alliancecan.ca
When you execute the ssh command with your username and the cluster hostname, you will be prompted to enter your password. The Alliance is implementing multi-factor authentication using the Duo mobile app. This will soon become mandatory; before then, it is recommended that you enroll in multi-factor authentication to improve your account’s security. Information on multi-factor authentication for your Alliance account is available in the Alliance documentation. When you successfully log in, you will see some messages and then be left with a shell prompt, similar to what you had before connecting. However, the prompt will show different information, such as the name of the login node and your current directory.
ssh youruser@beluga.alliancecan.ca
Success. Logging you in...
Last login: Mon Mar 10 14:26:52 2025 from 8.8.8.8
[Béluga welcome banner with Aide/Support, Globus Collection, Documentation, and Portail/Portal links]
[youruser@beluga2 ~]$
To stop the shell session on the remote server and return to your local terminal, use the exit command.
exit
Connecting from Windows
To connect from Windows, you may need to install a terminal or SSH app such as MobaXterm or PuTTY.
MobaXterm is useful because it also includes file transfer capabilities. To complete the initial setup of a MobaXterm session:
- Start MobaXterm, then click the Session button to open the Session settings dialog box
- Click the SSH icon along the top of the Session settings dialog box
- Type the national cluster hostname in the Remote host field, for example beluga.alliancecan.ca, select the Specify username checkbox, and type your Alliance username
- Click OK
- The MobaXterm login will ask you for your password. As you type your password nothing will appear on the screen, not even asterisks
- MobaXterm will connect to the cluster. MobaXterm will save your session. To reconnect, you do not need to fill out the session dialog again. Instead, choose the saved session
Transferring Files
Transferring using Globus
Globus is a high-performance data transfer tool. To use Globus to transfer files from your computer, you first need to install Globus Connect Personal.
Then you can log in to the Globus website and start a file transfer by following the Globus instructions. The transfer happens in the background, and you can log in to the website again to monitor its progress.
Transferring from Windows
If you are using MobaXterm to connect to a national cluster then you can use it to transfer files to the cluster. MobaXterm has a file management pane on its left side where you can drag files to and from your Windows folders.
Another option is the free app WinSCP.
Transferring from a Mac
To transfer files from your Mac, you can install and use a free graphical app such as FileZilla or Cyberduck.
You can also use the Mac’s built-in Terminal app and transfer files with terminal commands. Here is an example command to upload a file called example.txt to the server; replace the username and cluster name with your own:
scp example.txt youruser@beluga.alliancecan.ca:
To copy a file in the other direction, and download it to your computer:
scp youruser@beluga.alliancecan.ca:example.txt .
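scp can also copy an entire directory when given its standard -r (recursive) option. As a quick sketch, assuming a local folder named results (a placeholder name), you could upload it with:
scp -r results youruser@beluga.alliancecan.ca: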
Running jobs
When you run code or software on a cluster, we call that a job. You don’t directly start your job on a cluster. Instead, you submit a batch job that waits in a queue until hardware is available to run it.
This batch job is submitted to a scheduler. The scheduler used on national clusters is called SLURM. The following subsections describe how to run jobs using SLURM. Detailed reference information may be found in the Alliance documentation.
Submitting a Job
A job is described by a shell script that includes additional scheduling information in comments. The additional information includes the resources required to run the job. Resources can be things like amount of time, number of cores, amount of RAM, or GPUs.
Here is an example of a minimal job script:
#!/bin/bash
#SBATCH --time=00:01:00
echo "Hello World"
date
sleep 10
date
The SBATCH comment provides an option to the SLURM scheduler; this one requests one minute of run time. The rest of the script contains shell commands that print output and pause for ten seconds.
You should not run a job script directly on a login node. Instead, use the sbatch command to submit the script to the job queue.
For example, let us suppose that the above script is called firstjob.sh. To submit it to the scheduler, run:
sbatch firstjob.sh
Submitted batch job 28070080
The sbatch command shows the job number, which is 28070080 in this example. Each job has a unique number that you can use to refer to the job in other commands.
Until the job is completed, you can check its status using the squeue command. Without options, squeue lists all jobs for all users, so add an option to list only your own jobs:
squeue -u $USER
JOBID USER ACCOUNT NAME ST TIME_LEFT NODES
28070080 jdoe3 cc-debug_cpu firstjob.sh PD 1:00 1
In the above output, you can see the status PD in the ST column. This indicates the job is pending and has not started running yet. Once the job is running, its status changes to R:
squeue -u $USER
JOBID USER ACCOUNT NAME ST TIME_LEFT NODES
28070080 jdoe3 cc-debug_cpu firstjob.sh R 0:56 1
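If you want to check on one specific job rather than all of your jobs, squeue also accepts a standard -j option followed by the job number:
squeue -j 28070080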
After the job has completed, it is removed from the queue and no longer appears in the squeue output. Instead, you can use the sacct command to see completed jobs. Here is an example that limits the number of columns in the sacct output to make it easier to read:
sacct --format=JobID,JobName,MaxRSS,Elapsed
JobID JobName MaxRSS Elapsed
------------ ---------- ---------- ----------
28070080 firstjob.sh 00:00:11
28070080.ba+ batch 17964K 00:00:11
28070080.ex+ extern 0 00:00:11
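By default, sacct shows your recent jobs. To limit the output to a single job, you can add the standard -j option with the job number:
sacct -j 28070080 --format=JobID,JobName,MaxRSS,Elapsed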
SLURM saves a file with the output that an interactive job would have printed to the screen. Let’s use the cat command to print it out:
cat slurm-28070080.out
Hello World
Wed 01 Jan 2025 01:02:29 AM EST
Wed 01 Jan 2025 01:02:39 AM EST
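If a job is still running, you can watch its output file grow as SLURM writes to it by using the standard tail -f command (press Ctrl-C to stop following the file):
tail -f slurm-28070080.out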
Requesting Resources
The previous section demonstrated a short job script. The following job script has additional SBATCH options to request more resources:
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2G
#SBATCH --account=def-youruser
#SBATCH --output=%x-%j.log
echo "Hello World"
date
sleep 10
date
Here is the meaning of the SBATCH options in this example script:
- --ntasks=1: In this example a single task is set. Setting multiple tasks is used with multiprocessing for software that supports it. Tasks may be distributed across multiple compute nodes. Code that uses MPI is an example of multiprocessing.
- --cpus-per-task=1: The number of CPUs per task controls how many CPU cores are available in each task. These CPUs will be on the same node and are used with multithreading for software that supports it. Code that uses OpenMP is an example of multithreading (see the sketch after this list).
- --mem-per-cpu=2G: How much memory is needed per core. In this case, one core was requested, so this is a total of 2GB of memory.
- --account=def-youruser: Alliance projects are assigned a SLURM account to track resource use. Each PI has a default group shared with their sponsored members. If a user has access to more than one project, they must use the account option to choose which project a job will run under.
- --output=%x-%j.log: When you run code or software interactively from the command line, it prints output to the terminal. When that software runs as a cluster job, SLURM saves the output to a file instead. This option specifies the filename, where %x is replaced with the job name and %j is replaced with SLURM’s job ID number.
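As a minimal sketch of a multithreaded job, the script below requests four cores on one node and tells the software how many threads to use through the OMP_NUM_THREADS environment variable; SLURM sets SLURM_CPUS_PER_TASK to the value requested with --cpus-per-task. Here, ./your_openmp_program is a placeholder for your own OpenMP-based software:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --account=def-youruser
#SBATCH --output=%x-%j.log
# Match the thread count to the cores allocated by SLURM
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Placeholder: substitute your own multithreaded program
./your_openmp_program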
For a more detailed list, refer to the full listing of SLURM options in the SLURM documentation.
You need to request enough resources to satisfy what your job requires. For example, if you don’t request enough RAM, your job could be cancelled when it exceeds its memory allocation. Similarly, if you do not request enough time, your job will be cancelled when its time limit runs out.
However, if you request too much, the SLURM scheduler will have a more difficult time finding space for your job, so it could wait longer in the queue. Also, SLURM tracks resource use for each project, so requesting too much would unnecessarily increase your project’s recorded resource use.
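One way to judge how much memory and time a completed job actually used is the seff utility, which ships with many SLURM installations (including, to our knowledge, the Alliance clusters). Given a job number, it summarizes the job’s memory and CPU efficiency:
seff 28070080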
Cancelling a job
The scancel command removes a job from the scheduler’s pending queue or stops a running job. To cancel a job, first find its JOBID. This is printed when sbatch submits the job, and is also available using the squeue command, as in this example:
squeue -u jdoe3
JOBID USER ACCOUNT NAME ST TIME_LEFT NODES
28070081 jdoe3 cc-debug_cpu firstjob.sh PD 1:00 1
Then you run the scancel command and specify the JOBID for the job you wish to cancel:
scancel 28070081
After cancelling the job, it should no longer appear in the squeue list.
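If you have several queued jobs and want to cancel them all at once, scancel’s standard -u option cancels every job belonging to a user:
scancel -u $USER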
Using GPUs on a cluster
Most of the Alliance national clusters also include GPUs. As of this writing, the GPU models on Alliance national clusters are as follows:
- Béluga – 688 NVIDIA Tesla 16GB V100
- Cedar – 96 NVIDIA Tesla 12GB P100, 96 NVIDIA 16GB P100, 128 NVIDIA Tesla 32GB V100
- Graham – 16 NVIDIA Tesla 32GB V100, 144 NVIDIA T4 16GB Turing, 16 NVIDIA A100, 44 NVIDIA 24GB RTX A5000
- Narval – 636 NVIDIA Tesla 40GB A100
Adding an option to your SLURM job script allows you to request GPUs. Here is an example job that requests a GPU:
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --account=def-youruser
#SBATCH --output=%x-%j.log
#SBATCH --gpus-per-node=1
The gpus-per-node option determines how many GPUs are requested on each compute node that you are allocated. In this case there is a single task, so there can only be one node and a single GPU.
With the above job script, you would be allocated whatever model of GPU is available. If you need a certain type of GPU, you can specify the model. For example, to request one Tesla V100, you would use --gpus-per-node=v100:1
The list of currently available GPUs is in the Alliance documentation.
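To confirm which GPU model your job actually received, one option (assuming the GPU nodes provide NVIDIA’s standard nvidia-smi utility, which lists the visible GPUs) is to add this command at the end of your GPU job script:
nvidia-smi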
Some newer GPU models are quite large and often cannot be fully utilized by a single program, so they can be partitioned into pieces and shared by different users. These pieces are called Multi-Instance GPUs (MIG). Current information on requesting a MIG can be found in the Alliance documentation.
Because models of GPUs vary significantly in their performance, the Alliance has created a measure of GPU power called Reference GPU Unit (RGU). One RGU is equivalent to one Tesla 40GB A100.
Abbreviations
- TB – Terabytes
- RAM – Random Access Memory
- CPU – Central Processing Unit
- MPI – Message Passing Interface
- OpenMP – Open Multi-Processing
- PI – Principal Investigator
- MIG – Multi-Instance GPU
- RGU – Reference GPU Unit