
Chapter 2: The Digital Research Alliance of Canada

Introduction

This chapter introduces the reader to the computational resources available from the Digital Research Alliance of Canada (DRAC) and looks at how to use the shell on a remote Linux “supercomputer” to run intensive software applications and code.

Research problems that rely on computing frequently outgrow the desktop or laptop computer where they started:

  • A statistics student wants to cross-validate their model. This involves running the model 1000 times – but each run takes an hour. Running on their laptop will take over a month!
  • A genomics researcher has been using small datasets of sequence data but soon will be receiving a new type of sequencing data that is 10 times as large. It’s already challenging to open the datasets on their computer – analyzing these larger datasets will probably crash it.
  • An engineer is using a fluid dynamics package that has an option to run in parallel. So far, they haven’t used this option on their desktop, but in moving from 2D to 3D simulations the runtime has more than tripled, so it may now be worth taking advantage of that feature.
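The statistics student’s numbers can be checked with a little arithmetic. The sketch below is an idealized estimate, assuming every run takes exactly one hour, the runs are completely independent of each other, and there is no queue waiting time or start-up overhead. The function name is hypothetical, and the 100-core figure is an illustrative assumption, not a property of any particular cluster.

```python
import math


def wall_clock_hours(n_runs: int, hours_per_run: float, n_cores: int) -> float:
    """Estimate wall-clock time for independent runs spread over n_cores.

    Idealized (hypothetical helper): every run takes the same time, runs are
    fully independent, and there is no queueing or start-up overhead.
    """
    # Runs proceed in "waves": each wave keeps up to n_cores cores busy at once.
    waves = math.ceil(n_runs / n_cores)
    return waves * hours_per_run


# The statistics student's cross-validation: 1000 runs of 1 hour each.
laptop = wall_clock_hours(1000, 1.0, n_cores=1)     # one run at a time
cluster = wall_clock_hours(1000, 1.0, n_cores=100)  # 100 cores is an assumed figure
print(f"1 core:    {laptop:.0f} h (~{laptop / 24:.1f} days)")
print(f"100 cores: {cluster:.0f} h")
```

Run one at a time, the 1000 runs take 1000 hours, or roughly 41.7 days; spread over 100 cores they would finish in about 10 hours. Real workloads rarely scale this perfectly, but the comparison shows why many independent runs suit a cluster so well.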

In all these cases, what is needed is access to more computers that can work on the problem at the same time. Luckily, large-scale computing systems – shared computing resources with many computers – are available at many universities and labs, or through national networks. These resources usually have more CPU (central processing unit) cores, which can operate at higher speeds, as well as more memory, more storage, and faster connections to other computer systems. They are frequently called “clusters”, “supercomputers” or resources for “high performance computing” (HPC). We will usually use the terms HPC and HPC cluster.

An HPC cluster is made up of many computers called nodes. These nodes are servers, that is, computers without a monitor or keyboard, packed into racks in large computer rooms. Most of the nodes are compute nodes, which are responsible for running your code and software. You do not use the compute nodes directly. Instead, you connect to one of a few login nodes using an SSH (Secure Shell) client. From a login node, you run commands that submit jobs to be run on the compute nodes; jobs are large, resource-intensive runs of code and software. These commands, which manage the work done on the compute nodes, are provided on the login nodes by the SLURM batch queue scheduling software.
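The division of labour between the login nodes, the scheduler and the compute nodes can be illustrated with a toy model. The sketch below is a hypothetical, greatly simplified first-come, first-served scheduler written only for illustration. It is not how SLURM actually decides where jobs run (real schedulers also weigh priorities, fair-share usage and backfilling), and the class and function names are invented for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    """A request submitted from a login node (hypothetical toy class)."""
    name: str
    cores: int


@dataclass
class ComputeNode:
    """A compute node with a fixed number of CPU cores (hypothetical toy class)."""
    name: str
    total_cores: int
    used_cores: int = 0

    def free_cores(self) -> int:
        return self.total_cores - self.used_cores


def schedule(queue: deque, nodes: list) -> dict:
    """Place queued jobs on nodes in strict first-come, first-served order.

    Returns a mapping of job name -> node name for the jobs that started.
    Jobs that could not be placed stay in the queue.
    """
    placements = {}
    while queue:
        job = queue[0]
        # Find the first node with enough free cores for the job at the head of the queue.
        node = next((n for n in nodes if n.free_cores() >= job.cores), None)
        if node is None:
            break  # The head job must wait, and every job behind it waits too.
        node.used_cores += job.cores
        placements[job.name] = node.name
        queue.popleft()
    return placements


nodes = [ComputeNode("node1", total_cores=4), ComputeNode("node2", total_cores=4)]
queue = deque([Job("a", 2), Job("b", 4), Job("c", 2), Job("d", 4)])
print(schedule(queue, nodes))
print([job.name for job in queue])
```

Here jobs a and c share node1 and job b fills node2. Job d asks for four cores, but every core is already in use, so it stays in the queue; on a real cluster such a job would show as pending until resources are freed.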

Using a cluster often has the following advantages for researchers:

  • Speed – With many more CPU cores than a typical laptop or desktop, often with higher performance specs, HPC systems can offer a significant speedup.
  • Volume – Many HPC systems have both the processing memory (RAM) and disk storage to handle very large amounts of data. Terabytes of RAM and petabytes of storage are available for research projects.
  • Efficiency – Many HPC systems operate a pool of resources that is drawn on by many users. When the pool is large and diverse enough, the system’s resources are in use almost constantly.
  • Cost – Bulk purchasing and government funding mean that the cost to the research community for using these systems is significantly less than it would be otherwise.
  • Convenience – Maybe your calculations just take a long time to run or are otherwise inconvenient to run on your personal computer. There’s no need to tie up your own computer for hours when you can use someone else’s instead.

The Digital Research Alliance of Canada (the Alliance) provides a variety of national clusters for Canadian academic researchers, their students and their collaborators. We will now look at how to get access to and use these clusters.


License


Introduction to Advanced Research Computing using Digital Research Alliance of Canada Resources Copyright © by Jazmin Romero; Roger Selzler; Nicholi Shiell; Ryan Taylor; and Andrew Schoenrock is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.