Understanding the BBC Bioinformatics Facility Cluster (and SGE)

The Cluster

What is a Cluster? A desktop or a laptop, in most cases, is inadequate for analysis of large scale datasets (e.g. Genomics, transcriptomics) or to run simulations (e.g. protein dockings, weather forecast). They lack both the processing power and the RAM to carry out these analyses. This limitation can be overcome by joining many of these machines (computers) in a predefined architecture/configuration so that they act as a single unit with enhanced computational power and enabled to share resources. This is the basic concept of a high performance cluster. A cluster consists of a set connected computers that work together so that they can be viewed as a single system. Each computer unit in a cluster is referred as ‘node’.

The components of a cluster are usually connected through fast local area networks (“LAN”), with each node running its own instance of an operating system. Clusters improve performance and availability of a single computer. The benefits of clusters include low cost, elasticity and being able to run jobs anytime, anywhere.

Our Cluster The BBC Bioinformatics Facility Dell cluster consists of numerous nodes, each of which is a separate machine with its own hard drive and 2 processors, each having 4 cores. Normally, a single job requires one core, so a single node could run 8 separate analyses simultaneously. A job is always started on the head node, which is the node you are using when you log in to the cluster. Your job may or may not run on the head node; it is up to the scheduler software to decide where your job will run. The software that manages your run is known as the Sun Grid Engine, or SGE for short. It schedules jobs to run in a cluster environment and utilizes system resources in the most efficient way possible. SGE consists of several programs (qsub, qstat, etc.) that you will use to get your run started and check in on its progress.

Connecting to the Cluster

Xming/Xquartz: (optional)

In order to display graphics from the cluster, we need a software that allows one to use Linux graphical applications remotely. Xming and Xquartz are the display options available for windows and mac respectively. (Download and install it on your local computers)

Windows: Xming

Mac: Xquatz

NOTE: Start the X Server on your machine (Xming/Xquartz), each time you reboot your PC or whenever you want to use X Windows. Once enabled, Xming will appear in your system tray as a black X with an orange circle around the middle.

To log-in to the head node of the cluster run the command Mac or Linux terminal:

ssh your_username@bbcsrv3.biotech.uconn.edu

or

ssh –X your_username@bbcsrv3.biotech.uconn.edu

-X is to enable X11 forwarding to enable graphics display from cluster.

Putty Configuration

Windows users will need to use an SSH client to connect to the cluster. Please refer to the PuTTY guide on tutorial section.

Install Putty and configure for use.

Putty Configuration steps.

Open Putty it will open window1.

Provide host name e.g. your_username@bbcsrv3.biotech.uconn.edu
Expand SSH tab and select X11 (shown in window2)
Enable X11 forwarding by selecting it. (window2)
Scroll up the left panel and select Session.(window1)
Name your session e.g. BBC_cluster and click save tab to save.
Your session name should appear in saved sessions.

Double click on your session name to connect to server with SSH session.

SSH logins from outside the UConn domain SSH logins from outside the UConn domain (UConn-Storrs, UConn-Avery Point, UCHC) must now use the UConn VPN client prior to connecting through your usual SSH client.

Windows and OS X: Junos Pulse:

Please see http://remoteaccess.uconn.edu/vpn-overview/connect-via-vpn-client/ for instructions on setting up a Junos Pulse VPN client.

Linux: Openconnect (ex. using Ubuntu):

Step 1: Install required libraries and tools
sudo apt-get install vpnc
sudo chmod a+x+r -R /etc/vpnc
sudo apt-get install libxml2 libxml2-dev gettext make libssl-dev pkg-config libtool autoconf git


Step 2: Download, build and install openconnect
git clone git://git.infradead.org/users/dwmw2/openconnect.git
cd openconnect
autoreconf -iv
./configure
make
sudo make install


Step 3: Connect to the VPN over terminal
openconnect --juniper sslvpn.uconn.edu

Now openconnect should prompt you to enter your netID and password. Upon successful login, openconnect should confirm an ssl connection has been established. Once this happens, create a new terminal window and enter the ssh command you use to connect to the server.

BBC Cluster layout

From user perspective there are 5 important components and each one with specific properties.

Head Node (bbcsrv3): Head node is the gateway for the users to communicate with the cluster. The principal role of Head node is to provide management and job scheduling services to the cluster.
- NOTE: HEAD NODE SHOULD NOT BE USED FOR COMPUTATIONAL PURPOSES. IF CLUSTER BECOMES SLOW OR NON RESPONSIVE DUE TO JOBS RUNNING ON HEAD NODE, SUCH JOBS WILL BE TERMINATED WITHOUT WARNING.
Login Nodes and compute nodes: User can login on these nodes using ‘qlogin’ command and can have interactive sessions. They are ideal for interactive sessions for testing scripts. To use them type “qlogin’ at command prompt and “exit” when you want to end your session.
Fast connection directories: Your home directory, /common , /tempdata3 and /tempdata2 are mounted/connected on the cluster in a configuration such that they are optimized for fast data access by compute nodes and head node./$HOME : Your home directory is relatively small space and may be suffice only to hold small amount of data. Use it wisely.
- /common : This directory holds databases (http://bioinformatics.uconn.edu/databases/ ) and bioinformatics softwares. Avoid using it at all cost. It’s a limited space and is required for storing common resources.
- /tempdata3: This directory is designed for users to store their files while they are performing analysis on them.
  - NOTE: THIS IS MENT FOR TEMPORARLY HOLDING YOUR FILES WHILE THEY ARE USED IN ANALYSIS. THIS DRIVE IS NOT BACKED UP AND A CRASH MAY LEAD TO LOSS OF DATA BEYOND RECOVERY. IT IS WISE TO MOVE YOUR FILES AS SOON AS THEIR USE IS OVER.
- /tempdata2: This disk space smaller than the /tempdata3, but it can also can be used by users
/archive : This drive has big storage capacity and is backed up at two other centers. This is ideal for long term storage of data files and archiving. However due to its mount and network connection to cluster, accessing data is not very fast.SOLUTION: Copy your files to /tempdadta3 for analysis and after you are done copy results back to archive and clear space (on /tempdata3) for other users.

Cluster Etiquette

Never run anything on the head node of a cluster.
Keep track of what you are running and refrain from using all nodes at the same time.
Run and write your output files on scratch drive or /tempdata3. Then copy the file back to your home directory or /archive.

Monitoring User Quota

To check how much space is used/available on your home directory:

quota -u username

Interacting with SGE

Once a you logged on BBC cluster, by default you are directed to your home directory. At this point you are still on head node and any command or computation you perform is executed on head node, something that we do not recommended. For computing purpose user has two options

Running an interactive session (qlogin or qrsh).
Submitting a script (qsub).

Interactive session:

An interactive job starts when you log onto the system and ends when you log off. Even the activities on head node is an interactive session. In interest of all users, it is recommended that for all interactive sessions users should logon to nodes, with help of following commands

qlogin
qrsh

Once you type qlogin you get logged on to one of the nodes (e.g.: in it is compute-1-0 node)

$ qlogin
Your job 236491 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 236491 has been successfully scheduled.
Establishing /opt/gridengine/bin/rocks-qlogin.sh session to host compute-1-0.local ...
Last login: Wed Feb  1 17:37:27 2017 from bbcsrv3.local
Rocks Compute Node
Rocks 6.1 (Emerald Boa)
Profile built 09:01 01-Nov-2016
Kickstarted 09:17 01-Nov-2016
[user@compute-1-0 ~]$

qrsh will connect to node (e.g. in this case it is compute-1-16 node)

$ qrsh
[user@compute-1-16 ~]$

‘exit’ command will move you back on head node.

If resources are available, the job gets started immediately. Otherwise, the user is informed about the lack of resources and the job gets aborted. Jobs are always passed onto the available executing hosts. Records of each job’s progress through the system are kept and reported when requested (qstat). qrsh is similar to qlogin by submitting an interactive job to the queuing system. However, qrsh establishes a rsh connection with the remote host. qlogin uses the current terminal for user I/O.

SSH directly into a specific node on the cluster

First, view the current load on the cluster with the qhost command. Then determine which nodes satisfy your computational needs (number of CPU, MEMTOT). Of those nodes, select the node that has the lowest current LOAD and MEMUSE.

After a node has been selected connect to the node with the terminal command:

ssh HOSTNAME # where HOSTNAME is the hostname of the node

For example, if the screenshot below represented the current load on the server, and a highpri queue was required, then compute-2-2 or compute-2-3 would be selected since they are both empty.

Batch Job A batch job requires little or no interaction between the user and the system. It consists of a predefined group of processing actions. If resources are available, job gets started immediately. Otherwise, the user is informed about the lack of resources and job gets abandoned.

Composing Scripts

A sample template of script and example scripts are at the end of this document to help you compose your own.

Some key points to consider:

Depending on the computational resource required by your software, one of the queues on the cluster has to be specified. The decision relies on the fact whether the task is CPU or memory intensive.

Software’s that creates large tmp files and hold them in RAM will often be memory intensive e.g. (k-mer estimation, transcriptome assemblies, genome assemblies etc). Other tasks are generally CPU intensive and specifying more cores will speed up the process (e.g, Bowtie, STAR, RSEM, SICKLE, TOPHAT etc).

Queue on cluster and their status:

$ qstat –g c

CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.41     78      0     54    132      0      0 
highmem.q                         0.44     64      0      0     64      0      0 
highpri.q                         0.65     64      0      0     64      0      0 
lowpri.q                          0.65      0      0      0     64     64      0 
molfind.q                         0.00      0      0     36     36      0      0

Column 1: Queue Name
Column 3: No. processors in use.
Column 5: Processors available to use.
Column 6: Total processors designated to the queue.

All queue: This is the default queue for a job submitted using qsub and when queue is not specified.

Highpri queue: This queue is not available to the general user as the nodes were purchased using funds from outside the BBC. The queue can be used to run priority jobs. Specify the use of this queue with the following option in an SGE script:

#$ -q highpri.q

The total memory available is 63 gigabytes on each compute node.

Lowpri queue: This queue is available for general use, runs on the same four nodes as the highpri.q, but can only run single-threaded jobs. In addition, jobs running in the lowpri.q can be preempted by jobs in the highpri.q. Specify the use of this queue with the following option in an SGE script:

#$ -q lowpri.q

The total memory available is 63 gigabytes on each compute node.

Highmem queue: This queue should be used if a job requires more memory and CPU than offered by the regular queue. Specify the use of this queue with the following option in an SGE script:

#$ -q highmem.q

Please note that this queue limits each user to a maximum of 16 threads. The total memory available is 505 gigabytes.

DETAILS OF NODES AND QUEUE ASSOCIATED

Submitted Jobs

NOTE: Please make sure that the software you are running supports multi-threading. Usually this information is available in manual. If it does not, then it will not matter how many cores you specify, the software will still be running on one core and other cores will be idle which could have been used by other user.

Once you have composed your script please submit it using qsub.

QSub qsub is used for both single and multiple node jobs:

qsub scriptname.sh

Job Status A submitted job will;

Wait in the queue
Will be executed
Be completed and leave the SGE scheduling system

qstat

This can be used if the job is in either (1) or (2) state. The command will inform the user if the job is still waiting or if it has started being executed. For state (1): SGE treats every node as a queue. If resources are unavailable, the job will not be executed. Once the resources do become available, the job will leave the queue and move on to state 2. qstat will provide information about the state of the job. qdel can be used to delete jobs from the queue. For state (2):

Once the job is submitted, check its status using qstat.

qstat      
# this will list the jobs submitted by the user and their status. Below is the example where qstat lists the 2 jobs submitted by user vsingh.

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 236976 0.50500 dwnld_1    vsingh       r     02/14/2017 09:43:20 all.q@compute-1-3.local            1        
 237005 0.60500 rep_us     vsingh       r     02/14/2017 11:50:00 highpri.q@compute-2-3.local       16

Column 1: JobID a unqiue ID to identify the job
Column 3: Name of the Job
Column 4: Job owner’s username
Column 5: Status of the job.
           r  : Running
           R  : Restarted
           qw : Waiting in queue 
           t  : Transferring
           dr : Job ending
           s,S: Suspended
           eqw: Error queuing job
Column 6: Job submission time
Column 7: Queue in which job is running
Column 8: No of cores used by the job

Tip:

Obtaining error from a qsub job in ‘eqw’ status. This will help you troubleshoot the problem.

qstat –j jobID | grep error

This is used to monitor the job’s status (e.g. time and memory consumption)

qstat –j job_number

This will give time and memory consumed information For state (3):

qstat -j job_number | grep mem

This is the only command that may be able to tell you about the past jobs by referring to a database of past usage.

qstat -j job_number

The following are variations on the qstat command:

qstat -f   #full listing for all users

qstat -u username   #specific to user

qstat -f -u username   #detailed information

Job Removal This command will remove the specified jobs that are waiting to be run, or kill jobs that are already running.

qdel job number

List of jobs:

qdel job number1 job number2 job number3

All jobs running or queueing under a given username:

qdel -u username

Monitoring cluster usage

qhost

This command is used to view the load on SGE. The output includes the architecture (ARCH), the number of CPUs (NCPU), the current load (LOAD), the total memory (MEMTOT), the memory currently in use (MEMUSE) and the swap space (SWAPTO) for each node. To show the queues available for each node, add the parameter -q. To show the jobs currently running on each node, add the parameter -j.

HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
bbcsrv3                 linux-x64       8 28.84   35.4G   28.0G    2.0G    2.0G
compute-1-0             linux-x64       8  3.39   31.5G  672.9M    2.0G    1.6M
compute-1-1             linux-x64       8 10.01   31.5G    1.5G    2.0G  410.8M
compute-1-10            linux-x64       8  3.55   31.5G   14.6G    2.0G    1.9G
compute-1-11            linux-x64       8  2.07   31.5G   15.6G    2.0G   14.4M
compute-1-12            linux-x64       8  6.06   31.5G    1.8G    2.0G   16.2M
compute-1-13            linux-x64       8  4.82   31.5G    2.8G    2.0G  143.9M
compute-1-14            linux-x64       8  1.09   23.6G  718.8M    2.0G   17.5M
compute-1-15            linux-x64       8  2.55   31.5G  853.1M    2.0G  314.4M
compute-1-16            linux-x64       8 10.07   31.5G    1.4G    2.0G   16.9M
compute-1-2             linux-x64       8 12.10   31.5G    1.8G    2.0G   16.7M
compute-1-3             linux-x64       8  4.77   31.5G    2.2G    2.0G   17.5M
compute-1-4             linux-x64       8  3.00   31.5G    7.3G    2.0G    2.0G
compute-1-5             linux-x64       8  4.57   31.5G    3.4G    2.0G   20.2M
compute-1-6             linux-x64       8  8.14   31.5G    1.9G    2.0G   16.4M
compute-1-7             linux-x64       8  1.02   31.5G  641.4M    2.0G   14.5M
compute-1-8             linux-x64       8  8.80   31.5G    1.9G    2.0G  559.3M
compute-1-9             linux-x64       8  1.12   31.5G  488.3M    2.0G   16.1M
compute-2-0             linux-x64      16  1.00   63.0G  998.2M    2.0G   13.0M
compute-2-1             linux-x64      16  1.00   63.0G  997.5M    2.0G   15.5M
compute-2-2             linux-x64      16  0.00   63.0G  846.3M    2.0G   14.4M
compute-2-3             linux-x64      16  7.19   63.0G    7.0G    2.0G   14.2M
compute-2-4             linux-x64      64 29.49  504.9G   46.9G    2.0G   33.3M
compute-3-0             linux-x64       4  0.02   15.7G  350.8M    2.0G     0.0
compute-3-1             linux-x64       4  0.00   15.7G  351.4M    2.0G     0.0
compute-3-3             linux-x64       4  0.00   15.7G  350.6M    2.0G     0.0
compute-3-4             linux-x64       4  0.00   15.7G  353.6M    2.0G     0.0
compute-3-5             linux-x64       4  0.00   15.7G  352.3M    2.0G     0.0
compute-3-6             linux-x64       4  0.00   15.7G  353.4M    2.0G     0.0
compute-3-7             linux-x64       4  0.00   15.7G  351.9M    2.0G     0.0
compute-3-8             linux-x64       4  0.00   15.7G  352.2M    2.0G     0.0
compute-3-9             linux-x64       4  0.01   15.7G  350.6M    2.0G     0.0

SSH directly into a specific node on the cluster

After a node has been selected connect to the node with the terminal command:

ssh HOSTNAME # where HOSTNAME is the hostname of the node

For example, if the screenshot below represented the current load on the server, and a highpri queue was required, then compute-2-2 or compute-2-3 would be selected since they are both empty.

Screen

Often a loss of connection or closing of the terminal terminates the process you were working on, e.g. moving or coping large files. Screen command is very useful to avoid such situations. Once the terminal is activated using screen command anything in that terminal stays running even if you log out.

screen                        # It will activate the terminal

screen –S test                # It will activate the screen and name it as test

ctrl + a + d                  # It detaches the screen and hide it.

screen –r                     # List of all detached screens.

screen –r test                # This will reattach the screen name test

screen –wipe test             # wipes a screen when it is dead

screen -S test -p 0 -X quit   # Terminates the screen with name test