Data Transfer

Please follow these guidelines when transferring data between Xanadu and a local machine (laptop or desktop) or another cluster. The instructions are also available in this YouTube video: https://youtu.be/1al1DRUG7dE. In order of preference, there are three ways to transfer data:

1)    GLOBUS file transfer:

This is the preferred method for transferring very large datasets (>1 GB), but it can be used for smaller files as well. Detailed instructions on how to transfer data using GLOBUS are available at https://bioinformatics.uconn.edu/wp-content/uploads/sites/15/2018/03/GlobusFileTransfer_Tutorial.pdf.

Transferring from a local machine to Xanadu

To transfer data from your local machine to Xanadu (UConn Health HPC), set up a GLOBUS mount point on your local machine by following the instructions at the link above. The tutorial gives step-by-step instructions for setting up a mount point and transferring files. For more information, see https://www.globus.org/data-transfer.

Transferring from another HPC to Xanadu

Use this option to transfer files between two HPC systems. The Xanadu mount point is named “UConn Health HPC”. Make sure that the other HPC system also has a GLOBUS mount point, and find out its name. Then select one mount point for each HPC system and start the file transfer as described in the tutorial.

Note: GLOBUS does not preserve the file attributes of transferred files. We advise you to reinstate the attributes once the transfer is complete.
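If the attributes you need to reinstate are permission bits, one option is to record each path's mode before the transfer and reapply it afterwards. The sketch below is only an illustration: the function names (save_modes, restore_modes) and the list file are hypothetical, it assumes GNU find and stat as found on Linux, and it does not restore ownership or timestamps or handle file names that contain newlines.

```shell
# Hypothetical helpers: record and reapply permission bits for a directory tree.
# Assumes GNU find/stat (Linux). Ownership and timestamps are not handled.

# save_modes DIR LISTFILE: write "<octal mode> <path>" for DIR and everything under it.
save_modes() {
    find "$1" -exec stat -c '%a %n' {} + > "$2"
}

# restore_modes LISTFILE: reapply each recorded mode to its (relative) path.
restore_modes() {
    while read -r mode path; do
        chmod "$mode" "$path"
    done < "$1"
}
```

For example, run save_modes data modes.txt next to the data directory before the transfer, copy modes.txt along with the data, and run restore_modes modes.txt from the matching directory on Xanadu, since the recorded paths are relative.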

2)    scp (Secure Copy):

This method is advisable only for data smaller than 1 GB. We recommend packing files and directories into a tar archive before the transfer and confirming that the transfer is complete with md5sum strings (described in the tutorial). Please do not start more than 5 transfers at a time, and use an interactive session if the transfer is started from the Xanadu cluster.
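When several archives need to go to Xanadu, one way to stay within the five-transfer limit is to let xargs -P 5 cap how many scp processes run at once. This is a sketch under assumptions: transfer_all and the TRANSFER_CMD override (which exists only so the logic can be tried locally with cp) are hypothetical, and it assumes GNU xargs.

```shell
# Hypothetical helper: copy each file argument to DEST, with at most 5 copies running at once.
# TRANSFER_CMD defaults to scp; setting TRANSFER_CMD=cp lets you try the logic locally.
# Assumes GNU xargs (-0, -P, -I).
transfer_all() {
    local dest=$1
    shift
    [ "$#" -gt 0 ] || return 0    # nothing to transfer
    # NUL-separated names keep file names with spaces intact.
    printf '%s\0' "$@" | xargs -0 -P 5 -I{} ${TRANSFER_CMD:-scp} {} "$dest"
}
```

For example: transfer_all user_name@transfer.cam.uchc.edu:<Path/to/DestinationDir> *.tar.gz. Because each scp call asks for the password separately, this is most practical when key-based login is already set up.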

The data transfer steps are reproduced below.

Transferring Data Between Clusters:

The data transfer process has two distinct steps:

(1) STEP 1: Preparing the data, which includes compressing it and generating md5sum strings

(2) STEP 2: Transferring the data

STEP 1

Before transferring a data directory between clusters, it is good practice to compress the directory and generate an md5sum string for the compressed file. After the transfer, this string should match the md5sum value generated at the destination; a match confirms that the data arrived complete and uncorrupted. The steps are as follows:

If you are running these commands on Xanadu, do not run them on the submit node. Instead, start an interactive session:

$ srun --pty bash

This moves you to one of the compute nodes. To check that you are on a compute node, run:

$ hostname

The output should be the name of a compute node, starting with shangrila or xanadu. If the output is xanadu-submit-ext or xanadu-submit-int, the interactive session was not started. Try again, and if the problem persists, contact cbcsupport@uconn.edu.
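The hostname check can also be scripted so that a transfer script stops before doing any work on a submit node. In this sketch, on_submit_node is a hypothetical function name, and it assumes that every submit-node name begins with xanadu-submit-, as the two names above do.

```shell
# Hypothetical guard: succeed (status 0) if the host looks like a Xanadu submit node.
# Uses the first argument if given, otherwise the output of hostname.
# Assumes all submit-node names start with "xanadu-submit-".
on_submit_node() {
    case "${1:-$(hostname)}" in
        xanadu-submit-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use at the top of a transfer script:
# if on_submit_node; then
#     echo "On a submit node: start an interactive session with 'srun --pty bash' first." >&2
#     exit 1
# fi
```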

(1) Compress the directory

data : Directory with data to be compressed
To compress it, run:

$ tar -cvzf data.tar.gz data

data.tar.gz : Output compressed file.
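Before moving on, you may want to confirm that the archive can be read back. tar -tzf lists the contents of a gzip-compressed archive without extracting them and exits with a non-zero status if the file is damaged. The wrapper name check_archive is hypothetical.

```shell
# Hypothetical helper: list an archive's contents without extracting anything.
# tar exits non-zero if the file is not a readable gzip-compressed tar archive.
check_archive() {
    if tar -tzf "$1"; then
        echo "OK: $1 can be read" >&2
    else
        echo "ERROR: $1 could not be read; recreate it before transferring" >&2
        return 1
    fi
}
```

For example, check_archive data.tar.gz prints every file and directory stored in the archive.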

(2) Create md5sum string

$ md5sum data.tar.gz
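Copying the checksum string by hand is easy to get wrong. An alternative is to save the md5sum output to a file, transfer that file together with the archive, and let md5sum -c do the comparison at the destination. The helper names and the data.tar.gz.md5 file name are illustrative conventions, and the sketch assumes GNU coreutils md5sum. Run both helpers from the directory that contains the archive, because the .md5 file stores the path exactly as it was given.

```shell
# Hypothetical helpers around GNU md5sum.
# record_md5 FILE: save "<checksum>  FILE" to FILE.md5 (run before the transfer).
record_md5() {
    md5sum "$1" > "$1.md5"
}

# check_md5 FILE: recompute the checksum and compare it with FILE.md5
# (run at the destination). Prints "FILE: OK" on a match and fails otherwise.
check_md5() {
    md5sum -c "$1.md5"
}
```

scp accepts several source files, so both can be sent in one command: scp data.tar.gz data.tar.gz.md5 user_name@transfer.cam.uchc.edu:<Path/to/DestinationDir>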

STEP 2

(3) Once the files are compressed, use the transfer.cam.uchc.edu VM to transfer them with scp. The general syntax for scp is:

$ scp [source] [destination]

Transferring data (data.tar.gz) from a local machine to Xanadu:

$ scp path/to/data.tar.gz user_name@transfer.cam.uchc.edu:<Path/to/DestinationDir>

scp will prompt you for a password; enter your Xanadu cluster password.

After the transfer, check the md5sum of the transferred file at the destination:

$ md5sum data.tar.gz

The string should match the one generated in STEP 1, item (2). If it does, decompress the file:

$ tar -xvzf data.tar.gz
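To avoid extracting a damaged archive, the checksum comparison and the extraction can be chained so that tar runs only when the string from STEP 1 matches. verify_and_extract is a hypothetical helper, not a tool provided on the cluster, and it assumes GNU md5sum.

```shell
# Hypothetical helper: extract ARCHIVE only if its MD5 equals EXPECTED,
# the string printed by md5sum in STEP 1.
verify_and_extract() {
    local expected=$1 archive=$2 actual
    # md5sum prints "<checksum>  <file>"; keep only the checksum field.
    actual=$(md5sum "$archive" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        tar -xvzf "$archive"
    else
        echo "Checksum mismatch for $archive: expected $expected, got $actual" >&2
        return 1
    fi
}
```

For example: verify_and_extract <string from STEP 1> data.tar.gz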

Once the transfer is complete, exit the interactive session by running:

$ exit

3)    ncftp (transfer from an FTP server):

To transfer data to Xanadu from an FTP server, you can use ncftp once you have started an interactive session or are on the transfer node. For example:

$ ncftp ftp://ftp.ncbi.nlm.nih.gov/refseq/release/complete/

ncftp /refseq/release/complete > get complete.1.1.genomic.fna.gz

For more information, see the ncftp manual.

If you have any questions, please email cbcsupport@uconn.edu.