Data Download from Basespace (Illumina)

DO NOT FORGET TO CHANGE THE DIRECTORY PERMISSIONS AS DESCRIBED IN STEPS 5-4 AND 5-8

The majority of NGS sequencing facilities deliver Illumina sequencing data to clients through Illumina's cloud service, Basespace. The following steps describe this transfer process.

Data transfer from the sequencing facility to the client's Basespace account

  • Step 1: The client creates a Basespace account (a free service) and provides the account details (the email address associated with the Basespace account) to the sequencing facility.
  • Step 2: The sequencing facility either shares the data or transfers its ownership to the client, using the email address the client used to create the Basespace account. The client receives an email notification of the event.
  • Step 3: The client logs in to the Basespace account and accepts ownership of the data, and the data is transferred to the client's account. To check whether the data was transferred successfully to your account, see Step 4 of the data download section below.

Data download (downloading data from the user's Basespace account)

Data can be downloaded from the command line or with a script. For the command line, please use an interactive session (qlogin on BBC, or srun --qos=general --pty bash on Xanadu).

  • Step 4: (Executed on the Basespace website) Log in to your Basespace account and make sure it contains the data files for your project.

Select the project and then its samples to see the available datasets.

  • Step 5: (To be executed on the HPC/cluster)
    1. Log in to your HPC/cluster account.
    2. Create the following two directories in your home directory:
      $ mkdir basespace       # after mounting, the contents of your Basespace account appear in this directory
      $ mkdir dinosaur        # replace dinosaur with your project name; data from Basespace will be copied into this directory
    3. Check the available modules with $ module avail and look for basemount/0.13.3.1573 (preferred). We will use basemount in this tutorial. Load the module with
      $ module load basemount/0.13.3.1573
    4. Change the permissions of the basespace directory to allow mounting and unmounting, then mount it:
      $ chmod 755 basespace
      $ basemount basespace

      This prints output that includes an authentication URL.

    5. Copy the URL from your own output (not from any example) and paste it into a browser. This opens the Basespace login window. Log in with your credentials to authenticate the connection. Future sessions may not ask for authentication again.
    6. Your datasets (fastq.gz files) are now available in
      basespace/Projects/$PROJECT_NAME/Samples/$SAMPLE_NAME/Files/

      The directory names depend on the projects and samples that were sequenced, so replace them with the appropriate project and sample names, e.g.

      basespace/Projects/dinosaur/Samples/trex/Files/

      Here the variables were replaced as

      $PROJECT_NAME=dinosaur (project name)
      $SAMPLE_NAME=trex      (sample name)
    7. The next step is to copy the fastq.gz files to a local directory, here the dinosaur directory created in Step 5-2. To do so we will use the scp command. Each directory under Samples is one sample:
      $ sample_list=$(ls ~/basespace/Projects/dinosaur/Samples)
      $ for each in $sample_list
      > do
      >   mkdir -p ~/dinosaur/$each
      >   scp ~/basespace/Projects/dinosaur/Samples/$each/Files/* ~/dinosaur/$each/
      > done

      Once this is completed, check that all files were transferred, for example by comparing the file listings in the source and destination directories.

    8. The next step is to unmount Basespace. Make sure you are not inside the mounted directory, then run the following commands:
      $ cd ~
      $ chmod 755 ~
      $ basemount --unmount basespace
      $ chmod 700 ~
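
The transfer check mentioned in Step 5-7 can be sketched as a small shell function. This is only a minimal sketch and not part of basemount: the function name verify_copy is hypothetical, and it compares file names and byte sizes only, so it does not detect corrupted content of the same size. It has to be run while Basespace is still mounted, that is, before the unmount in Step 5-8.

```shell
#!/bin/bash
# Hypothetical helper (not part of basemount): compare the files of one
# sample between the mounted source directory and the local copy.
# Usage: verify_copy SOURCE_DIR DEST_DIR
verify_copy() {
    local src=$1 dest=$2 status=0 f name
    for f in "$src"/*; do
        # Skip anything that is not a regular file
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ ! -f "$dest/$name" ]; then
            echo "MISSING: $name"
            status=1
        elif [ "$(wc -c < "$f")" -ne "$(wc -c < "$dest/$name")" ]; then
            echo "SIZE MISMATCH: $name"
            status=1
        fi
    done
    # Return 0 only if every source file has a same-sized local copy
    return $status
}
```

For the example project, the call for one sample would be verify_copy ~/basespace/Projects/dinosaur/Samples/trex/Files ~/dinosaur/trex. It prints nothing and returns 0 when every file is present with the same size.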

To run the download as a script, compose the script as shown here and add the computational resources header appropriate to your cluster.
The header in this example is for Xanadu (SLURM).
________________________________________________________________________

#!/bin/bash
# Submission script for Xanadu
#SBATCH --job-name=Basespace_dwnld
#SBATCH --mail-user=first.last@uconn.edu
#SBATCH --mail-type=ALL
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
module load basemount/0.13.3.1573
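# The project name is passed as the first argument to the script
PROJECT_NAME=$1
# Create the mount point and the destination directory in the home directory
mkdir -p ~/basespace
chmod 755 ~/basespace
mkdir -p ~/"$PROJECT_NAME"
# Mount the Basespace account
basemount ~/basespace
# Each directory under Samples is one sample
sample_list=$(ls ~/basespace/Projects/"$PROJECT_NAME"/Samples)
for each in $sample_list
do
   mkdir -p ~/"$PROJECT_NAME"/"$each"
   scp ~/basespace/Projects/"$PROJECT_NAME"/Samples/"$each"/Files/* ~/"$PROJECT_NAME"/"$each"/
done
# Unmount; the home directory must be accessible while unmounting
chmod 755 ~
basemount --unmount ~/basespace
chmod 700 ~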

*** The scp command inside the loop is a single continuous command line.
Save the script as basespace_download.sh or any other appropriate name.
Run the script on Xanadu:

$ sbatch basespace_download.sh $PROJECT_NAME

As an example, if $PROJECT_NAME=dinosaur, the command will be

$ sbatch basespace_download.sh dinosaur
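
Because the script takes the project name as its only argument, a job submitted without one, or with a misspelled project name, would fail only once copying starts. A pre-flight check along the following lines could catch that earlier. This is a sketch under assumptions: the function name check_project and its return codes are hypothetical, and the mount point is assumed to be ~/basespace unless a second argument is given.

```shell
#!/bin/bash
# Hypothetical pre-flight check for a download script (an illustration,
# not part of basemount or SLURM).
# Usage: check_project PROJECT_NAME [MOUNT_DIR]
check_project() {
    local project=${1:-} mount_dir=${2:-$HOME/basespace}
    # Return 1 when no project name was supplied
    if [ -z "$project" ]; then
        echo "Usage: sbatch basespace_download.sh PROJECT_NAME" >&2
        return 1
    fi
    # Return 2 when the project is not visible under the mount point
    if [ ! -d "$mount_dir/Projects/$project/Samples" ]; then
        echo "Project '$project' not found under $mount_dir/Projects" >&2
        return 2
    fi
    return 0
}
```

Inside the script, a line such as check_project "$PROJECT_NAME" || exit 1 placed after the basemount command would stop the job before any copying is attempted.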