Uploading Files Using Gsutil Tool

Timothy Tickle edited this page Mar 16, 2017 · 28 revisions

Our wizard is useful if you want to upload a few small files to the portal. Primary data (e.g., FASTQ files), however, often consist of many files or large files. To upload FASTQ files, it is easier to use a more appropriate tool such as gsutil (part of the Google Cloud SDK). This tool uploads files directly into your study in the cloud. It does require some use of the command line, but we are happy to help you get acclimated to the process. Let's get started!

Getting Ready

  1. Download the Google Cloud SDK (which contains the gsutil tool). The following are the official install instructions for several operating systems. If you do not have administrative rights on your system, we found the following document's instructions for installing from a tar archive very useful. If you use these instructions, steps 3 and 4 are optional.

  2. Have your data (e.g., fastq.gz files) in a directory you can access from a command-line console (the Terminal program on a Mac).

  3. Get your bucket id from the portal team. Email us at single_cell_portal@broadinstitute.org and we will set up a bucket for you! When you get your first bucket, you will also get an email about registering for FireCloud. Please register with FireCloud (this is the framework we use to access our underlying cloud services). You are also welcome to access your studies and data through FireCloud; it offers many advanced services for expert users. Read more here.

Let's Upload!

The following steps only need to be executed once. Afterward, as long as you are logged in to the same server with the same username, you will already be configured.

  1. Authenticate to get access to the portal project.

gcloud init

  1. Type 'Y' to log in.

  2. If you are on your own machine, a page will open in your browser. If you are in a terminal on a server, a URL will be printed instead; paste it into a browser.

  3. Log in with the Google account you use with the portal.

  4. Copy the code and paste it into your terminal.

  5. Select the project 'single-cell-portal'.
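To confirm that the one-time setup worked, you can ask gcloud which account and project it is now using. The sketch below wraps the two standard commands `gcloud auth list` and `gcloud config list project` in a small helper; the function name `check_gcloud_setup` is made up for this page and is not part of the SDK.

```shell
# Hypothetical helper (not part of the Cloud SDK): report the active
# account and project so you can confirm the gcloud init steps worked.
check_gcloud_setup() {
    if ! command -v gcloud >/dev/null 2>&1; then
        echo "gcloud not found on PATH; install the Google Cloud SDK first" >&2
        return 1
    fi
    gcloud auth list             # the active account should be your portal account
    gcloud config list project   # should report single-cell-portal
}
```

Run `check_gcloud_setup` after `gcloud init` finishes. If the project shown is not 'single-cell-portal', run `gcloud init` again and pick it from the list.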

Run the following commands each time you upload data.

  1. Move to the directory that holds your data.

cd /the/path/to/your/directory/holding/your/data

  2. Copy a single file to the bucket.

gsutil cp file.fastq.gz gs://bucket_id

  3. Or copy all compressed FASTQ files to the bucket.

gsutil cp *.fastq.gz gs://bucket_id

  4. Done! Let us know when your upload is complete and we will attach your data to the portal study.
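If you have many files, the upload steps above can be wrapped in a small helper. This is only a sketch: the function name `upload_fastq` is made up for this page, and `bucket_id` is still the placeholder for the id the portal team sends you. It uses gsutil's `-m` option to copy several files in parallel and then `gsutil ls -l` to list what is in the bucket.

```shell
# Hypothetical helper: upload every .fastq.gz file in the current
# directory, then list the bucket contents.
# Usage: upload_fastq bucket_id
upload_fastq() {
    bucket="${1:-}"
    if [ -z "$bucket" ]; then
        echo "usage: upload_fastq bucket_id" >&2
        return 2
    fi
    # Stop early if the pattern matches nothing, so gsutil is never
    # handed a literal '*.fastq.gz'.
    set -- *.fastq.gz
    if [ ! -e "${1:-}" ]; then
        echo "no .fastq.gz files in $(pwd)" >&2
        return 1
    fi
    gsutil -m cp "$@" "gs://$bucket" || return 1   # -m copies files in parallel
    gsutil ls -l "gs://$bucket"                    # sizes should match your local files
}
```

After moving to your data directory, call it as `upload_fastq bucket_id`. Comparing the sizes in the listing with `ls -l` on your local files is a quick check that nothing was cut short.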

Troubleshooting:

  1. If you get a syntax error in a *.py file when running gcloud init, point the SDK at a Python 2.7.x interpreter. Run the following command with the path to your Python install, then try again.

export CLOUDSDK_PYTHON=/path/to/your/python_2.7.1-sqlite3-rtrees/bin/python
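Before retrying, it can help to check that the variable really points at a working interpreter. The helper below is a sketch; the name `check_sdk_python` is made up for this page. It only checks that `CLOUDSDK_PYTHON` is set and executable, then prints the interpreter's version.

```shell
# Hypothetical helper: confirm CLOUDSDK_PYTHON is set and runnable,
# then print the version of the interpreter it points to.
check_sdk_python() {
    if [ -z "${CLOUDSDK_PYTHON:-}" ]; then
        echo "CLOUDSDK_PYTHON is not set" >&2
        return 1
    fi
    if [ ! -f "$CLOUDSDK_PYTHON" ] || [ ! -x "$CLOUDSDK_PYTHON" ]; then
        echo "not an executable file: $CLOUDSDK_PYTHON" >&2
        return 1
    fi
    "$CLOUDSDK_PYTHON" --version   # expect a 2.7.x version here
}
```

Run `check_sdk_python` in the same terminal session where you ran the export command, since the variable is only set for that session.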
