Uploading Files Using Gsutil Tool
Our wizard is useful if you want to upload a few small files into the portal. Primary data (e.g., FASTQ files), however, often consists of many files or very large files. For these, it is easier to use a more appropriate tool such as gsutil (part of the Google Cloud SDK), which lets you upload files directly into your study in the cloud. It does require some use of the command line, but we are happy to help you get acclimated to the process. Let's get started!
- Download the Google Cloud SDK (which contains the gsutil tool). The following are the official install instructions for several operating systems. If you do not have administrative rights to your system, we found the following document's instructions for installing from a tar archive very useful. If you use these instructions, steps 3 and 4 are optional.
- Have your data (e.g., fastq.gz files) in a directory you can access from a command line console (the Terminal program on a Mac).
3a. Get your bucket id from the portal (if you have already created a study):
- Log in, and select 'My Studies' from the profile dropdown menu.
- Click 'Show Details' next to the study you want to upload data to.
- At the top of the page, your bucket id is listed in the 'General Info & Defaults' panel.
3b. Or, you can email us at scp-support@broadinstitute.zendesk.com and we will set up a bucket for you! When you get your first bucket, you will also get an email about registering for Terra (Firecloud). Please register with Terra (Firecloud) (this is the framework we are using to access our underlying cloud services). You are also welcome to access your studies and data through Terra (Firecloud); many advanced services can be found here for expert users. Read more here.
The following steps only need to be executed once. Afterward, as long as you are logged in to the same server with the same username, you will already be configured.
- Authenticate to get access to the portal project.
gcloud auth login
- Type 'Y' to log in.
- If you are on your own machine, a page will open in your browser. If you are in a terminal on a server, you will be given a URL to paste into a browser.
- Log in with the Google account you use with the portal.
- NOTE: You will most likely be asked to pick a project to use with your session. Since you will not be able to see the Single Cell Portal GCP project, please select any existing project you have, or create a new one. This will not impact your ability to upload files to your bucket.
- Copy the code and paste it in your terminal.
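Once the login flow finishes, it can be worth confirming which account the SDK will use before you upload anything. The sketch below wraps `gcloud auth list` in a small helper. The function name `active_account` is hypothetical, and the `--filter` and `--format` values assume a reasonably recent Cloud SDK release.

```shell
# Hypothetical helper (not part of the Cloud SDK): print the account that
# gcloud currently treats as active, or fail if gcloud is not installed.
active_account() {
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found on PATH; install the Google Cloud SDK first" >&2
    return 1
  fi
  # Assumption: the installed gcloud supports these list filter/format flags.
  gcloud auth list --filter=status:ACTIVE --format="value(account)"
}
```

Running `active_account` should print the Google account you use with the portal. If it prints a different account, repeat `gcloud auth login`.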
- Move to your directory.
cd /the/path/to/your/directory/holding/your/data
- Copy a file to the bucket.
gsutil cp file.fastq.gz gs://bucket_id
- Or copy all compressed fastq files to the bucket.
gsutil cp *.fastq.gz gs://bucket_id
- Once this is complete, log back into the portal, click 'My Studies' in your profile menu, and then click 'Sync' next to the study from above. This will let you 'synchronize' the data in the bucket to the study.
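The move-and-copy steps above can be combined into one small shell function. This is a minimal sketch, not part of gsutil: the name `upload_fastqs` and the `DRY_RUN` variable are hypothetical, and `bucket_id` stands for the bucket id shown in the portal. It fails early when the directory holds no `*.fastq.gz` files, which avoids passing an unmatched wildcard to gsutil.

```shell
# Hypothetical helper: upload every *.fastq.gz file in a directory to a bucket.
# Usage: upload_fastqs /path/to/data bucket_id
# Set DRY_RUN=1 to print the gsutil command instead of running it.
upload_fastqs() {
  data_dir="$1"
  bucket_id="$2"
  # Reuse the positional parameters as a portable list of files to upload.
  set --
  for f in "$data_dir"/*.fastq.gz; do
    # An unmatched wildcard stays literal, so keep only paths that exist.
    if [ -e "$f" ]; then
      set -- "$@" "$f"
    fi
  done
  if [ "$#" -eq 0 ]; then
    echo "No *.fastq.gz files found in $data_dir" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo gsutil cp "$@" "gs://$bucket_id"
  else
    gsutil cp "$@" "gs://$bucket_id"
  fi
}
```

For example, `DRY_RUN=1 upload_fastqs /the/path/to/your/directory/holding/your/data bucket_id` prints the command that would be run, so you can check the file list before starting a large transfer.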
- If your local file is uncompressed and you want it stored compressed in your Google bucket:
gsutil cp -Z file.fastq gs://bucket_id
Note that the file will be automatically decompressed upon download, so the file name should not include ".gzip" or ".gz".
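Because `-Z` is meant for uncompressed input, a quick name check before uploading can catch a file that is already gzip-compressed. The helper below is a hypothetical sketch (the name `safe_for_gzip_upload` is not part of gsutil) and looks only at the file name, not at the file contents.

```shell
# Hypothetical check: succeed only when the file name does not already carry
# a gzip suffix, so the object stored with -Z keeps an uncompressed-style name.
safe_for_gzip_upload() {
  case "$1" in
    *.gz|*.gzip)
      echo "$1 looks already compressed; upload it with plain 'gsutil cp'" >&2
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

A possible use is `safe_for_gzip_upload file.fastq && gsutil cp -Z file.fastq gs://bucket_id`. Afterward, `gsutil ls -L gs://bucket_id/file.fastq` should list the object's metadata, where you would expect to see a gzip content encoding.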
- If you get a syntax error for a *.py file when running the gcloud init command, point the SDK at a Python 2.7.x interpreter. Use the following command with the path to your Python install, then try again.
export CLOUDSDK_PYTHON=/path/to/your/python_2.7.1-sqlite3-rtrees/bin/python
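After exporting the variable, you can confirm which interpreter version it points to. The function below is a hypothetical sketch: the name `sdk_python_version` is an illustration, and the fallback to whatever `python` is on your PATH when `CLOUDSDK_PYTHON` is unset is an assumption made for convenience, not a description of how the SDK itself chooses an interpreter.

```shell
# Hypothetical helper: print the version of the interpreter named by
# CLOUDSDK_PYTHON (falling back to "python" on PATH for illustration).
sdk_python_version() {
  sdk_python="${CLOUDSDK_PYTHON:-python}"
  # Python 2 prints its version to stderr, so merge stderr into stdout.
  "$sdk_python" --version 2>&1
}
```

If `sdk_python_version` reports a version other than the one you intended, correct the path in the `export` command and try again.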