SRA in the Cloud
Overview
Sequence Read Archive (SRA) data is available on the Google Cloud Platform (GCP) and Amazon Web Services (AWS) clouds. All publicly available, unassembled read data and authorized-access human data can be accessed and used for compute through these cloud providers.
Are you cloud-curious?
There are several benefits to working with SRA data in the cloud:
- Access to original submitted data files
- Faster download speed
- Unlimited concurrent downloads from our cloud buckets to your buckets
Accessing SRA data in the cloud requires an instance to be set up.
You can perform cloud-native searches for data using Athena from AWS or BigQuery from Google. With Athena and BigQuery you can:
- Write your own SQL to search for your specific data sets
- Get search results in seconds, at very low cost
- Calculate statistics on the available data from SRA
- Access the data using multiple API libraries
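As an illustration of writing your own SQL to calculate statistics, the following minimal sketch builds a query that counts runs and sums sequenced megabases per assay type for one organism. The table name `nih-sra-datastore.sra.metadata` and the column names `organism`, `assay_type`, and `mbases` are assumptions here and should be checked against the schema you actually see in BigQuery or Athena. The backtick quoting of the table name follows BigQuery conventions; Athena quotes identifiers differently.

```python
# Assumed table name; confirm it against the dataset you have access to.
SRA_METADATA_TABLE = "nih-sra-datastore.sra.metadata"


def build_stats_query(organism: str, table: str = SRA_METADATA_TABLE) -> str:
    """Return SQL that summarizes runs per assay type for one organism.

    Column names (organism, assay_type, mbases) are assumptions.
    """
    # Refuse values that could break out of the string literal instead of
    # guessing at dialect-specific escaping rules.
    if "'" in organism or "\\" in organism:
        raise ValueError("organism must not contain quotes or backslashes")
    return (
        "SELECT assay_type, COUNT(*) AS runs, SUM(mbases) AS total_mbases\n"
        f"FROM `{table}`\n"
        f"WHERE organism = '{organism}'\n"
        "GROUP BY assay_type\n"
        "ORDER BY runs DESC"
    )


if __name__ == "__main__":
    print(build_stats_query("Homo sapiens"))
```

The function only produces the SQL text, so it can be pasted into either query console or handed to a client library.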
Search for data
BigQuery (in the Google Cloud)
BigQuery provides fast, programmatic access to SRA metadata and supports a large collection of client libraries.
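A minimal sketch of programmatic access through the `google-cloud-bigquery` Python client library is shown below. It assumes that the library is installed, that Google Cloud credentials and a billing project are already configured, and that the metadata table is named `nih-sra-datastore.sra.metadata` with `acc` and `organism` columns. The table and column names are assumptions, and the helper functions `build_run_query` and `fetch_accessions` are hypothetical names used only for this example.

```python
def build_run_query(limit: int = 5) -> str:
    """Build a parameterized query; @organism is bound at execution time."""
    # Table and column names are assumptions; verify them in the BigQuery console.
    return (
        "SELECT acc, organism "
        "FROM `nih-sra-datastore.sra.metadata` "
        "WHERE organism = @organism "
        f"LIMIT {int(limit)}"
    )


def fetch_accessions(organism: str, limit: int = 5) -> list:
    """Run the query and return the matching accessions."""
    # Imported lazily so the query builder works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default credentials and project
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("organism", "STRING", organism),
        ]
    )
    rows = client.query(build_run_query(limit), job_config=job_config).result()
    return [row["acc"] for row in rows]


if __name__ == "__main__":
    print(fetch_accessions("Homo sapiens"))
```

Binding the organism as a query parameter avoids assembling user input into the SQL string.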
Athena (in AWS)
Athena provides fast, programmatic access to SRA metadata and supports a large collection of client libraries.
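The sketch below shows one way such programmatic access might look from Python with `boto3`. It assumes that `boto3` is installed, that AWS credentials are configured, and that the SRA metadata table has already been made queryable in an Athena database of your own. The database name, the S3 output location, and the helper names `rows_to_dicts` and `run_athena_query` are all hypothetical and supplied by the caller or invented for this example.

```python
import time


def rows_to_dicts(result: dict) -> list:
    """Convert an Athena GetQueryResults response into a list of dicts.

    For SELECT queries the first row of the result set holds the column names.
    """
    rows = result["ResultSet"]["Rows"]
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, [cell.get("VarCharValue") for cell in row["Data"]]))
        for row in rows[1:]
    ]


def run_athena_query(sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> list:
    """Start a query, wait for it to finish, and return the first result page."""
    import boto3  # imported lazily so rows_to_dicts works without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    return rows_to_dicts(athena.get_query_results(QueryExecutionId=query_id))
```

Only the first page of results is returned here; larger result sets would need pagination.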
NCBI's Entrez search engine
Download/Access the Data
SRA Toolkit allows you to create next-generation sequencing files in your desired format and cloud bucket. You can also download originally-submitted files for some data sets.
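As a rough illustration, the SRA Toolkit command-line programs `prefetch` and `fasterq-dump` can be driven from a short Python script. This sketch assumes the toolkit is installed and on your `PATH`; the accession `SRR000001`, the output directory, and the function names `toolkit_commands` and `fetch_fastq` are illustrative choices only. It writes FASTQ files to a local directory, and copying them into a cloud bucket would be a separate step.

```python
import shutil
import subprocess


def toolkit_commands(accession: str, outdir: str = "fastq", threads: int = 4) -> list:
    """Return the command lines to download a run and convert it to FASTQ."""
    return [
        # Download the run into the current working directory.
        ["prefetch", accession],
        # Convert the downloaded run to FASTQ files in outdir.
        ["fasterq-dump", accession, "--outdir", outdir, "--threads", str(threads)],
    ]


def fetch_fastq(accession: str, outdir: str = "fastq", threads: int = 4) -> None:
    """Run the toolkit commands in order, stopping on the first failure."""
    for tool in ("prefetch", "fasterq-dump"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found; is the SRA Toolkit installed?")
    for command in toolkit_commands(accession, outdir, threads):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    fetch_fastq("SRR000001")
```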
To download dbGaP data from the cloud, you need to use the most recent version of the SRA Toolkit and a JWT file instead of the NGC file.
The Cloud Data Delivery service delivers files that are not accessible through the SRA Toolkit directly to your AWS or GCP bucket.
SRA on YouTube: Tutorials
Engage
NCBI wants your feedback on SRA in the Cloud. Contact [email protected] with questions or if you would like to provide input on new functionality.