Get Started in Athena
Setup
To get started with Athena, you will need an AWS account. Please follow the AWS-provided tutorial to become familiar with:
- Creating S3 buckets for saving your results
- Creating databases
- Creating tables
- Querying tables
Please make sure you create the bucket for saving results in the us-east-1 region, where the SRA metadata buckets are located.
We recommend using AWS Glue to create the tables from the SRA metadata in S3.
To create the tables, you need to specify the S3 location of the metadata. SRA provides the metadata in two locations:
- Coronaviridae dataset in the AWS Public Dataset Program: s3://sra-pub-sars-cov2-metadata-us-east-1/
- Entire SRA metadata: s3://sra-pub-metadata-us-east-1/
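The Glue setup described above can also be scripted with the AWS CLI. This is a minimal sketch, not an official NCBI procedure: the database name, crawler name, and IAM role passed to the functions are hypothetical placeholders, and the role must be one that AWS Glue can assume and that is allowed to read the S3 path being crawled. Setting AWS=echo prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: catalog SRA metadata in AWS Glue with a crawler.
# Database, crawler, and role names are hypothetical placeholders.
set -euo pipefail

# Command used to call the AWS CLI; set AWS=echo for a dry run.
AWS="${AWS:-aws}"

# Create a Glue database and a crawler that writes tables for one S3 path into it.
create_sra_crawler() {
  local db="$1" crawler="$2" role="$3" s3_path="$4"
  "$AWS" glue create-database --region us-east-1 \
    --database-input "{\"Name\": \"${db}\"}"
  "$AWS" glue create-crawler --region us-east-1 \
    --name "$crawler" \
    --role "$role" \
    --database-name "$db" \
    --targets "{\"S3Targets\": [{\"Path\": \"${s3_path}\"}]}"
}

# Start the crawler; the tables appear in the database once it finishes.
start_sra_crawler() {
  "$AWS" glue start-crawler --region us-east-1 --name "$1"
}

# Print the crawler state (READY, RUNNING, or STOPPING).
sra_crawler_state() {
  "$AWS" glue get-crawler --region us-east-1 --name "$1" \
    --query 'Crawler.State' --output text
}
```

For example, create_sra_crawler sra sra-metadata-crawler MyGlueRole s3://sra-pub-metadata-us-east-1/sra/metadata/ followed by start_sra_crawler sra-metadata-crawler would crawl the full SRA metadata; sra_crawler_state reports READY again when the run is done.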
AWS Glue has a small charge, based on the number of tables in the Data Catalog and the time the crawler takes to find all the datasets. The crawler charge will generally be less than $1.
The AWS Glue Data Catalog stores up to a million objects for free, where an object is a table, table version, partition, or database. The first million access requests to the Data Catalog each month are also free.
Alternatively, you can create a database manually and add the tables yourself.
You can find the table definitions here: SRA Cloud Based Table Definitions.
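Manual creation can also be done from the AWS CLI by submitting the DDL statements to Athena. This is a minimal sketch under stated assumptions: the database name is a placeholder, s3://YOUR-RESULTS-BUCKET/athena/ stands for your own results bucket in us-east-1, metadata_table.sql is a hypothetical file holding one CREATE EXTERNAL TABLE statement copied from the SRA Cloud Based Table Definitions, and the function names are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: create an Athena database and a table from a DDL file.
set -euo pipefail

# Command used to call the AWS CLI; set AWS=echo for a dry run.
AWS="${AWS:-aws}"

# Submit one SQL statement to Athena and print its QueryExecutionId.
# The database argument is optional (not needed for CREATE DATABASE).
athena_submit() {
  local sql="$1" results="$2" database="${3:-}"
  local args=(athena start-query-execution --region us-east-1
    --query-string "$sql"
    --result-configuration "OutputLocation=${results}"
    --query QueryExecutionId --output text)
  if [[ -n "$database" ]]; then
    args+=(--query-execution-context "Database=${database}")
  fi
  "$AWS" "${args[@]}"
}

# Create the database if it does not exist yet.
create_database() {
  local name="$1" results="$2"
  athena_submit "CREATE DATABASE IF NOT EXISTS ${name}" "$results"
}

# Create a table from a file containing a single CREATE EXTERNAL TABLE statement.
create_table_from_file() {
  local ddl_file="$1" database="$2" results="$3"
  athena_submit "$(cat "$ddl_file")" "$results" "$database"
}
```

For example, create_database sra s3://YOUR-RESULTS-BUCKET/athena/ and then create_table_from_file metadata_table.sql sra s3://YOUR-RESULTS-BUCKET/athena/. Each call returns as soon as the statement is queued, so the table exists only once the statement has succeeded.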
Table S3 locations
These data can be accessed anonymously with the AWS command line interface (CLI) by using the --no-sign-request option, as in the examples below.
For all SRA metadata
- metadata:
aws s3 ls s3://sra-pub-metadata-us-east-1/sra/metadata/ --no-sign-request
- taxonomy analysis:
aws s3 ls s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/tax_analysis/ --no-sign-request
- tax analysis info:
aws s3 ls s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/tax_analysis_info/ --no-sign-request
- taxonomy:
aws s3 ls s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/taxonomy/ --no-sign-request
- kmer:
aws s3 ls s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/kmer/ --no-sign-request
For the Coronaviridae-specific dataset
- annotated variations:
aws s3 ls s3://sra-pub-sars-cov2-metadata-us-east-1/annotated_variations/ --no-sign-request
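The listings above can be run in one pass with a small loop. This is only a convenience sketch around the same aws s3 ls --no-sign-request commands; the function name is hypothetical, and setting AWS=echo prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: list every public SRA metadata location shown above.
set -euo pipefail

# Command used to call the AWS CLI; set AWS=echo for a dry run.
AWS="${AWS:-aws}"

# S3 prefixes from the lists above; both buckets are public.
SRA_LOCATIONS=(
  s3://sra-pub-metadata-us-east-1/sra/metadata/
  s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/tax_analysis/
  s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/tax_analysis_info/
  s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/taxonomy/
  s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/kmer/
  s3://sra-pub-sars-cov2-metadata-us-east-1/annotated_variations/
)

# --no-sign-request sends anonymous requests, so no credentials are needed.
list_sra_locations() {
  local location
  for location in "${SRA_LOCATIONS[@]}"; do
    echo "== ${location}"
    "$AWS" s3 ls "$location" --no-sign-request
  done
}
```

Adding --recursive --human-readable --summarize to aws s3 ls also lists object sizes and a total, which gives a sense of how much data a query over each location could scan.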
Access methods
We recommend first using the Athena query editor to become familiar with writing queries before attempting to use the command line tools or client libraries.
Athena can be accessed through the query editor in a web browser:
https://console.aws.amazon.com/athena/
The AWS CLI command reference for Athena is available if you plan to access Athena from the command line (the supported AWS SDKs provide equivalent client libraries):
https://docs.aws.amazon.com/cli/latest/reference/athena/
The AWS CLI can be downloaded and set up by following the user guide:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html
Please see the AWS documentation for more information on these options.
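Once a query works in the query editor, the same query can be run from the AWS CLI. The sketch below assumes the CLI is configured with credentials, that a database (a placeholder named sra) contains a table named metadata, and that s3://YOUR-RESULTS-BUCKET/athena/ is your own results bucket; the function names are hypothetical. It submits the query, polls its state, and then streams the CSV file that Athena writes to the output location.

```bash
#!/usr/bin/env bash
# Sketch: run an Athena query from the CLI and print the CSV result.
set -euo pipefail

# Command used to call the AWS CLI; set AWS=echo for a dry run.
AWS="${AWS:-aws}"

# Submit a query and print its QueryExecutionId.
athena_start() {
  local sql="$1" database="$2" results="$3"
  "$AWS" athena start-query-execution --region us-east-1 \
    --query-string "$sql" \
    --query-execution-context "Database=${database}" \
    --result-configuration "OutputLocation=${results}" \
    --query QueryExecutionId --output text
}

# Poll every 2 seconds until the query is no longer QUEUED or RUNNING,
# then print the final state (for example SUCCEEDED, FAILED, or CANCELLED).
athena_wait() {
  local id="$1" state
  while true; do
    state="$("$AWS" athena get-query-execution --region us-east-1 \
      --query-execution-id "$id" \
      --query QueryExecution.Status.State --output text)"
    case "$state" in
      QUEUED|RUNNING) sleep 2 ;;
      *) echo "$state"; return 0 ;;
    esac
  done
}

# Look up where Athena wrote the result CSV and copy it to stdout.
athena_print_csv() {
  local id="$1" csv
  csv="$("$AWS" athena get-query-execution --region us-east-1 \
    --query-execution-id "$id" \
    --query QueryExecution.ResultConfiguration.OutputLocation --output text)"
  "$AWS" s3 cp "$csv" -
}
```

For example: id=$(athena_start "SELECT * FROM metadata LIMIT 10" sra s3://YOUR-RESULTS-BUCKET/athena/), then print the rows with athena_print_csv "$id" once athena_wait "$id" reports SUCCEEDED.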
Payment
You pay for the queries you run against the public datasets and for storing your results in S3. We recommend that all users review the Athena pricing for on-demand queries.
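To gauge what a query cost, you can read the bytes it scanned from Athena and apply the on-demand rate. The sketch below assumes a rate of 5 USD per TB scanned (pass a different rate as the second argument if current pricing differs), treats 1 TB as 2^40 bytes, and rounds each query up to whole megabytes with a 10 MB minimum. It is an estimate, not a bill, and it leaves out S3 storage for the results; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: estimate the on-demand Athena charge for one query.
set -euo pipefail

# Command used to call the AWS CLI; set AWS=echo for a dry run.
AWS="${AWS:-aws}"

# Bytes scanned by a finished query, as reported by Athena.
athena_bytes_scanned() {
  "$AWS" athena get-query-execution --region us-east-1 \
    --query-execution-id "$1" \
    --query QueryExecution.Statistics.DataScannedInBytes --output text
}

# Estimated cost in USD for a number of scanned bytes.
# Assumptions: price per TB defaults to 5.00, 1 TB = 2^40 bytes,
# usage rounded up to whole MB, 10 MB minimum per query.
estimate_athena_cost() {
  local bytes="$1" price_per_tb="${2:-5.00}"
  awk -v b="$bytes" -v p="$price_per_tb" 'BEGIN {
    mb = 1024 * 1024
    tb = mb * mb
    scanned_mb = int((b + mb - 1) / mb)   # round up to whole MB
    if (scanned_mb < 10) scanned_mb = 10  # per-query minimum
    printf "%.6f\n", scanned_mb * mb / tb * p
  }'
}
```

For example, estimate_athena_cost "$(athena_bytes_scanned "$id")" estimates the charge for a finished query, and a query that scans 1 TB comes out at 5.000000 under the default rate. Selecting only the columns you need keeps the scanned bytes, and so the cost, down.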
Engage
NCBI wants your feedback on SRA in the Cloud. Contact [email protected] with questions or if you would like to provide input on new functionality.