Search in Athena
Overview
SRA has deposited its metadata into Athena to provide the bioinformatics community with programmatic access to this data.
You can now search across the entire SRA by sequencing methodologies and sample attributes. NCBI provides this access to help users leverage the benefits of elastic scaling and parallel execution of queries.
Athena has a large collection of client libraries that can be used within your workflow. You can also interact with it in a web browser.
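As one illustration of using a client library, the sketch below submits a query through the AWS SDK for Python (boto3) and polls until the query finishes. The function name `run_athena_query`, the polling interval, and the S3 output location are assumptions made for illustration. The client is passed in as an argument so that the caller decides how credentials and region are configured.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit `sql` to Athena and return the first page of rows as lists of strings.

    `client` is expected to behave like boto3.client("athena").
    `output_location` is an S3 URI that you own (hypothetical placeholder).
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")

    # Only the first page is fetched here; larger result sets need pagination.
    page = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue", "") for cell in row["Data"]]
        for row in page["ResultSet"]["Rows"]
    ]
```

With boto3 installed and AWS credentials configured, a call might look like `run_athena_query(boto3.client("athena"), sql, "<db_name>", "s3://<your-bucket>/<prefix>/")`. For SELECT queries the first returned row typically holds the column names.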
The Athena resource contains tables for SRA metadata and computed metadata on SRA runs. It also contains metadata on SRA aligned reads, including taxonomic content and BLAST results.
Tables
The list of SRA cloud-based tables can be found on the SRA cloud-based tables page.
Please read about the SRA Taxonomy Analysis Tool to learn how the analysis is carried out.
The Basics of SQL
The basic SQL query has three parts or statements:
SELECT: Identifies which columns from the selected table(s) to show. The * indicates "all columns".
FROM: Identifies the table(s) to query.
WHERE: Joins tables using the identical columns in both tables and sets filters on the query.
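To make the three parts concrete, here is a minimal Python sketch that assembles a query string from a column list, a table name, and an optional filter. The helper name `build_query` is hypothetical, and the column name `acc` in the metadata table is assumed for illustration. The helper does no escaping, so it is only suitable for values you control.

```python
def build_query(columns, table, where=None, limit=None):
    """Assemble a query string from its SELECT, FROM and optional WHERE/limit parts."""
    # An empty column list falls back to "*", meaning all columns.
    select_list = ", ".join(columns) if columns else "*"
    lines = [f"SELECT {select_list}", f"FROM {table}"]
    if where:
        lines.append(f"WHERE {where}")
    if limit is not None:
        lines.append(f"limit {int(limit)}")
    return "\n".join(lines)


print(build_query(["acc", "organism"], "<db_name>.metadata",
                  where="organism = 'Homo sapiens'", limit=10))
```

Naming specific columns instead of using * reduces the amount of data returned, which can matter when a table has many columns.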
In Athena, the table name (e.g., metadata) is defined by NCBI, but the database name (<db_name> in all examples) is defined by the user. This name is chosen when you create the Glue crawler or manually create the database.
For all queries, replace the <db_name> tag with the name chosen for your local database.
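If you keep the example queries as templates in a script, the placeholder substitution can be sketched as below. The helper name `fill_db_name` and the database name `sra_db` are hypothetical, and the character check is a deliberately conservative assumption rather than a rule stated on this page.

```python
import re

PLACEHOLDER = "<db_name>"


def fill_db_name(query_template, db_name):
    """Replace every <db_name> placeholder with the caller's database name."""
    # Conservative check (an assumption for this sketch): allow only
    # lowercase letters, digits and underscores in the database name.
    if not re.fullmatch(r"[a-z0-9_]+", db_name):
        raise ValueError(f"Unexpected database name: {db_name!r}")
    return query_template.replace(PLACEHOLDER, db_name)


template = 'SELECT *\nFROM "<db_name>"."metadata"\nlimit 10'
print(fill_db_name(template, "sra_db"))
```

Rejecting unexpected characters up front keeps a mistyped name from producing a query that fails later with a less obvious error.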
Basic example query
Select all columns (indicated by '*') from the table called <db_name>.metadata for records whose organism value is "Homo sapiens". The limit 10 clause restricts the results to 10 hits; it can be removed from any example to get the full result set.
SELECT *
FROM <db_name>.metadata
WHERE organism = 'Homo sapiens'
limit 10
Example queries for web UI
Search for records of the pipefish:
SELECT *
FROM "<db_name>"."metadata"
WHERE organism = 'Syngnathus scovelli'
limit 10
Find all the public human data sets:
SELECT *
FROM "<db_name>"."metadata"
WHERE organism = 'Homo sapiens' AND consent='public'
limit 10
Build a local taxonomic tree by ordering the data based on ileft and ilevel for a metagenomic data set:
SELECT *
FROM "<db_name>"."tax_analysis"
WHERE acc = 'SRR2046458' ORDER BY ileft, ilevel
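Once the rows are ordered this way, they can be nested into a tree on the client side. The sketch below assumes that ordering by ileft yields a parent-before-child traversal and that ilevel is the depth of each node; both are assumptions about the table, not statements from this page. The function names and the sample rows are made up for illustration and are not real query output.

```python
def build_tax_tree(rows):
    """Nest rows into a tree, given rows as dicts with 'name', 'ileft' and 'ilevel'."""
    roots = []
    stack = []  # (ilevel, node) pairs for the current path from a root
    for row in sorted(rows, key=lambda r: (r["ileft"], r["ilevel"])):
        node = {"name": row["name"], "children": []}
        # Drop path entries that are not ancestors of this row.
        while stack and stack[-1][0] >= row["ilevel"]:
            stack.pop()
        (stack[-1][1]["children"] if stack else roots).append(node)
        stack.append((row["ilevel"], node))
    return roots


def render(nodes, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + node["name"])
        lines.extend(render(node["children"], depth + 1))
    return lines


# Made-up rows with placeholder names, deliberately out of order.
example_rows = [
    {"name": "taxon B", "ileft": 2, "ilevel": 1},
    {"name": "taxon A", "ileft": 1, "ilevel": 0},
    {"name": "taxon D", "ileft": 6, "ilevel": 1},
    {"name": "taxon C", "ileft": 3, "ilevel": 2},
]
print("\n".join(render(build_tax_tree(example_rows))))
```

Because each row attaches to the nearest earlier row with a smaller ilevel, the nesting still works when a run's records skip intermediate levels.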
Search for SRA Runs by taxonomic name:
SELECT *
FROM "<db_name>"."tax_analysis"
WHERE name = 'Sarbecovirus' AND total_count > 1
limit 10
Find all SRA aligned read contigs that have taxonomy ID 2697049 (SARS-CoV-2), coverage greater than 100x, and a total length greater than 15,000 bases:
SELECT *
FROM "<db_name>"."contigs"
WHERE tax_id = '2697049' AND coverage > 100 AND length > 15000
limit 10
Find all BLASTn hits for SRA aligned read contigs where the taxonomy ID of the hit is 2697049 (SARS-CoV-2), the percent identity is greater than 99%, and the hit length is greater than 25,000 bases:
SELECT *
FROM "<db_name>"."blastn"
WHERE tax_id = '2697049' AND pident > 99 AND length > 25000
limit 10
Contact SRA
Contact SRA staff for assistance at [email protected]