Download SRA sequences from Entrez search results
Obtain search results
Task: find RNA-Seq records for lymph node tissue in BALB/c mice in SRA Entrez
To learn how to use the Advanced Search Builder, please refer to Search in SRA.
- In the Entrez search bar enter the query: ((("mus musculus"[Organism]) AND BALB/c*) AND "lymph*") AND "rna seq"[Strategy].
- To limit your search to aligned data only, add AND "aligned data"[Properties] to the above query.
- Click the checkboxes next to records (experiments) to select data of interest. Leave all checkboxes unchecked to select all records (experiments) from your search.
Obtain run accessions
Run accessions are used to download SRA data. To download a list of Run accessions selected from your Entrez search:
- Click Send to at the top of the page, select the File radio button, then select Accession List.
- Save this file in the location from which you are running the SRA Toolkit.
The resulting file, SraAccList.txt, is formatted like this:
SRR11192681
SRR11192682
SRR11192683
SRR11192684
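Before passing the list to the SRA Toolkit, it can help to confirm that the file contains nothing but accessions. The following is a minimal sketch, not part of the SRA Toolkit: check_acc_list is a hypothetical helper name, and the pattern assumes that Run accessions start with SRR, ERR, or DRR followed by digits.

```shell
# check_acc_list FILE
# Succeeds only if every non-empty line of FILE looks like a Run accession.
# Assumption: Run accessions are SRR, ERR, or DRR followed by digits.
check_acc_list() {
    # Lines are acceptable if they are empty or hold one accession,
    # optionally followed by whitespace (for example a Windows CR).
    pattern='^([SED]RR[0-9]+)?[[:space:]]*$'

    # grep -v selects the lines that do NOT match; -q only sets the exit status.
    if grep -Evq "$pattern" "$1"; then
        echo "unexpected line(s) in $1:" >&2
        grep -Evn "$pattern" "$1" >&2
        return 1
    fi

    # Count the lines that start with an accession.
    echo "$(grep -Ec '^[SED]RR[0-9]+' "$1") accession(s) look valid"
}
```

For example, check_acc_list SraAccList.txt reports how many accessions were found, or lists the offending lines with their line numbers on standard error.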
Download sequence data files using SRA Toolkit
Downloading public data
Prefetch is a part of the SRA Toolkit. This program downloads Runs (sequence files in the compressed SRA format) and all additional data necessary to convert the Run from the SRA format to a more commonly used format. Prefetch can also be used to repair and complete an incomplete Run download.
Use this prefetch command to download the Runs from the previous example in SRA format.
One Run:
prefetch SRR000001
A list of Runs:
prefetch --option-file SraAccList.txt
fasterq-dump and sam-dump are also part of the SRA Toolkit and can be used to convert the prefetched Runs from the compressed SRA format to FASTQ or SAM format. For example:
fasterq-dump --split-files SRR11180057.sra
You can also skip the prefetch step and download and convert the Run in one step by entering just the Run accession, without the .sra extension, in your fasterq-dump or sam-dump command:
fasterq-dump --split-files SRR11180057
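To process every Run in an accession list, the two commands can be wrapped in a loop. The sketch below is one possible way to do this, not an official SRA Toolkit workflow: fetch_and_convert, run, and the DRY_RUN variable are hypothetical names introduced here, and the default output directory fastq is an assumption. Whether fasterq-dump reuses the prefetched copy or downloads the Run again depends on how the toolkit is configured.

```shell
# run CMD [ARGS...]
# Executes the command, or only prints it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fetch_and_convert LIST [OUTDIR]
# Prefetches each Run listed in LIST (one accession per line), then converts
# it to FASTQ in OUTDIR (default: fastq). Stops at the first failing command.
fetch_and_convert() {
    list=$1
    outdir=${2:-fastq}

    # "|| [ -n "$acc" ]" also handles a last line without a trailing newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Strip a trailing carriage return and skip blank lines.
        acc=$(printf '%s' "$acc" | tr -d '\r')
        if [ -z "$acc" ]; then
            continue
        fi
        run prefetch "$acc" || return 1
        run fasterq-dump --split-files --outdir "$outdir" "$acc" || return 1
    done < "$list"
}
```

Running DRY_RUN=1 fetch_and_convert SraAccList.txt prints the commands without executing them, which is a cheap way to review the batch before starting a long download.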
Downloading Original Submitted Files
SRA has deposited original submitted files into a cloud bucket accessible via the prefetch command if you wish to use those instead of dumping standardized data from the archive.
Please refer to Download SRA sequence data using Amazon Web Services (AWS)
An example prefetch command:
prefetch --type fastq SRR11180057
The --type option allows you to specify the type of file to download. You can look up the file type of the original files in either SRA in BigQuery or the Data Access tab on the Run Browser, or use any to get all available formats.
Downloading protected data
Download metadata associated with SRA data
From the search result page
SRA Run files do not contain any information about the metadata (sample information, etc.) linked to the data themselves.
To download metadata for each Run in your Entrez query, click Send to at the top of the page, select the File radio button, and select RunInfo in the pull-down menu.
This will generate a tabular SraRunInfo.csv file with the metadata available for each Run.
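The Run accessions can be pulled back out of this file, for example to rebuild an accession list after filtering the metadata. The sketch below assumes that the first line of SraRunInfo.csv is a header, that one column is named exactly Run, and that no field contains an embedded comma. It is not a full CSV parser, and runs_from_runinfo is a hypothetical helper name.

```shell
# runs_from_runinfo FILE
# Prints the values of the column named "Run", one per line.
# Assumptions: line 1 is a header containing a "Run" column, and
# fields do not contain embedded commas.
runs_from_runinfo() {
    awk -F',' '
        NR == 1 {
            # Locate the Run column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "Run") col = i
            if (!col) { print "no Run column found" > "/dev/stderr"; exit 1 }
            next
        }
        # Skip blank lines and any repeated header line.
        $col != "" && $col != "Run" { print $col }
    ' "$1"
}
```

For example, runs_from_runinfo SraRunInfo.csv > SraAccList.txt writes a list in the same one-accession-per-line format shown earlier.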
From Run Selector
A slightly different set of metadata can be downloaded in a tab-delimited file from Run Selector.
To download metadata for each Run in your Entrez query:
- Click Send to at the top of the page, select the Run Selector radio button, and click the Go button.
- If necessary, refine your results by using various filters provided by the Run Selector's interface.
- Click the RunInfo Table button. This will generate a tabular SraRunTable.txt file with the metadata available for each Run.
Download sequence data from the Run Browser
Run Browser allows for limited download (one run at a time, containing less than 5 Gbases of sequence, over HTTP) of sequence data in fasta or fastq format.
Download example
- Open the selected run in the Run Browser.
- Click the FASTA/FASTQ download tab.
- Find certain reads by applying a Filter or leave the Filter field empty.
- Select the Run to download, optionally select Filtered or Clipped, then click the FASTA or FASTQ button to download data in that format.
Download SRA sequence data from the Cloud
Download Trace Archive Data
Trace Archive data now resides in SRA and can be searched in NCBI SRA Entrez like any other SRA data. To limit your search to capillary sequence data, use the Capillary option from the Platform section of the results facet list or add "platform capillary"[Properties] to your query. For example, to search for capillary data from the Organism "Mus musculus" use this search:
(Mus musculus[Organism] AND "platform capillary"[Properties])
Once Runs of interest are found, they can be accessed with the SRA toolkit as demonstrated above. Trace Archive data can be accessed in fasta format by direct query with TI numbers using an SRA web service which is described in more detail on our Access Trace Data page.
Contact SRA
Contact SRA staff for assistance at [email protected]