Access Trace Data
Introduction
Capillary sequence data may still be submitted to the NCBI Sequence Read Archive (SRA) via the Submission Portal.
All data previously stored in the Trace Archive has been transferred to SRA and can be found in Entrez using standard search terms or accessed directly by TI (trace identifier) number.
Searching for Trace Archive data in Sequence Read Archive (SRA) Entrez
Trace data can be searched in Entrez and accessed like any other data housed in SRA. Users may search SRA Entrez by Organism, Center Name, or other metadata terms. The key difference is that, after running a search, users should select the Capillary option in the Platform section of the facet list on the left side of the results page.
Examples
Go to the SRA home page and try the following searches:
Find all capillary sequencing records
- Type "platform capillary"[Properties] in the search box.
Find capillary records for the term 'cancer'
- Type cancer in the search field.
- Click "Capillary" in the Platform section of the facet list on the left side of the results page.
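The facet click in the second example can also be expressed as a single query string, which is convenient when the same search has to be repeated from a script. The following is a minimal Python sketch, not a documented NCBI recipe: it assumes that selecting the Capillary facet is equivalent to ANDing "platform capillary"[Properties] onto the search term, and it uses NCBI's E-utilities esearch endpoint, which this page does not describe. The helper names capillary_query and esearch_url are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities esearch endpoint (not described on this
# page) accepts the same SRA query syntax as the web search box.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
CAPILLARY_FILTER = '"platform capillary"[Properties]'


def capillary_query(term=None):
    """Restrict an SRA search term to capillary (Trace Archive) records."""
    # Assumption: selecting the Capillary facet is equivalent to ANDing the
    # properties filter onto the user's term.
    if not term:
        return CAPILLARY_FILTER
    return f"({term}) AND {CAPILLARY_FILTER}"


def esearch_url(term=None, retmax=20):
    """Build an esearch URL that asks for SRA UIDs matching the query."""
    params = {"db": "sra", "term": capillary_query(term), "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(capillary_query())          # "platform capillary"[Properties]
print(capillary_query("cancer"))  # (cancer) AND "platform capillary"[Properties]
print(esearch_url("cancer"))
```

The printed query strings can also be pasted directly into the SRA search box, which is an easy way to check that they return the same records as the facet-based search.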
Accessing Trace Archive data from SRA Entrez
Run Browser
Users interested in a few specific records can click individual entries on the Entrez results page to see a more detailed breakdown of the record metadata. From there, clicking a Run accession link in the 'Runs' section opens the record in Run Browser, where individual reads can be viewed and the data can be downloaded manually in SRA normalized format.
Run Selector
Users who want to review and further refine a larger set of records can send the Entrez results to the Run Selector:
- Click Send to at the top of the results page, select the Run Selector radio button, and click Go.
- If necessary, refine your results using the filters provided in the Run Selector interface.
- Click the Metadata button. This generates a tabular SraRunTable.txt file containing the available metadata for each Run.
Example (figure): Three selected runs in the Run Selector
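To feed the Run accessions from a downloaded SraRunTable.txt into scripted downloads with the SRA Toolkit, they first have to be pulled out of the table. A minimal Python sketch follows. It assumes the file has a header row containing a column named Run, and it guesses the delimiter (tab or comma) from that header; both are assumptions about the file layout rather than documented guarantees. The function name run_accessions and the sample rows are hypothetical.

```python
import csv
import io


def run_accessions(table_text):
    """Return the values of the 'Run' column from SraRunTable.txt content."""
    # Assumption: the first line is a header that includes a "Run" column.
    # The delimiter is guessed from that header: tab if present, else comma.
    header = table_text.splitlines()[0] if table_text else ""
    delimiter = "\t" if "\t" in header else ","
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    # Rows without a Run value (or tables without that column) are skipped.
    return [row["Run"] for row in reader if row.get("Run")]


# Hypothetical two-row table, for illustration only.
sample = (
    "Run,Platform,Organism\n"
    "SRR0000001,CAPILLARY,Homo sapiens\n"
    "SRR0000002,CAPILLARY,Mus musculus\n"
)
print(run_accessions(sample))
```

For a real file, read it with open("SraRunTable.txt", newline="") and pass the contents to run_accessions.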
Downloading Trace Archive data using SRA Toolkit
Once a user has one or more Run accessions of interest, the SRA Toolkit can be used to access the data in fasta or fastq format, like any other SRA data.
For example, fasta data can be obtained for any Run accession with a toolkit command such as:
fasterq-dump --fasta SRR9495649
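When there are many Run accessions, the same command can be issued once per accession from a script. A minimal Python sketch follows. It uses only the --fasta option shown in the command above and assumes fasterq-dump is installed and on the PATH. The helper names fasterq_dump_cmd and dump_runs are hypothetical.

```python
import shlex
import subprocess


def fasterq_dump_cmd(accession, fasta=True):
    """Build the fasterq-dump argument list for one Run accession."""
    cmd = ["fasterq-dump"]
    if fasta:
        cmd.append("--fasta")  # same option as in the command above
    # With fasta=False, fasterq-dump falls back to its default fastq output.
    cmd.append(accession)
    return cmd


def dump_runs(accessions, fasta=True):
    """Run fasterq-dump once per accession (requires the SRA Toolkit on PATH)."""
    for acc in accessions:
        # check=True raises CalledProcessError at the first failing accession.
        subprocess.run(fasterq_dump_cmd(acc, fasta), check=True)


print(shlex.join(fasterq_dump_cmd("SRR9495649")))  # fasterq-dump --fasta SRR9495649
```

Building an argument list instead of a shell string avoids quoting problems. Calling dump_runs(["SRR9495649"]) runs the same command as the example above.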
For more information on installing and using the SRA Toolkit to access these data, please see
Accessing Trace Archive data by TI Number
For users who want to retrieve specific Trace Archive records by TI number, SRA provides a web service that returns these data directly in fasta format and can be called from the command line. The service accepts one or more TI numbers in a single request as a comma-delimited list (e.g., TI numbers 10, 20, and 30):
curl "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?ti=10,20,30&retmode=text"
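The TI service URL follows a simple pattern, so a script can assemble it instead of hand-editing curl commands. The following minimal Python sketch reproduces the URL shown above, and the docstring describes the retmode and ignore_bad_tis options covered on this page. It writes ignore_bad_tis=1, the form used in this page's file-upload example, rather than the bare ignore_bad_tis flag, on the assumption that the service accepts either form. The helper name ti_fasta_url is hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta"


def ti_fasta_url(tis, retmode="text", ignore_bad_tis=False):
    """Build a request URL for the TI fasta service.

    retmode="text" returns plain fasta; retmode="raw" (or None, which omits
    the parameter) returns gzip-compressed data, as described on this page.
    """
    params = {"ti": ",".join(str(t) for t in tis)}
    if ignore_bad_tis:
        params["ignore_bad_tis"] = 1
    if retmode is not None:
        params["retmode"] = retmode
    # safe="," keeps the commas literal, matching the curl example.
    return f"{BASE}?{urlencode(params, safe=',')}"


print(ti_fasta_url([10, 20, 30]))
```

The resulting URL can be passed to curl as-is or fetched with urllib.request.urlopen.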
To access the data in gzip-compressed (.gz) format, use retmode=raw or omit the retmode parameter entirely:
curl "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?ti=10,20,30&retmode=raw" > output_fasta.gz
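Once the compressed response has been saved (for example as output_fasta.gz by the curl command above), it can be decompressed and split into records without a separate gunzip step. The following minimal Python sketch assumes only that the decompressed payload is ordinary fasta text. It makes no assumption about the header contents, and the helper names parse_fasta and read_fasta_gz are hypothetical.

```python
import gzip


def parse_fasta(text):
    """Split fasta text into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif line and header is not None:
            # Sequence lines are concatenated; text before the first header is ignored.
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def read_fasta_gz(path="output_fasta.gz"):
    """Decompress a saved retmode=raw response and parse its fasta records."""
    # gzip.open in text mode ("rt") decompresses and decodes on the fly.
    with gzip.open(path, "rt") as fh:
        return parse_fasta(fh.read())
```

Reading the whole file into memory is fine for modest TI lists. For very large downloads, iterating over the file handle line by line would avoid holding everything at once.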
If the list includes any invalid TI numbers, the service returns an error: "Unrecognized TIs were found".
To have the service ignore bad TI numbers ('badTI' below) and return fasta records for all valid TIs in the request, add the ignore_bad_tis parameter:
curl "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?ti=10,20,badTI&ignore_bad_tis&retmode=text"
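A script can also try a strict request first and fall back to ignore_bad_tis only when the service reports unrecognized TIs. That way the caller learns that some TIs were rejected instead of silently losing them. The following minimal Python sketch detects the error by looking for the message text quoted on this page. It assumes the message may arrive either in a normal response body or with an HTTP error status, which this page does not specify. The helper names are hypothetical, and the fetch function is injectable so the logic can be exercised without network access.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta"
BAD_TI_MESSAGE = "Unrecognized TIs were found"


def http_get_text(url, timeout=60):
    """Fetch a URL as text, returning the body even for HTTP error statuses."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # Assumption: the bad-TI error may come with an HTTP error status,
        # so the message is read from the error body as well.
        return err.read().decode("utf-8", errors="replace")


def fetch_fasta_text(tis, get=http_get_text):
    """Return (fasta_text, had_bad_tis) for the requested TIs.

    A strict request is tried first; if the service reports unrecognized
    TIs, the request is repeated with ignore_bad_tis=1.
    """
    params = {"ti": ",".join(str(t) for t in tis), "retmode": "text"}
    body = get(f"{BASE}?{urlencode(params, safe=',')}")
    if BAD_TI_MESSAGE not in body:
        return body, False
    params["ignore_bad_tis"] = 1
    return get(f"{BASE}?{urlencode(params, safe=',')}"), True
```

For example, fetch_fasta_text([10, 20, "badTI"]) would make up to two requests and return the fasta text together with a flag indicating whether any TIs were dropped. Always passing ignore_bad_tis is simpler if that information is not needed.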
Although there is no limit on the number of TIs that can be queried per request, the syntax above can fail with the error "414 Request-URI Too Long" if the URL becomes too long. To query a set of TIs that would exceed this length, create a text file containing the TI numbers on a single comma-delimited line, e.g.:
100001,100002,100003,100004,100005
To download a gzipped fasta file for the TIs listed in a file named 'ti.txt' in the current working directory, use the following curl command:
curl -F "ti=@./ti.txt" 'https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?retmode=raw&ignore_bad_tis=1' > ti.fasta.gz
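For pipelines that prefer Python over curl, writing ti.txt and uploading it might be sketched with the standard library as shown below. The multipart body approximates what curl -F "ti=@./ti.txt" sends: a file field named ti with the file contents. Whether the service depends on any other details of curl's request is not documented here, so treat this as an untested assumption. The helper names write_ti_file, build_upload_request, and download_fasta_gz are hypothetical.

```python
import uuid
from urllib.request import Request, urlopen

URL = ("https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta"
       "?retmode=raw&ignore_bad_tis=1")


def write_ti_file(tis, path="ti.txt"):
    """Write TI numbers as a single comma-delimited line (no trailing newline)."""
    with open(path, "w") as fh:
        fh.write(",".join(str(t) for t in tis))


def build_upload_request(ti_bytes, filename="ti.txt"):
    """Build a multipart/form-data POST with a file field named 'ti'."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="ti"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    ).encode() + ti_bytes + f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(URL, data=body, headers=headers, method="POST")


def download_fasta_gz(ti_path="ti.txt", out_path="ti.fasta.gz", timeout=300):
    """Upload the TI file and save the gzipped fasta response to disk."""
    with open(ti_path, "rb") as fh:
        req = build_upload_request(fh.read())
    with urlopen(req, timeout=timeout) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

Usage mirrors the curl command: call write_ti_file with the TI list, then download_fasta_gz() to produce ti.fasta.gz in the current directory.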
Legacy Support for query_tracedb
In the past, the query_tracedb script could be used to access Trace Archive data by TI number. It will continue to work, because all queries made this way are automatically redirected to the new service. The fasta output format remains the same, so existing scripts or pipelines that use query_tracedb should continue to function normally. However, features of query_tracedb other than fasta retrieval by TI number no longer function and are not supported; we recommend that users migrate to the new service described above as soon as possible.
Contact SRA
Contact SRA staff for assistance at [email protected]