Access Trace Data

Introduction

As a result of diminished usage and aging hardware infrastructure, the Trace Archive, which previously stored capillary sequence data submitted to NCBI, has been retired.

Capillary sequence data may still be submitted to the NCBI Sequence Read Archive (SRA) via Submission Portal.
All data previously stored in the Trace Archive has been transferred to SRA and may be found in Entrez using standard search terms or accessed directly by TI number.

Searching for Trace Archive data in Sequence Read Archive (SRA) Entrez

Trace data can be searched in Entrez and accessed like any other data housed in SRA. Users may search by Organism, Center Name, or other metadata terms in SRA Entrez. The key distinction in Entrez is that after entering a search, users should ensure the Capillary option is selected from the Platform section of the facet list on the left side of the results page.

Examples

Go to the SRA home page, then:

Find all capillary sequencing records

Type the following in the search box: "platform capillary"[Properties]

Find capillary records for term 'cancer'

  1. Type cancer in the search field and run the search.
  2. Click "Capillary" in the Platform section of the facet list on the left side of the results page.
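
The same searches can also be scripted. The sketch below rests on two assumptions that this page does not state: that the NCBI E-utilities ESearch endpoint accepts these Entrez query strings unchanged with db=sra, and that clicking the Capillary facet is equivalent to adding the "platform capillary"[Properties] term with AND. The function names and the retmax value are illustrative only.

```shell
# Sketch only: search SRA Entrez for capillary records from the command line.
# Assumption: ESearch with db=sra accepts the query strings used on this page.
ESEARCH_URL="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Build the Entrez query. With no argument it matches all capillary records;
# with an argument (e.g. cancer) it restricts that term to capillary records.
capillary_query() {
    local term="${1:-}"
    local platform='"platform capillary"[Properties]'
    if [ -n "$term" ]; then
        printf '%s AND %s\n' "$term" "$platform"
    else
        printf '%s\n' "$platform"
    fi
}

# Send the query to ESearch; the response is XML listing matching record IDs.
# -G appends the URL-encoded fields to the URL as a query string.
search_capillary() {
    curl -sG "$ESEARCH_URL" \
        --data-urlencode "db=sra" \
        --data-urlencode "term=$(capillary_query "${1:-}")" \
        --data-urlencode "retmax=20"
}
```

For example, search_capillary cancer would request up to 20 record IDs matching the second example above.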

Accessing Trace Archive data from SRA Entrez

Run Browser

Users interested in a few specific records can click individual entries on the Entrez results page to see a more detailed breakdown of the record metadata. From there, users may click a Run accession link in the 'Runs' section to open the record in Run Browser, where individual reads can be viewed and the data can be downloaded manually in SRA normalized format.

Example

/Traces/sra/?run=SRR19207298

Run Selector

Users preferring to look at and further refine a larger set of records can send the Entrez results to the Run Selector:

  • Click Send to at the top of the results page, select the Run Selector radio button, and click the Go button.
  • If necessary, refine your results by using various filters provided by the Run Selector's interface.
  • Click the Metadata button. This will generate a tabular SraRunTable.txt file with metadata available for each Run.

Example

Three selected runs in the Run Selector
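
Once SraRunTable.txt has been downloaded, the Run accessions can be pulled out of it for use with other tools. The following is a minimal sketch, assuming the file is comma-separated with a header row containing a "Run" column, and that no quoted field to the left of that column contains a comma. The function name extract_runs and its optional delimiter argument are hypothetical.

```shell
# Print the Run accessions from a Run Selector metadata table.
# Assumptions: delimited text with a header row that has a "Run" column;
# the delimiter defaults to a comma and can be passed as a second argument.
extract_runs() {
    local table="$1"
    local delim="${2:-,}"
    awk -F"$delim" '
        NR == 1 {
            # Locate the "Run" column in the header, ignoring quotes and CRs.
            for (i = 1; i <= NF; i++) {
                gsub(/\r|"/, "", $i)
                if ($i == "Run") col = i
            }
            next
        }
        col && $col != "" {
            gsub(/\r|"/, "", $col)
            print $col
        }
    ' "$table"
}
```

Running extract_runs SraRunTable.txt prints one accession per line; if no "Run" column is found, nothing is printed.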

Downloading Trace Archive data using SRA Toolkit

Once a user has one or more Run accessions of interest, the SRA Toolkit can be used to access the data in fasta or fastq format, as with any other SRA data. For example, fasta data can be obtained for any Run accession with a toolkit command like:

fasterq-dump --fasta SRR9495649
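
When several Run accessions are involved, the same command can be repeated in a loop. The sketch below assumes the SRA Toolkit is installed and fasterq-dump is on the PATH. The function name and the DRY_RUN variable are hypothetical conveniences, not part of the toolkit.

```shell
# Convert each Run accession given as an argument to fasta with fasterq-dump.
# Assumption: fasterq-dump from the SRA Toolkit is on PATH.
# Set DRY_RUN=1 (a variable used only by this sketch) to print the commands
# instead of running them.
dump_runs_as_fasta() {
    local run
    for run in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "fasterq-dump --fasta $run"
        else
            # Report a failed accession and continue with the rest.
            fasterq-dump --fasta "$run" || echo "failed: $run" >&2
        fi
    done
}
```

For example, dump_runs_as_fasta SRR9495649 SRR19207298 processes the two accessions one after the other.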

For more information on installing and using the SRA Toolkit to access these data, please see the SRA Toolkit documentation.

Accessing Trace Archive data by TI Number

For users who want to access specific Trace Archive records by TI number, SRA provides a web service, callable from the command line, that returns these data directly in fasta format. The service can process one or more TI numbers per request, given as a comma-delimited list (e.g., TI numbers 10, 20, and 30):

curl "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?ti=10,20,30&retmode=text"

Use retmode=raw, or omit the retmode parameter entirely, to receive the data in .gz format:

curl "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?ti=10,20,30&retmode=raw" > output_fasta.gz

If the list includes any invalid TI numbers, an error will be returned: "Unrecognized TIs were found". To have the service ignore bad TI numbers ('badTI' below) and return fasta records for all valid TIs in the request, add the ignore_bad_tis parameter:

curl "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?ti=10,20,badTI&ignore_bad_tis&retmode=text"
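
The request URLs above follow one pattern, so they can be assembled by a small helper. This is a sketch only: the function name ti_fasta_url and the RETMODE and IGNORE_BAD_TIS variables are hypothetical, while the endpoint and its ti, retmode, and ignore_bad_tis parameters are the ones shown on this page.

```shell
TI_FASTA_ENDPOINT="https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta"

# Build a request URL from TI numbers passed as separate arguments.
# RETMODE (text or raw, default text) and IGNORE_BAD_TIS (0 or 1, default 0)
# are variables used only by this sketch.
ti_fasta_url() {
    local IFS=','          # makes "$*" join the arguments with commas
    local url="${TI_FASTA_ENDPOINT}?ti=$*&retmode=${RETMODE:-text}"
    if [ "${IGNORE_BAD_TIS:-0}" = "1" ]; then
        url="${url}&ignore_bad_tis=1"
    fi
    printf '%s\n' "$url"
}
```

The result can be passed straight to curl, for example: curl "$(ti_fasta_url 10 20 30)"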

While there is no limit to the number of TIs that can be queried per request, the syntax above can produce an error if the URL becomes too long: "414 Request-URI Too Long". To query a set of TIs that exceeds this length, please create a text file containing the TI numbers in a single, comma-delimited line, e.g.:

100001,100002,100003,100004,100005

To download the data in a gzipped fasta file for a list of TIs in a file named 'ti.txt' in the current working directory, the curl command is:

curl -F "ti=@./ti.txt" 'https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?retmode=raw&ignore_bad_tis=1' > ti.fasta.gz
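
TI lists are often kept one number per line, so they may need converting to the single comma-delimited line described above before upload. The sketch below shows one way to do that and then wraps the curl command from this section. The function names and file arguments are hypothetical, and whether the service tolerates the trailing newline that paste writes is an assumption.

```shell
# Join TI numbers listed one per line into a single comma-delimited line.
make_ti_file() {
    local input="$1" output="$2"
    # Drop blank lines, then join the remaining lines with commas.
    grep -v '^[[:space:]]*$' "$input" | paste -sd, - > "$output"
}

# Upload the TI file and save the gzipped fasta response,
# using the same curl request shown in this section.
fetch_ti_fasta() {
    local ti_file="$1" out="$2"
    curl -F "ti=@${ti_file}" \
        'https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?retmode=raw&ignore_bad_tis=1' \
        > "$out"
}
```

For example, make_ti_file ids.txt ti.txt followed by fetch_ti_fasta ti.txt ti.fasta.gz would reproduce the download above from a one-per-line list.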

Legacy Support for query_tracedb

Previously, the query_tracedb script could be used to access Trace Archive data by TI number. It will continue to work for that purpose, because all queries made through it are automatically redirected to the new service. The fasta output format remains the same, so existing scripts and pipelines that use query_tracedb for fasta retrieval should continue to function normally. However, features of query_tracedb other than fasta retrieval by TI number no longer function and are not supported, and we recommend that users migrate to the new service described above as soon as possible.


Contact SRA

Contact SRA staff for assistance at [email protected]

Support Center

Last updated: 2022-06-15T17:33:03Z