The Sequence Read Archive (SRA)
Introduction
The SRA is NIH's archive of high-throughput sequencing data. It is part of the International Nucleotide Sequence Database Collaboration (INSDC), whose members are the NCBI Sequence Read Archive (SRA), the European Bioinformatics Institute (EBI), and the DNA Database of Japan (DDBJ). Data submitted to any of the three organizations are shared among them.
SRA Mission
The SRA is a publicly available repository of high-throughput sequencing data. The archive accepts data from all branches of life as well as metagenomic and environmental surveys. SRA stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis.
Data Processing, Status and Release
SRA accepts data from all kinds of sequencing projects, including clinically important studies that involve human subjects or their metagenomes, which may contain human sequences. Such data are often placed under NIH controlled access via dbGaP (the database of Genotypes and Phenotypes). Data submitters need to determine whether their data are suitable for public distribution or need controlled access. For further information, consult institutional review boards and the NIH Genomic Data Sharing Policy: https://sharing.nih.gov/genomic-data-sharing-policy
It is the responsibility of submitting parties to ensure that they have appropriate consent for human sequence data to be distributed publicly without access controls. We encourage submitters to screen for and remove contaminating human reads from data files prior to submission. We also offer human contamination screening as a service available on request.
Following submission, data undergo automated and manual processing to ensure data integrity and quality, and are subsequently made available to the public. On rare occasions, data may be removed from public view. More details about this process can be found on the NLM GenBank and SRA Data Processing page.
Contact SRA
Contact SRA staff for assistance at [email protected]