Introduction
The National Center for Biotechnology Information (NCBI) creates and maintains a set of databases that archive, process, display, and report information on human germline and somatic variants. These databases, primarily the Database of Short Genetic Variations (dbSNP) and the Database of Genomic Structural Variations (dbVar), together represent almost 2 billion submitted human variants. The primary roles of both databases are to process submissions, archive the data, annotate the variants on the genome and on NCBI Reference Sequences (RefSeqs), and distribute the data worldwide. The data are important for studying the basis of human disease, in order to improve diagnosis, treatment, and prevention, and for research in a variety of fields such as species diversity, evolution, and conservation.
Submissions are accepted in various formats, including VCF, which suits the large numbers of variants generated by high-throughput sequencing (HTS) projects across multiple populations, along with a wide variety of associated data such as genotypes and allele frequencies. Each submitted variant is assigned a database identifier (ss# in dbSNP or nsv#/esv# in dbVar) that can be cited in publications, allows cross-referencing to other databases and linking to related data, facilitates annotation, and promotes data exchange. The submissions are then processed to aggregate information from multiple submitters (rs# in dbSNP), to calculate locations and functional consequences on RefSeqs, and to integrate the variants with other NCBI resources, including Gene, PubMed, Nucleotide, Protein, and Genome. dbSNP and dbVar data are updated during regular build cycles with annotations on new assemblies and RefSeqs, and the data are distributed in several ways: Entrez searches, study-specific reports, annotation on the genome, Sequence Viewer, and FTP downloads in BED, VCF, and other formats.
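As a rough illustration of the identifier scheme described above, the Python sketch below maps an identifier's prefix to the database and record level named in the text: ss# for submitted dbSNP variants, rs# for aggregated dbSNP records, and nsv#/esv# for submitted dbVar variants. It assumes identifiers are written as one of these prefixes followed by digits (e.g. "rs123"). The function name `classify_variant_id` and the returned labels are hypothetical and not part of any NCBI API; accession types not mentioned here are simply rejected.

```python
import re

# Hypothetical mapping from identifier prefix to (database, record level),
# based only on the prefixes named in the text.
PREFIXES = {
    "ss": ("dbSNP", "submitted variant"),
    "rs": ("dbSNP", "aggregated reference SNP record"),
    "nsv": ("dbVar", "submitted variant"),
    "esv": ("dbVar", "submitted variant"),
}

# Assumed identifier shape: a known prefix immediately followed by digits.
ID_PATTERN = re.compile(r"^(ss|rs|nsv|esv)(\d+)$")


def classify_variant_id(identifier: str) -> tuple[str, str, int]:
    """Return (database, record level, numeric part) for an id like 'rs123'."""
    match = ID_PATTERN.match(identifier.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized variant identifier: {identifier!r}")
    prefix, number = match.groups()
    database, level = PREFIXES[prefix]
    return database, level, int(number)


# Example usage with made-up identifiers.
for ident in ["ss12345", "rs123", "nsv1000", "esv2000"]:
    print(ident, classify_variant_id(ident))
```

Matching the whole string with an anchored pattern, rather than checking only the first letters, keeps longer or unrelated prefixes from being misclassified.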
NIH Genomic Data Sharing
If you're funded by NIH, please consider complying with the NIH Genomic Data Sharing (GDS) policy, which took effect on January 25, 2015. If you're NOT funded by NIH, we still hope you will follow the spirit of the policy and submit to dbSNP or dbVar, which are trusted GDS repository partners.
The table below highlights the features of dbSNP and dbVar and their differences.
Database | dbSNP | dbVar
Description | The SNP database (commonly known as dbSNP) contains short human nucleotide variations. | The dbVar database contains large human genomic structural variation data, generated mostly by published studies. Variants are typically longer than 50 nucleotides (in contrast to dbSNP).
Variation Type | Small variations (<= 50 bp) | Large variations (> 50 bp)
Accession | |
Data Aggregation | Data by RS: | Data by SV and SSV:
Linked Resources | |
Annotation | |
Access Policy | Open – All variant data, including genotype and frequency data and associated metadata, are available without restrictions on the website and FTP site. | Open – All variant data, including genotype and frequency data and associated metadata, are available without restrictions on the website and FTP site.
Submission Guidelines | https://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html |
Submission Limitations | dbSNP and dbVar DO NOT accept: |
Contact | |
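The size boundary in the Variation Type row can be expressed as a small routing rule. The Python sketch below is a hypothetical illustration, not an NCBI tool. It assumes a variant is described by VCF-style REF and ALT alleles plus an optional SVLEN value (needed for symbolic alleles such as <DEL>), and the helper names `estimate_size_bp` and `target_database` are made up for this example. Because dbVar describes its variants as only typically longer than 50 nucleotides, treat the result as a rough guide rather than a submission rule.

```python
from typing import Optional

# Boundary from the table: <= 50 bp -> dbSNP, > 50 bp -> dbVar.
SIZE_THRESHOLD_BP = 50


def estimate_size_bp(ref: str, alt: str, svlen: Optional[int] = None) -> int:
    """Estimate a variant's size in base pairs (assumed heuristic)."""
    if svlen is not None:
        # An explicit SVLEN-style length takes priority; deletions are negative.
        return abs(svlen)
    if alt.startswith("<"):
        raise ValueError("symbolic ALT allele needs an SVLEN value")
    size = max(len(ref), len(alt))
    # VCF indels usually carry one shared leading "padding" base; don't count it.
    if len(ref) != len(alt) and ref[0] == alt[0]:
        size -= 1
    return size


def target_database(size_bp: int) -> str:
    """Route a variant to dbSNP or dbVar using the 50 bp boundary."""
    return "dbSNP" if size_bp <= SIZE_THRESHOLD_BP else "dbVar"


examples = [
    ("A", "G", None),         # single-nucleotide change
    ("ACGTACGT", "A", None),  # short deletion with a padding base
    ("T", "<DEL>", -12000),   # symbolic deletion with an SVLEN value
]
for ref, alt, svlen in examples:
    size = estimate_size_bp(ref, alt, svlen)
    print(ref, alt, size, target_database(size))
```

Keeping the size estimate separate from the routing decision makes it easy to swap in a different size definition without touching the 50 bp rule.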