NCBI

Introduction

The National Center for Biotechnology Information (NCBI) creates and maintains a set of databases that archive, process, display, and report information related to human germline and somatic variants. These databases, primarily the Database of Short Genetic Variations (dbSNP) and the Database of Genomic Structural Variations (dbVar), represent almost 2 billion submitted human variants. The primary roles of both databases are to process submissions, archive the data, annotate the variants on the genome and on NCBI Reference Sequences (RefSeqs), and distribute the data worldwide. The data are important for studying the basis of human diseases in order to improve diagnosis, treatment, and prevention, and for research in fields such as species diversity, evolution, and conservation.

Submissions are accepted in various formats, including VCF, which is suited to reporting the large numbers of variations generated by high-throughput sequencing (HTS) projects across multiple populations, along with a wide variety of associated data such as genotype and allele frequency data. Each submitted variant is assigned a database identifier (ss# in dbSNP or nsv#/esv# in dbVar) that can be cited in publications, allows cross-referencing to other databases and linking to related data, facilitates annotation, and promotes data exchange. Submissions are then processed to aggregate information from multiple submitters (rs# in dbSNP), to calculate locations and functional consequences on RefSeqs, and to integrate with other NCBI resources, including Gene, PubMed, Nucleotide, Protein, and Genome.

dbSNP and dbVar data are updated during regular build cycles with annotations on new assemblies and RefSeqs, and the data are distributed in several ways: Entrez searches, study-specific reports, annotation on the genome, Sequence Viewer, and FTP downloads in BED, VCF, and other formats.
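
The identifier prefixes named in the introduction (ss#, rs#, nsv#, esv#) can be recognized programmatically. The following is a minimal Python sketch, not an NCBI API: the function name `classify_accession` and the lookup table are hypothetical, and it assumes an identifier is simply a known prefix followed by digits.

```python
import re

# Prefixes as named in the introduction, mapped to the database that issues them.
ACCESSION_DATABASES = {
    "ss": "dbSNP",   # submitted variant
    "rs": "dbSNP",   # reference variant aggregated from submissions
    "nsv": "dbVar",
    "esv": "dbVar",
}

# Assumption: an identifier is a known prefix followed only by digits.
_ACCESSION_RE = re.compile(r"^(ss|rs|nsv|esv)(\d+)$", re.IGNORECASE)


def classify_accession(accession):
    """Return (database, prefix, number) for a recognized identifier, else None."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    prefix = match.group(1).lower()
    return ACCESSION_DATABASES[prefix], prefix, int(match.group(2))
```

Under these assumptions, `classify_accession("rs12345")` returns `("dbSNP", "rs", 12345)`, where `rs12345` is an arbitrary placeholder identifier, and an unrecognized string returns `None`.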

NIH Genomic Data Sharing

If you are funded by NIH, please consider complying with the NIH Genomic Data Sharing (GDS) policy, which took effect on January 25, 2015. If you are not funded by NIH, we still hope you will follow the spirit of the policy and submit to dbSNP or dbVar, which are trusted GDS repository partners.

The table below highlights the features of dbSNP and dbVar and the differences between them.

Database	dbSNP https://www.ncbi.nlm.nih.gov/snp	dbVar https://www.ncbi.nlm.nih.gov/dbvar/
Description	The SNP database (commonly known as dbSNP) contains short human nucleotide variations.	The dbVar database contains large human genomic structural variation data, generated mostly by published studies. Variants are typically longer than 50 nucleotides (in contrast to dbSNP).
Variation Type	Small variations (<= 50 bp): single nucleotide variations (SNV); short multi-nucleotide changes (MNV); small deletions or insertions; retrotransposable element insertions	Large variations (> 50 bp): copy number variants (CNV); large deletions and insertions; inversions; translocations; mobile elements
Accession	Submitted SNP (ss#): submitted variant based on asserted location or flanking sequences; Reference SNP (rs#): non-redundant set of variations based on clustering of ss records of the same variant type and sequence position	Study (std#): unit of submission, usually corresponding to the data output of a publication; Variant call (ssv#): all independent experimental observations of structural variation; Variant region (sv#): regions of the genome containing aggregated structural variation, i.e. calls
Data Aggregation	Data by RS: submitted SNP (ss) information; submitter contact and publications; variation data (alleles, genotype, and frequency); experimental methods and conditions; genomic positions on different assembly versions; ClinVar clinical assertions	Data by SV and SSV: submitter contact and publications; method; genotype and frequency; genomic positions on different assembly versions; ClinVar clinical assertions
Linked Resources	ClinVar, dbGaP, BioProject, BioSample, Gene, PubMed, Genome, Nucleotide, Protein, Taxonomy, external collaborators	ClinVar, dbGaP, BioProject, BioSample, Gene, PubMed, Genome, Nucleotide, Protein, Taxonomy, external collaborators
Annotation	RS records are annotated on the latest available genomic assemblies and on RefSeq sequences (mRNA, protein, and RefSeqGene)	SV and SSV records are annotated on the latest available genomic assemblies
Access Policy	Open: all variant data, including genotype and frequency data and associated metadata, are available without restriction on the website and by FTP. WEB: https://www.ncbi.nlm.nih.gov/snp/ FTP: ftp://ftp.ncbi.nih.gov/snp/	Open: all variant data, including genotype and frequency data and associated metadata, are available without restriction on the website and by FTP. WEB: https://www.ncbi.nlm.nih.gov/dbvar/ FTP: ftp://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/
Submission Guidelines	https://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html	https://www.ncbi.nlm.nih.gov/dbvar/content/submission/
Submission Limitations	dbSNP and dbVar DO NOT accept: (1) synthetic mutations; (2) variations ascertained from cross-species alignments and analysis; (3) personal human data, due to current NIH policy, unless the participant is enrolled in a study with institutional oversight; (4) bacterial variant sequences, which can be submitted to SRA (https://www.ncbi.nlm.nih.gov/sra/) or as alignments to GenBank PopSet (http://www.ncbi.nlm.nih.gov/popset/); (5) human variations with an asserted relationship to disease or other phenotypes, which should be submitted to ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/docs/submit/). However, dbSNP and dbVar staff members will help broker such ClinVar submissions.
Contact	[email protected]	[email protected]
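
Two rules from the table can be sketched in a few lines of Python: the 50 bp boundary that separates dbSNP from dbVar, and the grouping of submitted records (ss) that share a variant type and sequence position, which is the basis the table gives for rs clustering. This is an illustrative sketch only, not the databases' actual pipeline. The function names, the record fields (`ss_id`, `chrom`, `pos`, `variant_type`), and the choice to measure variant length as the longer of the two alleles are all assumptions made for the example.

```python
from collections import defaultdict

# Threshold from the table: <= 50 bp goes to dbSNP, > 50 bp goes to dbVar.
SHORT_VARIANT_MAX_BP = 50


def target_database(ref, alt):
    """Pick a database by variant size.

    Assumption: the variant length is the longer of the reference and
    alternate allele strings.
    """
    length = max(len(ref), len(alt))
    return "dbSNP" if length <= SHORT_VARIANT_MAX_BP else "dbVar"


def cluster_submissions(submissions):
    """Group submitted records that share variant type and position.

    `submissions` is an iterable of dicts with the hypothetical keys
    ss_id, chrom, pos, and variant_type. Each returned group would
    correspond to one non-redundant record in this simplified model.
    """
    clusters = defaultdict(list)
    for sub in submissions:
        key = (sub["chrom"], sub["pos"], sub["variant_type"])
        clusters[key].append(sub["ss_id"])
    return dict(clusters)
```

Exact-match grouping keeps the sketch short; it does not handle cases such as the same variant reported against different assemblies or with differently normalized alleles.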

Medical Genetics and Human Variation
