History
Through dbSNP build 151 (March 2018), a legacy RefSNP page (shown below) reported the details of a RefSNP. Following a complete redesign of the dbSNP database in build 152, a redesigned NCBI dbSNP RefSNP Web Report page replaced the old page to provide a more user-friendly interface. The two images below show the differences between the two versions of the web report.
Classic (legacy) Site |
Current Site |
The rest of this tutorial focuses on the redesigned NCBI dbSNP RefSNP Web Report. The aim is to give an overview of the page structure and help users find the information they need. More details about RefSNP can be found here. Please email feedback about any problems you encounter to [email protected].
RefSNP Report
The dbSNP RefSNP Web Report is an interactive page for browsing the submitted and computed information for a single Reference SNP variant (RefSNP, rs).
The top of the RefSNP Web Report page has a few major parts, as labeled in the image below.
- (A) Search Box: The Search box is used to find and explore other RefSNP (rs) records. Currently the page only supports searching for exact rs identifiers, not for rs attributes. Enter the rs identifier (e.g., 'rs268') and click 'Search'.
- (B) Summary: The Summary section provides an overview of the variant, reporting the alleles in the forward orientation of the chromosome and summary allele frequencies when available. Links to related records in other databases are listed in the right-hand column.
- (C) Download Link: The displayed information is available in JSON format through the Download link at the upper right. This function is provided by the Variation Services API, and more information is available here.
- (D) Tabs: The RefSNP report separates details of the variation into various categories and lists them in the horizontal tabs. The default tab is "Frequency".
- (E) Tab Content: The tab content is displayed beneath the tabs.
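The exact-identifier search and the JSON download described above can be sketched in a few lines of Python. This is only an illustration: the base URL is an assumption taken from the public Variation Services documentation rather than from this page, and the helper names (`parse_rs_id`, `refsnp_json_url`, `fetch_refsnp`) are hypothetical.

```python
import json
import re
import urllib.request

# Assumed base URL of the Variation Services RefSNP endpoint;
# confirm it against the API documentation before relying on it.
BASE_URL = "https://api.ncbi.nlm.nih.gov/variation/v0/refsnp/"


def parse_rs_id(text):
    """Return the numeric part of an exact rs identifier such as 'rs268'."""
    match = re.fullmatch(r"rs([0-9]+)", text.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not an exact rs identifier: {text!r}")
    return int(match.group(1))


def refsnp_json_url(rs_id):
    """Build the (assumed) JSON URL for one RefSNP record."""
    return BASE_URL + str(parse_rs_id(rs_id))


def fetch_refsnp(rs_id, timeout=30):
    """Download and decode the JSON record; requires network access."""
    with urllib.request.urlopen(refsnp_json_url(rs_id), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `refsnp_json_url("rs268")` only builds the URL, so it can be checked offline; `fetch_refsnp("rs268")` performs the actual request.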
There are eight tabs in total, listed below. The content of each tab is described in more detail later in this tutorial.
- Frequency
- Variant Details
- Clinical Significance
- HGVS
- Submissions
- History
- Publications
- Flanks
Summary
The Summary contains the essential information about the variant in the form of tag-value pairs for Organism, Position, Alleles, Variation Type, Frequency, Clinical Significance, Gene and Consequence, Publications, and Genomic View. SNPs with LitVar information are also labeled on the RefSNP page, under the 'Publications' item.
Tag | Description |
Organism | Human, Homo sapiens |
Position |
Variant map position on the latest assembly, e.g., GRCh38.p7. The assembly version also includes the GRC patch version, here 'p7'. The different patch versions (p6, p7, etc.) do not change the position on the primary assembly version GRCh38. See the online documentation for further explanation of patches (https://www.ncbi.nlm.nih.gov/grc/help/patches/).
Detailed list of all positions is available in the Variant Details and Aliases tabs.
NOTE: A dbSNP RefSNP is a cluster of variants that have the same position and type when mapped to the preferred top-level sequence. The position is the deletion interval of the variant. When the variant is shiftable, the interval includes the adjacent nucleotides in the repeating region, that is, all nucleotides that could be affected by the variant. This is what is displayed in the "Position" field in the 'Detail' section for a RefSNP. However, the web page displays alleles in HGVS notation. HGVS right-shifts shiftable variants, so HGVS positions can differ from the displayed common "Position". For example, consider inserting an A into a string of 5 A's that starts at position 11 in a sequence named SEQ1. The canonical SPDI would be SEQ1:11:AAAAA:AAAAAA (1-based for illustration), so the RefSNP position would be 11-15. The corresponding HGVS expression, using a right-shifted 1-based position of 15, would be SEQ1:g.15dup. |
Alleles | Reference and variant alleles for the rs. Detailed list of all alleles is available in the Variant Details and Aliases tabs. |
Variation type | One of the possible variation types: SNV : Single Nucleotide Variation; MNV : Multiple Nucleotide Variation; Insertion; Deletion; Indel : Insertion and Deletion; Identity; None. |
Frequency |
Minor Allele Frequency (MAF), listed as a decimal fraction: the numerator is the number of times the minor allele was observed and the denominator is the total number of samples, as reported in a given study. For example, minor allele 'G' with a frequency of 0.0052 means that 'G' was observed 26 times among 5008 samples, as reported by the 1000 Genomes study. The list of all reported frequencies for reference and alternate alleles from various studies and populations is in the Frequency tab. |
Clinical Significance |
Link to ClinVar, if any clinical significance data exist there, or an indication that there are no clinical significance data available in ClinVar for that rs. If data are present in ClinVar, more links are available in the Clinical Significance tab. |
Gene : Consequence |
Gene symbol (e.g., LPL) and molecular consequence based on RefSeq mRNA or protein annotations. A detailed list is available in the Variant Details tab. |
Publications |
Number of citations where this rs was mentioned. Detailed list is available in the Publications tab. |
Genomic View |
Link to a graphical view of the RefSNP in the latest genomic context along with RefSeq mRNA and protein. Use the zoom option within the Genomic View panel to inspect the nucleotides adjacent to the variant, and to see its neighbors. |
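The Position note in the table above walks through one SPDI-to-HGVS example. The sketch below reproduces that arithmetic for the same illustrative, 1-based SPDI. It is not dbSNP's implementation: the helper names are hypothetical, and only the simple case from the note (one extra copy of the repeat unit inserted into a run of that unit) is handled.

```python
from typing import NamedTuple, Optional


class Spdi(NamedTuple):
    sequence: str
    position: int  # 1-based here, matching the illustration in the note
    deleted: str
    inserted: str


def parse_spdi(text: str) -> Spdi:
    """Split 'sequence:position:deleted:inserted' into its four fields."""
    sequence, position, deleted, inserted = text.split(":")
    return Spdi(sequence, int(position), deleted, inserted)


def position_interval(spdi: Spdi) -> tuple:
    """First and last base of the deletion interval (needs a non-empty deletion)."""
    return spdi.position, spdi.position + len(spdi.deleted) - 1


def right_shifted_dup(spdi: Spdi) -> Optional[str]:
    """HGVS 'dup' at the right end of the interval, or None if not that case."""
    if not spdi.deleted or not spdi.inserted.startswith(spdi.deleted):
        return None
    unit = spdi.inserted[len(spdi.deleted):]  # the extra copy being inserted
    if not unit or len(spdi.deleted) % len(unit) != 0:
        return None
    if unit * (len(spdi.deleted) // len(unit)) != spdi.deleted:
        return None  # the deleted run is not made of whole copies of the unit
    end = position_interval(spdi)[1]
    if len(unit) == 1:
        return f"{spdi.sequence}:g.{end}dup"
    return f"{spdi.sequence}:g.{end - len(unit) + 1}_{end}dup"
```

For `parse_spdi("SEQ1:11:AAAAA:AAAAAA")`, `position_interval` gives `(11, 15)` and `right_shifted_dup` gives `"SEQ1:g.15dup"`, matching the note.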
Frequency
The Frequency tab displays a table showing the reference and alternate allele frequencies reported across various studies and populations. The top table provides frequency data from the NCBI ALFA project, while the table below it summarizes frequency data from all other projects in dbSNP, including 1000 Genomes, gnomAD, and TOPMed.
In the ALFA frequency table:
- Population entries marked as "Global" refer to the entire study population, while those marked as "Sub" refer to specific population subgroupings (e.g., African, European, etc.).
- Ref Allele and Alt Allele indicate the reference and alternate allele frequencies for each population group.
- Sample Size gives the total number of chromosomes included in the analysis for each group.
- Ref HMOZ and Alt HMOZ show the homozygous counts for the reference and alternate alleles, respectively.
- HTRZ reflects the heterozygous counts within each population.
- HWE P-value (HWEP) is the Hardy-Weinberg Equilibrium p-value, providing insight into whether the allele frequencies deviate from what is expected under equilibrium assumptions.
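To make the relationship between these columns concrete, the sketch below derives allele frequencies and a Hardy-Weinberg p-value from genotype counts, and also reproduces the 26-in-5008 MAF example from the Summary table. It assumes the homozygous and heterozygous counts are per-individual genotype counts, and it uses a textbook chi-square test with one degree of freedom; the method ALFA actually uses for HWEP is not stated here, so this is an assumption. All function names are hypothetical.

```python
import math


def allele_fraction(observed, total):
    """Frequency as a fraction, e.g. 26 observations among 5008 samples."""
    return observed / total


def allele_frequencies(ref_hmoz, htrz, alt_hmoz):
    """Reference and alternate allele frequencies from genotype counts."""
    individuals = ref_hmoz + htrz + alt_hmoz
    ref = (2 * ref_hmoz + htrz) / (2 * individuals)
    return ref, 1.0 - ref


def hwe_p_value(ref_hmoz, htrz, alt_hmoz):
    """Chi-square (1 degree of freedom) test against Hardy-Weinberg proportions."""
    n = ref_hmoz + htrz + alt_hmoz
    p, q = allele_frequencies(ref_hmoz, htrz, alt_hmoz)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (ref_hmoz, htrz, alt_hmoz)
    if min(expected) == 0:
        return 1.0  # only one allele present, nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

For instance, `round(allele_fraction(26, 5008), 4)` is `0.0052`, and genotype counts of 25, 50, and 25 sit exactly at Hardy-Weinberg proportions, so `hwe_p_value(25, 50, 25)` returns `1.0`.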
Variant Details
The Variant Details tab shows known variant placements on genomic sequences: chromosomes (NC_) and RefSeqGene, pseudogenes, or genomic regions (NG_). A separate table shows placements on transcripts (NM_) and protein sequences (NP_).
The corresponding transcript and protein locations are listed on adjacent lines, along with molecular consequences from Sequence Ontology. When no protein placement is available, only the transcript is listed.
The "Codon[Amino acid]" column shows the actual base change in the format "Reference > Alternate" allele: the nucleotide codon change for transcripts and the amino acid change for proteins, allowing for known ribosomal slippage sites. To view flanking nucleotides and neighboring rs records, use the Genomic View at the bottom of the page and zoom into the sequence until the nucleotides around the variant become visible.
Clinical Significance
The Clinical Significance tab lists the clinical significance entries from ClinVar associated with the variation, per allele. Click on the RCV accession (e.g., RCV000001615.2) or Allele ID (e.g., 16589) to access the full ClinVar report.
HGVS
The HGVS tab displays HGVS names representing the variant placements and allele changes on genomic, transcript, and protein sequences, per allele. An HGVS name is an expression that reports the sequence accession and version, sequence type, position, and allele change.
The "Note" column can have two values. "diff" means that the reference allele (variation interval) at the placement reported in this HGVS name differs from the reference alleles reported in other HGVS names. "rev" means that the sequence of the variation interval at the placement reported in this HGVS name is in reverse orientation relative to the sequence(s) of this variation in other HGVS names not labeled "rev".
To limit the display to a sequence type or alias of interest use the “Filter” box as shown in the example below to select only desired HGVS aliases, based on NM sequences for mRNA.
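As a rough illustration of the parts of an HGVS name listed above, and of filtering by accession prefix, the following sketch splits a name into accession, version, sequence type, and change. It is deliberately simplified and is not a full HGVS parser. Apart from `SEQ1:g.15dup`, which comes from the Position note, the accessions in the usage example are made-up placeholders, and all function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Accession prefixes named in the Variant Details section.
SEQUENCE_KINDS = {
    "NC_": "chromosome",
    "NG_": "RefSeqGene, pseudogene or genomic region",
    "NM_": "transcript",
    "NP_": "protein",
}


class HgvsName(NamedTuple):
    accession: str
    version: Optional[int]
    seq_type: str  # e.g. 'g' genomic, 'c' coding transcript, 'p' protein
    change: str


_HGVS = re.compile(
    r"^(?P<acc>[A-Za-z0-9_]+)(?:\.(?P<ver>[0-9]+))?:(?P<type>[a-z])\.(?P<change>.+)$"
)


def parse_hgvs(name: str) -> HgvsName:
    """Split 'accession.version:type.change'; the version is optional."""
    match = _HGVS.match(name)
    if match is None:
        raise ValueError(f"unrecognized HGVS name: {name!r}")
    version = int(match["ver"]) if match["ver"] is not None else None
    return HgvsName(match["acc"], version, match["type"], match["change"])


def sequence_kind(accession: str) -> str:
    """Map an accession prefix to the kind of sequence it names."""
    return SEQUENCE_KINDS.get(accession[:3], "unknown")


def filter_by_prefix(names, prefix):
    """Keep only the names whose accession starts with the given prefix."""
    return [name for name in names if name.startswith(prefix)]
```

With the placeholder list `["NC_000000.0:g.100A>G", "NM_000000.0:c.10A>G", "NP_000000.0:p.Asn4Ser"]`, `filter_by_prefix(names, "NM_")` keeps only the transcript-based name, which is roughly what typing NM into the "Filter" box is meant to achieve.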
Submissions
The Submissions tab displays variations that were originally submitted to dbSNP and now support this RefSNP cluster (rs).
The table lists the submitter handle, the submission identifier, and the date and build number in which the submission first appeared.
Direct submissions to dbSNP have a Submission ID in the form of an ss-prefixed number (ss#). Other supporting variations are listed in the table without an ss#.
The list of submitters can be long, so use the "Filter" box to find a given submitter (e.g., 1000 Genomes).
History
The History table lists RefSNPs (Associated ID) from previous builds (Build) that now support the current RefSNP, along with the date on which the history was updated for each Associated ID (History Updated).
The History listing can be long, so use the "Filter" box to limit the display to selected identifiers based on Associated ID, History Date, or Build number.
Publications
The Publications tab lists PubMed articles citing the variation, showing PMID, Title, Author, Year, and Journal, ordered by Year, descending.
The publication list can be long, so use the "Filter" box to limit the display to selected citations based on PMID, Title, Author, Year, or Journal.
Flanks
The Flanks tab displays flanking bases from the reference assembly. You can choose to display 25, 50, 100, or 200 bases upstream and downstream of the variant. Flanking bases are displayed in forward orientation.
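The flank selection described above amounts to slicing the reference sequence on either side of the variant interval. Below is a minimal sketch, assuming the reference is available as a forward-orientation string and the variant is given as a 1-based inclusive interval; the function name and the truncation at sequence ends are illustrative choices, not documented behavior of the page.

```python
ALLOWED_FLANK_SIZES = (25, 50, 100, 200)


def flanks(sequence, start, end, size=25):
    """Return (upstream, downstream) bases around a 1-based inclusive interval.

    Flanks are truncated when the interval lies near either end of the sequence.
    """
    if size not in ALLOWED_FLANK_SIZES:
        raise ValueError(f"flank size must be one of {ALLOWED_FLANK_SIZES}")
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("interval lies outside the sequence")
    upstream = sequence[max(0, start - 1 - size):start - 1]
    downstream = sequence[end:end + size]
    return upstream, downstream
```

For a toy sequence of 30 C's, 5 A's, and 30 G's, `flanks(seq, 31, 35, 25)` returns 25 C's upstream and 25 G's downstream of the A run.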
Genomic regions, transcripts, and products
The "Genomic regions, transcripts, and products" section is the NCBI Graphical Sequence Viewer display of the genomic region, transcripts and protein products for the reported RefSNP (rs).
Visit the Sequence Viewer page for help with navigating inside the display and modifying the selection of displayed data tracks.
Use the zoom option to view the nucleotides around the RefSNP and find other neighboring RefSNPs.
Comparisons between new and old designs
To help users familiar with the classic RefSNP web page make better use of the new design, the table below compares the new and old pages. It provides screenshots of the sections introduced above for both designs; click a thumbnail to see a larger image.
Note that the classic page has no 'History' section, and that the counterparts of several sections in the new design (clinical significance, aliases, and publications) are combined in the 'Summary' section of the old design.
Section | New Site | Classic Site |
Summary | ||
Frequency | ||
Variant Details | ||
Clinical Significance | ||
HGVS | ||
Submission | ||
History | No History section | |
Publication |