NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Entrez Sequences Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-.

Search Field Descriptions for Sequence Database

, M.L.S.
Kevric Corp
, Ph.D.

Last Update: July 5, 2024.

Table 1.

Fields available for Nucleotide and Protein Sequence Databases.

Search Field    Short Field Specifier    Definition
[Accession] [ACCN] The accession number assigned by NCBI.

Examples:

AF123456[ACCN] Nucleotide
NP_000240[ACCN] Protein
[All Fields] [ALL] All terms from all search fields in the database.

Example:

human[All Fields] Nucleotide Protein

(Compare with human[Organism], see [Organism] entry in this table.)
[Author] [AU]
[AUTH]
All authors from all references in the records. The format is last name [space] first initial(s), without punctuation.

Example:

venter jc[AUTH] Nucleotide Protein
[EC/RN Number] [ECNO] Enzyme Commission (EC) number for an enzyme activity.

Example:

5.3.1.9[ECNO] Protein Nucleotide
(glucose-6-phosphate isomerase)
[Feature Key]
(Nucleotide, Protein, GSS)
[FKEY] Biological features listed in the Feature Table of the sequence records.

Examples:

3 utr[FKEY] Nucleotide
nonstdres[FKEY] Protein

The GenBank feature table definition has more information on available features.
[Filter] [FILT]
[SB]
Filtered subsets of the database. An important kind of filter is based on the presence of links to other records. Other filters create useful subsets of data, such as those set as Filters in the Discovery column of search results.

Examples:

Links

nucleotide_protein[Filter] Nucleotide
protein_structure[Filter] Protein

Organism or properties subsets

all[filter] Nucleotide Protein
mrna[filter] Nucleotide
refseq[filter] Nucleotide Protein
mammals[filter] Nucleotide Protein
[Gene Name] [GENE] Gene names annotated on database records. For NCBI Reference Sequences, these names correspond to official nomenclature guidelines when possible. Submitters provide the gene names on GenBank/GenPept records. Gene names on submitted records may be historical names or vary from official guidelines for other reasons.

Example:

BRCA1[GENE] Nucleotide Protein
[Bioproject] [BPRJ] The numeric unique identifier for the BioProject that produced the sequence records.

Examples:

13139[Bioproject] Nucleotide Protein
(Oryza sativa Japonica)

21117[Bioproject] Nucleotide
(Pelagic Microbial Assemblages in the Oligotrophic Ocean)
[Issue] [ISS] The issue number of the journals cited on sequence records. This field is not generally useful in sequence databases.
[Journal] [JOUR] The name of the journals cited on sequence records. Journal names are indexed in the database in abbreviated form, although many full titles are mapped to their abbreviations. Journals are also indexed by their International Standard Serial Number (ISSN).

Examples:

proceedings of the national academy of sciences of the united states of america[Journal] Nucleotide Protein
Proc Natl Acad Sci U S A[Journal] Nucleotide Protein
0027-8424[Journal] Nucleotide Protein
[Keyword] [KYWD] Keywords applied by submitter or from controlled vocabularies applied by NCBI or other databases. Except for specific kinds of records, such as the examples given below, the terms in this index are not well controlled. This field is unpopulated for many GenBank/GenPept records.

Examples:

BARCODE[KYWD] Nucleotide Protein
HTG[KYWD] Nucleotide
RefSeqGene[KYWD] Nucleotide
WGS_MASTER[KYWD] Nucleotide
[Modification Date] [MDAT] The date of most recent modification of a sequence record. The date format is YYYY/MM/DD. Only the year is required. The Modification Date is often used as a range of dates. The colon ( : ) separates the beginning and end of a date range.

Examples:

2023/01/08[MDAT] Nucleotide Protein
1995/09[MDAT] Nucleotide Protein
2022/01:2023/12/31[MDAT] Nucleotide Protein
[Molecular Weight]
(Protein only)
[MOLWT] The molecular weight in Daltons of the protein chain calculated from the amino acids only. This may not correspond to the molecular weight of the protein obtained from biological samples because of incomplete data or post-translational modifications of the protein in living systems. The colon ( : ) separates the beginning and end of a molecular weight range.

Examples:

3039[MOLWT] Protein
25000:75000[MOLWT] Protein
[Organism] [ORGN] The scientific and common names for the complete taxonomy of organisms that are the source of the sequence records. This vocabulary includes all available nodes in the NCBI taxonomy database.

Examples:

cellular organisms[ORGN] Nucleotide Protein
firmicutes[ORGN] Nucleotide Protein
human[ORGN] Nucleotide Protein
Escherichia coli O157:H7[ORGN] Nucleotide Protein
[Page Number] [PAGE] The page numbers of the articles that are cited on the sequence record. This field is not generally useful in sequence databases.
[Primary Accession] [PACC] The primary accession number of the sequence record. This is the first one appearing on the ACCESSION line in the GenBank/GenPept format. Many records have additional secondary accessions representing records that have been merged. The Accession field indexes both primary and secondary accessions.

Examples:

U01317[PACC] Nucleotide
M18047[PACC] Nucleotide
(Compare: M18047[ACCN] Nucleotide, see [Accession] entry in this table.)
[Primary Organism] [PORGN] The primary organism when there is more than one source organism.

Example:

human[PORGN] Nucleotide
(Compare with human[ORGN] Nucleotide, see [Organism] entry in this table.)
[Properties] [PROP] Molecular type, source database, and other properties of the sequence record. Terms indexed for this field are a useful classification system for sequence records.

Examples:

Molecule type

biomol_ncrna[PROP] Nucleotide
biomol_genomic[PROP] Nucleotide
biomol_mrna[PROP] Nucleotide

Cellular location

gene_in_genomic[PROP] Nucleotide Protein
gene_in_mitochondrion[PROP] Nucleotide Protein
gene_in_plastid[PROP] Nucleotide Protein

GenBank division

gbdiv_htg[PROP] Nucleotide
gbdiv_vrt[PROP] Nucleotide Protein

(These GenBank division queries must be combined with srcdb_genbank[PROP] to retrieve only GenBank records.)

Database source

srcdb_genbank[PROP] Nucleotide Protein
srcdb_ddbj/embl/genbank[PROP] Nucleotide Protein
srcdb_refseq[PROP] Nucleotide Protein
srcdb_pdb[PROP] Nucleotide Protein
srcdb_swiss-prot[PROP] Protein
[Protein Name] [PROT] The names of protein products as annotated on sequence records. The content of this field is not well controlled for GenBank/GenPept records and may contain inaccurate or incomplete information.

Example:

aldolase[Protein Name] Nucleotide Protein
[Publication Date] [PDAT] The date that records were made public in Entrez. The date format is YYYY/MM/DD. The colon ( : ) separates the beginning and end of a date range.

Examples:

2023/01/08[PDAT] Nucleotide Protein
1995/09[PDAT] Nucleotide Protein
2022/01:2023/12/31[PDAT] Nucleotide Protein
[SeqID String] [SQID] The NCBI identifier string for the sequence record. This is a brief structured format used by NCBI software.

Example:

gnl asm gca 000000215 2 chr3 45328308[SeqID String] Nucleotide
[Sequence Length] [SLEN] The total length of the sequence, that is, the number of nucleotides or amino acids in the sequence. The colon ( : ) separates the beginning and end of a length range.

Examples:

755[SLEN] Nucleotide Protein
100:1000[SLEN] Nucleotide Protein
[Substance Name] [SUBS] The names of chemical substances associated with a record. This field is only populated for sequences extracted from structure records (PDB-derived sequences). The associated residue position is often included.

Examples:

mg, 1010[Substance Name] Nucleotide
atp[Substance Name] Protein
[Text Word] [WORD] Text on a sequence record that is not indexed in other fields. Terms indexed here are included in an All Fields search. This field is not generally useful.
[Title] [TI] or [TITL] Words and phrases found in the title of the sequence record. The title is the DEFINITION line of the GenBank/GenPept format of the record. This line summarizes the biology of the sequence and includes the organism, product name, gene symbol, molecule type, and sequence completeness.

Examples:

complete cds[TI] Nucleotide
kinesin[TI] Nucleotide Protein
liver[TI] Nucleotide Protein
uncultured[TI] Nucleotide Protein
[Volume] [VOL] The volume number of the journals cited in references on the sequence record. This field is not generally useful in sequence databases.
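
The colon range syntax described in Table 1 for [Modification Date], [Publication Date], [Molecular Weight], and [Sequence Length] can be sketched as a small Python helper. This is a minimal illustration, not NCBI code: the function names date_term and range_query are hypothetical, and the helper only builds query strings. It does not contact Entrez or validate the values.

```python
from datetime import date


def date_term(d: date) -> str:
    """Format a date as YYYY/MM/DD, the date format given in Table 1."""
    return f"{d.year:04d}/{d.month:02d}/{d.day:02d}"


def range_query(start: str, end: str, field: str) -> str:
    """Join two endpoints with a colon and tag the range with a field specifier.

    Hypothetical helper: it performs no checking that start <= end.
    """
    return f"{start}:{end}[{field}]"


# Records modified between 1 January 2022 and 31 December 2023.
mdat = range_query(date_term(date(2022, 1, 1)), date_term(date(2023, 12, 31)), "MDAT")

# Sequences 100 to 1000 residues long, and proteins of 25000 to 75000 daltons.
slen = range_query("100", "1000", "SLEN")
molwt = range_query("25000", "75000", "MOLWT")

print(mdat)   # 2022/01/01:2023/12/31[MDAT]
print(slen)   # 100:1000[SLEN]
print(molwt)  # 25000:75000[MOLWT]
```

The resulting strings can be pasted into the Nucleotide or Protein search box as-is.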

Queries using any term followed by the full name of the indexed field in square brackets will only retrieve records with the term indexed in that field. For example, a search with apolipoprotein[Title] finds only records with “apolipoprotein” indexed for their Title field. Some fields have shorter names that can be used instead of the full name. These are listed in the Short Field Specifier column of Table 1 when available.
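
The term[Field] pattern described above can be assembled programmatically. The following Python sketch is illustrative only: the helper names fielded, author_term, and all_of are hypothetical, and joining terms with the Boolean operator AND is assumed from general Entrez query syntax rather than stated in this table. The code builds query strings and does not send them to NCBI.

```python
import re


def fielded(term: str, field: str) -> str:
    """Restrict a term to one index by appending the field name in square brackets."""
    return f"{term}[{field}]"


def author_term(last_name: str, initials: str) -> str:
    """Format an author as last name, a space, then initials without punctuation.

    Only the initials are stripped of punctuation here; the last name is passed
    through unchanged. Lowercasing simply mirrors the example in Table 1.
    """
    cleaned = re.sub(r"[^A-Za-z]", "", initials)
    return f"{last_name.strip()} {cleaned}".lower()


def all_of(*terms: str) -> str:
    """Require every term to match by joining them with AND (assumed operator)."""
    return " AND ".join(terms)


title_query = fielded("apolipoprotein", "Title")
author_query = fielded(author_term("Venter", "J.C."), "AUTH")

# Table 1 notes that GenBank division queries must be combined with
# srcdb_genbank[PROP] to retrieve only GenBank records.
genbank_vrt = all_of(fielded("gbdiv_vrt", "PROP"), fielded("srcdb_genbank", "PROP"))

print(title_query)   # apolipoprotein[Title]
print(author_query)  # venter jc[AUTH]
print(genbank_vrt)   # gbdiv_vrt[PROP] AND srcdb_genbank[PROP]
```

Either the full field name or its short specifier can be passed as the field argument, since both forms appear in Table 1.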

Bookshelf ID: NBK49540