Modifiers marked as deprecated will no longer be accepted as separate modifiers in new submissions starting January 1, 2025. They may still appear on older records, and taxonomic terms will still be shown in the BioSource organism name. This change does not affect BioSample submissions.
Modifiers for FASTA Definition Lines
General Format
Source information contained within FASTA definition lines can be automatically fielded to the appropriate feature or descriptor using NCBI submission tools. Listed below are the currently available modifiers. You may include as many modifiers as you like, but each must be enclosed in its own pair of square brackets. The name of the modifier must be written exactly as shown in the list below.
An example of a string of modifiers is:
[organism=Mus musculus] [strain=BALB/c] [chromosome=5] [sex=male] [tissue-type=testis] [moltype=mRNA]
Do not use hard returns between the bracketed data. The FASTA definition line must be a single line of text and cannot contain a hard return. If you have trouble importing your FASTA sequences, please confirm that a hard return was not inserted by your editing software.
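As an illustration of the single-line rule, here is a minimal Python sketch that assembles a definition line from a dictionary of modifiers and refuses any value containing a hard return. The function name build_defline and the leading ">SeqID" part are assumptions made for this example (the usual FASTA convention); this is not part of any NCBI tool.

```python
def build_defline(seq_id, modifiers):
    """Return one FASTA definition line with each modifier in its own brackets.

    seq_id and the ">" prefix follow the usual FASTA convention (an assumption
    here); modifiers maps modifier names to their text values.
    """
    parts = [">" + seq_id]
    for name, value in modifiers.items():
        text = str(value)
        # The definition line must be a single line of text.
        if "\n" in text or "\r" in text:
            raise ValueError(f"hard return found in value for {name!r}")
        parts.append(f"[{name}={text}]")
    # Separate bracketed modifiers with spaces, never with line breaks.
    return " ".join(parts)


if __name__ == "__main__":
    print(build_defline("Seq1", {"organism": "Mus musculus", "strain": "BALB/c"}))
```

Writing the returned string followed by a single newline, and then the sequence itself, keeps the definition line on one line regardless of how the editing software wraps text.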
Source Modifier List
Descriptions of these modifiers can be found here. These source modifiers should be used in the format [modifier=text].
- acronym (deprecated)
- altitude
- anamorph (deprecated)
- authority (deprecated)
- bio-material
- biotype (deprecated)
- biovar (deprecated)
- breed
- cell-line
- cell-type
- chemovar (deprecated)
- chromosome
- clone
- clone-lib (deprecated)
- collected-by
- collection-date
- common (deprecated)
- cultivar
- culture-collection
- dev-stage
- ecotype
- endogenous-virus-name
- forma (deprecated)
- forma-specialis (deprecated)
- fwd-PCR-primer-name
- fwd-PCR-primer-seq
- genotype
- geo-loc-name
- group (deprecated)
- haplogroup
- haplotype
- host
- identified-by (deprecated)
- isolate
- isolation-source
- lab-host
- lat-lon
- linkage-group
- map
- mating-type
- note
- organism
- pathovar (deprecated)
- plasmid-name
- plastid-name
- pop-variant (deprecated)
- rev-PCR-primer-name
- rev-PCR-primer-seq
- segment
- serogroup (deprecated)
- serotype
- serovar
- sex
- specimen-voucher
- strain
- sub-species
- subclone (deprecated)
- subgroup (deprecated)
- substrain (deprecated)
- subtype (deprecated)
- synonym (deprecated)
- teleomorph (deprecated)
- tissue-lib (deprecated)
- tissue-type
- type (deprecated)
- variety
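Because modifier names must match the list exactly, it can help to check a definition line before submission. The following Python sketch extracts every [modifier=text] pair and reports names that are marked deprecated above or that do not appear on this page at all. The function name check_modifiers and the regular expression are assumptions for illustration, and matching is deliberately case-sensitive.

```python
import re

# Non-deprecated names from the Source Modifier List above.
ACTIVE = {
    "altitude", "bio-material", "breed", "cell-line", "cell-type",
    "chromosome", "clone", "collected-by", "collection-date", "cultivar",
    "culture-collection", "dev-stage", "ecotype", "endogenous-virus-name",
    "fwd-PCR-primer-name", "fwd-PCR-primer-seq", "genotype", "geo-loc-name",
    "haplogroup", "haplotype", "host", "isolate", "isolation-source",
    "lab-host", "lat-lon", "linkage-group", "map", "mating-type", "note",
    "organism", "plasmid-name", "plastid-name", "rev-PCR-primer-name",
    "rev-PCR-primer-seq", "segment", "serotype", "serovar", "sex",
    "specimen-voucher", "strain", "sub-species", "tissue-type", "variety",
}

# Names marked "(deprecated)" in the list above.
DEPRECATED = {
    "acronym", "anamorph", "authority", "biotype", "biovar", "chemovar",
    "clone-lib", "common", "forma", "forma-specialis", "group",
    "identified-by", "pathovar", "pop-variant", "serogroup", "subclone",
    "subgroup", "substrain", "subtype", "synonym", "teleomorph",
    "tissue-lib", "type",
}

# Names described in other sections of this page (flags and descriptors).
OTHER = {
    "environmental-sample", "germline", "metagenomic", "rearranged",
    "transgenic", "moltype", "gcode", "mgcode",
}

# One bracketed pair: a name without "=" or brackets, then "=", then the text.
MODIFIER_RE = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def check_modifiers(defline):
    """Return (deprecated, unknown) modifier names found in a definition line."""
    deprecated, unknown = [], []
    for name, _value in MODIFIER_RE.findall(defline):
        if name in DEPRECATED:
            deprecated.append(name)
        elif name not in ACTIVE and name not in OTHER:
            unknown.append(name)
    return deprecated, unknown


if __name__ == "__main__":
    print(check_modifiers(">Seq1 [organism=Mus musculus] [biovar=x] [Strain=BALB/c]"))
```

In this sketch a misspelled or differently capitalized name such as "Strain" is reported as unknown rather than silently corrected, which mirrors the exact-match requirement.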
Modifiers with Formatted Values
Culture-collection has a mandatory format of "institution code:collection code:culture_id". The collection code may be omitted, leaving "institution code:culture_id".
Specimen-voucher and bio-material have optional structured formats.
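The culture-collection format can be checked mechanically. The Python sketch below splits a value on colons and accepts either two fields (institution code and culture_id) or three (with a collection code in the middle). The function name parse_culture_collection is hypothetical, and the sketch assumes that none of the fields contains a colon itself; the sample values are illustrative only.

```python
def parse_culture_collection(value):
    """Split a culture-collection value into its fields.

    Returns (institution_code, collection_code, culture_id); collection_code
    is None when it is omitted. Assumes no field contains a colon.
    """
    parts = [part.strip() for part in value.split(":")]
    if len(parts) not in (2, 3) or any(not part for part in parts):
        raise ValueError(
            "expected 'institution code:collection code:culture_id' "
            "or 'institution code:culture_id', got " + repr(value)
        )
    institution_code = parts[0]
    culture_id = parts[-1]
    collection_code = parts[1] if len(parts) == 3 else None
    return institution_code, collection_code, culture_id


if __name__ == "__main__":
    print(parse_culture_collection("ATCC:26370"))
    print(parse_culture_collection("INST:COLL:123"))  # made-up three-part value
```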
Other modifiers do not include any submitter-provided text. The format for these modifiers in the FASTA definition line is:
[modifier= ] or [modifier=TRUE]
Modifiers using this format are:
- environmental-sample
- germline
- metagenomic
- rearranged
- transgenic
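A short Python sketch of how these value-less modifiers might be written out, using the [modifier=TRUE] form. The helper name format_flags is an assumption for this example; it rejects any name that is not in the list above.

```python
# The modifiers listed above that carry no submitter-provided text.
FLAG_MODIFIERS = (
    "environmental-sample",
    "germline",
    "metagenomic",
    "rearranged",
    "transgenic",
)


def format_flags(names):
    """Return the given flag modifiers as space-separated [modifier=TRUE] items."""
    items = []
    for name in names:
        if name not in FLAG_MODIFIERS:
            raise ValueError(f"{name!r} is not a value-less modifier")
        items.append(f"[{name}=TRUE]")
    return " ".join(items)


if __name__ == "__main__":
    print(format_flags(["germline", "rearranged"]))
```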
Descriptors with Controlled Vocabulary
Many of the descriptors that refer to the sequenced molecule and the genetic code can be edited using the FASTA definition line. In all cases, these descriptors have a controlled vocabulary and should only be added when their values differ from the default value.
The default molecule type is genomic DNA. If the submission was derived from mRNA, you can add this information to the FASTA definition line. When using tbl2asn to submit an mRNA sequence, you must specify the molecule type in the FASTA definition line. For example:
[moltype=mRNA]
To specify a genetic code for an organism that is not yet listed in the NCBI Taxonomy Browser, you can use the modifiers "gcode" or "mgcode" in the FASTA definition line. The inclusion of mgcode is only necessary if the sequence is derived from the mitochondrion. In both cases, the value of the modifier must be the integer assigned to the appropriate genetic code. For example:
[gcode=1] or [mgcode=5]
would set the nuclear genetic code to "The Standard Code" (translation table 1) or the mitochondrial genetic code to "The Invertebrate Mitochondrial Code" (translation table 5).
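Since gcode and mgcode must be integers, a small check can catch values such as a table name typed in place of its number. The Python sketch below is an assumption-laden illustration: the function name genetic_code_modifiers is hypothetical, and it only verifies that each supplied value is a positive integer, not that the number corresponds to an existing translation table.

```python
def genetic_code_modifiers(gcode=None, mgcode=None):
    """Return bracketed gcode/mgcode modifiers for the values that are given.

    Each value must be a positive integer (a translation table number).
    Whether the number names a real table is not checked here.
    """
    items = []
    for name, value in (("gcode", gcode), ("mgcode", mgcode)):
        if value is None:
            continue  # leave the modifier out so the default applies
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
        items.append(f"[{name}={value}]")
    return " ".join(items)


if __name__ == "__main__":
    print(genetic_code_modifiers(gcode=1))
    print(genetic_code_modifiers(mgcode=5))
```

Omitting an argument leaves the corresponding modifier out of the definition line, in keeping with the advice to add these descriptors only when they differ from the default.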