Prokaryotic Genome Annotation Examples
Figure 1: Sample FASTA-formatted sequence
>contig001 [organism=Escherichia coli] [strain=HTE831]
tagagcaaaaaatagacattttaatggcgctaatcatacaaggaaggaataataacactg
acatggatacatccacttaatctacatttgcttattcctatcttgactatatctatatcc
[etc.]
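The definition line in Figure 1 carries the sequence identifier followed by bracketed source modifiers of the form [name=value]. As a minimal sketch of how such a line can be split apart, assuming only that layout, the hypothetical helper parse_defline below returns the identifier and a dictionary of modifiers. It is an illustration, not part of any NCBI tool.

```python
import re

def parse_defline(defline):
    """Split a FASTA definition line into (seq_id, modifiers).

    Hypothetical helper for illustration; assumes modifiers are written
    as [name=value] after the sequence identifier.
    """
    if not defline.startswith(">"):
        raise ValueError("a FASTA definition line must start with '>'")
    body = defline[1:].strip()
    # The identifier is the first whitespace-delimited token.
    seq_id = body.split(None, 1)[0] if body else ""
    # Collect every [name=value] pair that follows.
    pairs = re.findall(r"\[([^=\]]+)=([^\]]*)\]", body)
    modifiers = {name.strip(): value.strip() for name, value in pairs}
    return seq_id, modifiers

seq_id, mods = parse_defline(
    ">contig001 [organism=Escherichia coli] [strain=HTE831]"
)
print(seq_id)  # contig001
print(mods)    # {'organism': 'Escherichia coli', 'strain': 'HTE831'}
```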
Figure 2: Feature table format
This mock example of a feature table file includes:
- Features on the complementary strand (eg, Ngs_3038 and Ngs_11232).
- A 'broken' gene tagged as "pseudo" because it has frameshifts or internal stop codons but is not known to be a pseudogene (Ngs_10112).
- A gene for a bifunctional protein (Ngs_2945).
- RNAs (eg, Ngs_10111 and Ngs_11232).
- A transposable element (repeat_region feature).
- Features that are partial (eg, Ngs_2945).
- A misc_feature.
- Experiment and inference evidence qualifiers (inference on the CDS of Ngs_17131, experiment on the CDS of Ngs_2945).
Note that the relative order of the features in the file does not matter. The misc_feature and repeat_region features have no corresponding gene feature and therefore no locus_tag.
See the flatfile view of this file in Figure 3.
>Feature contig001
63574 65173 gene
locus_tag Ngs_17131
63574 65173 CDS
product hypothetical protein
protein_id gnl|ncbi|Ngs_17131
inference similar to DNA sequence:INSD:AY123455.2
102492 101261 gene
locus_tag Ngs_3038
gene ftsW
102492 101261 CDS
product flippase
protein_id gnl|ncbi|Ngs_3038
112616 >113646 gene
locus_tag Ngs_2945
112616 >113646 CDS
product bifunctional methylenetetrahydrofolate dehydrogenase/methenyltetrahydrofolate cyclohydrolase
EC_number 1.5.1.5
EC_number 3.5.4.9
experiment Western blot
protein_id gnl|ncbi|Ngs_2945
101 180 gene
locus_tag Ngs_10111
gene trnL
101 180 tRNA
product Leu
45111 45190 gene
locus_tag Ngs_10112
pseudo
45111 45190 tRNA
product Xxx
2103 400 gene
locus_tag Ngs_11232
2103 400 rRNA
product 16S ribosomal RNA
60101 60567 misc_feature
note similar to ABC transporters
43027 43136 repeat_region
mobile_element transposon:Tn22
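The conventions shown in Figure 2 can be read mechanically: a start coordinate greater than the stop coordinate marks the complementary strand, and a "<" or ">" on a coordinate marks a partial feature. The sketch below is one way to parse such a table and derive the location string used in the flatfile view. The names parse_feature_table and flatfile_location are hypothetical. The sketch splits on any whitespace so that it also works on the table as displayed here, whereas real .tbl files are tab-delimited. It assumes "<" on the first coordinate means 5' partial and ">" on the second means 3' partial, and it does not handle multi-interval features.

```python
import re

COORD = re.compile(r"^([<>]?)(\d+)$")

def parse_feature_table(text):
    """Return a list of feature dicts from feature-table text (illustrative)."""
    features = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(">Feature"):
            continue
        parts = line.split()
        first = COORD.match(parts[0])
        second = COORD.match(parts[1]) if len(parts) > 1 else None
        if first and second and len(parts) >= 3:
            # Feature line: start, stop, feature key.
            features.append({
                "start": int(first.group(2)),
                "stop": int(second.group(2)),
                "partial_5": bool(first.group(1)),
                "partial_3": bool(second.group(1)),
                "key": parts[2],
                "qualifiers": [],  # list of pairs; a qualifier may repeat
            })
        elif features:
            # Qualifier line: name, then an optional value.
            fields = line.split(None, 1)
            value = fields[1] if len(fields) > 1 else ""
            features[-1]["qualifiers"].append((fields[0], value))
    return features

def flatfile_location(feature):
    """Build a flatfile-style location string for one parsed feature."""
    start, stop = feature["start"], feature["stop"]
    if start <= stop:
        left = "<" if feature["partial_5"] else ""
        right = ">" if feature["partial_3"] else ""
        return f"{left}{start}..{right}{stop}"
    # start > stop: complementary strand. The flatfile lists the lower
    # coordinate first, so the 3' end is on the left and the 5' end on the right.
    left = "<" if feature["partial_3"] else ""
    right = ">" if feature["partial_5"] else ""
    return f"complement({left}{stop}..{right}{start})"

sample = """>Feature contig001
102492 101261 gene
locus_tag Ngs_3038
gene ftsW
112616 >113646 gene
locus_tag Ngs_2945
60101 60567 misc_feature
note similar to ABC transporters
"""
for feat in parse_feature_table(sample):
    locus = dict(feat["qualifiers"]).get("locus_tag")
    print(feat["key"], flatfile_location(feat), locus)
# gene complement(101261..102492) Ngs_3038
# gene 112616..>113646 Ngs_2945
# misc_feature 60101..60567 None
```

Qualifiers are kept as a list of pairs rather than a dictionary because a feature may carry the same qualifier more than once, as the two EC_number lines on Ngs_2945 show.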
Figure 3: GenBank flatfile
This is part of the flatfile view of the .sqn file made from the .fsa file (Fig. 1) and .tbl file (Fig. 2).
source 1..116100
/organism="Escherichia coli"
/mol_type="genomic DNA"
/strain="HTE831"
/db_xref="taxon:562"
gene 101..180
/gene="trnL"
/locus_tag="Ngs_10111"
tRNA 101..180
/gene="trnL"
/locus_tag="Ngs_10111"
/product="tRNA-Leu"
gene complement(400..2103)
/locus_tag="Ngs_11232"
rRNA complement(400..2103)
/locus_tag="Ngs_11232"
/product="16S ribosomal RNA"
repeat_region 43027..43136
/mobile_element="transposon:Tn22"
gene 45111..45190
/locus_tag="Ngs_10112"
/pseudo
tRNA 45111..45190
/locus_tag="Ngs_10112"
/product="tRNA-OTHER"
/pseudo
repeat_region 56408..56558
/mobile_element="transposon:Tn22"
misc_feature 60101..60567
/note="similar to ABC transporters"
gene 63574..65173
/locus_tag="Ngs_17131"
CDS 63574..65173
/locus_tag="Ngs_17131"
/inference="similar to DNA sequence:INSD:AY123455.2"
/codon_start=1
/product="hypothetical protein"
/translation="MQSTQSKSDRSSMHRGPLLLCAVMVVLVTLPEQINARMAFEKLT
DFDFPGNTYYSVKNLSLYECQGWCREEADCQAAAFSFVVNPLSPSQETHCQLQNDSSA
ANPSAAPQRSANMYYMIKLQLRSENVCHRPWSFERVPNKVIRGLDNALIYTSTKEACL
SACLNERRFVCRSVEYDYNNMKCVLSDSDRRSSGQFVQLVDAQGTDYFENLCLKPAQA
CKNNRSFGNSQKMGVSEEKVAQYVGLHYYTDKELQVTSESACRLACEIESEFLCRSFL
ALAVTCALMILLYISTLFCYYMKKWMQPHKIVA"
gene complement(101261..102492)
/gene="ftsW"
/locus_tag="Ngs_3038"
CDS complement(101261..102492)
/gene="ftsW"
/locus_tag="Ngs_3038"
/codon_start=1
/product="flippase"
/translation="MRMRGRRLLPIILSLLLIVLLSLCYFSNHLRDSSQSRKNGFLLH
LPLETKRNPSNPNTPLSNLLNLTDFHYLLASNVCRKAKRELLAVLIVTSYAGHDALRS
AHRQAIPQSKLEEMGLRRVFLLAALPSREHFISQDQLASEQNRFGDLLQGNFIEDYRN
LSYKHVMGLKWVSEECKKQAKFIIKLDDDIIYDVFHLRRYLETLEVREPGLATSSTLL
SGYVLDAKPPIRLRANKWYVSKKEYPQALYPAYLSGWLYVTNVPTAERIVAEAERMSF
FWIDDTWLTGVVRTRLGIPLERHNDWFSANAEFIDCCVRDLKKHNYECEYSVGPNGGD
DRLLVEFLHNVEKCYFDECVKRPVGKSLKETCLAAAKSRPPKHGFPEIKALRLR"
gene 112616..>113646
/locus_tag="Ngs_2945"
CDS 112616..>113646
/locus_tag="Ngs_2945"
/EC_number="3.5.4.9"
/EC_number="1.5.1.5"
/experiment="Western blot"
/codon_start=1
/product="bifunctional methylenetetrahydrofolate dehydrogenase/methenyltetrahydrofolate cyclohydrolase"
/translation="MESITFGVLTISDTCWQEPEKDTSGPILRQLIGETFANTQVIGN
IVPDEKDIIQQELRKWIDREELRVILTTGGTGFAPRDVTPEATRQLLEKECPQLSMYI
TLESIKQTQYAALSRGLCGIAGNTLILNLPGSEKAVKECFQTISALLPHAVHLIGDDV
SLVRKTHAEVQGSAQKSHICPHKTGTGTDSDRNSPYPMLPVQEVLSIIFNTVQKTANL
NKILLEMNAPVNIPPFRASIKDGYAMKSTGFSGTKRVLGCIAAGDSPNSLPLAEDECY
KINTGAPLPLEADCVVQVEDTKLLQLDKNGQESLVDILVEPQAGLDVRPVGYDLSTND
RIFPALDPSPVVVKSLLASVGNRLILSKPKVAIVSTGSELCSPRNQLTPGKIFDSNTT
MLTELLVYFGFNCMHTCVLSDSFQRTKESLLELFEVVDFVICSGGVSMGDKDFVKSVL
EDLQFRIHCGRVNIKPGKPMTFASRKDKYFFGLPGNPVSAFVTFHLFALPAIRFAAGW
DRCKCSLSVLNVKLLNDFSLDSRPEFVRASVISKSGELYASVNGNQISSRLQSIVGAD
VLINLPARTSDRPLAKAGEIFPASVLRFDFISKYE"
ORIGIN
1 tagagcaaaa aatagacatt ttaatggcgc taatcataca aggaaggaat aataacactg
61 acatggatac atccacttaa tctacatttg cttattccta tcttgactat atctatatcc
[etc.]
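The ORIGIN block above shows the same bases as the FASTA file in Figure 1, regrouped into 60 bases per line in blocks of 10, each line prefixed with the position of its first base. A minimal sketch of that regrouping follows. The function name format_origin is hypothetical, and the 9-column right-justified position is an assumption about the usual flatfile layout, not something stated in this document.

```python
def format_origin(sequence, line_width=60, group=10):
    """Format a nucleotide string as ORIGIN-style lines (illustrative)."""
    # Drop any whitespace or line breaks carried over from the FASTA file.
    seq = "".join(sequence.split()).lower()
    lines = []
    for offset in range(0, len(seq), line_width):
        chunk = seq[offset:offset + line_width]
        groups = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        # Assumed layout: 1-based position right-justified in 9 columns.
        lines.append(f"{offset + 1:>9} {' '.join(groups)}")
    return lines

fasta_lines = [
    "tagagcaaaaaatagacattttaatggcgctaatcatacaaggaaggaataataacactg",
    "acatggatacatccacttaatctacatttgcttattcctatcttgactatatctatatcc",
]
for line in format_origin("".join(fasta_lines)):
    print(line)
```

Apart from the leading padding, the two printed lines correspond to the two ORIGIN lines shown in Figure 3.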