Taxonomy report
General information about a taxonomic identifier
The downloaded taxonomy data package contains a taxonomy data report in JSON Lines format in the file:
ncbi_dataset/data/taxonomy_report.jsonl
Each line of the taxonomy data report file is a hierarchical JSON
object that represents a single taxonomy record. The schema of the taxonomy record is defined in the tables below
where each row describes a single field in the report or a sub-structure, which is a collection of fields.
The outermost structure of the report is TaxonomyNode.
Table fields that include a Table Field Mnemonic can be used with the
dataformat command-line tool's --fields option.
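Because the report is JSON Lines, each line can be parsed independently. The following is a minimal sketch of reading the file with the Python standard library; the helper name `iter_taxonomy_records` is hypothetical and not part of any NCBI tool.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_taxonomy_records(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                yield json.loads(line)


# Example call (assumes the data package was unpacked in the current directory):
# for record in iter_taxonomy_records("ncbi_dataset/data/taxonomy_report.jsonl"):
#     print(record["taxonomy"]["taxId"])
```

Reading line by line keeps memory use flat even for reports with many records, since no more than one record is held at a time.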
Sample report
{
"taxonomy": {
"taxId": 9606,
"rank": "SPECIES",
"currentScientificName": {
"name": "Homo sapiens",
"authority": "Linnaeus, 1758"
},
"curatorCommonName": "human",
"groupName": "primates",
"classification": {
"superkingdom": {
"name": "Eukaryota",
"id": 2759
},
"kingdom": {
"name": "Metazoa",
"id": 33208
},
"phylum": {
"name": "Chordata",
"id": 7711
},
"class": {
"name": "Mammalia",
"id": 40674
},
"order": {
"name": "Primates",
"id": 9443
},
"family": {
"name": "Hominidae",
"id": 9604
},
"genus": {
"name": "Homo",
"id": 9605
},
"species": {
"name": "Homo sapiens",
"id": 9606
}
},
"parents": [
1,
131567,
2759,
33154,
33208,
6072,
33213,
33511,
7711,
89593,
7742,
7776,
117570,
117571,
8287,
1338369,
32523,
32524,
40674,
32525,
9347,
1437010,
314146,
9443,
376913,
314293,
9526,
314295,
9604,
207598,
9605
],
"children": [
741158,
63221
],
"counts": [
{
"type": "COUNT_TYPE_ASSEMBLY",
"count": 1689
},
{
"type": "COUNT_TYPE_GENE",
"count": 193360
},
{
"type": "COUNT_TYPE_tRNA",
"count": 701
},
{
"type": "COUNT_TYPE_rRNA",
"count": 785
},
{
"type": "COUNT_TYPE_snRNA",
"count": 166
},
{
"type": "COUNT_TYPE_scRNA",
"count": 4
},
{
"type": "COUNT_TYPE_snoRNA",
"count": 1201
},
{
"type": "COUNT_TYPE_PROTEIN_CODING",
"count": 20621
},
{
"type": "COUNT_TYPE_ncRNA",
"count": 22103
},
{
"type": "COUNT_TYPE_BIOLOGICAL_REGION",
"count": 128261
},
{
"type": "COUNT_TYPE_OTHER",
"count": 844
}
],
"genomicMoltype": "dsDNA",
"currentScientificNameIsFormal": true
},
"query": [
"9606"
]
}
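In the sample above, the taxonomy fields sit under a `taxonomy` key, next to a `query` list holding the identifiers that were requested. A minimal sketch of mapping each query string to the taxId it resolved to, assuming every record has that shape; the function name `index_by_query` is hypothetical.

```python
def index_by_query(records):
    """Map each query string to the taxId of the record that contains it."""
    index = {}
    for record in records:
        node = record.get("taxonomy", {})
        for query in record.get("query", []):
            index[query] = node.get("taxId")
    return index


# A trimmed version of the sample report, for illustration only.
sample = {"taxonomy": {"taxId": 9606, "rank": "SPECIES"}, "query": ["9606"]}
print(index_by_query([sample]))  # {'9606': 9606}
```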
TaxonomyNode Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
taxId | coming soon | coming soon | uint32 | NCBI Taxonomy identifier | 9606 |
rank | | | RankType | The taxonomic rank of the taxonomic node. | kingdom |
currentScientificName | | | NameAndAuthority | The currently accepted name chosen out of all synonyms for the taxonomic node. | Wickerhamiella versatilis (Etchells & T.A. Bell) de Vega & Lachance, 2017 |
basionym | | | NameAndAuthority | The originally described name, no longer in use. Attached to the type material and species description. | Brettanomyces versatilis Etchells & T.A. Bell, 1950 |
curatorCommonName | coming soon | coming soon | string | The canonical common name. | sweet orange |
groupName | coming soon | coming soon | string | A common name describing large, well-known taxa. | even-toed ungulates |
hasTypeMaterial | coming soon | coming soon | bool | A boolean that indicates whether or not type material is available for the species. | |
classification | | | Classification | A subset of parent nodes including well-established ranks. | |
parents repeated | coming soon | coming soon | uint32 | Taxids of all parents, ordered from most specific (immediate parent), to most general. | |
children repeated | coming soon | coming soon | uint32 | Taxids of children. | |
counts repeated | | | TaxonomyNode.CountByType | | |
genomicMoltype | coming soon | coming soon | string | Genomic molecule type (dsDNA, ssDNA, ssDNA(-), ssRNA) | |
currentScientificNameIsFormal | coming soon | coming soon | bool | | |
secondaryTaxIds repeated | coming soon | coming soon | uint32 | | |
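Not every field in the table appears in every record; the sample report, for instance, has no basionym, hasTypeMaterial, or secondaryTaxIds. The sketch below builds a one-line summary of a TaxonomyNode while treating most fields as optional. The function name `describe_node` and the output layout are illustrative choices, not something the report defines.

```python
def describe_node(node: dict) -> str:
    """Return a short human-readable summary of a TaxonomyNode dictionary."""
    name_info = node.get("currentScientificName", {})
    parts = [name_info.get("name", "(unnamed)")]
    if name_info.get("authority"):
        parts.append(name_info["authority"])
    label = " ".join(parts)

    details = [f"taxId {node.get('taxId')}"]
    if "rank" in node:
        details.append(node["rank"])
    if "curatorCommonName" in node:
        details.append(f"common name: {node['curatorCommonName']}")
    return f"{label} ({', '.join(details)})"


# Values copied from the sample report.
human = {
    "taxId": 9606,
    "rank": "SPECIES",
    "currentScientificName": {"name": "Homo sapiens", "authority": "Linnaeus, 1758"},
    "curatorCommonName": "human",
}
print(describe_node(human))
# Homo sapiens Linnaeus, 1758 (taxId 9606, SPECIES, common name: human)
```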
Classification Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
superkingdom | | | TaxData | | |
kingdom | | | TaxData | | |
phylum | | | TaxData | | |
class | | | TaxData | | |
order | | | TaxData | | |
family | | | TaxData | | |
genus | | | TaxData | | |
species | | | TaxData | | |
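The Classification fields can be read in the order listed above to produce a lineage from broadest to narrowest rank. A minimal sketch, assuming that a rank may be absent from a given record (for example, when the record itself is above species level); `CLASSIFICATION_FIELDS` and `lineage` are hypothetical names.

```python
from typing import List

# Field names in the order they appear in the Classification structure.
CLASSIFICATION_FIELDS = [
    "superkingdom", "kingdom", "phylum", "class",
    "order", "family", "genus", "species",
]


def lineage(classification: dict) -> List[str]:
    """Return the names of the ranks present, from broadest to narrowest."""
    return [
        classification[field]["name"]
        for field in CLASSIFICATION_FIELDS
        if field in classification
    ]


# Subset of the sample report's classification.
example = {
    "order": {"name": "Primates", "id": 9443},
    "family": {"name": "Hominidae", "id": 9604},
    "genus": {"name": "Homo", "id": 9605},
}
print("; ".join(lineage(example)))  # Primates; Hominidae; Homo
```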
NameAndAuthority Structure
Name and authority object.
Contains information on the taxonomic node’s name, authority, publications, basionym, synonyms, etc.
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
name | coming soon | coming soon | string | This could be the scientific name, common name, synonym, etc. depending on the context. | |
authority | coming soon | coming soon | string | The authority that this name was created by. The authority is typically represented by the author(s) name and the year in which it was published. | |
typeStrains repeated | | | TaxonomyTypeMaterial | Any type materials for this entry. | |
curatorSynonym | coming soon | coming soon | string | The primary synonym of the scientific name. | Leptosphaeria maculans |
homotypicSynonyms repeated | | | NameAndAuthority | (Taxonomy names report only) Names generated after the basionym (e.g. by moving it to a different genus), but sharing the same type. Usually these are the results of genus changes. Also known as objective synonym, nomenclatural synonym. | Candida versatilis (Etchells & T.A. Bell) S.A. Mey. & Yarrow, 1978 |
heterotypicSynonyms repeated | | | NameAndAuthority | (Taxonomy names report only) List of heterotypic synonyms associated with this entry. | |
otherSynonyms repeated | | | NameAndAuthority | List of other (not listed as heterotypic or homotypic) synonyms associated with this entry. | |
informalNames repeated | coming soon | coming soon | string | List of informal names for the entry. | cow, spider |
basionym | | | NameAndAuthority | The originally described name, no longer in use. Attached to the type material and species description. | Brettanomyces versatilis Etchells & T.A. Bell, 1950 |
publications repeated | | | NameAndAuthority.Publication | Contains a list of publication objects related to this species. | |
notes repeated | | | NameAndAuthority.Note | Contains a list of note objects related to this species. | |
formal | coming soon | coming soon | bool | Indicates whether the name is formal (i.e. compliant) |
NameAndAuthority.Note Structure
Note object
Contains information related to this specific entry.
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
name | coming soon | coming soon | string | Name of the notation. | |
note | coming soon | coming soon | string | Note text. | |
noteClassifier | | | NameAndAuthority.NoteClassifier | Note classification | |
NameAndAuthority.Publication Structure
Publication object
Contains information about the publication such as the name and the citation.
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
name | coming soon | coming soon | string | Name of the publication (article, book, etc.). | |
citation | coming soon | coming soon | string | Citation to the publication. |
TaxData Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
name | coming soon | coming soon | string | Taxonomic name | |
id | coming soon | coming soon | uint32 | NCBI Taxonomy identifier |
TaxonomyNode.CountByType Structure
Counts of various attributes; for ranks above species, the counts are summed.
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
type | | | CountType | | |
count | coming soon | coming soon | uint32 |
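The counts field is a list of type/count pairs, which is often easier to use as a dictionary. The sketch below converts it and strips the `COUNT_TYPE_` prefix for readability. The function name `counts_by_type` is hypothetical, and defaulting a missing count to 0 is an assumption rather than documented behavior.

```python
def counts_by_type(node: dict) -> dict:
    """Convert a TaxonomyNode's counts list into a {type: count} dictionary."""
    prefix = "COUNT_TYPE_"
    result = {}
    for entry in node.get("counts", []):
        key = entry["type"]
        if key.startswith(prefix):
            key = key[len(prefix):]
        # Assumption: treat an absent count as zero.
        result[key] = entry.get("count", 0)
    return result


# Two entries copied from the sample report.
node = {
    "counts": [
        {"type": "COUNT_TYPE_ASSEMBLY", "count": 1689},
        {"type": "COUNT_TYPE_PROTEIN_CODING", "count": 20621},
    ]
}
print(counts_by_type(node))  # {'ASSEMBLY': 1689, 'PROTEIN_CODING': 20621}
```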
CollectionType Enumeration
Name | Number | Description |
---|---|---|
no_collection_type | 0 | |
collection_culture_collection | 1 | |
specimen_voucher | 2 |
CountType Enumeration
Name | Number | Description |
---|---|---|
COUNT_TYPE_UNSPECIFIED | 0 | |
COUNT_TYPE_ASSEMBLY | 1 | |
COUNT_TYPE_GENE | 2 | |
COUNT_TYPE_tRNA | 3 | |
COUNT_TYPE_rRNA | 4 | |
COUNT_TYPE_snRNA | 5 | |
COUNT_TYPE_scRNA | 6 | |
COUNT_TYPE_snoRNA | 7 | |
COUNT_TYPE_PROTEIN_CODING | 8 | |
COUNT_TYPE_PSEUDO | 9 | |
COUNT_TYPE_TRANSPOSON | 10 | |
COUNT_TYPE_miscRNA | 11 | |
COUNT_TYPE_ncRNA | 12 | |
COUNT_TYPE_BIOLOGICAL_REGION | 13 | |
COUNT_TYPE_OTHER | 14 | |
COUNT_TYPE_ORGANELLE | 15 |
NameAndAuthority.NoteClassifier Enumeration
Class of authority
Indicates whether the authority has any special classification, such as having been effectively and validly published or having been included in an approved list.
Name | Number | Description |
---|---|---|
no_authority_classifier | 0 | No specific classification. |
effective_name | 1 | Has been effectively and validly published (i.e. in the “International Code of Nomenclature of Prokaryotes”). |
nomen_approbbatum | 2 | Has been included in an approved list (such as the “Approved List of Bacterial Names”). |
ictv_accepted | 3 | Has been accepted by the ICTV. |
RankType Enumeration
Rank level
Name | Number | Description |
---|---|---|
NO_RANK | 0 | |
SUPERKINGDOM | 1 | |
KINGDOM | 2 | |
SUBKINGDOM | 3 | |
SUPERPHYLUM | 4 | |
SUBPHYLUM | 5 | |
PHYLUM | 6 | |
CLADE | 31 | |
SUPERCLASS | 7 | |
CLASS | 8 | |
SUBCLASS | 9 | |
INFRACLASS | 10 | |
COHORT | 11 | |
SUBCOHORT | 12 | |
SUPERORDER | 13 | |
ORDER | 14 | |
SUBORDER | 15 | |
INFRAORDER | 16 | |
PARVORDER | 17 | |
SUPERFAMILY | 18 | |
FAMILY | 19 | |
SUBFAMILY | 20 | |
GENUS | 21 | |
SUBGENUS | 22 | |
SPECIES_GROUP | 23 | |
SPECIES_SUBGROUP | 24 | |
SPECIES | 25 | |
SUBSPECIES | 26 | |
TRIBE | 27 | |
SUBTRIBE | 28 | |
FORMA | 29 | |
VARIETAS | 30 | |
STRAIN | 320 | |
SECTION | 330 | |
SUBSECTION | 340 | |
PATHOGROUP | 350 | |
SUBVARIETY | 360 | |
GENOTYPE | 370 | |
SEROTYPE | 380 | |
ISOLATE | 390 | |
MORPH | 400 | |
SERIES | 410 | |
FORMA_SPECIALIS | 420 | |
SEROGROUP | 430 | |
BIOTYPE | 440 |
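The enumeration numbers identify ranks but do not consistently follow the taxonomic hierarchy: for example, SUBPHYLUM (5) is numbered before PHYLUM (6), and TRIBE (27) comes after SUBSPECIES (26) although a tribe ranks between family and genus. Comparing ranks by number is therefore unreliable. A minimal sketch of an explicit ordering, limited to the eight ranks used in the Classification structure; `MAJOR_RANKS` and `is_broader` are hypothetical names.

```python
# Major ranks from broadest to narrowest, matching the Classification fields.
MAJOR_RANKS = [
    "SUPERKINGDOM", "KINGDOM", "PHYLUM", "CLASS",
    "ORDER", "FAMILY", "GENUS", "SPECIES",
]


def is_broader(rank_a: str, rank_b: str) -> bool:
    """Return True if rank_a is a broader major rank than rank_b.

    Raises ValueError for ranks outside MAJOR_RANKS (e.g. CLADE or NO_RANK),
    because their position in the hierarchy is not fixed by this list.
    """
    return MAJOR_RANKS.index(rank_a) < MAJOR_RANKS.index(rank_b)


print(is_broader("FAMILY", "GENUS"))   # True
print(is_broader("SPECIES", "ORDER"))  # False
```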
Scalar Value Types
Protocol buffers type | Notes | C++ | Python | Java | Go |
---|---|---|---|---|---|
double | | double | float | double | float64 |
float | | float | float | float | float32 |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | int/long | long | int64 |
uint32 | Uses variable-length encoding. | uint32 | int/long | int | uint32 |
uint64 | Uses variable-length encoding. | uint64 | int/long | long | uint64 |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long | long | int64 |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | long | uint64 |
sfixed32 | Always four bytes. | int32 | int | int | int32 |
sfixed64 | Always eight bytes. | int64 | int/long | long | int64 |
bool | | bool | boolean | boolean | bool |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | String | string |
bytes | May contain any arbitrary sequence of bytes. | string | str | ByteString | []byte |