Genome assembly report
Genome record accession, organism, assembly statistics, and annotation info
The downloaded genome package contains a genome assembly data report in JSON Lines format in the file:
ncbi_dataset/data/assembly_data_report.jsonl
Each line of the genome assembly data report file is a hierarchical JSON
object that represents a single genome assembly record. The schema of the genome assembly record is defined in the tables
below where each row describes a single field in the report or a sub-structure, which is a collection of fields.
The outermost structure of the report is AssemblyDataReport.
Table fields that include a Table Field Mnemonic can be used with the dataformat command-line tool's --fields option.
Sample report
{
  "accession": "GCF_000001405.40",
  "annotationInfo": {
    "busco": {
      "buscoLineage": "primates_odb10",
      "buscoVer": "5.7.1",
      "complete": 0.9887518,
      "duplicated": 0.009433962,
      "fragmented": 0.0045718434,
      "missing": 0.0066763423,
      "singleCopy": 0.97931784,
      "totalCount": "13780"
    },
    "method": "Best-placed RefSeq; Gnomon; RefSeqFE; cmsearch; tRNAscan-SE",
    "name": "GCF_000001405.40-RS_2024_08",
    "pipeline": "NCBI eukaryotic genome annotation pipeline",
    "provider": "NCBI RefSeq",
    "releaseDate": "2024-08-23",
    "reportUrl": "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Homo_sapiens/GCF_000001405.40-RS_2024_08.html",
    "softwareVersion": "10.3",
    "stats": {
      "geneCounts": {
        "nonCoding": 22163,
        "other": 411,
        "proteinCoding": 20078,
        "pseudogene": 17063,
        "total": 59715
      }
    },
    "status": "Updated annotation"
  },
  "assemblyInfo": {
    "assemblyLevel": "Chromosome",
    "assemblyName": "GRCh38.p14",
    "assemblyStatus": "current",
    "assemblyType": "haploid-with-alt-loci",
    "bioprojectAccession": "PRJNA31257",
    "bioprojectLineage": [
      {
        "bioprojects": [
          {
            "accession": "PRJNA31257",
            "title": "The Human Genome Project, currently maintained by the Genome Reference Consortium (GRC)"
          }
        ]
      }
    ],
    "blastUrl": "https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_SPEC=GDH_GCF_000001405.40",
    "description": "Genome Reference Consortium Human Build 38 patch release 14 (GRCh38.p14)",
    "pairedAssembly": {
      "accession": "GCA_000001405.29",
      "onlyGenbank": "4 unlocalized and unplaced scaffolds.",
      "status": "current"
    },
    "refseqCategory": "reference genome",
    "releaseDate": "2022-02-03",
    "submitter": "Genome Reference Consortium",
    "synonym": "hg38"
  },
  "assemblyStats": {
    "contigL50": 18,
    "contigN50": 57879411,
    "gapsBetweenScaffoldsCount": 349,
    "gcCount": "1374283647",
    "gcPercent": 41.0,
    "numberOfComponentSequences": 35611,
    "numberOfContigs": 996,
    "numberOfOrganelles": 1,
    "numberOfScaffolds": 470,
    "scaffoldL50": 16,
    "scaffoldN50": 67794873,
    "totalNumberOfChromosomes": 24,
    "totalSequenceLength": "3099441038",
    "totalUngappedLength": "2948318359"
  },
  "currentAccession": "GCF_000001405.40",
  "organelleInfo": [
    {
      "description": "Mitochondrion",
      "submitter": "Genome Reference Consortium",
      "totalSeqLength": "16569"
    }
  ],
  "organism": {
    "commonName": "human",
    "organismName": "Homo sapiens",
    "taxId": 9606
  },
  "pairedAccession": "GCA_000001405.29",
  "sourceDatabase": "SOURCE_DATABASE_REFSEQ"
}
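
Because the report is JSON Lines, each line can be parsed on its own. The following is a minimal sketch using only Python's standard library. The helper names `iter_reports` and `summarize` are illustrative and not part of any NCBI tool, and the demo parses an abbreviated, single-line copy of the sample record rather than a real download. Sub-structures such as `annotationInfo` are not present in every record, so the sketch reads them with `.get()`.

```python
import json
from typing import Iterable, Iterator


def iter_reports(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one assembly record (a dict) per non-empty JSON Lines row."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(report: dict) -> dict:
    """Pull a few fields out of one record; absent sub-structures yield None."""
    info = report.get("assemblyInfo", {})
    organism = report.get("organism", {})
    annotation = report.get("annotationInfo", {})
    return {
        "accession": report.get("accession"),
        "assembly_name": info.get("assemblyName"),
        "level": info.get("assemblyLevel"),
        "organism": organism.get("organismName"),
        "annotation_name": annotation.get("name"),
    }


# Abbreviated copy of the sample record, on a single line as in the file.
sample_line = (
    '{"accession": "GCF_000001405.40", '
    '"assemblyInfo": {"assemblyLevel": "Chromosome", "assemblyName": "GRCh38.p14"}, '
    '"organism": {"organismName": "Homo sapiens", "taxId": 9606}}'
)

summaries = [summarize(r) for r in iter_reports([sample_line, ""])]
print(summaries[0])

# With a downloaded package, the same generator can consume the file directly:
# with open("ncbi_dataset/data/assembly_data_report.jsonl") as handle:
#     for report in iter_reports(handle):
#         print(summarize(report))
```

Reading line by line keeps memory use flat even for packages that contain many assemblies.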
AssemblyDataReport Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
accession | accession | Assembly Accession | string | The GenColl assembly accession | GCF_000001405.40 |
currentAccession | current-accession | Current Accession | string | The latest GenColl assembly accession for this revision chain | GCF_000001405.40 |
sourceDatabase | source_database | Source Database | SourceDatabase | Source of the accession. The paired accession, if it exists, is from the other database. | REFSEQ GENBANK |
organism | organism- | Organism | Organism | ||
assemblyInfo | assminfo- | Assembly | AssemblyInfo | Metadata for the genome assembly submission | |
assemblyStats | assmstats- | Assembly Stats | AssemblyStats | Global statistics for the genome assembly | |
organelleInfo repeated | organelle- | Organelle | OrganelleInfo | Metadata for all associated organelle genomes | |
additionalSubmitters repeated | | | ExtraSequenceInfo | Submitter data for all associated extra sequences | |
annotationInfo | annotinfo- | Annotation | AnnotationInfo | Metadata and statistics for the genome assembly annotation, when available | |
wgsInfo | wgs- | WGS | WGSInfo | Metadata pertaining to the Whole Genome Shotgun (WGS) record for the genome assemblies that are complete genomes. Those that are clone-based do not have WGS-master records. | |
typeMaterial | type_material- | Type Material | TypeMaterial | ||
checkmInfo | checkm- | CheckM | CheckM | Metadata on the completeness and contamination of this assembly | |
averageNucleotideIdentity | ani- | ANI | AverageNucleotideIdentity |
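
Mnemonics that end in a hyphen, such as `assminfo-` and `assmstats-`, belong to nested structures. The sketch below assumes that the full mnemonic of a nested field is formed by joining that prefix with the field's own mnemonic from the sub-structure table (for example `assminfo-` plus `name`). The helper `field_mnemonic` is hypothetical, and the dataformat documentation should be consulted for the exact command line that accepts the resulting value.

```python
# Prefixes copied from the AssemblyDataReport table (mnemonics ending in "-").
STRUCT_PREFIX = {
    "organism": "organism-",
    "assemblyInfo": "assminfo-",
    "assemblyStats": "assmstats-",
    "annotationInfo": "annotinfo-",
}


def field_mnemonic(struct, leaf):
    """Join a structure prefix with a leaf mnemonic; None means a top-level field."""
    if struct is None:
        return leaf
    return STRUCT_PREFIX[struct] + leaf


# Leaf mnemonics are taken from the sub-structure tables on this page.
wanted = [
    (None, "accession"),
    ("assemblyInfo", "name"),
    ("assemblyInfo", "level"),
    ("assemblyStats", "contig-n50"),
]

# Comma-separated value that could be supplied to the --fields option.
fields_value = ",".join(field_mnemonic(s, leaf) for s, leaf in wanted)
print(fields_value)
```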
ANIMatch Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
assembly | assembly | Assembly | string | | GCA_010191885.1 |
organismName | organism | Organism | string | | Salmonella enterica subsp. enterica serovar Typhimurium |
category | category | Type Category | ANITypeCategory | | Type material |
ani | ani | ANI | float | | 98.5 |
assemblyCoverage | assembly_coverage | Assembly Coverage | float | AKA qcoverage | 90.75 |
typeAssemblyCoverage | type_assembly_coverage | Type Assembly Coverage | float | AKA scoverage | 89.60 |
AnnotationInfo Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
name | name | Name | string | ||
provider | provider | Provider | string | ||
releaseDate | release-date | Release Date | string | ||
reportUrl | report-url | Report URL | string | ||
stats | featcount- | Count | FeatureCounts | ||
busco | busco- | BUSCO | BuscoStat | ||
method | method | Method | string | ||
pipeline | pipeline | Pipeline | string | ||
softwareVersion | software-version | Software Version | string | ||
status | status | Status | string |
AssemblyInfo Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
assemblyLevel | level | Level | string | The level at which a genome has been assembled | chromosome scaffold contig |
assemblyStatus | status | Status | AssemblyStatus | The GenColl assembly status | current |
pairedAssembly | paired-assm- | Paired Assembly | PairedAssembly | Metadata from the GenBank or RefSeq assembly paired with this one | |
assemblyName | name | Name | string | The assembly submitter’s name for the genome assembly, when provided. Otherwise, a default name in the form ASM#####v# is assigned | GRCh38.p14 ASM985889v3 |
assemblyLongName | long-name | LongName | string | | Genome Reference Consortium Human Build 38 patch release 14 (GRCh38.p14) |
assemblyType | type | Type | string | Chromosome content of the submitted genome assembly | haploid-with-alt-loci haploid |
bioprojectLineage repeated | bioproject- | BioProject | BioProjectLineage | The lineage of BioProject accessions. The specific BioProject which produced the sequences in the genome assembly is listed first, followed in order by its antecedents. | |
bioprojectAccession | bioproject | BioProject Accession | string | ||
releaseDate | release-date | Release Date | string | Date the assembly was made available by NCBI. This field is not returned by versions of the datasets Command Line Interface (CLI) program < 15. | |
description | description | Description | string | Long description for this genome | |
submitter | submitter | Submitter | string | The submitting consortium or organization. Full submitter information is available in the BioProject | |
refseqCategory | refseq-category | Refseq Category | string | The RefSeq Category is either reference or representative genome and indicates the RefSeq project classification | reference genome representative genome |
synonym | synonym | Synonym | string | Genome name ascribed to this assembly by the UC Santa Cruz genome browser | hg38 |
linkedAssemblies repeated | linked-assm- | Linked Assembly | LinkedAssembly | Genome assemblies derived from the same diploid individual | |
atypical | atypical | Atypical | AtypicalInfo | Information on atypical genomes - genomes that have assembly issues or are otherwise atypical | |
genomeNotes repeated | notes | Notes | string | All the RefSeq messages associated with this assembly | |
sequencingTech | sequencing-tech | Sequencing Tech | string | Sequencing technology used to sequence this genome | |
assemblyMethod | assembly-method | Assembly Method | string | Genome assembly method | |
groupingMethod | grouping-method | Grouping Method | string | ||
biosample | biosample- | BioSample | BioSampleDescriptor | NCBI BioSample from which the sequences in the genome assembly were obtained. | |
blastUrl | blast-url | Blast URL | string | URL to blast page for this assembly | |
comments | coming soon | coming soon | string | Freeform comments | |
suppressionReason | suppression-reason | Suppression Reason | string | The reason the assembly is suppressed, for suppressed assemblies | |
diploidRole | | | LinkedAssemblyType | | |
AssemblyStats Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
totalNumberOfChromosomes | total-number-of-chromosomes | Total Number of Chromosomes | uint32 | Count of nuclear chromosomes, organelles and plasmids in a submitted genome assembly | |
totalSequenceLength | total-sequence-len | Total Sequence Length | uint64 | Total sequence length of the nuclear genome including unplaced and unlocalized sequences | |
totalUngappedLength | total-ungapped-len | Total Ungapped Length | uint64 | Total length of all top-level sequences ignoring gaps. Any stretch of 10 or more Ns in a sequence is treated like a gap | |
numberOfContigs | number-of-contigs | Number of Contigs | uint32 | Total number of sequence contigs in the assembly. Any stretch of 10 or more Ns in a sequence is treated as a gap between two contigs in a scaffold when counting contigs and calculating contig N50 & L50 values | |
contigN50 | contig-n50 | Contig N50 | uint32 | Length such that sequence contigs of this length or longer include half the bases of the assembly | |
contigL50 | contig-l50 | Contig L50 | uint32 | Number of sequence contigs that are longer than, or equal to, the N50 length and therefore include half the bases of the assembly | |
numberOfScaffolds | number-of-scaffolds | Number of Scaffolds | uint32 | Number of scaffolds including placed, unlocalized, unplaced, alternate loci and patch scaffolds | |
scaffoldN50 | scaffold-n50 | Scaffold N50 | uint32 | Length such that scaffolds of this length or longer include half the bases of the assembly | |
scaffoldL50 | scaffold-l50 | Scaffold L50 | uint32 | Number of scaffolds that are longer than, or equal to, the N50 length and therefore include half the bases of the assembly | |
gapsBetweenScaffoldsCount | gaps-between-scaffolds-count | Gaps Between Scaffolds Count | uint32 | Number of unspanned gaps between scaffolds | |
numberOfComponentSequences | number-of-component-sequences | Number of Component Sequences | uint32 | Total number of component WGS or clone sequences in the assembly | |
gcPercent | gc-percent | GC Percent | float | The percentage of GC base-pairs in the assembly | |
genomeCoverage | genome-coverage | Genome Coverage | string | Genome assembly coverage | |
numberOfOrganelles | number-of-organelles | Number of Organelles | uint32 | Number of organelles |
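
The N50 and L50 definitions and the "10 or more Ns" gap rule in the table above can be illustrated with a short sketch. The function names `split_contigs` and `n50_l50` and the toy sequences are assumptions made for illustration; this is not the code NCBI uses to produce the reported statistics.

```python
import re


def split_contigs(scaffold, min_gap=10):
    """Split a scaffold into contigs at every run of `min_gap` or more Ns."""
    pattern = r"[Nn]{%d,}" % min_gap
    return [piece for piece in re.split(pattern, scaffold) if piece]


def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50: length such that sequences of this length or longer hold half the bases.
    L50: number of sequences that are at least N50 long.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must not be empty")


# Toy scaffold: the 12-N run separates two contigs, the 3-N run does not.
scaffold = "ACGT" * 5 + "N" * 12 + "ACGTNNNACGT"
contig_lengths = [len(c) for c in split_contigs(scaffold)]
print(contig_lengths)  # [20, 11]

# Lengths sum to 300; the two longest (80 + 70) reach half of that.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

The same `n50_l50` function applies to scaffold lengths when computing scaffold N50 and L50.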
AtypicalInfo Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
isAtypical | is-atypical | Is Atypical | bool | If true there are assembly issues or the assembly is in some way non-standard | |
warnings repeated | warnings | Warnings | string | The reasons that the assembly is considered atypical |
AverageNucleotideIdentity Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
taxonomyCheckStatus | check-status | Check status | AverageNucleotideIdentity.TaxonomyCheckStatus | | ok failed inconclusive |
matchStatus | best-match-status | Best match status | AverageNucleotideIdentity.MatchStatus | | derived-species-match |
submittedOrganism | submitted-organism | Submitted organism | string | Column 5 of ANI Report | Salmonella enterica subsp. enterica serovar Tennessee str. CDC07-0191 |
submittedSpecies | submitted-species | Submitted species | string | Column 6 of ANI Report | Salmonella enterica |
category | category | Category | ANITypeCategory | | syntype |
submittedAniMatch | submitted-ani-match- | Declared ANI match | ANIMatch | ||
bestAniMatch | best-ani-match- | Best ANI match | ANIMatch | ||
comment | comment | Comment | string |
BioProject Structure
A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. The record can be retrieved from NCBI BioProject.
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
accession | accession | Accession | string | BioProject accession | PRJEB35387 |
title | title | Title | string | Title of the BioProject provided by the submitter | Sciurus carolinensis (grey squirrel) genome assembly, mSciCar1 |
parentAccessions repeated | parent-accessions | Parent Accessions | string | BioProject accession containing multiple children BioProjects | ["PRJNA489243","PRJEB33226","PRJEB40665"] |
BioProjectLineage Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
bioprojects repeated | lineage- | Lineage | BioProject | A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium |
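
As the sample report shows, `bioprojectLineage` is a list of lineage objects, each holding a `bioprojects` list. The following sketch flattens that nesting into a plain list of accessions in the order they appear. The function name `lineage_accessions` is illustrative, and the input is copied from the sample record.

```python
def lineage_accessions(assembly_info):
    """Return BioProject accessions in the order they appear in the lineage."""
    accessions = []
    for lineage in assembly_info.get("bioprojectLineage", []):
        for project in lineage.get("bioprojects", []):
            accessions.append(project["accession"])
    return accessions


# assemblyInfo fragment copied from the sample report.
assembly_info = {
    "bioprojectAccession": "PRJNA31257",
    "bioprojectLineage": [
        {
            "bioprojects": [
                {
                    "accession": "PRJNA31257",
                    "title": "The Human Genome Project, currently maintained "
                             "by the Genome Reference Consortium (GRC)",
                }
            ]
        }
    ],
}

print(lineage_accessions(assembly_info))  # ['PRJNA31257']
```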
BioSampleAttribute Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
name | name | Name | string | ||
value | value | Value | string |
BioSampleContact Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
lab | lab | Lab | string | Submitter lab name. |
BioSampleDescription Structure
Description of the BioSample object
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
title | title | Title | string | ||
organism | organism- | Organism | Organism | ||
comment | comment | Comment | string |
BioSampleDescriptor Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
accession | accession | Accession | string | | SAMN20055006 |
lastUpdated | last-updated | Last updated | string | ||
publicationDate | publication-date | Publication date | string | ||
submissionDate | submission-date | Submission date | string | ||
sampleIds repeated | ids- | Sample Identifiers | BioSampleId | ||
description | description- | Description | BioSampleDescription | ||
owner | owner- | Owner | BioSampleOwner | ||
models repeated | models | Models | string | ||
bioprojects repeated | bioproject- | BioProject | BioProject | ||
package | package | Package | string | | MIGS.ba.air.4.0 |
attributes repeated | attribute- | Attribute | BioSampleAttribute | ||
status | status- | Status | BioSampleStatus | ||
age | age | Age | string | ||
biomaterialProvider | biomaterial-provider- | Biomaterial provider | string | ||
breed | breed | Breed | string | ||
collectedBy | collected-by | Collected by | string | ||
collectionDate | collection-date | Collection date | string | ||
cultivar | cultivar | Cultivar | string | ||
devStage | development-stage | Development stage | string | ||
ecotype | ecotype | Ecotype | string | ||
geoLocName | geo-loc-name | Geographic location | string | ||
host | host | Host | string | ||
hostDisease | host-disease | Host disease | string | ||
identifiedBy | identified-by | Identified by | string | ||
ifsacCategory | ifsac-category | IFSAC category | string | ||
isolate | isolate | Isolate | string | ||
isolateNameAlias | isolate-name-alias | Isolate name alias | string | ||
isolationSource | isolation-source | Isolation source | string | ||
latLon | lat-lon | Latitude / Longitude | string | ||
projectName | project-name | Project name | string | ||
sampleName | sample-name | Sample name | string | ||
serovar | serovar | Serovar | string | ||
sex | sex | Sex | string | ||
sourceType | source-type | Source type | string | ||
strain | strain | Strain | string | ||
subSpecies | sub-species | Sub-species | string | ||
tissue | tissue | Tissue | string | ||
serotype | serotype | Serotype | string |
BioSampleId Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
db | db | Database | string | | Wellcome Sanger Institute |
label | label | Label | string | | Sample name |
value | value | Value | string | | COG-UK/ALDP-17A6A8C |
BioSampleOwner Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
name | name | Name | string | ||
contacts repeated | contact- | Contact | BioSampleContact |
BioSampleStatus Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
status | status | Status | string | | live |
when | when | When | string |
BuscoStat Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
buscoLineage | lineage | Lineage | string | BUSCO Lineage | |
buscoVer | ver | Version | string | BUSCO Version | |
complete | complete | Complete | float | BUSCO score: Complete | |
singleCopy | singlecopy | Single Copy | float | BUSCO score: Single Copy | |
duplicated | duplicated | Duplicated | float | BUSCO score: Duplicated | |
fragmented | fragmented | Fragmented | float | BUSCO score: Fragmented | |
missing | missing | Missing | float | BUSCO score: Missing | |
totalCount | totalcount | Total Count | uint64 | BUSCO score: Total Count |
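
In the sample report the BUSCO scores look like fractions of `totalCount` rather than percentages, and `totalCount` arrives as a JSON string. Under that assumption, the sketch below converts the fractions back to approximate marker counts. The function name `busco_counts` is illustrative; the input values are copied from the sample record.

```python
# BuscoStat values copied from the sample report.
busco = {
    "buscoLineage": "primates_odb10",
    "buscoVer": "5.7.1",
    "complete": 0.9887518,
    "duplicated": 0.009433962,
    "fragmented": 0.0045718434,
    "missing": 0.0066763423,
    "singleCopy": 0.97931784,
    "totalCount": "13780",
}

SCORE_KEYS = ("complete", "singleCopy", "duplicated", "fragmented", "missing")


def busco_counts(stat):
    """Convert fractional BUSCO scores to rounded marker counts."""
    total = int(stat["totalCount"])  # uint64 is serialized as a string
    return {key: round(stat[key] * total) for key in SCORE_KEYS}


counts = busco_counts(busco)
print(counts)

# In BUSCO, complete markers are single-copy plus duplicated markers, and
# complete + fragmented + missing covers the whole marker set.
print(counts["singleCopy"] + counts["duplicated"] == counts["complete"])
```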
CheckM Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
checkmMarkerSet | marker-set | marker set | string | The taxonomic group used as the basis for comparison with this assembly for CheckM values | Mycobacterium avium |
checkmSpeciesTaxId | species-tax-id | species tax id | uint32 | The species-level taxid for this assembly's CheckM dataset | 1764 |
checkmMarkerSetRank | marker-set-rank | marker set rank | string | CheckM taxonomic rank of checkm_marker_set | species genus |
checkmVersion | version | version | string | CheckM software version | v1.2.0 |
completeness | completeness | completeness | float | Percent completeness of this assembly | 86.83 |
contamination | contamination | contamination | float | Percent contamination of this assembly | 5.18 |
completenessPercentile | completeness-percentile | completeness percentile | float | The percent of assemblies under the taxonomic grouping ‘checkm_marker_set’ that this assembly is as-or-more complete than. | 79 |
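
A common use of the CheckM values is to screen assemblies by quality. The sketch below is one possible filter. The threshold defaults, the function name `passes_checkm`, and the placeholder accessions are all hypothetical, and records with missing values are treated as failing, which is an assumption that may need adjusting for real data.

```python
def passes_checkm(report, min_completeness=90.0, max_contamination=5.0):
    """True when both CheckM percentages meet the given thresholds.

    Records without checkmInfo, or with either value missing, are rejected
    (an assumption of this sketch, not documented behavior).
    """
    checkm = report.get("checkmInfo")
    if not checkm:
        return False
    completeness = checkm.get("completeness")
    contamination = checkm.get("contamination")
    if completeness is None or contamination is None:
        return False
    return completeness >= min_completeness and contamination <= max_contamination


# Placeholder records; the CheckM numbers are the example values from the table.
reports = [
    {"accession": "EXAMPLE_1",
     "checkmInfo": {"completeness": 86.83, "contamination": 5.18}},
    {"accession": "EXAMPLE_2"},  # no CheckM data at all
]

strict = [r["accession"] for r in reports if passes_checkm(r)]
relaxed = [r["accession"] for r in reports if passes_checkm(r, 80.0, 6.0)]
print(strict, relaxed)
```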
ExtraSequenceInfo Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
genbankAccession | coming soon | coming soon | string | GenBank accession of extra sequence | |
refseqAccession | coming soon | coming soon | string | RefSeq accession of extra sequence | |
chrName | coming soon | coming soon | string | chromosome name | |
moleculeType | coming soon | coming soon | string | molecule type | |
submitter | coming soon | coming soon | string | Name of submitter | |
bioprojectAccession | coming soon | coming soon | string | Bioproject accession |
FeatureCounts Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
geneCounts | gene- | Gene | GeneCounts | Counts of gene types |
GeneCounts Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
total | total | Total | uint32 | Total number of annotated genes | |
proteinCoding | protein-coding | Protein-coding | uint32 | Count of annotated genes that encode a protein | |
nonCoding | non-coding | Non-coding | uint32 | Count of transcribed non-coding genes (e.g. lncRNAs, miRNAs, rRNAs); excludes transcribed pseudogenes | |
pseudogene | pseudogene | Pseudogene | uint32 | Count of transcribed and non-transcribed pseudogenes | |
other | other | Other | uint32 | Count of genic region GeneIDs and non-genic regulatory GeneIDs |
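
In the sample report the four category counts add up to `total`. Whether that holds for every record is not stated on this page, so the sketch below treats it as a check rather than a guarantee. The helper names are illustrative; the numbers are copied from the sample record.

```python
# geneCounts copied from the sample report.
gene_counts = {
    "nonCoding": 22163,
    "other": 411,
    "proteinCoding": 20078,
    "pseudogene": 17063,
    "total": 59715,
}


def category_sum(counts):
    """Sum every category except the reported total."""
    return sum(value for key, value in counts.items() if key != "total")


def category_shares(counts):
    """Fraction of the reported total contributed by each category."""
    total = counts["total"]
    return {key: value / total for key, value in counts.items() if key != "total"}


print(category_sum(gene_counts) == gene_counts["total"])  # True for the sample
print(category_shares(gene_counts))
```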
InfraspecificNames Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
breed | breed | Breed | string | A homogeneous group of animals within a domesticated species | Hereford boxer |
cultivar | cultivar | Cultivar | string | A variety of plant within a species produced and maintained by cultivation | B73 |
ecotype | ecotype | Ecotype | string | A population or subspecies occupying a distinct habitat | Alpine |
isolate | isolate | Isolate | string | The individual isolate from which the sequences in the genome assembly were derived | L1 Dominette 01449 registration number 42190680 Pmale09 |
sex | sex | Sex | string | Male or female | female |
strain | strain | Strain | string | A genetic variant, subtype or culture within a species | SE11 |
LinkedAssembly Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
linkedAssembly | accession | Accession | string | The linked assembly accession | GCA_000212995.1 |
assemblyType | type | Type | LinkedAssemblyType | The linked assembly type |
OrganelleInfo Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
infraspecificName | infraspecific-name | Infraspecific Name | string | The strain, breed, cultivar or ecotype of the organism from which the sequences in the assembly were derived | |
bioproject repeated | bioproject-accessions | BioProject Accessions | string | The associated BioProject accession, when available | |
description | description | Description | string | Long description of the organelle genome | |
totalSeqLength | total-seq-length | Total Seq Length | uint64 | Sequence length of the organelle genome | |
submitter | submitter | Submitter | string | Name of submitter |
PairedAssembly Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
accession | accession | Accession | string | The GenColl assembly accession of the GenBank or RefSeq assembly paired with this one | GCF_000001405.40 |
status | status | Status | AssemblyStatus | GenColl Assembly status from paired record | current |
annotationName | name | Name | string | Annotation name from paired record | |
onlyGenbank | only-genbank | Only Genbank | string | Sequences that are only included in the GenBank assembly | |
onlyRefseq | only-refseq | Only RefSeq | string | Sequences that are only included in the RefSeq assembly | |
changed | changed | Changed | string | Sequences present on both the GenBank and the RefSeq assemblies that have been changed, e.g., contaminated sequence in the GenBank assembly has been replaced with a gap | |
manualDiff | manual-diff | Manual Diff | string | Additional details about sequence differences between the GenBank and RefSeq assemblies |
TypeMaterial Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
typeLabel | label | Label | string | ||
typeDisplayText | display_text | Display Text | string |
WGSInfo Structure
Whole Genome Shotgun (WGS) projects are genome assemblies of incomplete genomes or incomplete chromosomes of prokaryotes or eukaryotes that are generally being sequenced by a whole genome shotgun strategy.
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
wgsProjectAccession | project-accession | project accession | string | | AAEX03 CABHLF01 |
masterWgsUrl | url | URL | string | | https://www.ncbi.nlm.nih.gov/nuccore/AAEX00000000.3 |
wgsContigsUrl | contigs-url | contigs URL | string | | https://www.ncbi.nlm.nih.gov/Traces/wgs/AAEX03 |
ANITypeCategory Enumeration
Name | Number | Description |
---|---|---|
ANI_CATEGORY_UNKNOWN | 0 | |
claderef | 1 | |
category_na | 2 | |
neotype | 3 | |
no_type | 4 | |
pathovar | 5 | |
reftype | 6 | |
suspected_type | 7 | |
syntype | 8 | |
type | 9 |
AssemblyStatus Enumeration
Name | Number | Description |
---|---|---|
ASSEMBLY_STATUS_UNKNOWN | 0 | |
current | 1 | |
previous | 2 | |
suppressed | 3 | |
retired | 4 | This is deprecated - should no longer be seen in the data |
AverageNucleotideIdentity.MatchStatus Enumeration
Name | Number | Description |
---|---|---|
BEST_MATCH_STATUS_UNKNOWN | 0 | |
approved_mismatch | 1 | |
below_threshold_match | 2 | |
below_threshold_mismatch | 3 | |
best_match_status | 4 | |
derived_species_match | 5 | |
genus_match | 6 | |
low_coverage | 7 | |
mismatch | 8 | |
status_na | 9 | |
species_match | 10 | |
subspecies_match | 11 | |
synonym_match | 12 | |
lineage_match | 13 | |
below_threshold_lineage_match | 14 |
AverageNucleotideIdentity.TaxonomyCheckStatus Enumeration
Name | Number | Description |
---|---|---|
TAXONOMY_CHECK_STATUS_UNKNOWN | 0 | |
OK | 1 | |
Failed | 2 | |
Inconclusive | 3 |
LinkedAssemblyType Enumeration
Name | Number | Description |
---|---|---|
LINKED_ASSEMBLY_TYPE_UNKNOWN | 0 | |
alternate_pseudohaplotype_of_diploid | 1 | SEQUI-5245 |
principal_pseudohaplotype_of_diploid | 2 | |
maternal_haplotype_of_diploid | 3 | |
paternal_haplotype_of_diploid | 4 | |
haplotype_1 | 6 | |
haplotype_2 | 7 | |
haplotype_3 | 8 | |
haplotype_4 | 9 | |
haploid | 10 | Catch all for any value that is not explicitly listed above |
SourceDatabase Enumeration
Name | Number | Description |
---|---|---|
SOURCE_DATABASE_UNSPECIFIED | 0 | |
SOURCE_DATABASE_GENBANK | 1 | |
SOURCE_DATABASE_REFSEQ | 2 |
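
The JSON report carries the full enumeration name (for example `SOURCE_DATABASE_REFSEQ`), while the table examples use the short forms REFSEQ and GENBANK. The sketch below strips the enumeration prefix and cross-checks the result against the accession prefix, relying on the convention that GCF_ accessions are RefSeq assemblies and GCA_ accessions are GenBank assemblies. Both helper names are illustrative.

```python
PREFIX = "SOURCE_DATABASE_"


def short_source(value):
    """Drop the enumeration prefix, e.g. SOURCE_DATABASE_REFSEQ -> REFSEQ."""
    return value[len(PREFIX):] if value.startswith(PREFIX) else value


def source_from_accession(accession):
    """Infer the database from the assembly accession prefix."""
    if accession.startswith("GCF_"):
        return "REFSEQ"
    if accession.startswith("GCA_"):
        return "GENBANK"
    return "UNSPECIFIED"


# Top-level fields copied from the sample report.
report = {
    "accession": "GCF_000001405.40",
    "pairedAccession": "GCA_000001405.29",
    "sourceDatabase": "SOURCE_DATABASE_REFSEQ",
}

print(short_source(report["sourceDatabase"]))            # REFSEQ
print(source_from_accession(report["pairedAccession"]))  # GENBANK
```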
Scalar Value Types
Protocol buffers type | Notes | C++ | Python | Java | Go |
---|---|---|---|---|---|
double | | double | float | double | float64 |
float | | float | float | float | float32 |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | int/long | long | int64 |
uint32 | Uses variable-length encoding. | uint32 | int/long | int | uint32 |
uint64 | Uses variable-length encoding. | uint64 | int/long | long | uint64 |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long | long | int64 |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | long | uint64 |
sfixed32 | Always four bytes. | int32 | int | int | int32 |
sfixed64 | Always eight bytes. | int64 | int/long | long | int64 |
bool | | bool | boolean | boolean | bool |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | String | string |
bytes | May contain any arbitrary sequence of bytes. | string | str | ByteString | []byte |
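
In the sample report, fields typed uint64 in the tables (such as `totalSequenceLength`, `totalUngappedLength`, `totalCount` and `totalSeqLength`) appear as JSON strings, while uint32 fields appear as JSON numbers. This is consistent with the protocol buffers JSON mapping, which serializes 64-bit integers as strings. The sketch below converts a hand-picked set of such fields to Python integers; the field list and the function name `coerce_uint64` are assumptions and would need extending for other uint64 fields.

```python
# uint64 fields from the tables on this page that appear as strings in the sample.
UINT64_FIELDS = {
    "totalSequenceLength",
    "totalUngappedLength",
    "totalSeqLength",
    "totalCount",
}


def coerce_uint64(node):
    """Recursively convert known uint64 string fields to int."""
    if isinstance(node, dict):
        return {
            key: int(value)
            if key in UINT64_FIELDS and isinstance(value, str)
            else coerce_uint64(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [coerce_uint64(item) for item in node]
    return node


# Fragment of the sample report.
raw = {
    "assemblyStats": {
        "contigN50": 57879411,
        "totalSequenceLength": "3099441038",
        "totalUngappedLength": "2948318359",
    },
    "organelleInfo": [{"description": "Mitochondrion", "totalSeqLength": "16569"}],
}

fixed = coerce_uint64(raw)
stats = fixed["assemblyStats"]
gap_bases = stats["totalSequenceLength"] - stats["totalUngappedLength"]
print(gap_bases)  # 151122679
```

Python integers are arbitrary precision, so no value range is lost in the conversion.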