Background
The naked mole-rat (NMR; Heterocephalus glaber) has recently gained considerable attention in the scientific community for its potential to yield novel insights into medicine, biochemistry, and evolution. NMRs exhibit unusual adaptations, including protracted fertility, cancer resistance, eusociality, and tolerance to anoxia. This suite of adaptations is not found in other rodent species, which suggests that interrogating conserved and accelerated regions of the NMR genome will identify loci fundamental to these adaptations. However, the current NMR genome assembly has limitations that make it challenging to study structural variation, heterozygosity, and non-coding adaptations.
Summary of 2022 assembly
We present a complete diploid naked mole-rat genome assembly that integrates long-read and 10X linked-read genome sequencing of a male NMR, linked-read sequencing of its parents, and Hi-C sequencing of NMR hypothalamus (N=2). Reads were binned as maternal, paternal, or ambiguous with TrioCanu. We then polished the haplotype assemblies with Flye, Racon, and Medaka. Assemblies were scaffolded using the following tools: Scaff10X, Salsa2, 3d-DNA, Minimap2 alignment between assemblies, and the Juicebox Assembly Tools. We then subjected the assemblies to another round of polishing, including short-read polishing with Freebayes. We assembled the NMR mitochondrial genome with mitoVGP. Y chromosome contigs were identified by aligning male and female 10X linked reads to the paternal genome and selecting male-biased contigs not present in the maternal genome. These contigs were assembled using publicly available male NMR fibroblast Hi-C data (SRR820318). In both assemblies, the sex chromosome haplotypes were merged to yield a high-quality X and Y chromosome. Finally, the assemblies were evaluated with Quast, BUSCO, and Merqury, all of which indicated high base-level quality and contiguity for both assemblies. The assembly was annotated by Rapid-Ensembl using public RNA-seq data from multiple tissues (SRP061363). The annotated assembly is available at https://rapid.ensembl.org/Heterocephalus_glaber_GCA_944319715.1/.
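The male-biased contig selection described above can be illustrated with a minimal Python sketch. It is not the pipeline used for this assembly: the function name, the thresholds, and the assumption that per-contig depths have already been computed and normalized for library size are all hypothetical, and the check against the maternal genome is not shown.

```python
from typing import Dict, List


def male_biased_contigs(
    male_depth: Dict[str, float],
    female_depth: Dict[str, float],
    min_male_depth: float = 5.0,      # hypothetical threshold
    max_female_ratio: float = 0.1,    # hypothetical threshold
) -> List[str]:
    """Return contigs well covered by male reads but nearly absent from female reads.

    Depths are assumed to be mean per-contig read depths on the paternal
    assembly, normalized so that autosomal depth is comparable between sexes.
    """
    candidates = []
    for contig, m in male_depth.items():
        f = female_depth.get(contig, 0.0)  # no female coverage recorded -> 0
        if m >= min_male_depth and f <= max_female_ratio * m:
            candidates.append(contig)
    return sorted(candidates)


# Toy example: ctgB has male coverage and almost no female coverage.
male = {"ctgA": 30.0, "ctgB": 28.0, "ctgC": 2.0}
female = {"ctgA": 31.0, "ctgB": 1.0}
y_candidates = male_biased_contigs(male, female)
```

In practice the thresholds would need tuning, since pseudoautosomal and repetitive regions can blur the male/female depth contrast.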
Improvement of the 2022 assembly
We identified intra-chromosomal inversions and translocations within the 2022 genome assemblies. We addressed these misassemblies by first splitting the 2022 chromosomes into the draft scaffolds computed by 3d-DNA. We then computed syntenic blocks between this assembly and the American Porcupine (EreDor) genome assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_028451465.1/). Syntenic blocks were computed with a custom script that filters Blastn matches to remove repeats, random matches, and multi-matching hits. Adjacent matches were then grouped based on alignment quality and density. Large draft scaffolds aligning to the same syntenic block were automatically placed adjacent to one another. Small draft scaffolds were placed and oriented manually, followed by manual correction of small misassemblies. Next, we used the FISH karyotype of the naked mole-rat published by Romanenko et al., 2023 (380307020) to address any remaining misassemblies and to place centromeres. Annotations were lifted over from the 2022 Rapid-Ensembl annotation using LiftOff; Ensembl will repeat the annotation in the future. Syntenic analysis and in silico chromosome painting were led by Dr. Jingtao Lilue, PI & Head of the Bioinformatics Department, Oujiang Laboratory, Zhejiang, P.R. China.
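The custom synteny script is not reproduced here. As a rough illustration of the idea of filtering Blastn matches and grouping adjacent ones into blocks, the following Python sketch may help. The `Hit` and `Block` types, the function name, and all thresholds are hypothetical; a real implementation would also handle repeat and multi-match filtering, strand, and subject coordinates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    query: str        # draft scaffold name
    q_start: int
    q_end: int
    subject: str      # reference chromosome or scaffold
    identity: float   # percent identity


@dataclass
class Block:
    query: str
    subject: str
    q_start: int
    q_end: int
    n_hits: int


def group_hits(
    hits: List[Hit],
    min_identity: float = 80.0,   # hypothetical quality filter
    min_length: int = 500,        # hypothetical filter against short random matches
    max_gap: int = 100_000,       # hypothetical density criterion
) -> List[Block]:
    """Filter hits, then chain neighbours on the same subject into blocks."""
    kept = [
        h for h in hits
        if h.identity >= min_identity and (h.q_end - h.q_start) >= min_length
    ]
    kept.sort(key=lambda h: (h.query, h.q_start))

    blocks: List[Block] = []
    for h in kept:
        last = blocks[-1] if blocks else None
        if (
            last is not None
            and last.query == h.query
            and last.subject == h.subject
            and h.q_start - last.q_end <= max_gap
        ):
            # Extend the current block with this nearby hit.
            last.q_end = max(last.q_end, h.q_end)
            last.n_hits += 1
        else:
            blocks.append(Block(h.query, h.subject, h.q_start, h.q_end, 1))
    return blocks


# Toy example: two nearby hits merge; a distant hit starts a new block.
example_hits = [
    Hit("scaf1", 0, 1000, "chr1", 95.0),
    Hit("scaf1", 5000, 7000, "chr1", 90.0),
    Hit("scaf1", 7500, 7600, "chr2", 99.0),      # too short, filtered out
    Hit("scaf1", 300000, 302000, "chr1", 92.0),  # gap exceeds max_gap
    Hit("scaf1", 310000, 311000, "chr3", 70.0),  # identity too low, filtered out
]
example_blocks = group_hits(example_hits)
```

Draft scaffolds whose blocks map to the same reference region could then be ordered next to one another, which is the step the text describes as automatic for large scaffolds and manual for small ones.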
Comment
The maternal haplotype is the primary assembly. It is the maternal haplotype of the trio-binned genome of a male wild-type naked mole-rat. Reads in the initial assembly were a combination of the maternal and unassigned reads.