Open Access: This content is Open Access under the Creative Commons license CC-BY-NC-ND.
NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Mattick J, Amaral P. RNA, the Epicenter of Genetic Information: A new understanding of molecular biology. Abingdon (UK): CRC Press; 2022 Sep 20. doi: 10.1201/9781003109242-17
There are over 170 different modifications of nucleotides in RNA, some important for the function or stability of rRNAs, tRNAs, snRNAs and snoRNAs. Modifications also modulate RNA structure-function relationships to control chromatin organization, stem cell differentiation, development, brain function, stress responses, mRNA half-life and miRNA processing, among others. RNA modifications allow mRNA vaccines to evade the innate immune response. RNA is also ‘edited’ by cytosine and adenosine deamination, to form uracil and inosine, respectively. Adenosine editing has expanded in vertebrate, mammalian, and primate evolution, especially in the brain, and in humans occurs largely in Alu elements, which comprise over 10% of the genome. The APOBEC enzymes that deaminate cytosine are vertebrate-specific, the first involved in somatic rearrangement and hypermutation of immunoglobulin domains, and have expanded under positive selection during mammalian and primate evolution, apparently to regulate retroviral and transposable element (TE) activity. TEs are mobilized in the brain, which has many unusual molecular dynamics associated with its ability to re-wire connections. Trans-generational epigenetic inheritance (‘paramutation’) occurs in both plants and animals, and involves small RNA signaling and DNA methylation. Paramutation is associated with simple tandem sequence repeats (STRs), over 1 million of which are present in the human genome and are enriched in enhancers and promoters. STR variation is associated with psychiatric disorders and cancer, as well as the modulation of physiological and neurological traits, suggesting that soft-wired inheritance of experience not only occurs but has been underestimated.
Gene-environment interactions occur in all organisms, ranging from bacterial transcriptional responses to nutrient availability to epigenetic changes in eukaryotes in response to environmental circumstances, such as that observed in the increased production of erythrocytes and increased hemoglobin expression at high altitudes 1–3 or the advent of type 2 diabetes upon prolonged obesity, 4 although genetic factors are also involved. 2 , 5
Studies of the molecular basis of long-term gene-environment interactions in humans and other eukaryotes have focused mainly on changes in the patterns of DNA methylation 6–8 and, more recently, histone modifications. 9 , 10 However, it is clear that RNA is also modified, over a far wider chemical range than DNA. If DNA methylation and histone modifications are RNA-directed (Chapter 16), as are some RNA modifications (Chapter 8), it is logical that RNA is the major conduit for epigenome-environment interactions 11 and, by extension, that the expansion of RNA modifications and RNA editing in complex organisms underpins phenotypic plasticity, learning, and cognition. 12 It also appears increasingly likely that RNA is the vehicle for transgenerational soft-wired inheritance of experience. 13 , 14
RNA Modifications and the Unknown Epitranscriptome
There are over 25 cell-specific versions of the 5′ cap structure of RNAs and non-canonical initiating nucleotides with functional consequences in RNA processing, export, stability and translation. 15–18 There are also over 140 known chemical modifications of internal ribonucleotides 19–22 (Figure 17.1). These modifications occur on all four standard bases as well as on the ribose, and were detected initially in the highly abundant rRNAs and tRNAs, and later in snoRNAs and snRNAs. For decades RNA modifications were thought to be irreversible decorations important for structural stability and/or catalytic function, including in prebiotic evolution, 23 but this view changed with the discoveries that rRNA modifications are context-specific, 24 that modifications occur in thousands of mRNAs, lncRNAs, enhancer RNAs, vault RNAs, miRNAs and other non-coding RNAs, 25–37 and, especially, that RNA modifications are reversible, 38–40 leading to the birth of the term ‘epitranscriptome’ to describe the collective of regulated RNA modifications. 41
There are technical challenges in identifying RNA modifications in sequencing datasets. 42 Most of what is presently known stems from the study of m6A, and to a lesser extent m1A, m6Am (N6,2′-O-dimethyladenosine), m5C and pseudouridylation modifications, for which there are specific antibodies or reagents available a for immunocapture to identify the positions of modified bases. 25 , 26 , 28–33 , 43–46
m6A modifications in mRNAs occur typically near the stop codon, but also in 5′UTRs, coding sequences, introns and 3′UTRs. 25 , 26 , 32 , 47 m6A modifications have been found to regulate miRNA processing 34 , 48 and mRNA polyadenylation, processing, splicing, stability, translation and export 48–59 by destabilizing RNA duplexes and altering RNA-protein interactions. 60–62 m6A modifications of mRNAs, promoter-associated RNAs, enhancer RNAs and repeat RNAs have been shown to regulate chromatin state, phase separation, the stability of R-loops (and their role in the repair of double-strand breaks) and transcription. 63–69
m6A modifications have been shown to regulate yeast meiosis, 70 the activity of endogenous retroviruses, 71 heterochromatin formation and TE function in embryonic stem cells and early embryos, 72–74 mammalian stem cell renewal, differentiation and development, 44 , 52 , 54 , 75–78 spermatogenesis, oogenesis and fertility, 50 , 56 , 57 , 79 , 80 embryonic development, 81 , 82 adipogenesis, 39 circadian rhythms, 49 , 83 , 84 stress responses, 85–87 hematopoietic differentiation, 88 immune responses 89 and inflammation, 90 , 91 IgH recombination, 92 neurogenesis, 93–95 neural differentiation 96 , 97 and neural circuitry, 98 , 99 spatiotemporal control of mRNA translation in neurons, 100 , 101 cerebellar development, 102 , 103 learning and memory, 104–109 cancer stem cell differentiation, 110 neuronal functions and sex determination in flies 111–113 and even rice and potato yield, 114 in some cases involving interplay with histone modifications. 90
Although most current techniques favor highly expressed RNAs, m6A modifications are increasingly being detected in non-coding regions of pre-mRNAs and lncRNAs in tissues such as placenta, kidney, liver and brain, with evidence of modulation of lncRNA functions and properties, including splicing regulation and possible effects of sequence polymorphisms. 115 Specific examples include m6A modification of MALAT1 as a structural switch affecting its protein binding and phase separation properties, 116 , 117 enhancement of Xist repressive activity, 118 and modulation of the stress-induced repressor of cyclin D1 pncRNA-D to induce cell cycle arrest. 119
Other RNA modifications that have been studied have been shown to similarly affect a wide range of processes including chromatin organization, mRNA stability, tRNA and miRNA processing, and to be involved in neurological and other disorders. 30 , 36 , 46 , 77 , 120–126 Many of these modifications have been documented to affect the function of regulatory RNAs, such as SRA, 127 7SK 128 , 129 and vault RNAs. 28
As with histone modification enzymes, there is a range of RNA modification writers, readers and erasers, the loss or perturbation of which, including in rRNAs and tRNAs, results in a range of diseases, including cancer, intellectual disability and developmental disorders. 19 , 20 , 130–132
The repertoire, substrate range (often from tRNA to mRNA) and deployment of RNA modification enzymes have been expanded by successive gene duplications that have occurred at the base of the eukaryote, metazoan, vertebrate and primate lineages, with 90 cataloged in the human genome. 133
In mammals there are multiple m6A writers (the METTL protein family), readers and erasers 134 , 135 with far-reaching functions. For example, METTL3 regulates heterochromatin in embryonic stem cells 72 and promotes homologous recombination-mediated repair of double-strand breaks by modulating DNA-RNA hybrid accumulation. 67 METTL16, which is essential for mouse embryonic development, 82 regulates the expression of an enzyme that produces the methyl donor, S-adenosyl methionine. 136 The m6A reader Ythdc1 regulates the scaffolding function of LINE1 RNA in mouse ESCs and early embryos. 73 The m6A reader Prrc2a controls oligodendroglial specification and myelination. 97 The ALKBH5 m6A eraser controls translation 40 and the splicing and stability of long 3′UTR mRNAs in male germ cells. 56
There are eight mammalian m5C writers, Nsuns1–7 and Dnmt2. Nsun1, 2, 5 and Dnmt2 are present in all eukaryotes, whereas the other Nsuns are specific to multicellular organisms and are differentially expressed during development, particularly in the brain: Nsuns1–4 participate in embryonic development, cell proliferation and differentiation; disabling mutations in Nsun2 b and Nsun7 cause intellectual disability and male sterility, respectively; Nsun5 is essential for normal growth and cerebellar development; Nsun6 associates with the Golgi apparatus and catalyzes the formation of m5C72 in specific tRNAs. It also methylates mRNAs, particularly in their 3′UTRs, but is apparently dispensable for normal development. 36 , 46 , 75 , 121 , 137–141 Nsun1 binds RNA polymerase II (RNAPII) and Nsun3 and Dnmt2 bind hnRNPK, which interacts with lineage-specific transcription factors, and with CDK9/P-TEFb to recruit RNAPII to active chromatin hubs. 124 The subcellular localizations of enzymes that methylate guanosine are altered by neuronal stimulation 142 (Figure 17.2).
The field is in its infancy: most RNA sequencing protocols involve conversion to DNA, with concomitant loss of modification information, although mismatch patterns and blockage of reverse transcription can provide an indication. 143 A solution is at hand with the advent of direct RNA sequencing using nanopore technology, c and the (altered) signals from some base modifications are now being identified. 42 , 146–150 One can only speculate on the processes controlled by the many other RNA modifications that are yet to be analyzed.
In 2005, Katalin Karikó and colleagues discovered that RNAs containing modified nucleotides [m5C, m6A, m5U, s2U or 1-methylpseudouridine d (m1Ψ, a naturally occurring component of eukaryotic 18S rRNA)] do not activate mammalian innate immune receptors such as Toll-like receptor 7, which detects single-stranded RNAs, 152 potentiating the development of mRNA-based vaccines. 151 Proteins produced from mRNA vaccines stimulate both innate and adaptive immune responses, and the platform is much more flexible and scalable than attenuated viruses. 153 mRNA vaccines can be produced within a day or two of a viral sequence being available and formed the frontline against SARS-CoV-2, and will likely be a platform for the delivery of other vaccines, autoimmune rectification, targeted cancer therapies and even the treatment of heart failure in the future, 154–158 using lipid nanoparticles for delivery. 159 There is also increasing appreciation of the potential for non-coding RNA therapeutics, given the central role of RNA in most biological processes, and growing evidence for the efficacy of non-coding RNA interventions. 160–163
The Expansion of RNA Editing in Cognitive Evolution
An important subset of RNA modifications is base deamination, referred to as RNA ‘editing’. e There are two classes: adenosine deamination to inosine (A>I), which registers as guanosine in translation and RNA sequencing, but has differences that may be important in vivo; and cytosine deamination to form uracil (C>U), or methylcytosine to thymine (meC>T). Alterations in both A>I and C>U editing feature prominently in human cancers. 167–171
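Because inosine is read as guanosine during reverse transcription and sequencing, candidate A>I sites appear as A-to-G mismatches between the genomic reference and the cDNA read, and candidate C>U sites appear as C-to-T mismatches. The following minimal Python sketch illustrates that bookkeeping. The function name, the toy sequences and the assumption of a gap-free, same-strand alignment are illustrative only; real analyses must also exclude genomic polymorphisms and sequencing errors.

```python
def candidate_edits(reference: str, read: str) -> list[tuple[int, str]]:
    """Return (position, label) for mismatches consistent with RNA editing.

    Simplifying assumption: `reference` (genomic sense strand) and `read`
    (cDNA) are already aligned without gaps and have equal length.
    """
    if len(reference) != len(read):
        raise ValueError("sequences must be the same length")
    # Inosine is sequenced as G; uracil appears as T in cDNA.
    labels = {("A", "G"): "A>I", ("C", "T"): "C>U"}
    hits = []
    for pos, pair in enumerate(zip(reference.upper(), read.upper())):
        label = labels.get(pair)
        if label is not None:
            hits.append((pos, label))
    return hits


# Toy example with hypothetical sequences.
print(candidate_edits("GATCCA", "GGTCTA"))
```

Positions are zero-based offsets into the aligned sequences; any other mismatch type is ignored rather than reported.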
A>I Editing
A>I editing was discovered in the late 1980s by Brenda Bass, Hal Weintraub, David Kimelman and Marc Kirschner who showed that synthetic and natural RNA duplexes formed between sense and antisense RNAs expressed from the bFGF locus in Xenopus (toad) eggs, and double-stranded viral RNAs, are substrates for an enzyme that deaminates adenosines. 172–174 A>I editing has since been extensively characterized by Bass, Kazuko Nishikura, Mary O’Connell, Robert Reenan, Charles Samuel, Peter Seeberg, Marie Öhman, Gerhardt Wagner and colleagues.
A>I editing is performed by animal-specific enzymes called ADARs, which have evolved from enzymes that deaminate adenosines in tRNAs. 168 , 175–180 Invertebrates have one or two ADARs, whereas vertebrates have three (ADAR1–3). The expression of the ADARs varies across development and tissues in mammals. ADAR1 is widely expressed throughout the body and is the most highly expressed ADAR outside the central nervous system. 181
Editing appears to occur both co- and post-transcriptionally. 182 , 183 The basic substrate is imperfect double-stranded RNA, especially A:C mismatches, which includes dsRNA regions in pre-mRNAs and lncRNAs, often involving intron-exon base pairing. 184 The substrate specificities of the different ADAR orthologs are not well understood, 175 although evidence suggests that ADAR1 imposes symmetrical editing at positions in dsRNAs 30–35 bp away from structural disruptions 185 and that ADAR2 substrate recognition involves a GCU(A/C)A pentaloop conserved in mammals and birds. 186
A>I RNA editing is widespread in RNAs encoding proteins involved in neurotransmission, including pre-synaptic release machineries and voltage- and ligand-gated ion channels, 168 , 176 , 187 , 188 where it alters codons or splicing patterns f and therefore protein structure-function relationships 172 , 187 , 191–197 to modulate the electrophysiological properties of the synapse and neuronal connections in response to activity, 168 , 191 , 192 , 196 , 198 and to adapt to environmental conditions. 199
The substrate range also includes RNAs encoding proteins involved in brain patterning, neural cell identity, maturation and function, as well as in DNA repair, implying a role for RNA editing not only in neural transmission and network plasticity but also in brain development and memory consolidation. 200 The RNAs encoding ADARs are themselves also edited, 201–203 and RNA editing is regulated by m6A modifications, 204 indicating feedback loops and interplay between RNA editing and modification systems.
A>I RNA editing occurs commonly in introns, where it influences nuclear retention and splicing, including that of ADAR2 itself. 176 , 183 , 205–207 RNA editing also alters miRNA processing, expression and target specificity. 208–211
The regulatory pathways that control A>I RNA editing are not understood. Presumably, editing alters the structure and information content of coding and regulatory RNAs in response to environmental signals and experience, a possibility supported by the observations that ADAR2 has inositol hexakisphosphate (InsP6) complexed in its active site 212 and that InsP6 regulates AMPA/glutamate receptors, 213 indicating that ADAR activity and/or target selection is linked to cell signaling pathways.
Vertebrate ADAR1 and ADAR2 are widely expressed, most highly in brain, where their editing profiles overlap; 181 , 214 both are mainly localized in the nucleus, although a longer isoform of ADAR1 shuttles between the nucleus and the cytoplasm, 179 where it modulates the innate immune response (see below). ADAR1 and ADAR2 form homo- or heterodimers in vivo and dimerization is required for catalysis. 176–178
In addition to the deaminase domain and RNA-binding domains g present in other ADARs, vertebrate ADAR1 also contains one domain (in the constitutively expressed shorter isoform) or two domains (in the inducible longer isoform) that recognize an alternate left-handed helical DNA or RNA structure, termed Z, 216–220 which occurs naturally throughout the genome. h ADAR1 binds to Z-DNA and Z-RNA i in or near repetitive elements, especially Alu elements. 223
The longer isoform of ADAR1 can be induced by interferon j and edits non-coding regions of endogenous dsRNAs at Alu repeats, which are ‘flipped’ into Z-conformation to distinguish self and suppress inappropriate activation by the MDA5 helicase of the innate immune response that occurs in the presence of unmodified viral dsRNAs. 225–228 It is also induced during learning and its knockdown leads to a reduction in Z-DNA at sites where ADAR1 is recruited and an inability to modify previously acquired memory. 229 Translation of the longer isoform of ADAR1 is potentiated by m6A modification of a conserved site in the ADAR1 transcript, 228 with cross-talk between modification systems during host responses to viral infections. 230 Loss of ADAR1 in mice results in embryonic lethality, due to a failure of hematopoiesis. 231 , 232 Mutations in human ADAR1 are one of the defined genetic causes of Aicardi-Goutières syndrome, an autoinflammatory disorder characterized by spontaneous interferon production and neurological problems. 233 , 234 Curiously, however, the loss of the editing capacity of ADAR1 has little developmental effect if the innate immune system is prevented from sensing the unedited dsRNAs. 214 , 235
Vertebrate ADAR2 is required for the editing of neuroreceptor mRNAs, especially that encoding the AMPA receptor subunit GluA2 (Gria2), 236 which has the functional consequence of rendering it Ca++ impermeable. 192 ADAR2 activity is also regulated in part by snoRNAs and nucleolar sequestration. 237 , 238 Mutations in human ADAR2 cause microcephaly and neurodevelopmental disorders. 239 , 240 Its deficiency in mice causes seizures and early lethality, which can be rescued by hard-wiring the single nucleotide change in the Gria2 gene. 241 This observation raises the question of why evolution has not imposed this change in the first place, but rather conserved the surrounding intronic sequences that are required for editing. 242
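Because inosine is decoded as guanosine, a single A>I edit can change the amino acid that a codon specifies. The sketch below uses the widely described glutamine-to-arginine recoding of a CAG codon as its illustrative case; this choice of codon is an assumption for illustration and is not stated in the text above. The function name is hypothetical and the codon table is deliberately partial.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {"CAG": "Gln", "CAA": "Gln", "CGG": "Arg", "CGA": "Arg"}


def decode_with_inosine(codon: str) -> str:
    """Translate a codon, reading inosine (I) as guanosine (G)."""
    return CODON_TABLE[codon.upper().replace("I", "G")]


unedited, edited = "CAG", "CIG"
print(decode_with_inosine(unedited), "->", decode_with_inosine(edited))
```

Hard-wiring the change in the genome, as in the mouse rescue experiment, corresponds to replacing the edited adenosine with a genomically encoded guanosine, so that the codon specifies the edited amino acid without any need for ADAR2.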
While GluA2 receptors in adults are almost universally edited, k such editing may be a mechanism for pruning synaptic trees during the postnatal maturation of neuronal circuitry in response to experience, 244–249 so that only the relevant survive. l ADAR2 expression increases as neurons mature and is spatially regulated within neurons. 198 , 251 Mice that lack ADAR2 but have the genomically encoded compensatory codon change in GluA2 exhibit changes in behavior, hearing ability and the expression profiles of RNAs in the brain, including those encoding proteins involved in synaptic trafficking. 252 There is also an N-terminal extension of ADAR2 that is expressed most highly in the cerebellum (through the alternative splicing of an upstream exon), which harbors a sequence motif closely related to the single-stranded RNA-binding domain of ADAR3, which is also expressed most highly in the cerebellum. 253 Another splice variant of ADAR2 has an insertion of an Alu cassette within the deaminase domain, which alters its catalytic activity. 254
The single ADAR in Drosophila is homologous to vertebrate ADAR2 but also has similarities to ADAR1, in that it mediates suppression of both innate immune responses and brain functions. 255 , 256 Each neuronal population in Drosophila has a different editing signature 188 and Drosophila lacking ADAR are morphologically normal but exhibit extreme behavioral deficits including temperature-sensitive paralysis, locomotor defects, tremors and neurodegeneration. 250 , 256 , 257 Bees exhibit widespread A>I editing during foraging and brood caring task performance. 258 C. elegans has two ADARs, most similar to mammalian ADAR2 and ADAR3, which edit germline and neuronal transcripts (including 3′UTRs), the loss of which results in chemotaxis defects and reduced lifespan. 259–261
ADAR3 is vertebrate- and brain-specific, contains both single- and double-stranded RNA-binding domains, and is thought to be catalytically inactive, m although its deaminase domain appears normal, possibly because it does not dimerize and may act as an inhibitor of ADAR1 and ADAR2. 262–264 Loss of ADAR3 in mice causes no obvious developmental deficiencies, but does affect learning and memory. 265
Interestingly, cephalopods such as squid, cuttlefish and octopus, highly intelligent invertebrates, use A>I RNA editing far more extensively than mammals to modulate the sequence of mRNAs specifying proteins involved in nerve-cell development and signal transmission. 251 , 266–269
Editing of non-coding sequences, on the other hand, has expanded enormously in primates, especially humans. 199 While for many years it was thought that RNA editing is primarily directed at altering protein sequences, analysis of large-scale cDNA sequencing datasets revealed that it occurs in thousands of transcripts, largely in non-coding sequences, 270–274 altering RNA structure, 275 and presumably regulatory circuits and networks, thereby influencing RNA-directed epigenetic memory. These analyses also showed that there is a massive expansion (a 35-fold increase) of A>I editing of human RNAs compared to mouse, mainly in the brain and mostly in Alu sequences. 270–274
Alu elements invaded the genome in three waves during primate evolution and occupy 10.5% of the human genome (~1.2 million largely sequence-unique copies). 277 , 278 The editing of Alu sequences is higher in human transcripts compared to nonhuman primates, and new editable human-specific Alu insertions, subsequent to the human-chimpanzee split, are enriched in genes related to neuronal functions and neurological diseases 276 (Figure 17.3). Virtually all adenosines within double-stranded regions of Alu transcripts undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%), and it has been estimated that there are over 100 million Alu RNA editing sites distributed across most human genes. 279
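The figures quoted above can be cross-checked with simple arithmetic. The sketch below derives the mean genomic footprint of an Alu copy and the implied number of editing sites per copy. The haploid genome size of about 3.1 billion base pairs is an assumption that does not come from the text; the other values are those quoted above, and the results are order-of-magnitude estimates only.

```python
GENOME_BP = 3.1e9       # assumed haploid human genome size (not from the text)
ALU_FRACTION = 0.105    # fraction of the genome occupied by Alu elements
ALU_COPIES = 1.2e6      # approximate number of Alu copies
EDITING_SITES = 100e6   # lower bound on the number of Alu editing sites

mean_alu_length = GENOME_BP * ALU_FRACTION / ALU_COPIES
sites_per_copy = EDITING_SITES / ALU_COPIES

print(f"mean Alu footprint: about {mean_alu_length:.0f} bp")
print(f"editing sites per Alu copy: about {sites_per_copy:.0f}")
```

Under these assumptions each copy occupies roughly 270 bp and carries on the order of 80 editable positions, most of which, as noted above, are edited only at low levels.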
Alu elements (and the related B2 SINE elements in mice) have been linked to a wide variety of functions, including new exons, splice junctions, promoters, nuclear localization, differentiation signals and stress responses, both within longer transcripts and as separate small RNAs 280–288 (Chapters 10, 13 and 16). They also have a self-cleaving property 289 and their increased processing has been linked to neurological disorders. 288 , 290 However, their intense editing suggests a wider role as modular cassettes that permit the superimposition of plasticity on an otherwise hard-wired RNA regulatory system, which has been positively selected for physiological adaptation and cognitive advance. Alu elements are derived from a dimeric fusion of 7SL RNA, which is part of the signal recognition particle involved in targeting proteins for export (Chapter 8), and the conservation of its core structure 277 , 291 , 292 may provide a clue as to its function and success.
C>U Editing
C>U editing is performed by a family of proteins called APOBECs, 293–296 so named because the first to be discovered altered the sequence of Apolipoprotein B (‘ApoB editing complex 1′), n a key constituent of circulating lipid transport vesicles (produced in the liver), which introduces a stop codon to produce a shorter version in the intestine. 297–299
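The way a single C>U change can truncate a protein follows from the genetic code: the codons CAA, CAG and CGA become the stop codons UAA, UAG and UGA when their first cytosine is deaminated. The sketch below enumerates such positions in an mRNA reading frame. The function name and the toy sequence are hypothetical, and the code makes no attempt to model how APOBEC1 selects its targets.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def stop_creating_c_to_u_sites(mrna: str) -> list[tuple[int, str, str]]:
    """List (nucleotide_index, codon, edited_codon) for every position where
    one C>U edit turns a sense codon into a stop codon.

    The reading frame is assumed to start at index 0.
    """
    mrna = mrna.upper().replace("T", "U")
    sites = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon
        for offset, base in enumerate(codon):
            if base != "C":
                continue
            edited = codon[:offset] + "U" + codon[offset + 1:]
            if edited in STOP_CODONS:
                sites.append((start + offset, codon, edited))
    return sites


# Toy open reading frame: AUG CAA GGC CGA UAA
print(stop_creating_c_to_u_sites("AUGCAAGGCCGAUAA"))
```

In the toy sequence the function reports the CAA and CGA codons, and it skips GGC because no single C>U change in that codon yields a stop.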
APOBECs arose at the beginning of the vertebrate radiation, although there is evidence of precursors in invertebrates, 300 and can edit both RNA and single-stranded DNA (C>T), which muddies the functional waters. APOBECs can also edit meC (but not the TET-oxidized bases 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxylcytosine) to form T, 301 an editing event that cannot be distinguished in bisulfite-based methylation assays, and there may be cross-talk between methylation and editing systems in vivo. 302
There are five families of APOBECs: AID and APOBEC2 occur in all vertebrates, APOBEC4 and APOBEC5 in tetrapods, APOBEC1 o in the amniotes (birds and mammals) and reptiles, and APOBEC3 (some members of which have two deaminase domains) in placental mammals. 293–296 , 304 , 305 Expansions of the latter in different lineages correlate with the extent of germline colonization by retroviruses, most notably in primates, which have seven APOBEC3 paralogs (A, B, C, D/E, F, G, H) 295 , 296 , 306 (Figure 17.4), which exhibit some of the strongest signatures of positive selection in the human genome. 307 , 308 Other mammals show intermediate extents of APOBEC3 duplication. 306 , 309 , 310 The APOBECs show tissue-specific expression, in the immune system, muscle, liver and brain, 296 , 311 with APOBEC3G expression primarily in neurons 312 and APOBEC4 in testis. 305
The ancestral gene, AID, arose in the jawless fishes as a central feature of adaptive immunity, and is involved in the somatic rearrangement and hypermutation of immunoglobulin domains in B cells and T cells, processes that are also heavily regulated by lncRNAs (Chapters 13 and 16).
The other APOBECs are generally thought to provide a defense against exogenous retroviruses and the mobilization of retrotransposons, such as endogenous retroviruses and SINE and LINE retroelements, 306 , 313–315 although why the APOBEC3 family expanded under strong selection in primates is unclear. Perhaps the clue is the widespread cooption of transposable elements in neuronal differentiation and function. 316
Recent studies have shown that L1 retroelements are mobilized in neurons in culture to induce somatic mosaicism, a process that is controlled in part by methyl-CpG-binding protein 2 (MeCP2) (Chapter 14). 317–319 L1 elements are also differentially and dynamically methylated and histone modified during stem cell reprogramming and neurodifferentiation. 320 , 321 L1 and Alu retroelements are mobilized in the human brain, 322 which suggests the possibility that the APOBECs, especially the APOBEC3 family, like KRAB zinc finger proteins, 323 might have evolved to domesticate retroelements, and manage their activity p in response to environmental cues, not simply suppress them.
Moreover, DNA demethylation in human neural progenitor cells leads to transcriptional activation and chromatin remodeling of hominoid-specific L1 elements, q while older L1s and other classes of transposable elements remain silent; these activated L1s act as alternative promoters for many protein-coding genes involved in neuronal functions, “revealing a hominoid-specific L1-based transcriptional network controlled by DNA methylation that influences neuronal protein-coding genes”. 329
The Brain
Although a century has passed since Santiago Ramón y Cajal’s meticulous depictions of the architecture and complexity of the central nervous system, 330 and despite many subsequent developments in neurophysiology, we are still nowhere near understanding the molecular basis of high-level brain function. While the general architecture of the brain is hard-wired, 331 its fine connections are selected and evolve in response to experience, as proposed by Gerry Edelman. 332 , 333
The brain is plastic 334 and has the most complex molecular transactions. Except for the testis (itself another world), the coding and non-coding transcriptome is most varied and complex in the brain, as is the extent of RNA splicing, trafficking, modification and editing. The cell-type specificity of lncRNA expression is also most pronounced in brain (Chapter 13) and the genomic sequence variants affecting neuropsychiatric functions, neurodegenerative diseases and some neurodevelopmental disorders 335 primarily lie in non-coding regions (Chapter 11). Many neurodegenerative diseases appear to be linked to dysregulation of lnc/enhancer RNAs and/or be a consequence of aberrant RNA-protein interactions and formation of inclusion bodies associated with expansions of simple repeat sequences. 336–339 As with non-coding genomic regions that show accelerated evolution or are specific to primates or humans 340 , 341 (Chapters 10 and 13), single cell studies confirm the highly regulated expression of non-coding RNAs in specific brain areas relevant to human evolution and neurological diseases. 342–345
There is considerable evidence for the involvement of regulatory RNAs in brain evolution, development and function “by virtue of their abundant sequence innovation in mammals and plausible mechanistic connections to the adaptive processes that occurred recently in the primate and human lineages”, 346 , 347 including the widespread use of lncRNAs to nucleate specialized domains in the neuronal nucleus; 348 the expression of primate-restricted KRAB zinc finger proteins in specific regions of the developing and adult brain, which target cognate TE regulatory elements to control neuronal differentiation; 323 , 326 , 349 , 350 the widespread use of 3′UTRs as regulatory RNAs 351 , 352 to, e.g., maintain axonal integrity; 353 and the massive expansion of RNA editing during vertebrate evolution, especially in primates.
Small RNAs, lncRNAs, TEs and networks thereof are involved in neuronal differentiation, synaptic plasticity, r long-term potentiation, learning and behavior, 346 , 354–366 examples being the role of Malat1 in synapse formation; 367 the response of Gomafu to neuronal activity and its modulation of schizophrenia-associated alternative splicing and response to methamphetamine; 368 , 369 the response of Neat1 to neuronal activity and its role in mediating histone methylation, age-related memory impairment and behavioral responses to stress; 370–372 the role of the lncRNA Gas5 in cocaine action and addiction; 373 the role of the TE-derived neuronal lncRNA BC1 s in anxiety, exploratory behavior 377 and memory; 378 the regulation of impulsive and aggressive behaviors by lncRNA MAALIN; 379 the downregulation of the primate-specific lncRNA LINC00473 in the prefrontal cortex of depressed females but not males, accompanied by female-specific changes in synaptic function; 380 the blockage, upon loss of function of lncRNA Meg3, of the glycine-induced increase of the GluA1 subunit of AMPA receptors on the plasma membrane, a major hallmark of LTP; 381 the roles of the lncRNAs Tsx in hippocampal short-term memory formation 382 and LoNA in long-term memory formation; 383 the regulation of social hierarchy in mice by lncRNA AtLAS; 384 the regulation of locust aggregation by lncRNA PAHAL; 385 and lncRNA dynamics in the behavioral transition from nurses to foragers in honeybees. 386 Moreover, epigenetic processes (which are likely RNA-directed, Chapter 16) are required for synaptic processes, cognition, learning and memory – if epigenetic processes are disrupted, learning is also disrupted. 7 , 10 , 13 , 387–390
Consequently, it appears that the brain adapted processes and regulatory mechanisms that are relatively hard-wired in development, rendering them soft-wired to enable the formation of synaptic networks that are tuned by environmental cues and cell communication to process, store and recall information.
There are several other intriguing aspects of brain molecular biology.
First, most if not all components of the innate and adaptive immune systems have paralogs and orthologs expressed in the brain, most of which also occur in invertebrates (before the appearance of the adaptive immune system in vertebrates), suggesting that the adaptive immune system is a specialized offshoot of cell recognition pathways that first evolved for communication in the nervous system. t
For example, the immunoglobulin (Ig) fold is present in most neuronal adhesion and receptor molecules, including the N-CAMs, myelin-associated glycoproteins, nectins, telencephalin, contactin and neuroglian, as well as in other proteins that are found in the immune system but also occur in brain, including the Toll-like receptors, CD4, Thy-1, the major histocompatibility complex and the complement family. 395–401
Toll-like receptors, thought mainly to activate innate immune responses, also regulate development, neural morphogenesis and neural connectivity, 402 , 403 in part by recognizing neurotrophins to control neuronal survival and death and acting as adhesion molecules to instruct axon and dendrite targeting and synaptic partner matching. 404
Cytokines, some of which are expressed in invertebrate glial cells, 405 are also present in the brain, where they affect neurotransmitter production, leading to changes in motor activity, anxiety, arousal and alarm. They also regulate sleep and a variety of neuroendocrine functions, as well as neuronal development. 406 , 407
The RAG1 and RAG2 proteins that have their origins as transposases and mediate V(D)J recombination in B-cell and T-cell receptors are also expressed in the nervous system 408–411 (Figure 17.5). V(D)J recombination, like programmed genomic rearrangements and DNA repair in other organisms, is RNA directed. 412–414 Lack of RAG1 impairs memory formation 415 and lack of RAG2 u impairs retinal development, axonal growth and navigation. 411 Somatic gene recombination also occurs in human neurons v with similarities to V(D)J recombination but with a different mechanism that involves an RNA intermediate and reverse transcription. 418
Second, while still largely a black box, neurons regulate activity differentially at thousands of synapses, which involves transport along microtubules in both neurons and associated oligodendrocytes w of RNA granules 421 , 422 containing mRNAs and non-coding RNAs (including miRNAs, antisense pseudogene transcripts, and Alu- and other TE-containing RNAs) via various RNA-binding proteins and motor proteins. 354 , 364 , 423–427 There is synapse-specific local translation and, in all likelihood, context-dependent processing, editing and modifications of RNAs in response to activity. 354 , 428–430 A number of lncRNAs have been shown to be regulated by neuronal activity and to regulate synaptic protein localization and translation, as well as synapse density, morphology, dendritic tree complexity, activity, plasticity and stability. 376 , 378 , 384 , 431–434 The neuronally expressed gene Arc, which is essential for synaptic plasticity and memory formation, 435–437 encodes a repurposed retrotransposon-derived protein that mediates intercellular RNA transfer. 438
Synaptic protein synthesis associated with memory formation is also regulated by the RNA interference pathway, 439 including by Mili-bound (26–28nt) piRNAs, thought to be mainly involved in repressing TEs in the formation of germ cells but also highly expressed in the brain. 356 , 440–442 The piRNA pathway is also required for adult neurogenesis in mice 358 and is involved in the regulation of transposon mobilization in Drosophila brain. 443 The loss of Mili results in hypomethylation of LINE1 promoters and behavioral deficits such as hyperactivity and reduced anxiety. 444
Third, neuronal RNA and protein transport between the cell body and synapses is bidirectional, 445 which has been interpreted as a “sushi-train” mechanism to patrol synapses, 446 , 447 but retrograde transport back to the nucleus may also return information, as a mechanism for consolidating long-term memory 200 (see below). Loss of the RNA-binding and transport proteins Staufen and Pumilio leads to the inability to form long-term memory. 448 Staufen also binds Alu sequences. 449
Fourth, there is widespread transcription at neuronal activity-regulated enhancers. 450 Neuronal enhancers are hotspots for DNA single-strand break repair 451 and such hotspots occur at sites involved in neuronal identity, synapse function and neural cell adhesion, which are enriched in RNA-binding proteins. 452 , 453 Brain activity and fear conditioning cause DNA double-strand breaks in neurons 454 , 455 and activity-induced DNA breaks govern the expression of neuronal genes. 456 An lncRNA is required for DNA damage response in neurons, the loss of which causes Purkinje cell degeneration and impairs motor function. 457 The DNA repair-associated protein gadd45γ is required for the consolidation of associative fear memory. 458 DNA repair is focused on transcribed genes and declines with age – possibly associated with reduced learning activity – with deficiencies in DNA repair linked to both developmental and age-associated neurodegenerative diseases. 459
Moreover, there are many unusual ‘DNA repair’ enzymes and DNA polymerases in the brain, x some with reverse transcriptase activity. 200 While these phenomena are usually interpreted in terms of protecting the genomic integrity of post-mitotic neurons, an alternative (and not mutually exclusive) possibility is that RNA-directed changes to DNA (“re-writing to disc”) are involved in long-term memory formation. 200 The human DNA polymerase Polθ, which occurs in the brain, was recently shown to reverse transcribe RNA and promote RNA-templated DNA repair. 461 Polθ also promotes repeat expansions in Huntington’s Disease and other neurodegenerative disorders. 462
RNA-Directed Transgenerational Epigenetic Inheritance
The foundational work on RNA interference in C. elegans showed that small RNA-mediated gene silencing can be inherited for many generations (Chapter 12), 14 , 463–465 most stably when provoked by maternal piRNAs. 466 piRNAs are required to initiate inheritance but not to maintain it, although maintenance is dependent on the nuclear RNAi pathway. 466–469 Similar transgenerational inheritance is observed in plants, where the inheritance is confined to dsRNAs targeting methylation of gene promoters rather than the transcribed sequence. 470 The existence of many imprinted alleles shows that epigenetic information can also be transmitted through meiosis in mammals to control gene expression in the next generation.
Although their evolutionary significance has not been widely discussed, the observations in C. elegans refute the long-standing assumption that the soma cannot communicate with the germline, since the inherited dsRNA triggers can be delivered by injection into somatic cells or ingestion of engineered bacteria producing dsRNAs, opening a new frontier for understanding the nature of both hard- and soft-wired inheritance.
Unusual non-Mendelian patterns of inheritance were, in fact, first observed in peas by Bateson in 1915, 471 and occasionally thereafter in other plants, but were not studied in a systematic way until the work of Robert Brink, who coined the term ‘paramutation’ in the 1950s to describe the atypical inheritance of traits displayed by particular alleles in maize (where it is best studied), 472–474 tomato and other species 475–477 (Chapter 5).
Paramutation is RNA-directed transgenerational inheritance. 478 It may be summarized as the transfer of epigenetic information from one allele of a gene to another to induce metastable silencing, which is heritable for generations but incompletely penetrant and reversible. 474 Importantly, paramutation is a somatic phenomenon, which is transmitted to the germline. 474 Mechanistically, paramutation is transacted by small RNAs that direct RNA-processing and/or chromatin-modifying proteins to RNA transcripts or DNA in a sequence-specific manner, with self-reinforcing feedback loops that involve differential methylation and the biogenesis of particular classes of sRNAs, including piRNAs, mediated by Argonaute proteins. 477 , 479 , 480 Moreover, mutations in Mop1 (‘Mediator of paramutation1’), a component of the RNA-directed DNA methylation pathway responsible for methylation of TEs adjacent to transcriptionally active genes, alter the distribution and frequency of meiotic recombination in maize, 481 indicating a link between transposons and plasticity of inheritance.
While initially thought to be confined to plants, paramutation also occurs in animals 477 , 482 , 483 (Figure 17.6). In C. elegans, where animal RNA interference was discovered, the phenotypic consequences of ectopically introduced small dsRNAs can persist for many generations, as with plant paramutation, without any change to the underlying sequence. 466 , 484 Paramutation has also been described in Drosophila and mouse, the latter affecting a wide spectrum of characteristics such as pigmentation, cardiac hypertrophy, embryo development and axonal growth (the latter via a lncRNA), again mediated through the RNA interference pathway and small interfering RNAs. 477 , 482 , 483 , 485–487 The inheritance of paramutations, at least in mouse, requires the RNA methyltransferase Dnmt2. 488
Paramutation is associated with, and its strength is often dependent upon the length of, simple (usually dinucleotide or trinucleotide) tandem sequence repeats (STRs) y in the locus, 477 an interesting observation in view of the vast number of STR sequences that occur in animal and plant genomes.
The human genome contains over one million highly polymorphic and plastic STRs, whose mitotic (somatic) and meiotic (germline) expansion/contraction rates can be orders of magnitude higher than single nucleotide mutations, and impart a continuum of effects. 490–493 Variation in STRs is associated with neurological and psychiatric disorders z such as autism, and cancer, as well as the modulation of physiological and neurological traits, including circadian rhythms, sociosexual interactions, intelligence, hormone sensitivity, cognition, personality, addiction, neuronal differentiation, brain development and behavioral evolution. 492–499 An STR has also been shown to regulate the transcription of the hTERT (human telomerase reverse transcriptase) gene in a cell-context–dependent manner, the absence of which results in telomere shortening, cellular senescence and impaired tumor growth. 500
STRs are enriched in promoters and enhancers, associated with differential DNA methylation, and account for 10%–15% of the genetic variation observed in complex traits, making them substantial contributors to the missing heritability in genome-wide association studies that only poll haplotype blocks. 493 , 496 , 501–503 This is exemplified by the paramutation-like behavior imposed by untransmitted paternal alleles of the human insulin gene that have expanded numbers of repeats associated with a reduced incidence of type 1 diabetes. 504 Interestingly, a positively selected brain-specific lncRNA contains a tandem repeat that varies among individuals and affects its stability. 505 A large and highly polymorphic human-specific 30bp tandem repeat located within an intron of a gene encoding a calcium channel has been shown to act as an enhancer and to be associated with bipolar disorder and schizophrenia. 495
The mutation rate in STRs is controlled, in part, by epigenetic processes, and the length of STRs, including in humans, can be modulated by environmental parameters, notably (as the main one studied) stress. 476 , 477 , 506 , 507 As summarized by Jay Hollick: “The ability of heritable epigenetic regulatory information on one homologue to be copied to the other represents a potential adaptation of the diploid condition to rapidly disseminate a memory of environmental responses to future generations”. 477
The extent of STR modulation of gene expression may be underestimated and underappreciated. It is usually only reported in model organisms where structured pedigrees and genotypes can be constructed and monitored. A genome-wide analysis in tomato identified “thousands of candidate regions for paramutation-like behaviour... the methylation patterns for a subset of (which) segregate with non-Mendelian ratios, consistent with secondary paramutation-like interactions to variable extents depending on the locus”. 508 It would be surprising if this were not a general phenomenon, with enormous implications for understanding plant and animal biology and evolvability (Chapter 18).
While most studies of paramutation have involved focused analyses of physical traits, there have been a number of reports that transgenerational epigenetic inheritance (in animals as diverse as worms, planarians and mammals) extends to metabolic, endocrine, immunological and cognitive experience, including trauma and stress, as well as learned behaviors. 13 , 509–518 There is both male and female transmission. 14 , 519–522 Various studies have implicated lncRNAs, miRNAs, CTCF recruitment to an Fto aa enhancer, peroxisome proliferator-activated receptor (PPAR) pathways, cysteine synthases and tRNA fragments, and modifications thereof, as being involved. 478 , 516 , 519 , 525–534
One of these RNAs, the vault RNA VTRNA2-1, was identified in genome-wide screens “as a top environmentally responsive epiallele”, 535 which has been associated with effects on oocytes of preconceptual alcohol consumption 522 and elsewhere with cancer etiology and outcomes. 536 , 537 The progeny of mice that survived a sublethal systemic infection with Candida albicans or an endotoxin dose exhibited cellular, developmental, transcriptional and epigenetic changes in the myeloid progenitor cell compartment, with enhanced responsiveness to endotoxin challenge and improved protection against systemic heterologous bacterial infections. 538 The sperm DNA of parental male mice infected with C. albicans showed DNA methylation differences linked to immune gene loci. 538 piRNAs derived from ancient processed viral pseudogenes transmit transgenerational sequence-specific immune memory in rodents and primates. 539 There are also innate fears and phobias that clearly have appreciable genetic components. 540 , 541
It has been shown that specific tRNA fragments (Chapter 12) in mice are transferred from the epididymis to maturing sperm through small vesicles called epididymosomes. These tRNA fragments are then conveyed by the sperm to the fertilized egg, where they influence the expression of specific genes associated with the retroelement MERVL during embryonic development. In mouse, changes to the paternal diet modulate the tRNA fragment composition of epididymosomes, with a consequent alteration of specific metabolic pathways in the offspring. 526 , 527
The mechanisms by which experience is conveyed intergenerationally are ill-defined, and the field will remain controversial until this is understood. Not only is it difficult to separate genetic and epigenetic effects from cultural and environmental influences, 479 , 542–545 it is also difficult to conceive a mechanism to transmit complex traits, even considering the possibility of RNA signaling between the soma and the germline. Relevant perhaps is that, like the brain, the testis is an immunologically privileged tissue, with a tight barrier between blood vessels and the Sertoli cells in the seminiferous tubule, which isolates the later stages of sperm development. 546 , 547
Whatever the details, it is nonetheless already clear that there are two forms of inheritance – gene alleles and epialleles – hard-wired DNA sequence information and RNA-directed epigenetic information that is responsive to and influenced by environmental factors, directly contradicting the fixed inheritance of variation assumed by the Modern Synthesis.
Further Reading
- Conine C.C. and Rando O.J. (2021) Soma-to-germline RNA communication. Nature Reviews Genetics 23: 73–88. [PubMed: 34545247]
- Hannan A.J. (2018) Tandem repeats mediating genetic plasticity in health and disease. Nature Reviews Genetics 19: 286–298. [PubMed: 29398703]
- Hollick J.B. (2017) Paramutation and related phenomena in diverse species. Nature Reviews Genetics 18: 5–23. [PubMed: 27748375]
- Kim S. and Kaang B.-K. (2017) Epigenetic regulation and chromatin remodeling in learning and memory. Experimental & Molecular Medicine 49: e281. [PMC free article: PMC5291841] [PubMed: 28082740]
- Levenson J.M. and Sweatt J.D. (2005) Epigenetic mechanisms in memory formation. Nature Reviews Neuroscience 6: 108–118. [PubMed: 15654323]
- Nishikura K. (2015) A-to-I editing of coding and non-coding RNAs by ADARs. Nature Reviews Molecular Cell Biology 17: 83–96. [PMC free article: PMC4824625] [PubMed: 26648264]
Footnotes
- a
In the case of m5C, RNA bisulfite sequencing. 27
- b
Loss of the Nsun2 ortholog in Drosophila causes short-term memory deficits.
- c
- d
Replacement of uridine with m1Ψ in synthetic mRNA reduces its immunogenicity and improves its translational capacity and stability. 151
- e
The term RNA editing was coined by Rob Benne and colleagues in 1986 to describe the small RNA-guided site-specific insertion or deletion of uridines in mRNAs in mitochondria of Trypanosomes and related protists. 164–166
- f
- g
There are three dsRNA-binding domains in ADAR1, and two each in ADAR2 and ADAR3. 215
- h
There are also other DNA structures that differ from the canonical B-form right-handed double helix described by Watson and Crick, such as G quadruplexes and A-form helices, which also occur in double-stranded RNAs and in DNA-RNA hybrids. These were discovered in the decade from 1979 to 1989, 221 and presumably have functional significance, but their distribution in genomes, and how they might vary during differentiation and development, is as yet only poorly characterized, a blind spot in the ENCODE projects.
- i
Z-DNA is commonly found in introns, and RNA editing commonly involves dsRNAs formed between exons and introns in pre-mRNAs, adding to the selective pressure on their sequences. 222
- j
The longer isoform may be constitutively expressed and even be the dominant isoform in some tissues. 224
- k
GluA2 receptors are under-edited in malignant brain tumors. 243
- l
Evidence from knockout of the Drosophila ortholog of ADAR2 also suggests that “edited isoforms of CNS proteins are required for optimum synaptic response capabilities in the brain during the behaviorally complex adult life stage”. 250
- m
- n
- o
APOBEC1 deficiency has transgenerational epigenetic effects on testicular germ cell tumor susceptibility and embryonic viability. 303
- p
While avoiding neurodevelopmental or neuroinflammatory disorders due to inappropriate L1 expression of neurotoxic retroviral sequences and proteins. 324–327
- q
L1 expression is differentially regulated in pluripotent stem cells of humans and other great apes. 328
- r
Which “may drive species-specific changes in cognition”. 354
- s
- t
- u
RAG2 activity is regulated by a PHD domain that binds histone H3K4me3 modifications, indicating an interplay between epigenetic information and DNA recombination. 416
- v
Widespread mosaic somatic gene recombination in neurons was first detected in the Alzheimer’s disease-related gene APP, which encodes amyloid precursor protein. 417
- w
- x
- y
STRs are also referred to as ‘variable number tandem repeats’ (VNTRs) or ‘microsatellites’, variation in which was exploited for DNA fingerprinting by Alec Jeffreys in the mid-1980s. 489
- z
Reasonably proposed to be ‘the pathological ends of phenotypic bell curves in which healthy individuals occupy the middle territory’. 493
- aa
Fto encodes an N6-methyladenosine demethylase that is associated with obesity 523 and is required for adipogenesis. 39 It also regulates neurogenesis, neural circuitry, memory formation and locomotor responses. 94 , 98 , 105 , 524 Recruitment of CTCF to an Fto enhancer is involved in transgenerational inheritance of obesity. 516