John L. Spouge, PhD
Senior Investigator
Statistical Computational Biology
Computational Biology Branch
NCBI, NLM, NIH
Projects
Human Viruses
PA 1. Bioinformatics Studies of Viruses
NIH Identification Code: LM091213-10
Status: Active
Keywords: HIV, virology, sequence statistics
Participants
- John L Spouge MD, PhD
  CBB, National Center for Biotechnology Information
- Antony DeVico PhD
  Maryland Biotechnical Center, Institute of Human Virology, University of Maryland
- George Lewis PhD
  Maryland Biotechnical Center, Institute of Human Virology, University of Maryland
- Scott P. Layne MD
  University of California, Los Angeles
Aim
Our overall aim is to apply mathematical, statistical, computational, and bioinformatics methods to solve virological problems, particularly those posed by our collaborators. Our virological interests center on HIV and, more recently, SARS-CoV-2.
In HIV, early viral reproduction within a single host influences whether infection occurs. We recognized that the early history of a viral population might be reconstructed from HIV sequences sampled during viremia. If such a reconstruction succeeded, then in evaluating antiviral therapies, a slowing of early viral growth would suggest progress, even if infection still occurred.
Our present interest in SARS-CoV-2 is to calculate the parameters controlling the early exponential growth of the epidemic. Typically, once societal interventions such as social distancing have relaxed back to pre-epidemic levels, the parameters of the early epidemic also determine end results such as the total number of infections during the epidemic. Unfortunately, SARS-CoV-2 variants appear to have broken this pattern.
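To make concrete how early-growth parameters can determine an end result, the following Python sketch solves the classical final-size relation z = 1 - exp(-R0 z) of a simple SIR epidemic, where z is the fraction of the population eventually infected. It assumes homogeneous mixing, a fully susceptible population, and no interventions, simplifications that real epidemics (SARS-CoV-2 with its variants included) need not satisfy; it illustrates the idea and is not the model used in this project.

```python
import math


def final_size(r0: float, tol: float = 1e-12) -> float:
    """Fraction z of a population eventually infected in a simple SIR epidemic.

    Solves the classical final-size relation z = 1 - exp(-r0 * z) by bisection.
    Assumes homogeneous mixing, no interventions, and a fully susceptible start.
    """
    if r0 <= 1.0:
        return 0.0  # only the trivial root z = 0 exists: no major epidemic
    f = lambda z: z - 1.0 + math.exp(-r0 * z)
    lo, hi = 1e-12, 1.0  # f(lo) < 0 and f(hi) > 0 whenever r0 > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for r0 in (1.5, 2.0, 3.0):
    print(f"R0 = {r0}: final size = {final_size(r0):.3f}")
```

Bisection is used rather than the fixed-point iteration z ← 1 - exp(-R0 z) because it converges reliably even when R0 is close to 1.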
Summary
We developed a statistical method that summarizes the mutational patterns in HIV sequences from early infection, and we showed that the summary agreed with simulations of an HIV population during early infection in a single host. We have also shown that, with useful accuracy under feasible experimental designs, we can reconstruct features of the early population growth of an HIV infection in a single host.
With an NCI collaborator, Dr. Ziegelbauer, we previously reported a statistical method for inferring whether herpesvirus miRNA motifs are overrepresented on human circular RNAs. We have since refined the method's algorithm and generalized it to other biological situations where one wants to infer that one of the terms of a sum is "too large", e.g., whether a human circular RNA has "too many" instances of a herpesvirus miRNA motif relative to other circular RNAs, or whether a particular column of a domain alignment from cancer patients contains "too many" mutations.
We have automated the calculation of the Malthusian parameter, the rate of the initial exponential growth, of the COVID-19 epidemic in many countries. A basic formula from Wallinga & Lipsitch relates the basic reproduction number R0 of COVID-19 to the Malthusian parameter through the distribution of the generation interval.
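Entry A 107 in the bibliography concerns the expected site frequency spectrum of a Galton-Watson process under an infinite-sites mutation model, which is one way to summarize mutational patterns in early infection. The Python sketch below simulates that setting and tabulates the spectrum of a sample; it does not implement the approximation of A 107. The Poisson offspring and mutation distributions, the parameter values, and the function names are assumptions for illustration.

```python
import math
import random
from collections import Counter


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_sample(r0, mu, generations, sample_size, seed=0):
    """Galton-Watson process from one founder; returns a random sample of genomes.

    Hypothetical assumptions: each virion has Poisson(r0) offspring, and each
    offspring gains Poisson(mu) new mutations. Under the infinite-sites model
    every new mutation gets a fresh integer id, so no site mutates twice.
    """
    rng = random.Random(seed)
    population = [()]  # a genome is the tuple of mutation ids it carries
    next_id = 0
    for _ in range(generations):
        children = []
        for genome in population:
            for _ in range(poisson(rng, r0)):
                n_new = poisson(rng, mu)
                children.append(genome + tuple(range(next_id, next_id + n_new)))
                next_id += n_new
        population = children
        if not population:  # the lineage went extinct
            break
    return rng.sample(population, min(sample_size, len(population)))


def site_frequency_spectrum(genomes):
    """xi[k] = number of mutations carried by exactly k of the n sampled genomes."""
    carriers = Counter(m for genome in genomes for m in set(genome))
    xi = [0] * (len(genomes) + 1)
    for count in carriers.values():
        xi[count] += 1
    return xi


sample = simulate_sample(r0=2.0, mu=0.5, generations=8, sample_size=20, seed=1)
print(len(sample), site_frequency_spectrum(sample))
```

Averaging `site_frequency_spectrum` over many seeds gives a Monte Carlo estimate of the expected spectrum, the kind of simulation result against which an analytic summary can be checked.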
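The jackknife (leave-one-out) products of entry A 113 can be illustrated with the standard prefix/suffix technique, which uses only associativity, needs no inverses, and calls the operator O(n) times. This Python sketch is not necessarily the algorithm of A 113, and `jackknife_products` and `convolve` are hypothetical names. Assuming the terms of the sum are independent counts with known distributions, using `convolve` as the operator makes each leave-one-out product the distribution of the sum of all the other terms, the ingredient needed to judge one term against the rest.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def jackknife_products(items: Sequence[T], op: Callable[[T, T], T]) -> List[Optional[T]]:
    """out[i] = product under `op` of every item except items[i].

    Only associativity is used (commutativity makes the result order-free);
    no inverses are needed, and op is called about 3n times in total.
    out[i] is None when no other item exists (n == 1).
    """
    n = len(items)
    prefix: List[Optional[T]] = [None] * (n + 1)  # prefix[i] combines items[:i]
    suffix: List[Optional[T]] = [None] * (n + 1)  # suffix[i] combines items[i:]
    for i in range(n):
        prefix[i + 1] = items[i] if prefix[i] is None else op(prefix[i], items[i])
    for i in range(n - 1, -1, -1):
        suffix[i] = items[i] if suffix[i + 1] is None else op(items[i], suffix[i + 1])
    out: List[Optional[T]] = []
    for i in range(n):
        left, right = prefix[i], suffix[i + 1]
        if left is None or right is None:
            out.append(right if left is None else left)
        else:
            out.append(op(left, right))
    return out


def convolve(p: List[float], q: List[float]) -> List[float]:
    """pmf of X + Y for independent X, Y on {0, 1, 2, ...} with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


print(jackknife_products([1, 2, 3, 4], lambda a, b: a + b))  # [9, 8, 7, 6]
# Hypothetical per-term count distributions; out[i] is the pmf of the sum of the others.
pmfs = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
print(jackknife_products(pmfs, convolve))
```

Avoiding inverses matters here: convolution of probability distributions has no general inverse, so "divide the full product by term i" is not available.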
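A minimal Python sketch of the two steps: estimating the Malthusian parameter r by least squares on log counts, then converting r to R0 with the Wallinga & Lipsitch relation R0 = 1/M(-r), where M is the moment generating function of the generation-interval distribution. The gamma-distributed generation interval, its mean and standard deviation, and the function names are assumptions for illustration. The sketch leaves the choice of the exponential-phase window to the caller, whereas A 114 uses regime regression to estimate the end of the exponential phase automatically.

```python
import math
from typing import Sequence


def malthusian_parameter(days: Sequence[float], counts: Sequence[float]) -> float:
    """Least-squares slope r of log(count) against day.

    Assumes every count is positive and that the caller has already restricted
    the data to the exponential phase (A 114 estimates that endpoint with
    regime regression; this sketch does not).
    """
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return sxy / sxx


def r0_from_growth_rate(r: float, gi_mean: float, gi_sd: float) -> float:
    """Wallinga & Lipsitch: R0 = 1 / M(-r), M = moment generating function of the
    generation interval. Assuming a gamma interval with shape k and scale theta,
    M(-r) = (1 + r * theta) ** (-k), so R0 = (1 + r * theta) ** k."""
    shape = (gi_mean / gi_sd) ** 2
    scale = gi_sd ** 2 / gi_mean
    return (1.0 + r * scale) ** shape


# Synthetic, noise-free counts growing at 0.2 per day (illustrative only).
days = list(range(15))
counts = [10.0 * math.exp(0.2 * t) for t in days]
r = malthusian_parameter(days, counts)
print(round(r, 6), round(r0_from_growth_rate(r, gi_mean=5.0, gi_sd=2.5), 4))
```

With a mean of 5 days and a standard deviation of 2.5 days, the gamma shape is 4 and the scale is 1.25, so r = 0.2 per day gives R0 = (1 + 0.2 × 1.25)^4 = 1.25^4 ≈ 2.44. These numbers are illustrative, not COVID-19 estimates; an exponentially distributed generation interval (shape 1) reduces the formula to R0 = 1 + r × mean.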
Bibliography
A 114. J.L. Spouge "A comprehensive estimation of country-level basic reproduction numbers R0 for COVID-19: Regime regression can automatically estimate the end of the exponential phase in epidemic data" (2021) PLoS ONE
A 113. J.L. Spouge, J.M. Ziegelbauer, and M.W. Gonzalez "A linear-time algorithm that avoids inverses and computes jackknife (leave-one-out) products like convolutions or other operators in commutative semigroups" (2020) Algorithms for Molecular Biology
A 110. V. Patel and J.L. Spouge "Estimating the basic reproduction number of a pathogen in a single host when only a single founder successfully infects" (2020) PLoS ONE 15(1):e0227127
A 108. A. Manzourolajdad and J.L. Spouge "Structural Prediction of RNA Switches using Conditional Base-Pair Probabilities" (2019) PLoS ONE 14(6):e0217625
A 107. J.L. Spouge "An accurate approximation for the expected site frequency spectrum in a Galton-Watson process under an infinite sites mutation model" (2019) Theoretical Population Biology 127:7-15
A 102. M. Mengistu, A.H. Tang, J.S. Foulkes, T.A. Blanpied, M.W. Gonzalez, J.L. Spouge, R.C. Gallo, G.K. Lewis, and A.L. DeVico "Patterns of conserved gp120 epitope presentation on attached HIV-1 virions" (2017) PNAS
A 101. M.W. Gonzalez, A.L. DeVico, and J.L. Spouge "Conserved signatures indicate HIV-1 transmission is under strong selection and thus is not a 'stochastic' process" (2017) Retrovirology 24:13
A 98. A. Manzourolajdad, M.W. Gonzalez, and J.L. Spouge "Changes in the Plasticity of HIV-1 Nef RNA during the Evolution of the North American Epidemic" (2016) PLoS ONE
A 94. M.W. Gonzalez, A.L. DeVico, G.K. Lewis, and J.L. Spouge "Conserved molecular signatures in HIV gp120 are associated with the preferential transmission of HIV-1, SHIV, and SIV transmitted/founder viruses" (2015) J Virology 89:3619-3629, PMC4403421
A 88. M.W. Gonzalez and J.L. Spouge "Domain Analysis of Symbionts and Hosts (DASH) in a genome-wide survey of pathogenic human viruses" (2013) BMC Research Notes 6:209, PMC3672079
Taxonomy
PA 2. Taxonomic Methods Using DNA
NIH Identification Code: LM200883-15
Status: Active
Keywords: DNA barcode, sequence statistics
Participants
- John L Spouge MD, PhD
  CBB, National Center for Biotechnology Information
- María Paz Martín PhD
  Real Jardín Botánico de Madrid
- David Erickson PhD
  Joint Institute for Food Safety and Applied Nutrition, University of Maryland
- International Society for Human and Animal Mycology Barcoding of Medical Fungi Working Group
Aim
Our overall aim is to improve automated taxonomic identification with DNA sequences. Taxonomic identification has many potential practical uses, e.g., identifying medical pathogens, controlling poaching of wild animals, detecting consumer fraud by species substitution, enforcing customs regulations, and evaluating compliance with international treaties whose goals are set in terms of rates of species disappearance. DNA barcoding identifies the species of an organism by sequencing short, standardized genetic loci of less than 1 kb. As in most fields of bioinformatics, progress depends on effective and objective methods for evaluating it, so we first improved figures of merit for evaluating the efficacy of taxonomic identification and then developed statistics for comparing the accuracy of different methods of taxonomic classification. The project has since expanded to include alignment-free methods of taxonomic analysis.
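As a minimal sketch of one possible figure of merit, the following Python code computes a leave-one-out nearest-neighbor identification rate: the fraction of sequences whose closest other sequence carries the same species label. To echo the project's alignment-free direction, the distance is 1 minus the cosine similarity of k-mer count vectors. This distance, k = 3, the tie rule, the toy sequences, and the function names are assumptions for illustration, not the figures of merit or statistics developed in the cited papers.

```python
from collections import Counter
from math import sqrt
from typing import List, Tuple


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Counts of overlapping k-mers (words of length k) in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two k-mer count vectors; no alignment is needed."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def loo_identification_rate(records: List[Tuple[str, str]], k: int = 3) -> float:
    """records holds (species label, sequence) pairs; needs at least two records.

    A record counts as correctly identified when every nearest other record
    (ties included) carries its species label."""
    profiles = [kmer_profile(seq, k) for _, seq in records]
    hits = 0
    for i, (species, _) in enumerate(records):
        dists = [(kmer_distance(profiles[i], profiles[j]), records[j][0])
                 for j in range(len(records)) if j != i]
        best = min(d for d, _ in dists)
        nearest_labels = {label for d, label in dists if d == best}
        hits += nearest_labels == {species}
    return hits / len(records)


# Toy, hypothetical records: two "species" with clearly different k-mer content.
records = [("A", "ACGTACGTACGT"), ("A", "ACGTACGTACGA"),
           ("B", "TTTTGGGGCCCC"), ("B", "TTTTGGGGCCCA")]
print(loo_identification_rate(records))
```

Counting ties as failures unless all tied neighbors agree is a conservative choice: a barcode that cannot separate two species should not be credited with identifying either.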
Summary
The project started when Dr. Spouge was invited to analyze data for the 2009 Edinburgh Conference "Selecting Barcode Loci for Plants", a meeting of the Plant Working Group within the Consortium for the Barcode of Life. Later, Dr. Spouge performed a similar data analysis for the Fungal Barcode Working Group in Amsterdam in 2011.
In collaboration with Dr. Martín, Dr. Spouge has used DNA barcodes to analyze fungal taxonomy in Southeast Asia and Spain. Implementations of his analyses are publicly available in user-friendly programs, updated so that barcode researchers can replicate Dr. Spouge's analyses on their own data, and Dr. Martín has actively tested the web tools. The collaboration is also producing statistical methods for the objective comparison of methods of taxonomic classification, notably GenBank annotation and expert classification. Dr. Martín is applying these methods and tools to the fungal genus Ramaria.
Bibliography
A 112. M.P. Martin, P.P. Daniels, D. Erickson, and J.L. Spouge "Figures of merit and statistics for detecting faulty species identification with DNA barcodes: A case study in Ramaria and related fungal genera" (2020) PLoS ONE 15(1):e0227127
A 105. K. Tang, J. Ren, R. Cronn, D.L. Erickson, B.G. Milligan, M. Parker-Forney, J.L. Spouge, and F. Sun "Alignment-free genome comparison enables accurate geographic sourcing of white oak DNA" (2018) BMC Genomics
A 89. N. Suwannasai, M.P. Martín, C. Phosri, P. Sihanonth, A.J.S. Whalley, and J.L. Spouge "Fungi in Thailand: A Case Study of the Efficacy of an ITS Barcode for Automatically Identifying Species within the Annulohypoxylon and Hypoxylon Genera" (2013) PLoS ONE 8:e54529, PMC3563529
A 87. CBOL Protist Working Group [Member: J.L. Spouge] "Barcoding eukaryotic richness beyond the animal, plant and fungal kingdoms" (2012) PLoS Biol 10:e1001419, PMC3491025
A 85. CBOL Fungal Working Group [Member: J.L. Spouge] "The nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi" (2012) PNAS 109:6241-6246, PMC3341068
A 83. J.L. Spouge and L. Mariño-Ramírez "The practical evaluation of DNA barcode efficacy" (2012) Methods Mol Biol 858:365-377, PMC3410705
A 80. CBOL Plant Working Group [Member: J.L. Spouge] "A DNA barcode for land plants" (2009) Proc Natl Acad Sci USA 106:12794-12797, PMC2722355
A 77. D.L. Erickson, J.L. Spouge, A. Resch, L.A. Weight, and J.W. Kress "DNA barcoding in land plants: developing standards to quantify and maximize success" (2008) Taxon 57:1304-1316