Yi-Kuo Yu
Senior Investigator
National Center for Biotechnology Information (NCBI)
National Library of Medicine (NLM)
National Institutes of Health (NIH)
Bldg. 38A, Room 6S610
9000 Rockville Pike, MSC 3829
Bethesda, MD 20894, USA
Tel: (301) 435-5989
Fax: (301) 480-2290, (301) 480-2288
e-mail: yyu <at> ncbi.nlm.nih.gov
Principal Research Interests
Our group investigates various biological problems at multiple levels of
detail in order to gain quantitative understanding in biology. At the
microscopic level, we aim to build a solid foundation for quantitative
understanding of biomolecular interactions. At the more coarse-grained
level, we develop and employ computational approaches with sound statistical
foundations to enhance the separation of information from noise in massive
biological data sets, thereby paving the way for discovering, and placing
constraints on, higher organizational principles in biology. A major
goal of our group is to foster a solid connection between medical research
and fundamental scientific research.
Molecular Interactions (MI):
Our studies at the microscopic level have concentrated on the most
important component in biomolecular interactions, i.e., electrostatics.
These studies aim to provide an accurate description of electrostatic
interactions among biomolecules. This effort has resulted in a new
electrostatics formulation involving complicated dielectric media. This
new formulation permits, for the first time, a controllable approximation
for the calculation of electrostatic energy and forces [1-3]. Consequently,
one can easily estimate the magnitudes of errors for the quantities
computed and one may improve the accuracy as much as one wishes by
incorporating more prescribed correction terms in the computation. We are
also investigating the quantum mechanical effects that govern molecular
binding and interactions.
Molecular/Information Networks (MN):
The advent of the genomic era has enabled rapid accumulation of
information including DNA/protein sequence data, protein/RNA structural
data, and biomolecular interaction data. These valuable, and often
redundant, data allow researchers to mine relevant information at various
organizational levels ranging from determining active sites in protein
domains to uncovering relations among functional pathways and even whole
cell organization. However, different combinations of these data can
also be the common basis of two conflicting claims. To avoid errors
introduced through additional annotations, we have developed a method,
called information flow, to detect the information transduction modules
responsible for propagating information from one node in the network to
another. When applied to the protein-protein interaction network, this
method illuminates nodes involved in the relevant biological pathways
connecting the two specified nodes [4]. This framework is also
applicable to information filtering in any community network such as
recommendation systems [5-7]. We are currently developing other
methods for extracting important information from a generic
interaction network.
Mass Spectrometry, Statistics, and Proteomics (MS):
At a more macroscopic level, we are interested in several topics where
robust statistical analyses have proven valuable. In the realm of
sequence alignment, we have worked on improving both statistical accuracy
and retrieval efficiency [8-16]. In bioinformatics and proteomics, we have
invested substantial effort in developing tools with robust statistical
foundations to extract biologically relevant information. For example, we have developed
computational tools for peptide/protein identification from tandem mass
spectrometry (MS/MS) data [17-20] and methods to improve statistical
significance assignment [cite] in this area. We have also integrated
existing knowledge such as protein modifications and their accompanying
disease associations with our peptide searches. Our goal in this
general direction is to enhance the separation of information from noise
in massive biological data sets, thereby putting constraints on higher
organizational principles in biology yet to be discovered.
1. Yu YK (2003) On a class of integrals of Legendre polynomials with complicated arguments--with applications in electrostatics and biomolecular modeling. Physica A, 326: 326. PMID: 15759366
2. Doerr TP, Yu YK (2004) Electrostatics in the presence of dielectrics: The benefits of treating the induced surface charge density directly. American Journal of Physics, 72: 190-6. DOI: 10.1119/1.1624115
3. Doerr TP, Yu YK (2006) Electrostatics of charged dielectric spheres with application to biological systems. Phys Rev E, 73: 061902. DOI: 10.1103/PhysRevE.73.061902
4. Stojmirović A, Yu YK (2007) Information flow in interaction networks. J Comput Biol, 14: 14. PMID: 17985991
5. Zhang YC, Blattner M, Yu YK (2007) Heat conduction process on community networks as a recommendation model. Phys Rev Lett, 99: 99. PMID: 17995171
6. Yu YK, Zhang YC, Laureti P, Moret L (2006) Decoding information from noisy, redundant, and intentionally distorted sources. Physica A, 371: 732-44. DOI: 10.1016/j.physa.2006.04.057
7. Laureti P, Moret L, Zhang YC, Yu YK (2006) Information filtering via Iterative Refinement. Europhys Lett, 75: 1006-12. DOI: 10.1209/epl/i2006-10204-8
8. Yu YK, Hwa T (2001) Statistical significance of probabilistic sequence alignment and related local hidden Markov models. J Comput Biol, 8: 8. PMID: 11535176
9. Yu YK, Bundschuh R, Hwa T (2002) Hybrid alignment: high-performance with universal statistics. Bioinformatics, 18: 18. PMID: 12075022
10. Yu YK, Bundschuh R, Hwa T (2002) Statistical significance and extremal ensemble of gapped local hybrid alignment. Lecture Notes in Physics, Springer Berlin/Heidelberg, 585: 3-21. DOI: 10.1007/3-540-45692-9_1
11. Kschischo M, Lässig M, Yu YK (2005) Toward an accurate statistics of gapped alignments. Bull Math Biol, 67: 67. PMID: 15691544
12. Yu YK, Altschul SF (2005) The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics, 21: 21. PMID: 15509610
13. Sardiu ME, Alves G, Yu YK (2005) Score statistics of global sequence alignment from the energy distribution of a modified directed polymer and directed percolation problem. Phys Rev E Stat Nonlin Soft Matter Phys, 72: 72. PMID: 16485984
14. Altschul SF, Wootton JC, Gertz EM, Agarwala R, Morgulis A, Schäffer AA, Yu YK (2005) Protein database searches using compositionally adjusted substitution matrices. FEBS J, 272: 272. PMID: 16218944
15. Yu YK, Gertz EM, Agarwala R, Schäffer AA, Altschul SF (2006) Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches. Nucleic Acids Res, 34: 34. PMID: 17068079
16. Gertz EM, Yu YK, Agarwala R, Schäffer AA, Altschul SF (2006) Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST. BMC Biol, 4: 4. PMID: 17156431
17. Alves G, Ogurtsov AY, Wu WW, Wang G, Shen RF, Yu YK (2007) Calibrating E-values for MS2 database search methods. Biol Direct, 2: 2. PMID: 17983478
18. Alves G, Ogurtsov AY, Yu YK (2007) RAId_DbS: peptide identification using database searches with realistic statistics. Biol Direct, 2: 2. PMID: 17961253
19. Doerr TP, Alves G, Yu YK (2005) Ranked solutions to a class of combinatorial optimizations - with applications in mass spectrometry based peptide sequencing and a variant of directed paths in random media. Physica A, 354: 558-70. DOI: 10.1209/epl/i2006-10204-8
20. Alves G, Yu YK (2005) Robust accurate identification of peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics. Bioinformatics, 21: 21. PMID: 16105903