Glycomics and Glycoproteomics

Pauline Rudd; Niclas G. Karlsson; Kay-Hooi Khoo; Nicolle H. Packer

doi:10.1101/glycobiology.3e.051

Chapter 51. Glycomics and Glycoproteomics

The term “glycomics” currently describes studies designed to define the complete repertoire of glycans that a cell or tissue produces under specified conditions of time, location, and environment. “Glycoproteomics” describes this glycome as it appears on the cellular proteome. Glycoproteomics determines which sites on each glycoprotein of a cell are glycosylated and ideally includes the identification and quantitation of each glycan structure at each site on the heterogeneous glycoforms in the cell. This complexity makes glycomics and glycoproteomics both exciting and daunting. Because neither the proteome nor the transcriptome can accurately predict such a moving target, the glycome and glycoproteome must be analyzed directly, and the techniques used to characterize the glycome and glycoproteome are described in this chapter. Analyses of glycolipids and free glycans are described in other chapters.

HISTORICAL PERSPECTIVE OF “OMICS” SCIENCE: GENOMICS, TRANSCRIPTOMICS, AND PROTEOMICS

The term genomics arose from the availability of complete genome sequence data as well as computational methods for their analysis. Less than 2% of genes in the human genome encode proteins. These genes are transcribed into messenger RNAs (mRNAs) that make up the “transcriptome,” of which ∼30% are protein coding. Parallel analysis of the genome and transcriptome has enabled researchers to probe global differences in gene expression—for instance, between healthy and diseased cells, between neurons and muscle cells, and between drug-sensitive and drug-resistant cancer cells. Such comparisons have revealed networks of genes whose expression is linked to a condition.

The total proteins expressed by the cell are collectively termed the “proteome.” Most eukaryotic proteins are posttranslationally modified (e.g., by phosphorylation, sulfation, oxidation, ubiquitination, acetylation, methylation, lipidation, or glycosylation). These modifications, combined with alternative splicing in eukaryotes, render the proteome considerably more complex than the transcriptome. It is estimated that approximately 120,000 proteins (including isoforms) are expressed by a human cell, but there is no estimate of the number of modified forms of each protein. The systems-level analysis of all proteins expressed by cells, tissues, or organisms is known as “proteomics.” The proteome, like the transcriptome, but unlike the DNA sequence of the genome, is fundamentally dynamic. The repertoire of proteins expressed by a cell is highly dependent on tissue type, microenvironment, and stage within the life cycle. As cells receive cues in the form of growth factors, hormones, metabolites, or cell–cell interactions, various genes are modulated and may be transcribed at levels ranging from silence to more than 10⁴ mRNA copies per cell and 10⁷ protein molecules per cell. Thus, proteomes vary during cell differentiation, activation, trafficking, and malignant transformation.

WHAT ARE “GLYCOMICS” AND “GLYCOPROTEOMICS”?

Vertebrates synthesize N- and O-glycans, glycosaminoglycans, and glycosylphosphatidylinositol (GPI) anchors covalently attached to proteins, as well as lipid-linked glycans and free glycans, such as hyaluronan (Chapters 9–17). As with the proteome, each cell type has its own distinct glycome that is governed by local cues and the metabolic state of the cell. Other organisms have their own glycomes; those of plants (Chapter 24) and prokaryotes (Chapter 21) differ markedly from the vertebrate and invertebrate glycomes (Chapters 25–27). The size of any particular glycome has not yet been established, but the combinatorial possibilities that arise when numerous glycans can occupy multiple glycosylation sites on each protein (the glycoproteome) mean that determining a “complete” glycome is not straightforward.

The notion that glycans should be studied as a totality (glycomics), as well as simply one glycan or glycoprotein at a time, developed when it became apparent that glycans form patterns on cells that change during development (Chapter 41), cancer progression (Chapter 47), infection (Chapters 42 and 43), and many other diseases (Chapters 44–46). Many glycan-binding proteins, such as lectins, are oligomerized on the cell surface and interact with multivalent arrays of glycans on opposing cells (Chapters 28–38). Sometimes, multiple discrete glycans work together to engage two cells or to deliver a signal from one cell to the other. Thus, the term “glycomics” was coined to describe the many aspects of glycobiology that can be understood only with a systems-level analysis. No systems-level analysis of a biological process is complete without interrogating the glycome and the glycoproteome in addition to the genome, transcriptome, and proteome. Methods for systemic analysis of the total cellular glycolipids (Chapter 11), glycophospholipids (Chapter 12), free glycans (Chapter 3), plant glycans (Chapter 24), and prokaryotic polysaccharides (Chapter 21) are mentioned in the respective chapters.

RELATIONSHIP OF THE GLYCOME TO THE GENOME AND PROTEOME

Clues regarding the composition and complexity of the glycome are found in the genome, transcriptome, and proteome of a cell. Thus, if a gene encoding a glycosyltransferase is not expressed (absent from the transcriptome), no glycans in the cell can carry the sugar transferred by that glycosyltransferase. The combinatorial action of glycosyltransferases in many competing biosynthetic pathways renders the complete glycome impossible to predict with current tools and knowledge. As an example, the reduced expression of a single glycosyltransferase can perturb the biosynthesis of dozens of glycans. Furthermore, unlike the genome, the glycome is sensitive to exogenous nutrient levels and metabolic fluxes including salvage pathways. Thus, variations in dietary monosaccharides, such as glucose, galactose, glucosamine, fucose, mannose, and N-glycolylneuraminic acid (Chapter 15), change the composition of the glycome. The numerous factors that influence the glycome (the transcriptome, the proteome, environmental nutrients, as well as the secretory machinery, pH, and many other determinants) create a glycome that is highly diverse and dynamic. Thus, the glycome and glycoproteome of a cell can change dramatically over time. It is this enormous structural plasticity in response to cellular and environmental states that underlies the essential roles of glycans in development and disease processes.

The term “glycomics” in the context of this chapter thus describes studies to define the complete repertoire of glycans that a cell produces under specified conditions of time, location, and environment. “Glycoproteomics” describes studies to elucidate the glycome on the whole cellular proteome level, also under specified conditions of time, location, and environment. Glycoproteomics determines which sites on each glycoprotein of a cell are glycosylated and, ideally, the identity and quantity of the glycan at each site on the heterogeneous glycoforms in the cell. This complexity makes glycomics and glycoproteomics both exciting and daunting. Because neither the proteome nor the transcriptome can accurately predict such a moving target, the glycome and glycoproteome must be analyzed directly, whether on a single glycoprotein (Figure 51.1) or on a complex mixture of glycoproteins (Figure 51.2). Techniques used to characterize the glycome and glycoproteome in animal cells and tissues are summarized below.

FIGURE 51.1.

Glycomics/glycoproteomics workflow for analysis of a purified glycoprotein. Example of a glycomics and glycoproteomics analysis of an affinity purified serum glycoprotein, haptoglobin, including (left panel) a total glycomics profile of N-glycans released …

FIGURE 51.2.

Glycomics-assisted glycoproteomics of a complex mixture of glycoproteins. Workflow showing analysis of the glycosylation of a complex mixture of proteins. The structures of the PNGase F released N-glycans from the total rat brain membrane proteins are …

COMPARATIVE GLYCOMICS

Because the glycome is influenced by both genetic and environmental factors, the information contained therein sheds light on intraspecies and interspecies variations, including providing indicators of disease that can be used for diagnosis and for monitoring the efficacy of drugs. Comparative glycomics is therefore an exciting frontier in biology and medicine.

For example, as discussed in detail in Chapter 47, numerous changes in the glycome have been associated with malignancy and metastasis, including altered N- and O-glycosylation, up-regulation of sialylated and fucosylated glycans, and altered glycosaminoglycans (GAGs). Regardless of functional consequence, a change in the glycome that is highly correlated with malignancy (or any disease) can serve as a diagnostic biomarker. Notably, glycans altered in a disease may reflect downstream consequences of the disease on remote organs, changes in the patient's immune system, or other effects of the disease.

One major caveat is the currently unknown extent of natural variation among individual human glycomes. Because the glycome can respond, in principle, to dietary and environmental changes, how do glycome variations relate to age, gender, and acquired disease susceptibility (Chapter 46)? Studies of evolutionary biology also have much to gain from comparative glycomics. Evolution of the vertebrate immune system, for example, was accompanied by the acquisition of new glycan-binding proteins, including the Siglec (Chapter 35) and selectin (Chapter 34) family members. Likewise, the glycomes of microbes and their vertebrate hosts appear to have coevolved in some instances (Chapter 42).

TOOLS FOR CHARACTERIZING THE GLYCOME

The glycome can be determined at different levels of granularity. First, “glycomics” constructs an inventory of glycans separated from their protein or lipid scaffolds from the cell, organ, or organism of interest. This is an important starting point for any comprehensive glycome analysis. The second level of analysis defines specific glycans associated with individual proteins or lipids. Analysis of the complete repertoire of a cell's glycoproteins, including their glycan structures and sites of attachment, lies at the intersection of glycomics and proteomics (“glycoproteomics”). A third level of complexity involves determining which glycans and/or glycoconjugates are expressed in specific cells, tissues, or secretions at specific times. This level of analysis is essential if the goal is to reveal new functions in cell–cell communication or to correlate particular glycomes with specific diseases. Of course, none of these approaches recapitulates the actual complexity of the glycan forest on the surface of cells or in extracellular matrices. This level of organization of the glycome is currently amenable only to imaging using various glycan-recognizing probes (GRPs; Chapter 48) and, more recently, matrix-assisted laser desorption/ionization (MALDI) mass spectrometric imaging methods.

As described below, numerous techniques have been developed for interrogating the glycome at these different levels. However, no single technique can define all aspects of the glycome or glycoproteome. Thus, several approaches are typically used in parallel, allowing one to assemble a picture of the glycome both from the “bottom up” (i.e., from the individual cell glycan repertoire) and from the “top down” (i.e., from a global tissue expression analysis). A significant challenge in analyzing the glycome derives from its enormous structural diversity and the heterogeneous sites of glycosylation. Different approaches and techniques are required to characterize, for example, the structures of glycoproteins versus glycolipids, N-glycans versus O-glycans, and sulfated GAGs versus neutral glycans (Chapter 50). In contrast, a technique such as RNA-seq (RNA sequencing) can be used to quantitate all RNA transcripts at once, a much easier task.

CHARACTERIZATION OF N- AND O-GLYCANS RELEASED FROM PROTEINS

Mass spectrometry is now the primary technique for characterizing the nature of individual glycans when only small quantities are available, as is the case in most glycomic studies. In a typical experiment, a glycoprotein- or glycolipid-enriched sample is prepared from a cell lysate and analyzed by multiple rounds of mass spectrometry (Chapter 50). In the case of glycoproteins, the N-glycans can be selectively released enzymatically or chemically, separated by high-performance liquid chromatography (HPLC) methods, and sequenced by mass spectrometry with or without glycosidase treatments. Separately, the O-glycans may be released chemically and sequenced in the same manner. Glycolipids can often be directly sequenced without release from the lipid component. GAGs are more problematic because of their large size, but smaller fragments can be sequenced by mass spectrometry in conjunction with enzymatic digestion (Chapter 17). An advantage of mass spectrometric glycan profiling is that multiple different glycans can be profiled at once, increasing the throughput of the glycomic analysis. However, mass spectrometry of glycans may miss potentially important modifications such as sulfation and O-acetylation depending on techniques applied and has inherent challenges because of the isomeric nature of the constituent monosaccharide units, linkage configurations, and positions.

To date, techniques for glycomic analyses have been directed mostly toward protein glycosylation because of the current proteomics focus, although considerable effort is being directed toward methods that encompass all glycan classes. As a first step in characterizing mixtures of glycoproteins, a liquid chromatography (LC) and/or mass spectrometry (MS) glycomics experiment is performed after the glycans are released from the protein backbones. In this way, the nature of all N- and O-glycans synthesized by specific cells or tissues can be characterized.

Depending on the level of detail desired, glycomics analyses may be divided into three basic classes: glycoprofiling, glycan class characterization, and full structural analysis. Each provides a particular degree of information, and the relevant class depends on the question at hand.

Glycoprofiling (fingerprinting, patterning) is the one-dimensional separation of a complex glycan mixture by a single technique to provide a signature or fingerprint that gives a simple overview or snapshot of the protein glycome. Technologies that provide different one-dimensional windows on this world are HPLC (separation by physical parameters such as lipophilicity or charge), capillary electrophoresis (separation of labeled glycans by charge-to-size ratio), and MS (separation by mass-to-charge ratio).
Glycan class characterization uses technologies to separate glycan mixtures into types of glycans. Examples include MS separations of di-, mono-, and nongalactosylated IgG glycans or the weak anion-exchange (WAX) LC separations that separate glycans into neutral, mono-, di-, tri-, and tetrasialylated structure types. This approach is a convenient way to highlight defined critical features and provide relative quantitation of the different glycan classes.
Detailed (full) structural analysis requires the determination of the monosaccharide sequence and modifications, anomericity, and linkage of the glycans in a glycome. In this detailed analysis, orthogonal technologies are usually required, first to assign preliminary structures and then to confirm the assignments. For example, an anion-exchange separation into differently charged glycan classes can be complemented by hydrophilic interaction liquid chromatographic (HILIC) separation of each class. The digestion of aliquots of the pools by exoglycosidases can then be used to help determine the sequence, anomericity, and linkage of different glycans. On the other hand, MS assigns compositions that are consistent with the mass data, and structural details can be resolved by electrospray ionization (ESI) MS/MS ion fragmentation. Separate release and analysis of glycans with sialic acids that have labile modifications such as O-acetylation or polysialylation may be needed, if preparation for MS is likely to destroy them. Full structural analysis also can include absolute or relative quantitation of the assigned glycan structures such as, for example, the level of core fucose on antibodies designed to initiate antibody-dependent cellular cytotoxicity (ADCC), the levels of antigenic α-gal residues, or sialyl Lewis x epitopes that may be useful markers of inflammation and metastasis, or the specific structure of a bacterial binding protein.
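
As a small illustration of the relative quantitation mentioned under glycan class characterization, the following Python sketch normalizes integrated peak areas to percentages of the total. The class names follow the WAX example above; the peak areas and the function name are invented for illustration and are not data or methods from this chapter.

```python
def relative_abundance(peak_areas):
    """Return each class's share of the total integrated area, in percent."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical WAX peak areas (arbitrary units), one per charge class.
wax_areas = {
    "neutral": 120.0,
    "monosialylated": 260.0,
    "disialylated": 410.0,
    "trisialylated": 150.0,
    "tetrasialylated": 60.0,
}
profile = relative_abundance(wax_areas)
```

Because the values are normalized within one sample, they support comparison of class proportions between samples but say nothing about absolute amounts.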

Full structural analysis of glycans is challenging as different structures can have the same mass, often coelute on separation systems, and can require detailed annotation of MS/MS spectra. Usually glycans are given a preliminary structural assignment from one technology and are then confirmed by at least one orthogonal technology. Many bioinformatic tools are being developed to mitigate this bottleneck (Chapter 52).

Release of Glycans from Proteins

The starting material in a glycomics analysis can be glycoproteins embedded in SDS-PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis) gels, whole cell lysates, homogenized tissue, enriched membranes, or serum and other body fluids. For high-throughput analyses of the glycome, intact N-glycans are most often released from glycoproteins using an amidase (peptide N-glycosidase F [PNGase F]). PNGase F cleaves the linkage between the core GlcNAc and the asparagine residue of all classes of N-glycans, with the exception of specific N-glycans found in plant and insect glycoproteins that contain fucose α(1,3) linked to the GlcNAc residue attached to the protein. PNGase A, an enzyme extracted from almond emulsin, may be used to release all fucosylated N-glycans from protease-generated glycopeptides. Treatment with other enzymes that cleave between the two GlcNAc residues within the chitobiose core is also possible: endoglycosidase D (endo D), which releases all classes of N-linked sugars; endo H, which selectively cleaves oligomannose- and hybrid-type structures; and various types of endo F. Before treatment, denaturation with or without trypsin digestion can be used to relax the three-dimensional (3D) structure of the protein and improve enzyme accessibility. N-glycans can also be conveniently released from glycoproteins purified in SDS-PAGE bands. After the N-glycans have been cleaved, the protein remaining in the gel can be identified by traditional proteomics.

For O-linked glycans, the release method of choice is reductive β-elimination from serine and threonine. Drawbacks of the method are that the released O-linked alditols cannot be further labeled at the reducing end and that labile modifications can be destroyed. To date, no single enzyme is known that releases O-linked glycans in general. The enzyme O-glycanase (in contrast to the PNGase enzymes) is restricted to releasing only the simple core 1 structure (Galβ1-3GalNAcα-Ser/Thr).

Analysis of Released Glycans

Derivatization of N- and O-Linked Oligosaccharides for LC and MS

Labeling of released glycans can optimize HPLC and capillary electrophoresis (CE) detectability and separability, and may improve MS properties. Many of these labeling approaches use reductive amination of the monosaccharide at the reducing end or react with the free amine that is left at the reducing end after PNGase F release. Fluorescent tags improve sensitivity and lower the limits of detection and quantitation. In permethylation and peracetylation, all the mobile protons (present on hydroxyls, carboxyls, and sometimes amides) in a glycan are substituted by alkyl groups (e.g., methyl) or esterified (e.g., acetyl). This converts the glycans from hydrophilic to hydrophobic, greatly improving sensitivity and linkage determination in MS-based analyses.

Both N- and O-glycans can also be analyzed as alditols in which the reducing end monosaccharide ring is converted into a reduced linear alditol by reducing agents, such as sodium borohydride. This reduction increases MS stability and removes the anomeric ambiguity of the carbohydrate, where the α and β isomers of the reducing end sugar may otherwise separate chromatographically.

Special protocols have also been developed to improve the quality of the MS fragmentation spectra of sialylated glycans, in particular to target the charged carboxyl group of sialic acids. These can be converted into esters or amides to remove the acidic proton of the carboxylic acid that destabilizes the sialic acid and promotes nondesirable in-source or post-source fragmentation in MS/MS.

LC (HPLC) and CE

Tagging of the reducing ends of the released glycans, for example, with 2-aminobenzamide (2-AB), 2-aminobenzoic acid (2-AA), or 2-aminopyridine (2-PA), is often used for HILIC and reversed-phase LC separations (Figure 51.1). Labeled dextran oligomer ladders are commonly used in LC as external standards to help define composition and size from comparable retention times, from which an “incremental value” can be calculated for each monosaccharide present in the structure (Figure 51.1).
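
The use of a dextran ladder as an external standard can be sketched in Python as follows. The sketch converts a retention time into glucose units (GU) by linear interpolation between the two flanking ladder peaks. The ladder retention times, the function name, and the choice of piecewise-linear interpolation are all assumptions made for brevity; other calibration functions could be substituted.

```python
from bisect import bisect_right


def glucose_units(retention_time, ladder):
    """Interpolate a GU value from a dextran ladder.

    ladder: list of (glucose_units, retention_time) pairs sorted by
    retention time, one pair per ladder peak.
    """
    times = [t for _, t in ladder]
    if not times[0] <= retention_time <= times[-1]:
        raise ValueError("retention time lies outside the ladder range")
    # Index of the ladder peak just above the query (clamped at the top).
    i = min(bisect_right(times, retention_time), len(times) - 1)
    (gu_lo, t_lo), (gu_hi, t_hi) = ladder[i - 1], ladder[i]
    return gu_lo + (gu_hi - gu_lo) * (retention_time - t_lo) / (t_hi - t_lo)


# Hypothetical ladder: (GU, retention time in minutes).
ladder = [(1, 5.0), (2, 10.0), (3, 14.0), (4, 17.0)]
gu = glucose_units(12.0, ladder)  # falls halfway between GU 2 and GU 3
```

Expressing peaks in GU rather than minutes makes values comparable between runs, and the GU difference between two structures that differ by one residue corresponds to the incremental value for that residue.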

CE with laser-induced fluorescence (LIF) can also provide efficient, rapid, and quantitative separation of derivatized glycans. Most glycans are neutral structures, so coupling a charged fluorescent label such as 1-aminopyrene-3,6,8-trisulfonic acid (APTS) is necessary to provide electrophoretic mobility and also enables sensitive fluorescence detection. Further structural details can be assigned after digestion of glycan mixtures by exoglycosidases, used in arrays or singly. These enzymes specifically cleave the glycosidic bonds of individual terminal monosaccharide units, producing predictable shifts in the HPLC or CE profiles of the digests.

MS Profiling of N- and O-Linked Glycans

Determining the molecular masses of oligosaccharides using MALDI- or ESI-MS gives a picture of the molecular distribution of glycans and allows glycosylation to be compared between samples (Chapter 50). The limited number of masses of the monosaccharide units (Table 51.1) makes combinatorial translation of molecular ion masses to monosaccharide composition possible. Search engines are available that suggest a list of glycan compositions based on an experimentally determined mass (e.g., GlycoMod; Chapter 52). MS, however, cannot distinguish between isomeric monosaccharides, so a nomenclature for compositions has been adopted in which isomeric monosaccharides are described by a single term or abbreviation. For instance, all of the isomeric six-carbon monosaccharides, such as glucose, mannose, and galactose, are given the unifying name hexose (Hex) (Table 51.1).

Table 51.1.

Families of common monosaccharides found in mammalian N- and O-linked oligosaccharides
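
The combinatorial translation of a measured mass into candidate compositions can be illustrated with a brute-force Python sketch. It is not the GlycoMod algorithm. It assumes underivatized, free reducing glycans, uses standard monoisotopic residue masses for four monosaccharide families, and the tolerance and maximum residue count are arbitrary illustrative choices.

```python
from itertools import product

# Monoisotopic residue masses (monosaccharide minus water), in Da.
RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}
WATER = 18.010565


def glycan_mass(composition):
    """Neutral monoisotopic mass of a free glycan with the given composition."""
    return sum(RESIDUE_MASS[k] * n for k, n in composition.items()) + WATER


def candidate_compositions(observed_mass, tol=0.02, max_count=12):
    """List every composition whose mass is within tol (Da) of the observation."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        comp = dict(zip(names, counts))
        if abs(glycan_mass(comp) - observed_mass) <= tol:
            hits.append({k: n for k, n in comp.items() if n})
    return hits


hits = candidate_compositions(1640.592)
```

A query of 1640.592 Da returns Hex5HexNAc4 among the candidates. The result is a composition only: it does not say which hexoses are present or how the residues are linked, and a realistic search would also filter candidates by biosynthetic plausibility.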

Generic glycomic workflows using all the derivatization approaches described above have been developed for MS analysis. Both the neutral and sialylated glycans in a sample can be analyzed after permethylation and MALDI-MS, where glycan masses are detected as their singly charged alkali ion adducts in positive ion mode (e.g., as [M+Na]⁺ ions or [M+K]⁺ ions). Negative ion ESI-MS is widely adopted for intact alditols, where both neutral and sialylated glycans are detected as [M-nH]ⁿ⁻ ions. The number of charges (n) increases with the size of the molecule and also depends on the number of acidic moieties present (e.g., sialic acids, sulfates, and phosphates). Positive ion MALDI-MS without permethylation usually requires that sialic acid residues are derivatized as described above to prevent their loss in-source.
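
The ion types named above map onto simple m/z arithmetic, sketched here in Python. The neutral mass M is whatever species is actually measured (native, reduced, or permethylated); the example uses the underivatized Hex5HexNAc4 mass purely for illustration. The function names are hypothetical, and the constants are standard monoisotopic values with the electron mass removed for the cations.

```python
PROTON = 1.007276         # mass of H+ in Da
SODIUM_ION = 22.989221    # mass of Na+ in Da
POTASSIUM_ION = 38.963158  # mass of K+ in Da


def mz_cation_adduct(neutral_mass, cation_mass):
    """m/z of a singly charged alkali adduct such as [M+Na]+ or [M+K]+."""
    return neutral_mass + cation_mass


def mz_deprotonated(neutral_mass, n):
    """m/z of an [M-nH]n- ion observed in negative ion mode."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (neutral_mass - n * PROTON) / n


M = 1640.592145  # illustrative neutral mass (underivatized Hex5HexNAc4)
sodium_adduct = mz_cation_adduct(M, SODIUM_ION)
doubly_deprotonated = mz_deprotonated(M, 2)
```

Because n appears in the denominator, the same glycan can appear at several m/z values in negative ion ESI-MS, one for each charge state it adopts.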

To obtain orthogonal separations, ESI-MS is often coupled to HPLC. HILIC can be used for screening of reducing end derivatized glycans and provides separation based on size, with some isomeric structure resolution (Figures 51.1 and 51.2). The interaction between the glycans and the HILIC column is based on hydrophilic hydrogen bonding. An alternative stationary phase, porous graphitized carbon (PGC), can uniquely separate isomers of released glycan alditols. If glycans are permethylated, the increased hydrophobicity allows separation using conventional C18 reversed-phase chromatography.

MS Fragmentation of N- and O-Linked Glycans

MS fragmentation is the holy grail of characterization of oligosaccharides and the goal is to be able to generate information-rich fragment spectra that will allow unequivocal assignment of a glycan structure. However, in the current state of the art, fragments from colliding carbohydrate molecular ions (collision-induced dissociation [CID]—either “beam-type” or “ion trap-type”) can only partly determine the structure of interest. We can distinguish three types of glycan fragment ions depending on the type and amount of information they carry (Figure 51.3).

Glycosidic fragments that contain the nonreducing end are assigned as B-fragments when they do not include the glycosidic oxygen and as C-fragments when they do. Reducing end fragments are assigned as Y (with the glycosidic oxygen) and Z (without the glycosidic oxygen).
Cross-ring fragments are assigned as A (nonreducing end fragments) and X (reducing end fragments).
Internal fragments arise from more than one fragmentation event, through a combination of glycosidic and cross-ring fragmentation.
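
The glycosidic fragment types in the list above can be made concrete with a short Python sketch that computes B, C, Y, and Z ion masses. It rests on several simplifying assumptions: a linear, underivatized glycan with a free reducing end, singly protonated fragments, and no cross-ring or internal fragments. The function name and the example trisaccharide are hypothetical.

```python
RESIDUE_MASS = {"Hex": 162.05282, "HexNAc": 203.07937, "dHex": 146.05791}
WATER = 18.010565
PROTON = 1.007276


def glycosidic_fragments(sequence):
    """Return m/z values of singly protonated B, C, Y, and Z ions.

    sequence: residue names listed from the nonreducing end to the
    reducing end of a linear glycan.
    """
    masses = [RESIDUE_MASS[r] for r in sequence]
    n = len(masses)
    ions = {}
    for i in range(1, n):
        b = sum(masses[:i]) + PROTON              # nonreducing end, no glycosidic O
        y = sum(masses[n - i:]) + WATER + PROTON  # reducing end, with glycosidic O
        ions[f"B{i}"] = b
        ions[f"C{i}"] = b + WATER                 # B plus the glycosidic oxygen (as H2O)
        ions[f"Y{i}"] = y
        ions[f"Z{i}"] = y - WATER                 # Y without the glycosidic oxygen
    return ions


ions = glycosidic_fragments(["Hex", "HexNAc", "Hex"])
```

Each B/C pair and each Y/Z pair differs by the mass of water, which is why the glycosidic oxygen is the distinguishing feature in the nomenclature. Branched structures would need a tree representation rather than a list.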

FIGURE 51.3.

Collision-induced dissociation–tandem mass spectrometry (CID-MS/MS) fragmentation of released N-glycans. (A) MS of released N-glycans: Masses are shown as doubly charged ions with corresponding calculated monosaccharide compositions; (B) CID-MS/MS …

A fragmentation spectrum containing all possible glycosidic fragments would in theory allow the complete assignment of the primary sequence and branching of a glycan structure. In practice, most CID approaches provide glycosidic fragments, but several methods or a combination of methods (retention times, exoglycosidase digestions, ion mode, derivatization, multiple fragmentation steps, i.e., MSⁿ) are usually used to fully define a particular structure of interest.

Glycan Modifications

A further challenge is that many key glycan modifications such as O-acetylation, pyruvylation, etc., are labile to, and/or missed with, current analytical methods. This problem can result in populating databases with misleading information. As just one example, although many databases assume that a sialic acid at the terminus of the vertebrate glycan chain is N-acetylneuraminic acid, there are in fact dozens of kinds of modified sialic acids in nature, and the differences can have profound effects on biological functions (Chapter 15). The same is true of N- and O-sulfate esters on hexosamines (Chapters 14 and 17). It will take a while to address the analytical challenges effectively. Meanwhile, it would make sense to not make definitive assignments at any such ambiguous positions on glycan chains, but to use generic terms such as “NulO,” “Sia,” or “HexNAc” and to assign their corresponding white (generic) symbols in graphic depictions (Online Appendix 1B).

The Future of Glycomic Analyses

Releasing the carbohydrates from the protein(s) is currently a prerequisite for glycomics. The techniques described above are, however, complementary to, and support, the assignment of glycan structures still attached to glycopeptides, for which the techniques are less well developed and present greater challenges (see below). N-linked and O-linked glycomics gives the overall landscape of the surface of a cell and so will remain a valid approach for the foreseeable future to define cell–cell interactions and to discover disease biomarkers and new therapeutic targets.

However, there may not be a need for de novo analysis of every single oligosaccharide in the future. Matching of fragment spectra using fragmentation libraries containing many of the more ubiquitously expressed oligosaccharides (Chapter 52) is becoming a pathway to quickly assign a majority of structures fully or partially, and allows researchers to focus on validation of only the differentiating structures that are important for addressing the biological question. Novel mass fragmentation techniques, the introduction of ion mobility MS, and in vivo heavy isotopic incorporation are starting to provide another dimension of separation for isomeric carbohydrates and for the quantitation currently lacking in glycomics mass spectrometric techniques.

FROM GLYCOMICS TO TARGET-DRIVEN GLYCOPROTEOMICS

A key question that often follows the identification of important glycosylation features by glycomics is, which proteins carry the implicated glycan structures and at which sites? Given that protein glycosylation is dependent on the concerted action of glycosyltransferases, one might expect all glycoproteins that pass through the endoplasmic reticulum (ER)–Golgi secretory pathway to be equally susceptible to similar modifications unless one or more of the glycosyltransferases involved are protein site-specific. Alternatively, a particular subset of proteins may share some common sequon, structural conformation, a certain unique physicochemical patch surrounding the glycosylation site, or some yet unknown traits that collectively allow them to be sought out from among hundreds or thousands of other proteins to be acted on by the glycosylation machinery in a specific way.

Determination of De-N-Glycosylated Protein Sites

Glycosylation site analysis by MS of de-N-glycosylated peptides is at present the prevalent mode of glycoproteomic analysis of a complex protein sample, but this approach gives no information on which glycan structure(s) are present at the glycosylation sites it defines. By virtue of the action of PNGase F or endo F/endo H, the de-N-glycosylated peptides are mass-tagged by conversion of Asn to Asp (a change of approximately +1 Da in peptide mass) within the consensus sequon (Figure 51.2) or by retention of a GlcNAc at the Asn site, respectively. These previously glycosylated proteins can then be subjected to trypsin digestion and LC-MS/MS analysis, and the MS/MS data searched against proteomic databases for rapid identification of the glycosylation sites on the tryptic peptide. This approach can be coupled to initial affinity or chemical capture of the glycoproteome subset, by use of lectins at the glycoprotein and/or digested glycopeptide level, or by hydrazide capture of the oxidized glycans on the glycopeptide, followed by PNGase F release of the captured peptide. In this manner, hundreds of N-glycosylation sites can now be routinely identified and their relative abundance quantified. Apart from its inability to reveal the site-specific glycoform structures, the most commonly criticized aspect of this approach is the false positives introduced by spontaneous deamidation of Asn to Asp, which is independent of the action of PNGase F.
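
The mass-tagging logic described above can be sketched in Python: locate candidate sequons in a peptide, then apply the Asn-to-Asp mass shift for each site converted by PNGase F. The sketch assumes the widely used N-X-S/T sequon definition in which X is any residue except proline (the proline exclusion is not stated in this chapter), and the peptide sequence and function names are invented for illustration.

```python
import re

# Lookahead so that overlapping sequons (e.g., "NNSS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Monoisotopic mass difference between Asp and Asn residues, in Da.
DEAMIDATION_SHIFT = 0.984016


def sequon_positions(sequence):
    """Return 1-based positions of Asn residues within N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence)]


def deglycosylated_mass(peptide_mass, n_sites):
    """Peptide mass after n_sites Asn residues have been converted to Asp."""
    return peptide_mass + n_sites * DEAMIDATION_SHIFT


# Hypothetical peptide: NVT and NGS are sequons, NPS is excluded.
sites = sequon_positions("MKNVTAANPSGNGS")
```

A shift of 0.984 Da at a sequon Asn is consistent with a formerly glycosylated site, but as noted above the same shift arises from spontaneous deamidation, so the sequon check reduces false positives without eliminating them.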

An innovative approach using the zinc finger nuclease gene targeting technique to impair O-glycosylation pathways by preventing extension of the O-GalNAc core has led to the generation of the first so-called SimpleCell, in which all mucin-type O-glycans synthesized carry only a single GalNAc or sialyl GalNAc. This same approach applied to O-mannose initiated glycans resolved a long-standing question concerning the apparent abundance of O-mannose glycans in brain tissue although at the time only a few relatively low abundance proteins had been determined to possess this modification. This approach has greatly facilitated glycoproteomic analysis of experimentally defined O-glycosylation sites, although the actual sites and attached O-glycan structures under more natural and specific physiological states remain unknown.

Glycoproteomics: Determining the Heterogeneous Site Glycosylation

Not every N-linked sequon (NXT/S) will carry a glycan, and each glycosylated site will carry a heterogeneous collection of glycan structures. Ideally, glycoproteomics should identify all glycoproteins in a sample down to the level of which sites are occupied, and quantify and characterize the glycoforms at each site. The ultimate aim is to take dynamic snapshots of the distribution of disparate glycans on each glycoprotein in a cell, to infer how site-specific glycosylation may promote or interfere with interactions, signaling, and other molecular encounters. To understand the specific biological roles of protein glycoforms, we will need to define the population of each molecular species that arises from combinations of site-specific oligosaccharide diversity at multiple sites.
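A rough sense of how quickly this combinatorial diversity grows can be obtained from a simple count. The sketch below assumes, purely for illustration, that sites are glycosylated independently of one another and that an unoccupied site counts as one additional state; the function name and the per-site numbers are hypothetical.

```python
from math import prod

def count_glycoforms(glycans_per_site, allow_unoccupied=True):
    """Upper bound on distinct molecular species, assuming independent sites.

    glycans_per_site: number of different glycan structures seen at each site.
    allow_unoccupied: if True, an empty site counts as one extra state.
    """
    extra = 1 if allow_unoccupied else 0
    return prod(n + extra for n in glycans_per_site)

# Hypothetical glycoprotein with three sites carrying 10, 5, and 20 glycans.
print(count_glycoforms([10, 5, 20]))
print(count_glycoforms([10, 5, 20], allow_unoccupied=False))
```

The count is an upper bound: in a real cell, many combinations may never be populated, which is why the population of each species has to be measured rather than inferred.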

At the time of writing, direct LC-MS/MS analysis of a mixture of intact glycopeptides cannot be routinely performed in an automated fashion for unambiguous glycopeptide identification, as is done in proteomic analysis of unmodified peptides. For N-glycopeptides, CID, either on an ion trap or quadrupole time-of-flight (QTOF) platform, or increasingly higher-energy collisional dissociation (HCD), induces mostly glycosidic cleavages, giving rise to highly abundant glycan oxonium ions in the low mass region, complemented by successive neutral losses of glycosyl residues from the precursor down to a single GlcNAc at the Asn (Figure 51.4A). In favorable cases, this so-called Y1 ion (peptide backbone + GlcNAc) can be identified and its m/z used to define the molecular mass of the peptide carrying the glycan, whereas the fragmentation spectrum gives the composition of the attached glycan. In the case of O-glycopeptides, the glycosyl residue attached to the hydroxyl group of Ser/Thr is usually detached by HCD, leaving behind an intact peptide backbone. The molecular mass information alone, however, is not sufficient to allow unambiguous peptide identification in a shotgun analysis of enriched glycopeptides derived from a whole cell.
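The arithmetic behind the Y1 ion and the oxonium ions can be written out as a minimal sketch. It assumes positive-mode ions carrying only protons, uses the residue masses from the chapter's monosaccharide table, and takes a hypothetical spectrum as input; the function names are illustrative and are not part of any particular software package.

```python
PROTON = 1.007276  # mass of a proton (Da)

# Residue masses (Da) from the chapter's monosaccharide table.
HEX, HEXNAC, NEU5AC = 162.0528, 203.0794, 291.0954

# Singly protonated oxonium ions: residue mass plus one proton.
OXONIUM = {
    "Hex": HEX + PROTON,
    "HexNAc": HEXNAC + PROTON,
    "Neu5Ac": NEU5AC + PROTON,
    "Hex-HexNAc": HEX + HEXNAC + PROTON,
}

def neutral_mass(mz, z):
    """Neutral monoisotopic mass of a positive ion carrying z protons."""
    return (mz - PROTON) * z

def split_glycopeptide(precursor_mz, precursor_z, y1_mz, y1_z=1):
    """Return (peptide mass, summed glycan residue mass) from a Y1 assignment."""
    peptide = neutral_mass(y1_mz, y1_z) - HEXNAC  # Y1 = peptide + one GlcNAc
    glycan = neutral_mass(precursor_mz, precursor_z) - peptide
    return peptide, glycan

def oxonium_hits(peaks, tol=0.01):
    """Names of oxonium ions found among observed m/z values (tolerance in Da)."""
    return [name for name, mz in OXONIUM.items()
            if any(abs(p - mz) <= tol for p in peaks)]

# Hypothetical spectrum: 2+ precursor, singly charged Y1, three low-mass peaks.
peptide, glycan = split_glycopeptide(1109.4687, 2, 1204.5867)
print(round(peptide, 3), round(glycan, 3))
print(oxonium_hits([204.087, 366.140, 500.2]))
```

The glycan value returned here is the sum of residue masses, because the glycan is attached to the peptide through a glycosidic bond and therefore contributes no extra water.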

FIGURE 51.4.

Complementary tandem mass spectrometry (MS/MS) fragmentation of N-glycopeptides. (A) Collision-induced dissociation (CID)-MS/MS of glycopeptides produces cleavage of the glycan to give diagnostic oxonium ions, whereas higher-energy collision dissociation …

Two other MS approaches aim to achieve or enhance fragmentation along the peptide backbone as well as of the glycan. The first is to use electron transfer dissociation (ETD) fragmentation, with or without additional CID- or HCD-based activation. In principle, ETD leads mostly to c- and z-type cleavages along the peptide backbone without inducing glycosidic cleavages (Figure 51.4B). The practical problem is that a significant series of peptide cleavage ions is produced only if the charge (z) is high and the overall m/z is low. ETD does not perform as well with doubly and triply charged N-glycopeptides occurring at m/z >1400, which constitute a substantial portion of a normal tryptic digest. One way to address this issue is to increase the charge state of the peptides by chemical derivatization, such as with a tandem mass tag (TMT), and/or by using a different proteolytic enzyme such as LysC or GluC instead of trypsin to generate larger peptides. Using higher supplemental activation energies (particularly with HCD) and/or longer ETD activation times may also help. ETD works reasonably well with O-glycopeptides, particularly those decorated with only one or two glycosyl residues, including O-GlcNAc, O-GalNAc (Tn), O-GalNAc-Gal (T), O-Fuc, and O-Man. By retaining these glycosyl substituents, ETD allows identification of their distribution over several closely placed Ser/Thr residues in a way that is otherwise very difficult to achieve by CID or HCD.
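The dependence of m/z on charge state is simple to compute, and a short sketch shows why low-charge glycopeptides fall in the unfavorable range. The neutral mass used below is a hypothetical value chosen for illustration, and the function name `mz_at_charge` is likewise an assumption.

```python
PROTON = 1.007276  # mass of a proton (Da)

def mz_at_charge(neutral_mass, z):
    """m/z of an ion with the given neutral mass carrying z protons."""
    return (neutral_mass + z * PROTON) / z

glycopeptide_mass = 4200.0  # hypothetical N-glycopeptide neutral mass (Da)
for z in (2, 3, 4, 5):
    print(f"z = {z}: m/z = {mz_at_charge(glycopeptide_mass, z):.2f}")
```

For this example mass, the 2+ and 3+ ions both lie above m/z 1400, whereas the 4+ and 5+ ions do not, which illustrates why strategies that raise the charge state can improve ETD performance.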

The second and arguably the preferred approach for N-glycopeptide MS analysis is to increase the intensity and number of peptide sequence-informative b and y ions produced by CID and to measure them at high mass accuracy (Figure 51.4A). Recent generations of MS instruments equipped with HCD (beam-type CID, i.e., the traditional type found in triple quadrupoles as opposed to ion traps) are increasingly capable of meeting these challenges without compromising the required speed and number of fragmentation spectra collected per run. With ion trap instruments, the Y1 ion identified in previous runs can alternatively be isolated for further levels of fragmentation and analysis (MSⁿ), yielding additional information.

Limitations and Prospects of Glycoproteomics

At this stage of true glycoproteomic analysis, expert manual interpretation is required to assign the composition and site of attachment of the glycan. Informatics is needed to facilitate this analysis, but present computational solutions are usually not sufficient (Chapter 52). Glycopeptide MS² data can be used to deduce glycan composition at each site but do not currently afford detailed linkage and stereochemistry information. Diagnostic fragment ions can sometimes be found that confirm terminal epitopes such as a fucosylated Hex-HexNAc or sialylated fucosylated Hex-HexNAc (Lewis x and sialyl Lewis x, respectively), but these cannot be distinguished from Lewis a and sialyl Lewis a. Continued development of selective enrichment and/or preseparation of specific subsets of glycopeptides needs to be pursued, because no single LC-MSⁿ method or instrument is sufficient to handle the full dynamic range of the entire glycopeptide pool derived from a complete glycoproteome. To limit the search space and enhance the detailed structural knowledge of the glycans encountered in a given glycoproteome, parallel glycome profiling can be performed. Such a “glycomics-assisted glycoproteomics” approach (Figure 51.2) can also be complemented with quantitative proteome and deglycoproteome profiling (mapping of formerly glycosylated peptides), which reduce the search space even further and provide supporting evidence to pinpoint the mechanism(s) driving the observed glycoproteome alterations. For most of the available glycopeptide identification software tools, it is strongly advised that manual validation of the glycopeptide assignments still be performed to generate sufficient confidence in the reported identifications. Further development of these technical and bioinformatics tools may eventually allow glycoproteomics to resolve the structural details of the attached glycans directly; at this time, however, glycomics remains an important and often essential forerunner to detailed glycoproteomic analysis.
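How parallel glycome profiling limits the search space can be illustrated with a small composition search. This is a sketch under stated assumptions, not the algorithm of any existing tool: the glycan mass is expressed as a sum of residue masses from the chapter's monosaccharide table, the tolerance and maximum residue count are arbitrary, and the "glycome" is a hypothetical set of compositions assumed to have been observed in a separate glycomics experiment.

```python
from itertools import product

# Residue masses (Da) from the chapter's monosaccharide table.
RESIDUES = {"Hex": 162.0528, "HexNAc": 203.0794, "dHex": 146.0579,
            "Neu5Ac": 291.0954, "Neu5Gc": 307.0903}
NAMES = tuple(RESIDUES)

def composition_mass(counts):
    """Summed residue mass for a tuple of counts given in NAMES order."""
    return sum(n * RESIDUES[name] for n, name in zip(counts, NAMES))

def candidates(glycan_mass, tol=0.005, max_count=8, glycome=None):
    """Compositions whose mass matches glycan_mass within tol (Da).

    If glycome (a set of count tuples) is given, only those compositions
    are tested; otherwise every combination up to max_count is enumerated.
    """
    if glycome is not None:
        pool = sorted(glycome)
    else:
        pool = product(range(max_count + 1), repeat=len(NAMES))
    return [c for c in pool if abs(composition_mass(c) - glycan_mass) <= tol]

# Hex5HexNAc4Neu5Ac1 has the same mass as Hex4HexNAc4dHex1Neu5Gc1.
target = composition_mass((5, 4, 0, 1, 0))
print(len(candidates(target)), "candidate(s) without glycome information")

observed_glycome = {(5, 4, 0, 1, 0), (5, 2, 0, 0, 0)}  # hypothetical profile
print(candidates(target, glycome=observed_glycome))
```

The example exploits a real mass coincidence (Hex + Neu5Ac equals dHex + Neu5Gc), which no mass accuracy can resolve; only prior knowledge of which residues the sample actually contains removes the ambiguity.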

ACKNOWLEDGMENTS

The authors acknowledge contributions to previous versions of this chapter by Carolyn R. Bertozzi and Ram Sasisekharan and appreciate helpful comments and suggestions from Barbara Adamczyk, Jeremy Praissman, Oliver Pearce, and Anel Lizcano.

Publication Details

Author Information and Affiliations

Authors

Pauline Rudd, Niclas G. Karlsson, Kay-Hooi Khoo, and Nicolle H. Packer.

Publication History

Published online: 2017.

Publisher

Cold Spring Harbor Laboratory Press, Cold Spring Harbor (NY)

NLM Citation

Rudd P, Karlsson NG, Khoo KH, et al. Glycomics and Glycoproteomics. 2017. In: Varki A, Cummings RD, Esko JD, et al., editors. Essentials of Glycobiology [Internet]. 3rd edition. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; 2015-2017. Chapter 51. doi: 10.1101/glycobiology.3e.051

Name	Abbreviation	Symbol	Examples	Monoisotopic residue mass (Da)
Hexose	Hex	○	glucose (Glc), mannose (Man), galactose (Gal)	162.0528
N-acetylhexosamine	HexNAc	□	N-acetylglucosamine (GlcNAc), N-acetylgalactosamine (GalNAc)	203.0794
Deoxyhexose	dHex		fucose (Fuc)	146.0579
Sialic acid	Sia		N-acetylneuraminic acid (Neu5Ac), N-glycolylneuraminic acid (Neu5Gc)	291.0954 (Neu5Ac), 307.0903 (Neu5Gc)
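The tabulated values are residue masses (each monosaccharide minus one water), so the mass of a free, underivatized glycan is the sum of its residues plus one water. A minimal sketch of that calculation follows; the function name is hypothetical and the example composition is chosen only for illustration.

```python
WATER = 18.010565  # monoisotopic mass of H2O (Da)

# Residue masses (Da) as tabulated above.
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "dHex": 146.0579,
                "Neu5Ac": 291.0954, "Neu5Gc": 307.0903}

def free_glycan_mass(composition):
    """Monoisotopic mass of a free glycan from a {residue: count} mapping."""
    residues = sum(RESIDUE_MASS[name] * n for name, n in composition.items())
    return residues + WATER

# Hex5HexNAc2, the composition of an oligomannose N-glycan with five mannoses.
print(round(free_glycan_mass({"Hex": 5, "HexNAc": 2}), 4))
```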
