NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Academies of Sciences, Engineering, and Medicine; Division of Behavioral and Social Sciences and Education; Board on Behavioral, Cognitive, and Sensory Sciences; Committee on Accelerating Behavioral Science through Ontology Development and Use; Beatty AS, Kaplan RM, editors. Ontologies in the Behavioral Sciences: Accelerating Research and the Spread of Knowledge. Washington (DC): National Academies Press (US); 2022 May 17.

4 How Ontologies Facilitate Science

Ontologies are essential to science because they identify and clarify the entities and concepts that people want to talk about and study, and they identify the key relationships among those concepts. Understanding of these entities may change over time, but identifying shared names for phenomena is an essential basis for all scientific work, as it is for any constructive communication. Psychologists today do not investigate the ego and the id as they were defined by Sigmund Freud in 1923, but the naming of these concepts opened the door to new ways of talking about and studying psychological phenomena, and thus, despite their scientific obsolescence, their influence in moving behavioral science forward has endured.

The relationships among entities in an ontology provide essential guidance for scientific work. The ability to classify neural pathways based on the neurotransmitters present at their synapses, for example, allows scientists to relate those pathways to a variety of physiological properties and to specific drugs that may affect the behavior of those pathways. The formal specification of both the essential elements of a scientific discipline and the key relationships among them provides an inspectable, shared description of what that discipline is about. The specification offers a framework that enables its practitioners to clarify their shared world view and to communicate with one another with the clarity needed to advance scientific knowledge.

Scientific ontologies are not static. As theories are tested, revised, and ultimately replaced, the terms and relationships in ontologies need to be adapted to reflect the prevailing ways in which researchers construe their discipline. Ontologies thus do not lock in or constrain scientific thought. Instead, they capture in formal terms and relationships what investigators currently are thinking about their field, and they clarify what a particular scientific paradigm postulates about the discipline being modeled. Perhaps just as important, ontologies help to identify weaknesses and gaps in the knowledge on which a discipline is based because they reveal omissions and inconsistencies in the research literature. Ontologies are more than just a convenient data structure; they provide the basis for both humans and machines to apprehend the structure of a scientific discipline and the salient distinctions that it makes about the world through an evolving set of standardized concepts and relationships.

Chapter 2 discusses the challenges that ontologies can address, and Chapter 3 examines what constitutes an ontology and the differences between ontologies and other knowledge resources. In this chapter we examine in more detail why ontologies are important to the behavioral sciences and the pragmatic benefits that they provide. We discuss how ontologies facilitate scientific advancement and how they aid the development of electronic knowledge bases that can assist scientists and clinicians in a range of knowledge-intensive tasks. The chapter closes with the committee’s conclusions about the ways ontologies can help to advance the behavioral sciences and, indeed, all sciences.

HOW ONTOLOGIES FACILITATE SCIENTIFIC PROGRESS

The existence of an explicit ontology makes possible many functions that are important not only to scientists, but also to those who rely on the knowledge that scientists produce. In this section we briefly summarize some of the ways ontologies facilitate the progress of science, with particular emphasis on the behavioral sciences. We emphasize that a single ontology (or set of ontologies) in the behavioral sciences cannot by itself offer all these advantages: our argument is that greater reliance on ontologies across domains will cumulatively build rigor, efficiencies, and the other advantages we discuss in this report. Although the engineering of an ontology describing a scientific discipline invariably requires a great deal of work (see Chapter 5), that effort ultimately is amortized over the many uses for that ontology and the many applications that can take advantage of that shared foundation.

Clarifying the Phenomena That Are Studied

The capacity to accurately refer to behavioral phenomena is a basic pillar of the behavioral sciences, allowing researchers to be precise about what they are studying and how they are conceptualizing their domain. As mentioned above, the naming of the id and ego provided a way for psychologists to begin talking about mental phenomena that had not previously been topics of study or even topics of organized discussion. A shared ontology is particularly salient for behavioral scientists, who rely heavily on constructs to guide research, because many of the phenomena they study are challenging to organize and investigate (e.g., exploratory behavior, self-control, or executive function; see Chapter 2). As a result, these constructs are not always used or measured by different researchers in the same way. But, as discussed in Chapter 2, a shared conceptualization that specifies agreed-upon definitions of phenomena is a key to successful science. Thus, this seemingly abstract function of ontologies may be the most important of all, particularly for the behavioral sciences.

A core tension in any ontology is between the values and goals of researchers and the limitations in human capacity to perfectly apprehend and represent what is “out there” in the domain of scientific inquiry that the ontology is intended to characterize. Scientists recognize that improving their capacity to perceive, understand, and make predictions about the world is a primary goal. Thus, developers of ontologies try to capture as best they can the scientists’ shared conceptualization—which, in turn, is intended to approximate what exists in the world. But ontologies also need to have pragmatic features that allow them to support the particular goals that researchers have, such as classification, communication, data integration and sharing, bibliographic retrieval, and the comparison and analysis of data. Some of the constructs of interest in the behavioral sciences—such as rationality, self-regulation, and complex emotions—may not be at all intuitive or observable, but are essential elements of the behavioral sciences that ontologies could be expected to describe. Indeed, the central role that constructs and construct validation have played throughout the history of the behavioral sciences points to the value (and ultimately, the necessity) of ontologies in facilitating progress.

Classification

Ontologies are used to sort individuals, objects, and events into different groups. For example, an ontology might classify psychiatric disorders as various types of mental illness or classify organisms as representing particular species and higher taxonomic categories. The World Health Organization’s International Classification of Diseases (ICD) supports the systematic recording, analysis, interpretation, and comparison of morbidity and mortality data collected in different countries or regions and at different times (see Chapter 2 and Appendix A). The set of terms in the ICD provides a precise nomenclature for specifying diseases and a hierarchical structure for those terms that allows for the classification of diseases. For instance, the ICD declares unambiguously that COVID-19 is an infectious disease and that superficial spreading melanoma is a kind of skin neoplasm, which is a kind of cancer. These classifications are encoded directly in the ICD’s hierarchy of terms.

When a scientific ontology is encoded in a description logic, such as the Web Ontology Language (OWL; see Chapter 3), other kinds of classifications can be inferred by a program that reasons about the implications of the semantic encoding assigned to the entities in the ontology. Thus, the National Cancer Institute (NCI) Thesaurus, like the ICD, may note that superficial spreading melanoma is a kind of skin neoplasm: see Box 4-1. Using OWL to define this condition may also allow a reasoning system to infer that, if this condition is necessarily a cancer of melanocytes, then superficial spreading melanoma should also be classified as a malignancy of cells that are of neural-crest origin. Such classifications may not be obvious simply by inspection, but they are provable from the logic of the representation. Ontologies thus allow developers to encode the classifications that they are aware of directly into the structure of an ontology, and to discover new classifications through the application of reasoning systems that determine the logical implications of the ways in which the entities have been defined. An ontology would allow investigators to test hypotheses derived from the logical structure of individual constructs and their relations, though as far as the committee could determine, this benefit of developing and using ontologies has yet to be widely realized in the behavioral sciences.
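The kind of inference described above can be illustrated with a small sketch. It is a hand-rolled toy, not OWL and not an actual description-logic reasoner; the class names, the `ARISES_FROM` relation, and both function names are assumptions made for illustration and are not taken from the NCI Thesaurus or the ICD.

```python
# Toy subsumption reasoning over asserted links. This stands in for what a
# description-logic reasoner does; it is not OWL. All names are illustrative.

# Asserted is-a links (child -> parents).
IS_A = {
    "superficial spreading melanoma": {"melanoma", "skin neoplasm"},
    "melanoma": {"cancer"},
    "skin neoplasm": {"neoplasm"},
    "cancer": {"neoplasm"},
    "melanocyte": {"neural-crest-derived cell"},
}

# Asserted (hypothetical) relation: disease class -> cell type it arises from.
ARISES_FROM = {"melanoma": "melanocyte"}


def ancestors(term, is_a=IS_A):
    """Return every class reachable from `term` by following is-a links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_cell_origins(disease):
    """Cell classes implied for `disease`, including inherited ones."""
    origins = set()
    for cls in {disease} | ancestors(disease):
        cell = ARISES_FROM.get(cls)
        if cell is not None:
            # The disease inherits the cell type and all of its superclasses.
            origins |= {cell} | ancestors(cell)
    return origins


if __name__ == "__main__":
    print(sorted(inferred_cell_origins("superficial spreading melanoma")))
```

Nothing in the asserted data states that superficial spreading melanoma involves neural-crest-derived cells; that classification follows only from combining the inherited `ARISES_FROM` link with the is-a hierarchy of cell types, which is the sense in which such classifications are provable rather than obvious by inspection.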

BOX 4-1: The NCI Thesaurus.

Communication

By enumerating phenomena of interest, an ontology allows people to communicate clearly and efficiently about the ideas that are represented. For scientists and researchers, a shared ontology makes it possible to accurately describe and express their constructs, theories, experiments, and methods. In particular, shared ontologies are critical for comparison of results from different experiments and observations. Suppose that one researcher reports an experiment in which praising dogs regardless of their performance resulted in less disciplined behavior, and another reports an experiment in which giving dogs food treats on a random schedule resulted in poor control. Comparison and integration of these two experiments crucially depends on whether praise and treats, or the underlying constructs of disciplined behavior and poor control, are defined in the same way. If the two researchers do not share an ontology (recognize the same kinds of objects, variables, or measures of the same parts of the world), their results are largely incommensurable.

Similarly, if the different experiments and observations are attempts to measure the same underlying latent variables (variables that are inferred, rather than observed) or constructs, then agreement is needed about the nature and measurement of those variables (i.e., agreement about that aspect of the ontology). Scientific communities and scientific progress depend on investigators being able to evaluate one another’s theories and to build on one another’s work, but these fundamental aspects of the scientific process cannot be achieved in the absence of shared terms and the ability to communicate in a consistent manner.

An example of a behavioral science ontology intended to facilitate communication among scientists is the Cognitive Atlas, a collaborative effort to characterize the current ideas in cognitive science and cognitive neuroscience: see Box 4-2. The Cognitive Atlas is intended to compile and systematize concepts used by experts in psychology, cognitive science, and neuroscience. Interestingly, the Atlas is explicitly designed to identify not only areas of agreement among researchers, but also areas of disagreement, such as differing definitions of concepts and constructs, differing operationalizations of those constructs, and differing interpretations of the meaning of experimental tasks and manipulations. It is intended to provide a language in which different researchers can discuss their experiments and theories in ways that other cognitive scientists can readily understand, albeit perhaps in terms that are different from those that they usually use. Thus, the development of the Atlas documents usages without needing to take a stance on philosophical differences among potential users.

BOX 4-2: The Cognitive Atlas.

Ontologies also support communication about theories, experiments, knowledge, and insights among disparate communities of practice. Consider a clinician selecting a code from the Diagnostic and Statistical Manual of Mental Disorders (DSM) to describe a patient’s mental disorder. The code enables payors to know the purpose of some clinical encounter, it enables pharmacists to know the purpose of some prescription for medication, and it enables social workers to negotiate follow-up care with other health care facilities. The shared ontology term (i.e., the code) thus provides a standard mechanism for workers from many different stakeholder groups—who have their own customs, their own jargon, their own world views—to embrace a piece of common information and to act on it accordingly. Although debates regarding the relative merits and disadvantages of the DSM are ongoing (see Chapter 3), there is no doubt that it has had a profound effect on the exchange of data throughout the clinical enterprise. Ontologies thus facilitate communication among disparate professional groups in precise terms for the very pragmatic purpose of ensuring that appropriate information is correctly transferred.

In many settings, successful communication with lay audiences is also crucial. This kind of communication is particularly complicated in the behavioral sciences because nonclinical or nonscientific audiences may have their own ideas and beliefs regarding psychological or behavioral concepts. Lay notions, such as attitudes, beliefs, desires, and emotions, often do not clearly map onto science-based ontologies, which rely on constructs that may be at a level of abstraction several steps removed from everyday language. The behavioral sciences, in particular, face the challenge of simultaneously refining and systematizing knowledge while making that knowledge accessible to a broad range of scientist and nonscientist stakeholders. Because the relationships between these everyday notions and the constructs defined in behavioral science are rarely delineated clearly, it is important that researchers use standardized terms for the constructs they are studying to eliminate (or at least reduce) the potential for misunderstanding whenever they are communicating with patients and other lay people, including policy makers. Having an ontology in place for a construct such as “attitude” will both allow researchers to maximize the scientific impact of their work (e.g., by expanding the ability to compare findings across studies) and also help to ensure that, when the emerging knowledge is shared, it is done in a manner that respects the distinctions between everyday and scientific language.

Data Integration

Shared ontologies make it possible for researchers to integrate their data with those of other scientists, pooling results and thus also making it possible to explore hypotheses with larger sample sizes and different sets of subjects. Such pooling is possible only if investigators use the same terms to describe the same phenomena or if there are clear mappings between the idiosyncratic terms that one investigator may use and the standard terms used generally in the scientific community. Unfortunately, the absence of widely shared ontologies in the behavioral sciences (see Chapter 5) has been at the root of debates among investigators that focus on differences in measurement and operationalization. Ontology development could help to move research domains toward more broadly accepted nomenclature for a topic of interest.

Ontologies are necessary for computers to manage the integration of data collected in different contexts. An example from outside the behavioral sciences illustrates the importance of this function. Years ago, computer-aided design (CAD) software existed in parallel with incompatible software designed to support product fabrication, that is, software for computer-aided manufacturing (CAM) (Geddes, 2020).[4] These two separate technologies could not interact. Industrial designers could use CAD technology to specify new products, but their specifications could not be interpreted by CAM technology to turn their designs into actual products. CAD and CAM software referred to the same components in inconsistent ways, so integration of information across the two computer systems was simply impossible. Industrial design and industrial manufacturing could not be unified until ontologies were developed so that CAD and CAM systems could exchange information seamlessly. In the behavioral sciences, there may be a similar strong desire to integrate different data sources, such as clinical data from electronic health record systems and experimental data obtained in the laboratory, but such data integration is stymied in the absence of ontologies that can standardize the terms used in different contexts.

The practical use of ontologies by computer systems to combine data or information from multiple, heterogeneous sources has been developing rapidly and is becoming more widespread (Zhang et al., 2021). Semantic integration involves the use of a conceptual representation of data and their interrelationships (e.g., an ontology) to serve as a canonical resource that enables integration of data from sources that may adopt heterogeneous conventions regarding the form or naming of the data elements (Xiao and Wang, 2006). Semantic integration not only standardizes the terms used by interoperating systems and provides definitions for those terms, but also entails formal representations that store contextual information to enable both interconversion across the systems and the translation of source terms into the canonical terms needed to integrate the data in a scalable manner (Alkhamisi and Saleh, 2020). Semantic integration enables researchers to use a wide range of computational tools to more rapidly advance their scientific work.
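The translation of source terms into canonical terms can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any system cited above: the lab names, the local labels, the canonical terms, and the functions `to_canonical` and `pool` are all hypothetical.

```python
# Hypothetical mappings from each source's local labels to canonical terms
# that would, in practice, be drawn from a shared ontology.
TERM_MAP = {
    "lab_a": {"praise": "social reinforcement",
              "obedience": "disciplined behavior"},
    "lab_b": {"verbal reward": "social reinforcement",
              "compliance": "disciplined behavior"},
}


def to_canonical(source, record, term_map=TERM_MAP):
    """Rename one record's fields to canonical terms; fail on unmapped fields."""
    mapping = term_map[source]
    unmapped = [field for field in record if field not in mapping]
    if unmapped:
        # Refusing to guess keeps incommensurable variables out of the pool.
        raise KeyError(f"{source}: no canonical term for {unmapped}")
    return {mapping[field]: value for field, value in record.items()}


def pool(datasets):
    """Combine records from several sources under one canonical schema."""
    pooled = []
    for source, records in datasets.items():
        for record in records:
            row = to_canonical(source, record)
            row["source"] = source  # keep provenance for later analysis
            pooled.append(row)
    return pooled


if __name__ == "__main__":
    data = {
        "lab_a": [{"praise": 1, "obedience": 0.4}],
        "lab_b": [{"verbal reward": 0, "compliance": 0.7}],
    }
    for row in pool(data):
        print(row)
```

The sketch deliberately raises an error for any field without a mapping. Whether two local labels really denote the same construct is a scientific judgment that the mapping table records but cannot make.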

Data Sharing

A major theme in virtually all research communities in recent years has been the importance of making primary data publicly available so that other investigators can verify experimental results and perform secondary analyses. Increasingly, the data collection that results from government funding of research is viewed as a public good, and making datasets available in open data repositories is seen as an essential goal for science, equal to the goal of publishing research in journals. The critical importance of openness and sharing of scientific data has been documented in prior reports of the National Academies of Sciences, Engineering, and Medicine and elsewhere (NASEM, 2018, 2021). The goal is to create datasets that are findable, accessible, interoperable, and reusable—that is, FAIR—and the creation of FAIR data continues to receive broad support from funding agencies, professional societies, and individual researchers (Wilkinson et al., 2016). The FAIR guiding principles require that the data be annotated with metadata that describe the datasets and that are based on community-supported standards. Those standards, of course, need to include terms from appropriate ontologies.

The metadata needed to describe datasets for open science typically consist of a list of attribute–value pairs. The attributes often correspond to general ways in which an experiment might be described (e.g., that an experiment has subjects, interventions, and methods of data collection). The values provide the specific information needed to understand what was done (i.e., who or what the subjects were, exactly what the intervention was, what methods were used to collect the data). For datasets to be FAIR, the values for each of the attribute–value pairs in the metadata, when appropriate, should be terms from a designated ontology.

The datasets will not be “findable” unless ontology terms allow researchers to search the metadata of the datasets in a precise manner. The datasets will not be interoperable or reusable unless the datasets have standardized metadata that enable third parties to know what precisely was done in the experiments that generated the data. Making data FAIR requires more than putting data in an open repository: it requires ensuring that data are accompanied by rich, controlled metadata—and that requires the availability of appropriate standard, scientific ontologies.
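A brief sketch may make the role of controlled metadata concrete. It assumes a hypothetical set of attributes and permitted terms; `CONTROLLED_TERMS`, `validate_metadata`, and `find_datasets` are illustrative names and do not correspond to any particular repository or metadata standard.

```python
# Hypothetical controlled vocabularies: attribute -> permitted ontology terms.
CONTROLLED_TERMS = {
    "subject": {"domestic dog", "human adult"},
    "intervention": {"social reinforcement", "food reinforcement"},
    "data collection method": {"behavioral observation",
                               "self-report questionnaire"},
}


def validate_metadata(metadata, vocab=CONTROLLED_TERMS):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for attribute in vocab:
        if attribute not in metadata:
            problems.append(f"missing attribute: {attribute}")
    for attribute, value in metadata.items():
        allowed = vocab.get(attribute)
        if allowed is not None and value not in allowed:
            problems.append(f"uncontrolled value for {attribute}: {value!r}")
    return problems


def find_datasets(catalog, attribute, term):
    """Exact-match search, which is reliable only when values are controlled."""
    return [name for name, md in catalog.items() if md.get(attribute) == term]


if __name__ == "__main__":
    record = {"subject": "dogs", "intervention": "food reinforcement"}
    print(validate_metadata(record))
```

The free-text value "dogs" fails validation even though a human reader would understand it. A search for the controlled term "domestic dog" would never find that dataset, which is the practical meaning of "findable" here.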

Bibliographic Retrieval

Optimal searching of the scientific literature (and datasets) is facilitated by indexing the contents of bibliographic databases using controlled terms, which enables search engines to use those terms to find appropriate content. Use of controlled terms is important for bibliographic retrieval, as authors frequently describe their research in inconsistent ways, and there is no way for a searcher to know all the idiosyncratic ways in which researchers describe and refer to their work.

A good portion of the behavioral science literature is indexed by the National Library of Medicine, using its controlled terminology of Medical Subject Headings (MeSH) through the MEDLINE database and PubMed.[5] Although PubMed provides excellent access to medical and biological knowledge, the system is weaker in its support for experimental psychology and other behavioral sciences outside of biomedicine. Psychologists often search the PsycINFO database of the American Psychological Association (APA) for access to information concerning a broad range of journal articles, research reports, and other resources.[6] PsycINFO permits searches using (1) uncontrolled keywords (generally supplied by authors), (2) controlled index terms (derived from the APA’s Thesaurus of Psychological Index Terms), and (3) classification codes (used by APA staff to indicate the relevant subfield(s) of psychology).[7] However, this resource, which contains over 5 million records from more than 2,200 journals, covers a very wide landscape. A recent examination noted that while the database has been regularly updated, it still reflects assumptions that have become outdated, and pointed to questions about the validity of the controlled terms (Burman, 2018). As users of the database, the committee notes that because the terms are not organized hierarchically, the database does not lend itself naturally to searches that involve abstractions of index terms.

In general, formal ontologies support searches that can be more tailored to the user’s needs because they allow more abstract or more granular terms related to an initial term of interest to be easily identified and used. Such ontologies may also be able to incorporate external ontologies (such as the Cognitive Atlas, RDoC, or the DSM; see Chapter 3), making it easy to identify new search terms and easing the maintenance of the search engine as the external ontologies evolve. Modern search engines increasingly rely on natural language processing (NLP) of the text of scientific documents. The availability of appropriate ontologies can enhance the capabilities of NLP: it can improve both the precision and recall of bibliographic retrieval by codifying the abstractions, specializations, and synonyms of terms that users may use to formulate a search but that may not be mentioned explicitly in the text of relevant documents or in the user’s actual search request.
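The way a hierarchy and a synonym list can widen a search is easy to sketch. The toy hierarchy, the synonym entry, and the functions `expand` and `search` below are invented for illustration; they are not drawn from MeSH, the APA thesaurus, or any real search engine.

```python
# Invented toy vocabulary: broader term -> narrower terms, and term -> synonyms.
NARROWER = {
    "executive function": ["working memory", "inhibitory control"],
    "inhibitory control": ["response inhibition"],
}
SYNONYMS = {"inhibitory control": ["self-control"]}


def expand(term):
    """Return the term plus all narrower terms and their listed synonyms."""
    expanded = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue
        expanded.add(current)
        expanded.update(SYNONYMS.get(current, ()))
        stack.extend(NARROWER.get(current, ()))
    return expanded


def search(index, term):
    """Return documents indexed with the term or any term it expands to."""
    wanted = expand(term)
    return sorted(doc for doc, terms in index.items() if wanted & set(terms))


if __name__ == "__main__":
    index = {
        "doc1": ["response inhibition"],
        "doc2": ["self-control"],
        "doc3": ["exploratory behavior"],
    }
    print(search(index, "executive function"))
```

Neither matching document mentions "executive function", so a literal keyword search would miss both. The expansion improves recall, and restricting it to narrower terms and declared synonyms is what protects precision.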

Comparison and Analysis of Data

Researchers can take advantage of the hierarchies inherent in an ontology to assist with data analysis and interpretation. An ontology allows researchers to categorize observations and to identify the general principles that those points represent. In genomics, for example, the Gene Ontology is used frequently to help analyze the results of high-throughput experiments. In a method known as enrichment analysis, scientists analyze the results of investigations into which genes tend to be turned “on” under particular experimental conditions. The algorithm uses the Gene Ontology to identify the most abstract concept in the hierarchy of biological processes that explain all the identified genes, trying to exclude consideration of other genes that are not turned “on”: see Box 4-3. Thus, from the individual observations, an algorithm can determine that, under the observed conditions, the cell is expressing genes needed for “DNA repair,” “photosynthesis,” or “sodium transport.” The ontology’s intrinsic hierarchy thus allows a problem solver to reason about myriad data and to identify, statistically, the best generalization for the individual observations. Similar methods have been applied in other areas of investigation, and they might well find a place in classifying complex datasets in the behavioral sciences (LePendu et al., 2011).
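The statistical step in enrichment analysis can be sketched as follows. The example uses a one-sided hypergeometric over-representation test, which is one common way to score such analyses and is an assumption here rather than the specific algorithm the text describes. The gene identifiers, annotation sets, and function names are hypothetical, and the sketch omits the propagation of annotations up the ontology hierarchy that real tools typically perform before testing.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def enrich(active, annotations, background):
    """Rank terms by how over-represented they are among the active genes."""
    background = set(background)
    active = set(active) & background
    N, n = len(background), len(active)
    results = []
    for term, genes in annotations.items():
        genes = set(genes) & background
        k = len(active & genes)
        if k:  # skip terms with no active genes
            results.append((term, k, hypergeom_tail(k, len(genes), n, N)))
    return sorted(results, key=lambda row: row[2])


if __name__ == "__main__":
    background = [f"g{i}" for i in range(1, 11)]  # hypothetical gene IDs
    annotations = {
        "DNA repair": ["g1", "g2", "g3"],
        "sodium transport": ["g4", "g5", "g6", "g7", "g8"],
    }
    for term, hits, p in enrich(["g1", "g2", "g4"], annotations, background):
        print(f"{term}: {hits} active genes, p = {p:.3f}")
```

A small p-value indicates that a term accounts for more of the active genes than chance would suggest. Real analyses also correct for testing many terms at once, which this sketch leaves out.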

BOX 4-3: The Gene Ontology.

PRIMARY BENEFITS OF ONTOLOGIES

What would the behavioral sciences look like if ontologies intended for scientists, clinicians, and users of scientific research were developed to support the functions that we have discussed in this chapter? The committee highlights three primary benefits that ontologies can bring: (1) opportunities to improve care for patients, (2) infrastructure to support the mechanics of contemporary scientific research, and (3) an enhanced capacity to expand scientific knowledge. The first of these benefits is particularly relevant in the behavioral and biomedical sciences, but the other two are advantages that accrue in any scientific domain. The behavioral sciences have not yet taken full advantage of these benefits, as we discuss in Chapter 5.

Improving Patient Care

Diagnosis and treatment for one of the most common mental disorders, depression, illustrates what would be possible if the terms used by researchers and clinicians were linked more efficiently and systematically. Both scientists and clinicians would benefit greatly from greater alignment between the study of treatments and the study of the nature and causes of different mood disorders. There are excellent examples of translational, mechanism-driven research in mental health, but an agreed-upon formal ontology for mood disorders (as an example) would greatly simplify the process of designing, testing, and disseminating new and more effective treatment.

Imagine that a precise definition of an operationalized subtype of depression with a clearly articulated hypothetical cause were available and accepted by relevant stakeholders. With that ontological consensus in place, studies ranging in level of analysis—potentially from the molecular to the societal level—and covering topics from etiology to prevention, would be more easily compared and integrated. Having an articulated definition, hypothesized cause, and operationalization would help to ensure that investigators examining different aspects of the disorder were using a common language, sharing measures and the same logical structure for designing their specific studies. In turn, those advantages should lead to more rapid development and dissemination of new and more effective treatments and preventive interventions. The mental health professional mentioned in Chapter 2, who struggles to identify answers to specific questions that arise in the course of their practice, would also benefit.

Clinicians are quite familiar with selecting codes from the DSM to characterize their patients’ diagnoses for purposes of record keeping and billing. However, the availability of an ontology of behavioral disorders with more formal semantics that could support the encoding of machine understandable descriptions of neurophysiology and pharmacology would allow more extensive functionality. Clinicians could access the scientific literature in a more direct way to search for studies of drugs related to their patients’ underlying neurochemistry. They could easily locate the latest clinical trial results not only for subjects with their patients’ diagnosis, but also for those with physiologically related disorders.

Building Infrastructure for Scientific Research

Modern science is rarely conducted in the solitary manner in which Isaac Newton did his work. Science has evolved to become a complex, collaborative activity. Funders, publishers, professional societies, and investigators themselves increasingly recognize how interdependent all scientists have become. These stakeholders see the importance of sharing scientific data and of making sure that data are available in a form that is interpretable by both people and machines. The primary product of scientific experiments is new data that are a sharable resource for other scientists. The importance of establishing that conclusions drawn from the data are justified, that procedures used to create the data are replicable, and that new discoveries buried in the data do not go undiscovered has put an increased premium on describing data in standardized ways for the benefit of the entire scientific ecosystem. Those standards invariably depend on ontologies. Team science, open science, and data reuse are the future of research, whether it is laboratory based, observational, or clinical in nature (NASEM, 2015, 2018, 2019, 2021). That future requires ontologies that frame communication between people and machines to ease the interpretation of complex datasets and to make scientific data an enduring and available resource both for the community at large and for the next generation of investigators.

By establishing shared terms for the concepts and phenomena of interest within a particular domain and a classification of those entities, ontologies make key scientific functions possible, including:

  • clarification and classification of phenomena being studied;
  • accurate communication among scientists and other users of scientific research;
  • precise bibliographic retrieval;
  • integration, comparison, and analysis of data; and
  • sharing of data and reuse of data to make new discoveries.

Expanding Scientific Knowledge

Etymologically, science is the word for what is known. Because ontologies give names and structure to what is known about a scientific discipline, they provide a foundation for thought, hypotheses, and understanding of new discoveries. Ontologies cannot eliminate scientific uncertainty, but without them there will always be ambiguity about what is known, what can be inferred, and how new ideas build on the ideas of others. Without shared names for things, each scientist will always lack the ability to test hypotheses that build on the current scientific knowledge as a common background.

The scientific method rests in part on the assumption that researchers ground their observations, hypotheses, and discoveries in a common understanding of what exists. Ancient astronomers named the stars. Ancient anatomists named every bone in the body. Standardizing names for the entities in the world and preserving those names to ensure lucid scientific discourse is as old as science itself. Science is impossible without a shared conceptualization, and making shared conceptualizations more explicit, more exchangeable, and more examinable will advance both science and the benefits that society derives from the scientific enterprise.

CONCLUSION

CONCLUSION 4-1: By establishing a controlled vocabulary of shared terms for the concepts and phenomena of interest within a particular domain and a classification of those entities, ontologies have three primary benefits:

  • They open up opportunities to improve care and services, based on the work of investigators studying disorders who use a common language, shared measures, and the same logical structure for designing their specific studies.
  • They provide an infrastructure to support the mechanics and application of contemporary scientific research, helping to ensure that conclusions drawn from the data are justified, the procedures used to create the data are replicable, and new discoveries buried in the data do not go undiscovered; framing communication between people and machines; easing the interpretation of complex datasets; and making scientific data an enduring and available resource for all.
  • They create enhanced capacity to expand scientific knowledge, providing a foundation for thought, hypotheses, and understanding of new discoveries.

Footnotes

[3] Cognitive Atlas developer Russell Poldrack presented to the committee at its spring 2021 workshop.

[4] For a discussion of the history of CAD and CAM, see https://www.technicalfoamservices.co.uk/blog/blog-history-of-cad-cam/

[5] PubMed is a database of citations for research articles in biomedicine, currently containing more than 33 million citations; MEDLINE is a subset of that database. See https://www.ncbi.nlm.nih.gov/mesh/, https://www.medline.com/; also see https://pubmed.ncbi.nlm.nih.gov

[6] Initiated in 1927, PsycINFO provides abstracts and indexing of more than five million research articles in psychology and related fields.

[7]

Copyright 2022 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK584333
