
★ Review ★

Embedded Structures and Representation of Nursing Knowledge

Marcelline R. Harris, Judith R. Graves, Harold R. Solbrig, Peter L. Elkin, Christopher G. Chute
DOI: http://dx.doi.org/10.1136/jamia.2000.0070539 Pages 539-549. First published online: 1 November 2000


Nursing Vocabulary Summit participants were challenged to consider whether reference terminology and information models might be a way to move toward better capture of data in electronic medical records. A requirement of such reference models is fidelity to representations of domain knowledge. This article discusses embedded structures in three different approaches to organizing domain knowledge: scientific reasoning, expertise, and standardized nursing languages. The concept of pressure ulcer is presented as an example of the various ways lexical elements used in relation to a specific concept are organized across systems. Different approaches to structuring information—the clinical information system, minimum data sets, and standardized messaging formats—are similarly discussed. Recommendations include identification of the polyhierarchies and categorical structures required within a reference terminology, systematic evaluations of the extent to which structured information accurately and completely represents domain knowledge, and modifications or extensions to existing multidisciplinary efforts.

In recognition of the potential benefits when vocabulary and standards developers, federal agencies, and system vendors work together, a Nursing Vocabulary Summit has twice convened at Vanderbilt University to discuss characteristics of nursing vocabularies in electronic medical records (EMRs). A final sentence in the classic paper by Graves and Corcoran on nursing informatics1 was prophetic in relation to the challenges presented at these summits: “To the extent that human processing of data, information, and knowledge can be modeled, these processes can be represented in computer systems and the computer system programmed to mimic the process.” While actually mimicking human processes may not be necessary or even desirable, representing the data, information, and knowledge of a domain so that clinical practice is facilitated is essential. The purpose of this paper is to review approaches to nursing data, information, and knowledge representation that must be considered as nursing vocabulary work progresses.


Approaches to the processing of data, information, and knowledge in computer systems are typically dichotomized into domain content and information structure.2 Domain content approaches concern the formal, organized corpus of data, information, and knowledge specific to various areas of nursing practice. Standardized nursing language systems emphasize this approach. Approaches to information structure concern the way nursing phenomena are represented and managed so that they can be processed using well-designed computer algorithms. The need for expanded approaches to the representation of nursing phenomena in response to implementation issues in computer-based systems has been well described.3

It is time to tighten up relationships between appropriately structured domain models and information structures. A national agenda around the EMR is well established, and standards development organizations are forwarding recommendations on EMR content and structure to federal and international agencies.4–6

A critical first step in tightening up these relationships is to be fully aware of the multiple “embedded substantive structures” existing across different approaches to organizing and representing domain knowledge.7,8 Concepts of the domain naturally organize or are organized around such structures, and thereby provide context for the meaning and relationships of lexical elements associated with specific concepts. This means that not only must the multiple embedded structures relevant to domain knowledge be represented in information systems, but they must also be represented in ways that can be modified as domain knowledge is extended. For example, concept-based terminologies in medicine reference terms clustered around embedded structures, or hierarchies, such as anatomy (topology), pathology (morphology), and causes (etiology).9 Extensive work is underway to use inferencing approaches such as description logics, as well as messaging standards such as those of Health Level 7, to construct and exchange a variety of statements required within and across clinical and administrative systems (e.g., problem lists and ICD codes).10
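The idea of a concept that sits in several embedded hierarchies at once can be sketched in a few lines of Python. The class, the concept names, and the parent links below are hypothetical illustrations and are not drawn from any actual terminology. The point is only that one concept can have parents along an anatomy axis, a morphology axis, and an etiology axis, and that each axis can be extended without disturbing the others.

```python
from collections import defaultdict


class PolyHierarchy:
    """A concept graph in which a concept may have parents on several axes."""

    def __init__(self):
        # parents[axis][concept] -> set of parent concepts on that axis
        self.parents = defaultdict(lambda: defaultdict(set))

    def add_parent(self, axis, concept, parent):
        self.parents[axis][concept].add(parent)

    def ancestors(self, axis, concept):
        """Return every ancestor of `concept` reachable along one axis."""
        found = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            for parent in self.parents[axis].get(current, ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

    def axes_of(self, concept):
        """List the axes on which `concept` has at least one parent."""
        return sorted(a for a, links in self.parents.items() if links.get(concept))


# Hypothetical example content (illustrative names only).
h = PolyHierarchy()
h.add_parent("anatomy", "pressure ulcer", "skin structure")
h.add_parent("anatomy", "skin structure", "body structure")
h.add_parent("morphology", "pressure ulcer", "ulcer")
h.add_parent("etiology", "pressure ulcer", "sustained pressure")
```

Because each axis is stored separately, adding a new hierarchy (for example, a nursing-specific axis) amounts to adding links under a new axis name, which is one way to keep the structure modifiable as domain knowledge is extended.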

Without a clear understanding of the way concepts are organized in the domain of a discipline and a clear understanding of organization and statements required in automated applications, we are not likely to support nurses' processing of data, information, and knowledge in the environment of EMRs.11,12

Approaches to Organizing Domain Content

Those who study the significance of the way people order phenomena (taxonomy development) note that the process of constructing and embedding infrastructures in ways that are logically sound and well-suited to users represents perceptions of common properties within not only classes but also culture, values, and politics.13 Classifications emerging from such ordering activities have served as means to achieve “good” ends (e.g., advancing knowledge through the classification of biology) and “not-so-good” ends (e.g., discrimination due to race or gender classification). The quest to classify and represent nursing's domain parallels three approaches to knowledge representation—scientific reasoning, the acquisition of expertise, and the development of standardized languages.

Scientific Reasoning

The scientific reasoning approach emphasizes systematic inquiry to define and refine theory through the conduct of basic and applied research. While it is recognized that researchers use many types of quantitative and qualitative strategies, the major steps of all research broadly include identifying the concepts of concern, formulating testable questions or hypotheses, designing a study so that the conceptualization of the problem is consistent with operationalization of the variables, collecting and analyzing data, and interpreting and disseminating findings. Thus researchers can be seen to organize concepts within the domain on an embedded structure of variables, relationships, and findings. This is the same embedded structure by which scientific models are traditionally expressed. The significance of this structure to both researchers and consumers of research is that it suggests categories that direct one's understanding of the research, and there are common expectations related to the attributes of data clustered in these structures.

The concepts of concern to researchers, and the ways the concepts are made operational, define the vocabulary of research. Investigators assign terms to concepts and then represent the concepts with specific variables. In this manner, the level at which variables are defined serves as a data dictionary for research concepts. Although not necessarily true for the concepts of basic research, a strong similarity should exist between research concepts and clinical concepts in a domain.

Knowledge created from research enters the domain when variables, relationships, and findings are relevant or of value to readers of a published study. In nursing, this means that the study serves to describe, explain, predict, or control human responses to actual or potential health events.14 Furthermore, statistically tested associations extend our understanding of distances and levels of abstraction among domain concepts, thus potentially improving the classification of nursing phenomena.

This approach to organizing knowledge is embedded in the software, arcs, used for research knowledge management and modeling at nursing's international scholarship organization, Sigma Theta Tau International (STTI).15,16 Thus, when a researcher registers a study using the STTI Registry of Nursing Research, the researcher's explicit model of domain content is tightly linked to the logical model of arcs.

The scientific approach is highly explicit, bounded, and unambiguous, and therefore relatively straightforward in relation to codification and embedding in an information system.17 Unfortunately, the entirety of nursing domain knowledge is not represented by research or by scientific reasoning processes.


Expertise

A second approach to describing the nature of nursing knowledge emphasizes the acquisition of expertise through practice.18 Expertise is acquired through a clinician's dynamic interpretation of, and response to, salient signs and symptoms that present in highly social and value-laden contexts. In contrast to novice clinicians, who rely on rule-based knowledge and calculated reasoning, expert clinicians rely not only on rationality but also on intuition, or practical wisdom gained from experience, engagement with the patient, and ethical and moral dispositions.

Novices look at different data and organize the data differently than experts do, with consequent differences in the way concepts cluster “in the heads” of individual clinicians. The expert needs fewer data points than the novice, and the data points emphasized by the expert are more salient points. It is experience, connected to explicit and domain-specific knowledge, that allows clinicians to hierarchically organize concepts that frame performance.19 This pattern of expertise is evident across disciplines and seems to explain transitions from rule-based performance to performance based on intuition and innovation.20

Interestingly, the Latin verb from which the words “experience” and “expert” are derived is experiri, which means “to put to the test.” Clinicians are tested on an “n of 1 problem” with every patient encounter. As skills and judgment increase, it becomes increasingly difficult to make explicit the domain model by which concepts are organized and a knowledge base created.

The terms and phrases used by clinicians in unstructured, free-text documentation systems can thereby be seen to represent the vocabulary of clinical care. To the extent that clinicians are free to record a representation of what they perceive as clinically relevant data and at the level of detail that seems important in the context of a specific clinical encounter, the individual domain models are potentially made evident. Highly structured chart forms, data sets, and standardized vocabularies represent a wide range of embedded structures that may or may not be consistent with the clinician's “n of 1” experience and domain model.

The Nursing Intervention Lexicon and Taxonomy (NILT) is an example of the use of natural language processing techniques to represent an embedded structure of care. Grobe and Hughes21 used a semantic network approach to analyze terms and phrases put forth by clinicians in clinical notes, derived the NILT intervention taxonomy from that analysis, and tested it for reproducibility. The significance of the NILT structure is that it represents meanings that are tightly coupled to clinician domain representation as made evident in natural language.

Standardized Nursing Languages

A third approach to organizing nursing knowledge is represented in the nursing data systems compiled by the American Nurses Association (ANA).22 The ANA envisioned national databases as “the means for describing, measuring, and classifying nursing practice.” To this end, the Steering Committee on Databases to Support Clinical Nursing Practice (now the Committee for Nursing Practice Information Infrastructure) was charged with responsibility for policy recommendations related to the development of nursing data elements in national databases. Some of the data systems catalogued by the ANA are standardized nursing languages (SNLs), and formal recognition by the Committee facilitates the incorporation of those SNLs into the Unified Medical Language System (UMLS).23

Developers of these data systems define terms and phrases that represent concepts of relevance to intended users of the data set, classification system, or nomenclature. Then, in each data system, concepts are clustered around various taxonomic structures. In contrast to the two approaches to organizing domain content previously discussed, no high-level categorical structures are agreed on across SNLs.

The need for multiple taxonomies at this level was clearly demonstrated by the developers of the Nursing Outcomes Classification (NOC).24 The NOC team initially tried to organize outcomes in the Nursing Interventions Classification (NIC) taxonomy but determined that the focus of NOC on patient states required a different organizing structure than that of the NIC, which focuses on nurse behaviors.25 Other investigators, attempting to map problems or diagnoses across nursing classification systems, have similarly noted the effect of different taxonomic structures and levels of abstraction on concept representation.26

In addition to multiple taxonomies, there is further discussion of whether diagnoses, interventions, and outcomes are, at an even higher level of organization, the preferred organizational and representational approach to domain knowledge. Ozbolt's Patient Care Data Set (PCDS), for example, is organized around problems, goals, and orders, whereas Martin's Omaha system includes problem classifications, an intervention scheme, and three problem-rating scales for outcomes.27,28 While selected systems, such as the PCDS, are structured to define relationships when different aspects of a concept (e.g., problems, goals, orders) are represented in the system, no such structure defines relationships across nursing terminology systems. This is because of the bounded perspective and independent development of the different systems. For example, NANDA deals only with diagnoses, not with interventions and outcomes.

As an example of these different approaches to organizing and representing domain knowledge, Table 1 compares terminological representations related to the concept “pressure ulcer” from the UMLS Metathesaurus, an Agency for Health Care Policy and Research (AHCPR) clinical guideline, and data systems catalogued by the ANA that include an assessment focus (i.e., the International Classification for Nursing Practice Beta Version and seven ANA-recognized SNLs).23–25,27–33 At a high level of ordering, the terminological representations are variously classified as diagnosis, intervention, outcome, problem, and goal. At lower levels of ordering, the terminological representations are represented as a physiologic care pattern, tissue compression focus, human response pattern of exchanging, physiologic complex domain, physiologic health domain, and functional health domain.

Table 1

References to the Concept of Pressure Ulcer across Selected Terminological Representations

An automated search of the Metathesaurus using the string “pressure ulcer” with the UMLS Knowledge Source server identified only one ANA-recognized source vocabulary, the PCDS. (SNOMED RT, another ANA-recognized vocabulary, includes the term “decubitus ulcer” but was not included in the UMLS at the time of this search.) This finding, using the Metathesaurus, reflects not only different levels of concept abstraction but also the different definitional attributes or semantics of the concept associated with the different categorical structures across SNLs. A recent evaluation of a nursing activity type definition (delivery mode, activity, focus, and recipient) demonstrates how the categorical structure of a reference terminology could accommodate the representation of data across different organizing structures of SNLs.34
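One way to picture such a type definition is as a record with four slots. The sketch below assumes only the four attributes named above (delivery mode, activity, focus, and recipient). The class name, the comparison rule, and every value shown are hypothetical and are not taken from the cited evaluation or from any standardized nursing language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NursingActivity:
    """Hypothetical four-slot type definition for a nursing activity."""
    activity: str
    focus: str
    recipient: str = "patient"
    delivery_mode: Optional[str] = None  # None: not specified by the source term


def same_core_meaning(a: NursingActivity, b: NursingActivity) -> bool:
    """Treat two source terms as comparable when activity, focus, and
    recipient agree; delivery mode is compared only if both specify it."""
    if (a.activity, a.focus, a.recipient) != (b.activity, b.focus, b.recipient):
        return False
    if a.delivery_mode is None or b.delivery_mode is None:
        return True
    return a.delivery_mode == b.delivery_mode


# Two made-up source terms, decomposed by hand for illustration.
term_from_system_a = NursingActivity(activity="assess", focus="skin integrity")
term_from_system_b = NursingActivity(
    activity="assess", focus="skin integrity", delivery_mode="direct"
)
```

Under these assumptions, two terms that sit in different taxonomies and at different levels of detail can still be compared slot by slot, which is the kind of accommodation a shared categorical structure is meant to provide.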

These three approaches to organizing and representing knowledge independently serve nursing but lack integration and cross-validation. Clearly, multiple structured domain models must be supported in any valid reference terminology.

Similarly, a mix of technologies is required to meet the varying needs of health care organizations, and these independent systems need to exchange data and information. Various approaches to structuring information are evident. It has been suggested that a high-level schema or reference information model such as that proposed by Health Level 7, based on atomic classes, will support the interoperability of the various lower-level information models.35 As with structures in domain models, however, the structure of information must be considered in relation to the organization and representation of nursing knowledge.

Approaches to Structuring Information

Three broad approaches to structuring information in electronic systems are particularly important: clinical information systems that are designed primarily to support individual clinicians' needs for data, information, and knowledge; minimum data sets that are designed primarily to support the common representation of data shared among disparate users and systems; and standardized messaging formats that are designed primarily to standardize messaging between applications. All are concerned with the use of terminological data and information to support key clinical and business processes. However, the extent to which the nursing domain models in research, practice, and standardized nursing languages are evident in the different approaches to information structure is largely undescribed.

Clinical Information Systems

One purpose of a clinical information system (CIS) is to serve as a means of explicit communication between nurses or between nurses and other health care providers. The organization of the CIS is therefore ideally designed to support the indexing and retrieval of key data and information in ways that are known to support patient care. However, no common structure is evident across vendors or organizations. The nursing process is often represented through various structured approaches to information, such as care plans, flow sheets, clinical pathways, and focused assessment tools. Each approach adds a semantic aspect to the data recorded in the form, but evidence of how useful these structured representations are in clinicians' decision-making processes is limited. Required documentation for a specific structured approach is often perceived as a “chore.” Furthermore, informal information systems, such as the audiotapes and sheets of paper used to detail information received in change-of-shift reports, often contain highly relevant data and information needed to support patient care, which never make their way into the CIS.36

Clinical nursing information systems have been the subject of discussion for a number of years. Generally, there is consensus that a comprehensive nursing information system should support clinical decision-making; improve communication; enable nurses and patients to plan, organize, coordinate, and control care; improve the effectiveness and efficiency of care; stimulate and support clinical research; support financial and other assessments of nursing's contribution to health care; support policy making; produce management summaries; and enable and facilitate organizational change to improve care.2,37 While an emphasis on multidisciplinary information systems is generally noted and well placed, it is difficult to participate in multidisciplinary efforts without a clear understanding of the disciplinary issues and requirements.

There is not yet a consensus around the content and structure needed to support approaches that represent nursing domain knowledge in the CIS. Different approaches to structuring information are likely to be required to meet the different uses listed above. The critical point is that information structures must retain fidelity to the underlying domain knowledge and, most importantly, to the models of those who are commonly acknowledged as sources of knowledge generation—clinicians and researchers.

Minimum Data Sets

Data sets are often developed to facilitate the computerized recording, storing, and transmission of data across organizational and geographic boundaries. They are called minimum because they represent the fewest points believed necessary to capture and convey the essence of required information. The Resident Assessment Instrument (RAI) is an example of a very extensive minimum data set, which is used by the Health Care Financing Administration and by state agencies for reimbursement and quality monitoring in long-term care facilities.38 In addition to the highly formatted information model of the document, clinical agencies are required to retain an “auditable trail” in the agency's clinical notes. Complaints from the industry center on the use of scarce registered nurse resources for this purpose. Other examples of national minimum data set initiatives of relevance to nursing are summarized by the ANA.22

While a desirable goal is to have such data sets generated directly from the clinical record, methods to support this have not yet been fully developed. Thus, while automated data storage and transmission are possible, clinicians are still required to enter the data in minimum data sets in addition to clinical data in the CIS. The growing disconnect between such data sets and the data needed for clinical care is a cause of considerable concern and frustration.

Metadata registries that define the common knowledge shared by the recipient and the originator of the data are proposed as a partial solution to expedite access and use of different data and different formats. The goal is to develop standards and guidelines for the semantic content and syntax of shared units of data. A metadata registry would thus include information about naming, identification, classification, representation, physical structure, how and when information is acquired, intended uses of the information, and who is responsible for the data and the metadata.39 For example, a joint technical committee of the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC JTC1/SC32 WG2) has prepared a standard currently under review (ISO/IEC 11179) that identifies mandatory, conditional, and optional attributes for the description of data elements.40 Included are:

  • Unique numeric identifiers registered in a data registry

  • Well-formed definitions of the data elements

  • Representation of allowed data values

  • Enumerated domains specified by allowed values

  • Reusable domains rigorously defined and specified as generic data elements
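The attributes listed above can be pictured as fields of a registry record. The following is a minimal sketch and not an implementation of ISO/IEC 11179: the class names, the numeric identifier, and the permitted values are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ValueDomain:
    """A reusable, enumerated domain: permitted value -> meaning."""
    name: str
    permitted_values: Dict[str, str]


@dataclass
class DataElement:
    """One registered data element with its definition and value domain."""
    identifier: int  # unique numeric identifier within the registry
    name: str
    definition: str
    value_domain: ValueDomain

    def is_valid(self, value: str) -> bool:
        return value in self.value_domain.permitted_values


class Registry:
    """Holds data elements and refuses duplicate identifiers."""

    def __init__(self):
        self._elements: Dict[int, DataElement] = {}

    def register(self, element: DataElement) -> None:
        if element.identifier in self._elements:
            raise ValueError(f"identifier {element.identifier} already registered")
        self._elements[element.identifier] = element

    def get(self, identifier: int) -> DataElement:
        return self._elements[identifier]


# Hypothetical content: the identifier and the staging values are illustrative.
stage_domain = ValueDomain(
    name="ulcer stage",
    permitted_values={"1": "stage 1", "2": "stage 2", "3": "stage 3", "4": "stage 4"},
)
registry = Registry()
registry.register(
    DataElement(
        identifier=1001,
        name="pressure ulcer stage",
        definition="Stage assigned to a pressure ulcer at assessment.",
        value_domain=stage_domain,
    )
)
```

The same `ValueDomain` object could be attached to more than one data element, which is the sense in which a domain is reusable in this sketch.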

Metadata and terminology on the surface appear to be functionally similar, but important distinctions exist. Terminology emphasizes the semantics and the conceptual structure of the discipline. Metadata emphasizes syntax and the mapping of data structures into computer systems. Accordingly, a metadata model can reference the terminology, but a terminology cannot necessarily be constructed from a metadata model.

Standardized Messaging Formats

The third broad type of information structure is the electronic format by which messages are sent and received. Health Level 7 (HL7) is a standards development organization emphasizing methods to standardize messaging between applications, with the goal of making systems interoperable. Standardization is intended to proceed in an iterative cycle, whereby domain analysis informs message design, message design informs message specifications, message specifications drive requirement analysis, and these are examined against domain analysis. It is believed that a continuous refinement or “harmonization” of message content and message structure will thus evolve.35,41 To this end, HL7 has identified the following requirements of an information model in support of messaging:

  • Provides precise definitions for the information from which the data content of HL7 messages are drawn

  • Follows object-oriented modeling and diagramming techniques

  • Provides a means for expressing and reconciling differences in data definition independent of the message structure

  • Forms a shared view of the information domain used across all HL7 messages

Among the model sets that define the HL7 Message Development Framework is the Reference Information Model (RIM). The purpose of the RIM is “to specify the conceptual data model of the clinical domain and to identify the object life-cycle events that require communication.”42 The conceptual data model is not the same as the conceptual domain model. Terminologies that make explicit the domain model reside in external coding systems and “populate” the data models supported by the RIM. Of particular interest to nursing is the clinical portion of the RIM, the Unified Service Action Model (USAM). Assuming a robust reference terminology model, any information model should be sufficiently structured to form clinically meaningful statements that are consistent with the knowledge representations in a reference terminology. The present difficulty is that the division between a reference terminology model and a reference information model is not clearly delineated.
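The arrangement described above, in which terminologies reside in external coding systems and populate the data model, can be sketched as follows. This is not the RIM or the USAM. The class names, the coding system label, the code, and the patient identifier are hypothetical placeholders chosen only to show where structure ends and terminology begins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedConcept:
    """A reference to a concept held in an external terminology."""
    code_system: str
    code: str
    display: str


@dataclass(frozen=True)
class Observation:
    """An information-model statement: structure only, meaning via codes."""
    subject_id: str
    concept: CodedConcept
    value: str


def render(obs: Observation) -> str:
    """Produce a readable statement from the structured record."""
    return f"{obs.subject_id}: {obs.concept.display} = {obs.value}"


# Hypothetical coding system and code, for illustration only.
ulcer_stage = CodedConcept("EXAMPLE-TERMINOLOGY", "X0001", "pressure ulcer stage")
obs = Observation("patient-42", ulcer_stage, "stage 2")
```

In this sketch the `Observation` class knows nothing about what a pressure ulcer is; that knowledge would live in the terminology that `code_system` points to. Deciding which attributes belong on each side of that line is the delineation problem noted above.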


The nursing vocabulary summit participants were challenged to delineate approaches to represent and model the data, information, and knowledge relevant to the domain. We have argued that multiple embedded structures must be accommodated in any reference terminology or information model that deals with the use of lexical elements to represent concepts. The clinical concept of pressure ulcer was presented as an example. While all the approaches to representing and modeling nursing knowledge provide a potentially rich resource for the range of needed concepts, levels of abstraction, definitions, and relationships, much work is needed on the development and evaluation of inclusive reference terminology and information models.

Major initiatives are required to develop and refine a concept-based reference terminology from which clinical information systems, data sets, and messaging formats can model meaningful statements and conversions of data, information, and knowledge across applications. These statements must accurately reflect very specific approaches to structured domain knowledge. Many details need to be worked out at lower levels of both terminology and information models.

Recommendations are to fully identify the polyhierarchies and categorical structures required in a reference terminology; systematically evaluate the extent to which approaches to structured information accurately and completely represent domain knowledge; and modify or extend existing multidisciplinary efforts. Small-scale, focused projects may be particularly useful in identifying issues such as term equivalence, level of abstraction, and modifications required to map across terminologies. For example, our group is identifying ways to represent and model specific clinical conditions such as pressure ulcers, postoperative pain, and functional status across the research vocabulary, clinical vocabulary, and various data sets.

This paper focuses on only one aspect of the many challenges that arise in developing convergence of reference terminology and information models for nursing. We have not discussed important issues such as access to the automated tools required for constructing and evaluating reference models. As in other areas of clinical system development, combined efforts are likely to be more successful than isolated individual efforts. The vocabulary summit provided an opportunity to begin working together to meet such goals.


  • This work was supported in part by grant LM 07041-15 from the National Library of Medicine and by the Mayo Division of Nursing Research.

