Data from clinical notes: a perspective on the tension between structure and flexible documentation

S Trent Rosenbloom, Joshua C Denny, Hua Xu, Nancy Lorenzi, William W Stead, Kevin B Johnson
DOI: http://dx.doi.org/10.1136/jamia.2010.007237; pages 181–186; first published online 1 March 2011


Clinical documentation is central to patient care. The success of electronic health record system adoption may depend on how well such systems support clinical documentation. A major goal of integrating clinical documentation into electronic health record systems is to generate reusable data. As a result, there has been an emphasis on deploying computer-based documentation systems that prioritize direct structured documentation. Research has demonstrated that healthcare providers value different factors when writing clinical notes, such as narrative expressivity, amenability to the existing workflow, and usability. The authors explore the tension between expressivity and structured clinical documentation, review methods for obtaining reusable data from clinical notes, and recommend that healthcare providers be able to choose how to document patient care based on workflow and note content needs. When reusable data are needed from notes, providers can use structured documentation or rely on post-hoc text processing to produce structured data, as appropriate.

  • Medical informatics applications
  • medical informatics computing
  • medical records systems, computerized
  • support, US Gov't, P.H.S.
  • user-computer interface


The process and products of documenting clinical care occupy a critical intersection among the diverse domains of patient care, clinical informatics, workflow, research, and quality. For the current manuscript, we define clinical documentation as the process of creating a text record that summarizes the interaction between patients and healthcare providers during clinical encounters. Clinical documents produced in this process may include notes from outpatient visits, inpatient admissions and discharges, procedures, protocols, and test results. Healthcare providers generate clinical documents to achieve numerous goals, including: to create narrative reports of their observations, impressions, and actions related to patient care; to communicate with collaborating healthcare providers; to justify the level of service billed to third-party payers; to create a legal record in case of litigation; and to provide data to support clinical research and quality-assessment programs.1–9 Increasingly widespread adoption of electronic health record (EHR) systems has facilitated reuse of clinical documents for research, quality initiatives, and automated decision support, among other uses.10–15 A major emphasis in integrating documentation systems with EHR systems is to increase the availability of structured clinical data for automated downstream processes. This has led to a profusion of computer-based documentation (CBD) systems that promote real-time structured clinical documentation.

The myriad requirements imposed on clinical documentation compel healthcare providers to create notes that are simultaneously accurate, detailed, reusable, and readable.11,16–20 As a result, integrating clinical documentation into workflows that contain EHR systems has proven a challenge.18,20–29 The note characteristics healthcare providers value, the structure and standardization that data reuse requires, and the attributes of the various documentation methods interact in complex ways, and each affects how readily a documentation method is adopted into clinical workflows. A CBD method's flexibility, which gives healthcare providers freedom and helps ensure accuracy, can directly conflict with the desire to produce structured data that support reuse of the information in EHR systems. The challenge is compounded when those implementing CBD systems have priorities different from those of clinician users working in busy settings. In particular, we have previously demonstrated that healthcare providers prefer being able to balance a standardized note structure against the flexibility of expressive narrative text.19,25

This viewpoint paper explores the tension between the requirements that documentation methods support both structure and expressivity, reviews two general approaches for obtaining structured data through clinical documentation, and offers our perspective on how best to use CBD systems in a clinical environment.

Tension between structure and expressivity

There exists a tension between the needs of busy healthcare providers documenting clinical care and those of people reusing data from healthcare information systems. Busy clinicians generally value flexibility and efficiency, while those reusing data often value structure and standardization.21,22,30–32 In investigations evaluating healthcare providers' impressions of documentation systems, subjects stated that the documentation methods they use should promote the quality and expressivity of the notes they generate and should integrate efficiently into busy workflows.25,32

Expressivity has been defined as how well a note conveys the patient's and provider's impressions, reasoning, and thought process; level of concern; and uncertainty to those subsequently reviewing the note.25,32 Expressivity refers to the linguistic nuance necessary for describing aspects of the patient encounter using any words or phrases that the healthcare provider deems appropriate, at whatever length is necessary. An expressive documentation method, for example, may allow a healthcare provider to create notes using nuanced words or pictures that capture the flavor of the clinical encounter. Healthcare providers may rely on expressivity to convey (1) patients' linguistic and narrative idiosyncrasies, (2) their level of concern or acuity, (3) the appearance of competence, (4) the provider's degree of uncertainty, and (5) the unique aspects of the clinical case that distinguish it from other similar cases. Although healthcare providers value documentation methods that support narrative expressivity, few studies of CBD systems published in the biomedical literature have directly evaluated this attribute, and its value may vary with the type of note being written. Among existing studies, investigators have demonstrated that, compared with highly structured diagnostic or impressions data, clinical notes containing naturalistic prose have been more accurate,33 more reliable for identifying patients with given diseases,34 and more understandable to healthcare providers reviewing patient records.35,36 In addition, numerous structured documentation systems include components that generate natural prose notes to increase their acceptance,35,37–39 which implies that natural-language text has value to clinical users.

While the value of expressivity remains difficult to quantify, there is value in having access to reusable structured data from clinical notes.18,40–42 Unfortunately, systems optimized to acquire structured data from healthcare providers often have user interfaces that are idiosyncratic, inflexible, or inefficient, and thus place the burden of entering the data in a structured format on a busy healthcare provider, rather than leveraging specific computer programs to extract the data from the human-input clinical narrative.23

Healthcare providers using EHR systems have two major methods for converting their observations and impressions of patient care episodes into machine-computable, reusable data. With the first method, healthcare providers directly create structured clinical notes using specialized CBD systems that capture structured data in real time. With the second method, healthcare providers document patient care episodes using relatively unstructured approaches, such as dictation with transcription or typing into a loosely templated CBD system, and then apply computer programs designed to extract clinical data from the notes' text. These approaches are reviewed in greater detail below.

Using structured documentation tools

Clinical structured entry systems are specialized CBD systems that emphasize capturing structured (ie, conforming to a predefined or conventional syntactic organization) and/or standardized (ie, conforming to a predefined semantic standard) clinical data during standard documentation processes.43 Users document in structured entry systems by retrieving concepts and assigning them a status (eg, locating in a ‘knee pain’ template the concept ‘knee effusion’ and selecting ‘absent’). Selectable concepts may come from sophisticated reference terminologies,44,45 from specialized interface terminologies,32 or from informal and nonstandardized term collections. Structured entry systems may allow users and developers to create highly customized templates to maximize data completeness and structure.32 However, such templates may not easily use existing interface terminologies or conform to knowledge representation formalisms.
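To make this entry model concrete, the following minimal Python sketch represents a template as a set of concepts, each of which the user can assign a status. The class names, the three status values, and the rule that only concepts already in the template may be recorded are illustrative assumptions, not a description of any system cited here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class FindingStatus(Enum):
    """Hypothetical status values a user might assign to a template concept."""
    PRESENT = "present"
    ABSENT = "absent"
    NOT_ASSESSED = "not assessed"


@dataclass
class TemplateItem:
    concept: str                 # display term, eg 'knee effusion'
    code: str | None = None      # optional code from a reference terminology (hypothetical)
    status: FindingStatus = FindingStatus.NOT_ASSESSED


@dataclass
class Template:
    name: str
    items: dict[str, TemplateItem] = field(default_factory=dict)

    def add_concept(self, concept: str, code: str | None = None) -> None:
        self.items[concept] = TemplateItem(concept, code)

    def record(self, concept: str, status: FindingStatus) -> None:
        # Structured entry only accepts concepts the template already contains.
        if concept not in self.items:
            raise KeyError(f"{concept!r} is not in template {self.name!r}")
        self.items[concept].status = status

    def structured_data(self) -> list[tuple[str, str]]:
        # Only assessed findings become reusable (concept, status) pairs.
        return [(i.concept, i.status.value) for i in self.items.values()
                if i.status is not FindingStatus.NOT_ASSESSED]


if __name__ == "__main__":
    knee = Template("knee pain")
    knee.add_concept("knee effusion")
    knee.add_concept("joint line tenderness")
    knee.record("knee effusion", FindingStatus.ABSENT)
    print(knee.structured_data())  # [('knee effusion', 'absent')]
```

Raising an error for a concept outside the template is a deliberate simplification; many systems described in the literature instead let the user fall back to narrative text.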

Over the last half-century, investigators, developers, and pioneers in the field of biomedical informatics have developed numerous structured entry systems. As we reported elsewhere,46 structured entry systems described in the biomedical literature include groundbreaking documentation systems by Slack,47,48 work by Ledley on documenting radiology reports,49 and systems by Stead and Hammond that allowed patients to enter their own histories and that ultimately evolved into the TMR medical record system.50–53 Other early structured entry systems include Barnett's COSTAR system,54 Weed's PROMIS system,55 and Wirtschafter's56 and Shortliffe's57 independent work on chemotherapy documentation and decision-support systems. Additional structured entry systems have been described, including ARAMIS, which served as a user interface for collecting data about patients with rheumatoid arthritis58,59; an endoscopy documentation tool called CORI60,61; Musen's T-HELPER system62 for HIV-related clinical trial and medical care documentation; Shultz's QUILL system38,46,63; Johnson's pediatric documentation system, Clictate64; and several structured entry systems in the government and private sectors, including products by Cerner, Eclipsys, EpicCare, and the Department of Veterans Affairs.15,65–68 Many of these structured documentation systems also allow users to enter narrative text when they cannot find appropriate structured concepts.

While numerous structured entry systems have been developed, deployed, and described in the biomedical literature, there are limited data demonstrating ongoing adoption or widespread dissemination of structured entry outside of niche settings or clinical domains. Research evaluating why structured entry usage is limited remains sparse, but includes qualitative work by McDonald,21 Ash,69 and Johnson.16,64 These studies suggest that structured entry systems can have complex interfaces that slow the user down. In addition, structured entry tends to be inflexible when a documentation template does not contain a needed item, and it may not fully integrate with other EHR system components. Johnson's studies in particular demonstrated that structured entry users believed that the system helped them comply with clinical guidelines; although the system increased documentation time compared with paper-based forms, it did not decrease clinician or patient satisfaction.16,64 Additional studies identified specific attributes of structured entry systems that can attenuate their efficiency, integration, user interface navigation, and overall user satisfaction.70–73 Structured documentation systems using guideline-based templates may help healthcare providers to be thorough.64,74,75

Using flexible documentation tools with text processing

With flexible documentation, healthcare providers record patient care episodes using relatively unstructured approaches, such as dictation with transcription, speech-recognition software, or typing into a loosely templated CBD system. Once the clinical documentation is complete, post-hoc text-processing algorithms can be used to produce structured data. We use the term ‘text processing’ to describe any of a number of methods designed to identify specific text, data, and concepts in the natural language stored in unstructured, narrative-text computer documents.76–81 Computer programs can then deduce the concepts contained in the notes' text in a subsequent step. Investigators have worked for decades to convert natural-language documents into structured representations.76–80 Text-processing technologies have been developed with differing levels of machine ‘understanding,’ from simple systems that search for key words or specific text strings82–88 to systems that attempt to capture clinical concepts along with their context.76,80,89–100 Text-processing tools can also serve as adjuncts to structured documentation systems, as in the ‘structured narrative’ approach recently described by Johnson.101

A basic approach to text processing involves searching documents in EHR systems for key narrative-text strings or string patterns. Narrative-text string searching has been employed successfully in some large-scale clinical research studies.82–85 For example, researchers have used plain-text searches of dictated or typed medical records to find rare physical exam findings82 and post-operative infections,85 and to identify patients with certain types of drug-induced liver injury.84 Others have employed focused ‘regular expression’ pattern matches to extract text strings that may represent blood pressures83 and common section header labels (eg, ‘chief complaint’) in clinical notes,86 and to remove patient identifiers such as names, phone numbers, and addresses to deidentify medical records.87 These methods are best suited to focused problems. Because string-matching algorithms can be highly tuned for a given task (eg, by including manually derived synonyms, common abbreviations, and even misspellings), they can perform very well. Simple text searching also offers faster processing and easier implementation than complex natural-language processing (NLP) systems, since many off-the-shelf database systems and text-indexing tools support preindexed text queries. However, simple text searching lacks generalizability and requires substantial customization for each new task. For example, finding all patients with liver injury from a single medication (such as phenytoin) is easily accomplished with a simple text search, but finding all medications that may be associated with liver injury requires a more complicated system that matches text against a controlled vocabulary including all medication brand and generic names. In addition, problems common in directly entered clinical notes, such as misspellings102 and ambiguous abbreviations (eg, ‘pt’ can stand for ‘patient’ or ‘physical therapy’),103 also limit the use of simple string-matching methods.
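The string-search and regular-expression methods described in this paragraph can be sketched in a few lines of Python using the standard `re` module. The term list, the blood pressure pattern, and the three header labels are hypothetical examples chosen for illustration; they are not the patterns used in the cited studies, and any real pattern would need task-specific tuning and validation.

```python
from __future__ import annotations

import re

# Hypothetical hand-built term list for one focused question (a single drug):
# generic name, a brand name, and two misspellings invented for this example.
PHENYTOIN_TERMS = ["phenytoin", "dilantin", "phenytoine", "phenytion"]

# Systolic/diastolic pair such as "BP 132/84" or "blood pressure: 118 / 76".
# The 2-3 digit ranges are a loose illustrative assumption.
BP_PATTERN = re.compile(
    r"\b(?:BP|blood pressure)\s*:?\s*(\d{2,3})\s*/\s*(\d{2,3})\b",
    re.IGNORECASE,
)

# Three example section header labels appearing at the start of a line.
HEADER_PATTERN = re.compile(
    r"^\s*(chief complaint|history of present illness|assessment and plan)\s*:",
    re.IGNORECASE | re.MULTILINE,
)


def mentions_any(text: str, terms: list[str]) -> bool:
    """Case-insensitive whole-word search for any term in the list."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(t.lower())}\b", lowered) for t in terms)


def blood_pressures(text: str) -> list[tuple[int, int]]:
    """Return every (systolic, diastolic) pair matched by BP_PATTERN."""
    return [(int(s), int(d)) for s, d in BP_PATTERN.findall(text)]


def section_headers(text: str) -> list[str]:
    """Return matched header labels in document order, lower-cased."""
    return [label.lower() for label in HEADER_PATTERN.findall(text)]


if __name__ == "__main__":
    note = ("Chief Complaint: seizure\n"
            "History of Present Illness: on Dilantin, BP 132/84.\n"
            "Assessment and Plan: check drug level")
    print(mentions_any(note, PHENYTOIN_TERMS))  # True
    print(blood_pressures(note))                # [(132, 84)]
    print(section_headers(note))
```

Extending this to all medications associated with liver injury would mean replacing the single hand-built list with a controlled vocabulary of drug names, which is the generalizability cost the paragraph describes.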

A more complex approach, called concept identification or concept indexing, attempts to normalize text phrases to standardized terms representing concepts. Successful concept indexing systems include those by Miller and Cooper,89 MetaMap,76 SAPHIRE,90 the KnowledgeMap concept identifier,80,91 the Multi-threaded Clinical Vocabulary Server,92,93,104 and the recently released clinical Text Analysis and Knowledge Extraction System (cTAKES).105 A goal of concept indexing is to ‘understand’ the information in natural-language documents by mapping text to standardized concepts using terminologies such as those in the Unified Medical Language System (UMLS).106 These systems have proven effective, mapping natural-language text to concepts with recall and precision often exceeding 80% for general tasks and approaching perfection for some highly focused tasks.80,95,107,108
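Concept indexing can be illustrated, in greatly simplified form, as a dictionary lookup that maps variant phrases to a shared concept identifier. The following Python sketch performs a greedy longest match over word n-grams against a tiny hypothetical lexicon; the concept identifiers are placeholders rather than real UMLS codes, and the systems named above use far richer normalization and disambiguation. It also shows how precision and recall are computed against a manually annotated reference set.

```python
from __future__ import annotations

import re

# Hypothetical mini-lexicon: surface phrases -> concept identifiers.
# The identifiers are placeholders, not real UMLS concept codes.
LEXICON = {
    "chest pain": "CONCEPT_CHEST_PAIN",
    "chest discomfort": "CONCEPT_CHEST_PAIN",
    "shortness of breath": "CONCEPT_DYSPNEA",
    "sob": "CONCEPT_DYSPNEA",
    "dyspnea": "CONCEPT_DYSPNEA",
    "myocardial infarction": "CONCEPT_MI",
    "mi": "CONCEPT_MI",
}
MAX_PHRASE_LEN = max(len(p.split()) for p in LEXICON)


def index_concepts(text: str) -> set[str]:
    """Greedy longest-match lookup of token n-grams against LEXICON."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found, i = set(), 0
    while i < len(tokens):
        # Try the longest candidate phrase first so 'chest pain' beats 'chest'.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                found.add(LEXICON[phrase])
                i += n
                break
        else:
            i += 1  # no phrase starts here; move to the next token
    return found


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |gold| (0.0 if a set is empty)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    found = index_concepts("Pt reports chest discomfort and SOB; hx of MI.")
    print(sorted(found))  # ['CONCEPT_CHEST_PAIN', 'CONCEPT_DYSPNEA', 'CONCEPT_MI']
    gold = {"CONCEPT_CHEST_PAIN", "CONCEPT_DYSPNEA"}  # hypothetical annotation
    print(precision_recall(found, gold))
```

Because ‘mi’ and ‘sob’ are matched as bare abbreviations, this lookup inherits the ambiguity problem described for simple string matching.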

Investigators have extended concept identification systems by combining them with additional algorithms that extract contextual elements associated with a concept (eg, certainty, values, and temporal information), forming more robust NLP systems. By combining concept identification with negation detection algorithms (eg, recognizing that ‘no chest pain’ indicates the absence of the finding ‘chest pain’), investigators have created systems that automatically generate problem lists,109 discover gene–disease associations,110 hypothesize new drug effects111 and new drug–drug interactions from the biomedical literature,96 and identify important findings in clinical narratives.97,98 Building on the work of the Linguistic String Project,79 Friedman and colleagues have been developing the MedLEE (Medical Language Extraction and Encoding System) NLP system since the 1990s.78,95,112 MedLEE can identify UMLS concepts in clinical documents and can discern their timing and negation status. Researchers from several institutions have used MedLEE to identify pneumonia and other clinical conditions from chest x-ray reports,99 detect adverse events,112 and automatically calculate the Charlson comorbidity index.113 Several of these systems have been successfully incorporated into production clinical systems, some at multiple institutions.78,95,109,112,114,115
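Negation detection can be approximated with a trigger-and-window rule in the spirit of trigger-based algorithms such as NegEx: a finding is treated as negated when a phrase such as ‘no’ or ‘denies’ appears shortly before it. The Python sketch below is a deliberately simplified illustration; the trigger list, the scope-breaking words, and the five-token window are assumptions, and it is not the method used by MedLEE or any other system cited here.

```python
from __future__ import annotations

import re

# Hypothetical pre-negation triggers; real detectors use much larger curated lists.
NEG_TRIGGERS = ["no", "denies", "without", "negative for"]
# Words assumed to end a negation's scope, as in "no fever but chest pain".
SCOPE_BREAKERS = {"but", "however", "although"}
WINDOW = 5  # max tokens between trigger and finding (arbitrary choice)


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens; punctuation and digits are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def is_negated(sentence: str, finding: str) -> bool:
    """True if any mention of `finding` is preceded, within WINDOW tokens and
    with no scope breaker in between, by a negation trigger."""
    tokens, target = tokenize(sentence), tokenize(finding)
    n = len(target)
    if n == 0:
        return False
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] != target:
            continue
        window = tokens[max(0, start - WINDOW):start]
        # Keep only the tokens after the last scope breaker, if there is one.
        for j in range(len(window) - 1, -1, -1):
            if window[j] in SCOPE_BREAKERS:
                window = window[j + 1:]
                break
        # Pad with spaces so multi-word triggers match on whole-token boundaries.
        joined = " " + " ".join(window) + " "
        if any(f" {t} " in joined for t in NEG_TRIGGERS):
            return True
    return False


if __name__ == "__main__":
    print(is_negated("Patient denies chest pain.", "chest pain"))             # True
    print(is_negated("No fever, but reports chest pain.", "chest pain"))      # False
    print(is_negated("Chest pain radiating to the left arm.", "chest pain"))  # False
```

The fixed window is a blunt instrument that misses negations spanning many words, so this sketch should be read only as a picture of the idea, not as a usable detector.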

The uses of NLP in the clinical domain are diverse, including encoding clinical notes for billing purposes,95,116 facilitating clinical research by automating data extraction,110,117,118 conducting EHR-based surveillance,97,119 and enriching EHR functionality, such as by supporting visualization tools120 and clinical decision-support systems.121,122 One of the earliest examples is the MedLEE system,123 which has been used since 1995 to process chest radiograph reports and generate coded data for decision-support systems at New York Presbyterian Hospital. Recently, Day and colleagues121 reported using the MPLUS NLP system to classify trauma patients at a Level 1 trauma center on a daily basis. For specific tasks such as those mentioned above, advanced NLP systems such as MedLEE have shown performance equivalent to that of domain experts. However, the performance of current clinical NLP systems is still not satisfactory for broader uses. In addition, few NLP systems have been implemented in clinical settings and used in routine hospital workflows. For example, Kashyap et al reported a study using a commercial product to automatically structure admission notes, in which the results were not judged acceptable for clinical use.124 With further improvements in NLP technologies and new structures for clinical notes (such as the ‘structured narrative’),101 NLP may become able to structure clinical text such as admission notes automatically with satisfactory performance.


The choice of a documentation method can alter the balance between expressivity and structure in the resulting notes, hamper the healthcare provider's workflow, influence the process and products of recording clinical information, and affect how well the note can be incorporated into an EHR system so that its contents can be automatically reused and analyzed.15,19,26,36,70,101 While structured documentation systems can facilitate data collection and reuse, they can be cumbersome to use during patient encounters and may lack the flexibility and expressivity required for general medical practice. Transcribed notes produce documents suited to text processing, but the transcription process introduces a delay. The attributes associated with each documentation method influence how the method is best used and how readily it is adopted. While structured entry emphasizes data standardization and structure, human adoption of CBD systems requires an emphasis on expressivity, efficiency, flexibility, and fit with typical workflows.19,22,27

Both structured clinical documentation and text-processing algorithms for flexible documentation continue to evolve. The rates of evolution may differ, but we are not aware of clear evidence that one approach is improving faster than the other. Furthermore, we do not intend to convey the impression that one is necessarily more advanced or holds greater promise than the other. Each approach offers distinct advantages to the user. For example, because structured documentation systems can do more than just create data, they may be relatively appealing to clinicians in certain settings. A template-based structured documentation system may be useful for notes that have standard and predictable content and format, such as those recording pediatric health maintenance visits, preoperative evaluations, formalized disability examinations, and reviews of systems. By contrast, in some settings, clinicians may prefer the flexibility of complex and nuanced narrative text to document the history of present illness or a diagnostic impression.

We collectively have over 10 decades of personal experience viewing or using a wide variety of computer-based documentation systems around the country in public, private, and academic settings. We gained this experience through actual clinical use, observations during site visits, and direct reviews of vendor products at trade shows and professional conferences. Based on this experience, we have observed that, for a given site's implementation, computer-based documentation systems are usually configured to accept clinical input primarily in narrative or structured form, but not both. Most computer-based documentation systems that the authors have seen do not support hybrid documentation in the way that we recommend in the current manuscript. We note that, where structured clinical data are needed, information entered using structured entry would be immediately available, whereas narrative text would require processing via NLP algorithms to become similarly ‘structured.’ Nevertheless, in our experience, few nonacademic sites have NLP-based text-processing systems readily available to harvest EHR system data, with the possible exception of NLP-type systems that help extract billing codes from clinical records.

Priorities for structured documentation and text processing will likely vary in different settings and with different user groups, and the priorities may be informed by the tasks the documentation supports. Certain tasks may require that the EHR systems understand only structural information from the clinical document (eg, its metadata or header information), while others may require a deeper concept-level understanding. For example, a quality-assessment program evaluating whether inpatient progress notes are being placed in the chart in a timely manner may require only the document title and date. Any document that has the correct title, including scanned and tagged paper-based notes, could support this need. Tasks that depend on a deeper understanding of the document may require healthcare providers to document using structured documentation tools or to apply computer programs to process the natural language in narrative-text documents. For example, to implement an automated colorectal cancer screening advisor, the advisor system will need to know the patient's family and past medical history, whether a prior screening test had occurred, and the result of any screening tests. Gathering these data requires that healthcare providers document the relevant information using tools that can support data extraction and analysis, including CBD systems, structured entry tools, or narrative-text typing (either directly or via a dictation/transcription model) with a subsequent application of text processing. In addition, different healthcare providers may value structured data in EHR systems differently. Some providers may be willing to sacrifice a degree of CBD usability or efficiency for the sake of having key elements of their notes available immediately as structured data in EHR systems. Others may preferentially value documentation methods that allow them to express their impressions fluidly or create notes more quickly, without regard to how well the methods support other functions of EHR systems.
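The progress-note timeliness example needs nothing beyond document metadata. The Python sketch below assumes a hypothetical rule (at least one document titled ‘Progress Note’ for each hospital day before discharge) and checks it using only each document's title and filing date, so a scanned paper note tagged with the right title satisfies it just as a typed note would. The class, function, and rule are illustrative assumptions, not an established quality measure.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import NamedTuple


class DocumentHeader(NamedTuple):
    """Only the metadata the check needs; the note body is never read."""
    title: str
    filed_on: date


def days_missing_progress_note(headers: list[DocumentHeader], admit: date,
                               discharge: date,
                               title: str = "Progress Note") -> list[date]:
    """Return hospital days (admission day up to, but not including, the
    discharge day) on which no document with the expected title was filed.
    The one-note-per-day rule is a hypothetical quality criterion."""
    wanted = title.strip().lower()
    covered = {h.filed_on for h in headers if h.title.strip().lower() == wanted}
    missing = []
    day = admit
    while day < discharge:
        if day not in covered:
            missing.append(day)
        day += timedelta(days=1)
    return missing


if __name__ == "__main__":
    headers = [
        DocumentHeader("Progress Note", date(2011, 3, 1)),
        DocumentHeader("progress note ", date(2011, 3, 3)),  # scanned, hand-tagged
        DocumentHeader("Discharge Summary", date(2011, 3, 4)),
    ]
    print(days_missing_progress_note(headers, date(2011, 3, 1), date(2011, 3, 4)))
    # [datetime.date(2011, 3, 2)]
```

By contrast, the colorectal screening advisor could not be built this way, because it needs facts from inside the note body rather than from its header.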


Given the tension between structure and flexible documentation, those implementing EHR systems should assume that multiple CBD products will be needed to meet the needs of clinician users, rather than attempting to find a single best documentation method. Factors to consider in selecting CBD products include the method's fitness for a given workflow, the content of the resulting notes, the time needed to create a note, costs, ease of use, flexibility for documenting unforeseen clinical findings or unexpected circumstances, and support for narrative expressivity, machine readability, and document structure.19,25,46 Certain documentation methods favor some attributes at the expense of others, such as promoting narrative expressivity at the expense of formal note structure. Rather than waiting until documentation tools can be created to accommodate all needs and workflows, or forcing complex clinical workflows to change to accommodate the EHR system rollout, we recommend that healthcare providers be able to access a variety of documentation methods and select the one that best fits their documentation, data, and workflow needs. EHR system developers and users can weigh different documentation methods in terms of their impact on relevant documentation-related outcomes, such as usability, efficiency, quality, and readability, against their utility for EHR systems. The value of this approach is that it allows organizations considering EHR systems to prioritize development and implementation efforts around clinical documentation.


The project was supported by a grant from the United States National Library of Medicine (Rosenbloom, 1R01LM009591-01A1).

Competing interests


Provenance and peer review

Not commissioned; externally peer reviewed.

