
A systematic review to evaluate the accuracy of electronic adverse drug event detection

Alan J Forster, Alison Jennings, Claire Chow, Ciera Leeder, Carl van Walraven
DOI: http://dx.doi.org/10.1136/amiajnl-2011-000454. Pages 31-38. First published online: 1 January 2012


Objective Adverse drug events (ADEs), defined as adverse patient outcomes caused by medications, are common and difficult to detect. Electronic detection of ADEs is a promising method to identify ADEs. We performed this systematic review to characterize established electronic detection systems and their accuracy.

Methods We identified studies evaluating electronic ADE detection from the MEDLINE and EMBASE databases. We included studies if they contained original data and involved detection of electronic triggers using information systems. We abstracted data regarding rule characteristics including type, accuracy, and rationale.

Results Forty-eight studies met our inclusion criteria. Twenty-four (50%) studies reported rule accuracy but only nine (18.8%) utilized a proper gold standard (chart review in all patients). Rule accuracy was variable and often poor (range of sensitivity: 40%–94%; specificity: 1.4%–89.8%; positive predictive value: 0.9%–64%). Five (10.4%) studies derived or used detection rules that were defined by clinical need or the underlying ADE prevalence. Detection rules in eight (16.7%) studies detected specific types of ADEs.

Conclusion Several factors led to inaccurate ADE detection algorithms, including immature underlying information systems, non-standard event definitions, and variable methods for detection rule validation. Few ADE detection algorithms considered clinical priorities. To enhance the utility of electronic detection systems, there is a need to systematically address these factors.

  • Adverse drug reactions
  • adverse drug events
  • electronic triggers
  • hospital information systems
  • accuracy
  • patient safety


Adverse drug events (ADEs) occur when patients experience medication-related harm. ADEs occur in 6.5% of hospitalized patients and are considered a major threat to patient safety.1 To reduce ADEs it is necessary to systematically detect them. Unfortunately, the methods currently used for ADE detection (voluntary incident reporting, retrospective chart review, and prospective ADE surveillance) are limited in effectiveness or affordability. ‘Electronic ADE detection’ uses computer-based algorithms to automatically screen electronic healthcare databases for events suggestive of ADEs. Such computer-based methods involve queries of data that reside in different hospital information systems including laboratory, pharmacy, and administrative databases.2 This approach has been promoted as a potential meaningful use of electronic health record data.

There are several advantages of using electronic methods for ADE detection. They are based on routinely collected, readily available data and are therefore less expensive and less time-consuming than medical record review. In addition, database queries are based on objective criteria (eg, diagnostic codes, laboratory values) that lead to standardized detection processes3 and eliminate reviewer subjectivity and error.2 ,4 ,5 Finally, some electronic data are available in real time and allow for prospective detection and prompt interventions.6 ,7

Despite the promise that electronic methods hold for identifying ADEs, there are several barriers to realizing their full potential. The accuracy of electronic ADE detection is a function of the availability of integrated data systems and their criteria.8 ADE detection rules based on non-integrated information systems will be less specific because they will be restricted to single clinical criteria. Electronic ADE detection is further complicated by inconsistent ADE definitions,4 nomenclature,3 and methods for defining a gold standard.7 ,9 These inconsistencies make it difficult to build reliable ADE detection systems and are problematic for internal and external validity.

We conducted a systematic review of prior attempts at implementing electronic methods of ADE detection in adult populations. Our primary objective was to determine the accuracy of published detection rules. Secondary objectives were to determine (1) if studies derived detection rules based on prior knowledge of ADE prevalence, and (2) if the rules used by studies detected specific ADE types or generic events. The results of this review are useful to teams implementing electronic ADE detection or interpreting the results from an electronic ADE detection system.


Data sources

We obtained relevant citations from the MEDLINE (1948–2011) and EMBASE (1980–2011) databases using the search strategy outlined in online supplementary appendix 1 (search date: January 21, 2011). We searched using a combination of medical subject headings (MeSH) and keywords related to ADEs or adverse drug reactions (ADRs) and computerized systems/detection. We also manually searched the references of included studies for additional articles not identified in the database search.

Study selection

We retrieved full text articles if the title or abstract suggested that investigators evaluated a computer-based system of ADE or ADR detection. We included studies that presented original data and in which the system involved searching for electronic triggers (ie, electronic records of specific clinical events or criteria) suggestive of an ADE from any of the following information systems: laboratory, pharmacy, radiology, or administrative. In order to meet our objectives and to reduce heterogeneity, studies were excluded if they did not actually state or provide examples of the detection rules used, focused exclusively on drug–drug interactions or vaccine adverse effects, detected ADEs in randomized controlled trials, detected adverse events in general without a separate analysis of ADEs, were conducted for the purposes of pharmacovigilance or post-marketing drug surveillance, were conducted in the neonate or pediatric population, or were not published in English.

Data abstraction

Two of three reviewers (AJ, CC, or CL) independently abstracted the data. Discrepancies were resolved by discussion involving a third reviewer (AF) if necessary to achieve consensus. For all included studies we abstracted data on study characteristics including study setting, design, objective, and sample size. We abstracted data on rule characteristics including the information system(s) used as well as rule type and criteria. We classified detection rule type based on whether it used a single criterion (eg, international normalized ratio (INR) >5) or combined criteria (eg, receiving warfarin and INR>5). Rule criteria referred to the specific data elements employed by the rules to detect the ADEs (eg, ICD code or antidote drugs). We also abstracted the actual detection rules/outcome triggers employed by each study. Finally, we determined if the detection rules were used to detect specific ADEs (eg, bleeding as a result of anticoagulation) or generic ADEs (eg, an injury resulting from an intervention related to a drug).
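The distinction between single-criterion and combined-criteria rules can be illustrated with a minimal Python sketch. It uses the two examples given above (INR >5 alone, and receiving warfarin with INR >5). The `PatientRecord` class, its field names, and the sample records are hypothetical and were not taken from any of the reviewed studies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical minimal record; field names are illustrative only."""
    on_warfarin: bool          # from a pharmacy system
    inr: Optional[float]       # most recent INR from a laboratory system, None if not measured


def single_criterion_rule(rec: PatientRecord) -> bool:
    """Single criterion: fires on the laboratory value alone (INR > 5)."""
    return rec.inr is not None and rec.inr > 5


def combined_criteria_rule(rec: PatientRecord) -> bool:
    """Combined criteria: fires only if the patient receives warfarin AND INR > 5."""
    return rec.on_warfarin and single_criterion_rule(rec)


# Invented sample records, for illustration only.
records = [
    PatientRecord(on_warfarin=True, inr=6.2),
    PatientRecord(on_warfarin=False, inr=5.8),
    PatientRecord(on_warfarin=True, inr=2.4),
    PatientRecord(on_warfarin=False, inr=None),
]

single_flags = [single_criterion_rule(r) for r in records]
combined_flags = [combined_criteria_rule(r) for r in records]

print(single_flags)    # [True, True, False, False]
print(combined_flags)  # [True, False, False, False]
```

In this sketch the combined rule needs data from two information systems (pharmacy and laboratory), which is why it cannot be built on a single non-integrated system.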

A study was considered to have assessed accuracy if it reported rule sensitivity, specificity, positive or negative predictive value, or any other accuracy measure. To determine the proportion of studies that properly assessed rule accuracy, we abstracted data about the gold standard used. We categorized studies that used a gold standard into three groups: ones that applied chart review on every study patient, studies that applied chart review on only electronic screen-positive patients, and those that used some other gold standard. We abstracted information regarding the methods used in studies that presented a measure of accuracy but did not employ a true gold standard. We also abstracted data on the overall and individual rule accuracy reported by the studies.
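For readers less familiar with these accuracy measures, the following Python sketch shows how each is computed from the four cells of a two-by-two table comparing rule output with the gold standard. The function name and the counts are illustrative assumptions and do not come from any included study.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard test characteristics from a 2x2 table.

    tp: rule fired and gold standard confirmed an ADE
    fp: rule fired but no ADE on gold standard
    fn: rule did not fire but gold standard found an ADE
    tn: rule did not fire and no ADE on gold standard
    """
    def ratio(num: int, den: int):
        # Return None rather than dividing by zero when a margin is empty.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented counts for 1000 patients, all of whom received chart review.
m = accuracy_measures(tp=40, fp=160, fn=10, tn=790)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity require the false negative and true negative counts, which are only known when the gold standard is applied to every patient, including those the rule did not flag.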

To determine if studies considered the underlying ADE prevalence when deciding on, or deriving detection rules, we abstracted data on the stated rationale for the derived rules. We generally defined studies to have considered the underlying ADE prevalence (or to have based their choice on clinical need) if their detection focused on the most frequent ADEs or those that cause the greatest harm.


The search in MEDLINE and EMBASE identified 2045 citations, of which 85 articles were retrieved and reviewed in full. We excluded 52 articles for the various reasons outlined in figure 1. We further identified 17 additional articles from manually searching the references of included articles. Our final analysis included 48 cohort studies (50 articles). For the two articles10 ,11 that referenced the same study as another included article, we abstracted data from all references.

Figure 1

Study selection. This figure illustrates how studies were identified and included in our review. ADE, adverse drug event.

Study characteristics

The studies were published between 1988 and 2009, the majority of them in recent years. The median sample size was 3889 observations (25th to 75th percentile range, 430–38 467). Forty-one (85.4%) studies were conducted on hospitalized patients, three (6.3%) on outpatients and four (8.3%) on both. Of the in-patient studies, 15 (31.6%) were conducted in specialized patient services (table 1).

Table 1

Study characteristics

Twenty-seven (56.3%) studies were prospective, while 14 (29.2%) were retrospective, four (8.3%) were retrospective case–controls, two (4.2%) had both prospective and retrospective components, and one (2.1%) was a prospective case series. Twenty-eight (58.3%) studies derived and/or evaluated electronic ADE detection rules, while 11 (22.9%) studies reported ADE incidence, four (8.3%) studies described the detection technology and methodology, three (6.3%) studies were interventions involving electronic ADE detection, and two (4.2%) other studies included a cost analysis and a risk association study.

Rule characteristics

Twenty-nine (60.4%) studies utilized two or more information system sources for their ADE detection rules, while 19 (39.6%) studies utilized only single information systems (seven used laboratory data, six used pharmacy data, and six used administrative data) (table 2). The detection rules in 28 (58.3%) studies were based on a single criterion and the rules in the other 19 (39.6%) studies used combined criteria from multiple information systems. One study used rules with a single criterion as well as rules with multiple criteria.34 The rule criteria in the majority of studies using multiple databases were combinations of drug, serum drug, and laboratory data to detect ADEs. Eighteen (37.5%) studies4 ,10 ,11 ,21 ,25 ,27 ,28 ,31 ,32 ,35 ,37–40 ,45 ,48–50 ,54 ,56 used antidotes, alone or in combination with other criteria, while 10 (20.8%) studies10–12 ,16–18 ,28 ,31 ,32 ,34 ,44 ,55 used International Classification of Diseases Ninth Revision (ICD-9) or ICD-10 codes, alone or in combination with other criteria, to detect ADEs.

Table 2

Characteristics of reviewed studies

Only eight (16.7%) studies used detection rules to detect specific ADE types. Three (6.3%) studies30 ,33 ,58 targeted specific drugs that caused ADEs, while four (8.3%) studies14 ,43 ,44 ,51 targeted certain conditions. One (2.1%) study,46 which chose rules based on the most common ADEs, targeted a combination of condition specific and drug specific ADEs. Studies that did not detect specific types used rules designed to capture the presence of any ADE. Twenty studies (41.7%) evaluated rules which were used retrospectively to identify ADEs, while 28 studies (58.3%) evaluated rules which were used to alert physicians to the presence of an ADE.

Studies that assessed the accuracy of detection rules

Half of the studies (50%) reported some measure of accuracy. Fourteen (29.2%) of these studies used a ‘gold standard,’ while the remaining 10 (20.8%) studies calculated accuracy using the results of ADE verification. Verification, which usually included pharmacist/physician review, was part of the process of determining whether an ADE trigger was truly an ADE and thus was not a ‘gold standard.’ Since the absolute number of ADEs was unknown, these studies could only calculate the positive predictive value as a measure of rule accuracy (table 3).

Table 3

Detailed description of 24 studies that assessed rule accuracy

Of the gold standard studies, nine (18.8%) used chart review in every study patient independent of the results of electronic detection. Two (4.2%) studies used chart review in the rule positive patients only, so, as in the studies using verification, only PPV could be calculated. However, one of these studies also reported sensitivity and specificity by using estimates of visits with or without ADEs.11 ,31
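The reason why reviewing only rule-positive patients limits the analysis to PPV can be shown with a short Python sketch. The function name and counts are hypothetical. When charts are reviewed only for patients the rule flagged, the false negative and true negative counts are unknown, so sensitivity and specificity cannot be computed directly.

```python
from typing import Optional


def measures_from_review(tp: int, fp: int,
                         fn: Optional[int] = None,
                         tn: Optional[int] = None) -> dict:
    """Compute whichever accuracy measures the available counts allow.

    fn and tn are None when rule-negative patients were never reviewed.
    """
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if fn is not None and (tp + fn) else None
    specificity = tn / (tn + fp) if tn is not None and (tn + fp) else None
    return {"ppv": ppv, "sensitivity": sensitivity, "specificity": specificity}


# Review of rule-positive patients only (invented counts).
partial = measures_from_review(tp=30, fp=70)

# Chart review in every patient (invented counts).
full = measures_from_review(tp=30, fp=70, fn=20, tn=880)

print(partial)  # only ppv is available; sensitivity and specificity are None
print(full)
```

Any sensitivity or specificity reported from a rule-positive-only design therefore depends on an outside estimate of the number of patients with and without ADEs.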

Three (6.3%) studies used methods other than chart review alone. Two (4.2%) of these studies compared the number of ADEs detected electronically to the total number of ADEs found with two methods (electronic detection plus surveillance53 or electronic detection plus stimulated spontaneous reporting).22 Another study measured true and false positive results based on whether the attending physician wrote orders consistent with the alert recommendation.46

Accuracy results for detection rules are shown in table 3. Overall detection rule accuracy was variable across studies and was often quite low. Ranges for sensitivity, specificity, and positive predictive value were 40%–94%, 1.4%–89%, and 0.9%–64%, respectively. Only two studies presented negative predictive values. The list of individual rules and individual rule accuracy, where available, are found in online supplementary appendix 2.

Reason for chosen ADE detection rules

It was unclear in four (8.3%) studies12 ,16 ,45 ,54 why they selected the rules they employed to detect ADEs. Twelve (31.3%) studies5 ,10 ,13 ,17–20 ,23 ,24 ,26 ,28 ,36 ,55 chose their rules based on the clinical information available in their information system(s), 16 (33.3%) studies11 ,21 ,22 ,25 ,27 ,31 ,32 ,37–39 ,41 ,42 ,47–50 ,52 used previous available rules and four (8.3%) studies4 ,15 ,34 ,35 used a combination of these reasons. Three (6.3%) studies43 ,51 ,57 based their decision on a specific research question or condition of interest and five (10.4%) studies29 ,37 ,40 ,53 ,56 on previous research or common practice (eg, use of known antidotes for ADRs).

Five (10.4%) studies derived or used detection rules that were defined by clinical need or the underlying ADE prevalence (table 4). Three of these studies detected ADEs that were directly considered to be the most common types.4 ,6 ,58 Of the two remaining studies, one detected a specific ADE type because there were increased reports of its occurrence,2 and the other study focused detection on an ADE type thought to have the greatest harm reduction potential.8 Except for one study,8 actual baseline institutional data were not used to determine what the most frequent ADE types were, and for which detection rules should have been derived or used.

Table 4

Studies with detection rules defined by clinical need


This systematic review on electronic ADE detection revealed some key limitations in the current literature: (1) most studies could not properly assess rule accuracy because they did not utilize a gold standard or did not apply the gold standard to all patients; (2) the accuracy of detection rules varied widely because of inconsistent event definitions and methodologies used for derivation and validation; (3) extremely few studies considered the underlying ADE prevalence when choosing which rules to derive or use; and (4) the majority of rules did not detect specific ADE types.

Currently, there is no systematic approach to validating ADE detection methods. An appropriate comparative gold standard is not routinely used, and if used, differs across studies resulting in varying measures of accuracy. Of the 24 studies in our systematic review that assessed rule accuracy, slightly less than half had appropriately used a gold standard and were thus able to measure rule sensitivity and specificity. The accuracy reported from these few studies was generally low but displayed a wide range, making it difficult to draw conclusions about the overall effectiveness of electronic ADE detection. The variability in evaluations we observed is likely, in part, caused by the expense of such efforts combined with the absence of funding. It can be challenging to obtain research funds to perform this work. As a result, most institutions will implement these rules with only partial evaluations, at best.

Difficulties making comparisons across studies are also compounded by non-standardized event definitions and detection methods.7 ,9 The included studies varied considerably in the information system(s) and rule criteria used. We found that 40% of all studies only used a single information system. Single information systems restrict the number of criteria that can be used in electronic detection rules, which can reduce rule accuracy. System infrastructure which can support the linkage of multiple hospital information systems can positively influence the specificity of detection rules.2 ,59 Early electronic detection studies were limited by non-integrated information systems, and hence they derived less specific rules that were restricted to a single clinical criterion such as the prescription of an antidote or an abnormal laboratory result.13 ,21 ,42 ,54 In our systematic review, the detection rules in 58% of the studies were based on a single criterion. For example, rules such as ‘serum potassium less than 3.0 mmol/l or more than 6 mmol/l’13 or ‘patient receiving diphenhydramine’35 have high false positive rates because their occurrence is most commonly not related to an ADE. By contrast, a rule such as ‘receiving ranitidine AND platelet count has fallen to <50% of previous value’ would be more discriminating in identifying ranitidine-induced thrombocytopenia.4
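The two rule examples quoted above can be written as a brief Python sketch to make the contrast concrete. The function names, parameter names, and input values are hypothetical, and the thresholds are those quoted in the text. This is an illustration under those assumptions, not the implementation used in the cited studies.

```python
from typing import Optional


def potassium_rule(potassium_mmol_l: Optional[float]) -> bool:
    """Single criterion: serum potassium < 3.0 or > 6 mmol/l, from laboratory data alone."""
    if potassium_mmol_l is None:
        return False
    return potassium_mmol_l < 3.0 or potassium_mmol_l > 6.0


def ranitidine_platelet_rule(on_ranitidine: bool,
                             previous_platelets: Optional[float],
                             current_platelets: Optional[float]) -> bool:
    """Combined criteria: receiving ranitidine AND platelet count < 50% of the previous value."""
    if not on_ranitidine:
        return False
    if previous_platelets is None or current_platelets is None:
        return False  # a fall cannot be assessed without two measurements
    if previous_platelets <= 0:
        return False  # guard against invalid baseline values
    return current_platelets < 0.5 * previous_platelets


# Invented inputs, for illustration only.
print(potassium_rule(2.8))                          # True, whatever the cause of the low value
print(ranitidine_platelet_rule(True, 240, 100))     # True: on drug and count more than halved
print(ranitidine_platelet_rule(False, 240, 100))    # False: same fall, but no drug exposure
print(ranitidine_platelet_rule(True, 240, 150))     # False: fall is less than 50%
```

The potassium rule fires for any abnormal value regardless of drug exposure, whereas the combined rule links a drug from the pharmacy system to a change in a laboratory result, which narrows the set of flagged patients.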

Detection rules are only as good as the data they use and frequently ADE information is non-specific as a result of the incomplete documentation in the medical record. For example, if a surgical patient develops respiratory compromise in the hospital, he may have similar treatments and tests, irrespective of the cause. Whether the respiratory compromise is due to an excess post-operative intravenous rate or diastolic dysfunction, the patient will still likely undergo testing with a chest x-ray, cardiac enzymes, and an electrocardiogram and be treated with oxygen, diuretics, and morphine. Since most current electronic records do not record diagnoses but do capture tests and treatments, the two cases would look the same to an electronic ADE detection system even though in one case it is an ADE and the other it is not.

The same issue can lead to insensitive rules as well. Some ADEs may not have an associated laboratory test or treatment. In these cases, detection of the adverse event will require documentation of an ADE-related diagnosis in the electronic record. Since these diagnoses are sometimes not stored electronically, detection of such ADEs will then be impossible.

We also examined how studies derived or utilized detection rules. Very few studies (10%) derived or used rules that were defined either by clinical need or the underlying ADE prevalence. A number of studies used or developed rules for particular ADEs only because they were possible or the required source data systems were available. These studies did not necessarily detect the ADEs that they should be detecting and cannot help us identify what the most prevalent and important ADEs are. Further, few studies (17%) derived or used electronic rules that detected specific ADE types. Attempting to identify all ADEs is an extremely complex task and greater success may be achieved by focusing on key ADEs.

In addition to the issues with the current literature described above, it was noted that there are few studies from hospitals outside of the USA. Thirty-one (65%) of the 48 studies in our systematic review were American. There is a need to study electronic triggers using patients in other countries, including Canada, for two reasons. First, other countries have different healthcare systems (eg, Canada has universal healthcare), so hospital utilization patterns and in-hospital medication use may be different. This could lead to different patterns of ADEs and consequently differences in how they are captured using electronic means. In addition, structural differences exist in information systems. For example, discharge abstracts in Canada are currently based on ICD-10, whereas in the USA they are based on ICD-9. Furthermore, hospital discharge abstracts from Canadian hospitals indicate whether or not recorded diagnoses are hospital complications. These differences in coding diagnoses affect the type of rule criteria that can be used in each country to detect ADEs.

This review characterized existing reports on the use of electronic detection of ADEs and assessed their accuracy. The main limitations of this study are that it is purely descriptive and the studies are very heterogeneous. In addition, our search strategy may not have located all relevant studies as a result of the exclusions applied. Nonetheless, our review did highlight some important methodological limitations in the existing published studies of electronic ADE detection. We identified three main limitations: (1) most studies did not properly assess rule accuracy; (2) extremely few studies considered the underlying ADE prevalence when choosing which rules to derive or use; and (3) the majority of rules did not detect specific ADE types. These limitations need to be addressed to realize the full potential of electronic detection of ADEs. Future research should also focus on the identification of rule characteristics which predict benefit. It is notable that we are unable to make recommendations on which information systems are most likely to be beneficial in the development of electronic alerts. This is a result of the relative lack of well performed studies and the poor performance of most rules. In the development of new alerts, investigators and system developers should pay more attention to the relative prevalence of specific ADEs and focus on those which are most common and serious.

We suggest a more rigorous approach to rule development and reporting. As a basis for future advances, there is a need for the industry to develop and adopt universal standards in ADE classification. There is existing work in this area; however, international consensus would be helpful. Even in the absence of such consensus, there are some general guidelines which should be followed for future publications reporting on the accuracy of electronic ADE rules. First, they need to identify the motivation for rule development, including whether it is informed by the prevalence and severity of the underlying ADE in a specific patient population, and whether the rule is to be used for detecting ADEs for quality control or alerting providers for modifying clinical care. Second, any report on the rule needs to explicitly specify the components comprising the ADE, that is, the medication, the population at risk, and the outcome. Third, the report needs to link the clinical concepts defined by the ADE to data definitions. Fourth, investigators need to adhere to appropriate epidemiological techniques for reporting rule test characteristics. Adherence to these guidelines will facilitate progress in this field as they will improve the generalizability and reproducibility of the published work.

Competing interests


Provenance and peer review

Not commissioned; externally peer reviewed.

