★ Research Paper ★

Automated Evidence-based Critiquing of Orders for Abdominal Radiographs: Impact on Utilization and Appropriateness

Linda H. Harpole, Ramin Khorasani, Julie Fiskio, Gilad J. Kuperman, David W. Bates
DOI: http://dx.doi.org/10.1136/jamia.1997.0040511 511-521 First published online: 1 November 1997


Objective: Inappropriate utilization of diagnostic testing has been well documented. The purpose of this study was to measure the impact of presenting real-time, evidence-based critiques about the appropriateness of abdominal radiograph (KUB) orders on physician decision making.

Design: Prospective trial where evidence-based critiques were presented to ordering clinicians in two kinds of situations: (1) a KUB was likely to have a low probability of providing useful information, or (2) an alternative view(s) was more appropriate given the clinical circumstance. There were two phases of the trial: Phase 1 was a 9-week period where evidence-based critiques were presented at the time of ordering a KUB, followed by Phase 2, a 19-week period in which orderers were randomized to receive critiques either amended to include both institutional data regarding the utility of the critiques and stronger messages about the lack of utility of the study, or the same critiques as presented in Phase 1, depending upon indication. Based upon the radiologist's report of their interpretation of the exams, the results of the examinations were scored as positive, equivocal, or negative using structured criteria.

Results: 299 KUBs in Phase 1 and 385 KUBs in Phase 2 received at least one critique. Cancellation rates of low-yield films were low, and were similar in Phase 1 and 2, 8/258 (3%) vs. 10/283 (4%). Compliance with the recommendation for alternative view(s) was higher: 39/104 (38%) in Phase 1 vs. 96/176 (55%) in Phase 2 (p = 0.006). The rules differentiated low-yield from non-low-yield films: 5% of low-yield films vs. 20% of non-low-yield films were positive in Phase 2 (p < 0.0001). Surgical physicians were less likely to cancel (p = 0.07) or to change to the suggested view(s) (p < 0.0001) than medical physicians or nurses.

Conclusions: The intervention identified clinical situations in which KUBs appeared to have a low clinical yield. In response to evidence-based critiques, providers were reluctant to cancel their order, but were more willing to change to different views. To reduce the number of inappropriate radiographic films, stronger incentives or interventions may be required.

Inappropriate utilization of diagnostic testing has been well documented.1–4 To improve the appropriateness of testing, multiple methodologies, including audit and feedback, education, rationing, and financial incentives, have been tried.1–8 These strategies have enjoyed some success6,9 but require considerable time and effort to implement. Also, even when these approaches have been successful, their effects have often diminished soon after the interventions were discontinued.10

Interventions that improve physician decision making may be most useful when applied at the time of ordering. The advent of physician computer order entry (POE) allows for this possibility.11–13 In POE, orders are entered directly into an automated information system. The system can require structured ordering, and reminders can be presented at the time orders are written.

To date, studies using computer order entry have evaluated the impact of real-time reminders on the cost of test ordering,10,14,15 redundant laboratory testing,16 drug-drug interactions, and drug-lab interventions.11 However, the effect of real-time, automated critiques on the ordering of radiologic tests has not been reported.

In this study, we presented real-time comments about the appropriateness of abdominal radiographs (KUB) to physicians using a POE system, and we evaluated the impact of these critiques on their decision-making behavior.

Abdominal radiographs were chosen because they are performed frequently yet often provide little information; additionally, there is substantial evidence regarding their utility for specific indications.17–32

We hypothesized that real-time critiquing during the use of POE could decrease inappropriate KUB ordering, thereby eliminating unnecessary films, as well as improve the usefulness of the test by suggesting alternative view(s) that could provide better clinical information.



The study was conducted at the Brigham and Women's Hospital (BWH), a 720-bed tertiary care teaching hospital. Its integrated, computerized hospital information system runs on a 3,000-node local area network (LAN) with 486-based personal computers serving as workstations.11 Within this computerized information system, a POE application system was developed and was implemented in 1993.11 At BWH, the primary users of POE are housestaff and nurses. At the time of this study, all in-patient orders were entered through the POE system. Most orders are written using menus, and more than 90% are captured in coded form.11

Capturing the Reason for the Radiograph in Coded Form

To order a radiologic study at our institution, orderers are required to provide the “relevant history items” and items to “rule out or assess.” Prior to the study, physicians ordering abdominal radiographs were required to enter the “relevant history items” and items to “rule out or assess” into a free text box. For this study, to permit critiquing, the patient's relevant history as well as items the physician wanted ruled out or assessed when ordering an abdominal radiograph were captured in coded form. To develop lists for these two fields, the free text reasons for a 2-month period were reviewed, put into categories, and ranked. The most common categorical reasons then were used for the structured KUB ordering form, which contained the 11 most commonly entered “relevant history” items and 10 commonly entered “rule out or assess” items. This structured ordering form was then tested and re-ranked before determining the final version (Fig. 1). Physicians were required to indicate at least one item from each section when ordering a KUB. A box for “other,” in which free text could be entered if none of the items applied, was included. For all indications physicians could add free text to describe the patient's situation in more detail.

Figure 1

Structured KUB ordering screen. Capturing history/assessment items; user must enter at least one of each.

Development of Critiquing Messages

The literature regarding KUBs was reviewed,17–32 and lists of appropriate and inappropriate reasons for ordering a KUB were developed. From the common history and assessment items listed on the structured ordering form, eight combinations of history and assessment items that reflected inappropriate reasons for ordering a KUB were determined, based upon the literature and expert opinion in our medicine, surgery, obstetrics/gynecology, and radiology departments. These situations reflected instances in which, given the identified history and assessment items, a KUB was unlikely to add diagnostic information (see Appendix). The POE system was modified so that, when a physician entered certain history and assessment items (Appendix), a screen would be displayed stating that the test was unlikely to yield worthwhile information. For example, if G (right lower quadrant pain) and S (appendicitis) were chosen from the structured KUB ordering form (Fig. 1), a message would be presented to the effect that a KUB would be low yield and that an ultrasound test might be more useful (Fig. 2). When receiving a message that indicated that a film was likely to be of little value, the orderer had the option to cancel the order or to continue despite the critique. We refer to these kinds of messages as “low yield” messages.

Figure 2

Example of low-yield critique. The user is notified that KUB is not likely to yield worthwhile information and that ultrasound might be more useful.

Two situations in which an alternate view(s) might be superior to a KUB were also identified (Appendix). For example, if a KUB with both upright and supine views is ordered when perforation was suspected, a message would be displayed suggesting that a chest PA view and a KUB (supine alone) film would be preferable, unless the patient was unable to stand, in which case a lateral decubitus film of the abdomen and a KUB (supine alone) film would be recommended (Fig. 3). If both upright and supine views were ordered but neither perforation nor obstruction was suspected, a message would be displayed suggesting that the supine view alone would be sufficient. When a message suggested an alternate view(s), the orderer had the option to change to the suggested view(s), continue as ordered, or to cancel altogether.
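The two alternate-view situations described above can be written as a small decision function. This is only a minimal sketch under stated assumptions, not the POE system's actual logic: the function name, the view and finding strings, and the `can_stand` flag are all hypothetical (in the study the message itself presented the unable-to-stand alternative to the orderer, rather than the system knowing the patient's mobility).

```python
def suggest_alternate_views(views, suspected, can_stand=True):
    """Return the suggested views, or None when no alternate-view critique applies.

    views      -- views on the original order, e.g. ["upright", "supine"]
    suspected  -- findings the orderer wants ruled out or assessed
    can_stand  -- hypothetical flag standing in for the orderer's own judgment
    """
    # Both rules in the text apply only when upright AND supine views are ordered.
    if not {"upright", "supine"} <= set(views):
        return None

    suspected = set(suspected)
    if "perforation" in suspected:
        if can_stand:
            return ["chest PA", "KUB supine"]
        return ["abdomen lateral decubitus", "KUB supine"]

    # Neither perforation nor obstruction suspected: supine alone is sufficient.
    if "obstruction" not in suspected:
        return ["KUB supine"]

    # Obstruction suspected: the two-view order stands without a critique.
    return None
```

Whatever the function returns is only a suggestion; as in the study, the orderer would still be free to accept it, continue as ordered, or cancel.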

Figure 3

Example of alternate-view critique. In some situations, an alternate view(s) might be superior to KUB.

For all exams, instructions directed the orderer to enter the most likely indication for the study first, the second next, and so forth. If more than one selection was chosen by the orderer for the “relevant history” or “item to rule out or assess,” the evidence-based message would be triggered from the first item entered only. When available, full abstract(s) from scientific papers supporting the messages were accessible on-line at the time of receiving a message.18,19,21,22,26–31
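The first-item-only triggering can be illustrated with a lookup keyed on coded items. This is a hedged sketch, not the study's implementation: the table and function names are hypothetical, only the G + S pair comes from the text (the remaining combinations are in the Appendix and are not reproduced), and it assumes that "first item" means the first entry in each of the two fields.

```python
# Hypothetical rule table: (history code, assessment code) -> critique text.
# Only the G (right lower quadrant pain) + S (appendicitis) pair is taken
# from the text; the other combinations are deliberately left out.
LOW_YIELD_RULES = {
    ("G", "S"): "KUB is likely to be low yield; ultrasound might be more useful.",
}


def low_yield_critique(history_items, assessment_items):
    """Return the critique triggered by the FIRST item of each field, or None."""
    if not history_items or not assessment_items:
        # The ordering form required at least one item from each section.
        raise ValueError("at least one item is required from each section")
    key = (history_items[0], assessment_items[0])
    return LOW_YIELD_RULES.get(key)
```

A consequence of this design is that an order listing G second, after some other history item, would not trigger the critique, which is why the instructions asked orderers to enter the most likely indication first.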


Prior to implementation of the evidence-based messages, a letter of introduction and an explanation of the intervention were mailed via computer to all order-entry users. The letter was endorsed by the chairmen of the departments of Medicine, Surgery, Obstetrics/Gynecology, and Radiology.

Study Period and Outcomes

In Phase 1, from August 1, 1995, to September 30, 1995, all KUB in-patient orders were subject to the evidence-based messages. The main outcome measures were cancellation rates and the frequency with which orders were changed to suggested view(s). A secondary outcome was the yield of important diagnostic information in films that were predicted to be of low utility but were nevertheless obtained. (Films were graded by guided implicit review, described in the following section.)

In Phase 2, based on the findings from Phase 1, the evidence-based messages were amended, and orderers were randomized to receive either the same (“control group”) or amended (“intervention group”) messages. One message (Appendix, low yield rules, no. 5) was supposed to be amended for all future orders, as too many positive films were found when suggestions to cancel were overridden; however, the original rule was inadvertently displayed to the control group of orderers. Other “low yield” messages were amended to include data from the Phase 1 experience, further emphasizing that, for the given indications, KUB was likely to have little value. In addition, one of the alternate exam messages was re-written more emphatically. In Phase 2 of the evaluation of evidence-based messages (November 10, 1995, to March 21, 1996), all users of order-entry were randomized to receive either the initial or amended evidence-based messages to evaluate whether feedback of institutional data and stronger messages might have a larger impact on ordering behavior than delivery of messages without local data alone.

KUB Film Radiographic Evaluation

The radiologists' transcribed reports of all KUBs were reviewed using a guided implicit approach by one reviewer (LH) and were scored as positive, equivocal, or negative. Specifically, the reviewer graded films as positive if any of the following were present: bowel obstruction, bowel ischemia, perforated viscus, free air not following surgery, volvulus, misplaced feeding tube, evaluation of ureteral stent placement, evaluation of barium prior to CT scan, foreign body; as equivocal if one of the following was present: possible ileus, possible ileus vs. small bowel obstruction, moderately dilated loops of bowel, gastric distention, questionable ischemic changes, possible urinary tract stone, possible pneumatosis; or as negative in the presence of none of the above (normal film). To evaluate the consistency of interpretation, a random sample of 50 films was reviewed by a second reviewer using the same criteria. Percent agreement was 94%, and the kappa between the reviewers was 0.80, suggesting good agreement.
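The relationship between the reported percent agreement and kappa can be made explicit with Cohen's formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance. The sketch below uses hypothetical function names, and the reviewers' cross-tabulation is not given in the text, so it only back-calculates the chance agreement implied by the two reported figures.

```python
def cohen_kappa(observed, expected):
    """Cohen's kappa from observed and chance-expected agreement (proportions)."""
    return (observed - expected) / (1.0 - expected)


def implied_chance_agreement(observed, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (observed - kappa) / (1.0 - kappa)


# Reported values: 94% agreement and kappa = 0.80 on the 50-film sample.
p_e = implied_chance_agreement(0.94, 0.80)   # about 0.70
```

A chance agreement near 0.70 is what one would expect when most films fall into a single category (here, negative), which is why kappa is noticeably lower than the raw 94% agreement.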


Rates of cancellation and change to suggested view(s) were calculated for both Phase 1 and 2 and compared by the chi-square test. Rates of cancellation and change to suggested views were also compared within Phase 2 between the “control” and “intervention” groups. Rates of cancellation and change to suggested view(s) for all films were compared by type of orderer (medicine physician, surgery physician, nurse). Characteristics of patients (age, gender, race, clinical service) who received the KUBs were obtained from the computerized information system and compared across the two phases using the chi-square test for categorical variables and Student's t test for continuous variables. The KUB results for low-yield and non-low-yield films were compared within the two phases by the chi-square test. The potential impact of KUB results upon clinical care for all positive films was evaluated through medical record chart review. The potential impact of canceled KUBs upon clinical care was evaluated through medical record chart review of patients in whom a KUB was canceled. Interrater reliability for KUB film results was determined using the kappa statistic. Time trend analysis of the number of KUBs during a 26-month period was performed using piece-wise regression.33


Overall, 681 patients had 1,244 KUB films performed during the 6-month trial: 190 patients during Phase 1 (380 films) and 491 patients during Phase 2 (864 films). Neither the patient characteristics nor the orderer type differed between the two phases (Tables 1A,1B).

Table 1A

Patient Characteristics

Table 1B

Provider Characteristics

In Phase 1, 79% of KUBs ordered resulted in at least one critique, either low yield or alternate exam, as compared with Phase 2, in which 45% of KUBs ordered resulted in at least one message (Table 2). Between Phase 1 and the start of Phase 2, a rule that had accounted for 32% of low-yield film reminders in Phase 1 was removed for one group of orderers in Phase 2, accounting for the majority of this difference. In Phase 2, there were no differences in the rate of cancellation of low-yield films, change to suggested view(s), or results of low-yield films between the two randomized groups. For this reason, all Phase 2 films are considered together.

Table 2

Actions: Phase 1 vs. Phase 2

Cancel rates for orders receiving a low-yield message were low in both Phase 1, 8/258 (3%), and Phase 2, 10/283 (4%), despite strengthening the messages and improving specificity (Table 2). The results were better for orders receiving an alternate exam message: 38% changed as suggested in Phase 1, versus 55% in Phase 2 (p = 0.006, Table 2). More specifically, for messages suggesting a KUB supine alone (Appendix, alternate exam rule no. 1), 33% (13/39) of the messages were followed as suggested in Phase 1, versus 40% (29/71) in Phase 2. For messages suggesting upright chest x-ray in addition to supine KUB (Appendix, alternate exam rule no. 2), 40% (26/65) were followed in Phase 1 and 64% (67/105) in Phase 2.
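As a check on the alternate-exam comparison, the per-rule counts above can be pooled (13 + 26 = 39 of 104 in Phase 1; 29 + 67 = 96 of 176 in Phase 2) and run through a 2×2 chi-square test. The sketch below is an illustration only: the function name is hypothetical, and the text does not say whether a continuity correction was applied, so the uncorrected Pearson statistic is an assumption. Under that assumption the result is a p-value of about 0.006, in line with the reported figure.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Pooled counts of alternate-exam messages followed, by phase.
followed_p1, total_p1 = 13 + 26, 39 + 65     # 39 of 104
followed_p2, total_p2 = 29 + 67, 71 + 105    # 96 of 176

chi2, p = chi_square_2x2(
    followed_p1, total_p1 - followed_p1,
    followed_p2, total_p2 - followed_p2,
)
```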

The review of the KUB reports (Table 3) showed that the percentage of positive films was significantly lower and the percentage of negative films significantly higher for films that received a low-yield critique as opposed to those that did not in both Phase 1 and Phase 2. After the removal of one rule that included 80% of the positive films in Phase 1, only 3% of the remaining films within the low-yield category of Phase 1 were positive. With removal of this rule from one group of orderers prior to Phase 2 of the trial, 5% of films were positive. Of note, this rule was to have been removed from all orderers in Phase 2, but it inadvertently was displayed to the “control group” of orderers. If this had not occurred, the positive film rate in Phase 2 would have been even lower, approximately 3.5%.

Table 3

Findings of Films: Phase 1 vs. Phase 2

After excluding the rule that was problematic from Phase 1, review of the positive films that resulted from indications prompting a low-yield message revealed a few results of clinical importance (Table 4). For Phase 1, of 3 films that were positive, none had any apparent effect on clinical outcome. In Phase 2, 12 positive films resulted from indications receiving a low-yield message. Of the 12 films, 6 were determined to have a significant effect on clinical outcome, and 6 were determined to have an indeterminate effect. Of these 12 films, 5 resulted from the orderer entering an inaccurate indication for the study, thus resulting in the presentation of a low-yield message that would not have occurred if the orderer had indicated the correct reason for the study. For example, one low-yield critique resulted from the orderer entering “nephrolithiasis” as the reason for ordering a film in a patient who was receiving follow-up KUBs for a known small bowel obstruction. Of the four positive films remaining (also excluding the films resulting from mistakenly delivered reminders, n = 3), two had a significant effect on clinical management (small bowel obstruction resulting in enema and lactulose, partial colonic obstruction resulting in flexible sigmoidoscopy), and two had indeterminate effects (feeding tube looped in mid-esophagus but no further radiologic evaluation; feeding tube at gastroesophageal junction but no further radiologic evaluation). For films scored as equivocal, 87% were read as possible ileus, 9% as possible stone in the urinary system, and 4% with other questionable findings (possible ischemic changes, possible splenic enlargement).

Table 4

Positive Findings in Low-yield Films

Low-yield film critiques were more likely to be received by medicine physicians and nurses than by surgical physicians for KUBs ordered (Table 5). In addition, alternative view(s) critiques were also more often received by medicine physicians and nurses than by surgical physicians. However, for orders receiving low-yield critiques, medicine physicians and nurses more often canceled the KUB than did surgical physicians, although this trend did not reach statistical significance (p = 0.07). Additionally, for orders receiving alternative view(s) critiques, suggestions to switch to an alternative view were more often adhered to by medicine physicians and nurses than by surgical physicians. The results of films performed, in spite of a low-yield critique, did not differ significantly among nurses, medicine, and surgical physicians (p = 0.65).

Table 5

Response to Critique by Type of Provider

Because physicians might learn over time not to order examinations that are low yield, we evaluated the frequency with which KUBs were performed during the study period (Fig. 4) as compared with an 18-month period prior to the intervention period. There was no significant difference in the number of KUBs performed per month in the pre- as compared with the post-intervention period (p = 0.19). Similar results were found comparing the number of KUBs per admission per month between the two periods.

Figure 4

KUB frequencies over time (January 1994-February 1996) to evaluate whether there was a time series effect associated with the intervention. This was evaluated using piecewise regression. No significant effect was found.

With the current results, annual charge savings of only $6,000 of a potential $98,500 were realized. This estimate is based upon a 4% cancellation rate of low-yield film orders (50% KUB one-view orders only—$83 charge—and 50% KUB two-view orders—$138 charge), and a 40% adherence to the critique to change from two KUB views to one ($55 charge saving per order).
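The structure of this estimate can be sketched as follows. The charges, the 50/50 order mix, the 4% cancellation rate, and the 40% adherence come from the text; the function names and the two volume parameters are hypothetical, because the annual order volumes behind the $6,000 and $98,500 figures are not stated. The sketch therefore shows only how the terms combine and does not attempt to reproduce the reported totals.

```python
ONE_VIEW_CHARGE = 83.0      # charge for a one-view KUB (from the text)
TWO_VIEW_CHARGE = 138.0     # charge for a two-view KUB (from the text)
VIEW_CHANGE_SAVING = 55.0   # saving per order changed from two views to one


def mean_cancel_saving(one_view_share=0.5):
    """Average charge avoided per canceled order, given the one-view share."""
    return one_view_share * ONE_VIEW_CHARGE + (1.0 - one_view_share) * TWO_VIEW_CHARGE


def annual_savings(low_yield_orders, two_to_one_orders,
                   cancel_rate=0.04, adherence=0.40):
    """Estimated charge savings; both order volumes are caller-supplied placeholders."""
    from_cancels = low_yield_orders * cancel_rate * mean_cancel_saving()
    from_view_changes = two_to_one_orders * adherence * VIEW_CHANGE_SAVING
    return from_cancels + from_view_changes
```

With the 50/50 mix, each canceled order avoids an average charge of $110.50, and the $55 saving per view change is simply the difference between the two-view and one-view charges.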


In this study, we found that orderers were reluctant to cancel radiographs that were likely to be of low yield when presented with evidence-based messages in real time. This was the case even though review of the results of the films demonstrated that our rules for identifying low-yield indications were valid. Orderers were, however, more amenable to changing to the suggested view(s) when presented with suggestions to change the ordered test to an alternate examination. Thus, providers were willing to substitute but not forgo imaging once the decision to order a KUB had been made.

Changing physician behavior has been difficult,34 and efforts to improve decision-making behavior have met with a variety of results. A number of studies attempting to decrease inappropriate resource utilization have been unsuccessful, despite using a variety of methodologies, including education and feedback,3 education and audit,1 risk estimates, and triage recommendations.35 Interventions developed to improve the quality of care by suggesting additional and/or alternative testing have been more effective,15,36–41 which is compatible with the finding of this study that providers were more willing to substitute for or change the number of view(s) of KUBs than to cancel a planned examination.

There are several possible reasons why the orderers were unwilling to cancel the requested exams. First, the orderers in this study were residents, who were not directly responsible for the financial consequences of their actions. Second, the orderers may not have been the decision makers (e.g., 22% of KUB orders entered by nurses) but merely individuals carrying out the commands of more senior housestaff or attending physicians. They may not have felt that they were in a position to act independently upon the critiques. Third, failure may have been due to the mechanism by which the intervention was introduced. Except for a brief letter of introduction prior to turning on the intervention, no formal, direct education with the housestaff occurred, and critiques regarding the utility of KUBs for selected indications were presented by computer only. Perhaps if the housestaff had received educational lectures on the utility of KUBs in addition to the information provided on-line, a greater impact would have been realized. Alternatively, providing individualized retrospective data to the housestaff about their KUB-ordering behavior and its yield might have been helpful, a technique that has been variably successful.1,3,9,14 Fourth, the limited impact of the intervention could have been due in part to users' distrust regarding the validity of the critiques because of the display of a rule that, although later removed, initially had poor predictive value. However, it is unlikely that this played a large role, as there were few encounters with the KUB critiques per user, and thus it is unlikely that they “learned” to distrust the intervention. Involving the housestaff in the development of the criteria, another potentially successful technique,6 could also have been helpful in getting the housestaff to “buy in” to the validity of the evidence-based critiques.

Although the rules from which the critiques were developed successfully differentiated low-yield from non-low-yield KUBs, the intervention had only a small effect on utilization. The implementation of rules, or “guidelines,” can be difficult; guidelines can be perceived to be of varying quality, ranging from clearly defined and proven rules to unproved suggestions.42 In addition, physicians may be hesitant to make a decision based upon guideline recommendations, particularly if they are concerned about the medical-legal implications of following a guideline that could result in an unfavorable clinical result. This concern has some validity, since guidelines have been used to incriminate as well as exonerate physicians, although generally physicians are on firmer ground when they follow guidelines.43 Additionally, although the rules utilized here were tested for their ability to identify low-yield utilization during the study and appeared to be effective, including data about their local performance in Phase 2 was no more successful at changing ordering behavior than the presentation of the literature-based critiques in Phase 1. Stronger interventions, such as requiring a radiology consultation of examinations likely to be of low yield or giving individual house officers' data on their respective ordering behavior to their superiors could have had a larger impact, but they would be more intrusive.

If orderers had canceled all low-yield films as suggested, two positive films that significantly affected clinical management would have been missed (0.8% of all low-yield films, excluding those that resulted from the orderer entering an incorrect reason for the KUB). Although it was our impression that films that were scored as equivocal would not result in a change in clinical management, it is possible that not performing these films could have affected patient outcomes. However, none of these films appeared to provide a definitive answer to a clinical question. Further diagnostic testing or reliance on clinical history and exam would be required, suggesting that the KUB was of little clinical benefit to the patient in these circumstances. In addition, there were 18 KUBs canceled in total; upon review of the medical record, none of these cancellations appeared to affect clinical management.

Interestingly, there were significant differences in ordering behavior between medicine and surgical physicians. Although surgical physicians were more resistant to suggestions to cancel and to change their orders, they were also less likely to receive a low-yield or alternate exam critique, suggesting either that their clinical judgment as to when a KUB study was necessary was better than that of the other orderers, or that the prevalence of disease that a KUB would detect was higher in their population. Different strategies may be necessary to affect behavior for different provider and patient groups. Future studies could focus on this as well as on other important areas, including user satisfaction with the decision-support systems. A satisfaction study performed at our institution prior to this study found high levels of satisfaction overall with the POE system,44 and a follow-up questionnaire found an average satisfaction score of 3.5 on a scale of 1–7 for suggestions regarding low-yield studies (unpublished data).

This study has several limitations. For one, it is not clear how the results will generalize to other settings. For example, if the KUB decision-support system were instituted in a setting where incentives were better aligned with its purpose—in a more mature managed-care environment, for instance, or one in which the orderers felt the financial impact of their actions—the results might have been more favorable. Also, while the financial savings were small, they could have been larger had the intervention been implemented in emergency and outpatient units, where the literature suggests the yield of KUBs may be lower than in the in-patient setting.26

We conclude that an intervention to identify low-yield KUBs was effective for identifying low-yield examinations, although simply presenting a computerized critique was rarely sufficient to convince providers to cancel orders. Providers were, however, more willing to “change direction”—that is, to order more appropriate views. To substantially reduce the number of inappropriate radiographic exams, stronger incentives or interventions may be required. Computers offer the opportunity to give patient-specific, real-time feedback to providers. This will undoubtedly prove an important tool for changing behavior; we are still learning how best to use it.



  • Supported in part by R01 HS08927 from the Agency for Health Care Policy and Research, Rockville, MD.
