
★ Comment ★

A Salient Problem in Informatics?

Titus Schleyer
DOI: http://dx.doi.org/10.1197/jamia.M2752; p. 707; first published online: 1 September 2008

The Jan/Feb issue of JAMIA contained an interesting series of articles about the automated identification of smoking status from medical discharge records. The series compared the performance of 11 different systems for classifying patient records into five general smoking-status categories. The classification approaches, such as Bayesian classifiers, natural language processing, support vector machines and neural networks, illustrated the rich and diverse set of algorithms used in automated text processing and classification today. Even more impressive was the performance of some of these systems, which, in certain aspects, approximated the gold standard.

I wonder, however, whether the organizers of the i2b2 challenge could not have picked a test task that would have appeared more salient to the outside world. When I read the papers, I pretended, for a moment, not to be an informatician. The first question likely to occur to such a person would be: “Why develop a computer program to interpret free text in order to find out whether a person smokes or not? Why not store the answers to the questions that I (sometimes/often/always) get asked regarding smoking by my doctor/dentist in a database directly?” Clearly, this scenario oversimplifies the real issues. The layperson most likely would be unaware of the long-raging debate about free text versus structured medical records, the difficulties of changing behavior in healthcare providers, the discrepancies between a patient's actual status and what is documented in the record, and the validity and reliability of such data from the perspective of epidemiology. On the other hand, as healthcare professionals, we have known for a long time that knowing a patient's smoking status is beneficial for a number of reasons, ranging from risk assessment and disease prevention to smoking cessation intervention and policy decisions. So, the layperson may again justifiably ask: “So why doesn't everyone capture and make use of these data?”

Along the same lines, the layperson could rightly ask why the biomedical informatics community is “wasting its time” fine-tuning algorithms that in practice may be far inferior to asking all (or most) patients a set of simple questions and taking care that the answers are correctly recorded in the paper or computer record. The biomedical literature contains plenty of reports describing problems with capturing a patient's smoking status, and these problems may strike the layperson as a lot more worthy of everyone's time and effort.

I am not arguing that automated text processing and classification be discontinued as a research area in biomedical informatics. This type of research has many beneficial applications and outcomes as long as free text and the necessity to classify information in it persist. What I am arguing is that the biomedical informatics community should consider how the non-informatics community perceives our work. From that viewpoint, some of our work scores pretty low, regardless of whether the view is justified or not.

Biomedical informatics has gone (and, I suspect, will continue to go) through periodic crises of identity and relevance. In my opinion, recent years have brought significant and positive change in how informatics is perceived by the healthcare community and the public. Much of this change can be directly attributed to what AMIA and its members and stakeholders have done. However, many still see us as closeted in our ivory tower, either oblivious or only dimly aware of the problems and challenges that beset the real world. I think trying to change this view is up to all of us—in what research questions we pursue, what we publish, and to what degree we effect positive change.