D-Lib Magazine Editorial

Visiting Medical Informatics

In late October, I went to the annual fall symposium of the American Medical Informatics Association here in Washington, which about 2,000 people attended. The conference's theme was exploiting Internet/World Wide Web technologies, and the discussions resonated with topics I hear elsewhere: information retrieval; data visualization; interoperability among heterogeneous sources and systems; confidentiality, security, and privacy; and user interfaces, to name a few. This is not terribly surprising, since medicine offers a compatible environment for deploying digital library technologies. The field relies heavily on visual observation, whether of tissue samples, MRI scans, or X-rays; generates complex databases of patient records, which are a potential research resource; and requires juggling multiple and evolving information sources, from billing to genetic sequences to drug interactions. Finally, hospitals require complex information flows and represent settings that seem ripe for intranet applications.

Indeed, the appeal of browsers and web technologies in local settings was obvious. While relatively little at the conference represented new software technology, at least two characteristics of the research suggested ways in which applications in this subject domain can push the technology in some interesting directions. One is in the area of confidentiality, security, and privacy, and the second is in the general area of clinical decision support and the capture of information in what is frequently loosely structured text.

Nearly every presentation assumed that the systems for information capture and retrieval would be embedded in clinical settings. Not surprisingly, then, there is intense interest in natural language retrieval systems and their relationship to numerous controlled vocabularies and systems. The expectation of clinical use also means, says the program chair James J. Cimino of Columbia University, that physicians "want to extract relevant information when they need it." While the web has provided a much easier environment to do this, he looks to the research community to provide "tools that integrate across resources and across functionalities." Note, however, that he expects an informational rather than a directional response -- this protein has these properties, not a list of 3 or 300 citations to the literature -- and his examples are based on access to medical reference texts which quickly provide authoritative answers to questions.

But the medical literature advances rapidly. The most current information is probably not in the authoritative textbooks, and information overload, says Stanford's Gio Wiederhold, is still a significant problem. What is needed is "rapid abstraction of significant results," recognizing that "significance" is context-dependent. The goal is "many smarter tools" that can, for example, organize results hierarchically. Again not surprisingly, there were numerous sessions devoted to expert systems, an area in which medical applications came early; witness MYCIN in the 1970s.

Dr. Wiederhold has long been a major figure in digital libraries research, and he himself gave a paper on an architecture for data security, which illustrates the nuances of notions of security and confidentiality. Much of the discussion of security in electronic publishing, for example, revolves around access: based on stated terms and conditions of use, which can include strategies for payment and protection of intellectual property, authorized users are granted access to a collection. By contrast, patient records are a thicket of information to which many people can have legitimate access but for different purposes, from billing to research. Whereas the transaction can be made secure and the parties authenticated, more difficult, Wiederhold says, are the problems that can arise when individuals with legitimate access for one purpose use it for another, or inadvertently violate the privacy of a patient. In this sense, the medical model of information is more similar to corporate or government information, where hierarchies of access to partitioned subsets of information are common.

In principle, some medical information, like social security numbers, can be de-coupled from the records. But excessive partitioning of the records by all possible symptoms, relationships, or diagnoses is neither practically feasible nor necessarily desirable, since a potential value of patient records is that they can support data mining, epidemiological research, or simulations of patient trials. As a first step, Wiederhold proposes an architecture based on a human "Security Mediator", located in the firewall and equipped with a number of tools, enabling him or her to evaluate a request. Another approach, described by MIT student Latanya Sweeney, is "scrubbing" the records of identifying information, which reduces, although it does not necessarily eliminate, the problem of breaches based on inference.

Sweeney and her colleagues are dealing with the problem of removing identifying information from physicians' notes, correspondence, and discharge records. Such loosely structured information inheres in the clinical setting and, as a practical matter, means that the writers rely heavily on context, employing jargon, nicknames, private shorthand, and acronyms. The MIT researchers have used a series of detection algorithms and replacement mechanisms which, they say, allow them to "reliably remove explicit personally identifying information." One of the interesting features of the Scrub research is that the system recognizes types of documents by characteristics of the documents. This potentially goes toward managing unstructured text, which is common in medical records but also proliferates in offices, labs, archives, and libraries. Such information must be handled preferably on the fly and without extensive editing, and it poses different problems from the formal structure of books, monographs, and articles, which lend themselves to SGML or related schemas.
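
To make the idea of paired detection and replacement concrete, here is a minimal sketch in Python. It is not the Scrub system: the detector list, the patterns, the bracketed labels, and the sample note are all assumptions invented for illustration, and a real system would need far more than a handful of regular expressions to cope with nicknames, shorthand, and acronyms.

```python
import re

# Hypothetical detectors: each pairs a label with a regular expression.
# These patterns are illustrative assumptions, not the Scrub system's rules.
DETECTORS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("NAME", re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+")),
]


def scrub(text):
    """Replace every span matched by a detector with a bracketed label."""
    for label, pattern in DETECTORS:
        text = pattern.sub("[" + label + "]", text)
    return text


# Invented sample note; no real patient data.
note = "Pt seen by Dr. Smith on 10/28/96; SSN 123-45-6789, call 202-555-0100."
print(scrub(note))
```

Even in this toy form, only the explicit identifiers are replaced. Whatever context remains in the note is untouched, which is why scrubbing reduces but does not necessarily eliminate breaches based on inference.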

Andreas Paepcke has already observed in this magazine that "searching is not enough" and that digital library technologies must handle a broad variety of information capture, storage, and retrieval problems in many settings. Hospitals are an interesting place to start.

Written by Amy Friedlander (afriedl@CNRI.Reston.VA.US)