Those of us who have worked with electronic healthcare data have long been aware of the limitations of billing data (aka claims data, aka administrative data) for research. They are often too coarse-grained for clinical research and are inherently biased toward maximizing income. These limitations are why Natural Language Processing (NLP) has become increasingly important in mining clinical records for research. What a doctor writes in her notes is much more revealing of her patient's state than what she bills for. Admittedly, there are significant challenges in de-identifying textual records and in transforming them into standardized clinical categories (e.g. SNOMED). Yet the appeal of using the clinical narrative text rather than claims data is compelling. In our work in i2b2, we have seen significant overrepresentation of diagnostic codes where a diagnostic encounter to "rule out" a disease was codified as that disease in the claims data. For example, a radiologist asked to rule out rheumatoid arthritis based on an X-ray will often classify the X-ray with a billing code corresponding to rheumatoid arthritis, even though perusal of the full narrative text of the radiologist's notes reveals that there were NO findings consistent with rheumatoid arthritis.
A recent article in the Boston Globe points out additional challenges in using claims data for personal medical records. The same limitations that hamper claims data in research appear to impinge on their utility for clinical care. My colleague John Halamka makes several useful suggestions on how to improve the use of such data, including recruiting patients themselves as collaborators in refining the categorization of their clinical records or even in removing gross errors. Nonetheless, a small number of codes is likely to be quite limiting. It may be that codifying the patient's record from the entirety of their clinical documentation (i.e. what their care providers wrote about them) would give the most nuanced and most faithful representation available of what the clinician was thinking in each clinical encounter with that patient.