In this special guest feature, Anand Shroff, CTO of Health Fidelity, discusses the challenges of using natural language processing in the hospital setting. Anand is co-founder and Chief Technology and Product Officer of San Mateo, California-based Health Fidelity, a provider of technology solutions for healthcare organizations.
Ask any healthcare IT executive about the key to realizing the promise of data-driven healthcare, and you’re likely to hear the words “unstructured data” and “natural language processing.” Especially with the recent rise of Electronic Health Records (EHRs), the healthcare community is keenly aware that vitally important information is hidden away in unstructured physician notes as well as lab reports, admission/discharge papers and other forms of free text. Natural language processing, or NLP, is the technology that can extract new value from that information.
The uses for this data, once transformed, are many. Hospitals and health systems, health plans, biotech and pharmaceutical companies, research institutions and government agencies all seek the insights that can be gleaned from previously unstructured data. Hospitals and clinics, for example, are looking for faster alternatives to manual coding as well as better ways to satisfy Affordable Care Act mandates. Payers and providers alike wish to improve their revenue cycle management processes, while the U.S. Department of Health and Human Services is quickly moving from fee-for-service to pay-for-performance as its model for reimbursement—a model that relies heavily on data-driven population risk factors.
In August 2015, MarketsandMarkets predicted that healthcare NLP will become a $2.67 billion category by 2020, with a CAGR of 19.2%. With 80% of all clinical data residing in unstructured formats transformable only through NLP, the interest is huge.
For years, NLP has been a subject of great interest both inside and outside the healthcare community. Unfortunately, reliable solutions have been slow to develop. NLP is a highly complex undertaking that includes such esoteric challenges as named entity recognition, morphological segmentation, disambiguation and sentiment analysis. In the medical realm, these factors can have huge implications; a physician diagnosis recorded in an EHR narrative can have widely divergent meanings due to variations in expression and organization-specific language.
Further complicating the situation is the amount of industry-specific terminology and jargon used, including SNOMED, ICD-9, ICD-10, LOINC, RxNorm and other code systems. Finally, the massive cost of homegrown solutions is beyond the resources of most IT staffs. Extraction algorithms are not only immensely complex, but also require constant updating to be relevant and accurate.
Researchers at universities and major companies have been working on viable NLP systems since the 1950s. MIT and Stanford were early sources of general investigation; more recently, Columbia University has pioneered healthcare NLP research that has gained a significant amount of acceptance at academic medical centers due to breakthroughs such as a semantically based parser for determining the structure of text.
Dr. Carol Friedman, Professor of Biomedical Informatics at Columbia, has spearheaded most of this research and development, which includes the earliest patents, one of the largest bodies of peer-reviewed publications in the field, and validations in hundreds of successful projects. Dr. Friedman’s work has taken healthcare NLP out of the lab and into everyday use.
Now that accurate, cost-effective healthcare NLP is a possibility, analytics and research teams that rely upon data need to consider how to bring this technology into their organizations. The first step is to identify the use cases. Reimbursement, population health, quality improvement and improved clinical research are just some of the possibilities. It’s critical to build strong justifications for the correct use cases and then articulate their strategic value to management.
Once agreement has been reached, the next step is to examine the technical requirements. Unstructured data from the EHR must be readily available—so too must the infrastructure for data extraction, normalization, and integration. Systems must tie together efficiently in order to deliver the business value promised.
Finally, it’s vital to select a vendor whose goals and track record align with the organization’s priorities. No value is obtained by choosing NLP technology that isn’t already proven for the specific use cases in question. In addition to this assurance, look for an NLP solution that continuously learns and improves its output. The system should have general precision/recall metrics in the 90-plus percentage range over several use cases, as well as the ability to improve results over time. The best NLP technologies are capable of delivering these results in even the toughest healthcare environments.
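Precision and recall, the metrics cited above, can be checked directly during vendor evaluation by comparing the system’s extracted concepts against a manually annotated gold standard. The following sketch shows the arithmetic; the code sets used are hypothetical examples, not real evaluation data.

```python
# Sketch: precision/recall for NLP concept extraction against a gold standard.

def precision_recall(extracted: set, gold: set) -> tuple[float, float]:
    """Precision = correct extractions / all extractions;
    recall = correct extractions / all gold-standard items."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

gold = {"I10", "I21.9", "E11.9", "J45.909"}     # codes a human coder found
extracted = {"I10", "I21.9", "E11.9", "R07.9"}  # codes the NLP system found

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```

Running such a comparison across several representative use cases is a practical way to verify a vendor’s claim of 90-plus percent performance.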
With the right system in place, supported by a well-planned business case, NLP is ready to have a transformational impact on the quality and productivity of modern healthcare. The inclusion of unstructured information in the big data mix can improve insights and increase efficiency many times over—and for healthcare IT application companies in particular, unlocking the full potential of data from EHRs and other previously unreachable sources can add significant new appeal to the products they offer.
As almost every healthcare IT professional knows, the future of the industry demands a data-driven approach. Natural language processing isn’t a far-off dream anymore. For IT organizations ready to jump in, NLP is here—and it’s delivering real business value.