The Untapped Potential in Unstructured Text


Do you ever look up and just watch people walking down busy streets or sitting in crowded coffee shops? It’s impossible not to notice everyone typing away on their phones, tablets and laptops in nearly every setting these days. As I watch people engrossed in their devices, I envision the characters being typed – the shorthand, acronyms, misspellings and character substitutions.

This is the process of our thoughts being transferred into the digital world. Our research, opinions, facts, feedback and calls to action are transformed from human language into data through those keyboards. But what are we doing with all that data?

Unstructured text is the largest human-generated data source, and it grows exponentially every day. The free-form text we type on our keyboards or mobile devices is a significant means by which we communicate our thoughts and document our efforts. Yet many companies don’t tap into the potential of their unstructured text data, whether it be internal reports, customer interactions, service logs or case files. Decision makers are missing opportunities to take meaningful action around existing and emerging issues.

Natural language processing (NLP) is a branch of artificial intelligence (AI) that helps computers understand, interpret and manipulate human language. In general terms, NLP tasks break language down into shorter, elemental pieces and try to understand the relationships among those pieces to explore how they work together to create meaning. The combination of NLP, machine learning and human subject matter expertise holds the potential to revolutionize how we approach new and existing problems.

The applications of NLP are incredibly diverse and ideal for nearly any situation involving the need to rapidly and tirelessly analyze unstructured text. For example, a hospital system has an ever-growing corpus of data in the form of electronic health records. Patterns of symptoms and their root causes would be nearly impossible for a human to detect by combing through every individual record. But an AI system can work around the clock to analyze the test results, patient reports, listed symptoms and more. NLP has proven powerful when applied to predicting sepsis and alerting hospitals when data in electronic health records indicates its presence.

Sepsis is a leading cause of death in hospitals, according to the Sepsis Alliance. Early diagnosis and rapid intervention are critical in sepsis treatment, but symptoms aren’t always apparent in its early stages. Mortality rates increase 8 percent for every hour treatment is delayed. With heavy caseloads and possibly asymptomatic patients in the early stages of sepsis, the human eye may not notice the correlation between data in medical records and early indicators of a deadly condition. But applying NLP to the data in those electronic health records provides a key input for predictive models that trigger alert systems, notifying doctors and nurses that a patient may need medical intervention. Research has shown that providing full medical treatment for sepsis in the first 180 minutes of onset can save 80 percent of the lives that would have otherwise been lost.

And the applications of NLP expand beyond the medical setting. It can be used to analyze legal case files, social media feeds, call center logs, research documents, warranty claims and more. The majority of data held by organizations is in the form of unstructured text.

Making sense of all this information requires a combination of three capabilities:

  • Natural language processing. NLP performs linguistic analysis that essentially helps a machine read text. It analyzes text and converts it into formal representations for text processing and understanding. This includes methods such as tokenization, part-of-speech tagging, stemming, named entity recognition and more.
  • Machine learning. Once NLP has been applied to text, data mining and machine learning algorithms use its output to automate the generation of key insights and descriptive analytics.
  • Human input. When it comes to analyzing text, human input is still incredibly important. Subject matter expertise is applied in the form of linguistic rules to help the machine capture slang, detect sarcasm and provide relevant context.
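
As a rough illustration of the first capability, the sketch below tokenizes a sentence, applies a deliberately crude suffix-stripping stemmer and counts the resulting terms. It uses only the Python standard library. The function names, the suffix list and the sample note are all assumptions made for this example; production systems would rely on established methods such as Porter stemming rather than this simplification.

```python
import re
from collections import Counter

# Hypothetical suffix list for a deliberately crude stemmer (an assumption
# for illustration, not an established stemming algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(text):
    """Count stemmed tokens: a simple numeric form a model could consume."""
    return Counter(crude_stem(token) for token in tokenize(text))


# Made-up sample note, not real patient data.
note = "Patient reported chills and fevers; fever worsened overnight."
counts = bag_of_words(note)
```

Here "fevers" and "fever" collapse to the same term, so the count for "fever" reflects both mentions. Term counts like these are one simple kind of output that machine learning algorithms can take as input.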

The technology to analyze unstructured text actively learns from the data as it comes in by combining machine learning with human direction to generate new insights. The end goal is to build and deploy text analytics models for operational impact by enabling understanding through topic detection, contextual extraction, document categorization and sentiment analysis.
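
To make document categorization and sentiment analysis a little more concrete, here is a minimal rule-based sketch. It covers only the human-direction side: the category keywords and the sentiment lexicon are hypothetical, hand-written stand-ins for subject matter expertise, and no machine learning is involved. A real text analytics model would combine rules like these with patterns learned from data.

```python
import re

# Hypothetical expert-written rules: each category maps to flagged keywords.
CATEGORY_RULES = {
    "billing": {"invoice", "charge", "refund"},
    "product_defect": {"broken", "cracked", "defective"},
}

# Hypothetical sentiment lexicon: positive words score +1, negative words -1.
SENTIMENT_LEXICON = {"great": 1, "helpful": 1, "broken": -1, "terrible": -1}


def words(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text):
    """Return, in sorted order, every category whose keywords appear."""
    found = set(words(text))
    return sorted(
        category
        for category, keywords in CATEGORY_RULES.items()
        if found & keywords
    )


def sentiment_score(text):
    """Sum lexicon scores over all words; below zero suggests negative tone."""
    return sum(SENTIMENT_LEXICON.get(word, 0) for word in words(text))


# Made-up warranty claim used only to exercise the functions.
claim = "The screen arrived cracked and broken. Terrible. Please refund the charge."
categories = categorize(claim)
score = sentiment_score(claim)
```

For this sample claim, the rules assign both the billing and product defect categories and produce a negative sentiment score. Keyword rules of this kind miss sarcasm and context, which is why human review and learned models remain part of the process.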

Natural language processing holds the power to improve how we live and work. It can help bring progress to areas that have been slow or difficult to change without a partnership between humans and technology. Look at your organization and consider the unstructured text you gather and the possible revelations it may hold. That data reflects the voices of those you serve and holds the potential to help you deliver better experiences, improve quality of care and enrich human engagement. There are powerful stories to be told from your unstructured text data. Are you listening?

About the Author

Mary Beth Moore is an Artificial Intelligence and Language Analytics Strategist at SAS. She is responsible for providing strategic marketing direction and leads global SAS messaging for artificial intelligence and text analytics. She frequently presents and writes on a wide range of technology topics, including AI, NLP and SAS’ Data for Good initiatives. Prior to SAS, Moore served in the United States Marine Corps and spent several years as an intelligence analyst and senior instructor in the US Department of Defense and Intelligence Community, primarily supporting expeditionary units and special operations. She is also a special education advocate, a disability rights consultant and a believer in community inclusion for people of all abilities.

 
