Now You’re Speaking (Understanding) My Language

In this special guest feature, Luca Scagliarini, Chief Product Officer at, discusses how addressing the challenges associated with language data is a daunting task, but it is one that will define your organization’s success moving forward. A hybrid approach is your first step in the right direction. As CPO, Luca is responsible for leading the product management function and overseeing the company’s product strategy. Previously, Luca held the roles of EVP Strategy & Business Development and CMO at and served as CEO and co-founder of semantic advertising spinoff ADmantX. Luca received an MBA from Santa Clara University and a degree in Engineering from the Polytechnic University of Milan, Italy.

It is hard to overstate the value of language. It is a core means of human interaction, capable of communicating a seemingly infinite amount of information; try to imagine a world without it. People use language to convey events, feelings and beliefs with extraordinary depth, nuance and precision.

The term “natural language” distinguishes human languages from the artificial languages developed for purposes such as software programming. At work, natural language appears in text messages, emails, voicemails, documents, memos, social media posts and other formats.

Because digital tools make it easy to create natural-language content in many formats, they have unleashed immense volumes of language data. Consider that Americans send about 6 billion text messages a day, and that number is likely to grow with the increasing use of digital communications.

Why Machines Struggle with Language

Unfortunately, the flexibility that makes language so useful to humans befuddles machines, largely because language is unstructured. According to industry estimates, as much as 90% of all enterprise data is unstructured, much of it in the form of language.

While humans can master the intricacies of their language, using context to derive meaning from text, machines are not equipped to do so. Just consider the breadth and depth of the English language, where a single word can have multiple meanings. The word “plane,” for example, can refer to a geometric entity, an airplane, a flat surface and more.
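
One classic way machines try to pick the right meaning is word-sense disambiguation. The minimal sketch below uses the simplified Lesk algorithm: it chooses the sense whose dictionary gloss shares the most words with the sentence. The sense inventory, glosses and stopword list are hypothetical and written only for illustration (including a woodworking-tool sense, another common meaning of “plane”); real systems draw on large lexical resources.

```python
import re

# Hypothetical, hand-written sense inventory for "plane"; glosses are illustrative only.
PLANE_SENSES = {
    "geometry": "a flat two dimensional surface in geometry defined by points lines and angles",
    "aircraft": "an aircraft with wings that carries passengers or cargo from airport to airport",
    "tool": "a woodworking tool used by a carpenter to smooth wood",
}

# Small illustrative stopword list so that words like "the" do not count as overlap.
STOPWORDS = {"a", "an", "the", "in", "by", "and", "or", "to", "of", "with",
             "that", "used", "at", "on", "is", "was"}

def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def disambiguate(sentence, senses):
    """Simplified Lesk: choose the sense whose gloss shares the most words with the sentence."""
    context = tokenize(sentence)
    return max(senses, key=lambda sense: len(context & tokenize(senses[sense])))

for s in [
    "The plane landed at the airport with passengers",
    "The carpenter used a plane to smooth the wood",
    "Two parallel lines lie in the same plane",
]:
    print(disambiguate(s, PLANE_SENSES), "<-", s)
```

The three sentences resolve to the aircraft, tool and geometry senses respectively, but only because the hand-picked glosses happen to share surface words with them; change the wording and the overlap, and the answer, can disappear.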

This makes it incredibly difficult to automate complex business processes that rely on the language data embedded in business documents, emails and other sources. For instance, handling customer transactions, insurance policies, patient records and legal documents all requires an understanding of nuanced and often domain-specific terminology.

Because machines struggle to understand language data, humans are left to manage this enormous, growing and largely untapped resource. More specifically, the work falls to subject matter experts, the individuals with specialized knowledge of a domain. This type of expertise is extremely difficult to replicate in a scalable manner.

Transforming Language into Data

Every enterprise must overcome the challenge of transforming language data into knowledge and business insight in order to make faster, better and more consistent decisions. But how can enterprises successfully and consistently extract data from documents to accelerate language-intensive applications? And how can they make their systems understand language at a human-like level?

Artificial intelligence (AI) is the natural answer, but it is not that simple. Selecting the right AI approach is key to making any real impact. Machine learning models can provide predictive insights from structured data (e.g., forecasting demand from product inventory and buying patterns, or predicting equipment failure from sensor data), but they are an imperfect solution for understanding unstructured data and addressing the challenges of language data. For instance, a model might be able to guess the next word in a sentence, but that does not mean it understands the sentence or even the context of the word it added.
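
The next-word point can be illustrated with the simplest statistical language model, a bigram model that predicts each word from counts of the word before it. This is a deliberately tiny sketch on a made-up four-sentence corpus, not how modern neural models work, but it shows the same gap: the prediction comes from co-occurrence statistics, with no representation of what the sentence means.

```python
from collections import Counter, defaultdict
from typing import Optional

def train_bigrams(corpus):
    """Count, for every word, how often each other word immediately follows it."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, word) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Toy, made-up corpus for illustration only.
CORPUS = [
    "the plane landed safely",
    "the plane landed late",
    "the plane took off",
    "the carpenter held the plane",
]

model = train_bigrams(CORPUS)
print(predict_next(model, "plane"))  # prints "landed"
```

The model answers “landed” after “plane” no matter what came earlier, so in the carpenter's sentence it would still guess an aircraft continuation: it has learned which words tend to follow one another, not which sense of “plane” is in play.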

The only way to achieve this understanding is through knowledge, which requires a knowledge-based, symbolic approach to AI. The most efficient way to establish a knowledge base for language understanding is to create a repository of related concepts, known as a knowledge graph. The knowledge graph gives the data the structure a machine needs to understand language the way people do. This is vital because it adds “common sense” to language processing.
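
As a rough illustration of what a “repository of related concepts” can look like, here is a toy sketch that stores a knowledge graph as (subject, relation, object) triples. The concepts, relation names and the LEXICON mapping from words to concepts are all hypothetical; production knowledge graphs are far larger and are usually built on dedicated graph stores and standards such as RDF.

```python
from collections import defaultdict, deque

# Hypothetical toy knowledge graph: (subject, relation, object) triples.
TRIPLES = [
    ("airplane", "is_a", "aircraft"),
    ("aircraft", "is_a", "vehicle"),
    ("aircraft", "has_part", "wing"),
    ("airplane", "lands_at", "airport"),
    ("hand_plane", "is_a", "woodworking_tool"),
    ("woodworking_tool", "is_a", "tool"),
    ("hand_plane", "used_by", "carpenter"),
    ("geometric_plane", "is_a", "surface"),
]

# Hypothetical lexicon mapping a surface word to the concepts it can denote.
LEXICON = {"plane": ["airplane", "hand_plane", "geometric_plane"]}

class KnowledgeGraph:
    def __init__(self, triples):
        self.edges = defaultdict(list)  # subject -> list of (relation, object)
        for subj, rel, obj in triples:
            self.edges[subj].append((rel, obj))

    def related(self, concept):
        """Concepts directly linked to `concept` by any relation."""
        return {obj for _, obj in self.edges.get(concept, [])}

    def is_a(self, concept, category):
        """Follow is_a edges breadth-first, so membership is inferred transitively."""
        queue, seen = deque([concept]), {concept}
        while queue:
            node = queue.popleft()
            if node == category:
                return True
            for rel, obj in self.edges.get(node, []):
                if rel == "is_a" and obj not in seen:
                    seen.add(obj)
                    queue.append(obj)
        return False

def resolve(kg, word, context):
    """Pick the concept for `word` that is linked to the most context concepts."""
    candidates = LEXICON.get(word, [])
    return max(candidates, key=lambda c: len(kg.related(c) & context), default=None)

kg = KnowledgeGraph(TRIPLES)
print(kg.is_a("airplane", "vehicle"))               # True, inferred via "aircraft"
print(resolve(kg, "plane", {"carpenter", "wood"}))  # hand_plane
```

No triple states that an airplane is a vehicle; the graph infers it by chaining is_a links, which is a small-scale version of the “common sense” the structure provides.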

Taking a Hybrid Approach

While symbolic AI has a clear advantage when it comes to understanding language data, many thought leaders and analysts believe that a hybrid approach combining symbolic and machine learning methods is the solution to natural language challenges. The hybrid approach pairs the robust statistical capabilities of machine learning with the transparent, resource-efficient and customizable IF-THEN logic of symbolic AI. This mitigates the “black box” problem that plagues ML and helps improve explainability without compromising accuracy.
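
The hybrid pattern can be sketched as a two-layer classifier: explicit IF-THEN rules are tried first and report which keywords fired, and a statistical model handles anything the rules do not cover. This is a minimal sketch, not any vendor's architecture; the rules, labels and training sentences are hypothetical, and the statistical layer is a from-scratch multinomial naive Bayes chosen only because it fits in a few lines.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase the text and keep alphabetic runs as tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Symbolic layer: hypothetical, human-readable IF-THEN rules.
# IF the text contains any of these keywords THEN assign the label.
RULES = [
    ({"policy", "premium", "deductible"}, "insurance"),
    ({"diagnosis", "patient", "dosage"}, "medical"),
]

class NaiveBayes:
    """Statistical layer: minimal multinomial naive Bayes with add-one smoothing."""

    def __init__(self, examples):
        self.doc_counts = Counter()  # label -> number of training texts
        self.word_counts = {}        # label -> Counter of word frequencies
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())

        def log_score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            prior = math.log(self.doc_counts[label] / total_docs)
            return prior + sum(math.log((counts[w] + 1) / denom) for w in tokens(text))

        return max(self.doc_counts, key=log_score)

def classify(text, model):
    """Try the explicit rules first; fall back to the statistical model if none fire."""
    words = set(tokens(text))
    for keywords, label in RULES:
        hits = words & keywords
        if hits:
            return label, f"rule matched {sorted(hits)}"
    return model.predict(text), "no rule matched; naive Bayes fallback"

# Hypothetical training data for the fallback model.
TRAIN = [
    ("the claim for water damage was approved", "insurance"),
    ("coverage for the car accident claim", "insurance"),
    ("the nurse recorded blood pressure", "medical"),
    ("symptoms improved after treatment", "medical"),
]

model = NaiveBayes(TRAIN)
print(classify("Please review the deductible on this policy", model))
print(classify("The treatment lowered her blood pressure", model))
```

Putting the rules first keeps the decisions that experts encoded transparent and easy to override, while the fallback still answers inputs no rule anticipated. Other hybrids reverse the order, running the model first and using rules to validate or correct its output.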

Enterprise processes that depend on an accurate understanding of language data need an AI solution that leverages both machine learning and symbolic approaches. This combination is key to scaling the enterprise’s expertise, which in turn enables organizations to gain insight from their rapidly growing trove of unstructured information and to accelerate knowledge-based processes.

Addressing the challenges associated with language data is a daunting task, but it is one that will define your organization’s success moving forward. A hybrid approach is your first step in the right direction. Keep the momentum going.
