The World’s Most Ambitious Knowledge Graph 

Knowledge graphs have become one of the foremost expressions of Artificial Intelligence today. Almost everyone—from vendors to organizations, analysts to regulators—relies on these applications at some point to compile, harmonize, and scrutinize specialized business information.

Use cases run the gamut of data management needs, from data quality solutions to internal applications centered on employees’ aptitudes. The ubiquity of knowledge graphs isn’t just due to their unmatched relationship discernment, innate reasoning capabilities, or standardization of diverse data sources. According to expert.AI CTO Marco Varone, it’s for something much simpler and, perhaps, more universal.

“Any knowledge is added value for any use case,” Varone observed. “It’s always better to have more knowledge than less. If you’ve got more than you need you can discard it, but if you don’t have knowledge you can’t create it out of thin air.”

Of all the knowledge graphs built, the most extensive, heavily populated, and nuanced are those for Natural Language Understanding (NLU), which is arguably the most arduous of AI tasks. As Varone pointed out, the context of computer vision deployments is largely the same across the globe. Language, however, varies far more: accents, regions, dialects, use cases, and other factors all change what words mean in a given context.

Devising a knowledge graph that facilitates language understanding across each of these intricacies, in an assortment of languages, is surely one of the most ambitious knowledge graph undertakings ever completed. Success is critical for accurate machine understanding of language in enterprise AI.

“For simple use cases for language understanding, you can do well without knowledge graphs,” Varone commented. “But as soon as you move from super basic ones to the really complex, knowledge is something you need.”

Subject Area Models

The underpinnings of any true knowledge graph will always be the subject area models (sometimes termed ontologies) upon which enterprise knowledge is based. This fact is particularly prominent for a knowledge graph focused on NLU, which expert.AI built across a variety of languages and domains and which, Varone estimated, required hundreds of “man years of work.” Consequently, “it’s not separate knowledge graphs: one for chemistry, one for sports, one for finance,” Varone indicated. “As much as possible, we put everything into one knowledge graph.”

The means of doing so lies in a two-part approach in which the resulting graph is, conceptually, split in half. The first part consists of the concepts represented by language itself, which Varone characterized as “language independent.” Since language is the very substrate of knowledge, the exhaustive nature of such subject area models is readily apparent. The second part focuses on the linguistic application of these concepts, which is naturally codified according to each respective language.
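The two-part split can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the concept ids, glosses, and the helper names `senses` and `words_for` are invented here and do not describe expert.AI's actual data model.

```python
# Part 1: language-independent concepts.
# The ids and glosses below are invented examples, not real graph content.
CONCEPTS = {
    "c:bank_institution": "organization that holds deposits",
    "c:river_bank": "land alongside a river",
}

# Part 2: one vocabulary per language, each word pointing into part 1.
LEXICONS = {
    "en": {"bank": ["c:bank_institution", "c:river_bank"]},
    "fr": {"banque": ["c:bank_institution"], "rive": ["c:river_bank"]},
}


def senses(language, word):
    """Return the concept ids a word can denote in one language."""
    return LEXICONS.get(language, {}).get(word.lower(), [])


def words_for(concept_id, language):
    """Return the words in a language that express a given concept."""
    vocabulary = LEXICONS.get(language, {})
    return sorted(w for w, ids in vocabulary.items() if concept_id in ids)
```

Because both vocabularies point at the same concept ids, a call such as `words_for("c:river_bank", "fr")` finds the French word for a sense first reached through the English word "bank".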

Vocabularies, Taxonomies

Whereas ontologies are necessary for representing knowledge in a unified manner that, in Varone’s words, “minimizes the entropy” not uncommon to knowledge graphs, taxonomies are necessary for the specific applications of those concepts. “That’s where you have the vocabulary, the terminology,” Varone explained. Naturally, there are different taxonomies for different languages and for other distinctions, like business units.

Taxonomies also include synonyms and hierarchies of definitions for the varying terms that relate to concepts in ontologies. In this respect, this second aspect of an NLU knowledge graph “is the thesaurus or vocabulary on top of the language independent part,” Varone revealed.
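A taxonomy of this kind, with synonyms and a hierarchy of broader terms, might look like the following minimal sketch. The entries and the function names `preferred_term` and `broader_chain` are hypothetical and chosen only for illustration.

```python
# Hypothetical taxonomy: each preferred term lists its synonyms
# and, optionally, the broader term it sits under.
TAXONOMY = {
    "vehicle": {"synonyms": [], "broader": None},
    "car": {"synonyms": ["automobile", "auto"], "broader": "vehicle"},
    "sedan": {"synonyms": ["saloon"], "broader": "car"},
}


def preferred_term(word):
    """Map a word or one of its synonyms to the preferred term, or None."""
    word = word.lower()
    for term, entry in TAXONOMY.items():
        if word == term or word in entry["synonyms"]:
            return term
    return None


def broader_chain(word):
    """Walk from a term up through its broader terms to the root."""
    term = preferred_term(word)
    chain = []
    while term is not None:
        chain.append(term)
        term = TAXONOMY[term]["broader"]
    return chain
```

Here `broader_chain("saloon")` resolves the synonym to "sedan" and then climbs to "car" and "vehicle", which is the thesaurus-like behavior the passage describes.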

Modeling Language

Although it may conceptually help to think of the knowledge graph according to these two halves, the difference between taxonomies and ontologies isn’t always clear. Some ontologies involve taxonomies; arguably, all taxonomies are founded in some way in the concepts of these subject area models. “Ontologies can be very complex,” Varone mentioned. “They can have any type of relation, attributes, and number of nodes.”

Frameworks like Cyc (an expert system specializing in language, for which there is now an open source variety) were integral to fine-tuning the complexities of expert.AI’s knowledge graph so that it was applicable to the real world. Thus, for the first part of Varone’s knowledge graph, the subject area model component, “the secret was making a pragmatic compromise between things that are too generic, too abstract for real users and Cyc,” Varone disclosed.

Knowledge Enrichment 

The final aspect of constructing an exhaustive knowledge graph for NLU across traditional barriers like languages and domains was to actually put the knowledge into the graph. Varone articulated a variegated method that began with knowledge engineers manually inputting rules and definitions before eventually involving statistical AI models. It may be surprising that for what Varone called this “knowledge enrichment” facet of building the graph, the CTO eschewed neural network approaches popularized by BERT and transformers.

“This is absolutely not the way to do it because, with the deep learning and neural network approach, what you can do is only create a sort of implicit knowledge,” Varone specified. “It is a black box. But our knowledge graph is explicit. So, all of its information is explicit so you can see, modify, or link and enrich it.”

The explicit nature of the knowledge contained in knowledge graphs is attributed to their self-declarative nature, in which anyone can see what terms mean, look up their definitions, and understand them in relation to the subject area model. As such, the self-populating element of the knowledge enrichment phase involves what Varone termed “proprietary” algorithms, in addition to traditional machine learning approaches utilizing both supervised and unsupervised learning.

Untold Advantages 

The boons of devising an NLU knowledge graph with the methodology Varone advocated are manifold. It effectively gives one a single knowledge graph that’s applicable to almost any use for language understanding. Moreover, it’s highly extensible and adapts to the particular lexicon of any organization or its business units. “If you need to add new concepts, first you have to add them in the ontology part, and then you add the word in the particular language,” Varone clarified.
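The ordering Varone describes (concept first, word second) can be enforced in code. The sketch below is an assumption-laden illustration, not the vendor's API: the class `ExtensibleGraph`, its methods, and the sample concept are all hypothetical.

```python
class ExtensibleGraph:
    """Toy graph that refuses a word until its concept exists (hypothetical)."""

    def __init__(self):
        self.concepts = {}  # concept id -> description (ontology part)
        self.lexicons = {}  # language -> {word: concept id} (language part)

    def add_concept(self, concept_id, description):
        # Step 1: register the concept in the language-independent part.
        self.concepts[concept_id] = description

    def add_word(self, language, word, concept_id):
        # Step 2: attach a word in one language to an existing concept.
        if concept_id not in self.concepts:
            raise KeyError(f"add the concept first: {concept_id}")
        self.lexicons.setdefault(language, {})[word.lower()] = concept_id


graph = ExtensibleGraph()
graph.add_concept("c:churn", "customers leaving a service")
graph.add_word("en", "churn", "c:churn")
graph.add_word("fr", "attrition", "c:churn")
```

Raising an error on an unknown concept is one possible design choice; it keeps every language's vocabulary anchored to the shared ontology, so an organization's new terms cannot drift away from it.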

This AI application is also primed for translations between languages, as well as the myriad use cases throughout the world in which information must be exposed to customers in both English and a country’s native language, such as French. Finally, this knowledge graph supports the burgeoning array of enterprise NLU use cases, including everything from Cognitive Process Automation to intelligent chatbots and text analytics.

About the Author

Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.

Sign up for the free insideBIGDATA newsletter.
