In this special guest feature, Delroy Cameron, Data Scientist at URX, explores how brands and publishing platforms can benefit from big data provided by machine-learning powered knowledge graphs. URX is a deep linking technology platform that aims to make seamless mobile experiences within applications. Delroy’s work focuses on big data problems involving text mining, machine learning and structured background knowledge. He is interested in knowledge representation, information extraction and knowledge discovery from both structured and unstructured text. He earned his Ph.D. in Computer Science at Wright State University, a Master’s degree in Computer Science at the University of Georgia and Bachelor’s degree in Computer Science at Savannah State University.
A lot of the best content on mobile devices is “trapped” within the walls of mobile applications. The emergence of deep linking has made it easier to connect apps by standardizing protocols and configurations, but building a seamless mobile experience requires making it easy for users to navigate to where they want to go.
That’s where big data comes in. By using knowledge graphs and machine learning, publishers can surface the right content at the right time while pursuing alternative monetization channels that add value for users.
Knowledge graphs provide a foundation for making sense of immense amounts of data through explicit insights into how people, places and things relate to each other. For example, knowing that the Golden State Warriors are a basketball team isn’t particularly useful in a vacuum. However, if you understand how the “entity” of the Warriors fits into the bigger picture of professional athletes, sports venues, professional sports leagues, etc., you can form more useful insights.
Knowledge graphs must be accurate and comprehensive in order to serve users’ needs. One way to accomplish this is by ingesting vast amounts of data to train machine learning models that leverage existing knowledge graphs. Wikipedia is an incredibly valuable dataset for training machine learning models because it is a diverse, heterogeneous and comprehensive source of semantic information extending across people, sports, politics, companies and many things in between. It is also regarded as a fairly accurate data set.
Wikipedia can also be used effectively to enrich existing knowledge graphs with new facts derived from trained machine learning models. But, given that Wikipedia is updated constantly, it is important to be able to parse the entire corpus (and use it for training) as quickly and effectively as possible.
Parsing Wikipedia can be done a number of ways, including open-source python libraries like gensim, and a suite of media wiki parsers such as mediawiki parser and mwparserfromhell. We implemented a fast approach for parsing Wikipedia (which takes approximately 20 minutes) by using a combination of pyspark, mwparserfromhell, and wikihadoop. The Wikihadoop library splits and parallelizes the large Wikipedia dump (12GB) into small chunks, from which text can then be extracted using the mwparserfromhell through a pyspark job. This quick parsing time greatly facilitates training.
The implication of such knowledge graphs for the mobile ecosystem is powerful: publishers can replace pesky, irrelevant advertisements with useful, relevant actions. This enhances the user experience, helps brands reach users when they are most engaged, and unlocks new revenue streams for publishers.
Let’s say that a mobile user is reading an article about Steph Curry’s contributions to the Warriors’ historic 22-0 start to the 2015 season. Knowing this, and the added context of the user’s location, a knowledge graph can be used to suggest that they may want to watch Steph Curry’s 2015 season highlights or buy tickets to the next Warriors “home game” on December 16, but would likely not want to buy tickets to the team’s next away game or read recipes for Indian curry dishes.
Big data is at the epicenter of building cohesive mobile experiences. With the help of machine learning, it is possible to put relevant, context-driven actions in front of the right users at the right time. There are a number of benefits a knowledge graph can gain from ingesting Wikipedia. More knowledge can be assimilated every single day, meaning knowledge graphs will only become more intelligent and accurate over time. By implementing deep linking and tapping into knowledge graphs, companies can harness the power of big data to improve the mobile user experience and open up new commerce opportunities.
Sign up for the free insideBIGDATA newsletter.