Tomorrow’s Machine Learning Today: Topological Data Analysis, Embedding, and Reinforcement Learning

Although machine learning is an integral component of artificial intelligence, it’s critical to realize that it’s just one of the many dimensions of this collection of technologies. Forms of supervised and unsupervised learning may be the foundation of many contemporary AI applications, but they’re substantially enhanced by interacting with other aspects of cognitive computing.

“On the one hand you have symbolic rules based systems like graph databases that are really good at reasoning,” reflected Franz CEO Jans Aasman. “Then you have statistical machine learning, which is very good at perceiving and learning. You want to use them together because together, it gets really, really good.”

Certain visual approaches of graph-aware systems will significantly shape the form machine learning takes in the near future, substantially increasing its value to the enterprise. Developments in topological data analysis, embedding, and reinforcement learning are rendering this technology not only more useful, but also much more dependable for a broader array of use cases.

Topological Data Analysis

Topological data analysis is arguably at the vanguard of machine learning trends because its fine-grained pattern analysis surpasses that of traditional supervised or unsupervised techniques. Although technically part of unsupervised learning, topological data analysis “is a clustering technique where you get way better results,” Aasman explained. Clustering groups data points by similarity, and graph visualizations reveal where the data concentrate within each segment. Aasman used a simple example to explain the effectiveness of topological data analysis: “There’s five positions in basketball, but then if you analyze the players based on a set of features, you find that there’s like, 200 types of basketball players.”
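One widely used topological technique is the Mapper algorithm. The article does not say which method Aasman has in mind, so the following is only a minimal sketch under that assumption. It covers the range of a lens function with overlapping intervals, clusters the points that fall inside each interval, and links clusters that share points. The function names, the toy circle dataset, and every parameter value are illustrative choices, not anything taken from Franz or SAS.

```python
import math
from itertools import combinations


def single_linkage(points, indices, eps):
    """Split `indices` into connected components, linking points closer than eps."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, lens, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are clusters of point indices, edges join
    clusters that have at least one point in common."""
    values = [lens(p) for p in points]
    low, high = min(values), max(values)
    width = (high - low) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Widen each interval so that neighbouring intervals overlap.
        start = low + k * width - overlap * width
        stop = low + (k + 1) * width + overlap * width
        inside = [i for i, v in enumerate(values) if start <= v <= stop]
        nodes.extend(single_linkage(points, inside, eps))
    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges


# Toy data (an assumption for illustration): 24 points evenly spaced on a circle.
circle = [
    (math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24))
    for k in range(24)
]
nodes, edges = mapper(circle, lens=lambda p: p[0], n_intervals=4, overlap=0.25, eps=0.3)
print(len(nodes), "clusters,", len(edges), "links")
```

The resulting graph is a compact summary of the data's shape: for the circle, the clusters link up into a loop, which a flat list of segments would not reveal.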

The advantage of this approach is the granularity with which it’s able to micro-segment datasets. Topological data analysis is useful for pinpointing the nuanced, myriad facets involved in constructing predictive digital twins to model entire production environments. In healthcare, it can indicate that instead of two forms of diabetes, there are over 20 distinguishable forms, “in the sense of how they react to certain treatments or medications or their temporal unfolding,” Aasman revealed. A highly pragmatic, horizontal deployment of topological techniques is understanding machine learning model results for interpretability and explainability. For this use case, topological representations can reveal the inner workings of deep neural networks, illustrating which features a model learned well and which it did not. SAS Senior Manager of AI and Machine Learning Research and Development Ilknur Kabul described those representations as essentially graphs.

Embedding

The visual manifestations of graph settings are pivotal for generating the features on which to train machine learning models. Features are directly responsible for the prediction accuracy of models. According to Cambridge Semantics CTO Sean Martin, engineering those features “is a combination of deciding which facets of the data to use, and the transformation of those facets into something closer to a vector.” Vectors are simply data that have been converted into numbers, which become the basis for sophisticated equations for machine learning predictions so that “if you’ve got X you can solve for Y,” Martin maintained. Embedding is the process of plotting various vectors in a graph to perform this math to determine models’ features. It involves “reducing the graph to these vector spaces that you can then look to see if you can find equations,” Martin said. Graph embedding hinges on transforming vectors to decrease the amount of data plotted in graphs, while still including the full scope of that data for predictions.
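There are many ways to reduce a graph to vectors, and the article does not name the one Martin uses. As a minimal sketch of the general idea, the code below applies a simple landmark embedding: each node becomes a vector of its shortest-path distances to a few chosen landmark nodes. The toy graph, the landmark choice, and the function names are all hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from `source` to every reachable node (breadth-first search)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def landmark_embedding(graph, landmarks):
    """Map each node to a vector of its distances to the landmark nodes."""
    per_landmark = [bfs_distances(graph, landmark) for landmark in landmarks]
    unreachable = len(graph)  # larger than any real shortest path in the graph
    return {
        node: [distances.get(node, unreachable) for distances in per_landmark]
        for node in graph
    }


# Hypothetical graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
vectors = landmark_embedding(graph, landmarks=["a", "f"])
for node, vector in vectors.items():
    print(node, vector)
```

Nodes in the same triangle end up with similar vectors, so part of the graph's structure survives in a two-number representation that ordinary machine learning models can consume.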

There are several ways embedding with graphs makes machine learning more effective. Specifically, it improves the value derived from:

  • Transformations: In graphs, organizations can preserve the relationships between vectors before and after transformations, allowing them to contextualize them better for feature detection. This benefit underpins “a far less heavy lift to place those pivoting transformations on the data elements that you are finding important,” Martin noted.
  • Multi-dimensional data: High-dimensional data is oftentimes cumbersome because of the large number of features (and factors) involved. When creating models to predict whether patients will require respiratory assistance after hospitalization, for example, organizations have to include each patient’s demographic data, medical history, family medical history, and more. Flexible, relationship-savvy graph settings are ideal for the math required to generate credible features; the higher the data’s dimensionality, the more features it offers for accurate predictions.
  • Vectors: As the number of vectors increases for feature generation, it becomes more crucial to consistently “represent some sort of data point in juxtaposition with all of the other vectors…created,” Martin commented. Graphs can visually represent, and maintain, the connections between vectors and data points that make them meaningful for feature engineering.
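Martin’s description of feature engineering, choosing facets of the data and transforming them into something closer to a vector, can be sketched for a patient record like the one in the respiratory-assistance example. The field names, value ranges, and categories below are invented for illustration, and min-max scaling plus one-hot encoding is only one of several reasonable transformations.

```python
def to_feature_vector(record, numeric_ranges, categories):
    """Turn selected facets of a record into a flat list of numbers."""
    vector = []
    # Numeric facets: rescale to the 0..1 range (min-max scaling).
    for field, (low, high) in numeric_ranges.items():
        vector.append((record[field] - low) / (high - low))
    # Categorical facets: one slot per allowed value (one-hot encoding).
    for field, options in categories.items():
        vector.extend(1.0 if record[field] == option else 0.0 for option in options)
    return vector


# Hypothetical schema: which facets to use and how to transform each one.
NUMERIC_RANGES = {"age": (0, 100)}
CATEGORIES = {
    "smoking_status": ("never", "former", "current"),
    "family_history_of_asthma": (True, False),
}

# Hypothetical patient record; any field not named in the schema is ignored.
patient = {
    "age": 60,
    "smoking_status": "former",
    "family_history_of_asthma": True,
    "postcode": "unused",
}
vector = to_feature_vector(patient, NUMERIC_RANGES, CATEGORIES)
print(vector)
```

Each added facet lengthens the vector, which is why high-dimensional data quickly becomes cumbersome to manage without something that keeps track of what every position means.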

Reinforcement Learning

In professional settings, reinforcement learning is likely the least used variety of machine learning. One of the caveats of deploying reinforcement learning pertains to how these statistical models learn. “An agent interacts with an environment and learns how to interact with that environment,” Kabul clarified. “The agent can make many mistakes in that environment, but when applying it to the real world, we don’t have the luxury about making so many mistakes.” The primary distinction between reinforcement learning and the more commonplace supervised and unsupervised approaches is that the latter learn from a fixed training dataset, which is annotated in the supervised case. Conversely, the learning in the former is predicated on what Kabul termed a “sequential decision making process; we learn through sequentially interacting through the agent.”
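The agent-and-environment loop Kabul describes can be sketched with tabular Q-learning, one standard reinforcement learning algorithm (the article does not say which algorithms SAS uses). The five-cell corridor environment, the reward of 1.0 at the right-hand end, and every hyperparameter here are assumptions made for illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Environment: move within the corridor and reward arrival at the last cell."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Learn action values by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: possibly a mistake
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

No annotated examples are supplied anywhere. The agent discovers that stepping right pays off only by acting, including many wasted steps to the left early on, which is exactly the luxury Kabul says real-world deployments lack.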

Enterprise applications of reinforcement learning include aspects of automated model building in self-service data science platforms. Kabul mentioned that other use cases include energy efficiency in smart cities. However, reinforcement learning’s penchant for individualization may exceed that of unsupervised and supervised learning for customer interactions, which could potentially revamp both marketing and sales. Kabul referenced a marketing use case in which various materials are sent to customers to try to elicit (and optimize) responses: “Traditionally you can segment the customers and [inter]act differently with those groups. But that’s not scalable; that’s not [individualized]. What we are trying to do is personalize those: create many journeys, create many interactions with the customer so that we can treat each one individually.”
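One simple way to picture per-customer personalization is a separate epsilon-greedy bandit for each customer, which learns from that customer’s own responses rather than from a segment average. This is a hedged sketch, not a description of the SAS system Kabul refers to: the message names, the customers, and the simulated responses (each customer responds only to one hidden favourite) are all made up.

```python
import random

MESSAGES = ("discount", "newsletter", "product_demo")  # hypothetical materials


class CustomerBandit:
    """Epsilon-greedy bandit tracking one customer's response rate per message."""

    def __init__(self, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.sends = {m: 0 for m in MESSAGES}
        self.responses = {m: 0 for m in MESSAGES}

    def response_rate(self, message):
        return self.responses[message] / self.sends[message]

    def choose(self):
        untried = [m for m in MESSAGES if self.sends[m] == 0]
        if untried:
            return untried[0]  # try every message once before comparing
        if self.rng.random() < self.epsilon:
            return self.rng.choice(MESSAGES)  # occasional exploration
        return max(MESSAGES, key=self.response_rate)

    def record(self, message, responded):
        self.sends[message] += 1
        self.responses[message] += int(responded)

    def best(self):
        """Message with the highest observed rate; call only after each was sent."""
        return max(MESSAGES, key=self.response_rate)


# Simulated ground truth (an assumption): each customer responds only to one message.
hidden_preference = {"alice": "discount", "bob": "product_demo", "carol": "newsletter"}

rng = random.Random(0)
bandits = {name: CustomerBandit(epsilon=0.1, rng=rng) for name in hidden_preference}
for _ in range(20):  # twenty rounds of outreach per customer
    for name, bandit in bandits.items():
        message = bandit.choose()
        bandit.record(message, responded=(message == hidden_preference[name]))

learned = {name: bandit.best() for name, bandit in bandits.items()}
print(learned)
```

A segment-level campaign would send all three customers the same material. Keeping one small learner per customer is what lets each journey diverge, at the cost of needing enough interactions with every individual.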

Advanced Machine Learning

Machine learning will assuredly continue to fortify AI deployments in both the public and private sectors, for consumers and the enterprise alike. As such, its advanced applications pertaining to wide (high-dimensional) data, topological data analysis, and reinforcement learning will have even greater sway over the underlying worth of this technology to business processes and personal life. How effectively organizations adapt to these applications and incorporate them into workflows will influence the overall effectiveness of their cognitive computing investments.

About the Author

Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.
