Can Topological Data Analysis Save the Day?

A challenging trend is emerging in the big data industry: many data sets are increasingly foreign to the data analysis tools available to extract intelligence from them. It isn’t the size of the data set that is daunting; by big data standards, the size is often quite manageable. It is the sheer complexity and lack of formal structure that pose a problem. This “big data” looks nothing like the traditional data sets analysts would have encountered just a few years ago, when the analysis paradigm involved forming a hypothesis, deciding precisely what one wished to measure, and then building an apparatus to make that measurement as accurately as possible. Today’s big data is noisy, unstructured, and dynamic rather than static. It may also be corrupted or incomplete.

Yale University mathematician Ronald Coifman says that what is really needed is the big data equivalent of a Newtonian revolution, on par with the 17th-century invention of calculus, and he believes such a revolution is already underway. It is not sufficient, he argues, simply to collect and store massive amounts of data; the data must be intelligently curated, and that requires a global framework.

In response to this need, Gunnar Carlsson, a mathematician at Stanford University, represents cumbersome, complex big data sets as networks of nodes and edges, creating an intuitive map of the data based solely on the similarity of the data points. The method takes a measure of distance between points as its input and translates it into a topological shape or network. The more similar the data points are, the closer together they sit on the resulting map; the more different they are, the farther apart they sit. This is the essence of topological data analysis (TDA), an outgrowth of machine learning, a set of techniques that serves as a standard workhorse of big data analysis. The idea behind TDA is to reduce high-dimensional data sets to lower dimensions without sacrificing the most relevant topological properties. The question is whether TDA can save the day and provide the missing link in big data analysis.
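To make the idea of turning distances into a network concrete, here is a minimal sketch in Python. It is not Carlsson's method or any TDA library's API. It is a deliberately simplified stand-in that joins any two points lying within a chosen distance of each other, then reads off the simplest topological feature of the result: its connected components. The toy points, the `epsilon` threshold, and all function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Map each index pair (i, j), with i < j, to the Euclidean distance."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def neighborhood_graph(points, epsilon):
    """Connect every pair of points whose distance is at most epsilon."""
    graph = {i: set() for i in range(len(points))}
    for (i, j), distance in pairwise_distances(points).items():
        if distance <= epsilon:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_components(graph):
    """Group nodes into connected components with an iterative depth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(sorted(component))
    return components


# Hypothetical toy data: two tight clusters in a 3-dimensional space.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.2, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 4.9),
]

# epsilon is an assumed threshold; real analyses explore a range of scales.
graph = neighborhood_graph(points, epsilon=0.5)
components = connected_components(graph)
print(components)
```

With this toy data, the first three points end up in one component and the last two in another, so the network recovers the two clusters from distances alone, without ever looking at the coordinates directly. The choice of `epsilon` matters a great deal: too small and every point is isolated, too large and everything merges into a single blob.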

