A recent announcement appearing in MIT News, “Machine learning branches out,” highlights new research in probabilistic graphical models. In a paper being presented in December at the annual conference of the Neural Information Processing Systems Foundation, MIT researchers describe a new technique that expands the class of data sets whose structure can be efficiently deduced. In addition, their technique naturally describes the data in a way that makes it much easier to work with. The paper was co-authored by Ying Liu, a graduate student in MIT’s Department of Electrical Engineering and Computer Science along with his advisor, Alan Willsky, the Edwin Sibley Webster Professor of Electrical Engineering,
In the paper the researchers apply their technique to several sample data sets, including information about commercial airline flights. Using only flights’ scheduled and actual departure times, the algorithm can efficiently infer vital information about the propagation of flight delays through U.S. airports. It also identifies those airports where delays are most likely to have far-reaching repercussions, which makes it simpler to reason about the behavior of the network as a whole.
In technical terms, the researchers’ work concerns probabilistic graphical models. In this context, a graph is a mathematical construct that consists of nodes and edges, usually depicted as, respectively, circles and the lines that connect them. A network diagram is a familiar example of a graph; a family tree is another.
The new research paper shows that a structureless data set is equivalent to a graph in which every node is connected to every other node. Liu and Willsky’s algorithm goes through the graph, sequentially removing nodes that break loops and, using the efficient algorithm they demonstrated previously, calculating how close the statistical dependencies of the resulting graph are to those of the fully connected graph.