O’Reilly Media, famed for its authoritative publications and technical scholarship, organized its first conference on artificial intelligence, held September 26–27 at the Javits Center in New York, New York. The O’Reilly AI Conference acted as a bonus event for many attendees who were already planning to be in New York for the larger and overlapping O’Reilly Strata+Hadoop World event taking place at the same venue.
The Javits Center is located in a relatively quiet section of western Manhattan between 36th and 40th streets on 11th avenue. The venue sports 840,000 square feet of exhibit space (according to Wikipedia) and could easily handle these two simultaneous gatherings. The AI conference was held on the fourth floor in a giant room with large windows, at the back of the building overlooking the Hudson River. Half of the room was set up with a stage and seating for about 1,000 people, while the other half of the room, separated by a curtain, was used for mingling and sponsor kiosks. (The main exhibit floors were being prepared for Strata+Hadoop World.) Three smaller rooms on the second floor were used for breakout sessions. In the mingling area, a door led out to a balcony with a view of lower Manhattan.
Speakers reminded everyone about the recent successes of Deep Learning — self-driving cars, super-human image recognition, video captioning, AlphaGo, etc. — but many focused their talks on how little insight we currently have about why it all works. (Excerpts from the keynotes have been made available on YouTube.) Since we couldn’t cover everything, we decided to take this opportunity to summarize what we heard the experts reveal about Deep Learning’s drawbacks, and to report on a few interesting alternative machine learning solutions.
Deep Learning typically excels as a pattern recognition algorithm, finding patterns only when the answers have been hinted at in the training data. And to achieve this success, Deep Learning relies on the mysterious fine-tuning of problem-specific meta-parameters such as the learning rate or network architecture. One speaker reflected on how Google’s DeepMind trained its networks to win at forty-nine Atari games (the fiftieth game, Breakout, which Google was not able to defeat, was eventually cracked a few months ago by a company called Bonsai), but only at the computational expense of fifty different parameter sets, still far from the efficacy of the human brain. Gary Marcus, a professor of psychology at NYU and founder of the company Geometric Intelligence, explained in his keynote that there is a big difference between short-term progress playing Atari and solving the really hard problem of strong AI. He quipped, “We wanted Rosie the Robot [of Jetsons cartoon fame], a robot that can take care of my kids and my dishes, and instead all we got was Roomba.…” Marcus half-jokingly proposed that the next Turing Test should be to build a robot that can deliver pizza to an arbitrary location as skillfully as an average teenager might (he called it the “Domino’s Test”). Such a task would require navigating through traffic, transporting the pizza to the front of the building or house, communicating in natural language with a possible doorman, finding the correct door, etc. The task would require planning and the execution of multiple independent subtasks; in other words, a real-world solution. Although Marcus uses Deep Learning as a tool every day at his new company, he remains skeptical that it will be the panacea for strong AI.
A total of forty-two breakout sessions were held during the afternoons. Sessions ranged in technical level from “beginner” to “advanced.” In one of the advanced breakout talks, Diogo Almeida, a senior data scientist for Enlitic, expertly detailed the drawbacks of various Deep Learning algorithms. The title of his talk was “Deep Learning: Modular in Theory, Inflexible in Practice.” Almeida’s main concerns about Deep Learning’s limitations were threefold:
- There may not be enough training data. For some applications, even all available data may not suffice, as in genomic classification. Nobody knows how much data would be needed to classify this giant input space.
- Software is inefficient. Most Deep Learning tools are not modular. In fact, Peter Norvig in his keynote mentioned the same thing. In traditional software, changes are isolated. In Deep Learning neural network software, however, one change to one parameter affects the whole black box. Almeida noted that TensorFlow is improving upon this, but there is still a lot of work to be done in this area.
- Optimization theory is lacking. There is no generalization of which problems are easily solvable, how much data might be needed, or which network architectures to choose. Again, this goes back to the mysterious “just-right” fine tuning of meta-parameters.
Also of interest, Almeida claimed that the hierarchical nature of neural networks (concrete features getting discerned at the input layers and more abstract features getting discerned at the later layers) may be only a function of the types of data we’re presenting to neural nets and not necessarily a fundamental property of their functionality. Since we have no strong theory as to why neural nets work in a hierarchical manner, there’s no guarantee that this useful narrative about Deep Learning will last.
Even Yann LeCun of Facebook, one of the fathers of Deep Learning, decided to focus his keynote on what artificial intelligence currently cannot do rather than what it can. And since Deep Learning seems to be leading the AI revolution at the moment, LeCun’s talk did not bode well for his child. LeCun put up a slide summarizing what he believes AI needs to achieve for us to make significant progress:
- Learn / understand how the world works, e.g. acquire some level of common sense.
- Learn a very large amount of background knowledge through observation and action.
- Perceive the state of the world so as to make accurate predictions and plans.
- Update and remember estimates of the state of the world (e.g. remember relevant events).
- Reason and plan: predict which sequence of actions leads to a desired state.
LeCun summarized these objectives with the following equation:
Intelligence & Common Sense = Perception + Predictive Model + Memory + Reasoning & Planning. People at lunch and in the halls were saying that these insights were some of the best of the conference.
So, where does Deep Learning’s utility remain untapped? One area is emotion recognition, which is gaining popularity.
Affectiva currently uses Deep Learning to recognize emotion in human faces. Affectiva’s CEO, Rana el Kaliouby, demonstrated her company’s software (AffdexMe — free at the Apple app store or Google Play) onstage and emphasized that, as people interact more with intelligent technology, EI (emotional intelligence) will become equally as important as AI. El Kaliouby explained how the addition of emotion recognition will help machines better understand and communicate with users, and vice versa. There’s also a lot of money to be made by advertisers who want to know if their ads are effective or not. Anna Roth and Cristian Canton of Microsoft led a breakout to explain how they determined the best emotion recognition approaches (frame-by-frame image processing vs. recurrent neural network vs. long short-term memory network). In a keynote, Lili Cheng, also from Microsoft, presented her work developing their successful WeChat chat bot Xiaoice (roughly translated as “little Bing”). Microsoft continues to recover from a public relations fiasco after the Twitter version of this chat bot was manipulated to post racist tweets. Microsoft learned that the (mostly) Western users of Twitter interact with their social media much differently than the (mostly) Chinese users of WeChat, and Microsoft has been using this gained knowledge to make its AI better. Cheng described specific differences between the two cultures in the areas of gender, myths, science fiction, and what each culture expects from technology. As AI begins to enter emotion recognition, we enter a realm where math, psychology, and sociology meet. To make effective AI, we can no longer rely solely on the mathematicians and computer scientists.
And as Deep Learning begins to mix more and more with social media and be present in more and more apps, we need ways to reduce its footprint on our phones. Song Han from Stanford presented his solution for neural network model compression. He showed that we can reduce the amount of memory taken by neural nets by 10x to 40x, and the amount of energy taken up by computations by up to 120x. He does this through a combination of pruning redundant network connections, and the sharing and coding of redundant weights.
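The two ideas Han described can be illustrated in a few lines. Below is a minimal NumPy sketch, under stated assumptions: the weight matrix is an invented toy stand-in for one trained layer, and the quantization step uses a crude evenly spaced codebook, whereas Han's actual pipeline retrains the network and uses k-means clustering plus entropy coding of the surviving weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy weight matrix standing in for one layer of a trained network.
weights = rng.normal(size=(256, 256)).astype(np.float32)

def prune_by_magnitude(w, keep_fraction):
    """Zero out all but the largest-magnitude weights (redundant-connection pruning)."""
    k = int(w.size * keep_fraction)
    threshold = np.sort(np.abs(w), axis=None)[-k]
    mask = np.abs(w) >= threshold
    return w * mask, mask

def quantize_shared(w, n_clusters=16):
    """Map surviving weights onto a small codebook of shared values,
    a rough stand-in for the weight-sharing / coding step."""
    nonzero = w[w != 0]
    codebook = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
    idx = np.abs(w[..., None] - codebook).argmin(axis=-1)
    shared = codebook[idx] * (w != 0)
    return shared, codebook

pruned, mask = prune_by_magnitude(weights, keep_fraction=0.1)
shared, codebook = quantize_shared(pruned)

print("fraction of weights kept:", mask.mean())
print("distinct shared values:", len(np.unique(shared[shared != 0])))
```

With 10% of connections kept and 16 shared values, each surviving weight needs only a 4-bit codebook index plus sparse-index bookkeeping, which is where the order-of-magnitude memory savings come from.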
Enough about Deep Learning. Bayesian, evolutionary, and analogical approaches to AI also received attention at the conference.
One of the more interesting analogical alternatives to neural networks in the language processing domain is the Retina Spark 2.0 framework presented by the Austrian cortical.io founder Francisco Webber. Webber has patented a proprietary algorithm similar to the open source Word2Vec algorithm, with the main difference being that Word2Vec produces dense vectors from words, while cortical.io’s algorithm produces sparse 16,384-dimensional vectors in which only about 3% of the entries are active. In both algorithms, similar vectors contain similar semantic content, and both are trained on large-scale natural language corpora like Wikipedia. But because of sparsity, cortical.io’s framework plays well with Numenta’s Hierarchical Temporal Memory framework, and the two companies partnered in 2015. The partnership is important, because Numenta’s technology (not represented at the conference) claims to be a viable alternative to traditional Deep Learning technology. cortical.io’s main goal, among many, is to produce a revolutionary search engine. Webber showed many impressive analogies and semantic relationships produced by his system, including a proposal for language translation. Webber drew his inspiration from the work of Douglas Hofstadter and other passionate analogy practitioners.
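cortical.io’s algorithm is proprietary, but the appeal of sparse representations is easy to demonstrate: if a word is a set of a few hundred active bits out of 16,384, semantic similarity reduces to counting shared bits. The sketch below invents its fingerprints (real ones come from training on a corpus; here a “related” word simply reuses a fraction of another word’s bits), so only the dimensionality and sparsity figures come from the talk.

```python
import numpy as np

DIM = 16384               # dimensionality cited in the talk
ACTIVE = int(DIM * 0.03)  # roughly 3% of bits are active

rng = np.random.default_rng(1)

def random_fingerprint(seed_bits=None, reuse=0.0):
    """Toy sparse binary 'semantic fingerprint': a set of active bit positions.
    Optionally reuse a fraction of another word's bits to fake relatedness."""
    bits = set()
    if seed_bits is not None:
        bits = set(rng.choice(list(seed_bits), int(ACTIVE * reuse), replace=False))
    while len(bits) < ACTIVE:
        bits.add(int(rng.integers(DIM)))
    return frozenset(bits)

def overlap(a, b):
    """Similarity = fraction of active bits the two fingerprints share."""
    return len(a & b) / ACTIVE

dog = random_fingerprint()
puppy = random_fingerprint(seed_bits=dog, reuse=0.6)  # simulated related word
carburetor = random_fingerprint()                     # simulated unrelated word

print(f"dog~puppy      {overlap(dog, puppy):.2f}")
print(f"dog~carburetor {overlap(dog, carburetor):.2f}")
```

Two unrelated random fingerprints collide on only about 3% of their bits, so meaningful overlap stands out sharply from the background, which is one reason sparse codes compose well with frameworks like Numenta’s.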
On the statistical front, Vikash Mansinghka and Richard Tibbetts from M.I.T. presented their BayesDB probabilistic programming language, which combines a database framework with an inference engine. By merging statistical models, Bayesian statistics, Monte Carlo simulations, and an SQLite core, BayesDB can take hundreds of data parameters, many with missing cells, and spit out probabilistic answers to queries. A regular non-Bayesian database assumes that the user has complete information, with most cells full; in practice, the problem space often provides incomplete data. With BayesDB the user inputs their (often incomplete) data using typical SQL syntax, and the inference engine then goes to work on the complicated inferences the data support. In one example, Mansinghka consulted for a diabetes study involving 250 patients with 400 parameters per patient, many of which were missing. The goal was to find correlations between the parameters and the likelihood of developing diabetes, and BayesDB did the trick. Mansinghka started a spinoff company based on this work called Empirical Systems.
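The core idea, treating a missing cell as a question about a posterior distribution rather than an empty field, can be sketched without BayesDB itself. The toy below is a drastic simplification (BayesDB builds far richer nonparametric models, and the patient data here are simulated): it fits a bivariate Gaussian to the complete rows of an invented two-column table and answers a "what might this missing value be?" query by sampling from the Gaussian conditional.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy patient table: two correlated measurements, some cells missing (NaN).
n = 250
bmi = rng.normal(28, 4, n)
glucose = 40 + 2.5 * bmi + rng.normal(0, 8, n)       # correlated with BMI
glucose[rng.choice(n, 40, replace=False)] = np.nan   # 40 missing cells

# Fit a bivariate Gaussian to the complete rows only.
complete = ~np.isnan(glucose)
mu = np.array([bmi[complete].mean(), glucose[complete].mean()])
cov = np.cov(bmi[complete], glucose[complete])

def simulate_glucose(bmi_value, n_samples=1000):
    """Sample plausible values for a missing glucose cell, conditioned on
    that row's BMI (standard Gaussian conditional-distribution formulas)."""
    cond_mean = mu[1] + cov[1, 0] / cov[0, 0] * (bmi_value - mu[0])
    cond_var = cov[1, 1] - cov[1, 0] ** 2 / cov[0, 0]
    return rng.normal(cond_mean, np.sqrt(cond_var), n_samples)

# "Query" one row with a missing value, like a probabilistic SELECT.
row = int(np.flatnonzero(~complete)[0])
samples = simulate_glucose(bmi[row])
print(f"row {row}: BMI={bmi[row]:.1f}, "
      f"imputed glucose ~ {samples.mean():.1f} +/- {samples.std():.1f}")
```

The answer to the query is a distribution, not a single number, so downstream analyses can carry the uncertainty forward instead of pretending an imputed cell is known exactly.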
Sentient Technologies uses genetic algorithms within its Ascend product to design websites. Babak Hodjat, Sentient’s co-founder and chief technologist, presented a breakout session about this work, best summarized by the company’s own words:
“Ascend uses evolutionary algorithms that follow the principles of Darwinian natural selection to continuously determine the best design for your website. The patented Ascend solution learns, adapts, and reacts to user interaction to find the best performing combination of your changes in less time and with less traffic than required by traditional testing methods. And it automates the testing process from end to end, saving countless hours of time for marketing teams.”
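Ascend’s implementation is proprietary, but the Darwinian loop it describes, selection over design variants, recombination, and mutation, is the textbook genetic algorithm. Here is a minimal sketch in which the design slots, their options, and the per-option “lift” values are all invented for illustration; in a real deployment fitness would be measured from live user traffic rather than computed from a known function.

```python
import random

random.seed(3)

# Invented design choices; each "genome" picks one option per slot.
OPTIONS = {
    "headline": ["A", "B", "C"],
    "button_color": ["green", "orange", "red"],
    "layout": ["one-col", "two-col"],
    "image": ["hero", "product", "none"],
}
# Hidden "true" conversion lift per choice; stands in for live traffic data.
LIFT = {("headline", "B"): 2, ("button_color", "orange"): 3,
        ("layout", "two-col"): 1, ("image", "hero"): 2}

def fitness(genome):
    return sum(LIFT.get(item, 0) for item in genome.items())

def random_genome():
    return {k: random.choice(v) for k, v in OPTIONS.items()}

def crossover(a, b):
    # Child inherits each slot from one parent at random.
    return {k: random.choice([a[k], b[k]]) for k in OPTIONS}

def mutate(g, rate=0.1):
    # Occasionally re-roll a slot to keep exploring.
    return {k: random.choice(OPTIONS[k]) if random.random() < rate else v
            for k, v in g.items()}

population = [random_genome() for _ in range(20)]
for generation in range(15):
    population.sort(key=fitness, reverse=True)
    parents = population[:8]                      # selection
    population = parents + [mutate(crossover(random.choice(parents),
                                             random.choice(parents)))
                            for _ in range(12)]   # offspring

best = max(population, key=fitness)
print(best, fitness(best))
```

Because each user only ever sees one candidate design, the search effectively tests many combinations in parallel, which is the basis for the claim of needing less traffic than traditional A/B testing.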
A number of companies made product announcements. DeepSense.io announced the launch of Neptune, its “innovative machine learning platform for managing multiple data science experiments.” Baidu Research announced DeepBench, “the first open source benchmarking tool for evaluating deep learning performance across different hardware platforms.” Seven companies supported kiosks in the mingle area: O’Reilly, Intel, NVIDIA, Bonsai, CrowdFlower, DeepSense.io, and H2O.ai.
Contributed by: Howard Goldowsky, who lives near Boston, MA, where he programs DSP algorithms, trains at chess, and studies AI and cognitive science. He has been writing about chess for almost 20 years and is the author of two chess books. Recently he has begun to write about machine learning and AI.