
Using Machine Learning at Scale

In this special guest feature, Peter Cnudde, VP of Engineering at Yahoo, provides a bird’s eye view of the many ways that Yahoo is using machine learning at scale. This concept is driven by the irrational, yet popular notion that one day, machines will take over the jobs of humans. As Vice President of Engineering for Yahoo, Peter oversees the company’s big data and machine learning platforms. He is particularly interested in large-scale machine learning and its impact on our society. In the past, Peter has worked at several wireless telecommunications companies, including Alcatel and RF Micro Devices. He received his master’s degree in Electrotechnical Engineering from the University of Ghent in Belgium.

Machine learning and deep learning have gained huge traction in recent years, as increased computing power has made real-life application of these technologies possible at Web scale. At Yahoo, machine learning is central to many of our products, platforms, and serving engines that cater to over 1 billion users worldwide.

Making search easy and convenient for 1B+ users

Search is one of our core verticals, and we use machine learning to continually improve the experience through algorithms that personalize users’ search results. These algorithms monitor not only what users click on, but also how long they spend reading a particular story and whether they go on to read related articles. By tracking this behavior, we are able to understand user preferences and use that data to inform our strategy across all of our verticals, including Search, Finance, News, and Sports. If we can understand what appeals to a reader, then we can continue delivering compelling content to them.

One recent improvement to Search is a novel advanced matching model based on semantic embeddings. It improves upon our existing sponsored search algorithm to better address advertiser and consumer needs. It matches user queries against ads with similar semantic vectors, rather than relying on traditional syntactic matching. Essentially, we train machines to understand the meaning of words in their context.
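To make the idea of matching on semantic vectors concrete, here is a minimal sketch in Python. It is not Yahoo’s production model: the three-dimensional vectors are hand-made stand-ins for learned embeddings, the ad names are invented, and cosine similarity is assumed as the closeness measure because it is a common choice for comparing embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_ads(query_vec, ad_vecs):
    """Sort ads by how close their embedding is to the query embedding."""
    scored = [(ad_id, cosine_similarity(query_vec, vec))
              for ad_id, vec in ad_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real system would learn these from data.
query = [0.9, 0.1, 0.0]
ads = {
    "running_shoes": [0.8, 0.2, 0.1],
    "car_insurance": [0.0, 0.1, 0.9],
}

ranking = rank_ads(query, ads)
print(ranking[0][0])  # the ad whose vector points most nearly the same way
```

Because the comparison happens in vector space, a query and an ad can match even when they share no words, which is the contrast with syntactic matching drawn above.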

Powering image detection to improve platform UX

Flickr is one of our popular platforms and hosts more than 20 billion photos. Using deep learning we’re able to categorize and organize the billions of photos on our Hadoop clusters to bring more accurate image-search results to users. A deep convolutional neural network transforms an input image into a short floating-point vector, which we pass to 1000+ binary classifiers, each trained to give a yes/no answer to identify a specific object or scene class.
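The second stage of that pipeline, where one feature vector is passed to many independent yes/no classifiers, can be sketched as follows. This is an illustration under stated assumptions rather than the Flickr implementation: the feature vector stands in for the output of a convolutional network, the three classifiers and their weights are invented, and logistic regression with a 0.5 threshold is assumed as the form of each binary classifier.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def tag_image(features, classifiers, threshold=0.5):
    """Run every binary classifier on the same feature vector.

    Each classifier answers yes or no for its own class, so an image
    can receive several tags or none at all.
    """
    tags = []
    for label, (weights, bias) in classifiers.items():
        score = sum(w * f for w, f in zip(weights, features)) + bias
        if sigmoid(score) >= threshold:
            tags.append(label)
    return tags


# Stand-in for the short floating-point vector a CNN would produce.
features = [0.7, 0.1, 0.9]

# Hypothetical classifiers: label -> (weights, bias). A real system
# would have 1000+ of these, each trained on labeled examples.
classifiers = {
    "beach": ([2.0, -1.0, 1.5], -1.0),
    "cat": ([-1.5, 3.0, -2.0], 0.0),
    "sunset": ([0.5, 0.0, 2.0], -1.5),
}

tags = tag_image(features, classifiers)
print(tags)
```

One appeal of this layout is that the expensive network runs once per image, while adding a new object or scene class only requires training one more lightweight classifier on the same vectors.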

With Yahoo Esports, we employ a computer vision and deep learning solution to automatically identify game highlights from live-streamed videos without any human intervention. The network learns the important visual characteristics that define game highlights. We then use that network to quickly and seamlessly deliver game and match highlights to users at all hours of the day.

Machine learning on big data

Implementing machine learning algorithms directly on top of Hadoop clusters has made scaling our algorithms easier, especially when it comes to data movement and security. We developed and open-sourced CaffeOnSpark, which allows organizations to turn their existing Hadoop or Spark clusters into a powerful, fully distributed platform for deep learning. We have seen significant engagement from the community, and look forward to seeing it grow further.

Supporting open source for rapid innovation

Open source is a huge driver of innovation. Yahoo believes in the power of open source to support advancements in technology, and we have been a significant contributor from the very beginning. With machine learning evolving by the day, open source will play a big role in the next level of innovation. Recently, we open-sourced a deep learning solution that lets developers use a classifier for detecting “Not Safe for Work” (NSFW) images. What’s unique about this detector is that it was built using a deep neural network that optimizes accuracy, speed, and memory, a combination which, to the best of our knowledge, did not previously exist. We hope that developers will find it useful to work with, and that they will further improve the accuracy of the technology.

Machine learning is an exciting area and one we take seriously, as it is changing human-machine interactions and the expectations from those interactions in a fundamental way.

 
