
Machine Learning with Apache Mahout

FIELD REPORT

This week I attended an event sponsored by my favorite Meetup group, LA Machine Learning. The topic was “Machine Learning with Apache Mahout,” presented by Ted Dunning, Chief Application Architect at MapR Technologies. The event was booked to capacity, with 100 in attendance. Many of this Meetup’s events are held at lunchtime at eHarmony HQ in the Colorado Center (formerly the Yahoo Center) in Santa Monica. As usual, the lunch hosted by eHarmony was healthy and plentiful, with a wide selection of drinks. The venue can’t be beat!

Machine learning at large scale is challenging, and the open source Apache Mahout project is a good way to tackle it. Plus, Mahout just got a lot better with the release of version 0.8 on July 25, 2013. There are improvements throughout, with particularly strong additions to the clustering and recommendation algorithms and to the excellent Mahout math library. Mahout is now slimmer, faster and more effective.

Mahout committer Ted Dunning (MapR) talked about these updates to the open source Apache Mahout project and then showed how to build a simple but powerful recommender that uses co-occurrence to determine preferences. It’s easier than you think, particularly with some of the tips and tricks he discussed. One of the best shortcuts to practical and effective recommendation is to use search technology such as Solr to deploy the Mahout recommendation engine. Dunning’s talk is based on his book “Practical Machine Learning,” which you can download HERE.
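To make the co-occurrence idea concrete, here is a minimal single-machine sketch. It assumes the approach from “Practical Machine Learning”: count how often pairs of items appear in the same user’s history, keep only the pairs whose co-occurrence is anomalous according to Dunning’s log-likelihood ratio (LLR) test, and treat those as “indicators.” The LLR arithmetic mirrors the formulation in Mahout’s `LogLikelihood` class, but `indicators`, `recommend`, the `threshold` and `max_per_item` values, and the toy data are all hypothetical illustrations, not Mahout’s API or the code shown in the talk. The `recommend` function stands in for the Solr step, where each item’s indicator list would be indexed as a document field and a user’s history issued as a query; here a plain overlap count replaces Solr’s relevance scoring.

```python
import math
from collections import defaultdict
from itertools import combinations


def x_log_x(x: int) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    # Unnormalized entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio for a 2x2 table of co-occurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative results caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def indicators(histories: dict[str, set[str]], threshold: float = 1.0,
               max_per_item: int = 50) -> dict[str, list[str]]:
    """Hypothetical helper: map each item to items that co-occur with it anomalously."""
    n_users = len(histories)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in histories.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    scored = defaultdict(list)
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11        # users with a but not b
        k21 = item_count[b] - k11        # users with b but not a
        k22 = n_users - k11 - k12 - k21  # users with neither
        # LLR also flags pairs seen together LESS often than chance;
        # keep only pairs that co-occur more than independence predicts.
        if k11 * n_users <= item_count[a] * item_count[b]:
            continue
        score = llr(k11, k12, k21, k22)
        if score > threshold:
            scored[a].append((score, b))
            scored[b].append((score, a))
    return {item: [other for _, other in sorted(pairs, reverse=True)[:max_per_item]]
            for item, pairs in scored.items()}


def recommend(history: set[str], indicator_map: dict[str, list[str]],
              top_n: int = 5) -> list[str]:
    """Score each unseen item by how many of its indicators the user already has."""
    scores = {}
    for item, inds in indicator_map.items():
        if item in history:
            continue
        overlap = len(history.intersection(inds))
        if overlap:
            scores[item] = overlap
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


if __name__ == "__main__":
    toy_histories = {
        "u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a", "b"},
        "u4": {"c", "d"}, "u5": {"c", "d"}, "u6": {"c", "d"},
    }
    ind = indicators(toy_histories)
    print(recommend({"a"}, ind))  # ['b']
```

Counting pairs per user is quadratic in history length, so a production pipeline would typically cap or down-sample very long user histories before this step; the per-item cap on indicators plays a similar role in keeping the search index compact.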

Dunning referred to his GitHub repository for the Ponies project he described in the talk, though he admitted that the repository is still a “work in progress.” It appears that you get music data from the public-domain source at musicbrainz.org and treat the machine-generated log in the GitHub data file as “user listening behavior” against that music data. Pig scripts are used to process the data before it is analyzed with Mahout.
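One typical job for this kind of preprocessing is reshaping a raw event log into a per-user listening history, which is the input a co-occurrence analysis works from. The sketch below shows that grouping step in Python rather than Pig, purely for illustration. The tab-separated `user_id`/`item_id` record format and the `load_histories` name are hypothetical assumptions, since the actual layout of the Ponies log isn’t described here.

```python
import csv
import io
from collections import defaultdict


def load_histories(lines) -> dict[str, set[str]]:
    """Group (user, item) log records into one item set per user.

    ASSUMPTION (hypothetical format): each record is "user_id<TAB>item_id";
    the real log in the Ponies repository may be laid out differently.
    """
    histories = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed records
        user, item = row[0].strip(), row[1].strip()
        histories[user].add(item)  # repeated plays collapse to a single entry
    return dict(histories)


if __name__ == "__main__":
    log = io.StringIO("u1\tartist1\nu1\tartist2\nu1\tartist1\nu2\tartist1\n")
    print(load_histories(log))
```

Collapsing repeated plays into a set gives the binary “did this user listen to this artist” view that co-occurrence counting needs; a fuller pipeline might keep play counts or timestamps instead.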

Here is a video of Dunning presenting on recommendation algorithms at the Berlin Buzzwords conference last year:

I always enjoy the ML meetups at eHarmony because of the caliber of the topics and presenters. The talks are more like graduate seminars in computer science and statistics, so I need to be on my toes to get the most out of the events. I found Dr. Dunning’s presentation to be on par with this level of excellence. He is a compelling and witty speaker who holds the interest and attention of his audience. The Q&A session at the end was at the same level; he answered each question thoroughly and in detail. I was definitely impressed!

For more on machine learning, check out the insideBIGDATA guide to Machine Learning.

Resource Links: