Last week I attended the long-anticipated useR!2014 international conference at the UCLA campus, my alma mater. The four-day event had something for everyone in attendance, with all the brain cycles centered on the use of the R statistical environment. Since R is a primary tool for my work in data science and machine learning, I was very much looking forward to hearing what the world’s best and brightest had to say. I wasn’t disappointed. I found the invited lectures, user-contributed presentations, tutorials and poster presentations to be of very high quality. Suffice it to say, I learned a lot and met a lot of very smart people.
useR!2014 was focused on:
- Promoting R as the “lingua franca” of data analysis and statistical computing
- Providing a venue for users to discuss and exchange ideas on the use of R for statistical computations, data analysis, visualization and exciting applications in various fields
- Providing an overview of the new features of the rapidly evolving R project
One of the highlights of the conference actually came during a coffee break, when I met a father who was on hand to give a talk, along with his 8-year-old daughter, who was attending her first R conference! She must have been the youngest-ever useR! attendee. I always enjoy discovering a young person (especially a girl) with a propensity for math and science. It was a delight chatting with them.
In order to summarize my experience at the conference, I thought I’d provide a “Best of” list of the presentations I was able to take in. Bear in mind, there were many talks going on concurrently, so sadly I couldn’t see everything. In that spirit, here’s the best of what I saw personally.
The Most Honored award must go to the main keynote address for the conference, “Interfaces, Efficiency and Big Data,” by John Chambers, creator of the S language on which R is based. He identified three promising projects in the R community: Rcpp, LLVM for R: Compiling toolkit for R, and h2o: Interface and Java-based computations for big data. He summarized his talk with the simple points outlined in the above image. It was great to finally see Chambers in person.
Most Impressive (Genius Award)
Of all the talks I attended, the most impressive designation must go to “Adaptive Resampling in a Parallel World” by Dr. Max Kuhn of Pfizer Global R&D Nonclinical Statistics. Dr. Kuhn made a very cogent presentation, fast-paced and clearly detailed. I valued the insight he provided about adaptive resampling as summarized here:
- If the training set size is big enough, adaptive resampling can generate quality models.
- If the computational complexity is large, it can also generate significant speed-ups.
- Parallel processing does not obviate the gains generated from adaptive resampling.
In light of my previous life in astrophysics research, I found the most interesting talk to be “R in the Midst of Exploding Stars: Distributed, Time-Domain Transient Classification,” presented by JPL’s Thomas J. Fuchs. He described a novel framework for time-domain astronomy that uses R and machine learning algorithms for iterative, dynamic classification of astronomical transient events such as supernovae. I had a brief chat with Thomas after the talk and found out he is not an astronomer, which lends credence to something I’ve known for a while: data scientists can contribute in meaningful ways to many diverse problem domains.
I was most anticipating the tutorial (access the materials for “Data Manipulation dplyr” HERE) and short talk on dplyr by Hadley Wickham: “dplyr: a grammar of data manipulation.” As one of the main contributors to the R environment, Wickham is a powerhouse unto himself, albeit with much modesty (see Tweet).
This award goes to Hilary Parker of Etsy who presented a poster on Wednesday evening about a new R package her group created called “testdat” for unit testing of tabular data. I found Hilary to be the most pleasant person at the conference to speak to, very welcoming and informative.
This is an easy choice! I was very inspired by the talk “Practical use of R by blind people,” by A. Jonathan R. Godfrey (who is blind), a lecturer in statistics at Massey University in New Zealand. This was the first time I ever saw a blind person give a PowerPoint presentation, and he did it with such ease that it differed little from one delivered by a sighted person. He used a high-speed audio segment before each slide to jog his memory. You had to see it to appreciate it. This guy is a real superstar!
Another award goes to a poster presenter, Gergely Daroczi of rapporter.net, who did a lot of work on his project to track attendees of the useR! conference over the years. His poster included a cool plot showing the overall number of attendees for all useR! conferences in the last 10 years. He really seemed to love what he was working on, and handed out some very high-quality hard copy reproductions of his poster, a nice touch. I really enjoyed hearing Gergely describe his work with such enthusiasm.
This award goes to several presenters with useful technologies I personally plan to use.
- “10 R packages to win Kaggle competitions,” by Xavier Conort of DataRobot
- “RForcecom: an R package which provides a connection to Force.com and Salesforce.com,” by Takekatsu Hiramura
- “Deploying R into Business Intelligence and Real-time Applications,” by Louis Bajuk-Yorgan of TIBCO
- “data.table: fast and flexible data manipulation,” by Matt Dowle
The Los Angeles R User Group meetup that was aligned with the conference (Tuesday evening) gets the most organized award. As usual, Szilard Pafka did a great job in assembling an expert panel for the discussion “R’s Place in the Production Environment.”
Another poster presentation that I found quite fun to learn about was “Package ATPR for Statistical Analyses of Men’s Professional Tennis,” by Stephanie Kovalchik, Ph.D., a statistician from the RAND Corporation. I found this research particularly interesting since I had been watching a lot of Wimbledon tournament matches in the last couple of weeks. Stephanie is a regular at the Los Angeles R User Group, so it was great to see her work firsthand.
Most Appreciated Random Meeting
I was very pleased to run into Norm Matloff, author of The Art of R Programming, my favorite R text. He was presenting “An R Package for Parallel Matrix Powers.” I got the opportunity to tell Norm how much I liked his book, and that it is the one I usually recommend to people wishing to get up to speed with R (since I am a TA for Coursera, students often ask for a good R text). He was very gracious, and I urged him to come out with a second edition.
Daniel, Managing Editor – insideBIGDATA