Last evening I attended a great local Meetup of the Los Angeles R User Group. The theme of the night was “The Unsexy Part of Data Science: Data Munging.” The 90 attendees were treated to five short presentations from a panel of experts in the field (including yours truly!). The venue, the offices of General Assembly, was excellent and sits right in the heart of Silicon Beach in Santa Monica.
In a nutshell, data munging is the often tedious and time-consuming task of preparing data for consumption by machine learning algorithms. Given the state of some corporate data sets, the job of transforming the data can be considerable, and decidedly unsexy. In fact, I wear entirely different clothes when doing sexy algorithm design (cool clothes) versus data munging (a rumpled sweatshirt)!
Here is a rundown of all the talks (I’ve included links to the slides for each presentation):
Szilard Pafka, Ph.D., Chief Data Scientist at Epoch – “Data Munging Intro”
Yasmin Lucero, Ph.D., Statistician at Gravity – “Munging date-times in R: tools, tricks, gotchas”
Daniel D. Gutierrez – “Data Munging: the Good, the Bad and the Ugly”
Neal Fultz, graduate student in the UCLA Statistics Department – “Tidy Data, Facts & Rules for R”
Eric Klusman, Manager of Data Insight at Demand Media – “Plyr for split-apply-combine”
Daniel – Managing Editor, insideBIGDATA