In another episode of the Big Data & Brews industry perspectives series, Stefan Groschupf, CEO of our friends over at Datameer, shares his thoughts on the future of Spark and how it is part of an evolution in the Hadoop environment.
“There are a number of Bayesian modelling packages available, but how do you know which one to use? This talk will take you through the positives and negatives of the major packages, focusing on the specifics of my work in health statistics, as well as providing a general overview of what these packages can do.”
In the Hadoop Summit 2015 presentation below, Bill Porto, senior analytics engineer at RedPoint Global, discusses why continual, adaptive optimization is key to maintaining a leadership position in the market.
In this video from the PyData Seattle Conference, Lorena Barba from George Washington University presents: Data-driven Education and the Quantified Student. “Education has seen the rise of a new trend in the last few years: Learning Analytics. This talk will weave through the complex interacting issues and concerns involving learning analytics, at a high level. The goal is to whet the appetite and motivate reflection on how data scientists can work with educators and learning scientists in this swelling field.”
“The Seagate 1200.2 SSD family includes the next-generation of high-capacity, high-performance SAS SSDs designed with multiple endurance offerings optimized for demanding enterprise applications and maximum TCO savings. The 1200.2 SAS SSD family delivers ultra-fast, consistent and easily scalable performance that exceeds 12Gb/s SAS single port bandwidth. By removing the storage bottleneck, it closes the gap between processor and data storage performance and significantly improves overall system and application responsiveness.”
In the presentation below, Wendy Gradek, Sr. Manager of EOS BI and Analytics at EMC, describes the benefits EMC is seeing in resource optimization, usability, and boot camps, and shares direct feedback from the business teams who are using Alteryx and Tableau.
The presentation below shows how the Nuclear Pharmaceutical Services division of Cardinal Health uses Alteryx to combine data from Salesforce.com, an Access database, an Excel spreadsheet, and Teradata, and then performs time series forecasting before writing the data back to a Teradata Datalab.
Clusters must be tuned properly to run memory-intensive systems like Spark, H2O, and Impala alongside traditional MapReduce jobs. This Hadoop Summit 2015 talk describes Altiscale’s experience running the new memory-intensive systems in production for its customers.
In the video presentation below, Charles Martin, Chief Scientist at Calculation Consulting, speaks to a class at UC Berkeley’s Haas School of Business. Given the business-oriented audience, the discussion is high-level and practical rather than deeply technical.
In the presentation below, Alec Radford, Head of Research at indico Data Solutions, talks about deep learning with Python and the Theano library.