Students from more than 20 prestigious colleges and universities recently tried their hand at “Big Data” analysis at seven campuses around the country during DataFest, an annual month-long data-analytics competition sponsored by the American Statistical Association.
“The Hadoop framework has become the most popular open-source solution for Big Data processing. Traditionally, Hadoop communication calls are implemented over sockets and do not deliver the best performance on modern clusters with high-performance interconnects. This talk will examine opportunities and challenges in optimizing performance of Hadoop with Remote DMA (RDMA) support, as available with InfiniBand, RoCE (RDMA over Converged Enhanced Ethernet) and other modern interconnects.”
FIELD REPORT Last week I attended the long-anticipated useR!2014 international conference at the UCLA campus, my alma mater. The four-day event had something for everyone in attendance, with all the brain cycles centered on the use of the R statistical environment. Since R is a primary tool for my work in data science and […]
Scientific research in the life sciences is often akin to searching for needles in haystacks. Finding the one protein, chemical, or genome that behaves or responds in the way the scientist is looking for is the key to the discovery process. For decades, high performance computing (HPC) systems have accelerated this process, often by helping to identify and eliminate infeasible targets sooner.
“As InfiniBand is increasingly used in scientific computing environments, there is a big demand to harness its benefits for enterprise environments for handling big data and analytics. This talk will focus on high-performance and scalable designs of Hadoop using native RDMA support of InfiniBand and RoCE. Designs for various components in Hadoop (such as HDFS, MapReduce, RPC, and HBase) and their benefits based on the RDMA package for Apache Hadoop will be presented. An RDMA-based design for scalable Memcached (used in Web 2.0) and its associated benefits will also be presented.”
“Splunk Enterprise is a platform for machine data. The technology delivers powerful and fast analytics to quickly unlock the value of machine data to IT and other users throughout an organization. In short, it’s a simple, effective way to collect, analyze and secure the massive streams of machine data generated by all IT systems and technology infrastructure.”
“Intel’s goal is to encourage more innovative and creative uses for data as well as to demonstrate how big data and analytics technologies are impacting many facets of our daily lives, including sports. For example, coaches and their staffs are using real-time statistics to make adjustments on the fly during games and throughout the season. From intelligent cameras to wearable sensors, a massive amount of data is being produced that, if analyzed in real time, can provide a significant competitive advantage. Intel is among those making big data technologies more affordable, available, and easier to use for everything from helping develop new scientific discoveries and business models to even gaining the upper hand on good-natured predictions of sporting events.”