Bridging the MapReduce Skills Gap

Data is exploding at large organizations, and so is the adoption of Hadoop. Hadoop’s potential cost effectiveness and its facility for accepting unstructured data are making it central to modern “Big Data” architectures. Yet a significant obstacle to Hadoop adoption has been a shortage of skilled MapReduce coders.

The Analytics Frontier of the Hadoop Eco-System

Ted Wilkie

“The Hadoop MapReduce framework grew out of an effort to make it easy to express and parallelize simple computations that were routinely performed at Google. It wasn’t long before libraries, like Apache Mahout, were developed to enable matrix factorization, clustering, regression, and other more complex analyses on Hadoop. Now, many of these libraries and their workloads are migrating to Apache Spark because it supports a wider class of applications than MapReduce and is more appropriate for iterative algorithms, interactive processing, and streaming applications.”

Performance Optimization of Hadoop Using InfiniBand RDMA

DK Panda

“The Hadoop framework has become the most popular open-source solution for Big Data processing. Traditionally, Hadoop communication calls are implemented over sockets and do not deliver best performance on modern clusters with high-performance interconnects. This talk will examine opportunities and challenges in optimizing performance of Hadoop with Remote DMA (RDMA) support, as available with InfiniBand, RoCE (RDMA over Converged Enhanced Ethernet) and other modern interconnects.”

Interview: Replacing HDFS with Lustre for Maximum Performance

Gabriele Paciucci

“When organizations operate both Lustre and Apache Hadoop within a shared HPC infrastructure, there is a compelling use case for using Lustre as the file system for Hadoop analytics, as well as HPC storage. Intel Enterprise Edition for Lustre includes an Intel-developed adapter which allows users to run MapReduce applications directly on Lustre. This optimizes the performance of MapReduce operations while delivering faster, more scalable, and easier to manage storage.”

DK Panda Presents: Big Data – Hadoop and Memcached

DK Panda from Ohio State University presented this talk at the Stanford HPC & Exascale Conference. “As InfiniBand is getting used in scientific computing environments, there is a big demand to harness its benefits for enterprise environments for handling big data and analytics. This talk will focus on high-performance and scalable designs of Hadoop using native RDMA support of InfiniBand and RoCE.”

With Continuuity 2.0, Java Developers Can Easily Build Hadoop and HBase Apps

With Continuuity Reactor 2.0, Java developers can easily build Hadoop and HBase apps. The Big Data app server now adds production-ready features, including MapReduce Scheduling, High Availability, Resource Isolation, and full REST API support.