“Apache Drill coupled with the proper NoSQL data store opens up the opportunity for a single data source to be used for both transactional and analytical processing. There is no need to export or transform the data it into an application-specific or even a star schema format in order to load into a data warehouse. Time saved by not having to create software to serialize and deserialize the data into the data structures for any given language, coupled with simplified software testing and less code to maintain, add up to a BIG savings for any business.”
Building out a Hadoop cluster with massive amounts of local storage is a considerably extensive and expensive undertaking, especially when the data already resides in a POSIX compliant Lustre file system. Now companies can adopt analytics written for Hadoop and run them on their HPC clusters.
Data is exploring at large organization. So is the adoption of Hadoop. Hadoop’s potential cost effectiveness and facility for accepting unstructured data is making it central to modern, “Big Data” architectures. Yet, a significant obstacle to Hadoop adoption has been a shortage of skilled MapReduce coders.
“The Hadoop MapReduce framework grew out of an effort to make it easy to express and parallelize simple computations that were routinely performed at Google. It wasn’t long before libraries, like Apache Mahout, were developed to enable matrix factorization, clustering, regression, and other more complex analyses on Hadoop. Now, many of these libraries and their workloads are migrating to Apache Spark because it supports a wider class of applications than MapReduce and is more appropriate for iterative algorithms, interactive processing, and streaming applications.”
“The Hadoop framework has become the most popular open-source solution for Big Data processing. Traditionally, Hadoop communication calls are implemented over sockets and do not deliver best performance on modern clusters with high-performance interconnects. This talk will examine opportunities and challenges in optimizing performance of Hadoop with Remote DMA (RDMA) support, as available with InfiniBand, RoCE (RDMA over Converged Enhanced Ethernet) and other modern interconnects.”
“When organizations operate both Lustre and Apache Hadoop within a shared HPC infrastructure, there is a compelling use case for using Lustre as the file system for Hadoop analytics, as well as HPC storage. Intel Enterprise Edition for Lustre includes an Intel-developed adapter which allows users to run MapReduce applications directly on Lustre. This optimizes the performance of MapReduce operations while delivering faster, more scalable, and easier to manage storage.”
DK Panda from Ohio State University presented this talk at the Stanford HPC & Exascale Conference. “As InfiniBand is getting used in scientific computing environments, there is a big demand to harness its benefits for enterprise environments for handling big data and analytics. This talk will focus on high-performance and scalable designs of Hadoop using native RDMA support of InfiniBand and RoCE.”
With Continuuity Reactor 2.0, Java Developers can easily Build Hadoop and HBase Apps. The Big Data App server now includes new production-ready features including MapReduce Scheduling, High Availability, Resource Isolation and full REST API support.