Pepperdata Optimizes Amazon EMR Clusters to Increase Job Performance by 4x

Pepperdata, the experts in the performance of distributed systems at scale, announced a new offering that enables customers of Amazon Elastic MapReduce (EMR) to run jobs up to four times faster while cutting costs. With a one-click install, joint customers gain instant, granular visibility into their clusters’ run-time performance, which today is not possible through Amazon alone. Even after an Amazon EMR cluster has completed its work and terminated, customers can access fine-grained monitoring data to view and analyze a run, and to compare it with historical data to improve future performance. Pepperdata customers can take advantage of this new service free of charge until December 31, 2016.

Amazon EMR clusters are short-lived: once a run is complete, the cluster terminates, taking all performance data with it. As a result, visibility into job performance is essentially non-existent, making it very difficult to pinpoint the improvements that would decrease run times and costs for customers. Pepperdata’s granular analysis of runs (based on over 300 metrics, including CPU, memory, unused capacity, and job duration) helps DevOps teams optimize workloads and cut run times inflated by code inefficiencies. This instant visibility into cluster utilization also makes it easy for customers to determine the right amount of compute needed to complete jobs on time and at the lowest cost.

“Amazon EMR is designed to help companies process huge amounts of data easily and cost-effectively without having to commit unnecessary resources,” said Sean Suchter, CTO, Pepperdata. “As customers embrace Hadoop in the cloud, they need to be able to manage cost and performance without any big surprises. Pepperdata eliminates those blind spots with very granular insight into the performance of current and historical EMR runs.”

Even Small Reductions in Run Time Can Yield Significant Savings

Managing cost is the top priority for customers using Amazon EMR. Because billing in Amazon EMR is hourly, any reduction in run time can have a demonstrable impact on overall cost.

One of Pepperdata’s customers, a leading online real estate destination, wanted to shorten a specific run in Amazon EMR that consistently required 17 hours to complete. By analyzing the metrics that Pepperdata collected and retained after the cluster terminated, the customer was able to find areas of improvement and use the workload analysis to cut the same run to four hours. Pepperdata’s insights into cluster utilization quickly and accurately identified areas of inefficiency, leading to hundreds of thousands of dollars in annual cost savings for a single job.
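To make the billing arithmetic concrete, here is a minimal sketch of how hourly billing turns a shorter run into a lower bill. The instance count and hourly rate are hypothetical placeholders, not figures from the customer or from Amazon's price list, and the sketch assumes a partial hour is billed as a full hour.

```python
import math


def run_cost(run_hours, instance_count, hourly_rate):
    """Estimate the cost of one cluster run under hourly billing.

    Assumption: any partial hour is rounded up to a full billed hour.
    """
    billed_hours = math.ceil(run_hours)
    return billed_hours * instance_count * hourly_rate


# Hypothetical cluster size and price, chosen only for illustration.
INSTANCES = 100
RATE_PER_INSTANCE_HOUR = 0.50

cost_before = run_cost(17, INSTANCES, RATE_PER_INSTANCE_HOUR)  # 17-hour run
cost_after = run_cost(4, INSTANCES, RATE_PER_INSTANCE_HOUR)    # 4-hour run
savings_per_run = cost_before - cost_after

print(f"before: ${cost_before:.2f}, after: ${cost_after:.2f}, "
      f"saved per run: ${savings_per_run:.2f}")
```

Under these assumed numbers the saving scales directly with the hours removed, so a job that runs daily multiplies the per-run saving across the year.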

Automated, Adaptive Scaling for Amazon EMR

In addition to the self-service option for Amazon EMR, Pepperdata today also announced the beta availability of Adaptive Scaling for Amazon EMR. With Adaptive Scaling, customers can specify a time or cost budget for job completion, and Pepperdata automatically purchases Amazon EMR instances, growing or shrinking the cluster elastically as needed to meet these criteria. Adaptive Scaling for Amazon EMR will be publicly available in Q1, and Pepperdata is accepting sign-ups for the beta.
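The idea of sizing a cluster to a time budget can be illustrated with a rough sketch. This is not Pepperdata's algorithm, which the announcement does not describe; it assumes the work scales perfectly linearly across instances, and the function name and figures are hypothetical.

```python
import math


def instances_for_deadline(total_instance_hours, deadline_hours):
    """Return the smallest instance count that finishes within the deadline.

    Assumption: work parallelizes perfectly, so N instances finish
    total_instance_hours of work in total_instance_hours / N hours.
    """
    if deadline_hours <= 0:
        raise ValueError("deadline_hours must be positive")
    return max(1, math.ceil(total_instance_hours / deadline_hours))


# Hypothetical workload: 400 instance-hours of total work.
print(instances_for_deadline(400, 4))   # tight time budget, larger cluster
print(instances_for_deadline(400, 17))  # looser time budget, smaller cluster
```

Real jobs rarely scale linearly, so an adaptive system would presumably adjust the instance count as the run progresses rather than fix it up front.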

