Pepperdata® Code Analyzer for Apache Spark Highlights Performance Bottlenecks for Developers


Pepperdata, the DevOps for Big Data company, announced Pepperdata Code Analyzer for Apache Spark, which provides Spark application developers the ability to identify performance issues and connect them to particular blocks of code within an application. Code Analyzer is a new product that follows on the heels of Pepperdata Application Profiler, which provides Hadoop and Spark developers with actionable recommendations for improving job performance.

“One of the most significant challenges in Big Data is achieving optimal performance,” said Ash Munshi, CEO of Pepperdata. “Code Analyzer fills a huge void in application development for Spark, helping developers optimize Spark applications for large-scale production. Developers are now empowered to improve the performance of Spark applications with new information and insight around the code, build, test and release phases.”

The performance metrics from Spark Web UI have historically been a challenge for developers to understand and contextualize, especially without having granular, time-series data on hand. Developers cannot easily drill down into and understand the problematic sections of an application that require optimization. Further, as Spark clusters typically run many applications in parallel, the Spark Web UI doesn’t inform developers how applications are impacted by other applications running on the cluster.

Pepperdata Code Analyzer allows Spark application developers to precisely measure how cluster resources, including CPU, memory, and network and disk I/O, are consumed by any particular block of application code. Code Analyzer delivers additional insight by combining application information from the Spark engine with granular time-series data for all applications running on a cluster. Dev teams can then pinpoint the specific segment of their application code responsible for performance issues.
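The general idea of tying time-series resource data back to application stages can be illustrated with a small sketch. This is not Pepperdata's implementation; the `Stage` class, the `attribute_samples` function, and the sample data are all hypothetical, and the sketch simply assumes that stage start and end times and timestamped metric samples are already available.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Stage:
    """Hypothetical record of one Spark stage's time window."""
    stage_id: int
    start: float  # seconds since application start
    end: float


def attribute_samples(stages, samples):
    """Average the metric samples that fall inside each stage's window.

    `samples` is a list of (timestamp, value) pairs. A sample that falls
    inside several overlapping (parallel) stages is counted for each one.
    """
    usage = {}
    for stage in stages:
        values = [v for t, v in samples if stage.start <= t < stage.end]
        usage[stage.stage_id] = mean(values) if values else None
    return usage


# Made-up data: two stages, CPU utilization (%) sampled every 5 seconds.
stages = [Stage(0, 0.0, 10.0), Stage(1, 10.0, 20.0)]
cpu_samples = [(0.0, 20.0), (5.0, 40.0), (10.0, 90.0), (15.0, 70.0)]
stage_cpu = attribute_samples(stages, cpu_samples)
print(stage_cpu)
```

In this toy data the second stage shows much higher average CPU use than the first, which is the kind of signal a developer would then trace back to the code that defines that stage.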

“I develop a lot of complex Spark code to perform ETL on Hadoop clusters. In these complex, large-scale systems, you must be able to understand where the performance bottlenecks are,” said Ian O’Connell, software engineer at Stripe and Pepperdata Technology Advisory Board member. “Pepperdata Code Analyzer for Apache Spark gives developers detailed time-series performance data for things like CPU, JVM memory and I/O usage overlaid against Spark job stages. I’m excited about the direction Pepperdata is moving: letting developers quickly see problems in time-series views and tie them back to their actual Spark application code will be a very useful tool for developers working on production Spark applications.”

Benefits of Code Analyzer include:

For Devs:

  • Identify which lines of code and which stages cause performance issues related to CPU, memory, garbage collection, network and disk I/O
  • Easily disambiguate resources used during parallel stages
  • Understand why run time variations occur for the same application
  • Determine whether performance issues are due to the application or other workloads on the cluster

For Ops:

  • Reduce the number of performance incidents in production
  • Easily communicate detailed performance issues back to developers

“Chartboost is the world’s largest mobile games-only advertising platform, reaching one billion active players around the world every month. Chartboost utilizes Apache Spark on large Amazon EC2 Hadoop clusters for machine learning and ETL workflows,” said Michael McGowan, manager of Data Engineering at Chartboost. “Understanding Spark application performance in these complex environments is always a challenge. As a current user of Pepperdata Hadoop performance management tools, it has been great to work with Pepperdata on the development of Code Analyzer. It will give us comprehensive insight into Spark jobs.”

Pepperdata products and services are designed to accelerate the production use of Big Data applications by ensuring that performance is tightly integrated into the DevOps for Big Data cycle. Code Analyzer is integrated with Pepperdata products to provide an end-to-end DevOps solution, combining overall cluster awareness (monitoring, troubleshooting and alerting) with deep recommendations for improving the performance of individual jobs.

Availability and Pricing

Code Analyzer for Apache Spark will be available June 5 in early access, with general availability expected in Q3 2017. Pepperdata products are delivered to market as a combination of software running on customers’ clusters, on-premises or in the cloud, and as SaaS solutions.



Speak Your Mind



  1. For the last 10 years, Java and Oracle technologies were the number one technologies. For the next 10 years, Spark will be the number one framework for processing and analyzing any data. There is no doubt that the recent Spark 2.0 release solved many problems and improved performance.