
Interview: Ash Munshi, CEO at Pepperdata

I recently caught up with Ash Munshi, CEO at Pepperdata, to get a rundown on his company, a sense of how big data and DevOps are related, highlights of new product offerings, and his take on where Pepperdata is headed. Before joining Pepperdata, Ash was executive chairman of Marianas Labs, a deep learning startup sold in December 2015. Prior to that he was CEO of Graphite Systems, a big data storage startup sold to EMC DSSD in August 2015. Munshi has also served as CTO of Yahoo and as CEO of both public and private companies, and sits on the boards of several technology startups.

Daniel D. Gutierrez – Managing Editor, insideBIGDATA

 

insideBIGDATA: What is Pepperdata?

Ash Munshi: Founded in 2012 by industry veterans Chad Carson and Sean Suchter, Pepperdata is a Big Data startup based in Cupertino, Calif. Leading companies such as Comcast, Philips Wellcentive and Zillow depend on Pepperdata to manage and improve the performance of Hadoop and Spark. Enterprise customers use Pepperdata products and services to troubleshoot performance problems in production, increase cluster utilization and enforce policies to support multi-tenancy. Pepperdata products and services work with Big Data systems both on-premises and in the cloud.

insideBIGDATA: DevOps and BigData – Why do they need each other?

Ash Munshi: DevOps is the modern standard for application development and delivery, and fosters collaboration and communication between developers, quality assurance and IT operations professionals. DevOps toolchains improve and automate stages and feedback loops within the DevOps cycle of plan, code, build, test, release, deploy, operate and monitor. DevOps can shorten time to delivery, improve user satisfaction, deliver better quality product, improve productivity and efficiency, and better meet user needs by allowing faster experimentation.

DevOps is a part of a number of successful Big Data environments, even if it’s not always recognized as such today. DevOps-style rapid iteration, feedback and release cycles are clearly used in many Big Data environments. And companies today are actively recruiting and hiring staff for these roles. DevOps for Big Data uses many of the same tools as traditional DevOps environments, such as source code management, bug tracking, continuous integration, and deployment tools. Some examples of Big Data specific DevOps tools include Anaconda, Apache Zeppelin and Jupyter notebooks.

Pepperdata expects there to be an increased focus on DevOps for Big Data. New practices, technology, and software will emerge to better support DevOps for Big Data—and Pepperdata will be contributing to that with a focus on performance aspects of DevOps for Big Data.

insideBIGDATA: Spark is seeing a surge, and some say it is overtaking MapReduce as the performance tool of choice for developers. Why do you think this is?

Ash Munshi: Apache Spark has revolutionized how Big Data applications are developed and executed since it emerged several years ago. As a data processing engine for Hadoop, Spark is orders of magnitude faster than MapReduce. It's also easier to code than MapReduce and more flexible. It's better in almost every respect except perhaps one: code visibility.

Unlike MapReduce, which required developers to reason explicitly about how the cluster would process and execute their job in order to solve their problem, Spark abstracts much of that away: developers can focus on what they want to happen, and Spark takes care of how it happens. That opens Spark up to a wider set of people, who can write applications much faster. The flip side is that because Spark hides execution details, it's hard for developers to connect their code to the actual hardware usage.
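To make that contrast concrete, here is a plain-Python sketch (not actual Hadoop or Spark code; the function names are illustrative) of the same word count written both ways: MapReduce style, where the developer spells out the map, shuffle, and reduce phases, versus the declarative chained style Spark popularized, where the engine decides how work is partitioned and executed.

```python
from collections import defaultdict

def mapreduce_word_count(lines):
    # MapReduce style: the developer writes each phase explicitly.
    # Map phase: emit (word, 1) pairs.
    pairs = []
    for line in lines:
        for word in line.split():
            pairs.append((word, 1))
    # Shuffle phase: group values by key.
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    # Reduce phase: sum the counts for each key.
    return {word: sum(counts) for word, counts in groups.items()}

def spark_style_word_count(lines):
    # Spark style (illustrative): state *what* should happen as a chain of
    # transformations; the engine handles partitioning and execution.
    words = (w for line in lines for w in line.split())  # like flatMap
    counts = defaultdict(int)
    for w in words:                                      # like reduceByKey
        counts[w] += 1
    return dict(counts)

lines = ["big data big", "data pipelines"]
assert mapreduce_word_count(lines) == spark_style_word_count(lines)
```

In actual PySpark the second version collapses to roughly `rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)`, which is what makes the abstraction so productive, and also what hides the cluster-level execution details the answer above refers to.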

insideBIGDATA: Are Big Data environments evolving for developers? Why?

Ash Munshi: For real-time, scale-out architectures like Apache Spark, monitoring and managing performance throughout the software lifecycle presents several unique challenges, as Spark depends on sophisticated CPU, memory and disk management for applications that are broken down into many simultaneous tasks distributed across the cluster. Developers need specialized tools to build and deploy high-performance Spark applications, and those tools must also give operations personnel the specifics they need to manage such applications. As enterprises move to DevOps, these tools are invaluable for fostering performance-related communication and collaboration for any Big Data initiative.

insideBIGDATA: What’s the future of Big Data and why should DevOps fit into that?

Ash Munshi: Big Data is on an accelerated path to becoming more and more mainstream. At Pepperdata, our focus is on companies that are running production Big Data environments to drive their business. We work with some of the top companies in the world that have strategically and operationally mastered Big Data.

Over the past 18 months this trend has accelerated, and as operations teams have become responsible for running Big Data deployments, we see two key trends. The first is that Big Data must fit into the way IT teams work today, and that means DevOps. DevOps is the modern standard for application development and delivery. The second is that as Big Data becomes more integrated into the business processes and operations of a company, there will be an accompanying technical convergence of Big Data architectures and workloads with mainstream data center technology.

Big Data has traditionally added significant complexity to IT environments because it comes with different technologies, different tools and different skill sets required to support it. Over time, you will have Big Data and traditional workloads running side by side on the same orchestration framework, with a common set of management tools.

insideBIGDATA: Tell me about your product Application Profiler – What is it?

Ash Munshi: Pepperdata Application Profiler is based on Dr. Elephant, an open-source project created by LinkedIn. We've integrated Dr. Elephant's open-source code into a Software-as-a-Service offering. With Application Profiler, we now bring performance feedback all the way back to developers. Application Profiler analyzes all Hadoop and Spark jobs running on the cluster and provides developers with technical insights on how each job performed. It gives developers actionable recommendations for tuning jobs and lets them validate tuning changes made to applications, with a before-and-after comparison. It also lets operators quickly green-light new jobs before they move to the production cluster.

insideBIGDATA: What’s next for Pepperdata?

Ash Munshi: We have had a busy 2017 already and look forward to continuing that momentum through this year and into 2018. In particular, we recently announced Code Analyzer for Apache Spark, which gives Spark application developers the ability to identify performance issues and connect them to particular blocks of code within an application. Code Analyzer fills a huge void in application development for Spark, helping developers optimize Spark applications for large-scale production. This move underscores our continued dedication to addressing the entire DevOps cycle while also delivering products that are indispensable for operating Big Data systems in production.

Our future products will leverage the enormous volume of metrics we have collected (over 20 trillion data points), along with best practices and machine learning, to enable operational systems that are simultaneously reliable, scalable and performant.
