Hadoop is well known as the de facto platform for Big Data analytics. But is it for everybody? Is it always the right framework for the job? Pepperdata seeks to answer these questions by helping companies streamline Hadoop, so they get the most out of the platform and their existing hardware. We sat down with Sean Suchter, Co-founder and CEO of the company, to discuss the technology further.
insideBIGDATA: Pepperdata certainly has an interesting name. What are the origins of it?
Sean Suchter: We spent our first three months without a name, and during that time we talked to a lot of companies (we conducted more than 70 interviews). We learned that enterprises from a wide range of industries, not just tech, were being transformed by their use of data, in many cases creating entirely new features or even products based on it! But we also heard over and over that some companies were being held back because they could not rely on their systems. We wanted to make their data a little more active, a little hotter, a little spicier… Enter Pepperdata.
insideBIGDATA: What does your company do in the Big Data world?
Sean Suchter: Pepperdata is how companies can rely on Hadoop.
In talking to businesses, we found that more often than not Hadoop entered the enterprise as a specialist or analytics project. Once people started discovering and using Hadoop, they began relying on it for answers. Over time, the problem evolved beyond getting Hadoop up and running (which Hadoop distribution vendors already solve well) to making it better: a tool on which the entire organization could rely.
Because Hadoop is so flexible and capable, it ends up running many jobs and acquiring lots of users. Those jobs step on each other, and because the system is complicated and opaque, it becomes inefficient and slow. Today, organizations usually have some business-critical use cases, some of which must run every day, every hour, or every second, to support important business decisions. Pepperdata makes sure time-sensitive jobs run reliably, without the time wasted hunting down and solving problems or the money wasted buying more hardware than you need just to keep things running smoothly.
We solve this: we give Hadoop the predictability it needs, let organizations see what it is doing (with detailed, real-time usage metrics for every user, job, and task), and help organizations get the most out of their hardware investment. We are not for organizations that have just started their first Hadoop project (because they don’t rely on it… yet). We are here for those who already rely on the business-critical data and functionality Hadoop can deliver.
insideBIGDATA: You recently received $5 million in a Series A financing round, which is exciting news. Where will this funding go?
Sean Suchter: We are using it to continue product development; we now have a great product that installs in less than an hour on existing clusters (Cloudera, Hortonworks, Apache, IBM, or any other standard distribution). In addition, we are continuing to scale the engineering team and really kick-starting our go-to-market activities (marketing, sales, and support).
insideBIGDATA: What are the limitations of Hadoop in data analytics?
Sean Suchter: There’s a lot going on in the analytics space: great visual tools are coming out, and things are becoming more real-time and in-memory. At the same time, increased demands on the cluster are creating more contention and complexity. As the cluster becomes more useful, more users and jobs create traffic jams that slow everything down. It’s also moving beyond analytics. Sometimes a great analytics result gets published directly as a product, for example when social networks suggest additional connections based on mining existing data. That is a critical growth driver for them: they rely on that data to rebuild the product every six hours, so they can never miss that deadline.
insideBIGDATA: How can companies evaluate the importance of Hadoop in their business?
Sean Suchter: Here’s a question to think about: if it stopped running, how quickly would you notice? The old answer was “next month,” in which case you might not care about Pepperdata. Today more companies respond with “next hour” or “when my CEO gets the email” or “when someone gets paged immediately.” When you have dozens (or hundreds, nowadays) of people using it, the efficiency of the system is business critical.
insideBIGDATA: Your company is relatively new. Where do you see the technology going? What does the future look like for Pepperdata?
Sean Suchter: The world is shifting from apps running on single computers (like Microsoft Office) to multiple apps running on clusters of computers. This used to be the domain of only huge tech companies, but fabrics like Hadoop are making it accessible to all companies.
Thomas Watson reportedly said there was a worldwide market for five computers, but clearly that underestimated the impact of personal computing on businesses. The same transformation is happening for distributed computing today. Operating systems like Unix and Windows made single-machine computing accessible to businesses at large. Part of what they did was make real-time uses work, with lots of things running on one computer at once. The same real-time shift is now happening on huge clusters of machines, and that transformation will drive a lot of value for businesses, along with the need for Pepperdata.