
Interview: Nenshad Bardoliwalla, Co-founder and Chief Product Officer at Paxata

I recently caught up with Nenshad Bardoliwalla, Co-founder and Chief Product Officer at Paxata, to discuss how few question the potential of Hadoop for distributed processing and storage of very large data sets, but many have been frustrated by a steep learning curve and difficulty in achieving business goals. Nenshad is an executive and thought leader with a proven track record of success leading product strategy, product management, and development in business analytics. He co-founded Tidemark Systems, Inc., where he drove the market, product, and technology efforts for their next-generation analytic applications built for the cloud. He formerly served as VP for product management, product development, and technology at SAP, where he helped to craft the business analytics vision, strategy, and roadmap leading to the acquisitions of Pilot Software, OutlookSoft, and Business Objects. Prior to SAP, he helped launch Hyperion System 9 while at Hyperion Solutions. Nenshad began his career at Siebel Systems working on Siebel Analytics.

Daniel D. Gutierrez – Managing Editor, insideBIGDATA

insideBIGDATA: A few years ago Hadoop was seen as a key driver for Big Data implementation, but Gartner’s 2015 report on Hadoop adoption was pretty bleak and called out “sizable challenges around business value and skills.” Is it just too steep a technology curve for most enterprises?

Nenshad Bardoliwalla: No, like all technology adoption waves, Hadoop is going through its own lifecycle. Gartner does a very good job talking about hype cycles, wherein everyone gets really excited and then things go into darkness until they ultimately find their productivity cycle. What you see with Hadoop is organizations that had been really pushing the boundaries of the Oracles and the Teradatas of the world realized they had hit an inflection point where they literally could not contain the costs of doing this, so they made the leap. As those organizations drive the evolution of Hadoop, companies like Cloudera, Hortonworks, and MapR are all doing their part to further consumerize the technology. I see it as inevitable that Hadoop adoption will ultimately reach the same level of maturity that we see for database technologies.

insideBIGDATA: Given the ability to store diverse types of data, Hadoop would seem to be an ideal fit for BI and analytics in today’s world. How does it change the data warehouse model?

Nenshad Bardoliwalla: Hadoop started as a very low cost distributed file system and then evolved into a very simple but highly reliable and scalable programming model, which was MapReduce. In 2016, Hadoop has really become a general purpose data operating system and ecosystem that allows for a vast array of use cases. So you can use Hadoop systems for stream processing, transactional key-value stores, and graph processing.

Traditionally, most companies have asked IT to infer the types of questions that people were going to ask about data in transactional systems like ERPs and CRMs. Then they asked them to build data structures in a data warehouse and a set of reporting tools to deliver the answers. The Hadoop model inverts that: it takes all the data from the transactional systems in the rawest form possible, with very little, if any, transformation, and pumps it into what we're calling a data lake from which to derive business value. All of the transformation is delayed to the latest point possible, schemas are imposed on demand, transformations are done on demand as needed, and a new class of tools and technologies has emerged. These new technologies are uniquely suited for taking the raw data from Hadoop and dynamically transforming it on the fly into information that is useful for decision making.
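The "schema on demand" idea above can be sketched in a few lines. This is an illustrative toy, not Paxata's or Hadoop's actual machinery: raw records land untyped in the lake, and each consumer projects its own typed view only at read time (the field names and sample data here are hypothetical).

```python
import json

# Raw events land in the "data lake" untransformed -- no upfront schema.
raw_records = [
    '{"id": 1, "amount": "19.99", "ts": "2016-03-01"}',
    '{"id": 2, "amount": "5.00", "region": "EMEA"}',  # fields vary freely
]

def read_with_schema(lines, schema):
    """Impose a schema on demand: select and cast fields only at read time."""
    for line in lines:
        rec = json.loads(line)
        yield {field: (cast(rec[field]) if field in rec else None)
               for field, cast in schema.items()}

# Different consumers can project different schemas over the same raw data,
# instead of IT fixing one warehouse schema upfront.
finance_view = list(read_with_schema(raw_records, {"id": int, "amount": float}))
```

The key contrast with the warehouse model is that nothing is lost at load time; a new question just means a new schema projected over the same raw records.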

insideBIGDATA: How can enterprises know they’re getting the right data to the right business teams without creating a whole new IT-centered delivery mechanism?

Nenshad Bardoliwalla: The balance between IT and the business has always been one of extremes, and those extremes have caused a lot of the problems we see today. One extreme is the top-down, completely governed, locked-down IT model. In this scenario, the IT team takes the business requirements and builds out an entire rigid infrastructure based on those requirements, only to find that what end users really want is to be able to pull data down to Excel. The other extreme is the agile, self-service productivity tools or desktop tools that are hard to govern, not built for scale, and impossible to "operationalize." IT has no idea what people are doing in them; there is no visibility into what data is being used, how old it is, or what's being done with it, and people can create new versions of data reality. What results is chaos at the very least and an unmitigated disaster in the worst cases, both for the end users and for the enterprise as a whole.

Paxata’s mission stems from our vision of providing both of these constituencies enough of what they want so they can coexist productively. The end user should be able to take data from a variety of different sources, such as relational databases and Hadoop infrastructures, explore that data in any raw format, make a variety of transformations to that data, and ultimately deliver it to a destination where it can be used for further analytics. That sounds like the Excel model, but where the governance comes in is that we record every single step that the user takes. From an IT perspective, we can now shine a light on everything that is happening with the data in the organization that you would never have had visibility into otherwise.
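The governance model described here, where every user action is recorded, can be sketched as a dataset that carries its own audit trail. This is a minimal illustration of the idea, with hypothetical class and field names, not Paxata's implementation:

```python
from datetime import datetime, timezone

class GovernedDataset:
    """Toy sketch of governed self-service prep: every transformation
    applied to the data is recorded, giving IT lineage and visibility."""

    def __init__(self, source_name, rows):
        self.rows = rows
        self.audit_log = [("load", source_name,
                           datetime.now(timezone.utc).isoformat())]

    def transform(self, description, fn):
        # Apply the user's step, but log what was done and when.
        self.rows = [fn(row) for row in self.rows]
        self.audit_log.append(("transform", description,
                               datetime.now(timezone.utc).isoformat()))
        return self

# An analyst works interactively, Excel-style...
ds = GovernedDataset("crm_extract", [{"name": " alice "}, {"name": "BOB"}])
ds.transform("trim whitespace", lambda r: {"name": r["name"].strip()})
ds.transform("normalize case", lambda r: {"name": r["name"].title()})
# ...while IT can inspect ds.audit_log to see exactly what happened.
```

The point of the design is that self-service and governance stop being a trade-off: the user never sees the log, and IT never has to lock down the tool.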

insideBIGDATA:  So what is Paxata doing that’s unique?

Nenshad Bardoliwalla: Paxata eliminates the pain of data preparation with an enterprise-scale platform purpose-built to provide an interactive, analyst-centric data prep experience. It is powered by a unified set of technologies designed from the ground up for comprehensive data integration, data quality, semantic enrichment, collaboration, and governance. Paxata’s Adaptive Data Preparation platform makes it possible for analysts to access data regardless of where it resides or what format it is in.

If you are a programmer, you can use the facilities within Hadoop to transform data and make it useful for consumption by analytical tools. The problem with this approach is that there aren’t that many qualified programmers in the world.

We designed our platform so analysts don’t have to write code; they can literally point and click to transform data interactively, even on very large-scale data. An analyst can load a hundred million rows or a billion rows of data into a Paxata cluster and interactively profile, cleanse, transform, join, and append data and turn it into the information they need in real time. It combines massive scalability with a very easy to use end-user interface. Most other interfaces are very script-heavy, or they require you to sample the data and do all the transformations in batch.

Second, this single fluid platform provides data integration, data quality and master data management technologies in a very iterative, recursive way.

Third, we believe that machine learning and new algorithmic techniques should be used to automate many of the hardest parts of data preparation. So when you give us two data sets, we don’t expect you to figure out how to link those two data sets together and join them; our system recommends it to you, just like Amazon recommends what additional books or children’s toys you should be buying.
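To make the join-recommendation idea concrete, here is a toy heuristic that scores every pair of columns across two data sets by how much their value sets overlap (Jaccard similarity) and suggests the best-scoring pair as the join key. This is an illustrative sketch of the concept, not Paxata's actual algorithm, and the sample tables are hypothetical:

```python
def recommend_join_keys(left, right):
    """Score every (left column, right column) pair by value-set overlap
    and return the best pair -- a naive stand-in for ML-driven join hints."""
    best = (0.0, None, None)
    for lcol in left[0]:
        lvals = {row[lcol] for row in left}
        for rcol in right[0]:
            rvals = {row[rcol] for row in right}
            score = len(lvals & rvals) / len(lvals | rvals)  # Jaccard
            if score > best[0]:
                best = (score, lcol, rcol)
    return best  # (score, left_column, right_column)

customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Beta"}]
orders = [{"order_no": 10, "customer": 1}, {"order_no": 11, "customer": 2}]
score, lcol, rcol = recommend_join_keys(customers, orders)
```

A production system would combine many more signals (column names, data types, value distributions, usage history), but the shape of the recommendation is the same: the user confirms a suggested link rather than hunting for it.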

Finally, by bringing together several disruptive technologies, including the elastic cloud, machine learning, distributed computing and a modern user interface, we’re able to deliver the only solution that lets every business analyst streamline their data preparation projects at massive scale.

insideBIGDATA: How do they apply this today?

Nenshad Bardoliwalla: We know that typically 80 percent of the work of analysts is devoted to data preparation — searching for data, pulling that data together, trying to manipulate that data and ultimately turn it into information that can then be used for analysis. As a result, only 20 percent of their time is devoted to the analysis itself. We flip that ratio and allow people who are not that technical to be able to do this manipulation in a drag-and-drop, point-and-click, very Excel-friendly web-based environment, and get assistance from intelligent tools and visualizations that are built into the system. This enables them to be extremely productive in producing information and performing the value-added analysis.

insideBIGDATA: What about putting that analytic power in the hands of a broader set of knowledge workers?

Nenshad Bardoliwalla: We think there are probably 200,000 data scientists in the world who truly have a combination of data skills, programming skills, and business domain knowledge. And there are a few million folks who know how to use traditional data preparation tools: ETL, data quality, master data management, and so on. What we are doing today is expanding the market to the couple hundred million folks who have Excel skills. We believe there is a “data scientist” in all of us – we just need the right solutions.

Ultimately, our vision is to deliver information on demand to any business person, not just the analysts. Our belief is that with machine learning, crowdsourcing, and the continued extensive use of our platform in an organization, the average person won’t need to go into the interface that analysts use. They’ll be able to search for and get back information that was either already prepared, or the system will be smart enough to construct the new information on the fly based on what the end user is requesting.
