Hadoop: Time to Start Demanding Performance


In this special guest feature, Mike Hoskins, CTO at Actian, offers some hard-won tips for avoiding the immaturity “cost iceberg” that hampers Hadoop adoption today. He directs Actian’s technology innovation strategies and evangelizes game-changing trends in big data, analytics, Hadoop and cloud to give insight into Accelerating Big Data 2.0™. Mike, a Distinguished and Centennial Alumnus of Ohio’s Bowling Green State University, is a respected technology thought leader.

Today’s Hadoop reality hasn’t lived up to the full promise many saw when the platform was introduced over ten years ago. Many organizations must still hire a large number of expensive programmers to swim around in the data lake and toil endlessly on low-level open source infrastructure. While this may suit the programmers just fine, businesses suffer because Hadoop still hasn’t grown up enough to satisfy their needs.

The biggest mistake customers make when adopting Hadoop today is underestimating the immaturity of the platform. That immaturity shows most in the lack of high-productivity tooling and software – the Hadoop space is still dominated by expensive, opaque, low-performance, low-productivity and hard-to-manage custom code. Until Hadoop moves beyond this early-adopter, programmer-dominated stage, customers will continue to run into an immaturity “cost iceberg” that hampers adoption.

This isn’t to say Hadoop isn’t capable of becoming the platform we all want it to be. I certainly haven’t given up on Hadoop, as it is still one of the best options we’ve got. The increase in the amount of data generated – 90% of the world’s data has been collected in the past two years – has forced organizations to look at data storage differently, and as a result, look at data access, analytics and security differently. Hadoop is definitely one of the answers.

Hadoop and similar scale-out, distributed data and distributed compute infrastructures, like Cloud, are the single biggest IT game-changers we have seen in 30 years. The whole next generation of Big Data and advanced analytics will be built on these types of infrastructures. However, for this promise to be realized, it is essential that these distributed Hadoop and Cloud ecosystems begin to elevate and “modernize” their game – precisely by ending the love affair with high-cost, low-productivity programmer-led strategies, and moving to more proven strategies that rely on mature, production-grade software tools and products to drive the killer business advantage that comes from successful deployment of Big Data Analytics.

This sets the stage for the arrival of more “consumable” options. Instead of a focus on low-level infrastructure, organizations need to focus on where the value lies – in analytics. And with the increasing availability of mature, high-performance enterprise-class software products running natively in Hadoop, and the heavy investment being put into making Hadoop quickly and easily available in the Cloud, organizations can do just that. Instead of spending 80 percent of the time worrying about availability of the data, organizations can now focus on asking the right questions.

Organizations, particularly as they contemplate moving their Hadoop deployments into production, need to raise the bar on their expectations significantly. It is time for Hadoop to deliver the kind of capabilities – security, high performance, high concurrency, seamless integration with other enterprise tooling – that organizations expect. And with the right software and partners, Hadoop can become the modern, enterprise-class analytics platform that organizations want. The key is to demand meaningful analytic outcomes right from the beginning, and not let custom code turn the data lake into a data swamp.

There are two ways organizations can ensure they are focusing on the right things and getting the most analytic value out of their Hadoop deployment.

  1. engage a mature Services partner who is focused on quickly ramping the Hadoop platform to serve the Advanced Analytic workloads that bring the real ROI from Big Data investments;
  2. deploy a proven standard of enterprise analytic software as a backbone of early Analytics projects – we think next-generation extreme high-performance columnar analytic databases are the obvious choice to drive early analytic success on Hadoop, with SQL being by far the most widely used data discovery and analysis language on the planet.

Hadoop is in our future, but expecting Hadoop to solve all of your problems “as is” will lead to a drain on your resources without much real business value to show for it. However, with the right analytic software in place, organizations can expect rapid development cycles and timely roll-out of new Advanced Analytic applications. It is time we start expecting – and demanding – more.



Speak Your Mind



  1. surlypants says

    what is actian’s Matrix strategy?

  2. Mike,

    You should look into how http://www.iguaz.io addresses those challenges and seamlessly integrates into Hadoop & Spark. You can check us out at Strata NYC.