The Elephant in the Cloud


In this special guest feature, Ashish Thusoo of Qubole makes a strong case for the virtues of Big-Data-in-the-cloud technology as a viable enterprise option for implementing enterprise-class data solutions. He discusses a handful of important capabilities a cloud provider should have in order to address the unique nature of big data analytics. Ashish Thusoo is CEO and co-founder of Qubole, a self-service platform for big data analytics. Before co-founding Qubole, Ashish ran Facebook’s Data Infrastructure team.

There’s no denying that big data is essential for rapid business growth. Companies properly leveraging and analyzing big data achieve their business objectives more quickly, whether that’s customer retention, effective real-time campaigns, or improved operational efficiencies across the enterprise.

Over the last few years, Hadoop and related emerging technologies have sought to democratize processing and analytics on big data. While these technologies offer tremendous cost and capability benefits, many organizations still struggle with operating and expanding the use of these technologies for data processing.

According to a recent Capgemini report, “Cracking the Data Conundrum: How Successful Companies Make Big Data Operational,” only 36 percent of businesses use the cloud for their big data projects (most leverage on-premises implementations). Surprisingly, only 13 percent of organizations surveyed have achieved full-scale production of their big data implementations, and only 27 percent of executives that were surveyed described their big data initiatives as successful.

Given the amount of investment in big data infrastructure over the past few years, these numbers are less than encouraging and expose the risks and challenges of successfully implementing big data initiatives.

One way to mitigate these risks, though, is by employing cloud-based big data analytics. The advantages of what is often referred to as Big Data-as-a-Service are clear and are aligned with the benefits of other enterprise cloud offerings, including little to no upfront investment, a shorter time to get up and running, fewer administrators needed to manage infrastructure, and easy scalability. However, a cloud-based Big Data-as-a-Service offering needs certain specialized capabilities, as outlined below, that address the unique nature of big data analytics.


Managing and analyzing large amounts of data requires computing resources—the bigger the job and the faster it needs to get completed, the larger the size of the server cluster needed. With on-premises solutions, you’re limited by the number of servers deployed in your infrastructure and by the number of other jobs happening at the same time. That’s not the case in the cloud—you can theoretically have any size cluster and any number of jobs.

In the cloud, auto-scaling, both up and down, becomes extremely important to get the most value out of your data for the least cost. To truly deliver Big Data-as-a-Service, it should take just a few clicks to set up the size of cluster you need. And when the cluster is no longer needed, the system should wind itself down automatically.
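As a rough illustration of the scale-up, scale-down and wind-down behavior described above, here is a minimal Python sketch. It is not any vendor's actual policy: the function names, the backlog-based sizing rule, the node limits and the 30-minute idle timeout are all assumptions made for the example.

```python
import math


def target_nodes(pending_tasks: int, tasks_per_node: int,
                 min_nodes: int = 0, max_nodes: int = 100) -> int:
    """Return how many nodes the current backlog calls for (hypothetical rule).

    The count grows with the number of pending tasks and shrinks when the
    backlog drains, but always stays within [min_nodes, max_nodes].
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))


def should_shut_down(idle_minutes: float,
                     idle_timeout_minutes: float = 30.0) -> bool:
    """Wind the cluster down once it has been idle for the assumed timeout."""
    return idle_minutes >= idle_timeout_minutes


if __name__ == "__main__":
    # A growing backlog scales the cluster up; an empty one scales it to zero.
    for backlog in (250, 5000, 0):
        print(backlog, "pending tasks ->", target_nodes(backlog, tasks_per_node=10), "nodes")
    print("shut down after 45 idle minutes:", should_shut_down(45))
```

A real service would base these decisions on richer signals, such as job deadlines, data locality and pricing, but the shape is the same: size the cluster to the work, and release it when the work is done.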

Data access and collaboration

Organizations can’t fully leverage their big data infrastructure if it isn’t accessible to all parts of the business. Big data analytics in the cloud has an advantage here: cloud technologies are inherently more accessible and break down data silos. Big Data-as-a-Service offerings should leverage these cloud traits to facilitate collaboration and sharing of results with colleagues across the organization so that the entire business can benefit. This also means that cloud-based data analytics services need enterprise-strength security and built-in data governance.


Organizations need the flexibility to combine data from multiple cloud sources. Connectors help move data from various data sources into and out of cloud platforms such as Amazon, Microsoft and Google. Pre-built connectors greatly simplify and speed up deployment of cloud-based big data platforms.

The right tools for the job

Big data technologies are evolving rapidly, and it seems there’s a new technology popping up every other month. For organizations with their own big data infrastructure, it’s tough to keep up. The IT team needs to evaluate the different technologies to see which ones might be appropriate for their needs, understand how they will impact the existing infrastructure, hire or develop the expertise and then train users on each new technology. By the time this process plays out, the new software may already have been made obsolete by the next new tool.

Big data in the cloud can eliminate this issue. Some big-data-in-the-cloud offerings focus on one technology or one set of technologies. True Big Data-as-a-Service offerings, however, provide access to a range of the latest technologies and evolve as technology evolves. That way data scientists can choose the right tool for the job at hand on the fly, whether that’s Hive, Presto, Pig or Spark.

Big data is about insights, not technology

Ultimately, big data initiatives are all about gaining new insights from all the data companies have access to, in order to propel their businesses forward. The faster data scientists can begin mining their data, and the more organizations can focus on unearthing and sharing insights without having to worry about maintaining specialized hardware and software infrastructure, the better. As a wide body of analyst research shows, investments in big data infrastructure are taking too long while rarely yielding the desired results. With more organizations moving more data to the cloud anyway, cloud-based data analytics offers the perfect antidote.

