The Drive for Big Data in the Cloud


In this special guest feature, Alan Clark of OpenStack describes how the OpenStack open source IaaS provides a solution for big data in the cloud and how it can offer an attractive Hadoop deployment strategy. Alan Clark is Chairman of the OpenStack Foundation board, which promotes the development, distribution and adoption of the OpenStack cloud operating system. As the independent home for OpenStack, the Foundation serves more than 18,000 Individual Members from 140 countries around the world. Alan is an experienced industry leader, open source advocate, a member of the SUSE leadership team, and SUSE strategy adviser for new industry initiatives and open source.

We’ve all seen statistics demonstrating the amount of data being generated and gathered daily, with volumes in the petabyte and exabyte range. Processing such large amounts of data is what Big Data is built to do. Given the amount of data and the potential business around it, it’s no wonder that the industry has grown so large.

While there are multiple solution sets targeting big data analytics, we have traditionally focused on maximizing dedicated hardware and large processing power. Recently there has been a fast-growing convergence of Big Data and cloud, particularly where the data sets are unstructured with simple data models, an area of specific focus for the Apache Hadoop technology.

Convergence of Big Data and Cloud — OpenStack

The mission of OpenStack is to produce a ubiquitous open source cloud platform that will meet the needs of public and private cloud providers. OpenStack is about Infrastructure-as-a-Service (IaaS): an open source project, community and ecosystem that has grown dramatically over the past five years. Today the project hosts over 27,000 individual members, 2,000 contributors and 500 supporting companies. The number of components has grown from the original two to over 25 today.

These statistics convey the significance of open source and the transformation of the cloud over time. Users within the open source project can see the power of open source and appreciate its ability to embrace new ideas and market needs including the convergence of Big Data on cloud.

While this transformation is taking place, it is important to note that OpenStack is not recreating Hadoop or any of the other Big Data technologies. The OpenStack effort was created to facilitate the care and management of Big Data within the IaaS infrastructure, an approach sometimes called Analytics-as-a-Service. The technology effort within OpenStack is code-named Sahara and was created to provision, launch and manage Hadoop clusters on top of OpenStack, making it simple to deploy and manage Big Data infrastructure and tools.
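To make the idea of provisioning and managing a Hadoop cluster on IaaS more concrete, here is a minimal sketch of how such a cluster might be described as data: groups of identical virtual machines, each with a role, a size and a count. The class names, field names, flavor labels and process names below are illustrative assumptions, not Sahara's actual API or schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeGroup:
    """A set of identical VMs playing one role in the cluster (hypothetical model)."""
    name: str
    flavor: str  # VM size label; the values used below are illustrative
    count: int
    processes: List[str] = field(default_factory=list)


@dataclass
class ClusterSpec:
    """Illustrative description of a Hadoop cluster to be provisioned on a cloud."""
    name: str
    node_groups: List[NodeGroup]

    def total_instances(self) -> int:
        # Sum the VM count across every node group.
        return sum(group.count for group in self.node_groups)

    def scale(self, group_name: str, new_count: int) -> None:
        # Change the size of one node group, e.g. to add workers under load.
        if new_count < 1:
            raise ValueError("a node group needs at least one instance")
        for group in self.node_groups:
            if group.name == group_name:
                group.count = new_count
                return
        raise KeyError(group_name)


spec = ClusterSpec(
    name="analytics-demo",
    node_groups=[
        NodeGroup("master", "m1.large", 1, ["namenode", "resourcemanager"]),
        NodeGroup("worker", "m1.medium", 3, ["datanode", "nodemanager"]),
    ],
)

print(spec.total_instances())  # 4
spec.scale("worker", 5)
print(spec.total_instances())  # 6
```

Describing the cluster declaratively like this is what lets a cloud service launch it, resize it and tear it down on request, rather than tying the analytics workload to a fixed set of dedicated machines.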


Where Big Data has traditionally opted for dedicated hardware, what is driving it to the cloud? Cloud espouses dynamic workloads, multi-tenancy and the sharing of resources. The clue is in the growth and expansion of analytics ideas and data-based solutions, for example the expansion into real-time, on-demand analysis and response. A great example of this is the consumer shopping experience, which demonstrates the need for real-time, on-demand analysis.

Retail establishments have traditionally used Big Data to predict retail trends by combining customer histories with current web browsing patterns and social media responses, thereby leveraging this data to target customer segments and prepare for customer demand. Such analysis is ongoing, and is predictable in scale, size and duration. Traditionally, such workloads have been considered of low benefit for deployment within a cloud.

However, it is evident that the customer shopping experience is changing. Retailers are looking not only to interact actively with customers using enhanced intelligence, but also to offer immediate and personalized privileges and services. Such relationships evolve beyond a simple purchasing experience to active engagement throughout the life of the product. For example, a consumer no longer buys just a regular watch; the watch now tells them when and where to eat based on their geographical location. A myriad of devices, even your shoes, can record how many steps have been taken. Heart rates can be recorded and analyzed, giving the consumer real-time health analysis.

Providing real-time services, including real-time analytics, is a natural fit for cloud, which offers quick elasticity, rapid service deployment and agility. As businesses discover the potential of an enhanced customer relationship, ideas and innovative methods for that relationship drive business growth. Yet the services for this growth profile are different from traditional analytics. Real-time analysis and storage needs vary over time and location. In other words, these types of services need to be elastic to rapidly respond to changing workload demands over time.
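As a rough illustration of what elastic means here, the sketch below sizes a pool of analytics workers to follow a changing event rate, within fixed lower and upper bounds. The function name, the per-worker capacity of 500 events per second and the sample load figures are all made-up assumptions for illustration, not measurements or an OpenStack API.

```python
import math
from typing import List


def workers_needed(events_per_sec: float,
                   capacity_per_worker: float,
                   min_workers: int = 1,
                   max_workers: int = 20) -> int:
    """Return how many workers to run for the given load (hypothetical policy)."""
    if capacity_per_worker <= 0:
        raise ValueError("capacity_per_worker must be positive")
    # Round up so the pool can always absorb the full load.
    needed = math.ceil(events_per_sec / capacity_per_worker)
    # Clamp to the allowed range so the pool never drops to zero
    # and never grows past the budgeted ceiling.
    return max(min_workers, min(max_workers, needed))


# Made-up event rates (events per second) sampled over six periods.
sampled_load: List[float] = [200, 150, 900, 4200, 12000, 3000]
plan = [workers_needed(load, capacity_per_worker=500) for load in sampled_load]
print(plan)  # [1, 1, 2, 9, 20, 6]
```

With dedicated hardware, the pool would have to be sized for the peak period at all times; on a cloud, capacity can follow the plan and be released when demand falls.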

Yet today’s businesses maintain a need for technical efficiency to control cost, risk and security. Cloud has proven to be the most viable answer, providing businesses the agility and elasticity they are looking for while also providing centralized control and stability.

So Why OpenStack?

Flexibility – The large number of vendors contributing to OpenStack share ideas across a wide range of products and services. This variety enables a business to build solutions tailored to its specific needs.

Openness – Open source is well known for preventing vendor lock-in as well as eliminating the ‘black box’ syndrome. While most businesses will not be interested in modifying the OpenStack code, understanding how and why the code behaves enables businesses to build and deploy services which perform faster, safer and more securely.

Technical prowess – Built within OpenStack are the services that are commonly touted by cloud, including centralized management, metering and monitoring, fast provisioning, self-service, elasticity for variable workloads and architecture extensibility.

Market momentum – All the latest statistics demonstrate OpenStack momentum in technology innovation, technical growth, market adoption and strength. Such momentum clearly demonstrates that Cloud innovation is going to occur within this fast moving ecosystem, making it the perfect platform to invest in for future needs and growth.

These combined benefits from OpenStack, cloud and Big Data signal a huge swell of new advances and services for real-time, scalable analytics solutions. Such innovation is great for the open source world, business growth and ultimately consumer needs. It’s an exciting time to be a part of such technical innovation.


