
Cloudy with a Chance of On-prem

In this special guest feature, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, explores use cases that may be best run in the cloud versus on premises, points out opportunities to optimize cost and operational benefits, and explains how to move data between locations. Jim has held positions running operations, engineering, architecture and QA teams. He is the cofounder of the Chicago Hadoop Users Group (CHUG), where he has coordinated the Chicago Hadoop community for the past four years. Jim has worked in the consumer packaged goods, digital advertising, digital mapping, chemical and pharmaceutical industries. He’s built systems that handle more than 50 billion transactions per day. Jim’s work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop.

How companies store, process and apply data has been at the heart of the biggest enterprise technology change of the past 30 years. We are now in the midst of a digital transformation that is reshaping how business operates. Long-held assumptions about enterprise technology are being questioned: not only the assumption that production and analytical systems must be separate, but also that data must be stored and used only on premises.

Why is digital transformation such an important topic? It’s not lost on business leaders that disruption through data is a significant competitive threat. Companies like Amazon, Airbnb and Uber have revolutionized their respective markets. Without owning any bookstores, hotel rooms or cars, each has been driven by data. Data is the key, and agility is how they have achieved success.

For those attempting to be a market disruptor, the next-generation applications combine the immediacy of real-time operational data with the insights of analytical workloads. These applications leverage continuous analytics, automated actions, and rapid response to better impact business as it happens. And it all happens with the integration of historical and real-time data in a single, unified platform.

Businesses are under immediate pressure to reduce the cost of legacy systems, to deal with the volume, variety and velocity of new data, and to deliver more to their customers. If that weren’t enough, there is immense pressure to figure out how to get the most out of existing infrastructure and to leverage the cloud.

There are a couple of very important details that must not be overlooked when evaluating the cloud. First and foremost is vendor lock-in. The cloud is looked upon by some as the single biggest attempt at lock-in since the days of the mainframe. The safeguard is cloud neutrality: it preserves freedom of choice among cloud providers and avoids having to re-engineer software solutions in order to move between the available cloud offerings. The second is data gravity. Cloud providers let you put data into their systems for free (ingress), but charge you for taking the data out (egress). The larger the volume of data, the more expensive it is to get your data out, creating a tendency to stay where you are. Beyond the cost, there may also be considerable complexity in actually copying the data out to another system.
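The pull of data gravity can be made concrete with a rough cost estimate. The sketch below is illustrative only: the flat per-GB egress rate is an assumption for the example, not any provider's actual pricing, which is typically tiered by region and destination.

```python
def egress_cost(data_tb, rate_per_gb=0.09):
    """Estimate the one-time cost of moving data out of a cloud provider.

    rate_per_gb is a hypothetical flat egress rate in USD; ingress is
    assumed free, matching the common pricing pattern described above.
    """
    gb = data_tb * 1024
    return gb * rate_per_gb

# Egress cost grows linearly with volume, while ingress cost nothing:
for tb in (1, 100, 1000):
    print(f"{tb:>5} TB out -> ${egress_cost(tb):,.2f}")
```

Even at a modest rate, moving a petabyte back out becomes a six-figure line item, which is exactly the gravitational pull the author describes.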

Newer businesses that do not have an existing capital investment in infrastructure will find it much easier to start in the cloud. If your business has a significant existing capital investment in infrastructure, it is unlikely that a complete move to the cloud will occur in the short term. While still possible, the cost is usually prohibitive until the existing infrastructure has outlived its usefulness.

If the cloud is an option for your business, there are three real approaches to consider: go all-in, leverage the cloud and on-premises infrastructure simultaneously, or leverage multiple cloud providers simultaneously. All three can be done in a way that prevents cloud vendor lock-in.

Standard APIs are the key to preventing vendor lock-in. A converged data platform should offer standard, open-source APIs. It should run in a private data center, in the cloud, or both at the same time, and it should handle all data movement, eliminating the need to write homegrown applications for that purpose. This frees up resources to focus on the core competencies of the business and to get the most out of the cloud as infrastructure as a service (IaaS). By starting on premises and putting data into a converged data platform, moving to the cloud becomes effortless: the platform can also run in the cloud and can mirror or replicate the data automatically, removing additional data management complexities. It can even provide multi-master and hot-hot capabilities.
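One way to see why standard APIs matter: if application code is written against a common interface, swapping the backing store (on premises or any cloud) is a configuration change rather than a rewrite. The sketch below is a simplified illustration with a hypothetical in-memory stub standing in for a real provider; in practice the same idea plays out by pointing a client that speaks a standard protocol at a different endpoint.

```python
class ObjectStore:
    """Minimal standard interface the application codes against."""
    def put(self, key, data): raise NotImplementedError
    def get(self, key): raise NotImplementedError

class InMemoryStore(ObjectStore):
    # Hypothetical stub backend. Any provider implementing the same
    # interface could be substituted without touching application code.
    def __init__(self):
        self._objects = {}
    def put(self, key, data):
        self._objects[key] = data
    def get(self, key):
        return self._objects[key]

def archive_event(store: ObjectStore, event_id, payload):
    # Application logic depends only on the interface, not on which
    # vendor (or data center) actually stores the bytes.
    store.put(f"events/{event_id}", payload)

store = InMemoryStore()
archive_event(store, "42", b"clickstream record")
print(store.get("events/42"))
```

The design choice is the point: the moment business logic calls a provider-proprietary API directly, moving between clouds means re-engineering, which is the lock-in the article warns against.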

The cloud can be a great place to operate, or a scary proposition, depending on your vantage point. My suggestion is to become as knowledgeable as possible on the platform capabilities you select and always be sure that your business longevity is the number one priority in all your decisions.


Sign up for the free insideBIGDATA newsletter.
