In this special guest feature, John Mertic, Director of Program Management for ODPi and Open Mainframe Project at The Linux Foundation, makes the argument that when it comes to your data in the cloud, there are certain pieces of the technology puzzle you should have nailed down – five baseline things to address, regardless of your IT, data output, cloud provider and security – before making the switch. Previously, Mertic was director of business development software alliances at Bitnami. John comes from a PHP and Open Source background, having been a developer, evangelist, and partnership leader at SugarCRM, board member at OW2, president of OpenSocial, and frequent conference speaker around the world. As an avid writer, he has published articles on IBM Developerworks, Apple Developer Connection, and PHP Architect, and authored the books The Definitive Guide to SugarCRM: Better Business Applications and Building on SugarCRM.
Over the years, as Apache Hadoop matured and made its way into mainstream enterprise environments, the prevailing assumption was that organizations should run it on their own hardware. But this practice of managing hardware in isolation created an unforeseen barrier to deployment, even as Hadoop’s computational power and elasticity evolved to the point where it is often the preferred platform. While most big data users and producers have historically avoided the cloud for storing and delivering data, we’re starting to see significant demand for the cloud’s scaling capabilities applied to the huge outputs of big data operations.
This rising overlap between tactical data storage and analysis, together with a noticeable step back from conventional physical hardware, has blazed a trail for offering staple frameworks like Hadoop as a service.
Numerous projects under the Apache Software Foundation community – the Hadoop Compatible File System (HCFS) effort, for example – have also helped to bridge this long-standing gap. HCFS is far more encompassing than its predecessors, enabling both storage and cloud vendors to expose meaningful data through their own native storage solutions.
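As a minimal sketch of what this looks like in practice: an HCFS-compliant object store such as Amazon S3 can be addressed through Hadoop’s standard FileSystem interface just by configuring the right scheme. The property names below come from Hadoop’s stock s3a connector; the bucket name is hypothetical, and in production you would supply credentials through an instance role or credential provider rather than in plain text:

```xml
<!-- core-site.xml: point Hadoop at an HCFS-compliant object store
     (hypothetical bucket name; s3a is one of several HCFS implementations). -->
<configuration>
  <!-- Make the object store the default filesystem instead of HDFS. -->
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://example-analytics-bucket</value>
  </property>
  <!-- Credentials shown inline for illustration only. -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```

With a configuration along these lines, tools written against the HCFS contract – `hadoop fs -ls /`, MapReduce, Hive, Spark – read and write the object store the same way they would HDFS, which is precisely the interoperability the HCFS work is meant to provide.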
This growing push of stored and processed data to the cloud lets users move insights that were once locked into a single platform across the enterprise far more easily than before – especially for large-scale, multi-location teams.
However, none of this is to say that the cloud is a be-all and end-all solution for every data-driven enterprise. When it comes to your data in the cloud, there are certain pieces of the technology puzzle you should have nailed down. Here are the five baseline things to address – regardless of your IT, data output, cloud provider and security – before making the switch:
- What outcome(s) are you trying to achieve?
While this may seem like an obvious starting point, we often frame outcomes in terms of the technology we already have at hand. The most successful implementations, however, start by first determining what your organization is actually trying to solve. What’s the problem? Who’s the customer? What are their needs? What are they struggling with? What are you hoping to gain? Starting here will help connect personal and company needs and better focus the technology selection process.
- What data do I have available?
Knowing these outcomes, you’ll next need to determine what data assets are available to you. What data sets exist already? What additional data sets do you need to collect? How are you collecting this data? Does it reside inside a data warehouse, or does it arrive as a real-time stream of IoT data? Is it coming to you at the pace you need for it to be useful and/or actionable? Do you need to spend a lot of time and processing to get these assets into a beneficial form? Understanding what data you have available, where it comes from, and the average time you invest in procuring it will be crucial in determining how well your current environment is performing for you.
- What actions can I take considering my objectives and available data?
This baseline marries the two considerations above, answering the pointed question: “Here are my questions, here are my sources – how do I want to approach this?” For example, “I need to know proactively whether a car battery is going to fail, so I can warn the driver not to turn off the car in the middle of the desert and end up stranded” calls for a much different tactic than “What was our average gas mileage over the last 30 days, and how might we improve it?” The range of questions you’re able to ask of your data will prove invaluable as your organization increasingly relies on actionable data for competitive insights.
- Will this technology help us grow and become more competitive?
This technology space is growing rapidly – just look at the number of new open source projects being launched. The Apache Software Foundation, which champions much of this open source work, launches a new top-level project in the Hadoop/big data space every six to eight weeks. Add a constantly growing startup scene and numerous legacy companies investing in products in this space, and that’s a lot to keep pace with – especially as demands on IT from the various lines of business increase while organizations chase the holy grail of being “data driven.” Today’s IT leaders are being challenged to ask important questions: “Does it make sense for me to invest in these technologies?” “Which peer companies have had success here?” “Which companies are including this in their technology offerings, and why?” A key shift is happening across today’s organizations: becoming a leader among peers means differentiating the technology platform, which, invested in wisely, gives a company the ability to execute, engage the customer, and deliver better results than the competition.
- Can standards make my investment safer?
Making long-term platform investments requires you to think about the ecosystem as a whole, rather than simply what a single vendor can provide. Healthy, vibrant platforms have strong ecosystems that give organizations versatility and choice. Open source is a great start, but without the structure of trusted standards you could quickly find yourself at an innovation dead end. Hadoop, for example, is a platform that has suffered from a lack of interoperability; end users have been forced to make a Hadoop platform choice first and then live within that vendor’s product vision. One crucial lesson from the cloud is that infrastructure is a commodity and should be transparent to users’ needs. Standardization not only makes it cheaper to evolve with innovation, rather than being pigeonholed as many have been before, but it also allows companies to grow within their solution. As you find more business challenges to tackle, the ability to tack on new, compatible capabilities will empower your organization and benefit your customers.
After addressing these questions, I’m sure you’ll come to the same conclusion I did: this decision will not be static. No matter your industry or company size, your organization’s needs for data insight will continuously change, regardless of the framework you currently lean toward. In fact, the more successful you become at delivering data insights, the more quickly demand for more diverse insights will follow. So, while moving your data to the cloud – whether through a single public cloud provider, a multi-cloud model, or a hybrid of on-premises, private cloud and public cloud – might seem daunting on paper, it can remove lingering incompatibilities and inefficiencies. Before you make or change your investment, working through these baseline questions will help you determine responsibly (and not reactively) how the cloud can work for your organization.