In this special guest feature, Prat Moghe, CEO of Cazena, highlights the important considerations for migrating data warehousing to the cloud. Prat is founder and CEO of Cazena and also a successful big data entrepreneur with more than 18 years of experience inventing next-generation technology products and building strong teams in the technology sector. As senior vice president of strategy, products and marketing at IBM Netezza, he led a worldwide 400-person team that launched the latest Netezza data warehouse appliance, which became a market leader in price and performance, as well as IBM’s first big data appliance. Following Netezza’s sale to IBM for $1.7 billion in 2010, Prat drove the company’s growth strategy and was the force behind its thought leadership in appliances and analytics.
Data warehousing has changed significantly over the last 15 years. First, the data warehouse appliance revolutionized the market with a preconfigured box that made analytic databases easier to deploy and maintain. Then, Amazon introduced Redshift, a cloud-based “Data Warehousing as a Service.” But for many enterprises, there have been barriers to data warehousing in the cloud.
The main challenge is that data warehousing is not just one thing – it’s a set of systems and processes, with interdependencies between many sources and applications. So while it’s easy to spin up a node on Redshift, it’s hard to simply move a data warehouse to the cloud.
However, leveraging cloud data warehousing doesn’t require a wholesale migration or “rip and replace.” Enterprises increasingly use the cloud to augment on-premises investments. With a hybrid architecture, companies are able to maintain existing on-premises systems and leverage the cloud for certain workloads or processes.
Why Cloud for Data Warehousing?
Beyond obvious drivers such as cost cutting and elasticity, there are specific factors that increase urgency for cloud data warehousing. Many appliances are aging, reaching capacity and approaching end-of-life on support contracts. At the same time, companies are consolidating datacenters and reducing infrastructure footprints and, in the process, evaluating what could migrate to the cloud.
Relatedly, with SaaS applications, big data and new cloud data sources (sensors, social, mobile, etc.), a growing share of an enterprise’s data is generated outside its firewall. This shifting “data gravity” leads many companies to rethink their architectures. In the process, many look at cloud options and open-source technologies like Hadoop and Spark, which can handle some workloads previously run in the warehouse.
The increased emphasis on analytically driven decisions has also meant the regular introduction of new data sources and demands for new capabilities. Ultimately, this need for analytic agility drives many cloud projects. It’s easier and faster to change a data warehouse in the cloud than in most datacenters. The cloud allows for simple procurement, granular expansion and fast adoption of new technologies – unlike buying a whole new appliance, which locks enterprises into a specific architectural paradigm.
Which Data Warehouse Workloads are Best for Cloud?
Many companies start by focusing on data that’s already in the cloud. This includes data from SaaS applications, purchased datasets, sensor logs, social feeds or other big data. These datasets can be collected, stored, filtered, aggregated or analytically explored in a cloud data lake or similar repository. Then, a smaller subset of the data is moved back on-premises.
Some workloads can be offloaded from data warehouses, freeing up capacity for other jobs. Practically speaking, many companies look to offload the workloads that demand the most capacity or elasticity. A common starting point is ad hoc analytics and data science jobs that require lots of processing power or run over high-volume big data. Other offload candidates are “warm” archive datasets that aren’t accessed often but still need to be queryable. Some companies offload to cloud data marts for efficient information sharing with partners, customers or remote stakeholders.
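As an illustration, the triage described above can be sketched as a simple pass over workload metadata. This is a hypothetical sketch, not any vendor’s methodology; the field names and thresholds are assumptions chosen only to mirror the two criteria in the text (capacity-hungry ad hoc jobs and rarely accessed warm archives):

```python
# Hypothetical sketch: flag warehouse workloads as cloud-offload
# candidates. Thresholds (big_tb, cold_qpd) are illustrative only.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    kind: str               # e.g. "ad_hoc", "archive", "reporting"
    size_tb: float          # data volume in terabytes
    queries_per_day: float  # how often the data is accessed

def offload_candidates(workloads, big_tb=10.0, cold_qpd=1.0):
    """Return names of workloads worth moving off the on-prem warehouse."""
    picks = []
    for w in workloads:
        # Ad hoc analytics / data science over high-volume data
        if w.kind == "ad_hoc" and w.size_tb >= big_tb:
            picks.append(w.name)
        # "Warm" archives: rarely queried, but must stay queryable
        elif w.kind == "archive" and w.queries_per_day <= cold_qpd:
            picks.append(w.name)
    return picks

if __name__ == "__main__":
    fleet = [
        Workload("clickstream-exploration", "ad_hoc", 40.0, 5.0),
        Workload("daily-finance-reports", "reporting", 0.5, 200.0),
        Workload("2019-order-history", "archive", 12.0, 0.2),
    ]
    print(offload_candidates(fleet))
    # → ['clickstream-exploration', '2019-order-history']
```

In practice the inputs would come from warehouse usage statistics rather than hand-entered records, but the shape of the decision is the same.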
Getting Started with Data Warehousing in the Cloud
Cloud data technology is more mature and secure than ever, but it still challenges enterprises. Spinning up clusters may be easy, but optimizing, securing, integrating, hardening and operationalizing a production deployment is hard. Consider these recommendations upfront:
- Choose workloads wisely. In addition to the data, evaluate related processes (loading, integration, quality, applications, etc.) as well as the tools and users reliant on that data.
- Select the right technology. Weigh all the options for each workload, including new advances like Hadoop. Also consider available skills. Provisioning and optimizing cloud technology can be challenging, and experienced hands are in short supply.
- Architect for security and flexibility. Cloud providers offer hundreds of security controls. Consider how to design for security and flexibility while avoiding vendor lock-in.
- Consider data movement early. You’ll need encryption, plus functions for compliance, auditing and logging. Evaluate costs for cloud ingest and egress.
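To make the ingest/egress evaluation concrete, a back-of-the-envelope transfer-cost check is easy to script. The rates below are placeholder assumptions, not any provider’s actual pricing; the sketch reflects the common pattern where ingest is free or cheap while egress is metered per GB:

```python
# Rough transfer-cost estimator for planning data movement.
# Rates are hypothetical placeholders; check your provider's
# current price sheet. Egress typically dominates the estimate.

def transfer_cost(gb_in, gb_out, ingest_rate=0.00, egress_rate=0.09):
    """Estimate transfer cost in dollars for a planned data movement."""
    return round(gb_in * ingest_rate + gb_out * egress_rate, 2)

if __name__ == "__main__":
    # Example: load 5 TB into the cloud, pull 200 GB of results back.
    print(transfer_cost(5000, 200))  # → 18.0
```

Running this kind of estimate per workload, before migration, helps avoid surprise bills when result sets routinely flow back on-premises.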
- Look carefully at integration points. Consider how to best connect sources and integrate with other analytics tools and systems, such as BI, ETL and MDM.
- Don’t underestimate monitoring and operations. Things change quickly in the cloud. Evaluate who has the time and skills for tasks like security monitoring, patching, platform updates, backup and restore, etc.
The good news is that you don’t have to build and maintain all of this yourself. New Big Data as a Service options cost up to 80 percent less than on-premises data warehouses. Providers offer different features and functions, so carefully evaluate your own requirements, as well as the key areas above.