The need for database replication across multiple environments, including Hadoop, is critical to data extraction and analytics in the Big Data world. Continuent's seamless replication, built on open source software, allows organizations to operate with zero downtime, automatic failover, and a disaster recovery system in place. We caught up with Robert Hodges, CEO of Continuent, to learn more.
insideBIGDATA: What aspects of database management does Continuent’s platform offer?
Robert Hodges: Continuent's product starts with the open source (GPL v2) Tungsten Replicator, which provides core replication functionality, primarily for MySQL and Oracle databases. With our most recent Tungsten Replicator 3.0 release, the solution now includes support for other databases such as Vertica, MongoDB, and most recently Hadoop. Tungsten Replicator supports advanced topologies, including fan-in, star, multi-master, and multi-site/multi-master deployments, as well as filtering of the replicated data. Through the use of a single, globally identifiable transaction ID, replication can be started, stopped, and restarted easily.
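The restart behavior described above can be sketched conceptually. The following Python model is illustrative only (the class and field names are hypothetical, not Continuent's actual implementation): it shows how a single monotonic global transaction ID lets a replicator stop and later resume from a checkpoint without reapplying or losing events.

```python
# Illustrative sketch of restartable replication keyed on a global
# transaction ID. Names are hypothetical, not Continuent's implementation.

class Event:
    def __init__(self, seqno, data):
        self.seqno = seqno  # global transaction ID: one monotonic sequence
        self.data = data

class Replicator:
    def __init__(self, checkpoint=None):
        # On (re)start, resume just past the last committed seqno.
        self.last_seqno = checkpoint if checkpoint is not None else -1
        self.applied = []

    def apply(self, event):
        # Skip events at or below the checkpoint (safe, idempotent resume).
        if event.seqno <= self.last_seqno:
            return False
        self.applied.append(event.data)
        self.last_seqno = event.seqno  # durably persisted in a real system
        return True

# First run applies three events, then stops (simulating a shutdown).
log = [Event(i, f"tx{i}") for i in range(5)]
r1 = Replicator()
for e in log[:3]:
    r1.apply(e)

# Restart from the saved checkpoint: earlier events are skipped safely.
r2 = Replicator(checkpoint=r1.last_seqno)
results = [r2.apply(e) for e in log]
```

Because every event carries the same globally ordered ID, the restarted replicator needs only one number to know exactly where to pick up.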
The Continuent Tungsten clustering solution builds on Tungsten Replicator, adding a connectivity layer and cluster management to provide an easy-to-use database-as-a-service solution for bare-metal, private-cloud, and public-cloud environments. With Continuent Tungsten you gain automatic failover, zero downtime, disaster recovery, load balancing, improved performance, and a management framework that makes using and managing your database installation much easier.
insideBIGDATA: What sets Continuent apart from other providers?
Robert Hodges: The speed and flexibility of our core replicator and the companion Continuent Tungsten clustering solution offer advanced functionality in a simple, easily usable format. Tungsten Replicator supports high-speed replication between MySQL and Oracle databases in an open source product. Continuent Tungsten supports billions of transactions a day, with our largest single installation managing over 700 million transactions a day and over 225 terabytes of data. Key to all this is the ease of deployment and use, and the flexible nature of the solution, which enables cross-database replication and advanced filtering not found in other products.
insideBIGDATA: Real-time analytics are huge in the Big Data world. What does your company do in this arena for the enterprise? Other industries?
Robert Hodges: Continuent has for some time supported the exchange of data between the traditional RDBMS environments of MySQL and Oracle, including replication between the two, as well as out to newer Big Data stores such as Vertica, InfoBright, and InfiniDB. The platform is flexible, and that's shown with Tungsten Replicator 3.0, where we have added real-time replication of data directly into Hadoop, including distributions from Cloudera, HortonWorks, Amazon Elastic MapReduce, and IBM InfoSphere BigInsights. Hadoop is the leading platform for Big Data analytics, but traditional methods for loading data into Hadoop have relied on intermittent exports and bulk loads from RDBMS environments, a process that is both slow and cumbersome. With Tungsten Replicator 3.0, data can be loaded into Hadoop at the same rate at which it is loaded and modified in the source RDBMS, bringing the freshness of the analytic data up to the same level as the source data.
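The contrast between intermittent batch loading and continuous replication can be illustrated with a minimal sketch. This is a hypothetical Python model (in-memory stand-ins for a source RDBMS change log and a Hadoop sink, not Continuent's code): the batch path re-exports everything on a schedule, while the streaming path ships only the changes committed since the last position.

```python
# Contrast sketch: periodic batch export vs. continuous change replication.
# Hypothetical in-memory stand-ins; not an actual Tungsten or Hadoop API.

class Source:
    """A source database exposing an ordered change log (like a binlog)."""
    def __init__(self):
        self.log = []

    def commit(self, row):
        self.log.append(row)

def batch_export(source, sink):
    # Traditional approach: dump everything on a schedule. The sink is
    # stale between runs, and each run repeats work already done.
    sink.clear()
    sink.extend(source.log)

def stream_changes(source, sink, position):
    # Replicator approach: ship only changes committed since the last
    # position, keeping the sink as fresh as the source. Returns the
    # new position to resume from next time.
    new = source.log[position:]
    sink.extend(new)
    return position + len(new)

src = Source()
batch_sink, stream_sink, pos = [], [], 0

src.commit("a")
src.commit("b")
pos = stream_changes(src, stream_sink, pos)  # ships "a", "b"

src.commit("c")
pos = stream_changes(src, stream_sink, pos)  # ships only "c"
batch_export(src, batch_sink)                # re-copies all three rows
```

The streaming path touches each change exactly once, which is why the analytic copy can stay as fresh as the source instead of lagging until the next export window.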
insideBIGDATA: Aside from having a really cool name, can you please tell us what the Tungsten Replicator is all about?
Robert Hodges: Tungsten Replicator, released under a GPL v2 license, is a high-performance, open source data replication engine for MySQL and Oracle. It has all the features you expect from enterprise-class data replication products, but with the flexibility and price-performance of open source. Tungsten Replicator is also one of the core components of Continuent's clustering solution, Continuent Tungsten: the clustering layer provides high availability and disaster recovery, while Tungsten Replicator transfers events from one database server to another.
insideBIGDATA: Open source data management has gained a ton of momentum over the years. What does this mean for Continuent?
Robert Hodges: It is a key part of our business, but it is also our response to the ever-growing challenge of keeping databases running and accessible 24×7. Continuent Tungsten addresses these requirements by keeping databases available through failures and planned maintenance, using open source technology to manage the underlying data requirements, including replication, filtering, and support for complex topologies. The open source approach is about maximizing your database resources on commodity hardware, and Continuent Tungsten helps our customers achieve that, handling terabytes of data every day with familiar open source technology and our own open source solutions.
insideBIGDATA: What is your relationship like with the Hadoop community and what does this mean technologically for you?
Robert Hodges: We are right at the start of our journey with the Hadoop community. Although some of our staff have been working with the community for many years, Tungsten Replicator 3.0 marks our first formal relationship as a company, and we've certainly started enthusiastically, with partnerships with HortonWorks and Cloudera and by getting Tungsten Replicator 3.0 certified for use with Cloudera Enterprise 5.0.
insideBIGDATA: In some ways these technologies are in the nascent stages. What does the future hold for your company and the industry as a whole?
Robert Hodges: One of the key changes we have seen in the last 12-18 months is the move away from separate data silos, each with its own database and use case. Instead, customers are using data in a significantly more heterogeneous fashion, actively sharing data across different databases, with complex data flows between the systems. In the future we only see this increasing, as the quantity of data grows and the speed with which users can access, process, and extract useful information from that data increases exponentially. This is perhaps the biggest change that Hadoop brings to users of existing DBMS systems.