I recently caught up with Ravi Mayuram, SVP Products & Engineering at Couchbase, to discuss recent developments in the NoSQL database industry such as the relationship with Hadoop and Spark, container technology, security, and much more. Ravi Mayuram is responsible for product development and delivery of Couchbase NoSQL offerings. He comes to Couchbase from Oracle, where he was a senior director of engineering leading innovations in the areas of recommender systems and social graph, search and analytics, and lightweight client frameworks. He was also responsible for kickstarting the cloud collaboration platform. Earlier in his career, Ravi held senior technical and management positions at BEA, Siebel, Informix and HP, in addition to a couple of startups including BroadBand Office, a Kleiner Perkins funded venture. Ravi holds an MS in Mathematics from the University of Delhi.
Daniel D. Gutierrez – Managing Editor, insideBIGDATA
insideBIGDATA: How do you see the continued convergence between operational technologies like NoSQL with analytical technologies like Hadoop and Spark?
Ravi Mayuram: The two technologies are highly complementary. NoSQL databases are the go-to technology for operational databases, due to their scalability, performance, flexibility, and lower operational cost. Hadoop and Spark are the go-to technology for ETL, analytical, and machine learning workloads due to their distributed, scalable data processing capabilities. Couchbase customers like PayPal, LinkedIn, and many others integrate these two technologies by using the Couchbase connectors for Hadoop and Spark. This integration creates key analytical insights that in turn create richer customer experiences and improve enterprise operational efficiencies. Low-latency, high-volume NoSQL operational databases like Couchbase, combined with scalable, distributed processing platforms like Hadoop and Spark, provide the competitive edge that innovative companies are searching for – near real-time actionable insights based on rich analytics and sophisticated machine learning.
This is all well and good; however, this approach typically requires several things:
- Maintaining both a NoSQL database cluster and a cluster for Hadoop and/or Spark.
- Massive data exchange to read and write data between the two systems.
- Support for multiple application development contexts and APIs – one for the NoSQL database operations, and another for analytical and machine learning processing.
All of the above adds complexity, latency, and cost to even the simplest analytical function. Additionally, customers use Couchbase as their operational NoSQL database because we support a rich data model, based on flexible JSON documents. In many cases they want to run analytical functions over their JSON data directly, without requiring additional ETL processing.
So the question becomes, “Can we make this easier and faster? Does every analytical problem have to be solved outside of the NoSQL database? Is there a class of analytic workloads that could be handled directly within the NoSQL database itself?” This is clearly the next step for NoSQL databases. Adding simple analytical capabilities to NoSQL will increase the number of application use cases that can be solved using NoSQL, and it will address some of the issues that were identified above. Complex analytics and machine learning will still take place in Hadoop or Spark, but simple analytics and aggregation processing can be resolved directly in NoSQL.
From a Couchbase perspective, the biggest challenge isn’t adding the analytical functionality to the database – that’s relatively easy. The challenge is doing it in a way that retains the NoSQL characteristics of scalability, performance, flexibility, and lower operational cost. By leveraging our revolutionary Multi-Dimensional Scaling (MDS) architecture, we could potentially add a new service to the Couchbase Server – let’s call it an Analytics Service – that could be independently configured, resourced, and scaled just like our data, indexing, query, and full text search services are today. This would enable customers to perform analytical queries in Couchbase without compromising the performance and throughput of the other services within the cluster.
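As a rough illustration of the "simple analytics and aggregation" workload described in this answer, the sketch below groups JSON order documents by region and sums one field, in plain Python. The documents, the field names, and the total_by_region helper are hypothetical and are not part of any Couchbase API; the point is only to show the class of query that would not necessarily need a separate Hadoop or Spark cluster.

```python
from collections import defaultdict

# Hypothetical JSON documents, as they might sit in a document store.
orders = [
    {"type": "order", "region": "EMEA", "total": 120.0},
    {"type": "order", "region": "APAC", "total": 80.0},
    {"type": "order", "region": "EMEA", "total": 30.0},
    {"type": "profile", "name": "alice"},  # not an order; skipped below
]


def total_by_region(docs):
    """Group order documents by region and sum their totals."""
    sums = defaultdict(float)
    for doc in docs:
        # Flexible schema: only documents tagged as orders take part.
        if doc.get("type") == "order":
            sums[doc["region"]] += doc["total"]
    return dict(sums)


result = total_by_region(orders)
print(result)
```

A query like this touches each document once and keeps only a small running total per group, which is why it is a plausible candidate for running next to the data rather than after an ETL step.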
insideBIGDATA: Why does running NoSQL databases in container technologies like Docker hold so much promise, and why will it quickly become widely adopted?
Ravi Mayuram: NoSQL systems are designed to easily scale out across clusters of commodity servers. NoSQL databases, like Couchbase, make it easier to scale out by supporting automated data distribution and replication, automatic data rebalancing as nodes are added and removed from the cluster, and automatic failure detection and fail-over. Through our MDS architecture, Couchbase also makes it easy to scale up as well as scale out by supporting workload isolation and resource allocation, which allows customers to separately configure and scale data, indexing, query, and full text search data services within the Couchbase Server cluster.
Container technologies are highly synergistic with NoSQL databases because they make deploying systems into private, public, and hybrid clouds, as well as on-premises platforms, much easier. Containers are a lightweight way for application developers and DevOps engineers to encapsulate just the software needed to run an application – in this case, one or more NoSQL instances or nodes. Once the container is defined, it can quickly and easily be deployed in a variety of target environments. Containers are typically used to “spin up” simple NoSQL clusters, or additional nodes, or to migrate an application between development, QA, staging, and production environments. For example, containers make it very easy for application developers to create and test multi-node NoSQL clusters on their local laptops in just a few minutes.
Docker and other container technologies simplify and accelerate deployment of NoSQL databases, which is why so many NoSQL providers have embraced container technology. Couchbase Server provides official Docker images on Docker Hub that allow users to start Couchbase as a Docker container. Multiple container orchestration frameworks are also supported, including Docker Swarm, Kubernetes, and Apache.
insideBIGDATA: How will security play a more important role in NoSQL, unlike the early years when the focus was on other features?
Ravi Mayuram: In the early years, NoSQL was really all about performance, throughput, and scalability. In the past, NoSQL was often used for a single or isolated application or service. Security was usually encapsulated within the application itself and very little was required from the NoSQL database. Most NoSQL databases started adding security primarily from an external system protection standpoint – adding capabilities like encryption over the wire and at rest, as well as simple authentication and authorization.
That said, it’s no longer the case that NoSQL is only used by a single or isolated service. Enterprises are implementing multiple NoSQL-based applications and the data is being shared across many different applications and users. NoSQL has also started to become “the system of record” for a wide variety of web, mobile, and IoT customer-facing applications. As the data being stored in NoSQL systems becomes increasingly critical, more sensitive, and more widely shared between applications, it’s clear that a more comprehensive approach to security is required.
Security in NoSQL has already started to make the shift from external system protection to operational access control. Many features that we’re used to seeing in relational databases are starting to appear in NoSQL. Features like role-based access control (RBAC) for administrators and users, record- and field-level access control and data masking, as well as advanced auditing are being introduced to varying degrees within several NoSQL products.
At Couchbase, we’ve addressed the area of security by introducing integrated LDAP identity management, rich configurable auditing, RBAC for administrators, and improved X.509 certificate management. We’ve also partnered with industry-leading technology providers like Centrify, Gemalto, and Vormetric to provide application, user, and field authentication and access control, data masking, and encryption. Couchbase is the only NoSQL vendor to provide an end-to-end secure mobile solution that combines Couchbase Server, Couchbase Lite, and Couchbase Sync Gateway.
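To illustrate the shift from external protection to operational access control, here is a minimal sketch of role-based access combined with field-level masking. The role names, the field names, and the read_document function are assumptions made for illustration; they do not reflect how Couchbase or any other product implements RBAC.

```python
# Hypothetical mapping from role to the fields that role may see in clear text.
ROLE_VISIBLE_FIELDS = {
    "admin": {"name", "email", "card_number"},
    "support": {"name", "email"},
}

MASK = "***"


def read_document(doc, role):
    """Return a copy of doc with fields masked according to the caller's role."""
    visible = ROLE_VISIBLE_FIELDS.get(role)
    if visible is None:
        # Unknown roles get no access at all (deny by default).
        raise PermissionError(f"role {role!r} may not read this document")
    # Fields outside the role's allow-list are masked rather than dropped.
    return {key: (value if key in visible else MASK) for key, value in doc.items()}


customer = {"name": "Alice", "email": "alice@example.com", "card_number": "4111-0000"}
support_view = read_document(customer, "support")
admin_view = read_document(customer, "admin")
print(support_view)
```

Masking instead of dropping a field keeps the document shape stable for applications that share the same data, which matters once several applications read the same records.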
insideBIGDATA: How do you see NoSQL databases adopting more of the features found in RDBMSs, including more real-time analytical capabilities built into NoSQL?
Ravi Mayuram: As more and more companies move to the Digital Economy, they quickly discover that their new applications require a new kind of data management platform – one that provides scalability, performance, flexibility, and lower operational cost. NoSQL databases support many of the basic RDBMS capabilities within a modern, distributed architecture that leverages advances in memory, network, and fast storage technology.
The most innovative enterprises are introducing NoSQL to replace or augment legacy databases, including Oracle, DB2, and SQL Server. Ultimately, what they want is the best of both worlds – the powerful query and enterprise features of a relational database, combined with the scalability, performance, and flexibility of a NoSQL database. To make the transition to NoSQL easier, vendors will need to provide RDBMS-like features wherever possible. For example, in the last year Couchbase has added support for N1QL (a declarative query language, based on SQL), JOINs, interactive schema browsing, ad hoc query editing and execution, comprehensive indexing options to enhance query performance, advanced security options, and improved database backup/restore tools.
As more and more NoSQL-based applications are deployed and the number of use cases for NoSQL grows, vendors will need to look beyond their own feature sets and focus on how to make it easier for enterprises, both developers and operations, to more broadly adopt NoSQL.
insideBIGDATA: What is currently in the pipeline for Couchbase?
Ravi Mayuram: From where I stand, the future’s looking very bright, indeed.
Our first priority is to listen and respond to our customers. They are the ones who are putting our products to the test and solving real-world challenges. Staying focused on the success of our customers is always our first job. Many of the features and enhancements that are in the pipeline are based on customer feedback.
As we mentioned above, we’ve introduced significant security features, and there’s more to come. Upcoming releases of Couchbase will include RBAC for applications, data auditing, on-disk encryption, and Kerberos integration. We also plan to extend our technology partnerships with industry-leading solution providers in the Security Information and Event Management (SIEM) space.
We released our query and Global Secondary Indexing services just under a year ago. Clearly, we will continue to focus on improvements to N1QL, indexing, and query processing capabilities.
We recently added a built-in Full Text Search service as a Developer Preview in Couchbase Server 4.5. We’ll be gathering feedback from some of our early customers and working on the General Release of this new service.
Couchbase Mobile, for building fast, powerful, and secure mobile and web applications, has seen tremendous growth and adoption over the past year. We’re working on significant enhancements to Couchbase Mobile and strengthening the integration of the Couchbase Sync Gateway with the Couchbase Server.