Turning to Open Source Apache Cassandra Gave Our Data a Highly-Available Home


As a growing cross-network advertising platform, we continue to be drawn to solutions that free our internal resources from infrastructure management. That strategy has been critical to our success, but making it work has required key infrastructure changes. Our challenge from day one has been a balancing act: the strength of our database capabilities is absolutely essential to our product, yet devoting all possible resources to product development is what gives us the competitive differentiators we need to be successful.

Looking across our industry, this is by no means an uncommon position for an emerging business to find itself in, and solutions exist precisely to address this predicament. In our case, however, we faced a twist: quicker-than-anticipated growth propelled us beyond the scope of the startup-oriented solution we were relying on.

As background to why data is so important for us, we provide digital ad automation and optimization for advertisers. This includes insights gleaned from ongoing performance trends and analysis of large amounts of data from five major advertising networks. In a big way, our database needs are our clients’ needs – if our database experiences downtime or even high latency, our product cannot process the data needed to optimize advertisers’ campaigns, and is effectively down. Also, because our product is so data intensive, a single new customer can quickly amass data volumes well into the terabytes, meaning that we needed a database able to match that demand for scalability. Add to this the fact that we were adding new advertisers at a high clip during our early growth stage, and our concerns over high-availability and scalability became mission-critical requirements.

We selected the Apache Cassandra database early in our development cycle based on three key criteria: performance, ability to scale, and high availability. In the initial stages of development, we were able to rely on the commercial version provided through the DataStax Startup Program. The program offers startups free use of DataStax Enterprise, a commercial variant of Apache Cassandra, without licensing fees as long as they meet the company’s terms and criteria for what size of business counts as a startup. In those early days we used Instaclustr to manage our clusters, which ran this proprietary distribution of Cassandra.

However, as success scaled our production deployment to nearly one hundred nodes, we were heading toward the threshold where we would no longer be eligible for the program as a “startup” and would need to pay expensive licensing fees to DataStax to continue. Those fees were cost-prohibitive and not viable for us in the long term, and for our workload the DataStax Enterprise version provided no critical advantages over the open source version, so we decided to move to open source Apache Cassandra. In addition, the vendor lock-in associated with a proprietary solution raised real concerns about maintaining a cost-effective and capable solution in the longer term.

We took the managed services provider route, and Instaclustr was able to transition us to the open source version of Apache Cassandra with zero downtime. We now have a total of three clusters and 80+ nodes in our open source Apache Cassandra deployment, and it fits our needs well. With Cassandra, our platform maintains the low-latency access to data and consistent throughput it requires to provide value. We’ve also found that Cassandra leaves our environment with no single point of failure, thanks to the open source database’s high-availability architecture. Finally, scalability needs can be addressed by simply adding more nodes as demand arises, and Instaclustr’s new dynamic scaling capability helps us meet our rapid growth demands with ease.
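To make the “no single point of failure” point concrete, here is a minimal sketch of the arithmetic behind Cassandra’s replication. It assumes requests are made at the QUORUM consistency level, which needs a majority of replicas to respond. The replication factors shown are illustrative assumptions only; this article does not state the settings used in our clusters.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write (a majority)."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


# Illustrative replication factors (assumed, not taken from our deployment).
for rf in (1, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerated replica failures={tolerated_failures(rf)}")
```

With a replication factor of 3, for example, any one replica of a given piece of data can be unavailable and quorum reads and writes still go through, which is why losing a single node does not take the data offline.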

Perhaps best of all, with open source Apache Cassandra in place and managed on our behalf, we’re able to stay focused on our product and our customers.

About the Author

Jason Wu is the Chief Technology Officer at AdStage, a cross-network PPC automation and reporting solution for advertisers.






Speak Your Mind



  1. You really need to check your facts. Last I checked, DataStax gave a huge advantage over open source C*, and DataStax has no vendor lock-in requirements.