
Is Open Source Analytics in Your Budget?

In this special guest feature, Thomas Hazel, Founder, CTO and Chief Scientist of CHAOSSEARCH, provides tips on how to evaluate the total cost of ownership of open source solutions. CHAOSSEARCH is a managed services provider whose log analytics platform enables massive scale at disruptive prices. Thomas is a serial entrepreneur at the forefront of communication, virtualization, and database technology, and the inventor of CHAOSSEARCH’s patent-pending IP. He holds a Bachelor of Science in Computer Science from the University of New Hampshire, and founded both student and professional chapters of the Association for Computing Machinery (ACM).

Open source has gone mainstream. And not only at small to mid-sized businesses that are attracted by its promise of “free” software. Fortune 500 businesses have also jumped on the bandwagon.

There’s good reason for this big uptick in adoption. First, it’s simple to get started with open source software. Transparency of source code makes it easy to evaluate. There are typically no licenses to purchase, no vendor interactions required. You simply download the software.

Open source code is generally high-quality and the software’s capabilities are often innovative, thanks to the community of developers who continually test and enhance it. And because that community can be extremely large and located throughout the world, this can happen around the clock, resulting in fast code review and fixes and reliable, cutting-edge software.

And did I mention cost? Although that’s not always the driving factor – the software must be solid and have the right features – it is a top consideration. IT decision-makers always have a budget they need to stay within. But herein lies the challenge, particularly for self-hosted open source use cases that rely on big data. What starts out as a low-cost, relatively easy option can quickly become a surprisingly complex and costly endeavor, especially when terabytes of monthly data are involved, as they are in applications like log analytics.

Evaluating Cost Considerations

So how do you determine whether open source fits in your big data analytics budget? You need to carefully consider all known and hidden costs in order to arrive at the likely year-over-year total cost of ownership. At a high level, these costs can be bucketed into three categories: infrastructure, customization, and ongoing operations.

1. Infrastructure

Since infrastructure varies based on the amount of data you expect to generate, estimating your data volume accurately is essential, particularly if you’re running in the cloud, where every increase in storage and compute costs money. If you over-provision, you waste money. But if you under-provision, you can lose data, miss critical insights, and impede business performance.

Here are key factors to consider when estimating your capacity needs:

  • Daily data volume. In the case of log analytics, this would include data from applications, systems, and networks.
  • When/if your organization typically experiences spikes in data volume. You need to ensure your environment can scale with ease so that influxes of data don’t become bottlenecks.
  • How long you need to retain data.
  • How much your volume will grow year over year.
  • How many additional servers you’ll need and their configurations, e.g., processor class, memory, storage. Expect to add servers as volumes grow to maintain performance.
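The factors above can be folded into a back-of-the-envelope capacity estimate. The sketch below is illustrative only; the daily volume, spike headroom, replication factor, and index overhead are assumptions you would replace with your own measurements.

```python
# Back-of-the-envelope storage estimate (illustrative numbers, not benchmarks).
daily_gb = 50            # average daily log volume in GB (assumption)
spike_factor = 2.0       # headroom for peak-day spikes (assumption)
retention_days = 90      # how long raw data must stay searchable
yearly_growth = 1.3      # 30% year-over-year volume growth (assumption)
replication = 2          # redundant copies kept (assumption)
index_overhead = 1.2     # indexing typically inflates raw size (assumption)

def storage_needed_gb(year: int) -> float:
    """Storage required in a given year (year 0 = today)."""
    grown_daily = daily_gb * (yearly_growth ** year)
    return grown_daily * spike_factor * retention_days * replication * index_overhead

for year in range(3):
    print(f"Year {year}: ~{storage_needed_gb(year):,.0f} GB")
```

Even with modest inputs, the multiplication of retention, redundancy, and growth is what turns a “small” daily volume into a substantial infrastructure bill.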

2. Customization

If you can get value from your downloaded open source software with little to no customization, then your costs will be contained. But with analytics software, there is often a considerable amount of customization required to get to a production-ready solution. In order to determine how many developers you need to dedicate to this project, factor in these customization components:

  • Configuring the solution to ingest, clean and parse data from all sources – and maintaining what could be hundreds of configurations needed to accommodate the large variety of frameworks and data formats.
  • Building a resilient data pipeline and ensuring that you don’t lose data if your system generates events faster than it can index them.
  • Handling mapping exceptions. To ensure the solution indexes documents instead of returning failure messages and dropping data that doesn’t fit, you have to keep formats consistent and continuously monitor exceptions.
  • Ensuring data consistency. Apply relevant parsing abilities in the data collection component of your solution to ensure you have correct fields for searching data and visualizing results.
  • Implementing monitoring and alerting capabilities that notify you of performance and potential security issues.
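As one small illustration of the pipeline-resilience and mapping-exception points above, the sketch below parses JSON log lines against an assumed schema and routes failures to a dead-letter list instead of dropping them. The schema and structure are hypothetical, not any specific tool’s API.

```python
# Minimal sketch of a resilient ingest step: parse each event, and route
# records that fail schema checks to a dead-letter list for reprocessing
# instead of silently dropping them.
import json

EXPECTED_FIELDS = {"timestamp", "level", "message"}  # assumed log schema

def ingest(raw_lines):
    indexed, dead_letter = [], []
    for line in raw_lines:
        try:
            event = json.loads(line)
            missing = EXPECTED_FIELDS - event.keys()
            if missing:
                raise ValueError(f"missing fields: {sorted(missing)}")
            indexed.append(event)
        except (json.JSONDecodeError, ValueError) as exc:
            # Keep the bad record and the reason it failed.
            dead_letter.append({"raw": line, "error": str(exc)})
    return indexed, dead_letter
```

In production you would multiply this by every source format you ingest, which is exactly why the configuration and maintenance burden grows with each new framework added.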

3. Ongoing Operations

As data volumes increase – which they are likely to do – more resources are consumed and new complexities and issues arise. Experts must be on hand to respond to these issues and perform the day-to-day maintenance that self-hosted open source implementations require. Keep in mind that the number of people required to handle operations will grow as your data expands. Here are some of the jobs they’ll be tasked with:

  • Maintaining your infrastructure and planning capacity increases.
  • Reindexing outdated indices to stave off potential failures and data loss.
  • Monitoring cluster health and responding to failures.
  • Handling software upgrades, including thoroughly researching what the changes are before deciding whether to implement them. To make sure you don’t lose data during upgrades, run tests in a non-production environment first.
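The cluster-health monitoring task above can start as simply as a script that inspects a status snapshot and raises alerts on a few thresholds. The field names (`disk_used_pct`, `unassigned_shards`, `status`) and thresholds here are assumptions for illustration, not a specific product’s API.

```python
# Illustrative health check: examine a cluster status snapshot and
# return alert messages when assumed thresholds are crossed.
def check_health(status: dict) -> list:
    alerts = []
    if status.get("disk_used_pct", 0) > 85:
        alerts.append(f"disk usage high: {status['disk_used_pct']}%")
    if status.get("unassigned_shards", 0) > 0:
        alerts.append(f"{status['unassigned_shards']} unassigned shards")
    if status.get("status") == "red":
        alerts.append("cluster status is red")
    return alerts
```

The hard part is not the check itself but having someone on call to act on it around the clock, which is where operational headcount enters the budget.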

It’s not unusual for the TCO of self-hosted open source analytics solutions to double every year as data volumes grow, along with the costs of hosting, customizing, scaling, and maintaining an increasingly complex infrastructure. Make sure you plan for this. Alternatively, you can look to managed services providers who leverage open source analytics software in their own solutions and do the heavy lifting for you.
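To see how that doubling can arise, note that volume growth and per-unit operating cost growth compound: if each rises roughly 40% a year (more nodes, more people, more complexity), their product is close to 2x annually. The growth rates and starting figure below are purely illustrative assumptions.

```python
# Compounding sketch: ~40% volume growth times ~40% per-unit cost growth
# roughly doubles total cost each year (1.4 * 1.4 = 1.96).
volume_growth = 1.4      # assumption
unit_cost_growth = 1.4   # assumption
tco = 100_000            # illustrative year-0 TCO in dollars

for year in range(1, 4):
    tco *= volume_growth * unit_cost_growth
    print(f"Year {year}: ${tco:,.0f}")
```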

Sign up for the free insideBIGDATA newsletter.
