Smart Business: Treat Your Data Like a Stream of Events


In this special guest feature, Bobby Johnson, Co-founder and CTO of Interana, examines the value of “event data” to the enterprise and explains why problems become easier to solve if you treat all of your data as a stream of events. Previously, Bobby was an early Facebook executive responsible for the infrastructure engineering team that scaled Facebook during its heaviest growth years, from 2006 to 2012, taking the social-media giant from a few million college users to over a billion users globally. During his tenure at Facebook, Bobby’s team solved difficult scaling and infrastructure challenges and built technology such as Hive and Cassandra. Bobby personally wrote Scribe.

We have all called customer service to complain when something is not working, and we have all been so excited by a new product or service that we have written a positive review. In most of these interactions, customers show either the best or the worst of themselves, which isn’t a good indication of what they actually think most of the time. It is the interactions in between these communications that are really telling. For example, a customer who is slowly losing interest in your product or service is a problem you’d like to know about, but that problem might never show up as an angry phone call.

The most dependable way your customers talk to you is through their day-to-day actions, which always speak louder than words. However, these daily actions are still a mystery to most companies.

If it is these thousands of little interactions done every day that are the most valuable, why aren’t companies paying more attention to this data?

You may think it is because there is so much of this data that no one can analyze it. However, the challenge is not the amount of data, big or small, but the kind of data: event data. Event data is a different kind of data, and most tools have trouble capturing it and analyzing it in real time. Only the most innovative companies are analyzing huge volumes of event data to truly understand their customers’ behavior. Let’s back up.

What is event data?

At its core, event data is a distinct but easy-to-spot category of data. An event is a description of something that happened at a moment in time. Event data refers to any data point that has a timestamp, an entity and the attributes of an action. Simply put, as we shop, search, socialize, play games, research, drive, and interact with content and with our devices, we are creating an endless stream of event data.
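
To make the definition concrete, here is a minimal Python sketch of a single event record. The field names (timestamp, entity, action, attributes) and the sample values are illustrative assumptions, not a schema from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class Event:
    """One thing that happened at one moment in time (hypothetical schema)."""
    timestamp: datetime          # when it happened
    entity: str                  # who or what did it (user, device, account)
    action: str                  # what happened
    attributes: Dict[str, Any] = field(default_factory=dict)  # details of the action


# A made-up example: one user buying one item.
purchase = Event(
    timestamp=datetime(2016, 3, 1, 14, 30, tzinfo=timezone.utc),
    entity="user_42",
    action="purchase",
    attributes={"item": "headphones", "price_usd": 59.99},
)
```

Every click, search or purchase can be described with the same shape, which is what lets very different activity be treated as one stream.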

By analyzing event data, companies are able to find out how well a product is working, what features customers like the best, what marketing campaigns customers respond to the most, and even if something is broken within an app.

Event data is the fastest-growing category of data that most people have never heard of. Although the term “event data” is gaining more attention, event data itself has been around for ages. Businesses have been generating (and collecting) event data for more than 50 years, but the ability to gain actionable insights from the data has been a hurdle.

How has event data been used historically?

Fifty years ago, businesses couldn’t do much with this data. As we’ve all seen in PBS documentaries and news footage of the era, computers were much slower and more expensive in the mid-twentieth century, which limited what companies could do with data. For example, Visa or MasterCard might have had records for 100 million credit card transactions in a given year, but they wouldn’t have been able to process that raw data in real time to provide immediate insights into user behavior. The process was instead much slower. These businesses would summarize their transaction data, and then try to draw insights from those summaries to run their businesses.

Most companies are stuck in the past as they summarize data and use it for ad hoc queries and reports. While some may use the raw data, these outdated processes typically require significant amounts of time and skill. For example, an organization may keep summaries in an enterprise data warehouse and raw data on Hadoop. If they can use the summary data, they can answer questions much more quickly. However, if they can’t access the summary data, they have to choose between using the raw data (which may require writing code and waiting for long queries to finish) or using data that isn’t quite right (and risk not getting the answer to the right question).
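
The trade-off between summaries and raw data can be sketched in a few lines of Python. The data and function names below are hypothetical. The point is only that a summary answers the question it was built for, while a new question forces a return to the raw events.

```python
from collections import defaultdict

# Hypothetical raw purchase events: (day, user, amount_usd)
raw = [
    ("2016-03-01", "alice", 20.0),
    ("2016-03-01", "bob", 10.0),
    ("2016-03-02", "alice", 5.0),
]


def summarize_by_day(events):
    """Pre-aggregate revenue per day, as a warehouse summary table might."""
    totals = defaultdict(float)
    for day, _user, amount in events:
        totals[day] += amount
    return dict(totals)


def spend_by_user(events):
    """A question decided later: total spend per user."""
    totals = defaultdict(float)
    for _day, user, amount in events:
        totals[user] += amount
    return dict(totals)


daily_summary = summarize_by_day(raw)  # the user dimension is gone from this summary
per_user = spend_by_user(raw)          # only answerable from the raw events
```

Once only daily_summary is kept, there is no way to recover per_user from it, which is the situation described above.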

Additionally, making data accessible is slow, expensive and hard. Most data is squirreled away all over an organization and legacy data tools are slow, hard to use and not suited to large data sizes.

Simply put, businesses can do better.

What can businesses do better?

Understanding day-to-day behavior is critical to retention and conversion for a company, which is why it needs to be everyone’s job to listen to the customer and listen to corresponding customer data.

Over the past 50 years, computers have become dramatically faster, cheaper and more powerful. If you organize your data into a stream of events, you can answer questions quickly without having to summarize that data. Additionally, you can look for correlation between events and patterns between daily interactions.

The event model

If you treat all of your data as a stream of events, problems become easier to solve.

It’s critical that every event is stored in chronological order, distributed across a set of machines. If you organize event data this way, both writing and reading data become much faster. Writing becomes faster because you simply append an event to storage, and don’t have to navigate and update indexes (or worry about transactional systems). Reading is faster because the data is organized in chronological order. Most metrics can be calculated in a single pass over the data: counts, sums, order statistics, moving averages, ratios, correlations and more. You can even answer questions about sequences of events (paths, funnels, etc.) on demand. In addition, this method enables cluster or regression analysis with a small number of passes.
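
As a rough illustration of the single-pass idea, the Python sketch below scans a small, chronologically ordered list of events once and computes counts, a sum, a moving average and a simple funnel. The events, the funnel steps and the function name are made up for the example. It runs on one machine and is not how any particular product implements this.

```python
from collections import defaultdict, deque

# Hypothetical events in chronological order: (timestamp, entity, action, amount_usd)
events = [
    (1, "alice", "view", 0.0),
    (2, "bob", "view", 0.0),
    (3, "alice", "add_to_cart", 0.0),
    (4, "alice", "purchase", 20.0),
    (5, "bob", "add_to_cart", 0.0),
    (6, "carol", "view", 0.0),
    (7, "bob", "purchase", 10.0),
]

# Assumed funnel: steps an entity must complete in this order.
FUNNEL = ("view", "add_to_cart", "purchase")


def single_pass(events, window=3):
    """Compute several metrics in one chronological scan of the events."""
    counts = defaultdict(int)      # number of events per action
    revenue = 0.0                  # running sum of purchase amounts
    recent = deque(maxlen=window)  # last `window` purchase amounts
    moving_avg = []                # moving average after each purchase
    stage = defaultdict(int)       # entity -> funnel steps completed so far

    for _ts, entity, action, amount in events:
        counts[action] += 1
        if action == "purchase":
            revenue += amount
            recent.append(amount)
            moving_avg.append(sum(recent) / len(recent))
        # Advance the entity only when it performs its next expected step.
        if stage[entity] < len(FUNNEL) and action == FUNNEL[stage[entity]]:
            stage[entity] += 1

    # Number of entities that reached at least step 1, 2, 3 of the funnel.
    reached = [sum(1 for s in stage.values() if s >= i)
               for i in range(1, len(FUNNEL) + 1)]
    return dict(counts), revenue, moving_avg, reached


counts, revenue, moving_avg, reached = single_pass(events)
```

Because the events arrive in time order, the funnel logic needs no sorting or joins. Each entity just carries a small piece of state forward through the scan.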

Conceptually, it’s useful to draw a parallel between what event data provides and MapReduce: both offer a way to think about problems that makes computation more efficient.

Given the choice between knowing or not knowing what’s going on, it is better business to know.



Speak Your Mind



  1. Great article and exactly why we developed a real time, event streaming solution (the Data Stream Manager), as we saw many clients struggling with this problem across the myriad of data sources and technologies they had within their data ecosystems.