Rubicon.IO is a start-up in the threat intelligence space that real-time analytic capabilities by scouring metadata from various sources: threat feeds, social media, SIEM data, and PCAPs. It uses an HPC engine that aggregates and humanizes geospatial, TECHINT, HUMINT, and OSINT data sources. Rubicon provides the necessary context for businesses to respond to attacks appropriately in real-time – all delivered using advanced visualizations via a multi-dimensional user interface. To provide this intelligence, Rubicon needs to find and store large amounts of data and access that data in near real-time. To do this, they use an open source, distributed database called Riak.
When Rubicon first began development, they planned to use CouchDB for the original proof-of-concept. However, as they started testing CouchDB, they found that it couldn’t handle the scale of data that they needed to store and access. Its document-only model also meant that they were constantly updating documents, rather than scaling out with immutable data. Wes Brown, Founder and CTO at Rubicon, knew they needed to find something else and saw this as an opportunity to use Riak.
I have tested all of the NoSQL database offerings in the past and Riak was the only one that lived up to its promise,” said Wes. “All of them fell apart at some point, except for Riak. Riak is a fantastic key/value store that provides the scale and low-latency Rubicon needs.”
As mentioned, Rubicon uses an immutable data model, meaning once data is put in, it does not change. This prevents the expensive cycle of reading and then modifying writes. In Riak, Rubicon stores a key for every atomic observation or “fact.” These facts have sub-fields that have normalized names. This makes it very simple for Rubicon to search and index facts as needed, to return any that are related. For example, they might search for anything pertaining to a certain IP address to provide additional context to clients regarding an attack. By providing this context, it allows their clients to better understand the attack, who’s behind it, where it came from, and what the appropriate response is. All of this information is provided in real-time and they use Infiniband to provide microsecond performance.
Rubicon is currently about six months out from being in production with Riak. They are currently using the Riak 2.0 Technical Preview and will launch with Riak 2.0 GA. They are planning to launch with eight nodes and will scale up to 100 nodes to store their petabytes of data at low-latency.
Riak has been a vital toolkit that helps us solve multiple problems, rather than just addressing one block problem,” says Wes. “By using Riak, we are able to take advantage of all the benefits and performance of a reliable key/value store, while continuing to build out our own functionality on top of it. We never need to worry about Riak, which invaluable for our business.”
Sign up for the free insideBIGDATA newsletter.