In this special guest feature, Michael Bushong from Plexxi writes about how new requirements in big data have spawned the need for a next generation of “data fabrics.” Michael Bushong is Vice President of Marketing at Plexxi.
In the data center, we are seeing applications move to a scale-out model: lots and lots of nodes with distributed operation. This is a direct result of large increases in the amount of data required for certain types of calculations and analysis. Data is broken down into smaller and smaller chunks, which drives compute and storage to scale out as well. As nodes across the data center are scaled out to meet the influx of data, the need for communication between them increases. These new requirements call for a next generation of Big Data fabrics. To be effective, these Big Data fabrics must address six areas: availability, consistency, congestion, partitioning, scalability and awareness.
The most basic requirement for any application is that it works. Without an interconnect to allow the coordination of distributed workloads, a Big Data application simply cannot function. As a result, the top need for any Big Data fabric is to keep the network up and running. There is an array of factors that contribute to network availability: device uptime, maintenance domains, human error. But as it relates to a Big Data fabric, the two most important characteristics are how the network behaves when there are failures, and how quickly those failures are mitigated.
This means that Big Data fabrics now require rich multi-pathing capabilities with millisecond failover when issues occur. It is critical that failures do not create resource islands that are largely inaccessible, as that can render entire data sets and operations useless.
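To make the "resource island" concern concrete, here is a minimal sketch in Python. It assumes a toy topology of three racks joined by undirected links; the rack names and the `islands_after_failure` helper are hypothetical and do not describe any particular fabric or product. It checks which nodes become unreachable when a single link fails, with and without a redundant path.

```python
from collections import deque

def reachable(links, start):
    """Return the set of nodes reachable from start over undirected links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def islands_after_failure(nodes, links, failed_link):
    """Return the nodes cut off from the first node once failed_link is removed."""
    remaining = [link for link in links if set(link) != set(failed_link)]
    return set(nodes) - reachable(remaining, nodes[0])

# Hypothetical topology: three racks, first in a chain, then with one extra link.
nodes = ["rack1", "rack2", "rack3"]
single_path = [("rack1", "rack2"), ("rack2", "rack3")]
multi_path = single_path + [("rack1", "rack3")]

stranded_single = islands_after_failure(nodes, single_path, ("rack2", "rack3"))
stranded_multi = islands_after_failure(nodes, multi_path, ("rack2", "rack3"))
```

In the chain topology the failure strands one rack; with the extra link, every rack stays reachable. The sketch says nothing about how fast failover happens, only whether an alternate path exists.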
Not all Big Data applications are sensitive to network latency; often the major factor behind delays is on the compute side. However, these applications are generally synchronous, meaning that consistency of experience is paramount: when one stage of a job waits on data from many nodes, a single slow transfer can hold up the rest. As a result, Big Data fabrics need to provide uniform performance regardless of where nodes sit in the network or when they communicate.
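A small Python sketch can illustrate why consistency matters more than average latency for synchronous work. It assumes a simplified model in which each step of a job completes only when its slowest transfer completes; the function names and the millisecond figures are made up for illustration.

```python
def step_time(transfer_times_ms):
    """A synchronous step finishes only when its slowest transfer finishes."""
    return max(transfer_times_ms)

def job_time(steps):
    """Total time for a job whose steps run one after another."""
    return sum(step_time(step) for step in steps)

# Both fabrics average 10 ms per transfer, but one is far less consistent.
uniform = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
jittery = [[4, 4, 4, 28], [4, 28, 4, 4], [28, 4, 4, 4]]

uniform_total = job_time(uniform)   # 30 ms
jittery_total = job_time(jittery)   # 84 ms
```

Under this model the jittery fabric takes almost three times as long despite the identical average, because every step is gated by its worst transfer.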
Big Data workloads inevitably produce periods of heavy traffic that weigh on the fabric. The issue with these periods of congestion is not just that traffic gets delayed or dropped, but that delays and drops can trigger retransmissions, further exacerbating the problem. Big Data fabrics need to avoid congestion whenever possible, and provide a means to protect critical traffic when congestion is unavoidable.
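The way retransmissions compound congestion can be sketched with a deliberately simple model in Python. It assumes each transmission attempt is lost independently with a fixed probability and is retried until it succeeds, so the expected number of attempts per packet is 1 / (1 - loss rate). Real transports back off and adapt, so treat this as an illustration of the trend rather than a description of any protocol; the function names are hypothetical.

```python
def expected_transmissions(loss_rate):
    """Average attempts per delivered packet under independent loss with retry."""
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    return 1.0 / (1.0 - loss_rate)

def offered_load(useful_load_gbps, loss_rate):
    """Traffic actually placed on the fabric to deliver the useful load."""
    return useful_load_gbps * expected_transmissions(loss_rate)

# Hypothetical figures: 8 Gbps of useful traffic at increasing loss rates.
loads = {rate: offered_load(8, rate) for rate in (0.0, 0.2, 0.5)}
```

As the loss rate climbs, the fabric has to carry more traffic to deliver the same useful data, which is the feedback loop the paragraph above describes.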
Partitioning the network allows application traffic to be separated from residual traffic, protecting both the general transport and the application. Furthermore, Big Data fabrics often require isolation for compliance reasons. In both cases, physical or logical network partitioning is a key requirement.
The ability of a Big Data fabric to scale is paramount, and two distinct aspects of scaling matter. First, the fabric needs to be able to scale to support potentially thousands of data and storage nodes. Second, and perhaps more importantly, the path to scalability needs to be simple. Most Big Data deployments begin small and grow as needs increase. Big Data fabrics need to handle growth without requiring massive physical or control re-architecture efforts.
Not all Big Data applications have the same requirements. For example, some might be more bandwidth heavy, while others might be more sensitive to latency. Whatever the requirement, a Big Data fabric must have the ability to distinguish between applications and handle workloads differently.
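As a rough illustration of application awareness, the Python sketch below maps a workload's profile to a treatment. Everything in it is hypothetical: the `Workload` fields, the workload names, and the treatment labels are placeholders, not features of any real fabric or controller.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Workload:
    name: str
    bandwidth_heavy: bool
    latency_sensitive: bool

def choose_treatment(workload):
    """Pick a hypothetical forwarding treatment from the workload's profile."""
    if workload.latency_sensitive:
        return "shortest path, priority queue"
    if workload.bandwidth_heavy:
        return "high-capacity path, bulk queue"
    return "default path, best effort"

# Made-up workloads with different needs.
batch_copy = Workload("batch-copy", bandwidth_heavy=True, latency_sensitive=False)
query = Workload("interactive-query", bandwidth_heavy=False, latency_sensitive=True)
logs = Workload("log-shipping", bandwidth_heavy=False, latency_sensitive=False)

treatments = {w.name: choose_treatment(w) for w in (batch_copy, query, logs)}
```

The point is only that the fabric needs some signal about what each application cares about before it can treat their traffic differently.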
While the rise of Big Data has presented us with unprecedented opportunities, it has also unearthed its fair share of challenges. As a result, we need a Big Data fabric with the qualities necessary to meet these new requirements, specifically the ability to support increased communication between the growing number of nodes in the data center. Once we can embrace this type of innovation in the data center, we can truly embrace everything Big Data has to offer.