Real-Time Big Data Analytics: Platform Requirements and Architecture Best Practice


In this special guest feature, Elaine Wang, CEO of MCoreLab, discusses the key platform requirements and architectural considerations for real-time big data analytics, and provides insights from best practices. Elaine has extensive experience as a software architect, as well as extensive experience in helping clients achieve their business visions through system innovation. MCoreLab provides ultra-high-performance server software solutions for enterprise and cloud. The MCoreCloud application server provides 100x lower latency and an order of magnitude higher throughput and scalability compared to existing application servers. The MCoreCloud in-memory data and cache server provides an order of magnitude higher throughput and lower latency than existing memcached servers. MCoreLab also provides a high-performance, low-latency network stack to deliver real network performance to real applications.

Real-time big-data analytics is a fast-emerging trend for leading-edge business systems. The business benefits of real-time big-data analysis can be significant. With real-time data analysis, systems make decisions and act on events in real time, which provides business value and advantages that cannot be achieved with traditional big data projects. For example, organizations can gain competitive advantage by acting on opportunities in real time, improve operational efficiency through real-time decisions, and stay ahead of potential errors and downtime.

Because real-time big-data analytics takes real-time streaming data and events as input, and the volume and velocity of such real-time data and events can be very high, these projects present a new set of challenges.

In this article, we will discuss the key platform requirements and architectural considerations, and provide insights from best practices.

#1: Capacity and Velocity of Front-End Servers

Traditional big-data projects often act on static data that is stored at one or more locations. Real-time big-data analytics, however, takes input from data sources in real time. That means a critical piece of infrastructure you need for a real-time big-data analytics system is high-throughput, highly scalable front-end servers. The front-end server cluster must have the ability to receive and ingest data and events delivered at high velocity, and the front-end servers often need to stream data at high velocity outward to the clients as well.

If data comes from the cloud, a high-performance application server that speaks HTTP or WebSocket will probably be required. If data comes from a local or secured network, a high-performance TCP server will do.
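To make the local-network case concrete, below is a minimal sketch of a TCP ingestion front end built on Go's standard library. The port, the newline-delimited event framing, and the ingest hook are illustrative assumptions rather than part of any particular product; a production front end would add batching, back-pressure, and connection limits on top of this skeleton.

```go
// Minimal sketch of a front-end TCP ingestion server (standard library only).
package main

import (
	"bufio"
	"log"
	"net"
)

// ingest is a placeholder for handing one event to the analytics tier,
// e.g. by pushing it onto a channel consumed by the processing pipeline.
func ingest(event []byte) {
	_ = event
}

func handleConn(c net.Conn) {
	defer c.Close()
	scanner := bufio.NewScanner(c)
	for scanner.Scan() { // assumed framing: one newline-delimited event per line
		ingest(scanner.Bytes())
	}
}

func main() {
	ln, err := net.Listen("tcp", ":9000") // illustrative port
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Println("accept:", err)
			continue
		}
		go handleConn(conn) // one goroutine per connection is enough for a sketch
	}
}
```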

While the platform requirements obviously will differ depending on the organization’s business case, capacity and velocity of the front-end servers are critically important.

#2: Real Request-Processing Capacity, Not Just Bandwidth

The data characteristics of these projects are quite different from those of traditional big data projects.

In traditional big data projects, large amounts of data are fetched from storage and analyzed in batches. The amount of data per request is large, while the number of requests per second is relatively low. Therefore, for traditional big data, the raw network bandwidth in Gbits counts, and you can get away with much lower real processing capacity in both your network and server infrastructure.

In contrast, with real-time data-source input, the data size per request is often small. (Large data tends to result from an aggregate of events over time, rather than from a single event in real time.) The number of requests, however, can be an order of magnitude higher, if not several orders of magnitude higher.

That means both your servers and your network need to be prepared to handle a much higher number of requests per second, as well as a much higher number of packets flying over the network. In this case, your raw network bandwidth (in Gbits) will be far less important than the real capacity of your servers and network to process very high numbers of requests and packets.
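A back-of-envelope calculation illustrates the gap. The sizes below are assumptions chosen only to show the ratio between batch-oriented and event-oriented traffic at the same raw bandwidth, not measurements of any real system.

```go
// Compare how many requests per second the same payload bandwidth implies
// when it arrives as large batches versus small real-time events.
// All numbers are illustrative assumptions.
package main

import "fmt"

func main() {
	const bytesPerSecond = 1_000_000_000 // ~8 Gbit/s of payload, assumed for illustration

	const batchSize = 64 * 1024 * 1024 // 64 MB per batch request (traditional big data)
	const eventSize = 512              // 512 bytes per event (real-time stream)

	fmt.Println("batch requests/s:", bytesPerSecond/batchSize) // roughly 15 per second
	fmt.Println("event requests/s:", bytesPerSecond/eventSize) // roughly 2,000,000 per second
}
```

At the same nominal bandwidth, the event-oriented workload asks the servers and the network to process about five orders of magnitude more requests per second.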

#3: Don’t Underestimate The Need for Speed

Automated clients and other real-time data sources can generate data and requests far faster than existing web or cloud-based systems built for human interactions can handle. A human might interact with a system several times an hour. An IoT machine client, however, can generate hundreds or thousands of interactions every second.

At the same time, response times need to be orders of magnitude lower. Automated clients and real-time data sources demand microsecond response times and are far less tolerant of long latencies.

Last but not least, servers need not only to keep up with the high-velocity workload, but to do so while handling a large number of concurrent clients.
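To put rough numbers on these pressures, the sketch below multiplies an assumed per-client event rate by an assumed number of concurrent clients and derives the per-request time budget a strictly serial server would have. Every figure is an illustrative assumption.

```go
// Rough sizing: aggregate request rate from automated clients, and the
// per-request time budget that rate implies. All numbers are assumptions.
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		humanRatePerHour  = 5.0     // a person interacting a few times an hour (assumed)
		machineRatePerSec = 100.0   // one automated/IoT client streaming events (assumed)
		concurrentClients = 100_000 // connected machine clients (assumed)
	)

	humanRatePerSec := humanRatePerHour / 3600
	aggregatePerSec := machineRatePerSec * concurrentClients

	fmt.Printf("one machine client ~ %.0fx one human user\n", machineRatePerSec/humanRatePerSec)
	fmt.Printf("aggregate load: %.0f requests/s\n", aggregatePerSec)

	// If a single server handled that load strictly serially, each request would
	// get only this much time, which is why the work has to be spread across many
	// cores and servers and why per-request latency must stay in the microsecond range.
	budget := time.Duration(float64(time.Second) / aggregatePerSec)
	fmt.Printf("serial per-request budget: %v\n", budget)
}
```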

Summing It Up – Best Practices:

The introduction of real-time big-data analytics requires a fresh look at your platform choices and infrastructure capacity. To build a great real-time big data analytics system, you need:

  1. At the front end: high-performance, low-latency, and highly scalable servers that can keep up with high-velocity data and events from a large number of concurrently active clients.
  2. At the analytics/business-logic tier: high-performance data servers and middleware that can keep up with real-time data, in both latency and throughput.
  3. High-performance network infrastructure – look for real packet-handling capacity and low latency, in addition to bandwidth (Gbit) specifications.

 
