Large Austrian retailer MPREIS has long been using QlikView for business analytics. When the data got too large, the company turned to ParStream to provide immediate query response on billions of records for QlikView’s Direct Discovery. Now the retailer is able to view aggregated and highly detailed information in the same dashboard in real time. In this use case examination, we’ll take a look at the problem the company faced and the steps taken toward a solution.
One of the biggest retailers in Austria, with more than 200 markets and over 5,000 employees, has been using QlikView extensively over the past three years for sales and other business analytics for all of its sites. A large amount of data is continuously collected at an individual transaction level for analysis. Large numbers of queries, both simple and complex, are run regularly to inform management about all aspects of the business, from general sales numbers to detailed analysis such as which item of each product was sold in which store and at what time of the year. This information is useful for procurement negotiations (volume discounts, logistics, etc.), inventory management, and supply chain management as well as for P&L management.
Recently, the retailer began experiencing performance issues due to growing data volumes. They reached a performance threshold of 400 million to 500 million rows with their current solution, which reduced their import capability to a level where they could only load sales data aggregated on a yearly/monthly basis into memory for analysis. To compound the issue, the system has up to 300 users, with 40-50 users accessing it concurrently. The need for more detailed analysis of individual store sales transaction data on a daily (sometimes hourly) basis drove the decision to evaluate solutions that would allow them to keep the advantages of QlikView while gaining the ability to work with massive data volumes for high-performance analysis.
With the introduction of the Direct Discovery feature in the current QlikView release, ParStream and QlikView were able to combine In-Memory data analysis from QlikView with On-Demand data analysis from ParStream. The two technologies, the QlikView in-memory engine and ParStream Real-Time Database, are complementary since they perform well in different situations:
- For QlikView, the size of the base fact table matters. Response times increase as the data set grows.
- ParStream scales linearly due to the nature of its MPP architecture. Adding cores keeps response times constant or improves them, even with massive data sets.
The retailer will still load their aggregated data on a yearly/monthly basis into QlikView, and they will use their dashboards and applications in the same way as in the past. In addition, loading data on a daily (and at a later stage continuous) basis into ParStream will enable users to perform KPI and Sales Data analysis interactively on a very granular level over arbitrary time frames. Together with the retailer, ParStream tested the solution on three years of POS data. This data was loaded into a hosted ParStream Cluster with three server nodes, and each server stored one specific year's worth of data. Subsequently, a QlikView Instance running on AWS accessed the cluster.
6 billion rows of transactional, daily sales data for retail stores were generated. The data were loaded into ParStream in less than 3 hours. A dashboard solution for Direct Discovery data was built; this solution integrates with the existing In-Memory-driven dashboards. The combination allows the user to seamlessly navigate between the In-Memory- and ParStream-driven dashboards without noticing any difference. This unified dashboard approach, driven by QlikView and ParStream, improves the analytical capabilities tremendously and removes scalability limitations. For the analysis of data residing outside of QlikView, Direct Discovery formulates the corresponding queries for ParStream.
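As a rough sanity check on the load figures, the lower bound on the average ingest rate can be worked out directly. The sketch below uses only the two numbers stated in the text (6 billion rows, less than 3 hours); the constant and function names are illustrative.

```python
# Back-of-the-envelope ingest rate from the figures quoted in the text.
ROWS_LOADED = 6_000_000_000      # 6 billion rows
MAX_LOAD_SECONDS = 3 * 60 * 60   # "less than 3 hours", used as an upper bound


def min_rows_per_second(rows: int, max_seconds: int) -> float:
    """Lower bound on the average ingest rate if the load finished within max_seconds."""
    return rows / max_seconds


rate = min_rows_per_second(ROWS_LOADED, MAX_LOAD_SECONDS)
print(f"at least {rate:,.0f} rows/sec across the cluster")
```

Because the load took less than 3 hours, the real average rate was somewhat higher than this bound of roughly 556,000 rows per second for the cluster as a whole.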
The response times are similar to those of the current QlikView Dashboards, but they are now visualizing detail data stored in ParStream. Depending on the complexity and granularity level of the query, response times of up to 5 seconds were considered acceptable. The sample dashboard (see screenshot below) displayed Monthly Revenue, Revenue per Store, Revenue per Product and Daily Revenue per Store. Each change in a filter criterion issues 4 queries on the 6 billion rows of data in ParStream. Below are example response times to fully reload the dashboard after selecting certain filters:
- 3 weeks, all stores and 1 department: 1.2 sec
- Full time range, 1 store and 1 department: 0.9 sec
- 3 weeks, 5 stores and 1 department: 0.7 sec
- 4 days, 5 stores and 1 department: 0.45 sec
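To make "each filter change issues 4 queries" more concrete, here is a minimal sketch of the four aggregations behind such a dashboard. It runs against an in-memory SQLite table as a stand-in for ParStream. The `sales` table, its columns, the toy rows, and the `dashboard_queries` helper are all hypothetical, and the SQL that Direct Discovery actually generates may look different.

```python
import sqlite3

# Toy stand-in for the POS fact table; schema, names, and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, store_id INTEGER, "
    "department TEXT, product TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2013-01-05", 1, "dairy", "milk", 2.0),
        ("2013-01-05", 2, "dairy", "butter", 3.0),
        ("2013-02-10", 1, "dairy", "milk", 4.0),
        ("2013-02-10", 1, "bakery", "bread", 5.0),
    ],
)

# One GROUP BY expression per dashboard chart named in the text.
CHARTS = {
    "monthly_revenue": "substr(sale_date, 1, 7)",
    "revenue_per_store": "store_id",
    "revenue_per_product": "product",
    "daily_revenue_per_store": "sale_date, store_id",
}


def dashboard_queries(conn, date_from, date_to, stores=None, department=None):
    """Run the four aggregations for one filter selection."""
    where, params = ["sale_date BETWEEN ? AND ?"], [date_from, date_to]
    if stores:
        placeholders = ", ".join("?" for _ in stores)
        where.append(f"store_id IN ({placeholders})")
        params.extend(stores)
    if department:
        where.append("department = ?")
        params.append(department)
    where_sql = " AND ".join(where)

    results = {}
    for chart, group_by in CHARTS.items():
        sql = (
            f"SELECT {group_by}, SUM(revenue) FROM sales "
            f"WHERE {where_sql} GROUP BY {group_by} ORDER BY {group_by}"
        )
        results[chart] = conn.execute(sql, params).fetchall()
    return results


result = dashboard_queries(
    conn, "2013-01-01", "2013-12-31", stores=[1], department="dairy"
)
print(result["monthly_revenue"])
```

Every chart shares the same WHERE clause and differs only in its grouping, so one filter change fans out into four aggregate scans. That is the workload the response times above measure.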