Drowning in the Data Lake: Why a Simple Data Analytics Strategy is Better than None at All

In this special guest feature, Abdul Razack, SVP of Platforms, Big Data and Analytics at Infosys, discusses how companies are drowning in data overload and are unable to tap into big data reserves to yield maximum benefits. Abdul heads the Platforms Group at Infosys, focusing on overseeing platforms and reusable components across Services, Big Data, Automation, and the Analytics business. Prior to Infosys, he worked at SAP, as Senior Vice President for Custom Development & Co-Innovation, where he was responsible for delivering unique and differentiating customer-specific solutions. In this capacity, he delivered over 40 innovations based on SAP HANA and Cloud to customers worldwide, across 12 different industry verticals. In a career that spans over two decades, he has been involved in several engineering and consulting roles at Commerce One, Sybase, KPMG Peat Marwick, and SAP. Abdul holds a Master’s degree in Electrical Engineering from Southern Illinois University, and a Bachelor’s degree in Electronics & Communication Engineering from the University of Mysore, India.

The end goal of any big data initiative is to deliver key insights very quickly, if not in real time. While the first step of gathering data is challenging, today’s technology is more than capable of this. What comes next – extracting accurate insights in real time and gaining foresight from them – is something enterprises have yet to nail.

When put to good use, data can provide endless opportunities for innovation and growth, saving money and time, while also expediting services. Despite the opportunity to yield big insights from big data, many businesses fall into one of two camps: those unable to tap their big data reserves and those drowning in data overload.

One of the major reasons is that big data is proving difficult to manage. Terms like “data lake” have been coined to describe repositories for storing relevant data requiring analysis. However, given the rapid accumulation of structured, unstructured and semi-structured data housed in data lakes, these repositories more closely resemble a data wasteland for companies without the faintest idea of how to use this information. These companies know they should be able to make decisions in real time, yet they struggle to integrate traditional and digital methods.

We’re seeing many CIOs and IT teams use the quick fix of mass collecting data in the hope that analysis will get easier in the future. However, this approach is not only inefficient and expensive, but it can often stop a data analytics project in its tracks. Instead, businesses should start these initiatives the other way around.

First – i.e., before collecting data – businesses should determine what insights they want and need from customers, competitors and allies, prioritizing high business value. This means that instead of trawling through data looking for common links or themes, businesses should be prepared with a strategy to discern the information that is most relevant to them, to effectively maximize their time and money. By applying methodologies like Design Thinking and Agile Development, businesses can ensure that they focus on the right problems and develop highly viable and feasible solutions.

The best way to analyze the gathered information is to build a custom analytics solution, designed to deliver the insights required. While a bespoke platform has historically required significant time and financial investment, the use of open source technology is changing the analytics landscape. Open source big data platforms offer incredible capabilities to build not only insights solutions, but also forecasting and predictive analytics solutions, using mathematical and statistical models that can crunch through large volumes of data.

Open source opens up a world of access to the latest technology that would otherwise take extensive time and resources to develop. Where we used to rely on human intervention to process and compute data, open source means companies can effectively codify and automate significant pieces of their operations to make better use of their time and money while extracting better insights. The technology is perfect for enterprises that have a growing expectation of flexibility and faster results. There’s no vendor lock-in and the associated costs are much lower than proprietary solutions.

But while open source throws open immense possibilities, beware of its biggest challenge – assuring security, access control and governance of the data lake. There is also the risk that a poorly managed data lake will end up as an aggregate of data silos in one place. CIOs must caution teams about the need to train lay users to appreciate key nuances – contextual bias in data capture, the incomplete nature of datasets, ways to merge and reconcile different data sources, and so on – which is a herculean task in every way.

Untangling the web of big data may seem to be a daunting task. But by determining which insights they want to extract at the beginning of the process, and by quickly building an analytics platform with the flexibility of open source technology, businesses can now access actionable insights and foresight faster, more easily and more cost-effectively.

Thanks to open source technology, you can have your data and analyze it too.

