A new book, “Data Lake Architecture – Designing the Data Lake and Avoiding the Garbage Dump,” by Bill Inmon, the father of the data warehouse, is a simple, high-level introduction to this popular approach to organizing data. Written for enterprise thought leaders and decision makers, the book offers a one-stop resource that explains how to build a useful data lake where data scientists and data analysts can solve business challenges and identify new business opportunities. Readers will learn how to structure data lakes as well as analog, application, and text-based data ponds to provide maximum business value. They will also come to understand the role of the raw data pond, when to use an archival data pond, and how to leverage the four key ingredients for data lake success: metadata, integration mapping, context, and meta-process.
Organizations invest incredible amounts of time and money obtaining and then storing big data in data stores called “data lakes.” But how many of these organizations can actually get the data back out in a usable form? Very few can turn the data lake into an information gold mine. Most wind up with garbage dumps. This book will put you on the right path. It is not a practitioner’s book, but rather one for someone who needs to understand the concepts and best practices. This is a good book if you want to understand how to organize Hadoop data. It doesn’t walk you through the implementation; instead, it describes how the data should be organized architecturally.
One of my favorite chapters of the book was Chapter 4, “Data Ponds,” because it sets the stage for Inmon’s unique take on how we can view data lakes as a collection of high-level structures like the “Raw Data Pond” and others. Chapters 5-8 continue with this theme by exploring the various kinds of data ponds. Chapter 9 offers a comparison of the different classes of data ponds. At only 156 pages, the book is often too brief in its discussions of important topics that affect big data decision making. I repeatedly found myself wanting more.
Bill Inmon opened our eyes to the architecture and benefits of a data warehouse, and now he takes us to the next level of data lake architecture. Inmon, the “father of the data warehouse,” has written 57 books published in nine languages. His latest adventure is building a technology known as textual disambiguation, which reads raw text in a narrative format and allows the text to be placed in a conventional database so that it can be analyzed by standard analytical technology, thereby creating unique business value for Big Data/unstructured data. Bill was named by ComputerWorld as one of the ten most influential people in the history of the computer profession.
Inmon’s new title serves as a good, simple, and concise resource to kick-start a new enterprise data lake project.
Contributed by: Daniel D. Gutierrez, Managing Editor of insideBIGDATA. He is also a practicing data scientist through his consultancy AMULET Analytics.