The Data Lake Awakening
There is a great disturbance in the Force. I have felt it… OK, this is a bit tongue in cheek, but during the Big Data Strata Conference in New York this past September, the energy around data lakes was electric. The news coming from the event was the strongest indicator yet that the future is NOW. Though analysts have been slow to cover this paradigm shift in data management, the tremors are echoing and accelerating. The data warehouse anti-pattern, that is, building data lake solutions as the core engine for business insights, will in my opinion foster the next generation of dynamic and dominant companies.
I’m a big fan of and advocate for data lakes. In some ways, I have been promoting and selling the data lake idea for years, hoping someone would take notice. About three years ago, I began writing a series of articles on the topic, starting with Big Data Doctrine: Warehouse vs. Data Lakes, and I recently wrote a blog post on Data Lakes Reimagined. When colleagues asked what the future might hold or what I planned to do next, the conversation would quickly jump to data analytics, the data lake philosophy of ‘schema on read’, and the huge potential it would unleash: the Next Big Data Revolution.
Now, I am sure you are asking: how is this different from the big data movement we’ve been hearing so much about since 2000? The answer is simple: the first Big Data revolution was based on scaling database capacity. In other words, it meant taking the new information-age data and “shoving” it into structured and semi-structured databases for analysis. And when a database reached capacity, doing it again.
The issue with the first revolution was that not all company data could be stored, and when it could, this shoving was time-consuming and costly. In the world of data analytics, shoving refers to the process of transforming data to fit a database structure (schema on write). However, transformation often causes loss of data. Each transformation moves further from the raw truth, which can ultimately result in analytical errors. And with today’s ever more diverse data, this problem has gotten worse. As a result, the average company has been left out of the big data analytics race, and the data engineer has become one of the most sought-after and valuable assets within an organization.
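To make the difference concrete, here is a minimal sketch in Python of the two approaches. Everything in it is an assumption made for illustration: the sample events, the two-column table schema, and the function names are hypothetical and do not come from any particular product. It only shows how fitting records to a fixed schema at load time can discard fields, while keeping the raw records lets you decide which fields matter when you ask the question.

```python
import json

# Hypothetical raw events, as they might arrive from two different sources.
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.99", "currency": "USD"}',
    '{"user": "bo", "amount": "5", "coupon": "FALL10", "device": {"os": "ios"}}',
]

# Hypothetical fixed table schema: column name -> type to coerce into.
TABLE_SCHEMA = {"user": str, "amount": float}


def schema_on_write(raw_lines, schema):
    """Transform each record to fit the table before storing it.

    Fields outside the schema are dropped at load time, so they are
    not available to any later analysis.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append({col: cast(record[col]) for col, cast in schema.items()})
    return rows


def schema_on_read(raw_lines, fields):
    """Keep the raw records untouched; pick fields only at query time."""
    for line in raw_lines:
        record = json.loads(line)
        # Missing fields come back as None instead of failing the load.
        yield {field: record.get(field) for field in fields}


# "Warehouse" style: the coupon and device fields are gone after loading.
warehouse_rows = schema_on_write(RAW_EVENTS, TABLE_SCHEMA)

# "Lake" style: a question nobody planned for can still be answered.
lake_view = list(schema_on_read(RAW_EVENTS, ["user", "coupon"]))
```

In this sketch, `warehouse_rows` holds only `user` and `amount`, so a later question about coupon usage cannot be answered from it, whereas `lake_view` can still surface the `coupon` field because the raw events were never reshaped.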
Read the entire article here, The Data Lake Awakening – Chaos Sumo
via the fine folks at Chaos Sumo