Date: 2017-07-18

Back in 2015, I began a series of articles on big data, its key trends, and its likely future. These articles were published on the popular data management site DATAVERSITY. The general theme was that a major change was afoot in data analytics, initiated by large, inexpensive on-prem and cloud storage, which fostered the deluge of data we see today. Because of this tsunami of data and the continued increase in computational power, analytics and machine learning began a unique partnership. In my inaugural column I talked about how this explosion of data, along with the advent of cloud computing and progress in machine learning, would ignite innovation and reinvent business, changing how databases specifically, and big data generally, are used. Now, several years later, big data, machine learning, and business almost seem synonymous.

In my last DATAVERSITY column I compared and contrasted traditional data warehouse solutions with up-and-coming data lake platforms such as Hadoop, asking whether they would replace or augment traditional architectures. Since then, data lakes, particularly Hadoop, have taken a bit of a hit because of the complexity of designing and building such solutions, and even of hiring for them. In other words, storing data under a “schema on read” philosophy is relatively easy and fast (the good part); however, structuring purposely disjoined, disparate, and often schema-less data after the fact has turned the big data analytics dream into a chaotic nightmare. Today it is more common to hear how data lakes have turned into data swamps. Discovering and organizing what is in your data lake, structuring it manually, and ultimately analyzing it has proven darn near impossible (the bad part).

