Data | June 21, 2018 | 3 min read

Five years of big data: what survived and what did not

A retrospective look at the big data wave: which promises were realised, which turned out to be hype, and what from that period is still worth applying today.

Around 2012 and 2013, "big data" became the dominant technology narrative in business. Hadoop, NoSQL, data lakes, predictive analytics - all of it promised a revolution in how companies make decisions. Investment flowed, projects launched, and consultants were kept busy.

Now, in mid-2018, is a good moment to look back without nostalgia or cynicism and ask a simple question: what from all of this actually works?

What turned out to be hype

First - the idea that companies should collect "all data" for future use. This produced many expensive data lakes that turned into swamps: enormous stores of poorly structured data that nobody uses. Storage is cheap, but finding anything useful in that chaos is not.

Second - the expectation that Hadoop would become the universal analytics platform. MapReduce turned out to be too cumbersome for interactive queries. Most companies that deployed Hadoop clusters at the peak of the hype either added Spark on top or gradually migrated to other solutions.

Third - the belief that data volume by itself creates value. More data became available, but the quality of decisions did not automatically improve. Without well-formed questions and data discipline, more data only produces more confusion.

What actually works

First - distributed computing for large volumes. Scaling data processing horizontally, by spreading the work across many machines, is real and it works. The tools have also become more mature and manageable: Spark instead of MapReduce, managed cloud services instead of self-hosted clusters.

Second - stream processing. Processing events in real or near-real time has become standard infrastructure for companies with high transaction frequency, such as banks, e-commerce and telecoms. This is not hype; it is working infrastructure.

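Third - columnar storage for analytics. Storing data by column rather than by row means an analytical query reads only the columns it actually needs, and values of the same type compress well. This has completely changed the economics of analytical queries: queries that used to take hours in a relational database can now run in seconds. It is a concrete improvement that is genuinely used.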
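To make the row-versus-column difference tangible, here is a toy sketch, not a real storage engine. It sums one column of a hypothetical orders table in two layouts and counts how many stored values each layout has to touch, as a rough stand-in for disk reads. The table, its column names and the "values read" counter are all invented for illustration; real columnar engines add compression and vectorised execution on top of this basic effect.

```python
# Hypothetical orders table; column names and values are invented for illustration.
ROWS = [
    {
        "order_id": i,
        "customer": f"c{i % 50}",
        "region": "EU" if i % 2 else "US",
        "amount": float(i % 100),
    }
    for i in range(10_000)
]


def row_store_sum(rows, column):
    """Row layout: each record is read whole, so every field is touched."""
    total, values_read = 0.0, 0
    for record in rows:
        values_read += len(record)  # the entire row is loaded to reach one field
        total += record[column]
    return total, values_read


def to_columns(rows):
    """Pivot the same records into one list per column (columnar layout)."""
    return {name: [record[name] for record in rows] for name in rows[0]}


def column_store_sum(columns, column):
    """Columnar layout: only the requested column is scanned."""
    values = columns[column]
    return sum(values), len(values)


row_total, row_read = row_store_sum(ROWS, "amount")
col_total, col_read = column_store_sum(to_columns(ROWS), "amount")
print(row_total, row_read)  # 495000.0 40000
print(col_total, col_read)  # 495000.0 10000
```

Both layouts return the same answer, but the row layout touches four times as many values, simply because the toy table has four columns. Real analytical tables often have dozens of columns and compress far better when stored by column, which is a large part of why the gap in practice is so much wider.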

Fourth - the understanding that data requires engineering. One of the most valuable outcomes of the big data wave is the professionalisation of data work. Data engineer and data analyst stopped being exotic roles and became standard positions in technology teams.

What to take from this period

For executives taking stock of this experience, a few practical conclusions:

Data should be collected for specific purposes, not just in case. This reduces cost and raises quality.

The tool must match the task. Hadoop makes sense when data genuinely runs to petabytes and batch processing is enough. For most companies, modern cloud analytics services are a more sensible choice.

Investment in data quality returns more than investment in scale. Companies that spent time on normalisation, cataloguing and data governance got significantly more out of their data than those that simply increased storage volumes.

A data lake works if it has an owner. A technical solution without a governance process is a future swamp.

The big data wave is neither a failure nor a success. It is a normal technology maturation cycle: inflated expectations, disappointment, and then real application for specific tasks. We are now at the third stage.
