m@ksim.pro

Columnar storage and a new pace of analytics: why a months-long data warehouse build is already strange

When analytics can be up and running in days rather than months, both business expectations and the right way to design storage change.

The traditional approach to building a corporate data warehouse goes roughly like this: several months to design the schema, then ETL processes, then alignment with the business, then rework, then launch. Throughout all of this the business either waits or keeps working in Excel.

I am not saying that approach was bad - it matched what the tools of the time could do. But when the same goals can be reached in days rather than quarters, continuing to work the old way is no longer caution. It is falling behind. One caveat before evaluating columnar options: first settle which storage architecture fits the analytics workload at all, including when a Hadoop cluster makes sense versus a conventional data warehouse.

What changed with columnar storage

Classic relational databases store data row by row: everything about a customer or transaction sits together. This works well for operational tasks - find a record, update it, insert a new one.

Analytics works differently: a typical query reads a few columns across millions of rows. With row-based storage that means reading whole rows off disk, including the columns the query does not need. With columnar storage only the relevant columns are read, and values of the same type stored together also compress well. At scale the speed difference is often an order of magnitude or more.
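To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not how any particular database lays out data: the table, its four columns, and the function names are all made up for illustration. It stores the same records once row by row and once column by column, then counts how many bytes a "sum one column" query has to read in each layout.

```python
import struct

# Hypothetical table: order_id, customer_id, amount_cents, quantity.
# Every value is an 8-byte integer, so one row takes 32 bytes.
ROW_FORMAT = "<qqqq"
ROW_SIZE = struct.calcsize(ROW_FORMAT)
N_ROWS = 1000
COLUMN_NAMES = ("order_id", "customer_id", "amount_cents", "quantity")

rows = [(i, i % 50, i * 10, 1 + i % 3) for i in range(N_ROWS)]

# Row layout: each record is stored contiguously, one after another.
row_store = b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)

# Column layout: each column is stored contiguously in its own buffer.
column_store = {
    name: struct.pack(f"<{N_ROWS}q", *(r[i] for r in rows))
    for i, name in enumerate(COLUMN_NAMES)
}


def sum_amount_row_store(buf):
    """Sum amount_cents from the row layout. Returns (total, bytes_read)."""
    total = 0
    bytes_read = 0
    for offset in range(0, len(buf), ROW_SIZE):
        # The whole row has to be read to get at one field.
        _, _, amount, _ = struct.unpack_from(ROW_FORMAT, buf, offset)
        total += amount
        bytes_read += ROW_SIZE
    return total, bytes_read


def sum_amount_column_store(cols):
    """Sum amount_cents from the column layout. Returns (total, bytes_read)."""
    buf = cols["amount_cents"]  # the other three columns are never touched
    values = struct.unpack(f"<{len(buf) // 8}q", buf)
    return sum(values), len(buf)


row_total, row_bytes = sum_amount_row_store(row_store)
col_total, col_bytes = sum_amount_column_store(column_store)
print(row_total == col_total, row_bytes, col_bytes)
```

Both layouts give the same answer, but the row layout reads all four columns to get at one. With four equally wide columns that is a 4x difference in bytes read; real analytical tables are often much wider, and real columnar engines add compression on top, which this toy model leaves out.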

The practical effect: analytical queries that used to run for hours start returning results in minutes or seconds. That is not just convenience - it changes which questions can be explored interactively at all.

How business expectations shift

When an analyst can get an answer in seconds rather than overnight, the nature of the work changes: it becomes possible to test hypotheses in real time instead of waiting for the next day. A manager who has once used a fast dashboard does not want to go back to monthly reports.

This creates pressure on IT teams and data architects. "We need analytics on this product" no longer sounds like a six-month project request. It sounds like a request for next week. And if the infrastructure allows it, that is realistic.

What this means for design

The speed of columnar storage lowers the cost of getting the schema wrong: when a table can be rebuilt and re-queried quickly, a design mistake is cheap to correct. Classical data warehouses demanded careful "years ahead" design because rebuilding was expensive. More flexible storage allows iteration.

That does not eliminate the need to think about architecture. But it shifts the balance: it is better to launch a working schema quickly and improve it as you learn, than to spend six months designing the perfect one and never ship.

The key design questions shift:

  • What queries will be most frequent - design for those first.
  • How much data actually needs to be in the "hot" layer versus archived.
  • What the load pattern looks like - peak or steady.
  • Who writes the queries: analysts who need SQL, or tools that need an API.

Where the traps are

The speed of a columnar store does not mean you can skip thinking about the data model. A few common problems:

Denormalization without limits - fact tables with hundreds of columns where everything is dumped in for "convenience." A year later nobody knows what is in which field.

No layering - raw data, data marts, and aggregates mixed together without separation. The result: nobody can say what the source of truth is.
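One way to picture the separation is a small sketch using Python's built-in sqlite3 module. SQLite is not a columnar store, and the table and view names here (raw_orders, mart_paid_orders, agg_daily_revenue) are invented for the example; the point is only the shape: each layer reads from the one below it and nothing else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Raw layer: data exactly as loaded, never edited by hand.
    CREATE TABLE raw_orders (
        order_id INTEGER,
        order_date TEXT,
        amount_cents INTEGER,
        status TEXT
    );
    INSERT INTO raw_orders VALUES
        (1, '2024-01-01', 1000, 'paid'),
        (2, '2024-01-01', 2500, 'paid'),
        (3, '2024-01-02', 700, 'cancelled'),
        (4, '2024-01-02', 1300, 'paid');

    -- Mart layer: cleaned rows, derived only from the raw layer.
    CREATE VIEW mart_paid_orders AS
        SELECT order_id, order_date, amount_cents
        FROM raw_orders
        WHERE status = 'paid';

    -- Aggregate layer: derived only from the mart, never from raw data.
    CREATE VIEW agg_daily_revenue AS
        SELECT order_date, SUM(amount_cents) AS revenue_cents
        FROM mart_paid_orders
        GROUP BY order_date;
    """
)

daily_revenue = conn.execute(
    "SELECT order_date, revenue_cents FROM agg_daily_revenue ORDER BY order_date"
).fetchall()
print(daily_revenue)
```

With this shape, a question like "why does the dashboard show this number" has a fixed path to follow: aggregate, then mart, then raw. Whether the layers are views, tables, or separate schemas is a separate decision that depends on the engine and the data volume.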

Ignoring load costs - a fast query does not offset a slow and brittle load process. If the ETL breaks once a week, read speed is irrelevant.
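One common way to make a load less brittle is to make it safe to rerun. The sketch below is an assumption on my part rather than a prescription: it uses sqlite3 as a stand-in for the warehouse, and the function load_day and the table raw_orders are hypothetical names. Each run replaces a whole day of data inside a single transaction, so a retry after a failure does not leave duplicates or half-loaded days.

```python
import sqlite3


def load_day(conn, day, rows):
    """Replace all rows for `day` atomically, so rerunning the load is safe."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM raw_orders WHERE order_date = ?", (day,))
        conn.executemany(
            "INSERT INTO raw_orders (order_id, order_date, amount_cents) "
            "VALUES (?, ?, ?)",
            [(order_id, day, amount) for order_id, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, order_date TEXT, amount_cents INTEGER)"
)

batch = [(1, 1000), (2, 2500)]
load_day(conn, "2024-01-01", batch)
load_day(conn, "2024-01-01", batch)  # simulated retry of the same day

row_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(row_count)
```

The design choice is that the unit of loading (here, one day) is also the unit of replacement. A failed run can then simply be started again, which matters more for day-to-day reliability than shaving seconds off the queries.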

A practical question for managers

If analytics in your company runs once a month, it is worth asking: is that because the data only updates monthly, or because the infrastructure cannot handle more frequent runs?

If it is the second, that is not just a technical question. It is a question about which decisions you are making a month late when you could be making them sooner.
