Data May 25, 2023 2 min read

Operational data and analytics: why they need to be separated

Many companies try to build analytics on top of operational databases. I explain why this creates problems and how to think about the architectural separation.

When a business first starts thinking about analytics, the logic seems clear: the data is already in the system's database. Why not run reports directly from there? It is fast and requires no additional infrastructure.

After a while it turns out that reports are slow, the system lags during peak hours, and a change to the operational system breaks analytics. This is the classic signal of an architectural problem.

Why operational and analytical workloads conflict

An operational system is optimised for writing and reading individual records. A transaction, a CRM entry, an order being processed - these are small, fast operations against a small number of rows.

An analytical query is the opposite. It reads many rows, aggregates, sorts, and joins tables. This is a slow operation from the perspective of an operational database.

Running both types of workload against the same database means letting them interfere with each other. A heavy report slows down the operational system. Active operational use slows down the report. As data and users grow, this becomes increasingly painful.

What happens architecturally when you separate them

Separation means: the operational system continues working as before, and data for analytics is copied or replicated into a separate store optimised for analytical queries.

This store can be simple - a separate database updated on a regular schedule every few hours. Or more sophisticated - a full analytical data warehouse with history and transformations.

The right level of complexity depends on requirements. But the basic principle - separation - applies in most cases.

What this gives you in practice

The first benefit: the operational system stops slowing down because of analytical queries. This alone improves the user experience.

The second benefit: analytics can use a different data structure optimised for reporting. Denormalised tables, aggregates, pre-calculated metrics. This speeds up reports.

The third benefit: it becomes possible to store history. An operational system typically stores current state. An analytical store can preserve the history of changes - this opens up a different class of analysis entirely.

The fourth benefit: changes to the operational system stop immediately breaking analytics. There is a transformation layer that can be updated in a controlled way.

When this matters for you

Signs that separation is needed:

Heavy reports periodically slow down the main system.
Analysts report slow queries even when nobody else is actively using the system.
You need historical analysis, but the operational system only stores current state.
Changes to the operational data schema regularly break something in reports.

If even one of these applies to your situation, separating operational and analytical workloads is worth treating as an architectural priority.

Back to all posts

Contact

Why operational and analytical workloads conflict

What happens architecturally when you separate them

What this gives you in practice

When this matters for you

If this resonated, write to me. I reply personally.