m@ksim.pro
Data 4 min read

ETL as a production line: where queues, stoppages, and grey operations appear

Translating data integration into the language of manufacturing - so a manager can see bottlenecks in process logic, not in code.

ETL stands for Extract, Transform, Load: pull data from sources, reshape it into the needed format, and load it into the destination system. Engineers understand what that means. Managers usually do not, and as a result they cannot see where time and quality are actually being lost.

I find a different frame more useful. Data integration is a production line. Raw material arrives from several warehouses, passes through a series of processing stations, and becomes a finished product. Bottlenecks, stoppages, and defects exist here in exactly the same way they do on a shop floor.

Extraction is raw material delivery

The first stage is getting data from its sources. That might be a database, a file, an API from an external system, an export from an ERP, or any other company system.

The problems here look like supplier problems:

  • the source is unavailable at the needed time - the system is in maintenance, the API is down;
  • the delivery arrives late or irregularly;
  • the volume of data shifts unpredictably - almost nothing one day, several times the normal amount the next;
  • the data format changes on the source side without warning.

If extraction is unstable, everything downstream in the chain runs with interruptions.
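Two of the supplier problems above, format changes and volume swings, can be checked at the moment of delivery. Here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `check_delivery`, the column names, the typical row count, and the tolerance are hypothetical and would come from your own sources and history.

```python
# Hypothetical expectations for one source (illustrative values only).
EXPECTED_COLUMNS = {"order_id", "customer_id", "amount"}
TYPICAL_ROWS = 1000
TOLERANCE = 3.0  # accept between 1/3 and 3x the typical volume


def check_delivery(rows, expected_columns=EXPECTED_COLUMNS,
                   typical_rows=TYPICAL_ROWS, tolerance=TOLERANCE):
    """Return a list of human-readable problems found in one extracted batch.

    `rows` is a list of dicts, one dict per record.
    """
    if not rows:
        return ["empty delivery"]

    problems = []

    # Format change: columns that disappeared or appeared on the source side.
    seen = set().union(*(row.keys() for row in rows))
    missing = set(expected_columns) - seen
    extra = seen - set(expected_columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected columns: {sorted(extra)}")

    # Volume shift: far fewer or far more rows than usual.
    if len(rows) < typical_rows / tolerance or len(rows) > typical_rows * tolerance:
        problems.append(f"unusual volume: {len(rows)} rows")

    return problems
```

An empty list means the delivery looks normal. Anything else is a reason to stop the line, or at least flag the batch, before it reaches transformation.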

Transformation is the production operations

This is the most labor-intensive stage. Data needs to be unified in format, joined from multiple sources, deduplicated, cleaned of errors, and run through business rules. The upstream cause of most transformation failures is dirty source data - a problem I looked at separately in "Data quality before analytics".

The manufacturing parallels are assembly, finishing, quality control. And just as on the floor:

  • operations can depend on previous ones - you cannot assemble the product until all components have arrived;
  • one slow operation stalls the entire conveyor;
  • defects that slip through here end up in the finished product.
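The first point, that an operation cannot start until its components have arrived, is an ordering problem, and it can be written down explicitly. A small sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The step names are hypothetical and only illustrate the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each key lists the steps it must wait for.
steps = {
    "clean_orders": {"extract_orders"},
    "clean_customers": {"extract_customers"},
    "join_orders_customers": {"clean_orders", "clean_customers"},
    "apply_business_rules": {"join_orders_customers"},
}

# static_order() yields every step only after all of its prerequisites.
order = list(TopologicalSorter(steps).static_order())

for position, step in enumerate(order, start=1):
    print(position, step)
```

The exact order of independent steps may vary, but the join always comes after both cleaning steps, and the business rules always come last. If the steps depend on each other in a circle, `static_order()` raises `graphlib.CycleError`, which is itself a useful audit finding.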

A typical grey zone: transformations that "get done manually just in case." Someone tidies up data in Excel after an automated step, because something once went wrong there. That is a manual operation in the middle of an automated line. It is slow, opaque, and a potential source of errors.

Loading is finished-goods delivery

The final step - data arrives where it is needed: an analytics database, a report, a dashboard, the next system in the chain.

Problems here:

  • the destination system is overloaded and accepts data slowly;
  • the load is still running when the report is already needed - too late;
  • data loads partially because of errors in earlier stages, but the report looks complete.

The last one is especially dangerous. An empty report is immediately visible. A report with partial data that looks complete is a silent defect.
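One way to make a silent defect loud is to reconcile counts at the end of the line: what was extracted must equal what was loaded plus what was deliberately rejected. A minimal sketch, assuming the pipeline records these three counts per run. The names `LoadResult` and `reconcile` are hypothetical, not from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class LoadResult:
    extracted: int  # rows received from the sources
    loaded: int     # rows written to the destination
    rejected: int   # rows deliberately dropped, each with a recorded reason


def reconcile(result: LoadResult) -> str:
    """Classify a run so the report can show its own completeness."""
    if result.extracted == 0:
        return "EMPTY"
    unaccounted = result.extracted - result.loaded - result.rejected
    if unaccounted != 0:
        return f"MISMATCH: {unaccounted} of {result.extracted} rows unaccounted for"
    return "COMPLETE"


print(reconcile(LoadResult(extracted=1000, loaded=990, rejected=10)))
# prints: COMPLETE
print(reconcile(LoadResult(extracted=1000, loaded=940, rejected=10)))
# prints: MISMATCH: 50 of 1000 rows unaccounted for
```

If the status is shown next to the report itself, a partial load stops looking like a complete one.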

Where bottlenecks tend to hide

In practice, problems concentrate in a few places:

Manual interventions. Any step where a person does something by hand in the middle of the process is a potential stoppage and a source of instability.

Heterogeneous sources. When data arrives from systems that were never designed to work together, transformation logic becomes complex and brittle.

No buffer. In manufacturing, there is usually a buffer between operations. In ETL, queues serve that role. Without them, a failure at one stage immediately brings down the whole chain.
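To make the buffer idea concrete, here is a toy sketch in which extraction puts batches into a queue and transformation takes them out. It is an in-memory illustration only: in a real pipeline the buffer would usually be something durable, such as a staging table or a message queue. The names `extract` and `transform_next` are hypothetical.

```python
from collections import deque

buffer = deque()  # batches waiting between extraction and transformation


def extract(batch_id):
    """Extraction only has to deliver into the buffer."""
    buffer.append(batch_id)


def transform_next(transform):
    """Process the oldest waiting batch.

    The batch is removed from the buffer only after `transform` succeeds,
    so a failure leaves it in place for a later retry.
    """
    if not buffer:
        return None
    batch_id = buffer[0]
    result = transform(batch_id)  # may raise; the batch then stays queued
    buffer.popleft()
    return result
```

The design choice is the order of operations inside `transform_next`: peek first, remove only on success. A failed transformation then delays one station instead of losing the batch or stopping extraction.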

Undocumented dependencies. Data from report A feeds calculation B, which feeds system C - but nobody documented this. When A changes, C breaks, and nobody understands why.
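Even a very small written-down map answers the "what breaks if A changes" question. A sketch, assuming the dependencies are recorded as a plain dictionary; the node names mirror the example above and the function `downstream` is hypothetical.

```python
from collections import deque

# Hypothetical map: each item lists what directly consumes it.
feeds = {
    "report_A": ["calculation_B"],
    "calculation_B": ["system_C"],
    "system_C": [],
}


def downstream(node, graph):
    """Return everything that directly or indirectly depends on `node`."""
    seen = {node}
    queue = deque([node])
    affected = []
    while queue:
        current = queue.popleft()
        for consumer in graph.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                affected.append(consumer)
                queue.append(consumer)
    return affected


print(downstream("report_A", feeds))
# prints: ['calculation_B', 'system_C']
```

The code is trivial; the hard part is filling in the dictionary, which is exactly the documentation that is usually missing.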

Simple questions for an audit

To find the bottlenecks in your data integration, start with these:

  1. How many manual steps exist in the data path from source to report?
  2. How do you find out when data has not loaded, or has loaded incorrectly?
  3. What happens if one of the sources does not respond tonight?
  4. Is there data whose logic "lives" in one person's head or their Excel file?
  5. When was the last time someone on the team could explain the full data path from start to finish?

ETL rarely breaks in a dramatic way. It degrades gradually - more manual work, slower refresh cycles, quiet errors. The production-line framing helps you see this before it becomes a crisis.
