Notes on data, AI, IT and security
No marketing fog. The way I think about real problems with founders and managers.
A data warehouse without a data team
How a small company can build a manageable data warehouse without hiring a BI department or buying an expensive platform.
Is MapReduce enough? Where the batch model starts hurting the business
Some business scenarios already demand a different computation tempo. Not because batch is bad, but because the latency has become too expensive.
Spark, Hadoop, or MPP: choose the workload type, not the brand
Different tasks need different computation and storage modes. Getting the platform wrong costs more than it appears at the start.
Technical debt in data pipelines: why "we will rewrite it later" almost never happens
Data pipelines age faster than they appear to, especially when nobody owns the schema. Deferring the refactor has a concrete cost.
KPIs worth fighting for: how to tell signal from dashboard decoration
A KPI only matters if it can change a decision. Everything else is decoration on a dashboard.
The event log as source of truth: why event-driven beats point-to-point integration
How shifting from bilateral API calls to a shared event journal simplifies system coupling and removes the fragility of double-writes.
Experiments and A/B thinking: what digital products can teach everyone else
Not every decision needs a year-long project. Some of them should be tested with fast, cheap experiments - even in non-digital environments.
Data scientist, analyst, engineer: it is time to tell the roles apart
Why "data person" is no longer a precise enough role, and how blurred expectations sink teams before work even begins.
ODS, data marts, DWH: why you cannot build an analytics landscape with one abbreviation
Each layer of an analytics architecture solves a different problem, operates at a different speed, and needs its own owner. Mixing the layers is the source of most analytics problems.
Streaming vs batch: when real-time is justified and when nightly batch is the honest choice
Before choosing a data processing architecture, it is worth honestly calculating the cost of latency - and finding out whether the business actually pays for it.
Sales, service, logistics: where predictive analytics earns its keep
An accurate model is only the beginning. Profit appears where there is an action scenario and a threshold that triggers it.
Processing speed as an argument: what faster data computation changes
Data engineering is finding a new balance between batch and faster processing, and that changes which problems are solvable at all.