Notes on data, AI, IT and security
No marketing fog. How I work through real problems with founders and managers.
An ETL pipeline is a production line - monitor it accordingly
Why ETL failures are an operational incident, not a technical glitch, and how to build visibility into data flows.
Why feature engineering still matters in the deep learning era
Deep learning automates feature extraction - but it does not remove the need to think carefully about what data you feed into the model.
Who owns data quality in a company that is not a data company
Data quality problems are common. Accountability for them is rare. A look at how to assign ownership without creating a bureaucratic layer that nobody uses.
Streaming data: when you need it and when batch is enough
How to decide whether your company needs streaming data processing, or whether that is unnecessary complexity for tasks that batch loading handles perfectly well.
Data warehouse or data lake: how to make the right call
A breakdown of two architectural approaches to corporate data storage and the criteria that actually matter for mid-size companies.
Five years of big data: what survived and what did not
A retrospective look at the big data wave: which promises were realised, which turned out to be hype, and what from that period is still worth applying today.
A single source of truth for operational reporting
Why most companies lack a single authoritative number, and what it takes to create one - without a large IT project.
A data pipeline is a production system, not a script
Why companies lose trust in their analytics when they treat data pipelines as one-off tasks rather than operated systems.
Data quality: four metrics that are worth tracking in practice
Most data quality programs stall because the metrics are too abstract. Here are four concrete measurements that surface problems early and connect to business outcomes.
PostgreSQL as your main database: what changed for business
Why PostgreSQL stopped being a niche choice and what to verify before making it the foundation of a corporate architecture.
Data lake: questions to ask before you start building
Why data lake projects often turn into data swamps, and what founders and managers should ask before committing budget.
Real-time data: when the business actually needs it, and when it is over-engineered
How to tell which business tasks genuinely need stream processing, and which ones work perfectly well with regular batch updates.