Notes on data, AI, IT and security
No marketing fog. How I think through real problems with founders and managers.
A data catalog: the discipline of knowing what you have
Why metadata management is not a technical project but an operational necessity for companies that are serious about working with data.
A data lake without governance becomes a swamp
Why corporate data lake projects often end up as a file store nobody knows how to use.
Real-time data and right-time data: the difference and why it matters
Not every task requires real-time data. Getting this choice wrong costs money and complicates architecture without benefit.
Who owns the data pipeline when the answer is nobody
In most companies data pipelines are built by whoever needed the data, owned by nobody, and relied upon by everyone. That is a systemic fragility, not a technical problem.
What Pokemon Go's outages teach about location data at scale
Pokemon Go is not a business application, but its infrastructure story in the summer of 2016 is a real lesson in what location data at scale actually costs.
PostgreSQL JSONB: when you do not need a separate NoSQL database
Before adding MongoDB or another document store to your stack, it is worth checking what PostgreSQL's JSONB type can already do - and where it genuinely runs out.
Kafka as a data backbone: what it means for a company
Apache Kafka is no longer only a tool for large tech companies. How to explain its role without technical jargon.
Event log as source of truth: the business case
Event sourcing is not just an architecture pattern. It is a way to preserve the history of changes and give analytics an honest foundation.
Data collection in field operations: from paper to a structured flow
Companies with field teams lose data at the collection stage. I look at how to move from paper forms and spreadsheets to a managed process.
Event streaming: when should a business actually look at this
Apache Kafka and streaming architecture are not only for internet giants. I look at which business problems justify this approach and where it is overkill.
Why structuring data must come before any ML model
Before the conversation reaches algorithm selection, you need to establish whether there is data worth learning from. I walk through that step in detail.
Streaming data: when operational decisions cannot wait for a batch
When a business needs streaming instead of batch processing, and what needs to be decided before adopting Kafka or similar tools.