m@ksim.pro
Blog

Notes on data, AI, IT and security

No marketing fog. This is how I work through real problems together with founders and managers.

Data

A data catalog: the discipline of knowing what you have

Why metadata management is not a technical project but an operational necessity for companies that work with data seriously.

Data

A data lake without governance becomes a swamp

Why corporate data lake projects often end up as a file store nobody knows how to use.

Data

Real-time data and right-time data: the difference and why it matters

Not every task requires real-time data. Getting this choice wrong costs money and complicates architecture without benefit.

Data

Who owns the data pipeline when the answer is nobody

In most companies data pipelines are built by whoever needed the data, owned by nobody, and relied upon by everyone. That is a systemic fragility, not a technical problem.

Data

What Pokemon Go's outages teach about location data at scale

Pokemon Go is not a business application, but its infrastructure story in the summer of 2016 is a real lesson in what location data at scale actually costs.

Data

PostgreSQL JSONB: when you do not need a separate NoSQL database

Before adding MongoDB or another document store to your stack, it is worth checking what PostgreSQL's JSONB type can already do - and where it genuinely runs out.

Data

Kafka as a data backbone: what it means for a company

Apache Kafka is no longer only a tool for large tech companies. How to explain its role without technical jargon.

Data

Event log as source of truth: the business case

Event sourcing is not just an architecture pattern. It is a way to preserve the history of changes and give analytics an honest foundation.

Data

Data collection in field operations: from paper to a structured flow

Companies with field teams lose data at the collection stage. I look at how to move from paper forms and spreadsheets to a managed process.

Data

Event streaming: when a business should actually look at it

Apache Kafka and streaming architecture are not only for internet giants. I look at which business problems justify this approach and where it is overkill.

Data

Why structuring data must come before any ML model

Before the conversation reaches algorithm selection, you need to establish whether there is data worth learning from. I walk through that step in detail.

Data

Streaming data: when operational decisions cannot wait for a batch

When a business needs streaming instead of batch processing, and what needs to be decided before adopting Kafka or similar tools.
