Personal data: map the flows before adding controls
Why protecting personal data starts not with encryption or policies, but with understanding what data the company actually collects and why.
When a company starts a conversation about personal data protection, the first impulse is almost always the same: get better antivirus, tighten access controls, write a policy. None of those are bad ideas. But they look like installing better locks on a house where no one has agreed on which rooms exist.
I have seen companies spend real resources on technical controls without being able to answer a simple question: where does our customer data actually live? When you ask, you usually get several answers that do not agree with each other, and at least one location nobody named upfront.
What a data flow map is
A data flow map is not a pretty diagram for a deck. It is a working document that answers four questions:
- What personal data does the company collect, and at which points?
- Where is it stored - in which systems, on which servers, in whose spreadsheets?
- Who has access to it, and on what basis?
- Where does it go - inside the company, to contractors, to partners?
Without answers to those questions, protection becomes accidental. You protect what you can see and miss what is sitting in a forgotten export on someone's desktop, or in a spreadsheet an analyst "quickly put together three years ago".
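To make the four questions concrete, here is a minimal sketch of how one row of such a map could be recorded. The class name, field names, and sample rows are illustrative assumptions, not a standard schema; a shared spreadsheet with the same columns would serve just as well.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    """One row of the map: one category of personal data in one place.

    All field names are illustrative assumptions, not a standard schema.
    """
    data_category: str          # what is collected
    collection_point: str       # where it enters the company
    storage_location: str       # system, server, or spreadsheet
    access: dict = field(default_factory=dict)      # who -> on what basis
    transfers: list = field(default_factory=list)   # where it goes next


# Hypothetical sample rows, invented for illustration only.
flow_map = [
    FlowEntry("customer contacts", "website form", "CRM",
              access={"sales": "handles customer requests"},
              transfers=["marketing mailing list"]),
    FlowEntry("customer contacts", "export from CRM", "file server",
              access={"analyst": "one-off analysis"}),
    FlowEntry("employee records", "hiring paperwork", "HR system",
              access={"HR": "employment administration"}),
]


def storage_locations(entries, data_category):
    """Return every distinct place where one category of data is stored."""
    return sorted({e.storage_location for e in entries
                   if e.data_category == data_category})


print(storage_locations(flow_map, "customer contacts"))
```

Even in this toy form, the question "where does our customer data actually live?" gets a single answer that everyone can check, including the export on the file server.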
Why there is always more data than expected
In most mid-size companies, personal data is collected at several independent points that have never been compared with one another. The CRM holds customer contacts. The website captures form submissions. HR keeps employee records. The service department runs its own tracker. Marketing works from a mailing list.
Each of these grew separately. Someone exported data for analysis and the export stayed on the file server. A contractor asked for a call list and it was sent by email. Everything was done out of practical necessity and without bad intent. But the result is scattered copies of data, some of which nobody remembers.
A flow map makes this picture visible. That is exactly why building the map, not deploying technical controls, is the first step. The logic is the same as in "Data quality before analytics": you cannot manage what you have not inventoried.
Why the data was collected in the first place
The second question worth asking in parallel is: why? Not in the legal sense, but in the practical sense - is this data actually used?
If personal data is collected "just in case" or "because we always have", that is a liability, not an asset. Data that is stored but not used carries responsibility without benefit. It can be leaked, it must be protected, it can create legal exposure - while generating no value.
A useful exercise is to ask, for each category of data: if we stopped collecting this tomorrow, what specifically would break? If the answer is "nothing concrete", it is worth discussing whether the data is needed at all.
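The exercise can be run as a very small check once each data category has its known uses written down. The sketch below assumes a hypothetical mapping from category to the list of concrete things that would break without it; the categories and uses are invented for illustration.

```python
# Hypothetical answers to "if we stopped collecting this tomorrow,
# what specifically would break?" An empty list means "nothing concrete".
uses_by_category = {
    "customer email": ["order confirmations", "support replies"],
    "date of birth": [],
    "home address": ["delivery"],
}


def review_candidates(uses):
    """Return categories for which nobody could name a concrete use."""
    return sorted(category for category, named in uses.items() if not named)


print(review_candidates(uses_by_category))
```

A category on this list is not automatically deleted; it is simply the agenda for the discussion about whether the data is needed at all.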
How this connects to real protection
Once the flow map exists, the conversation about controls becomes specific. You can see where data is most exposed - not in theory, but in concrete systems and processes. You can see where access is excessive. You can see which transfers happen without a formal basis.
Only then does it make sense to decide on technical controls: whose access should be restricted, what should be encrypted, what needs to go through legal review.
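As a rough sketch of how the map turns into a concrete worklist, the snippet below flags access and transfers that have no recorded basis. The record types, field names, and sample data are assumptions made for illustration; what counts as an acceptable basis is a question for the business and for legal review, not for the code.

```python
from typing import NamedTuple, Optional


class Access(NamedTuple):
    system: str
    who: str
    basis: Optional[str]          # why this person or team needs access


class Transfer(NamedTuple):
    system: str
    recipient: str
    formal_basis: Optional[str]   # e.g. a contract or documented agreement


# Hypothetical rows taken from a flow map.
accesses = [
    Access("CRM", "sales", "handles customer requests"),
    Access("CRM", "former intern account", None),
    Access("file server", "everyone in the office", ""),
]
transfers = [
    Transfer("CRM", "mailing contractor", "service contract"),
    Transfer("CRM", "call-list contractor (by email)", None),
]


def access_without_basis(rows):
    """Return access records whose basis is missing or blank."""
    return [r for r in rows if not (r.basis or "").strip()]


def transfers_without_basis(rows):
    """Return transfers that have no formal basis recorded."""
    return [r for r in rows if not (r.formal_basis or "").strip()]


for row in access_without_basis(accesses):
    print(f"Review access: {row.who} -> {row.system}")
for row in transfers_without_basis(transfers):
    print(f"Review transfer: {row.system} -> {row.recipient}")
```

The output is a list of specific items to review, which is the difference between "tighten access controls" as a slogan and as a task.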
A practical start
If no such map exists, you do not need a large project to begin. Gather the heads of the key departments and work through a short list of questions together:
- What data about people does your team collect or use?
- Where is it stored right now?
- Who else has access to it?
- Where does it go outside your team?
- Are there exports, copies, or archives that you know of?
The first session almost always produces unexpected findings. That is the real starting point for personal data work - not a policy, not encryption, but understanding what actually exists.