A data warehouse without a data team
How a small company can build a manageable data warehouse without hiring a BI department or buying an expensive platform.
When people say "data warehouse", the picture that forms is usually a two-year corporate IT project and a team of ten. For a mid-size business that sounds like "not for us." But the absence of a warehouse has a concrete price: data lives in different systems, every report is assembled by hand each time, and answering "how much did we earn in that segment last year" takes a week.
I want to describe a more modest version - one that a small team can actually build and that still solves most practical problems.
What you actually need from a warehouse
Before thinking about technology, it is worth answering why you need it at all. For most mid-size companies the answer comes down to a few practical things.
You need one place where data from different sources - CRM, accounting system, website - comes together into a coherent picture. You need history to survive when systems get updated. You need an analyst to be able to answer a manager's question without two days of Excel archaeology.
None of that requires a platform scaled for a major bank. It requires discipline and a few simple decisions.
Where a working warehouse begins
The first step is an inventory of sources. List what you have: CRM, ERP, payment processor, ad platforms, accounting exports. For each source, record who owns it, how often the data changes, and whether you can connect to it automatically or only manually.
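One way to keep that inventory honest is to store it as data rather than in someone's head. The sketch below is only an illustration: the Source class, its field names, and the example entries are all hypothetical, and a shared spreadsheet with the same columns would do the job just as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str              # e.g. "CRM"
    owner: str             # person or team responsible for the system
    change_frequency: str  # how often the data changes: "hourly", "daily", ...
    automated: bool        # True if it can be connected to automatically


# Hypothetical entries, for illustration only.
INVENTORY = [
    Source("CRM", "sales ops", "hourly", True),
    Source("accounting exports", "finance", "monthly", False),
    Source("payment processor", "finance", "daily", True),
]


def manual_sources(inventory):
    """Return the names of sources that still need a manual export."""
    return [s.name for s in inventory if not s.automated]
```

Calling manual_sources(INVENTORY) lists the sources where someone still has to export a file by hand, which is usually where freshness problems start.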
The second step is choosing a destination. This can be PostgreSQL on a rented server, Google BigQuery, or any other database you can write to from multiple sources and query. Amazon Redshift, announced in late 2012 and generally available in early 2013, changed the economics of this choice considerably. At the start the goal is not to pick the perfect solution but to pick something the team can actually maintain.
The third step is setting up regular loading from at least two or three key sources. Even a script that runs once a day already constitutes a warehouse that works.
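A once-a-day load script can be very small. The sketch below is a minimal example under stated assumptions: it uses SQLite from the Python standard library as a stand-in for whatever destination you chose, and the raw_orders table, its columns, and the CSV layout (order_id, amount) are hypothetical. The one design decision worth copying is that each run replaces that day's rows for that source, so rerunning the script after a failure does not create duplicates.

```python
import csv
import io
import sqlite3
from datetime import date


def load_export(conn, source, csv_text, load_date):
    """Load one day's CSV export, replacing any earlier load for that day."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders ("
        "source TEXT, load_date TEXT, order_id TEXT, amount REAL)"
    )
    day = load_date.isoformat()
    rows = [
        (source, day, r["order_id"], float(r["amount"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # One transaction: the delete and the insert commit together,
    # or roll back together if anything fails.
    with conn:
        conn.execute(
            "DELETE FROM raw_orders WHERE source = ? AND load_date = ?",
            (source, day),
        )
        conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

In practice the CSV text would come from a file or an API export, and the script would be run by a scheduler such as cron. Keeping the source name and load date on every row also preserves history when the source system itself changes.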
Where most projects go wrong
The most common mistake is starting with a tool instead of a problem. A company buys an expensive BI platform, spends six months implementing it, and gets polished dashboards nobody uses because the data in them is no more accurate than what was in Excel before.
The second mistake is trying to do everything at once. A warehouse that covers all sources from day one takes years to build and is often never finished. A working prototype from three sources in a month is worth more than a perfect design in a year.
The third mistake is not assigning an owner. If nobody is specifically responsible for the warehouse, in three months it becomes one more source of inconsistent data.
What to do when resources are tight
If there is no budget for a platform and no dedicated person, start small and be explicit. Document your sources in one place. Agree who exports data from each and how often. Pick one common storage format. Assign one person who tracks freshness.
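Tracking freshness does not need tooling either. As a rough sketch, assuming you record when each source was last exported and how old an export is allowed to get, a few lines are enough to list what has gone stale. The function name, the source names, and the thresholds used in the example are hypothetical.

```python
from datetime import datetime, timedelta


def stale_sources(last_export, max_age, now):
    """Return sorted names of sources whose last export is older than allowed.

    last_export: source name -> datetime of the most recent export
    max_age:     source name -> timedelta the export may lag behind
    """
    return sorted(
        name
        for name, exported_at in last_export.items()
        if now - exported_at > max_age[name]
    )
```

For example, with a CRM export from this morning and a one-day limit, and an accounting export from two weeks ago and a seven-day limit, only the accounting source is reported. The person who owns freshness can run this check daily and chase the owner named in the source inventory.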
That is not a warehouse in the technical sense. But it is the discipline without which no technology will help.
Questions to check your readiness
Before investing in tools, answer five questions:
- Which decisions are currently slowed down by missing data?
- Which systems hold the data you need, and who controls them?
- Who on the team can own the technical maintenance?
- How often do you need updates - is once a day enough, or do you need near real-time?
- What will you count as success in three months?
If those questions have clear answers, you are ready to talk about technology. If not, start with the answers rather than platform demos.