AI April 8, 2019 3 min read

The real cost of an NLP pipeline before you are sold by the demo

What actually requires ongoing support in a production NLP system - from data labelling to quality control in live operation.

An NLP demo looks convincing. The model reads text, finds entities, classifies, answers questions. It feels like the main work - the algorithm - is already done, and what remains is to connect it to your data and turn it on.

That is exactly the moment when most cost estimates diverge from reality by a factor of three to five.

I am not talking about bad vendors or naive clients. I am talking about a systematic underestimation of what sits behind a production NLP system.

What the demo does not show

The demo shows the model on prepared examples. Several things stay out of frame.

Data labelling. Training and quality evaluation require examples labelled by humans. The more domain-specific the task - legal texts, medical documents, internal terminology - the more complex and expensive the labelling. This is not a one-time task: when requirements change or new patterns emerge, labelling must be updated.

Preprocessing. Real texts arrive in various formats, with typos, non-standard abbreviations, and mixed languages. Cleaning and normalisation form a separate layer of work that often takes more time than the model itself.
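To make the preprocessing layer concrete, here is a minimal sketch in Python using only the standard library. The abbreviation map and the function name `normalise` are my own illustrative assumptions; a real pipeline would need rules derived from its own data.

```python
import re
import unicodedata

# Hypothetical abbreviation map; a real one would come from your own domain.
ABBREVIATIONS = {"inv.": "invoice", "acct": "account"}


def normalise(text: str) -> str:
    """Fold odd characters, collapse whitespace, expand known abbreviations."""
    # NFKC folds compatibility characters such as ligatures and
    # non-breaking spaces into their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Collapse any run of whitespace (spaces, tabs, newlines) to one space.
    text = re.sub(r"\s+", " ", text).strip()
    # Expand known abbreviations token by token, case-insensitively.
    tokens = [ABBREVIATIONS.get(tok.lower(), tok) for tok in text.split(" ")]
    return " ".join(tokens)
```

Even this toy version shows why the layer grows: every new source format or abbreviation is another rule to write, test, and maintain.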

Model version management. When the model is updated, you need to verify that the new version is not worse than the old one on the categories that matter to the business. This requires test sets and evaluation procedures - ongoing, not one-off.
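One way such an evaluation procedure could look, as a sketch rather than a prescription: compare the old and new model on the same labelled test set, category by category, and flag any business-critical category where the new version does worse. The function names and the `tolerance` parameter are assumptions made for illustration.

```python
from collections import defaultdict


def per_category_accuracy(gold, predicted):
    """Share of examples in each gold category that were labelled correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        correct[g] += int(g == p)
    return {cat: correct[cat] / total[cat] for cat in total}


def regressions(gold, old_pred, new_pred, critical, tolerance=0.0):
    """Return critical categories where the new model is worse than the old one.

    Maps each regressed category to (old_accuracy, new_accuracy).
    """
    old_acc = per_category_accuracy(gold, old_pred)
    new_acc = per_category_accuracy(gold, new_pred)
    return {
        cat: (old_acc[cat], new_acc[cat])
        for cat in critical
        if cat in old_acc and new_acc[cat] < old_acc[cat] - tolerance
    }
```

A new version can improve the overall score and still get worse on the one category the business cares about, which is why the check is per category and not a single number.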

Quality monitoring in production. The texts that arrive in production gradually diverge from those the model was trained on. This is called data drift. Without monitoring you may not notice that quality has dropped - sometimes for weeks.
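As an illustration of what a drift signal can be, the sketch below tracks the share of incoming tokens that never appeared in the training data. This is one crude proxy I have chosen for the example, not a measure of model quality; the whitespace tokeniser and the `max_increase` threshold are assumptions.

```python
def tokens(text):
    """Naive whitespace tokeniser, lowercased; an assumption for this sketch."""
    return text.lower().split()


def build_vocabulary(training_texts):
    """Set of all tokens seen in the training texts."""
    return {tok for text in training_texts for tok in tokens(text)}


def oov_rate(texts, vocabulary):
    """Share of tokens in `texts` that never appeared in the training data."""
    all_tokens = [tok for text in texts for tok in tokens(text)]
    if not all_tokens:
        return 0.0
    unseen = sum(1 for tok in all_tokens if tok not in vocabulary)
    return unseen / len(all_tokens)


def drift_alert(baseline_rate, live_rate, max_increase=0.10):
    """True when the live rate exceeds the baseline by more than the margin."""
    return live_rate - baseline_rate > max_increase
```

A signal like this only tells you that the input has changed. Confirming that quality has actually dropped still requires labelling a sample of recent production texts.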

Pipeline architecture, not just the model

The model is one component. A production NLP pipeline typically includes:

an input preprocessor: receiving text, normalisation, splitting into required units;
the model core itself with version management;
a postprocessor: translating model output into business objects;
logging: recording inputs and outputs for later analysis;
monitoring: tracking latency, throughput, and quality;
an update process: who updates the model when quality degrades, when, and how.

Each of these components requires development, testing, and maintenance. The cost of maintaining them often exceeds the cost of the original model development.
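The components listed above can be sketched as a thin wrapper around the model. Everything here is hypothetical: the `Pipeline` class, the stub model, and the routing rule stand in for whatever your system actually does. The point is only how little of the code is the model itself.

```python
import json
import logging
import time
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("nlp_pipeline")


@dataclass
class Pipeline:
    preprocess: Callable[[str], str]      # input preprocessor
    model: Callable[[str], dict]          # model core
    postprocess: Callable[[dict], dict]   # model output -> business object
    model_version: str                    # recorded with every prediction

    def run(self, text: str) -> dict:
        started = time.perf_counter()
        cleaned = self.preprocess(text)
        raw = self.model(cleaned)
        result = self.postprocess(raw)
        latency_ms = (time.perf_counter() - started) * 1000
        # Log input, output, version and latency for later analysis.
        logger.info(json.dumps({
            "model_version": self.model_version,
            "input": cleaned,
            "raw_output": raw,
            "latency_ms": round(latency_ms, 2),
        }))
        return result


# Stub components so the sketch runs without any real model.
pipeline = Pipeline(
    preprocess=lambda t: " ".join(t.split()),
    model=lambda t: {"label": "invoice" if "invoice" in t.lower() else "other"},
    postprocess=lambda raw: {
        "route_to": "finance" if raw["label"] == "invoice" else "triage"
    },
    model_version="stub-0",
)
```

Calling `pipeline.run("  Invoice   #17 ")` returns a routing decision and writes one log record. The update process, the last item in the list, is not code at all: it is a decision about who acts on those logs.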

When NLP makes sense and when it does not

NLP makes sense when the task scales - hundreds or thousands of documents per day, and manual processing is either impossible or economically unjustifiable. If the volume is small, a well-organised manual process is often simpler and cheaper.
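The volume argument is simple arithmetic, sketched below. All parameters are hypothetical inputs you would replace with your own figures, and the calculation deliberately ignores the cost of errors on either side.

```python
def annual_manual_cost(docs_per_day, minutes_per_doc, hourly_rate, working_days=250):
    """Yearly cost of processing every document by hand."""
    return docs_per_day * working_days * minutes_per_doc * hourly_rate / 60


def break_even_docs_per_day(annual_pipeline_cost, minutes_per_doc, hourly_rate,
                            working_days=250):
    """Daily volume at which manual processing costs as much as the pipeline."""
    cost_per_doc = minutes_per_doc * hourly_rate / 60
    return annual_pipeline_cost / (cost_per_doc * working_days)
```

Below the break-even volume, the well-organised manual process wins on cost alone; above it, the pipeline starts to pay for its own maintenance.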

NLP makes sense when accuracy requirements are measurable and achievable. If a 15% error rate would be a disaster, you need to know whether the required threshold is achievable for your specific task. The question is not "does NLP work in general" but "does it reach the required quality on your actual data."
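A measured error rate on a small test set is itself uncertain, so it helps to look at an interval and not a single number. The sketch below uses the Wilson score interval, a standard formula for a binomial proportion; the function name and the choice of a 95% level (z = 1.96) are my assumptions for the example.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error rate measured on n labelled examples."""
    if n == 0:
        raise ValueError("need at least one labelled example")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def meets_threshold(errors, n, max_error_rate):
    """True only if even the upper bound of the interval is under the threshold."""
    _, upper = wilson_interval(errors, n)
    return upper < max_error_rate
```

For instance, 20 errors on 200 documents is a measured 10%, but whether that safely clears a 15% limit depends on the upper bound of the interval, and with fewer labelled documents the same 10% would not.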

NLP requires long-term support. If the company has no person or team ready to maintain the pipeline - including updating labels and monitoring quality - the cost of that support needs to be in the project economics from the very beginning.

Questions before starting the project

Where will the labelled training set come from, and who will keep it updated?
What defines success - which quality metrics, on which test set?
Who will own the pipeline in production?
How will quality degradation be detected?
What are the economics: how much does maintaining the pipeline cost per year relative to the value it creates?

If there are no answers, the value of the demo is correctly understood as the value of a demo - not as the value of a production system.


If this resonated, write to me. I reply personally.