m@ksim.pro

NLP text classification as a practical enterprise baseline

Before the deep learning wave reshaped NLP, classical text classification already solved real problems. What it does well, where it stops, and how to start.

There is a lot of noise right now about deep learning and what it will do to natural language processing. The noise is partly justified - the results on benchmarks are impressive. But most enterprises are not running research benchmarks. They have specific, bounded text problems that they are solving today with keyword rules, manual routing, or nothing at all.

For those problems, classical text classification - the kind that does not require a GPU, five engineers, and six months - is still the right starting point.

What text classification actually is

The task is simple: given a text, assign it to one of a fixed set of categories. Support ticket routing. Invoice line-item classification. Email triage. Regulatory document tagging. Complaint categorisation. Customer feedback bucketing.

These tasks have been technically solvable for years using approaches like naive Bayes, logistic regression, and support vector machines on top of TF-IDF or bag-of-words representations. They are not fashionable, but they work reliably when the category set is stable and you have a few hundred labelled examples per class.
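To make the idea concrete, here is a minimal sketch of one of those approaches: multinomial naive Bayes over bag-of-words counts, written with the Python standard library only. The class name, the tokenizer, and the four toy tickets are illustrative assumptions, not a production setup; in a real project you would normally reach for a tested library implementation and far more data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(label) + sum of log P(word | label)
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; a real project needs far more examples per class.
train_texts = [
    "invoice attached for last month payment",
    "payment overdue please send invoice",
    "cannot log in to my account password reset",
    "login page shows error after password change",
]
train_labels = ["billing", "billing", "access", "access"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
prediction = model.predict("I forgot my password and cannot login")
print(prediction)  # access
```

The whole model is two tables of counts, which is why this family of methods trains in seconds on a laptop.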

What you actually need to start

The honest answer is less than most people expect:

  • a labelled dataset of a few hundred to a few thousand examples, ideally a few hundred per class;
  • a clear definition of the category set that does not shift week to week;
  • someone who can write Python or R for an afternoon;
  • a way to evaluate accuracy on a held-out test set.

That is the full list. No special infrastructure, no GPU, no large team. The first working prototype can exist within a week if the labelled data is available.
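The held-out test set from the list above is worth setting aside before any modelling starts. A minimal sketch, assuming the data is a list of texts with a parallel list of labels; the function name, the 20% fraction, and the toy tickets are assumptions chosen for illustration. The split is stratified, so every class appears in the test set in roughly its original proportion.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_fraction=0.2, seed=0):
    """Split examples into train/test so each class keeps roughly the same share."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, items in by_label.items():
        items = items[:]  # copy before shuffling
        rng.shuffle(items)
        # Hold out at least one example of every class that has two or more.
        n_test = max(1, round(len(items) * test_fraction)) if len(items) > 1 else 0
        test += [(text, label) for text in items[:n_test]]
        train += [(text, label) for text in items[n_test:]]
    return train, test


# Hypothetical data: 10 "billing" tickets and 5 "access" tickets.
texts = [f"billing ticket {i}" for i in range(10)] + [f"access ticket {i}" for i in range(5)]
labels = ["billing"] * 10 + ["access"] * 5

train_set, test_set = stratified_split(texts, labels, test_fraction=0.2)
print(len(train_set), len(test_set))  # 12 3
```

Once the test set exists, it should stay untouched until evaluation; tuning against it quietly inflates the numbers.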

The labelled data is almost always the bottleneck. Building it is slow, manual, and requires domain experts to make judgment calls. This is where the real effort goes, not into the model.

Where it stops working

Classical text classification becomes unreliable when:

  • categories have fuzzy, overlapping boundaries that require reading between the lines;
  • the language is highly informal, abbreviated, or domain-specific in ways that are not in the training data;
  • you need to understand meaning across sentences rather than just keywords within them;
  • the category set is very large (hundreds of classes) and many categories have very few examples.
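The last point in the list is easy to check before any training. A small sketch that counts labelled examples per class and flags the thin ones; the function name and the cut-off of 100 examples are assumptions to adjust for your own data.

```python
from collections import Counter


def underrepresented_classes(labels, min_examples=100):
    """Return {label: count} for classes with fewer than min_examples labelled items."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_examples}


# Hypothetical label column from a labelled dataset.
labels = ["billing"] * 400 + ["access"] * 250 + ["legal"] * 12
sparse = underrepresented_classes(labels, min_examples=100)
print(sparse)  # {'legal': 12}
```

Classes that show up here are candidates for more labelling, for merging into a broader category, or for staying on manual handling.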

For these situations, the more sophisticated approaches - dense embeddings, sequence models, pre-trained language representations - genuinely help. But those are upgrades to make once the baseline is working and you understand where it falls short. Not the starting point.

The evaluation mistake to avoid

The most common mistake in enterprise text classification projects is evaluating only on overall accuracy. A model that classifies 94% of tickets correctly sounds good until you notice that 5% of one specific category - say, urgent escalations - is being routed to the wrong queue, and those are the tickets that matter most. Overall accuracy averages over every ticket, so errors in a small but critical class barely move the headline number.
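A small worked example shows how far the headline number and the per-class picture can diverge. The counts below are hypothetical and deliberately extreme, and the helper function is an illustrative sketch rather than a replacement for a library report.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall)} computed from two parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an example of the true class
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        report[label] = (precision, recall)
    return report


# Hypothetical test set: 90 routine tickets and 10 urgent ones.
y_true = ["routine"] * 90 + ["urgent"] * 10
# The model gets every routine ticket right but misses 6 of the 10 urgent ones.
y_pred = ["routine"] * 90 + ["urgent"] * 4 + ["routine"] * 6

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_report(y_true, y_pred)

print(accuracy)             # 0.94
print(report["urgent"][1])  # 0.4, the recall on urgent tickets
```

Here a 94% accurate model sends more than half of the urgent tickets to the wrong queue, and nothing in the overall figure hints at it.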

Always look at precision and recall per class. Pay special attention to the classes where a mistake is expensive. If necessary, tune the classification threshold for high-stakes classes separately.
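Threshold tuning for a single high-stakes class can be sketched as follows, assuming the classifier outputs a score between 0 and 1 for that class. The function names, the scores, and the recall target are all hypothetical. The idea is to pick the highest threshold that still meets a chosen recall target on validation data, then read off what that costs in precision.

```python
def recall_at(scores, is_positive, threshold):
    """Recall for the positive class when flagging every score >= threshold."""
    positives = sum(is_positive)
    caught = sum(1 for s, pos in zip(scores, is_positive) if pos and s >= threshold)
    return caught / positives if positives else 0.0


def precision_at(scores, is_positive, threshold):
    """Share of flagged items that really belong to the positive class."""
    flagged = [pos for s, pos in zip(scores, is_positive) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


def pick_threshold(scores, is_positive, target_recall=0.9):
    """Highest observed score that, used as a threshold, still meets the recall target."""
    for threshold in sorted(set(scores), reverse=True):
        if recall_at(scores, is_positive, threshold) >= target_recall:
            return threshold
    return min(scores)  # fall back to flagging everything


# Hypothetical validation scores for the "urgent" class and the true answers.
scores = [0.95, 0.80, 0.62, 0.55, 0.45, 0.40, 0.30, 0.10]
is_urgent = [True, True, False, True, False, True, False, False]

default_recall = recall_at(scores, is_urgent, 0.5)  # 3 of 4 urgent tickets caught
tuned = pick_threshold(scores, is_urgent, target_recall=1.0)
tuned_precision = precision_at(scores, is_urgent, tuned)

print(default_recall)  # 0.75
print(tuned)           # 0.4
```

Lowering the threshold from 0.5 to 0.4 catches every urgent ticket in this toy set, at the price of precision dropping from 3 in 4 to 4 in 6. Whether that trade is worth it depends on what a missed escalation costs compared with a false alarm.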

Practical recommendation

If you have a text routing or categorisation problem today that is being handled manually or with keyword rules, the right move is to:

  1. Spend two to three weeks building a labelled dataset with domain experts.
  2. Train a logistic regression or gradient boosting classifier on TF-IDF features.
  3. Evaluate per-class, identify the failure modes, and decide whether they are acceptable or whether a more complex approach is needed.
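Step 2 can be sketched end to end without any dependencies. The code below is an illustrative, from-scratch version of TF-IDF features feeding a binary logistic regression trained by stochastic gradient descent. The function names, the hyperparameters, and the four toy tickets are assumptions, and the smoothed IDF formula is one common variant among several. A real project would more likely use a maintained library and handle more than two classes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(texts):
    """Inverse document frequency per word, smoothed so every weight is positive."""
    n_docs = len(texts)
    doc_freq = Counter(word for text in texts for word in set(tokenize(text)))
    return {word: math.log((1 + n_docs) / (1 + df)) + 1 for word, df in doc_freq.items()}


def tfidf(text, idf):
    """L2-normalised TF-IDF vector as a sparse dict; words unseen in training are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {word: n * idf[word] for word, n in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {word: v / norm for word, v in vec.items()}


def predict_proba(vec, weights, bias):
    """Probability of the positive class under the logistic model."""
    z = bias + sum(weights.get(word, 0.0) * v for word, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(vectors, targets, epochs=200, lr=0.5):
    """Binary logistic regression trained with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, target in zip(vectors, targets):
            error = predict_proba(vec, weights, bias) - target  # gradient of log loss w.r.t. z
            bias -= lr * error
            for word, v in vec.items():
                weights[word] = weights.get(word, 0.0) - lr * error * v
    return weights, bias


# Hypothetical toy data: target 1 means "billing", 0 means "access".
texts = [
    "invoice attached for last month payment",
    "payment overdue please send invoice",
    "cannot log in to my account password reset",
    "login page shows error after password change",
]
targets = [1, 1, 0, 0]

idf = fit_idf(texts)
weights, bias = train_logistic([tfidf(t, idf) for t in texts], targets)

p_billing = predict_proba(tfidf("please resend the invoice for this payment", idf), weights, bias)
p_access = predict_proba(tfidf("password reset not working, cannot log in", idf), weights, bias)
```

On this toy data `p_billing` comes out above 0.5 and `p_access` below it. The learned `weights` dict is also directly inspectable: the words with the largest positive and negative weights show what the model is keying on, which helps when working through the failure modes in step 3.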

In most cases, step three reveals that the baseline is good enough for most classes and that the remaining errors are in ambiguous cases that even humans disagree on. That is a useful finding, not a failure.

Deep learning will change what is possible in NLP, but it does not make the baseline obsolete. If anything, it makes the baseline faster to build and more reliable to deploy.
