m@ksim.pro

How to evaluate an AI vendor when buying: a working filter

A set of questions and criteria for a manager choosing an AI solution or contractor - without relying on demos and marketing materials.

The AI solutions market right now has a structural feature: there are many offerings, demos work convincingly, and the gap between a product and a prototype is often invisible without the right questions. This is not a complaint about the market - it is simply where the market is at this stage.

For a manager choosing an AI solution or contractor, this means standard procurement procedures are not enough. You need an additional filter specific to this class of technology.

Why a standard tender process does not work

A standard tender evaluates functional requirements coverage, price, company reputation, and delivery timeline. For AI systems this is insufficient, for three reasons.

First: functionality in a demo and functionality on your data are different things. A language model that answers general questions brilliantly may produce unacceptable results in your corporate context.

Second: AI system quality degrades over time if it is not maintained. Data changes, context changes, models become outdated. This creates an ongoing operational burden that traditional software does not carry in the same form.

Third: responsibility for AI system errors is an open question that needs to be addressed in the contract, not assumed by default.

Block 1: Technical maturity assessment

The first group of questions checks whether there is a real product behind the demo.

  • What data was the system trained or configured on? Is data similar to ours represented in the training set?
  • How does the system behave on inputs that differ from the demo? Show us queries where it fails.
  • What is the quality metric - and who measures it? How has it changed over the last 6 months?
  • How are feedback and improvement structured - is there a retraining or fine-tuning process?

Block 2: Operational readiness

The second group evaluates what happens after launch.

  • What does the SLA cover - not just uptime, but how quickly the vendor responds when quality degrades?
  • How is answer quality monitored in production? Who notices if the system starts producing poor responses?
  • What does the model update plan look like, and how is it coordinated with us in advance?
  • What is the rollback process if quality degrades after an update?
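
To make the monitoring question concrete, here is a minimal sketch of what "noticing that quality degrades" can look like on your side. It assumes that some share of production responses is graded pass/fail (by a reviewer or an automated check) and that a baseline pass rate was agreed with the vendor. The class name `QualityMonitor` and the parameters `baseline`, `tolerance` and `window` are hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import Optional


class QualityMonitor:
    """Tracks a rolling pass rate over the most recent graded responses.

    Hypothetical helper: baseline and tolerance are values you would
    agree with the vendor, not defaults from any real product.
    """

    def __init__(self, baseline: float, tolerance: float, window: int) -> None:
        self.baseline = baseline      # agreed pass rate, e.g. 0.9
        self.tolerance = tolerance    # allowed drop before alerting, e.g. 0.05
        self.results = deque(maxlen=window)  # keeps only the last `window` grades

    def record(self, passed: bool) -> None:
        """Store the grade of one production response."""
        self.results.append(passed)

    def pass_rate(self) -> Optional[float]:
        """Return the rolling pass rate, or None until the window is full."""
        if len(self.results) < self.results.maxlen:
            return None
        return sum(self.results) / len(self.results)

    def degraded(self) -> bool:
        """True when the rolling pass rate falls below baseline - tolerance."""
        rate = self.pass_rate()
        return rate is not None and rate < self.baseline - self.tolerance
```

A vendor with real operational readiness should be able to describe their own equivalent of this: what is graded, over what window, and who is notified when the check trips.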

Block 3: Data and privacy

The third group covers questions about data passed to the system.

  • What data from our queries is used for model fine-tuning? Is it used by default, or only with our consent?
  • Where is our data stored? In which jurisdictions is it processed?
  • How is data isolation structured between clients in a multi-tenant system?
  • Does the data processing comply with our regulators' requirements?

Block 4: Accountability and contract

The fourth group covers what is often left for later, but is better discussed before signing.

  • How does the contract describe accountability for system errors in critical decisions?
  • What happens to our data upon contract termination?
  • Are there clauses allowing unilateral changes to terms - especially around API access and pricing?

A practical test

The best way to evaluate an AI vendor is to ask for a pilot on your real data with a measurable result. Not a general demo - a specific task from your actual context.

If a vendor avoids such a pilot or cannot agree on evaluation metrics in advance, that is an informative answer in itself.
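
A pilot with a measurable result can be as simple as the sketch below: a fixed set of cases from your own context, an acceptance rule, and a pass threshold, all written down before the pilot starts. Everything here is an assumption for illustration: `PilotCase`, `run_pilot`, `ask_vendor` and `is_acceptable` are hypothetical names, and `ask_vendor` stands in for whatever interface the vendor actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PilotCase:
    query: str      # a real task from your context
    expected: str   # the reference answer agreed in advance


def run_pilot(
    cases: List[PilotCase],
    ask_vendor: Callable[[str], str],           # hypothetical wrapper around the vendor system
    is_acceptable: Callable[[str, str], bool],  # acceptance rule agreed before the pilot
    threshold: float,                           # minimum pass rate agreed before the pilot
) -> Dict[str, object]:
    """Run every case through the vendor system and compare against the threshold."""
    if not cases:
        raise ValueError("a pilot needs at least one case")

    failures = []
    for case in cases:
        answer = ask_vendor(case.query)
        if not is_acceptable(answer, case.expected):
            failures.append((case.query, answer))

    pass_rate = (len(cases) - len(failures)) / len(cases)
    return {
        "pass_rate": pass_rate,
        "passed": pass_rate >= threshold,
        "failures": failures,  # keep these: they show where the system breaks
    }
```

The point of the design is that `threshold` and `is_acceptable` are fixed before any results are seen, so neither side can redefine success afterwards.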
