m@ksim.pro

NVIDIA Blackwell and the economics of the next inference wave

What the Blackwell architecture announcement means for companies planning or already running AI systems in production: on cost, availability, and strategic decisions.

At the GTC conference in March 2024, NVIDIA announced the Blackwell architecture - the next generation of GPUs built for AI workloads. NVIDIA's headline claims point to a large step up in inference performance over the previous Hopper generation, which was itself a step change relative to everything before it. Vendor figures describe best-case configurations, but the direction is clear.

I am looking at this not as hardware news, but as a signal about how AI economics will shift over the next eighteen months or so. For companies currently making decisions about AI infrastructure, this has practical relevance.

Why inference matters more than training for most businesses

When people talk about AI and GPUs, training large models is what comes to mind. That is an expensive process that OpenAI and other large players run at enormous budgets.

But for most companies, training is someone else's problem. They take an already-trained model and use it: answering customer questions, classifying documents, generating content. This is called inference - applying the model to new data.

Inference will make up the bulk of the operational costs of AI systems in production. And this is exactly where the Blackwell architecture creates the most meaningful change.

What changes in the economics

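The H100, the flagship of the Hopper generation, was a scarce commodity through 2023. Long delivery queues and inflated cloud instance prices created a real barrier to scaling AI systems. Blackwell will not remove supply constraints on its own, but more inference throughput per chip means each unit of scarce hardware goes further.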

Higher inference performance means that the same amount of compute can handle more requests, or the same number of requests can be handled with less compute. In business terms, the cost per AI interaction with a user falls.
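The arithmetic behind "cost per AI interaction" can be sketched in a few lines. This is a minimal illustration, not a pricing model: the function name `cost_per_request`, the hourly price, the throughput figures, and the utilization factor are all hypothetical assumptions chosen for the example, not measured Hopper or Blackwell numbers.

```python
def cost_per_request(gpu_hourly_usd: float,
                     requests_per_second: float,
                     utilization: float = 1.0) -> float:
    """Rough cost of one inference request on a rented GPU.

    All inputs are assumptions supplied by the caller:
    - gpu_hourly_usd: price of the GPU instance per hour
    - requests_per_second: sustained throughput of the deployed model
    - utilization: fraction of the hour the GPU actually serves traffic (0..1]
    """
    if gpu_hourly_usd < 0:
        raise ValueError("gpu_hourly_usd must be non-negative")
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")

    requests_per_hour = requests_per_second * 3600 * utilization
    return gpu_hourly_usd / requests_per_hour


if __name__ == "__main__":
    # Hypothetical numbers: same hourly price, doubled throughput.
    before = cost_per_request(gpu_hourly_usd=4.0, requests_per_second=10.0)
    after = cost_per_request(gpu_hourly_usd=4.0, requests_per_second=20.0)
    print(f"before: ${before:.6f} per request")
    print(f"after:  ${after:.6f} per request")
```

At a fixed hourly price, doubling throughput halves the cost per request. In practice new hardware is also priced higher per hour, so the real gain depends on how both numbers move.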

This has several consequences. Tasks that are currently too expensive to operate become economically viable. Models with larger context windows become more accessible, because serving them at the same performance level takes fewer resources. And competition among cloud GPU providers is likely to increase, putting additional downward pressure on prices.

What this means for strategic decisions

If a company is still deciding whether running AI in production is worth it or too expensive, the economics in twelve to eighteen months will be noticeably different. The same tasks will cost less.

That is not an argument to wait. But it is a reason to treat current cost estimates for AI systems in production as figures that will go stale faster than expected.
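To see how quickly an estimate goes stale, it helps to project today's figure forward under an assumed rate of price decline. The sketch below is illustrative only: the function name `projected_cost` and the 40% annual decline are assumptions made up for the example, not a forecast, and real prices do not fall at a smooth constant rate.

```python
def projected_cost(current_cost: float,
                   annual_decline: float,
                   months: int) -> float:
    """Project a cost forward assuming a constant annual rate of decline.

    annual_decline is a fraction in [0, 1), e.g. 0.4 for an assumed 40% drop
    per year. The decline is compounded, so 24 months applies it twice.
    """
    if current_cost < 0:
        raise ValueError("current_cost must be non-negative")
    if not 0 <= annual_decline < 1:
        raise ValueError("annual_decline must be in [0, 1)")
    if months < 0:
        raise ValueError("months must be non-negative")

    return current_cost * (1 - annual_decline) ** (months / 12)


if __name__ == "__main__":
    # Hypothetical: a workload costing 10,000 USD per month today,
    # under an assumed 40% annual decline in inference cost.
    for horizon in (0, 12, 18, 24):
        cost = projected_cost(10_000.0, 0.4, horizon)
        print(f"{horizon:>2} months: {cost:,.0f} USD per month")
```

Running the same projection with a few different decline rates gives a range rather than a single number, which is a more honest input for a budget than today's price alone.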

A few practical implications:

Do not lock architectural decisions to specific hardware for too long. Cloud inference through a provider is a more flexible strategy than owning your own clusters during a period of such rapid generational change.

Models will get larger. If you are currently using a relatively small model because a larger one is too expensive to operate, recalculate the economics in a year.

Competitive parity will shift. If AI in production is a competitive advantage today, in two years it will be a baseline expectation in several industries.

Questions for assessing readiness

  1. Do you know the cost per AI interaction in your current or planned systems?
  2. Which AI tasks have you postponed specifically because of operational running costs?
  3. Does your AI provider allow you to switch between hardware generations without rebuilding the application?
  4. How do you track changes in inference cost when updating your AI budget?

Blackwell is a reminder that the infrastructure layer of AI moves quickly. Plans built on today's prices and today's constraints are likely to be out of date within a year.
