AI in 2023: what actually changed and what is still open
A mid-November account of what the year delivered in practical terms - not a hype recap but an honest read of where things moved and where the gaps remain.
We are close enough to the end of 2023 that it makes sense to take stock. This has been an unusual year to work in technology: genuinely significant things happened, and they happened fast. But the gap between what was announced and what is actually running in production at real companies remains large. I want to be specific about both.
What demonstrably moved
The capability ceiling shifted. GPT-4 arrived in March and set a new reference point for reasoning, code generation, and instruction following. The gap between a well-prompted GPT-4 call and a domain-specific ML model built from scratch narrowed substantially for many tasks. Teams that had been working on custom NLP pipelines had to reassess.
RAG became a real pattern. Retrieval-augmented generation went from a research technique to something that engineering teams actually deploy. The tooling around it - vector databases, embedding APIs, orchestration frameworks - matured enough that an experienced team can build a working prototype in days rather than months.
The cost curve moved. API pricing dropped significantly through the year. Tasks that were not economically viable in January became viable by Q4. This changed the set of problems worth attempting with LLMs.
Open-weights models became serious. Llama 2 in July, followed by a wave of fine-tuned derivatives, changed the calculus for organisations with data privacy concerns or a need to deploy on private infrastructure. Running a capable model without sending data to an external API is now a realistic option for many use cases.
What has not changed as much as the coverage suggested
Production reliability. Hallucination rates improved but did not disappear. For any use case where accuracy matters - legal, medical, financial, customer-facing - the model output still requires human review or robust validation before action is taken. The tooling for that validation layer is still immature.
Data readiness at most organisations. The gap between "the model can theoretically do this" and "we have the data infrastructure to actually use the model for this" is wider than most companies expected, and many discovered it only after starting pilots. The data quality and access problems that predate AI have not gone away.
The skills to build and operate AI systems. Prompt engineering is learnable quickly. Building reliable production AI systems - with evaluation frameworks, monitoring, fallback behaviour, and cost management - requires skills that the industry does not yet have in abundance.
What I think 2024 brings
The enterprise focus will shift from proof of concept to operationalisation. The companies that spent 2023 running pilots will spend 2024 either converting them to production systems or quietly retiring them. The distinction between the two outcomes will depend less on the AI technology and more on data discipline, integration work, and organisational willingness to redesign workflows.
The open-weights ecosystem will keep maturing. For many business applications, a mid-sized model running on private infrastructure will outperform a large external model - not because the model itself is better, but because it can be deployed on the organisation's own terms and given access to data that could never be sent to an external API.
I remain cautious about predictions. The pace of change this year was genuinely unusual. What I am confident about is that the gap between announcement and production will be the defining challenge in 2024, not capability.