DevDay, long context, and the tooling shift toward LLM production systems
What OpenAI's DevDay announcements mean for companies thinking about moving from LLM pilots to working production systems.
Yesterday OpenAI held DevDay, its first developer conference. The announcements were concrete: GPT-4 Turbo with a 128,000-token context window, new function calling capabilities, and an Assistants API that simplifies building systems with memory and tools, plus price reductions.
This is not just a product update. It is a signal about the direction LLM tooling is moving. And that signal matters for any company thinking about what to do with language models in a business context.
What changed with long context
Until now, one of the main constraints on LLMs in real tasks was the size of the context window. The model "forgets" everything outside it. Working with long documents therefore meant cutting them into pieces and building complex retrieval and context-management schemes.
128,000 tokens is, by OpenAI's own estimate, more than 300 pages of text in a single request. An entire contract, an entire technical specification, several hours of meeting transcript - all of this now fits in context without chunking.
This does not mean RAG and vector search are dead - for very large corpora they are still needed. But for medium-scale tasks, architectural complexity drops significantly.
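As a rough illustration of that trade-off, here is a minimal sketch of the routing decision: send the document whole if it fits, otherwise fall back to chunking. The function names, the reserved headroom, and the four-characters-per-token estimate are all assumptions made for this example; a real system would count tokens with the model's actual tokenizer.

```python
# Rough routing sketch: one long-context request, or a chunked pipeline?
# All names and constants below are illustrative assumptions.

CONTEXT_WINDOW_TOKENS = 128_000  # GPT-4 Turbo window cited in the text
RESERVED_FOR_ANSWER = 4_000      # assumed headroom for instructions and the reply
CHARS_PER_TOKEN = 4              # crude heuristic for English text, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(text: str) -> bool:
    """True if the estimated size leaves room for the prompt and the answer."""
    return estimate_tokens(text) <= CONTEXT_WINDOW_TOKENS - RESERVED_FOR_ANSWER


def split_into_chunks(text: str, max_tokens: int = 2_000) -> list[str]:
    """Naive fixed-size chunking by characters, for documents that do not fit."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def plan_request(text: str) -> dict:
    """Choose between a single long-context request and a chunked pipeline."""
    if fits_in_context(text):
        return {"strategy": "single_request", "parts": [text]}
    return {"strategy": "chunk_and_retrieve", "parts": split_into_chunks(text)}
```

Under these assumptions a 200,000-character contract goes out as one request, while a corpus of a million characters still has to be chunked and retrieved from.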
What the Assistants API means
Before this announcement, building an LLM system with session-to-session memory, a tool set, and file handling required significant engineering work. Managing conversation history, implementing tool calls, storing and retrieving files - all of this had to be built manually.
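To give a sense of that manual work, here is a minimal sketch of just one piece of it: trimming conversation history so that it stays within a token budget. The role/content message layout follows the common chat format; the budget and the token estimate are assumptions for illustration, not anything OpenAI prescribes.

```python
# Sketch of hand-rolled conversation-history management under a token budget.
# The 4-characters-per-token estimate is an assumption, not a real tokenizer.


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length."""
    return len(text) // 4 + 1


def trim_history(messages: list[dict], budget_tokens: int) -> list[dict]:
    """Keep system messages plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    # Walk backwards so the newest turns are considered first.
    for message in reversed(turns):
        cost = estimate_tokens(message["content"])
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    # Restore chronological order before returning.
    return system + list(reversed(kept))
```

Even this small helper leaves open questions, such as whether dropped turns should be summarized or stored elsewhere, and every team building on the raw chat API had to answer them on its own.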
The Assistants API moves part of this logic to OpenAI's side. Less infrastructure code, more time for business logic. This lowers the barrier to entry for teams that want to build production LLM systems but cannot or do not want to invest in heavy infrastructure.
Why this is a shift, not just an improvement
In my view, yesterday's announcements mark the moment when LLM tooling stops being primarily research-oriented and becomes primarily engineering- and product-oriented.
Until now most of the conversation was about what models can do. The conversation is now moving to how to build systems that work in production: reliably, scalably, with manageable costs.
This means that companies that have been watching from the sidelines, waiting for the technology to "mature", now have a clearer entry point. The tools for assembling production systems have become more concrete.
What this means in practice
For those already building on LLMs: it is worth reassessing your architecture in light of the new capabilities. Some of the complexity you were handling manually in your RAG pipeline may disappear, and with the price reductions some of the cost should come down as well.
For those who have not started yet: the barrier to entry has come down. But the questions about data, security, and operational costs have not gone away. They just get asked at a different point in the journey.
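On the cost side, a back-of-the-envelope calculation is a useful starting point for either group. The sketch below assumes per-1,000-token list prices as I understand them at the time of the announcement; treat the numbers as assumptions and check the current pricing page before relying on them. The dictionary keys are labels for this example, not API model identifiers.

```python
# Back-of-the-envelope request cost in USD.
# Prices are per 1,000 tokens and are ASSUMPTIONS; verify against current pricing.
PRICES_PER_1K = {
    "gpt-4-8k": {"input": 0.03, "output": 0.06},
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    price = PRICES_PER_1K[model]
    return (
        input_tokens / 1000 * price["input"]
        + output_tokens / 1000 * price["output"]
    )


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Scale a single-request estimate to a monthly figure."""
    return request_cost(model, input_tokens, output_tokens) * requests_per_day * days
```

With these assumed prices, a request that sends 100,000 tokens of context and receives 1,000 tokens back costs about $1.03. That is cheap for an occasional contract review and expensive at thousands of requests per day, which is one reason retrieval does not go away.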
A few questions for assessing your position
- Are there specific tasks where long context changes the architectural answer?
- Which of our LLM pilots are ready for a production-infrastructure conversation?
- Do we understand where in our chain the bottleneck is - the model, the data, or the integration?
- Is there someone responsible for tracking changes in LLM tooling and evaluating their applicability?
The pace of change in this area is such that "we will see in a year" is no longer a working strategy. A year ago GPT-4 had not been released. Today it has a 128,000-token context and an API for assembling agents.