Why 128K Tokens Is the CRM Unlock
The headline benchmark numbers got the press, but the change that actually makes voice control over a CRM viable is quieter: the context window quadrupled from 32K to 128K tokens. TheNextWeb reports that this makes 'longer sessions and complex agentic flows feasible without external state stitching.'

For a CRM workflow, that distinction matters. A 32K window forced developers to summarize, evict, or RAG the customer record on every turn. That works for a five-minute call but breaks for a salesperson who wants to update notes, schedule a follow-up, and reference last quarter's renewal in the same conversation. With 128K, the full customer history fits inside the session, and the model can reason over it natively.

This is what Charlie Guo's demo hints at: voice agents can now hold state, not just respond. (The demo, originally built on gpt-realtime-1.5, was amplified by Kwindla Hultman Kramer of Pipecat as 'perfect performance on a hard end-to-end' task.) Combine that with the move from a flat speech-to-speech model to one with GPT-5-class reasoning, and the architectural pattern changes: instead of a thin voice layer wrapped around a separate orchestrator, the Realtime model itself becomes the orchestrator.
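To make the budget pressure concrete, here is a minimal sketch of the turn-by-turn context accounting a small window forces. All names (`build_context`, `estimate_tokens`) and the 4-characters-per-token heuristic are illustrative assumptions, not any real Realtime API call:

```python
# Hypothetical sketch: how the context-window budget decides what a CRM
# voice agent can keep in-session. Not a real API; helper names are made up.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def build_context(history_turns: list[str], system_prompt: str,
                  window_tokens: int, reserve_for_output: int = 4096) -> list[str]:
    """Keep as many recent turns as fit in the window; anything older
    would have to be summarized, evicted, or fetched via RAG."""
    budget = window_tokens - estimate_tokens(system_prompt) - reserve_for_output
    kept: list[str] = []
    for turn in reversed(history_turns):  # newest first
        cost = estimate_tokens(turn)
        if cost > budget:
            break  # older turns fall outside the window
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))

# A year of notes, calls, and renewal records: ~1000 turns, ~77K tokens total.
history = [f"note {i}: " + "x" * 300 for i in range(1000)]
prompt = "You are a CRM voice assistant."

small = build_context(history, prompt, window_tokens=32_000)   # 32K window
large = build_context(history, prompt, window_tokens=128_000)  # 128K window
# The 32K window truncates the record; the 128K window holds all of it.
```

Under the smaller window the agent sees only the most recent slice of the record, which is exactly why 32K-era designs needed external summarization or retrieval between turns; at 128K the whole history simply fits.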




