The Architecture That Lets a Machine Be Interrupted
The headline isn't that ChatGPT will sound nicer; it's that the model will work in a fundamentally different way. Today's Advanced Voice Mode is, underneath, a turn-taking system: you speak, it listens, then it speaks, and trying to cut in tends to make the assistant freeze or talk over you. The 'Bidi' in GPT-Bidi-1 stands for bidirectional, an architecture that lets the assistant listen and speak at the same time, absorb a user's interruption, and adjust mid-sentence instead of stalling [1]. Reporting describes a model designed to process the speaker's voice continuously so it can change course the instant it is interrupted: the difference between a walkie-talkie and a real phone call [2].
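The contrast between the two designs can be made concrete with a toy simulation. The sketch below is purely illustrative and says nothing about how GPT-Bidi-1 is actually built: time is modelled as discrete ticks, one assistant word per tick, and the names `half_duplex`, `full_duplex`, and `replan` are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    spoken: list = field(default_factory=list)  # words the assistant said
    missed: list = field(default_factory=list)  # user speech nobody heard


def half_duplex(reply, user_input):
    """Turn-taking: while the assistant speaks, it is not listening."""
    t = Transcript()
    for tick, word in enumerate(reply):
        t.spoken.append(word)
        if tick in user_input:
            # The user cut in, but the assistant only finds out after its turn.
            t.missed.append(user_input[tick])
    return t


def full_duplex(reply, user_input, replan):
    """Bidirectional: the assistant checks for user speech on every tick."""
    t = Transcript()
    queue = list(reply)
    tick = 0
    while queue:
        heard = user_input.get(tick)
        if heard is not None:
            # Interruption: drop the rest of the old reply and re-plan.
            queue = list(replan(heard))
            if not queue:
                break
        t.spoken.append(queue.pop(0))
        tick += 1
    return t


if __name__ == "__main__":
    reply = "the forecast for Paris is sunny all week".split()
    interruption = {3: "no, London"}  # user cuts in at tick 3

    def replan(heard):
        # Hypothetical stand-in for the model changing course.
        return ["sorry,", "London", "is", "rainy"]

    print(half_duplex(reply, interruption))
    print(full_duplex(reply, interruption, replan))
```

In the half-duplex run the assistant finishes its whole sentence and the interruption lands in `missed`; in the full-duplex run the old reply is abandoned at the tick where the user speaks and the new one starts immediately.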
The second piece is a control surface borrowed from the text side. The new voice stack reportedly exposes three intelligence tiers (High, Medium, and Instant), letting users explicitly trade reasoning depth for lower latency, the same dial that already governs how hard the text models think before answering [1]. Rather than force a migration, ChatGPT would let people toggle between a new 'Bidi (Latest)' mode and the existing Advanced Voice Mode, with signs of the feature now appearing across both web and mobile clients. That combination of full-duplex listening and speaking with an explicit speed-versus-smarts knob is what would let voice finally behave like a conversation rather than a series of dictated exchanges.
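The speed-versus-smarts trade-off can be sketched in a few lines. This is an assumption-laden illustration, not a description of the product: only the tier names come from the reporting, while the latency figures are made-up placeholders rather than measurements, and `pick_tier` is a hypothetical helper that automates a choice the reported interface leaves to the user.

```python
from enum import Enum


class Tier(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    INSTANT = "Instant"


# Placeholder response delays in milliseconds. These are invented for
# illustration only; no real figures have been reported.
ASSUMED_LATENCY_MS = {
    Tier.HIGH: 1500,
    Tier.MEDIUM: 600,
    Tier.INSTANT: 200,
}


def pick_tier(budget_ms):
    """Return the deepest-reasoning tier whose assumed delay fits the budget."""
    for tier in (Tier.HIGH, Tier.MEDIUM, Tier.INSTANT):
        if ASSUMED_LATENCY_MS[tier] <= budget_ms:
            return tier
    # Nothing fits the budget, so fall back to the fastest tier.
    return Tier.INSTANT
```

With these placeholder numbers, a generous budget selects High and a tight one falls through to Instant, which is the same dial a user would be turning by hand.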


