Environment details
- Programming language: TypeScript / JavaScript
- OS: macOS 15.x (Sequoia)
- Language runtime version: Node.js 22.16.0
- Package version: @google/genai 1.44.0
Steps to reproduce
Model: gemini-2.5-flash-native-audio-preview-12-2025
API: Live API (WebSocket, v1alpha), ephemeral token auth
- Connect to the Live API with both `inputAudioTranscription: {}` and `outputAudioTranscription: {}` enabled in the session config:
```ts
import { GoogleGenAI, Modality, type LiveServerMessage } from "@google/genai";

// Client authenticated with an ephemeral token against the v1alpha API
const ai = new GoogleGenAI({
  apiKey: ephemeralToken,
  httpOptions: { apiVersion: "v1alpha" },
});

const session = await ai.live.connect({
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  config: {
    responseModalities: [Modality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
  callbacks: {
    onmessage: (message: LiveServerMessage) => {
      console.log(message.serverContent?.inputTranscription);
      console.log(message.serverContent?.outputTranscription);
    },
  },
});
```
- Speak several complete sentences, then wait for the model to respond with a complete turn.
- Observe the `inputTranscription` and `outputTranscription` objects logged from `serverContent`.
Expected behavior
Per the SDK's TypeScript types and the documented pattern for streaming transcription, the final transcription message for each speaker turn should include `finished: true`, signaling that the turn is complete and the accumulated text can be finalized.
This is the natural mechanism for knowing when to flush a transcript buffer.
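Under those types, the natural handler shape is a sink that accumulates `text` fragments and flushes when `finished` arrives. A minimal sketch (the `Transcription` shape here is reduced to the two fields used; `onFlush` is a hypothetical callback, not an SDK API):

```typescript
// Sketch of the documented pattern: accumulate fragments, flush on `finished`.
type Transcription = { text?: string; finished?: boolean };

function makeTranscriptSink(onFlush: (full: string) => void) {
  let buffer = "";
  return (t: Transcription | undefined) => {
    if (!t) return;
    if (t.text) buffer += t.text;
    if (t.finished) {
      // The signal this issue reports as never arriving in practice.
      onFlush(buffer);
      buffer = "";
    }
  };
}
```

Each `onmessage` callback would feed `message.serverContent?.inputTranscription` (or `outputTranscription`) into its own sink.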
Referenced SDK Patterns
Actual behavior
The `finished` field is never present on either `inputTranscription` or `outputTranscription` messages across an entire conversation.
Each transcription message contains only a `text` field with a fragment. There is no per-transcription signal indicating that a turn has ended.
The only reliable turn-boundary signal is `message.serverContent?.turnComplete === true`, which arrives on a separate message with no transcription payload. It also fires only at the end of the model's turn, not the user's.
Workaround in use
- Accumulate `inputTranscription.text` fragments into a buffer.
- Flush the user buffer when the first `outputTranscription.text` fragment arrives, treating that as an implicit signal that the user has finished speaking.
- Flush the model buffer when `turnComplete: true` fires.
This works, but it requires developers to reverse-engineer the signal sequence instead of relying on the documented `finished` flag.
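The workaround above can be sketched as a small turn tracker (the message shape is reduced to the fields used; `emit` is a hypothetical callback standing in for whatever the application does with a finalized transcript):

```typescript
// Sketch of the workaround: flush the user transcript when model output
// starts, and the model transcript when turnComplete arrives.
type Msg = {
  serverContent?: {
    inputTranscription?: { text?: string };
    outputTranscription?: { text?: string };
    turnComplete?: boolean;
  };
};

function makeTurnTracker(emit: (speaker: "user" | "model", text: string) => void) {
  let userBuf = "";
  let modelBuf = "";
  return (message: Msg) => {
    const sc = message.serverContent;
    if (!sc) return;
    if (sc.inputTranscription?.text) {
      userBuf += sc.inputTranscription.text;
    }
    if (sc.outputTranscription?.text) {
      // First model fragment = implicit end of the user's turn.
      if (userBuf) {
        emit("user", userBuf);
        userBuf = "";
      }
      modelBuf += sc.outputTranscription.text;
    }
    if (sc.turnComplete && modelBuf) {
      emit("model", modelBuf);
      modelBuf = "";
    }
  };
}
```

Wiring `makeTurnTracker` into the `onmessage` callback from the reproduction yields finalized per-speaker transcripts, at the cost of the assumption that output audio never begins before the user has finished speaking.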