Changelog Update
- Released veo-2.0-generate-001, a generally available (GA) text- and image-to-video model, capable of generating detailed and artistically nuanced videos. To learn more, see the Veo docs.
- Released gemini-2.0-flash-live-001, a public preview version of the Live API model with billing enabled.
Enhanced Session Management and Reliability
- Session Resumption: Keep sessions alive across temporary network disruptions. The API now supports server-side session state storage (for up to 24 hours) and provides handles (session_resumption) to reconnect and resume where you left off.
- Longer Sessions via Context Compression: Enable extended interactions beyond previous time limits. Configure context window compression with a sliding window mechanism to automatically manage context length, preventing abrupt terminations due to context limits.
- Graceful Disconnect Notification: Receive a GoAway server message indicating when a connection is about to close, allowing for graceful handling before termination.
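The session features above could be combined on the client roughly as follows. This is a minimal sketch, not the SDK's API: messages are modeled as plain dictionaries, and every key name (`session_resumption`, `session_resumption_update`, `new_handle`, `context_window_compression`, `sliding_window`, `go_away`) as well as the `SessionState` helper is an assumption made for illustration. Check the Live API reference for the actual field names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SessionState:
    """Client-side bookkeeping for one Live API session (hypothetical helper)."""
    handle: Optional[str] = None    # latest resumption handle received
    should_reconnect: bool = False  # set once the server announces a disconnect


def build_setup(state: SessionState) -> dict[str, Any]:
    """Build a setup payload. All key names are assumptions."""
    return {
        "model": "gemini-2.0-flash-live-001",
        # Pass the stored handle, or None to start a fresh session.
        "session_resumption": {"handle": state.handle},
        # An empty sliding-window config is assumed to mean "use defaults".
        "context_window_compression": {"sliding_window": {}},
    }


def on_server_message(state: SessionState, msg: dict[str, Any]) -> None:
    """Update local state from one (assumed-shape) server message."""
    update = msg.get("session_resumption_update")
    if update and update.get("new_handle"):
        state.handle = update["new_handle"]
    if "go_away" in msg:
        # The connection is about to close: finish up, then reconnect.
        state.should_reconnect = True
```

When `should_reconnect` becomes true, the client would open a new connection with `build_setup(state)`, so the stored handle lets the server restore the earlier session.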
More Control over Interaction Dynamics
- Configurable Voice Activity Detection (VAD): Choose sensitivity levels or disable automatic VAD entirely and use new client events (activityStart, activityEnd) for manual turn control.
- Configurable Interruption Handling: Decide whether user input should interrupt the model's response.
- Configurable Turn Coverage: Choose whether the API processes all audio and video input continuously or only captures it when the end-user is detected speaking.
- Configurable Media Resolution: Optimize for quality or token usage by selecting the resolution for input media.
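With automatic VAD disabled, the client marks turn boundaries itself. The sketch below shows one way that could look. The `activityStart` and `activityEnd` event names come from the notes above; the `automaticActivityDetection` / `disabled` config keys and the `audio` payload shape are assumptions for illustration only.

```python
from typing import Any, Iterable, Iterator


def realtime_input_config(auto_vad: bool) -> dict[str, Any]:
    """Assumed config fragment that switches automatic VAD on or off."""
    return {"automaticActivityDetection": {"disabled": not auto_vad}}


def manual_turn(chunks: Iterable[bytes]) -> Iterator[dict[str, Any]]:
    """Wrap one user turn in activityStart / activityEnd client events."""
    yield {"activityStart": {}}
    for chunk in chunks:
        yield {"audio": chunk}  # hypothetical payload shape
    yield {"activityEnd": {}}
```

For example, `list(manual_turn([b"a", b"b"]))` yields four events: the start marker, two audio chunks, and the end marker.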
Richer Output and Features
- Expanded Voice & Language Options: Choose from two new voices and 30 new languages for audio output. The output language is now configurable within speechConfig.
- Text Streaming: Receive text responses incrementally as they are generated, enabling faster display to the user.
- Token Usage Reporting: Gain insights into usage with detailed token counts provided in the usageMetadata field of server messages, broken down by modality and prompt or response phases.
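Token usage reports could be accumulated over a session as sketched below. Only the `usageMetadata` field name comes from the notes above. The inner shape, with `promptTokensDetails` and `responseTokensDetails` lists of `{"modality", "tokenCount"}` entries, is an assumption, so adjust it to the actual message schema.

```python
from collections import defaultdict
from typing import Any

# Maps an assumed detail-list key to the phase label used in the totals.
PHASES = (("prompt", "promptTokensDetails"), ("response", "responseTokensDetails"))


def add_usage(totals: dict[tuple[str, str], int], message: dict[str, Any]) -> None:
    """Add one server message's token counts to totals[(phase, modality)]."""
    usage = message.get("usageMetadata")
    if not usage:
        return  # this message carried no usage report
    for phase, key in PHASES:
        for entry in usage.get(key, []):
            totals[(phase, entry["modality"])] += entry["tokenCount"]


totals: dict[tuple[str, str], int] = defaultdict(int)
add_usage(totals, {"usageMetadata": {
    "promptTokensDetails": [{"modality": "AUDIO", "tokenCount": 120}],
    "responseTokensDetails": [{"modality": "TEXT", "tokenCount": 40}],
}})
```

Keying the totals by `(phase, modality)` keeps prompt and response counts separate, matching the breakdown the changelog describes.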