AI Changelog Aggregator

Released veo-2.0-generate-001, a generally available (GA) text- and image-to-video model capable of generating detailed and artistically nuanced videos. To learn more, see the Veo docs.
Released gemini-2.0-flash-live-001, a public preview version of the Live API model with billing enabled.
- Enhanced Session Management and Reliability
  - Session Resumption: Keep sessions alive across temporary network disruptions. The API now supports server-side session state storage (for up to 24 hours) and provides handles (session_resumption) to reconnect and resume where you left off.
  - Longer Sessions via Context Compression: Enable extended interactions beyond previous time limits. Configure context window compression with a sliding window mechanism to automatically manage context length, preventing abrupt terminations due to context limits.
  - Graceful Disconnect Notification: Receive a GoAway server message indicating when a connection is about to close, allowing for graceful handling before termination.
- More Control over Interaction Dynamics
  - Configurable Voice Activity Detection (VAD): Choose sensitivity levels or disable automatic VAD entirely and use new client events (activityStart, activityEnd) for manual turn control.
  - Configurable Interruption Handling: Decide whether user input should interrupt the model's response.
  - Configurable Turn Coverage: Choose whether the API processes all audio and video input continuously or only captures it when the end-user is detected speaking.
  - Configurable Media Resolution: Optimize for quality or token usage by selecting the resolution for input media.
- Richer Output and Features
  - Expanded Voice & Language Options: Choose from two new voices and 30 new languages for audio output. The output language is now configurable within speechConfig.
  - Text Streaming: Receive text responses incrementally as they are generated, enabling faster display to the user.
  - Token Usage Reporting: Gain insights into usage with detailed token counts provided in the usageMetadata field of server messages, broken down by modality and prompt or response phases.
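
The session-management features above (resumption handles, the GoAway notice, and usageMetadata) suggest some client-side bookkeeping. The following is a minimal sketch of one way to do it, not the SDK's API: it works on already-decoded server messages as plain dictionaries and opens no connection. The names goAway, usageMetadata, and the model ID come from the notes above. Everything else (the SessionState class, both helper functions, and the keys sessionResumptionUpdate, newHandle, sessionResumption, handle, contextWindowCompression, and slidingWindow) is an assumption for illustration and should be checked against the Live API reference.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SessionState:
    """Hypothetical client-side bookkeeping for one Live API session."""
    resume_handle: Optional[str] = None          # latest handle to reconnect with
    closing_soon: bool = False                   # True once a GoAway message was seen
    last_usage: Optional[dict[str, Any]] = None  # most recent usageMetadata payload


def apply_server_message(state: SessionState, msg: dict[str, Any]) -> SessionState:
    """Update the state from one decoded server message (a plain dict)."""
    # Assumed shape: {"sessionResumptionUpdate": {"newHandle": "..."}}
    update = msg.get("sessionResumptionUpdate")
    if update and update.get("newHandle"):
        state.resume_handle = update["newHandle"]

    # GoAway signals that the connection is about to close.
    if "goAway" in msg:
        state.closing_soon = True

    # Keep the latest usage report as-is; its inner fields are not assumed here.
    usage = msg.get("usageMetadata")
    if usage is not None:
        state.last_usage = usage
    return state


def build_setup(state: SessionState) -> dict[str, Any]:
    """Build a setup payload for a first connect or a reconnect.

    The field layout is an assumption; only the model ID is from the notes.
    """
    return {
        "model": "gemini-2.0-flash-live-001",
        # A handle of None would mean "start a fresh session" in this sketch.
        "sessionResumption": {"handle": state.resume_handle},
        # Opt in to sliding-window compression with default settings (assumed).
        "contextWindowCompression": {"slidingWindow": {}},
    }
```

In use, a client would pass each incoming message through apply_server_message, and once closing_soon is set (or the connection drops) reconnect with build_setup(state) so the stored handle is sent back to the server.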