Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is our most cost-efficient Gemini model, optimized for low latency use cases for high-volume, cost-sensitive LLM traffic. It provides a significant quality increase over Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite models, matching Gemini 2.5 Flash performance across key capability areas:

Improved response quality: Aims to match 2.5 Flash performance.
Improved instruction following: Targeted improvements to serve as a reliable migration path for complex chatbot and instruction-heavy workflows.
Improved audio input: Improved audio-input quality for tasks like Automated Speech Recognition (ASR).
Expanded thinking support: You can control how much reasoning the model performs by choosing from minimal, low, medium, or high thinking levels. This feature lets you balance response quality and speed for your specific use case.

Try in Vertex AI (Preview) Deploy example app

Note: To use the "Deploy example app" feature, you need a Google Cloud project with billing and Vertex AI API enabled.

Technical specifications
Model ID	`gemini-3.1-flash-lite-preview`
Supported inputs & outputs	Inputs: Text, Code, Images, Audio, Video, PDF Outputs: Text
Token limits	Maximum input tokens: 1,048,576 Maximum output tokens: 65,535 (default)
Capabilities	Supported Grounding with Google Search Code execution System instructions Function calling Count Tokens Structured output Thinking Implicit context caching Explicit context caching Vertex AI RAG Engine Chat completions Not supported Gemini Live API Content Credentials (C2PA)
Consumption options	Supported Provisioned Throughput Standard PayGo Flex PayGo Priority PayGo Batch prediction Not supported
Consumption options	See Consumption options for more information.
	Images	Maximum images per prompt: 3,000 Maximum file size per file for inline data or direct uploads through the console: 7 MB Maximum file size per file from Google Cloud Storage: 30 MB Maximum number of output images per prompt: 10 Supported MIME types: `image/png`, `image/jpeg`, `image/webp`, `image/heic`, `image/heif`
	Documents	Maximum number of files per prompt: 3,000 Maximum number of pages per file: 1,000 Maximum file size per file: 50 MB Supported MIME types: `application/pdf`, `text/plain`
	Video	Maximum video length (with audio): Approximately 45 minutes Maximum video length (without audio): Approximately 1 hour Maximum number of videos per prompt: 10 Supported MIME types: `video/x-flv`, `video/quicktime`, `video/mpeg`, `video/mpegs`, `video/mpg`, `video/mp4`, `video/webm`, `video/wmv`, `video/3gpp`
	Audio	Maximum audio length per prompt: Approximately 8.4 hours, or up to 1 million tokens Maximum number of audio files per prompt: 1 Supported MIME types: `audio/x-aac`, `audio/flac`, `audio/mp3`, `audio/m4a`, `audio/mpeg`, `audio/mpga`, `audio/mp4`, `audio/ogg`, `audio/pcm`, `audio/wav`, `audio/webm`
	Parameter defaults	Temperature: 0.0-2.0 (default 1.0) topP: 0.0-1.0 (default 0.95) topK: 64 (fixed) candidateCount: 1–8 (default 1)
Supported regions
	Model availability	Global global
	See Deployments and endpoints for more information.
Knowledge cutoff date	January 2025
Versions	`gemini-3.1-flash-lite-preview` Launch stage: Public preview Release date: March 3, 2026
Supported languages	See Supported languages.
Pricing	See Pricing.

Gemini 3.1 Flash-Lite Stay organized with collections Save and categorize content based on your preferences.

Gemini 3.1 Flash-Lite