
Gemini 2.5 Flash Lite

Input:$0.08/M
Output:$0.32/M
Context:1M
Max Output:65K
An optimized version of Gemini 2.5 Flash tuned for cost-effectiveness and high throughput. It is the smallest, most cost-effective model in the family, built for large-scale use.

Technical Details

  • Adaptive Reasoning: Gemini 2.5 Flash-Lite supports on-demand thinking, allowing developers to allocate compute resources only when deeper reasoning is required.
  • Tool Integrations: Full compatibility with Gemini 2.5’s native tools, including Grounding with Google Search, Code Execution, URL Context, and Function Calling for seamless multimodal workflows.
  • Model Context Protocol (MCP): Leverages Google’s MCP to fetch real-time web data, ensuring responses are up-to-date and contextually relevant.
  • Deployment Options: Available through CometAPI, the Gemini API, Vertex AI, and Google AI Studio, with a preview track for early adopters to experiment and provide feedback.
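The adaptive-reasoning control above maps to a `thinkingBudget` field in the request body of the Gemini REST API. A minimal sketch of building such a body (the helper name is illustrative; `thinkingBudget` is the documented Gemini 2.5 thinking control, where 0 disables thinking):

```python
def build_generate_content_request(prompt: str, thinking_budget: int = 0) -> dict:
    # thinkingBudget = 0 disables thinking entirely; a positive value
    # reserves that many tokens for on-demand reasoning on harder requests.
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

# POST this body to /v1beta/models/gemini-2.5-flash-lite:generateContent
body = build_generate_content_request(
    "Classify the sentiment of: 'Great service!'", thinking_budget=512
)
```

Keeping the budget at 0 for routine classification calls and raising it only for harder requests is what lets compute be allocated per request.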

Benchmark Performance of Gemini 2.5 Flash-Lite

  • Latency: Achieves up to 50% lower median response times compared to Gemini 2.5 Flash, with typical sub-100 ms latencies on standard classification and summarization benchmarks.
  • Throughput: Optimized for high-volume workloads, sustaining tens of thousands of requests per minute without degradation in performance.
  • Price-Performance: Demonstrates a 25% reduction in cost per 1,000 tokens versus its Flash counterpart, making it the Pareto-optimal choice for cost-sensitive deployments.
  • Industry Adoption: Early users report seamless integration into production pipelines, with performance metrics matching or exceeding initial projections.

Ideal Use Cases

  • High-Frequency, Low-Complexity Tasks: Automated tagging, sentiment analysis, and bulk translation
  • Cost-Sensitive Pipelines: Data extraction from large document corpora, periodic batch summarization
  • Edge and Mobile Scenarios: When latency is critical but resource budgets are limited

Limitations of Gemini 2.5 Flash-Lite

  • Preview Status: May undergo API changes before GA; integrations should account for possible version bumps.
  • No On-the-Fly Fine-Tuning: Cannot upload custom weights; rely on prompt engineering and system messages.
  • Reduced Creativity: Tuned for deterministic, high-throughput tasks; less suited for open-ended generation or “creative” writing.
  • Resource Ceiling: Scales linearly only up to ~16 vCPUs; beyond this, throughput gains diminish.
  • Multimodal Constraints: Supports image/audio inputs but with limited fidelity; not ideal for heavy vision or audio transcription tasks.
  • Context-Window Trade-Off: Although it accepts up to 1M tokens, practical inference at that scale may see degraded throughput.
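Given the context-window trade-off above, one common workaround is to chunk large inputs well below the 1M ceiling rather than sending them whole. A rough sketch, assuming the common ~4-characters-per-token heuristic for English text (both the budget and the heuristic are illustrative, not part of the model spec):

```python
def chunk_text(text: str, max_tokens: int = 8_000, chars_per_token: int = 4) -> list:
    # Approximate a per-chunk token budget in characters, then slice.
    limit = max_tokens * chars_per_token
    return [text[i:i + limit] for i in range(0, len(text), limit)]

# A ~25K-token document becomes four chunks of at most ~8K tokens each.
chunks = chunk_text("x" * 100_000)
```

Each chunk can then be summarized or classified in its own request, keeping per-request throughput in the model's sweet spot.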

Features for Gemini 2.5 Flash Lite

Explore the key features of Gemini 2.5 Flash Lite, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for Gemini 2.5 Flash Lite

Explore competitive pricing for Gemini 2.5 Flash Lite, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how Gemini 2.5 Flash Lite can enhance your projects while keeping costs manageable.
model name                                   | Input ($/1M) | Output ($/1M)
gemini-2.5-flash-lite                        | 0.08         | 0.32
gemini-2.5-flash-lite-preview-06-17          | 0.08         | 0.32
gemini-2.5-flash-lite-thinking               | 0.08         | 0.32
gemini-2.5-flash-lite-preview-06-17-thinking | 0.08         | 0.32
gemini-2.5-flash-lite-preview-09-2025        | 0.08         | 0.32
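At the listed rates ($0.08/M input, $0.32/M output), per-request cost is simple arithmetic. A quick sketch (the token counts are illustrative):

```python
# Listed gemini-2.5-flash-lite rates, in USD per 1M tokens.
INPUT_RATE_PER_M = 0.08
OUTPUT_RATE_PER_M = 0.32

def request_cost(input_tokens: int, output_tokens: int) -> float:
    # Cost scales linearly with token counts in each direction.
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# e.g. a summarization call: 10K tokens in, 2K tokens out -> $0.00144
cost = request_cost(10_000, 2_000)
```

At these prices, a million such calls would run roughly $1,440, which is the kind of budgeting the "cost-sensitive pipelines" use case above implies.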

Sample code and API for Gemini 2.5 Flash Lite

Access comprehensive sample code and API resources for Gemini 2.5 Flash Lite to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of Gemini 2.5 Flash Lite in your projects.
POST
/v1beta/models/{model}:{operator}
POST
/v1/chat/completions
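The second route above is OpenAI-compatible, so a request can be built as a standard chat-completions payload. A minimal sketch (the base URL is an assumption; check your dashboard for the actual endpoint and supply your API key as a Bearer token):

```python
BASE_URL = "https://api.cometapi.com"  # assumed endpoint; verify in your dashboard

def build_chat_request(model: str, user_message: str):
    """Return (url, json_body) for the OpenAI-compatible chat route."""
    url = BASE_URL + "/v1/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 256,  # illustrative output cap
    }
    return url, body

url, body = build_chat_request("gemini-2.5-flash-lite", "Summarize this ticket: ...")
```

POST the body as JSON to the returned URL with an `Authorization: Bearer <key>` header; the model name matches the pricing table above.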

Versions of Gemini 2.5 Flash Lite

Gemini 2.5 Flash Lite has multiple snapshots for several reasons: model updates can change outputs, so older snapshots are retained for consistency; snapshots give developers a transition period to adapt and migrate; and different snapshots may correspond to global or regional endpoints to optimize the user experience. For detailed differences between versions, please refer to the official documentation.
version
gemini-2.5-flash-lite
gemini-2.5-flash-lite-preview-09-2025
gemini-2.5-flash-lite-preview-06-17
gemini-2.5-flash-lite-preview-06-17-thinking
gemini-2.5-flash-lite-thinking

More Models


Claude Opus 4.6

Input:$4/M
Output:$20/M
Claude Opus 4.6 is Anthropic's "Opus"-class large language model, released in February 2026. It is positioned as a workhorse for knowledge work and research workflows, with improvements in long-context reasoning, multi-step planning, tool use (including agentic software workflows), and computer-use tasks such as automatically generating slide decks and spreadsheets.

Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

GPT-5.4 nano

Input:$0.16/M
Output:$1/M
GPT-5.4 nano is designed for tasks where speed and cost matter most like classification, data extraction, ranking, and sub-agents.

GPT-5.4 mini

Input:$0.6/M
Output:$3.6/M
GPT-5.4 mini brings the strengths of GPT-5.4 to a faster, more efficient model designed for high-volume workloads.
Claude Mythos Preview

Coming soon
Input:$60/M
Output:$240/M
Claude Mythos Preview is our most capable frontier model to date, and shows a striking leap in scores on many evaluation benchmarks compared to our previous frontier model, Claude Opus 4.6.

mimo-v2-pro

Input:$0.8/M
Output:$2.4/M
MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like OpenClaw. It ranks among the global top tier in the standard PinchBench and ClawBench benchmarks, with perceived performance approaching that of Opus 4.6. MiMo-V2-Pro is designed to serve as the brain of agent systems, orchestrating complex workflows, driving production engineering tasks, and delivering results reliably.

Related Blog

Is Free Gemini 2.5 Pro API fried? Changes to the free quota in 2025
Dec 11, 2025

Google has sharply tightened the free tier for the Gemini API: Gemini 2.5 Pro has been removed from the free tier and Gemini 2.5 Flash’s daily free requests were cut dramatically (reports: ~250 → ~20/day). That doesn’t mean the model is permanently “dead” for experimentation — but it does mean free access has been effectively gutted for many real-world use cases.