© 2026 CometAPI · All rights reserved

Gemini 2.5 Flash

Input:$0.24/M
Output:$2/M
Context:1M
Max Output:65K
Gemini 2.5 Flash is an AI model developed by Google, designed to provide fast and cost-effective solutions for developers, especially for applications requiring enhanced inference capabilities. According to the Gemini 2.5 Flash preview announcement, the model was released in preview on April 17, 2025, supports multimodal input, and has a context window of 1 million tokens, with a maximum output length of 65,536 tokens.

Gemini 2.5 Flash is engineered to deliver rapid responses without compromising on the quality of output. It supports multimodal inputs, including text, images, audio, and video, making it suitable for diverse applications. The model is accessible through platforms like Google AI Studio and Vertex AI, providing developers with the tools necessary for seamless integration into various systems.


Basic Information (Features)

Gemini 2.5 Flash introduces several stand-out features that distinguish it within the Gemini 2.5 family:

  • Hybrid Reasoning: Developers can set a thinking_budget parameter to control precisely how many tokens the model dedicates to internal reasoning before producing output.
  • Pareto Frontier: Positioned at the optimal cost-performance point, Flash offers the best price-to-intelligence ratio among the 2.5 models.
  • Multimodal Support: Processes text, images, video, and audio natively, enabling richer conversational and analytical capabilities.
  • 1 Million-Token Context: The long context window allows deep analysis and long-document understanding in a single request.
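The hybrid-reasoning control above can be sketched as a request payload. The thinkingConfig.thinkingBudget field follows the Gemini API's REST schema; the clamp range matches the 0–24,576 limit described later on this page, and the budget value used here is purely illustrative:

```python
# Sketch: a generateContent request body that caps internal reasoning
# via thinkingConfig.thinkingBudget. The budget value 1024 is an
# illustrative choice, not a recommendation.

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent payload with a reasoning-token cap.

    The documented range for gemini-2.5-flash is 0 (no reasoning)
    up to 24,576 tokens, so the value is clamped to that range.
    """
    budget = max(0, min(thinking_budget, 24_576))
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": budget}},
    }

body = build_request("Summarize this contract in three bullets.", 1024)
print(body["generationConfig"]["thinkingConfig"]["thinkingBudget"])  # 1024
```

Setting the budget to 0 disables reasoning entirely, which is the cheapest, lowest-latency mode.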

Model Versioning

Gemini 2.5 Flash has transitioned through the following key versions:

  • Preview 04-17: Early-access release with the “thinking” capability, available via gemini-2.5-flash-preview-04-17.
  • Stable General Availability (GA): As of June 17, 2025, the stable endpoint gemini-2.5-flash replaces the preview, ensuring production-grade reliability with no API changes from the May 20 preview.
  • Deprecation of Preview: Preview endpoints were scheduled for shutdown on July 15, 2025; users must migrate to the GA endpoint before that date.
  • gemini-2.5-flash-lite-preview-09-2025: Enhanced tool usability, with improved performance on complex multi-step tasks (SWE-Bench Verified rose roughly five points, from 48.9% to 54%), and improved efficiency: with reasoning enabled, higher-quality output is achieved with fewer tokens, reducing both latency and cost.

As of July 2025, Gemini 2.5 Flash is publicly available and stable (unchanged from gemini-2.5-flash-preview-05-20). If you were using gemini-2.5-flash-preview-04-17, the preview pricing continued until that endpoint's scheduled retirement on July 15, 2025, when it was shut down; migrate to the generally available model gemini-2.5-flash.

Faster, cheaper, smarter:

  • Design goals: low latency, high throughput, and low cost.
  • Overall speedups in reasoning, multimodal processing, and long-text tasks.
  • Token usage reduced by 20–30%, significantly lowering reasoning costs.

Technical Specifications

Input Context Window: Up to 1 million tokens, allowing for extensive context retention.

Output Tokens: Capable of generating up to 65,536 tokens per response.

Modalities Supported: Text, images, audio, and video.

Integration Platforms: Available through Google AI Studio and Vertex AI.

Pricing: Competitive token-based pricing model, facilitating cost-effective deployment.
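As a rough pre-flight check against these limits, token counts can be estimated before sending a request. The 4-characters-per-token ratio below is a common rule of thumb for English text, not an exact tokenizer count:

```python
# Sketch: will a document fit in the 1M-token context window?
# Uses a rough 4-characters-per-token heuristic, not a real tokenizer.

CONTEXT_WINDOW = 1_000_000   # input context, tokens
MAX_OUTPUT = 65_536          # maximum response length, tokens

def fits_in_context(text: str, reserved_output: int = MAX_OUTPUT) -> bool:
    """Return True if the text plus reserved output likely fits."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_output <= CONTEXT_WINDOW

print(fits_in_context("hello " * 1000))  # small doc → True
```

For exact counts, the API's token-counting endpoint should be preferred over this heuristic.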


Technical Details

Under the hood, Gemini 2.5 Flash is a transformer-based large language model trained on a mixture of web, code, image, and video data. Key technical details include:

Multimodal Training: Trained to align multiple modalities, Flash can seamlessly mix text with images, video, or audio, which is useful for tasks like video summarization or audio captioning.

Dynamic Thinking Process: Implements an internal reasoning loop in which the model plans and breaks down complex prompts before producing final output.

Configurable Thinking Budgets: The thinking_budget can be set from 0 (no reasoning) up to 24,576 tokens, allowing trade-offs between latency and answer quality.

Tool Integration: Supports Grounding with Google Search, Code Execution, URL Context, and Function Calling, enabling real-world actions directly from natural-language prompts.
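The function-calling surface can be sketched as a request payload. The functionDeclarations shape follows the Gemini REST API's tool schema; the get_weather tool itself is a hypothetical example invented for illustration:

```python
# Sketch: declaring a callable function for the model to invoke.
# The functionDeclarations layout follows the Gemini API tool schema;
# get_weather is a hypothetical example tool, not part of the API.

def weather_tool_payload(prompt: str) -> dict:
    """Build a generateContent payload advertising one callable tool."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "tools": [{
            "functionDeclarations": [{
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }]
        }],
    }

body = weather_tool_payload("What's the weather in Tokyo?")
print(body["tools"][0]["functionDeclarations"][0]["name"])  # get_weather
```

When the model decides the tool applies, it returns a functionCall part instead of text; the client executes the function and sends the result back in a follow-up turn.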


Benchmark Performance

In rigorous evaluations, Gemini 2.5 Flash demonstrates industry-leading performance:

  • LMArena Hard Prompts: Scored second only to 2.5 Pro on the challenging Hard Prompts benchmark, showcasing strong multi-step reasoning capabilities.
  • MMLU Score of 0.809: Exceeds average model performance with 0.809 MMLU accuracy, reflecting broad domain knowledge and reasoning prowess.
  • Latency and Throughput: Achieves 271.4 tokens/sec decoding speed with a 0.29 s time-to-first-token, making it ideal for latency-sensitive workloads.
  • Price-to-Performance Leader: At $0.26 per 1M tokens (blended), Flash undercuts many competitors while matching or surpassing them on key benchmarks.

These results indicate Gemini 2.5 Flash's competitive edge in reasoning, scientific understanding, mathematical problem-solving, coding, visual interpretation, and multilingual tasks.
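The throughput figures above translate directly into an end-to-end latency estimate (network overhead ignored):

```python
# Sketch: estimating response time from the benchmark figures quoted
# above (0.29 s time-to-first-token, 271.4 tokens/sec decode speed).

TTFT_S = 0.29        # time to first token, seconds
DECODE_TPS = 271.4   # decode throughput, tokens per second

def estimated_latency(output_tokens: int) -> float:
    """Seconds from request to last token, excluding network overhead."""
    return TTFT_S + output_tokens / DECODE_TPS

print(round(estimated_latency(500), 2))  # 2.13 s for a 500-token reply
```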


Limitations

While powerful, Gemini 2.5 Flash carries certain limitations:

  • Safety and Reliability Risks: The model can exhibit a “preachy” tone and may produce plausible-sounding but incorrect or biased outputs (hallucinations), particularly on edge-case queries; rigorous human oversight remains essential.
  • Rate Limits: API usage is constrained by rate limits (10 RPM, 250,000 TPM, 250 RPD on default tiers), which can impact batch processing or high-volume applications.
  • Intelligence Floor: While exceptionally capable for a Flash model, it remains less accurate than 2.5 Pro on the most demanding agentic tasks, such as advanced coding or multi-agent coordination.
  • Cost Trade-Offs: Although it offers the best price-performance, extensive use of thinking mode increases overall token consumption, raising costs for deeply reasoned prompts.
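A minimal client-side pacing helper for the 10 RPM default-tier limit might look like the sketch below; a production client would also track tokens per minute and retry on 429 responses:

```python
# Sketch: spacing requests to respect a requests-per-minute limit.
# Covers only RPM; TPM and RPD limits need separate accounting.

class RpmThrottle:
    def __init__(self, rpm: int = 10):
        self.min_interval = 60.0 / rpm  # seconds between requests
        self.last_request = -self.min_interval  # allow the first request

    def wait_time(self, now: float) -> float:
        """Seconds to sleep before the next request is allowed."""
        return max(0.0, self.last_request + self.min_interval - now)

    def record(self, now: float) -> None:
        """Mark a request as sent at time `now`."""
        self.last_request = now

throttle = RpmThrottle(rpm=10)
throttle.record(now=0.0)
print(throttle.wait_time(now=2.0))  # 4.0 seconds left in the 6 s interval
```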

Features for Gemini 2.5 Flash

Explore the key features of Gemini 2.5 Flash, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for Gemini 2.5 Flash

Explore competitive pricing for Gemini 2.5 Flash, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how Gemini 2.5 Flash can enhance your projects while keeping costs manageable.

gemini-2.5-flash (same price across variants)

Model family     | Variant (model name)       | Input price (USD / 1M tokens) | Output price (USD / 1M tokens)
gemini-2.5-flash | gemini-2.5-flash-thinking  | $0.24                         | $2.00
gemini-2.5-flash | gemini-2.5-flash-all       | $0.24                         | $2.00
gemini-2.5-flash | gemini-2.5-flash           | $0.24                         | $2.00
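At these rates, per-request cost is simple arithmetic:

```python
# Sketch: estimating one request's cost from the table above
# ($0.24 per 1M input tokens, $2.00 per 1M output tokens).

INPUT_PRICE_PER_M = 0.24
OUTPUT_PRICE_PER_M = 2.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# 100k input tokens + 2k output tokens:
print(f"${estimate_cost(100_000, 2_000):.4f}")  # $0.0280
```

Note that tokens consumed by the thinking mode are billed as output, so heavily reasoned prompts cost more than the visible response length suggests.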

Sample code and API for Gemini 2.5 Flash

The Gemini 2.5 Flash API exposes Google's latest multimodal AI model, designed for high-speed, cost-efficient tasks with controllable reasoning capabilities, allowing developers to toggle the advanced "thinking" feature on or off via the Gemini API.
POST /v1beta/models/{model}:{operator}
POST /v1/chat/completions

Python Code Example

from google import genai
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com"

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": BASE_URL},
    api_key=COMETAPI_KEY,
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Tell me a three sentence bedtime story about a unicorn.",
)

print(response.text)

JavaScript Code Example

// Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY;
const base_url = "https://api.cometapi.com/v1beta";
const model = "gemini-2.5-flash";
const operator = "generateContent";

async function main() {
  const response = await fetch(`${base_url}/models/${model}:${operator}`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: api_key,
    },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: "Tell me a three sentence bedtime story about a unicorn." },
          ],
        },
      ],
    }),
  });

  const data = await response.json();
  console.log(data.candidates[0].content.parts[0].text);
}

main().catch(console.error);

Curl Code Example

curl "https://api.cometapi.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "Authorization: $COMETAPI_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Tell me a three sentence bedtime story about a unicorn."
          }
        ]
      }
    ]
  }'

Versions of Gemini 2.5 Flash

Gemini 2.5 Flash has multiple snapshots for several likely reasons: output can change after updates, so older snapshots are retained for consistency; snapshots give developers a transition period for adaptation and migration; and different snapshots may correspond to global or regional endpoints to optimize user experience. For detailed differences between versions, please refer to the official documentation.
Versions:
gemini-2.5-flash-thinking
gemini-2.5-flash-all
gemini-2.5-flash-deepsearch
gemini-2.5-flash-lite-preview-06-17-thinking
gemini-2.5-flash-lite-thinking
gemini-2.5-flash-lite
gemini-2.5-flash-lite-preview-09-2025
gemini-2.5-flash
gemini-2.5-flash-image
gemini-2.5-flash-lite-preview-06-17
gemini-2.5-flash-preview-04-17
gemini-2.5-flash-preview-05-20
gemini-2.5-flash-preview-09-2025
gemini-2.5-flash-image-preview

More Models


Claude Opus 4.7

Input:$3/M
Output:$15/M
Claude Opus 4.7 is a hybrid reasoning model designed specifically for frontier-level coding, AI agents, and complex multi-step professional work. Unlike lighter models (e.g., Sonnet or Haiku variants), Opus 4.7 prioritizes depth, consistency, and autonomy on the hardest tasks.

Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

GPT 5.5 Pro

Input:$24/M
Output:$144/M
An advanced model engineered for extremely complex logic and professional demands, representing the highest standard of deep reasoning and precise analytical capabilities.

GPT 5.5

Input:$4/M
Output:$24/M
A next-generation multimodal flagship model balancing exceptional performance with efficient response, dedicated to providing comprehensive and stable general-purpose AI services.

GPT Image 2 ALL

Per Request:$0.04
GPT Image 2 is OpenAI's state-of-the-art image generation model for fast, high-quality image generation and editing. It supports flexible image sizes and high-fidelity image inputs.

GPT 5.5 ALL

Input:$4/M
Output:$24/M
GPT-5.5 excels in code writing, online research, data analysis, and cross-tool operations. The model not only improves its autonomy in handling complex multi-step tasks but also significantly improves reasoning capabilities and execution efficiency while maintaining the same latency as its predecessor, marking an important step toward AI-driven office automation.

Related Blog

Nano Banana 2 Flash Coming soon – The High-Speed Evolution of AI Image Generation
Jan 6, 2026

Google has once again disrupted the generative AI landscape with Nano Banana 2 Flash (coming soon), the latest addition to its widely acclaimed "Nano Banana" image generation family. Following the massive success of Nano Banana Pro (Gemini 3 Pro Image) late last year, this new iteration promises to democratize professional-grade visual synthesis by combining the frontier intelligence of the Gemini 3 architecture with unprecedented speed and efficiency.
Nano Banana discounts: A truly save money in 2026 for developers
Dec 25, 2025

In conclusion: The official Nano Banana API does not offer any Christmas, New Year's, or other holiday discounts. This is a fact that all developers planning to use Nano Banana (including Nano Banana Pro) for image generation, content creation, or product integration in 2026 must understand. Google does not offer seasonal discounts for the Nano Banana API, whether it's Christmas, Black Friday, or New Year's. The official API's pricing system is consistently stable and transparent, with virtually no room for discounts. So the question is: If you are a developer, and if you plan to perform large-scale image generation, model testing, or product iteration during Christmas or New Year's, is there any way to reduce the cost of using Nano Banana?
Is Free Gemini 2.5 Pro API fried? Changes to the free quota in 2025
Dec 11, 2025

Google has sharply tightened the free tier for the Gemini API: Gemini 2.5 Pro has been removed from the free tier and Gemini 2.5 Flash’s daily free requests were cut dramatically (reports: ~250 → ~20/day). That doesn’t mean the model is permanently “dead” for experimentation — but it does mean free access has been effectively gutted for many real-world use cases.
Ultimate Guide to Nano-Banana: How to Use and Prompt for best
Sep 8, 2025

Google’s recent release of Gemini 2.5 Flash Image — nicknamed “Nano-Banana” has quickly become the go-to for conversational image editing: it keeps likenesses…
How to Use Nano Banana via API?(Gemini-2-5-flash-image)
Aug 28, 2025

Nano Banana is the community nickname (and internal shorthand) for Google’s Gemini 2.5 Flash Image — a high-quality, low-latency multimodal image generation…