Gemini 2.5 Flash is designed to deliver fast responses without sacrificing output quality. It accepts multimodal inputs, including text, images, audio, and video, which makes it suitable for a wide range of applications. The model is available through Google AI Studio and Vertex AI, so developers can integrate it into their own systems.
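The following minimal sketch shows what a mixed text-and-image input looks like as a request body. It assumes the JSON shape used in Google's REST examples for `generateContent` (`contents`, `parts`, `inline_data`, `mime_type`, `data`). The helper name `build_multimodal_request` is hypothetical, and the body is only built, not sent.

```python
import base64
import json


def build_multimodal_request(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a request body that pairs a text prompt with one inline image.

    Hypothetical helper. Field names follow Google's REST examples for
    generateContent (an assumption; check them against the endpoint you call).
    """
    # Inline binary data is sent as base64-encoded text.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real file, e.g. open("photo.png", "rb").read().
    body = build_multimodal_request("Describe this image.", b"\x89PNG placeholder")
    print(json.dumps(body, indent=2))
```

Audio or video parts would use the same `inline_data` shape with a different MIME type.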
Gemini 2.5 Flash introduces several stand-out features that distinguish it within the Gemini 2.5 family:
Gemini 2.5 Flash has transitioned through the following key versions:
As of July 2025, Gemini 2.5 Flash is generally available and stable; the GA model is unchanged from gemini-2.5-flash-preview-05-20. If you are using gemini-2.5-flash-preview-04-17, the existing preview pricing continues until that endpoint is retired and shut down on July 15, 2025. To migrate, switch to the generally available model, "gemini-2.5-flash".
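Migration usually comes down to changing the model ID string. This small sketch centralizes that change. The mapping table and `resolve_model_id` are hypothetical helpers, not part of any SDK. Mapping the 05-20 preview as well is an assumption based on the GA model being unchanged from it.

```python
# Hypothetical mapping from preview model IDs to the generally available ID.
PREVIEW_TO_GA = {
    # Preview endpoint scheduled for shutdown on July 15, 2025.
    "gemini-2.5-flash-preview-04-17": "gemini-2.5-flash",
    # Assumption: GA is unchanged from this preview, so callers can switch it too.
    "gemini-2.5-flash-preview-05-20": "gemini-2.5-flash",
}


def resolve_model_id(model_id: str) -> str:
    """Return the GA model ID for a known preview ID, or the input unchanged."""
    return PREVIEW_TO_GA.get(model_id, model_id)


if __name__ == "__main__":
    for old in ("gemini-2.5-flash-preview-04-17", "gemini-2.5-flash"):
        print(f"{old} -> {resolve_model_id(old)}")
```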
Faster, cheaper, smarter:
Input Context Window: Up to 1 million tokens, allowing for extensive context retention.
Output Tokens: Capable of generating up to 65,536 tokens per response.
Modalities Supported: Text, images, audio, and video.
Integration Platforms: Available through Google AI Studio and Vertex AI.
Pricing: Token-based, billed per million input and output tokens at competitive rates for cost-effective deployment.
Under the hood, Gemini 2.5 Flash is a transformer-based large language model trained on a mixture of web, code, image, and video data. Key technical specifications include:
Multimodal Training: Because it is trained to align multiple modalities, Flash can mix text with images, video, or audio, which is useful for tasks like video summarization or audio captioning.
Dynamic Thinking Process: Implements an internal reasoning step in which the model plans and breaks down complex prompts before producing its final output.
Configurable Thinking Budgets: The thinking_budget can be set from 0 (thinking disabled) up to 24,576 tokens, trading latency against answer quality.
Tool Integration: Supports Grounding with Google Search, Code Execution, URL Context, and Function Calling, so the model can take real-world actions from natural-language prompts.
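To make the thinking budget and built-in tools concrete, here is a minimal sketch that builds a request body. It assumes the REST field names from Google's examples (`generationConfig.thinkingConfig.thinkingBudget`, and tool entries `google_search`, `code_execution`, `url_context`); verify them against the current API reference. The helper and the short tool names are hypothetical, and Function Calling is left out for brevity.

```python
import json

# Range stated above for Gemini 2.5 Flash: 0 disables thinking, 24,576 is the cap.
MIN_THINKING_BUDGET = 0
MAX_THINKING_BUDGET = 24_576

# Hypothetical short names mapped to tool entries as shown in Google's REST examples.
TOOL_ENTRIES = {
    "search": {"google_search": {}},
    "code": {"code_execution": {}},
    "url": {"url_context": {}},
}


def build_generation_request(prompt: str, thinking_budget: int, tools: tuple = ()) -> dict:
    """Build a request body with a thinking budget and optional built-in tools."""
    if not MIN_THINKING_BUDGET <= thinking_budget <= MAX_THINKING_BUDGET:
        raise ValueError(
            f"thinking_budget must be between {MIN_THINKING_BUDGET} and {MAX_THINKING_BUDGET}"
        )
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": thinking_budget}},
    }
    if tools:
        # Unknown tool names raise KeyError rather than being silently dropped.
        body["tools"] = [TOOL_ENTRIES[name] for name in tools]
    return body


if __name__ == "__main__":
    # Low-latency call: thinking disabled, no tools.
    print(json.dumps(build_generation_request("Summarize this paragraph.", 0)))
    # Harder question: larger budget plus Google Search grounding.
    print(json.dumps(build_generation_request("Compare two recent papers.", 8192, ("search",))))
```

The range check covers only the 0 to 24,576 values described above. Google's documentation also describes -1 for dynamic thinking, which this sketch would reject.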
In benchmark evaluations, Gemini 2.5 Flash performs competitively in reasoning, scientific understanding, mathematical problem-solving, coding, visual interpretation, and multilingual tasks.

While powerful, Gemini 2.5 Flash carries certain limitations:
| Token type | Comet Price (USD / M tokens) | Official Price (USD / M tokens) |
|---|---|---|
| Input | $0.24 | $0.30 |
| Output | $2.00 | $2.50 |
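Because both prices are quoted per million tokens, the cost of a workload is a rate-weighted sum divided by 1,000,000. This sketch applies the rates from the table above. `estimate_cost` is a hypothetical helper, and the rates are hard-coded, so they must be updated if prices change.

```python
# Per-million-token rates (USD) copied from the pricing table above.
RATES = {
    "comet": {"input": 0.24, "output": 2.00},
    "official": {"input": 0.30, "output": 2.50},
}


def estimate_cost(input_tokens: int, output_tokens: int, provider: str = "comet") -> float:
    """Estimate the USD cost of a workload from token counts and per-million rates."""
    rate = RATES[provider]
    cost = (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
    return round(cost, 6)  # trim floating-point noise for display


if __name__ == "__main__":
    # Example workload: 1M input tokens and 200K output tokens.
    for provider in RATES:
        print(provider, estimate_cost(1_000_000, 200_000, provider))
```

Google bills thinking tokens at the output rate, so `output_tokens` should include them. The table lists a single input rate, while Google's official price list charges audio input at a separate rate, so treat this sketch as covering text, image, and video input only.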
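Example: calling Gemini 2.5 Flash through CometAPI with the `google-genai` Python SDK (`pip install google-genai`):

```python
import os

from google import genai

# Get your CometAPI key from https://api.cometapi.com/console/token and export it
# as COMETAPI_KEY (or paste it in place of the placeholder).
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com"

# Point the google-genai client at CometAPI instead of Google's default endpoint.
client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": BASE_URL},
    api_key=COMETAPI_KEY,
)

# Send a single text prompt to the GA model and print the generated text.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Tell me a three sentence bedtime story about a unicorn.",
)
print(response.text)
```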