Nano Banana 2

Input: $0.4/M
Output: $2.4/M
Core Capabilities Overview:

  • Resolution: up to 4K (4096×4096), on par with Pro.
  • Reference image consistency: up to 14 reference images (10 objects + 4 characters), maintaining style/character consistency.
  • Extreme aspect ratios: new 1:4, 4:1, 1:8, and 8:1 ratios, suitable for long images, posters, and banners.
  • Text rendering: advanced text generation, suitable for infographics and marketing poster layouts.
  • Search enhancement: integrated Google Search + Image Search.
  • Grounding: built-in thinking process; complex prompts are reasoned through before generation.

Technical Specifications of Gemini 3.1 Flash Image Preview

Item | Gemini 3.1 Flash Image Preview
Provider | Google
Model family | Gemini 3.1 (Flash tier)
Primary focus | Fast multimodal generation with image preview
Input types | Text, Image
Output types | Text, Image (preview generation)
Context window | Up to 1M tokens (Gemini 3.x Flash tier standard)
Latency tier | Low-latency, high-throughput
Streaming support | Yes
Tool calling | Yes (Gemini API tools framework)
Version | 3.1

What is Nano Banana 2

Nano Banana 2 is the popular nickname used by the press and developer community for the newly released Gemini-3.1-Flash-Image model. Google positions it as the “Flash”-tier image engine that brings near-Pro visual fidelity to a much lower latency and cost tier — suitable for high-volume generation, rapid iterative editing, and integrated product workflows across Google services. It inherits Gemini 3.1’s multimodal reasoning and adds image-centric capabilities (legible text in images, multi-image composition, wide aspect ratio support, native 4K).

Main features

  • High-speed, multi-resolution generation: Flash-tier speed with options for 0.5K / 1K / 2K / 4K outputs and new extreme aspect ratios (1:4, 4:1, 1:8, 8:1).
  • Real-time web grounding: Integrates both text and image search results to ground generated content in current web information when “Thinking” or search grounding is enabled. Useful for up-to-date references and factual infographics.
  • Improved text rendering: Better short-text and graphic text rendering (fonts, sizes) than earlier Flash models; still imperfect on long paragraphs/small text.
  • Multi-input editing and multi-turn workflows: Strong support for combining several images as inputs and for iterative edits across turns.
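To make the resolution and aspect-ratio options concrete, here is a minimal sketch of a request body that selects them. The field names follow the Gemini-style v1beta REST schema used in the code samples further down this page; the `imageSize` parameter and its "0.5K"/"1K"/"2K"/"4K" values are assumptions inferred from the tiers listed above, so verify them against the CometAPI documentation before relying on them.

```python
import json

def build_image_request(prompt: str, aspect_ratio: str = "1:1", image_size: str = "1K") -> str:
    """Build a v1beta :generateContent body for image output.

    The aspect ratios come from this page's feature list; the imageSize
    field name and accepted values are assumptions, so check the docs.
    """
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["IMAGE"],
            "imageConfig": {
                "aspectRatio": aspect_ratio,   # e.g. "1:4" for a tall banner
                "imageSize": image_size,       # e.g. "2K" (assumed field name)
            },
        },
    }
    return json.dumps(body)

# A tall 1:4 banner at 2K, matching the extreme-ratio use case above
payload = build_image_request("A lighthouse on a cliff at dusk", "1:4", "2K")
```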

📊 Benchmark Performance — Image Generation & Editing (Elo scores)

Capability | Gemini 3.1 Flash Image (Nano Banana 2) | Gemini 2.5 Flash Image (Nano Banana) | Gemini 3 Pro Image (Nano Banana Pro) | GPT-Image 1.5 | Seedream 5.0 Lite | Grok Imagine Image Pro
Text-to-Image — Overall Preference | 1079.0 ± 7.0 | 1073.0 ± 5.0 | 942.0 ± 6.0 | 1021.0 ± 5.0 | 1047.0 ± 5.0 | 928.0 ± 8.0
Text-to-Image — Visual Quality | 1140.0 ± 6.0 | 1129.0 ± 6.0 | 929.0 ± 6.0 | 1043.0 ± 5.0 | 975.0 ± 5.0 | 759.0 ± 10.0
Text-to-Image — Infographics (Factuality) | 1114.0 ± 14.0 | 1074.0 ± 12.0 | 881.0 ± 13.0 | 1102.0 ± 13.0 | 985.0 ± 12.0 | 890.0 ± 22.0
Editing — General | 1065.0 ± 9.0 | 1047.0 ± 9.0 | 913.0 ± 9.0 | 1051.0 ± 10.0 | 995.0 ± 8.0 | 937.0 ± 9.0
Editing — Character | 1056.0 ± 7.0 | 1049.0 ± 7.0 | 952.0 ± 7.0 | 1050.0 ± 8.0 | 1025.0 ± 7.0 | 894.0 ± 8.0
Editing — Creative | 1023.0 ± 7.0 | 1031.0 ± 7.0 | 976.0 ± 7.0 | 1004.0 ± 7.0 | 1017.0 ± 7.0 | 938.0 ± 7.0
Editing — Object/Environment | 1029.0 ± 8.0 | 1018.0 ± 8.0 | 945.0 ± 8.0 | 1042.0 ± 10.0 | 976.0 ± 8.0 | 946.0 ± 9.0
Editing — Multi-Input | 1037.0 ± 8.0 | 1016.0 ± 8.0 | 919.0 ± 9.0 | 1056.0 ± 12.0 | 1014.0 ± 9.0 | N/A
Editing — Stylization | 1045.0 ± 7.0 | 1031.0 ± 7.0 | 862.0 ± 8.0 | 1045.0 ± 9.0 | 996.0 ± 7.0 | 984.0 ± 7.0

Key takeaways from this benchmark table:

  • Across text-to-image generation and image editing, Gemini 3.1 Flash Image posts the top or near-top score in most categories in this evaluation, though GPT-Image 1.5 scores slightly higher on a few editing tasks (Multi-Input, Object/Environment).
  • The model shows especially strong results in Visual Quality and Infographic (Factuality) benchmarks—signaling that it excels not only in aesthetic quality but also in rendering structurally accurate content.
  • On Multi-Input editing, Nano Banana 2 also shows robust generalization, with higher scores than its previous Flash generation.

These evaluations are conducted via human side-by-side Elo comparisons on a diverse benchmark suite, reflecting both preference and fidelity across commonly used image generation/editing tasks.
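For intuition about what an Elo gap means, the standard Elo model maps a rating difference to a head-to-head preference probability on a 400-point logistic scale:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the
    standard Elo model (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Visual Quality row above: Gemini 3.1 Flash (1140.0) vs GPT-Image 1.5 (1043.0)
p = elo_expected_score(1140.0, 1043.0)
print(f"{p:.2f}")  # → 0.64
```

So a roughly 100-point lead corresponds to about a 64% chance that raters prefer the higher-rated model in a side-by-side comparison.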

Nano Banana 2 vs Nano Banana vs Nano Banana Pro

Model | Positioning | Representative benchmark/notes
Gemini 3.1 Flash Image (Nano Banana 2) | Flash tier: speed + high visual quality (2K–4K) | Overall preference 1079.0 ± 7.0; visual quality 1140.0 ± 6.0 (internal GenAI-Bench).
Gemini 2.5 Flash Image (Nano Banana) | Earlier Flash release (lower fidelity) | Slightly lower preference/visual scores vs 3.1.
Gemini 3 Pro Image (Nano Banana Pro) | Pro tier: higher perceived fidelity for complex tasks, higher cost/latency | Different tradeoffs; some metrics show different relative rankings in specialty tasks.
GPT-Image 1.5 / other commercial models | Competitors (open/closed) | In Google's internal benchmarks, GPT-Image and others scored below Gemini 3.1 on visual quality and overall preference; independent third-party comparisons vary.

When to choose Flash Image Preview:

  • Real-time image preview in apps
  • Cost-sensitive large-scale image generation
  • Interactive design assistants

How to access and integrate Nano Banana 2

Step 1: Sign Up for API Key

Log in at cometapi.com (register first if you don't have an account yet). In the CometAPI console, open the API token page in your personal center, click “Add Token”, and copy the generated key (format: sk-xxxxx).

Step 2: Send Requests to Nano Banana 2 API

Select the “gemini-3.1-flash-image-preview” endpoint, then build and send the request body. The request method and body format are described in our API documentation, and the site also provides an Apifox collection for convenient testing. Replace <YOUR_API_KEY> with the actual CometAPI key from your account. The endpoint is documented under “Gemini generates image” in the API reference.

Nano Banana 2 supports image editing, image generation, and multi-image workflows. For image editing, include the source image (for example, an image URL) in the request. For additional parameters, refer to the documentation.
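As a minimal sketch of an editing request, the snippet below builds a Gemini-style body with the source image inlined as base64. The `inlineData` part shape follows the v1beta schema used by the samples on this page; confirm exact field names in the CometAPI docs.

```python
import base64
import json

def build_edit_request(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Build a :generateContent body that pairs an edit instruction with a
    source image, inlined as base64 (Gemini-style inlineData part)."""
    return json.dumps({
        "contents": [{
            "role": "user",
            "parts": [
                {"text": prompt},
                {"inlineData": {
                    "mimeType": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ],
        }],
        "generationConfig": {"responseModalities": ["IMAGE"]},
    })

# Usage sketch: fetch the source image bytes from its URL first, e.g. with
# urllib.request.urlopen(url).read(), then POST the body to
# /v1beta/models/gemini-3.1-flash-image-preview:generateContent
# with your Authorization header.
```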

Step 3: Retrieve and Verify Results

Parse the API response to extract the generated output. The response includes the task status and the output data. In the playground you can download the image directly to your machine (usually as PNG). When calling the API, the generated image URL is temporary, so download it promptly.
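A small helper for that last step, assuming the response (or the playground) hands you a temporary image URL:

```python
import os
import urllib.parse
import urllib.request

def local_path_for(image_url: str, output_dir: str = "./output") -> str:
    """Derive a local filename from the image URL (falls back to image.png)."""
    name = os.path.basename(urllib.parse.urlparse(image_url).path) or "image.png"
    return os.path.join(output_dir, name)

def save_image_from_url(image_url: str, output_dir: str = "./output") -> str:
    """Download the generated image before its temporary URL expires."""
    os.makedirs(output_dir, exist_ok=True)
    path = local_path_for(image_url, output_dir)
    urllib.request.urlretrieve(image_url, path)
    return path
```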

FAQ

What exactly is Nano Banana 2 and what does it do?

Nano Banana 2 is Google’s latest AI image generation and editing model, built on Gemini Flash image technology to deliver fast, high-quality visual generation and precise instruction following across text and image inputs.

How does Nano Banana 2 relate to Gemini 3.1 Flash Image?

Nano Banana 2 is essentially the consumer-facing branding for Google’s Gemini 3.1 Flash Image model, combining advanced capabilities from previous Nano Banana versions with the speed of Flash models.

What improvements does Nano Banana 2 add over earlier Nano Banana models?

Nano Banana 2 brings faster generation, sharper detail, better instruction fidelity, enhanced text rendering and localized translation, and broader creative control, while making many Pro-grade features available at the base tier.

What kinds of images and resolutions can Nano Banana 2 generate?

The model supports flexible output with various aspect ratios and resolutions up to 4K, suitable for social media, ads, displays, and professional content.

Can Nano Banana 2 maintain consistency in complex compositions?

Yes — it preserves consistency across multiple subjects and objects (e.g., up to five characters and 14 objects in a single prompt workflow), helping with narrative scenes and storyboard-style tasks.

What image generation use cases is Gemini 3.1 Flash Image best suited for?

It’s well-suited for professional-grade image creation and editing, infographics, multi-image consistency, text rendering, and localized multilingual outputs, especially when workflows need precise control and repeated iterations.

Does Nano Banana 2 use real-time information or world knowledge?

Nano Banana 2 incorporates real-world knowledge and image search integration to help generate more accurate subjects, infographics, and location-aware visuals.

Can Gemini 3.1 Flash Image generate detailed text within images or diagrams?

Yes — it can generate and render clear text within images, but extremely small or dense multi-paragraph text sometimes remains challenging.

Features for Nano Banana 2

Explore the key features of Nano Banana 2, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for Nano Banana 2

Explore competitive pricing for Nano Banana 2, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how Nano Banana 2 can enhance your projects while keeping costs manageable.

nano-banana-2 (image)

variant / alias | Price
gemini-3.1-flash-image (0.5K) | ≈ $0.03600
gemini-3.1-flash-image (1K) | ≈ $0.05360
gemini-3.1-flash-image (2K) | ≈ $0.08080
gemini-3.1-flash-image (4K) | ≈ $0.12080
gemini-3.1-flash-image-preview (0.5K) | ≈ $0.03600
gemini-3.1-flash-image-preview (1K) | ≈ $0.05360
gemini-3.1-flash-image-preview (2K) | ≈ $0.08080
gemini-3.1-flash-image-preview (4K) | ≈ $0.12080
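Using the per-image prices in the table above, a quick back-of-the-envelope budget estimate (prices copied from this page; actual billing may differ):

```python
# Per-image prices for gemini-3.1-flash-image, copied from the table above
PRICE_PER_IMAGE = {"0.5K": 0.03600, "1K": 0.05360, "2K": 0.08080, "4K": 0.12080}

def monthly_cost(images_per_day: int, resolution: str = "1K", days: int = 30) -> float:
    """Estimated monthly spend in USD for a steady daily volume."""
    return round(images_per_day * days * PRICE_PER_IMAGE[resolution], 2)

print(monthly_cost(1000, "1K"))  # 1,000 images/day at 1K → 1608.0 USD/month
```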

Sample code and API for Nano Banana 2

Access comprehensive sample code and API resources for Nano Banana 2 to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of Nano Banana 2 in your projects.
POST
/v1beta/models/{model}:generateContent

Python Code Example

from google import genai
from google.genai import types
from PIL import Image
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com"

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": BASE_URL},
    api_key=COMETAPI_KEY,
)

prompt = (
    "A woman leaning on a wooden railing of a traditional Chinese building. "
    "She is wearing a blue cheongsam with pink and red floral motifs and a headdress "
    "made of colorful flowers, including roses and lilacs. Realistic painting style, "
    "focusing on the textural details of the clothing patterns and wooden buildings."
)
aspect_ratio = "9:16"  # "1:1","2:3","3:2","3:4","4:3","4:5","5:4","9:16","16:9","21:9"

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=[prompt],
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=types.ImageConfig(aspect_ratio=aspect_ratio),
    ),
)

os.makedirs("./output", exist_ok=True)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        output_path = "./output/gemini-3.1-flash-image-preview.png"
        image.save(output_path)
        print(f"Image saved to {output_path}")

JavaScript Code Example

import fs from "fs";
import path from "path";

// Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1beta";
const model = "gemini-3.1-flash-image-preview";

const prompt =
  "A woman leaning on a wooden railing of a traditional Chinese building. " +
  "She is wearing a blue cheongsam with pink and red floral motifs and a headdress " +
  "made of colorful flowers, including roses and lilacs. Realistic painting style, " +
  "focusing on the textural details of the clothing patterns and wooden buildings.";

const response = await fetch(`${base_url}/models/${model}:generateContent`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: api_key,
  },
  body: JSON.stringify({
    contents: [
      {
        role: "user",
        parts: [{ text: prompt }],
      },
    ],
    generationConfig: {
      responseModalities: ["IMAGE"],
      imageConfig: {
        aspectRatio: "9:16",
      },
    },
  }),
});

const data = await response.json();

const outputDir = "./output";
if (!fs.existsSync(outputDir)) {
  fs.mkdirSync(outputDir, { recursive: true });
}

for (const candidate of data.candidates) {
  for (const part of candidate.content.parts) {
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageBuffer = Buffer.from(part.inlineData.data, "base64");
      const outputPath = path.join(outputDir, "gemini-3.1-flash-image-preview.png");
      fs.writeFileSync(outputPath, imageBuffer);
      console.log(`Image saved to ${outputPath}`);
    }
  }
}

Curl Code Example

# Get your CometAPI key from https://api.cometapi.com/console/token
# Export it as: export COMETAPI_KEY="your-key-here"

mkdir -p ./output

curl -s "https://api.cometapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
  -H "Authorization: $COMETAPI_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "A woman leaning on a wooden railing of a traditional Chinese building. She is wearing a blue cheongsam with pink and red floral motifs and a headdress made of colorful flowers, including roses and lilacs. Realistic painting style, focusing on the textural details of the clothing patterns and wooden buildings."
          }
        ]
      }
    ],
    "generationConfig": {
      "responseModalities": ["IMAGE"],
      "imageConfig": {
        "aspectRatio": "9:16"
      }
    }
  }' | python3 -c "
import sys, json, base64
data = json.load(sys.stdin)
parts = data['candidates'][0]['content']['parts']
for part in parts:
    if 'text' in part:
        print(part['text'])
    elif 'inlineData' in part:
        img = base64.b64decode(part['inlineData']['data'])
        with open('./output/gemini-3.1-flash-image-preview.png', 'wb') as f:
            f.write(img)
        print('Image saved to ./output/gemini-3.1-flash-image-preview.png')
"

Versions of Nano Banana 2

Nano Banana 2 has multiple snapshots for several possible reasons: output can change after updates, so older snapshots preserve consistency; snapshots give developers a transition period for adaptation and migration; and different snapshots may map to global or regional endpoints to optimize the user experience. For detailed differences between versions, refer to the official documentation.
Model id | Description | Availability | Request
gemini-3.1-flash-image | Recommended; points to the latest model | ✅ | Gemini generates image
gemini-3.1-flash-image-preview | Official preview | ✅ | Gemini generates image

More Models

Doubao Seedream 5

Per Request: $0.028
Seedream 5.0 Lite is a unified multimodal image generation model endowed with deep thinking and online search capabilities, featuring an all-round upgrade in its understanding, reasoning, and generation capabilities.
FLUX 2 MAX

Per Request: $0.008
FLUX.2 [max] is a top-tier visual-intelligence model from Black Forest Labs (BFL) designed for production workflows: marketing, product photography, e-commerce, creative pipelines, and any application that requires consistent character/product identity, accurate text rendering, and photoreal detail at multi-megapixel resolutions. The architecture is engineered for strong prompt-following, multi-reference fusion (up to ten input images), and grounded generation (ability to incorporate up-to-date web context when producing images).
Black Forest Labs/FLUX 2 MAX

Per Request: $0.056
FLUX.2 [max] is the flagship, highest-quality variant of the FLUX.2 family from Black Forest Labs (BFL). It is positioned as a professional-grade text→image generation and image-editing model that focuses on maximal fidelity, prompt adherence, and editing consistency across characters, objects, lighting and color. BFL and partner registries describe FLUX.2 [max] as the top-tier FLUX.2 variant with features for multi-reference editing, grounded generation.
GPT Image 1.5

Input: $6.4/M
Output: $25.6/M
GPT-Image-1.5 is OpenAI’s image model in the GPT Image family. It is a natively multimodal GPT model designed to generate images from text prompts and to perform high-fidelity edits of input images while following user instructions closely.
Doubao Seedream 4.5

Per Request: $0.032
Seedream 4.5 is ByteDance/Seed’s multimodal image model (text→image + image editing) that focuses on production-grade image fidelity, stronger prompt adherence, and much-improved editing consistency (subject preservation, text/typography rendering, and facial realism).
Black Forest Labs/FLUX 2 PRO

Per Request: $0.06
FLUX 2 PRO is the flagship commercial model in the FLUX 2 series, delivering state-of-the-art image generation with unprecedented quality and detail. Built for professional and enterprise applications, it offers superior prompt adherence, photorealistic outputs, and exceptional artistic capabilities. This model represents the cutting edge of AI image synthesis technology.

Related Blog

How Much Does OpenClaw Cost in 2026? Complete Pricing Breakdown
Apr 13, 2026

OpenClaw’s core software is 100% free (MIT license). Real-world monthly costs range from $0–$13 for light personal use (free-tier hosting + cheap models) to $25–$100 for small teams and $100–$200+ for heavy automation. The official OpenClaw Cloud managed plan is a flat $59/month ($29.50 first month). API tokens are the biggest variable; smart optimization can slash them by 90%.
GPT Image 1.5 vs Seedream 4.5: Which Is Better in 2026?
Apr 12, 2026

GPT Image 1.5 (OpenAI, Dec 2025) leads with 4× faster generation (5–15 seconds), top-tier LM Arena ELO scores (~1,264–1,285), and superior instruction-following for editing. Seedream 4.5 (ByteDance, Dec 2025) excels in typography, 4K resolution, multi-image consistency (up to 14 references), and flat $0.04/image pricing. Choose GPT Image 1.5 for speed and versatility; Seedream 4.5 for design-heavy commercial work. Both are accessible affordably via CometAPI’s unified platform for 20%+ savings and single-key integration.
How Long Does ChatGPT Take to Generate an Image in 2026?
Apr 9, 2026

In 2026, ChatGPT typically generates an image in 5–20 seconds using its latest GPT-Image 1.5 model (the successor to DALL·E 3). Simple prompts finish in as little as 3–8 seconds, while complex or high-detail requests can take 20–60 seconds during peak hours. Free users often wait longer (30–60+ seconds), whereas Plus/Pro subscribers benefit from priority processing. These times represent a major improvement over 2024–2025 DALL·E 3 averages of 15–30 seconds, thanks to OpenAI’s December 2025 GPT-Image 1.5 upgrade that delivers up to 4× faster inference.
Alibaba Wan2.7-Image Review 2026: Revolutionary Unified AI Image Model
Apr 3, 2026

Wan2.7-Image is Alibaba Cloud’s newly launched unified image model, announced on April 1, 2026. It combines image generation, image editing, and visual understanding in one workflow, supports multi-image input, and is designed for faster generation than the Pro variant. Alibaba says the model can handle text-to-image, image editing, image-set generation, and multiple reference images, while Wan2.7-Image-Pro adds 4K output and more stable composition.
Luma AI Unit-1 Image Model (2026): Comprehensive Analysis & Comparison
Mar 24, 2026

Luma AI’s Uni-1 is a next-generation autoregressive multimodal image model that unifies image generation and visual understanding into a single architecture. Unlike diffusion models, it processes text and image tokens in a shared sequence, enabling superior reasoning, editing, and multi-turn creative workflows. Uni-1 outperforms competitors like GPT Image 1.5 and Nano Banana 2 on logic-based benchmarks such as RISEBench.