GPT 4o Image

Per Request:$0.04
gpt-4o-image generates images as output, optionally using images as input.
New
Commercial Use

Technical Specifications of gpt-4o-image

Specification: Details
Model ID: gpt-4o-image
Model Type: Multimodal image generation model
Input Modalities: Text, image
Output Modalities: Image
Primary Use Cases: Text-to-image generation, image-to-image generation, visual editing, creative asset production
Context Support: Text prompts with optional image inputs
Streaming: Not typically required for image output workflows
Tool / Function Calling: Not applicable for core image generation
Response Format: Generated image output, typically returned through API response payload or referenced asset data
Best For: Applications that need generated images from prompts, optionally guided by input images

What is gpt-4o-image?

gpt-4o-image is a multimodal image generation model exposed through CometAPI that is designed to generate images as output, with support for optional image inputs alongside text prompts. It is well suited for products that need to create visual content from natural language descriptions, transform existing images, or build image-driven creative workflows.

Because it can work from prompt-only input or combine prompt instructions with reference imagery, gpt-4o-image fits a wide range of use cases such as concept art generation, marketing creatives, product mockups, design exploration, and iterative visual editing. Through CometAPI, developers can access gpt-4o-image using a consistent API integration pattern across providers and models.

Main features of gpt-4o-image

  • Text-to-image generation: Create original images from natural language prompts for creative, design, and production workflows.
  • Image-conditioned generation: Use one or more input images to guide composition, style, subject matter, or transformations.
  • Visual iteration: Refine outputs across repeated requests by adjusting prompt details and image references.
  • Creative flexibility: Support a broad range of visual use cases, including illustrations, marketing assets, mockups, and conceptual design.
  • Multimodal prompting: Combine descriptive text with image inputs to achieve more controlled and context-aware results.
  • Developer-friendly access: Integrate gpt-4o-image through CometAPI’s unified model access layer and standardized API workflow.

How to access and integrate gpt-4o-image

Step 1: Sign Up for an API Key

Sign up on CometAPI and create an API key from the dashboard. After generating your key, store it securely and use it to authenticate requests to the CometAPI endpoint.
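
One way to keep the key out of source code is to read it from an environment variable at startup. The sketch below assumes the variable is named COMETAPI_API_KEY, as in the curl example on this page; the helper name build_auth_headers is hypothetical and not part of any CometAPI SDK.

```python
import os


def build_auth_headers(env_var: str = "COMETAPI_API_KEY") -> dict[str, str]:
    """Build request headers from an API key stored in the environment.

    The variable name is an assumption borrowed from the curl example;
    use whatever name your deployment already relies on.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing before any request is sent makes a missing key easier to diagnose than an authentication error returned by the server.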

Step 2: Send Requests to gpt-4o-image API

Use CometAPI’s OpenAI-compatible API format and set the model field to gpt-4o-image.

curl --request POST \
  --url https://api.cometapi.com/v1/responses \
  --header "Authorization: Bearer $COMETAPI_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "gpt-4o-image",
    "input": [
      {
        "role": "user",
        "content": [
          { "type": "input_text", "text": "Generate a clean modern product poster for a smartwatch on a soft studio background." }
        ]
      }
    ]
  }'
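
The same request can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl command above; the function names build_request and send are hypothetical, and the 120-second timeout is an assumption, since the page does not state how long image generation takes.

```python
import json
import urllib.request

API_URL = "https://api.cometapi.com/v1/responses"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "model": "gpt-4o-image",
        "input": [
            {
                "role": "user",
                "content": [{"type": "input_text", "text": prompt}],
            }
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and parse the JSON response body.

    The timeout value is an assumption; tune it for your workload.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating construction from sending lets you inspect or log the exact payload before spending a billed request.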

You can also include image inputs in the request when building image-to-image or guided generation workflows, depending on your application’s needs.
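
A guided-generation payload might look like the sketch below. It assumes the endpoint accepts an input_image content part carrying a base64 data URL, which is the shape used by OpenAI's Responses API; this page does not confirm that CometAPI's gpt-4o-image accepts it, so check the API documentation first. The helper name build_image_guided_payload is hypothetical.

```python
import base64


def build_image_guided_payload(
    prompt: str, image_bytes: bytes, mime_type: str = "image/png"
) -> dict:
    """Combine a text prompt with one reference image in a single request body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "model": "gpt-4o-image",
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": prompt},
                    # Assumption: the "input_image" part type and "image_url"
                    # field follow the OpenAI Responses API convention.
                    {"type": "input_image", "image_url": data_url},
                ],
            }
        ],
    }
```

In practice image_bytes would come from reading a local file in binary mode; the resulting dictionary is serialized to JSON and posted the same way as the text-only request.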

Step 3: Retrieve and Verify Results

Read the API response, extract the generated image result from the returned output structure, and verify that the image matches your prompt, formatting expectations, and application requirements before displaying it to end users or storing it in your system.
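
Part of that verification can be automated. The sketch below assumes you have already pulled a base64-encoded image string out of the response; where that string sits in the payload is not specified on this page, and the result may instead be a URL. It decodes the string and checks the leading bytes against the standard PNG, JPEG, and WebP file signatures. The function names are hypothetical.

```python
import base64
from typing import Optional

# Well-known leading bytes of common image file formats.
_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return 'png', 'jpeg', or 'webp' if the bytes start like that format."""
    for name, signature in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    # WebP is a RIFF container with 'WEBP' at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def decode_and_verify(b64_image: str) -> tuple[bytes, str]:
    """Decode a base64 image string and confirm it looks like an image."""
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(b64_image, validate=True)
    image_format = sniff_image_format(data)
    if image_format is None:
        raise ValueError("Decoded payload is not a PNG, JPEG, or WebP image.")
    return data, image_format
```

A signature check only confirms the file type. Whether the image matches the prompt still needs human review or a separate evaluation step.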

Pricing for GPT 4o Image

GPT 4o Image is billed per request, so you pay only for the images you generate and your cost scales directly with request volume.
Comet Price (USD, per request): $0.04
Official Price (USD, per request): $0.05
Discount: -20%
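
The discount and a projected bill follow directly from the two per-request prices. The sketch below uses Decimal to avoid floating-point rounding; the function names are hypothetical and the 1,000-request volume is only an illustration.

```python
from decimal import Decimal

COMET_PRICE = Decimal("0.04")     # USD per request, from the pricing listed above
OFFICIAL_PRICE = Decimal("0.05")  # USD per request, from the pricing listed above


def discount_fraction(comet: Decimal, official: Decimal) -> Decimal:
    """Fraction saved relative to the official price."""
    return (official - comet) / official


def cost_for_requests(count: int, price: Decimal = COMET_PRICE) -> Decimal:
    """Total cost in USD for a given number of requests."""
    return price * count


print(discount_fraction(COMET_PRICE, OFFICIAL_PRICE))  # 0.2, i.e. 20%
print(cost_for_requests(1000))                          # 40.00
```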

Sample code and API for GPT 4o Image

Access comprehensive sample code and API resources for GPT 4o Image to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of GPT 4o Image in your projects.
POST /v1/chat/completions

More Models

Nano Banana 2

Input:$0.4/M
Output:$2.4/M
Core Capabilities Overview: Resolution: Up to 4K (4096×4096), on par with Pro. Reference Image Consistency: Up to 14 reference images (10 objects + 4 characters), maintaining style/character consistency. Extreme Aspect Ratios: New 1:4, 4:1, 1:8, 8:1 ratios added, suitable for long images, posters, and banners. Text Rendering: Advanced text generation, suitable for infographics and marketing poster layouts. Search Enhancement: Integrated Google Search + Image Search. Grounding: Built-in thinking process; complex prompts are reasoned before generation.
Doubao Seedream 5

Per Request:$0.028
Seedream 5.0 Lite is a unified multimodal image generation model with deep thinking and online search capabilities, featuring an all-round upgrade in its understanding, reasoning, and generation capabilities.
FLUX 2 MAX

Per Request:$0.008
FLUX.2 [max] is a top-tier visual-intelligence model from Black Forest Labs (BFL) designed for production workflows: marketing, product photography, e-commerce, creative pipelines, and any application that requires consistent character/product identity, accurate text rendering, and photoreal detail at multi-megapixel resolutions. The architecture is engineered for strong prompt-following, multi-reference fusion (up to ten input images), and grounded generation (ability to incorporate up-to-date web context when producing images).
Black Forest Labs/FLUX 2 MAX

Per Request:$0.056
FLUX.2 [max] is the flagship, highest-quality variant of the FLUX.2 family from Black Forest Labs (BFL). It is positioned as a professional-grade text→image generation and image-editing model that focuses on maximal fidelity, prompt adherence, and editing consistency across characters, objects, lighting and color. BFL and partner registries describe FLUX.2 [max] as the top-tier FLUX.2 variant with features for multi-reference editing, grounded generation.
GPT Image 1.5

Input:$6.4/M
Output:$25.6/M
GPT-Image-1.5 is OpenAI’s image model in the GPT Image family. It is a natively multimodal GPT model designed to generate images from text prompts and to perform high-fidelity edits of input images while following user instructions closely.
Doubao Seedream 4.5

Per Request:$0.032
Seedream 4.5 is ByteDance/Seed’s multimodal image model (text→image + image editing) that focuses on production-grade image fidelity, stronger prompt adherence, and much-improved editing consistency (subject preservation, text/typography rendering, and facial realism).