O

GPT-5.2 Pro

Context:400K
Input:$12.00/M
Output:$96.00/M
gpt-5.2-pro is the highest-capability, production-oriented member of OpenAI’s GPT-5.2 family, exposed through the Responses API for workloads that demand maximal fidelity, multi-step reasoning, extensive tool use and the largest context/throughput budgets OpenAI offers.
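The Input and Output rates throughout these listings are dollars per million tokens. A minimal sketch of converting token counts to request cost, using the gpt-5.2-pro rates shown above ($12.00/M input, $96.00/M output); the function name is illustrative:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Convert token counts to dollars using per-million-token rates."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# e.g. a gpt-5.2-pro call with 10,000 input and 2,000 output tokens:
cost = estimate_cost(10_000, 2_000, 12.00, 96.00)  # 0.12 + 0.192 = 0.312 dollars
```

The same arithmetic applies to every per-token entry below; only the two rates change.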
O

GPT-5.2 Chat

Context:128K
Input:$1.40/M
Output:$11.20/M
gpt-5.2-chat-latest is the Chat-optimized snapshot of OpenAI’s GPT-5.2 family (branded in ChatGPT as GPT-5.2 Instant). It is the model for interactive/chat use cases that need a blend of speed, long-context handling, multimodal inputs and reliable conversational behaviour.
O

GPT-5.2

Context:400K
Input:$1.40/M
Output:$11.20/M
GPT-5.2 is a multi-flavored model suite (Instant, Thinking, Pro) engineered for better long-context understanding, stronger coding and tool use, and materially higher performance on professional “knowledge-work” benchmarks.
O

GPT-5.1 Chat

Context:400K
Input:$1.00/M
Output:$8.00/M
GPT-5.1 Chat is an instruction-tuned conversational language model for general-purpose chat, reasoning, and writing. It supports multi-turn dialogue, summarization, drafting, knowledge-base QA, and lightweight code assistance for in-app assistants, support automation, and workflow copilots. Technical highlights include chat-optimized alignment, controllable and structured outputs, and integration paths for tool invocation and retrieval workflows when available.
O

GPT-5.1

Input:$1.00/M
Output:$8.00/M
GPT-5.1 is a general-purpose instruction-tuned language model focused on text generation and reasoning across product workflows. It supports multi-turn dialogue, structured output formatting, and code-oriented tasks such as drafting, refactoring, and explanation. Typical uses include chat assistants, retrieval-augmented QA, data transformation, and agent-style automation with tools or APIs when supported. Technical highlights include text-centric modality, instruction following, JSON-style outputs, and compatibility with function calling in common orchestration frameworks.
O

GPT-5 nano

Context:400K
Input:$0.04/M
Output:$0.32/M
GPT-5 Nano is an artificial intelligence model provided by OpenAI.
O

GPT-5 mini

Context:400K
Input:$0.20/M
Output:$1.60/M
GPT-5 mini is OpenAI’s cost- and latency-optimized member of the GPT-5 family, intended to deliver much of GPT-5’s multimodal and instruction-following strengths at substantially lower cost for large-scale production use. It targets environments where throughput, predictable per-token pricing, and fast responses are the primary constraints while still providing strong general-purpose capabilities.
O

GPT-5 Chat

Context:400K
Input:$1.00/M
Output:$8.00/M
GPT-5 Chat (latest) is an artificial intelligence model provided by OpenAI.
O

GPT-5

Context:400K
Input:$1.00/M
Output:$8.00/M
GPT-5 is OpenAI's most powerful coding model to date, with significant improvements in complex front-end generation and debugging of large codebases. It can turn a single prompt into polished, responsive websites, applications, and games with a keen sense of aesthetics. Early testers have also noted its design choices, including a deeper understanding of elements like spacing, typography, and white space.
O

GPT-4.1 nano

Context:1.0M
Input:$0.08/M
Output:$0.32/M
GPT-4.1 nano is an artificial intelligence model provided by OpenAI. gpt-4.1-nano features a larger context window of up to 1 million tokens, improved long-context understanding that makes better use of that window, and an updated knowledge cutoff of June 2024. This model supports a maximum context length of 1,047,576 tokens.
O

GPT-4.1

Context:1.0M
Input:$1.60/M
Output:$6.40/M
GPT-4.1 is an artificial intelligence model provided by OpenAI. gpt-4.1 features a larger context window of up to 1 million tokens, improved long-context understanding that makes better use of that window, and an updated knowledge cutoff of June 2024. This model supports a maximum context length of 1,047,576 tokens.
O

GPT-4o mini

Input:$0.12/M
Output:$0.48/M
GPT-4o mini is an artificial intelligence model provided by OpenAI.
O

Whisper-1

Input:$24.00/M
Output:$24.00/M
Speech-to-text transcription, with support for creating translations.
O

TTS

Input:$12.00/M
Output:$12.00/M
OpenAI Text-to-Speech
O

Sora 2 Pro

Per Second:$0.24
Sora 2 Pro is the most advanced and powerful media generation model in the Sora family, capable of generating videos with synchronized audio. It can create detailed, dynamic video clips from natural language or images.
O

Sora 2

Per Second:$0.08
A powerful video generation model with synchronized sound effects that supports a chat-style prompting format.
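Unlike the token-priced entries, the Sora models above are billed per second of generated video. A minimal sketch using the listed rates ($0.08/s for Sora 2, $0.24/s for Sora 2 Pro); the dictionary keys are illustrative model names:

```python
# Per-second rates as listed above (dollars per second of generated video).
VIDEO_RATES_PER_SECOND = {"sora-2": 0.08, "sora-2-pro": 0.24}

def video_cost(model: str, seconds: float) -> float:
    """Dollar cost of a clip billed per second of generated video."""
    return VIDEO_RATES_PER_SECOND[model] * seconds

# a 10-second clip costs 0.80 dollars on sora-2 and 2.40 on sora-2-pro
```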
O

GPT Image 1 mini

Input:$2.00/M
Output:$6.40/M
A cost-optimized version of GPT Image 1. It is a natively multimodal language model that accepts both text and image input and generates image output.
O

GPT 4.1 mini

Context:1.0M
Input:$0.32/M
Output:$1.28/M
GPT-4.1 mini is an artificial intelligence model provided by OpenAI. gpt-4.1-mini represents a significant leap in small-model performance, meeting or exceeding GPT-4o on many intelligence benchmarks while reducing latency by nearly half and cost by 83%. This model supports a maximum context length of 1,047,576 tokens.
O

o4-mini-deep-research

Context:200K
Input:$1.60/M
Output:$6.40/M
o4-mini-deep-research is OpenAI's agentic reasoning model, combining the lightweight o4-mini backbone with the advanced Deep Research framework. Designed for fast, cost-efficient deep information synthesis, it lets developers and researchers perform automated web searches, data analysis, and chain-of-thought reasoning within a single API call.
O

o4-mini

Input:$0.88/M
Output:$3.52/M
O4-mini is an artificial intelligence model provided by OpenAI.
O

O3 Pro

Context:200K
Input:$16.00/M
Output:$64.00/M
OpenAI o3-pro is a "pro" variant of the o3 reasoning model, engineered to think longer and deliver the most dependable responses. Trained with private chain-of-thought reinforcement learning, it sets new state-of-the-art benchmarks across domains such as science, programming, and business, while autonomously integrating tools like web search, file analysis, Python execution, and visual reasoning within the API.
O

o3-mini

Input:$0.88/M
Output:$3.52/M
O3-mini is an artificial intelligence model provided by OpenAI.
O

o3-deep-research

Input:$8.00/M
Output:$32.00/M
A web-connected deep research agent based on the o3 model, supporting multi-step inference and citation-backed analysis reports.
O

o3

Input:$1.60/M
Output:$6.40/M
O3 is an artificial intelligence model provided by OpenAI.
O

GPT-4o mini Audio

Input:$0.12/M
Output:$0.48/M
GPT-4o mini Audio is a multimodal model for speech and text interactions. It performs speech recognition, translation, and text-to-speech, follows instructions, and can call tools for structured actions with streaming responses. Typical uses include real-time voice assistants, live captioning and translation, call summarization, and voice-controlled applications. Technical highlights include audio input and output, streaming responses, function calling, and structured JSON output.
O

codex-mini-latest

Input:$1.20/M
Output:$4.80/M
Codex Mini is an artificial intelligence model provided by OpenAI. It is OpenAI's latest achievement in code generation, a lightweight model specifically optimized for the Codex command-line interface (CLI). As a fine-tuned version of o4-mini, this model inherits the base model's high efficiency and response speed while being specially optimized for code understanding and generation.
O

GPT-4o mini TTS

Input:$9.60/M
Output:$38.40/M
GPT-4o mini TTS is a neural text-to-speech model designed for natural, low-latency voice generation in user-facing applications. It converts text to natural-sounding speech with selectable voices, multi-format output, and streaming synthesis for responsive experiences. Typical uses include voice assistants, IVR and contact flows, product read-aloud, and media narration. Technical highlights include API-based streaming and export to common audio formats such as MP3 and WAV.
O

GPT-4o Realtime

Input:$60.00/M
Output:$240.00/M
The Realtime API allows developers to build low-latency, multimodal experiences, including speech-to-speech functionality. Text and audio processed by the Realtime API are priced separately. This model supports a maximum context length of 128,000 tokens.
O

GPT-4o Search

Input:$60.00/M
Output:$60.00/M
GPT-4o Search is a GPT-4o-based multimodal model configured for search-augmented reasoning and grounded, current answers. It follows instructions and uses web search tools to retrieve, evaluate, and synthesize external information, with source context when available. Typical uses include research assistance, fact-checking, news and trend monitoring, and answering time-sensitive queries. Technical highlights include tool/function calling for browsing and retrieval, long-context handling, and structured outputs suitable for citations and links.
O

ChatGPT-4o

Input:$4.00/M
Output:$12.00/M
Based on the latest iteration of GPT-4o, a multimodal large language model (LLM) that supports text, image, audio, and video input/output.
O

tts-1-hd-1106

Input:$24.00/M
Output:$24.00/M
O

tts-1-hd

Input:$24.00/M
Output:$24.00/M
O

tts-1-1106

Input:$12.00/M
Output:$12.00/M
O

tts-1

Input:$12.00/M
Output:$12.00/M
O

text-embedding-ada-002

Input:$0.08/M
Output:$0.08/M
An Ada-based text embedding model optimized for various NLP tasks.
O

text-embedding-3-small

Input:$0.02/M
Output:$0.02/M
A small text embedding model for efficient processing.
O

text-embedding-3-large

Input:$0.10/M
Output:$0.10/M
A large text embedding model for a wide range of natural language processing tasks.
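The embedding models above return fixed-length vectors; downstream tasks such as search and clustering typically compare them with cosine similarity. A minimal sketch with toy 3-dimensional placeholder vectors (real embeddings from these models have hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# vectors pointing the same way score 1.0, orthogonal vectors 0.0
print(cosine_similarity([1.0, 0.0, 0.0], [2.0, 0.0, 0.0]))  # 1.0
```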
O

omni-moderation-latest

Per Request:$0.00
O

omni-moderation-2024-09-26

Per Request:$0.00
O

o1-pro-all

Input:$120.00/M
Output:$480.00/M
O

o1-pro-2025-03-19

Input:$120.00/M
Output:$480.00/M
O

o1-pro

Input:$120.00/M
Output:$480.00/M
O1-pro is an artificial intelligence model provided by OpenAI.
O

o1-preview-all

Per Request:$0.16
O

o1-preview-2024-09-12

Input:$12.00/M
Output:$48.00/M
O

o1-preview

Input:$12.00/M
Output:$48.00/M
O1-preview is an artificial intelligence model provided by OpenAI.
O

o1-mini-all

Per Request:$0.08
O

o1-mini-2024-09-12

Input:$0.88/M
Output:$3.52/M
O

o1-mini

Input:$0.88/M
Output:$3.52/M
O1-mini is an artificial intelligence model provided by OpenAI.
O

o1-all

Per Request:$0.16
O

o1-2024-12-17

Input:$12.00/M
Output:$48.00/M
O

o1

Input:$12.00/M
Output:$48.00/M
O1 is an artificial intelligence model provided by OpenAI.
O

gpt-realtime-mini

Input:$0.48/M
Output:$0.96/M
An economical version of the realtime GPT model, capable of responding to audio and text input in real time via WebRTC, WebSocket, or SIP connections.
C

gpt-oss-20b

Input:$0.08/M
Output:$0.32/M
gpt-oss-20b is an artificial intelligence model provided by cloudflare-workers-ai.
C

gpt-oss-120b

Input:$0.16/M
Output:$0.80/M
gpt-oss-120b is an artificial intelligence model provided by cloudflare-workers-ai.
O

gpt-image-1

Input:$8.00/M
Output:$32.00/M
An advanced AI model for generating images from text descriptions.
O

gpt-4o-all

Input:$2.00/M
Output:$8.00/M
GPT-4o is OpenAI's most advanced multimodal model, faster and cheaper than GPT-4 Turbo, with stronger visual capabilities and a knowledge cutoff of October 2023. Models in the 1106 series and above support tool_calls and function_call. This model supports a maximum context length of 128,000 tokens.
O

gpt-4-vision-preview

Input:$2.00/M
Output:$8.00/M
This model supports a maximum context length of 128,000 tokens.
O

gpt-4-vision

Input:$8.00/M
Output:$24.00/M
This model supports a maximum context length of 128,000 tokens.
O

gpt-4-v

Per Request:$0.04
O

gpt-4-turbo-preview

Input:$8.00/M
Output:$24.00/M
gpt-4-turbo-preview is an upgraded version with stronger code generation capabilities, reduced model "laziness", and fixed non-English UTF-8 generation issues. This model supports a maximum context length of 128,000 tokens.
O

gpt-4-turbo-2024-04-09

Input:$8.00/M
Output:$24.00/M
gpt-4-turbo-2024-04-09 is an upgraded version with stronger code generation capabilities, reduced model "laziness", and fixed non-English UTF-8 generation issues. This model supports a maximum context length of 128,000 tokens.
O

gpt-4-turbo

Input:$8.00/M
Output:$24.00/M
GPT-4 Turbo is an artificial intelligence model provided by OpenAI.
O

gpt-4-search

Per Request:$0.04
O

gpt-4-gizmo-*

Input:$24.00/M
Output:$48.00/M
O

gpt-4-gizmo

Input:$24.00/M
Output:$48.00/M
O

gpt-4-dalle

Per Request:$0.04
O

gpt-4-all

Input:$24.00/M
Output:$48.00/M
A

gpt-4-32k

Input:$48.00/M
Output:$96.00/M
GPT-4 32K is an artificial intelligence model provided by Azure.
O

gpt-4-1106-preview

Input:$8.00/M
Output:$16.00/M
O

gpt-4-0613

Input:$24.00/M
Output:$48.00/M
O

gpt-4-0314

Input:$24.00/M
Output:$48.00/M
O

gpt-4-0125-preview

Input:$8.00/M
Output:$16.00/M
O

gpt-4

Input:$24.00/M
Output:$48.00/M
GPT-4 is an artificial intelligence model provided by OpenAI.
O

gpt-3.5-turbo-0125

Input:$0.40/M
Output:$1.20/M
GPT-3.5 Turbo 0125 is an artificial intelligence model provided by OpenAI. A high-speed official GPT-3.5 series model with tool-call support. This model supports a maximum context length of 4,096 tokens.
O

gpt-3.5-turbo

Input:$0.40/M
Output:$1.20/M
GPT-3.5 Turbo is an artificial intelligence model provided by OpenAI. A high-speed official GPT-3.5 series model with tool-call support. This model supports a maximum context length of 4,096 tokens.
O

dall-e-3

Per Request:$0.02
New version of DALL-E for image generation.
O

dall-e-2

Input:$8.00/M
Output:$32.00/M
An AI model that generates images from text descriptions.
C

Claude Sonnet 4.5

Context:200K
Input:$2.40/M
Output:$12.00/M
Claude Sonnet 4.5 achieves a significant leap in computer-use capabilities. On OSWorld, a benchmark that tests AI models on real-world computer tasks, Sonnet 4.5 jumped to the top with 61.4%, whereas just four months prior Sonnet 4 led with 42.2%. Anthropic's Claude for Chrome extension puts these upgraded capabilities into practice.
A

Claude Opus 4.5

Context:200K
Input:$4.00/M
Output:$20.00/M
Claude Opus 4.5 is an instruction-tuned large language model from Anthropic designed for complex reasoning, coding, and multi-turn dialogue. It supports extended context handling, tool/function calling, structured outputs, and integration with retrieval-augmented workflows. Typical uses include analytical assistants, code generation and review, knowledge-base QA, and content drafting with policy-aligned responses. Technical highlights include instruction following, RAG-friendly behavior, and safety controls available in Claude deployments.
C

Claude Opus 4.1

Context:200K
Input:$12.00/M
Output:$60.00/M
Claude Opus 4.1 is an updated version of Anthropic's flagship model, offering improved performance in coding, inference, and agentic tasks. It achieves 74.5% on SWE-bench Verified, showing significant improvements in multi-file code refactoring, debugging accuracy, and detail-oriented reasoning. The model supports extended reasoning up to 64K tokens and is optimized for research, data analysis, and tool-assisted inference.
C

Claude 4 Sonnet

Context:200K
Input:$2.40/M
Output:$12.00/M
A balance of intelligence, cost, and speed, with a 200K context window.
C

Claude Opus 4

Context:200K
Input:$12.00/M
Output:$60.00/M
Anthropic's most capable model for complex reasoning and coding tasks, with a 200K context window.
C

Claude 3.7 Sonnet

Input:$2.40/M
Output:$12.00/M
Claude 3.7 Sonnet is Anthropic's answer to DeepSeek R1, with support for extended thinking. This model supports a maximum context length of 200,000 tokens.
C

Claude Haiku 4.5

Context:200K
Input:$0.80/M
Output:$4.00/M
Fastest, most cost-effective model.
C

Claude 3.5 Haiku

Input:$0.80/M
Output:$4.00/M
These aliases automatically point to the latest snapshot of a given model. While useful for experimentation, we recommend using specific model versions (e.g., claude-3-5-sonnet-20241022) in production applications to ensure consistent behavior. When we release new model snapshots, we will migrate the -latest alias to point to the new version (typically within one week of the new version's release). The -latest alias has the same rate limits and pricing as the underlying model version it refers to. This model supports a maximum context length of 200,000 tokens.
C

Claude 3 Haiku

Input:$0.20/M
Output:$1.00/M
Claude 3 Haiku is an artificial intelligence model provided by Anthropic.
G

Veo 3.1 Pro

Per Request:$2.00
Veo 3.1 Pro is the high-capability configuration of Google's Veo 3.1 family, a generation of short-form, audio-enabled video models that adds richer native audio, improved narrative and editing controls, and scene-extension tools.
G

Veo 3.1

Per Request:$0.40
Veo 3.1 is Google’s incremental-but-significant update to its Veo text-and-image→video family, adding richer native audio, longer and more controllable video outputs, and finer editing and scene-level controls.
G

Veo 3 Pro

Per Request:$2.00
Veo 3 Pro denotes the production-grade Veo 3 video model experience, with high fidelity, native audio, and extended tooling.
G

Veo 3 Fast

Per Request:$0.40
Veo 3 Fast is Google’s speed-optimized variant of the Veo family of generative video models (Veo 3 / Veo 3.1 etc.). It is engineered to produce short, high-quality video clips with natively generated audio while prioritizing throughput and cost per second—trading some top-end visual fidelity and/or longer single-shot duration for much faster generation and lower price. What is Veo 3 Fast — concise introduction
G

Veo 3

Per Request:$0.40
Google DeepMind’s Veo 3 represents the cutting edge of text-to-video generation, marking the first time a large-scale generative AI model seamlessly synchronizes high-fidelity video with accompanying audio—including dialogue, sound effects, and ambient soundscapes.
G

Gemini 2.5 Pro

Context:1M
Input:$1.00/M
Output:$8.00/M
Gemini 2.5 Pro is an artificial intelligence model provided by Google. It has native multimodal processing capabilities and an ultra-long context window of up to 1 million tokens, providing powerful support for complex, long-sequence tasks. According to Google's data, Gemini 2.5 Pro performs particularly well on complex tasks. This model supports a maximum context length of 1,048,576 tokens.
G

Gemini 2.5 Flash

Context:1M
Input:$0.24/M
Output:$2.00/M
Gemini 2.5 Flash is an AI model developed by Google, designed to provide fast and cost-effective solutions for developers, especially for applications requiring enhanced inference capabilities. The model was released in preview on April 17, 2025, supports multimodal input, and has a context window of 1 million tokens with a maximum output length of 65,536 tokens.
G

Nano Banana

Per Request:$0.03
Gemini 2.5 Flash Image (aka nano-banana) is Google's most advanced image generation and editing model. It can blend multiple images into one, maintain character consistency to tell rich stories, perform targeted transformations using natural language, and leverage Gemini's world knowledge to generate and edit images.
G

Gemini 2.5 Flash Lite

Context:1M
Input:$0.08/M
Output:$0.32/M
An optimized Gemini 2.5 Flash model for high cost-effectiveness and high throughput. The smallest, most cost-effective model, built for large-scale use.
G

Gemini 2.5 Pro DeepSearch

Input:$8.00/M
Output:$64.00/M
Deep search model, with enhanced deep search and information retrieval capabilities, an ideal choice for complex knowledge integration and analysis.
G

Gemini 2.5 Pro (All)

Input:$2.00/M
Output:$16.00/M
Gemini 2.5 Pro (All) is a multimodal model for text and media understanding, designed for general-purpose assistants and grounded reasoning. It handles instruction following, analytical writing, code comprehension, and image/audio understanding with reliable tool/function calling and RAG-friendly behavior. Typical uses include enterprise chat agents, document and UI analysis, visual question answering, and workflow automation. Technical highlights include unified image‑text‑audio inputs, long-context support, structured JSON output, streaming responses, and system-instruction control.
G

Gemini 2.5 Flash DeepSearch

Input:$4.80/M
Output:$38.40/M
Deep search model, with enhanced deep search and information retrieval capabilities, an ideal choice for complex knowledge integration and analysis.
G

Gemini 2.5 Flash (All)

Input:$0.24/M
Output:$2.00/M
Gemini 2.5 Flash is an AI model developed by Google, designed to provide fast and cost-effective solutions for developers, especially for applications requiring enhanced inference capabilities. The model was released in preview on April 17, 2025, supports multimodal input, and has a context window of 1 million tokens with a maximum output length of 65,536 tokens.
G

Gemini 2.0 Flash Lite

Input:$0.08/M
Output:$0.32/M
Gemini 2.0 Flash Lite is a compact, instruction-tuned multimodal model optimized for low-latency, high-throughput inference. It handles text and image understanding, summarization, classification, and lightweight reasoning, with tool/function calling and structured output control. Typical uses include conversational agents, rapid content drafting, metadata extraction from documents or screenshots, and retrieval-augmented workflows. Technical highlights include text-image inputs, streaming generation, function/tool calling, and deployment options suited to latency-sensitive services.
G

Gemini 2.0 Flash

Input:$0.08/M
Output:$0.32/M
Gemini 2.0 Flash is an artificial intelligence model provided by Google-Vertex.
G

Nano Banana Pro

Per Request:$0.19
Nano Banana Pro is the higher-capability tier of Google's Nano Banana image generation and editing line, targeting higher-fidelity generation and editing with stronger text rendering and multi-image consistency. Public technical details are limited.
G

Gemini 3 Pro Preview

Context:200K
Input:$1.60/M
Output:$9.60/M
Gemini 3 Pro Preview is a general-purpose model in the Gemini family, available in preview for evaluation and prototyping. It supports instruction following, multi-turn reasoning, and code and data tasks, with structured outputs and tool/function calling for workflow automation. Typical uses include chat assistants, summarization and rewriting, retrieval-augmented QA, data extraction, and lightweight coding help across apps and services. Technical highlights include API-based deployment, streaming responses, safety controls, and integration readiness, with multimodal capabilities depending on preview configuration.
X

Grok Code Fast 1

Context:256K
Input:$0.16/M
Output:$1.20/M
Grok Code Fast 1 is an AI programming model launched by xAI, designed for fast and efficient basic coding tasks. The model can process 92 tokens per second, has a 256k context window, and is suitable for rapid prototyping, code debugging, and generating simple visual elements.
X

Grok 4 Fast

Context:2M
Input:$0.16/M
Output:$0.40/M
Grok 4 Fast is a new artificial intelligence model launched by xAI, integrating reasoning and non-reasoning capabilities into a single architecture. The model has a 2 million token context window and is designed for high-throughput applications such as search and coding. It comes in two versions, Grok-4-Fast-Reasoning and Grok-4-Fast-Non-Reasoning, optimized for different tasks.
X

Grok 4.1 Fast

Context:2M
Input:$0.16/M
Output:$0.40/M
Grok 4.1 Fast is xAI’s production-focused large model, optimized for agentic tool-calling, long-context workflows, and low-latency inference. It’s a multimodal, two-variant family designed to run autonomous agents that search, execute code, call services, and reason over extremely large contexts (up to 2 million tokens).
X

Grok 4

Context:256K
Input:$2.40/M
Output:$12.00/M
Grok 4 is an artificial intelligence model provided by xAI. It currently supports the text modality, with vision, image generation, and other features coming soon. Its context window supports up to 256,000 tokens, leading among mainstream models.
X

Grok 3 Reasoner

Input:$2.40/M
Output:$12.00/M
The Grok-3 reasoning model with chain-of-thought, xAI's competitor to DeepSeek R1. This model supports a maximum context length of 100,000 tokens.
X

Grok 3 Mini

Input:$0.24/M
Output:$0.40/M
A lightweight model that thinks before responding. Fast, smart, and ideal for logic-based tasks that don't require deep domain knowledge. Raw thought traces are accessible. This model supports a maximum context length of 100,000 tokens.
X

Grok 3 DeepSearch

Input:$2.40/M
Output:$12.00/M
Grok-3 deep networked search model. This model supports a maximum context length of 100,000 tokens.
X

Grok 3 DeeperSearch

Input:$2.40/M
Output:$12.00/M
Grok-3 deep networked search model, superior to grok-3-deepsearch. This model supports a maximum context length of 100,000 tokens.
X

Grok 3

Input:$2.40/M
Output:$12.00/M
Grok-3 is the artificial intelligence chatbot model released by Elon Musk's xAI on February 17, 2025. Its training cluster reached the 200,000-GPU scale, and the model performs excellently on tasks such as mathematics, science, and programming; Musk has hailed it as "the smartest AI on Earth." This model supports a maximum context length of 100,000 tokens.
X

Grok 2

Input:$0.80/M
Output:$0.80/M
Grok 2 is an artificial intelligence model provided by xAI.
D

DeepSeek-V3.2

Context:128K
Input:$0.22/M
Output:$0.35/M
DeepSeek v3.2 is the latest production release in the DeepSeek V3 family: a large, reasoning-first open-weight language model family designed for long-context understanding, robust agent/tool use, advanced reasoning, coding and math.
D

DeepSeek-V3

Input:$0.22/M
Output:$0.88/M
The most popular and cost-effective DeepSeek-V3 model. 671B full-blood version. This model supports a maximum context length of 64,000 tokens.
D

DeepSeek-V3.1

Input:$0.44/M
Output:$1.32/M
DeepSeek V3.1 is the upgrade in DeepSeek’s V-series: a hybrid “thinking / non-thinking” large language model aimed at high-throughput, low-cost general intelligence and agentic tool use. It keeps OpenAI-style API compatibility, adds smarter tool-calling, and—per the company—lands faster generation and improved agent reliability.
D

DeepSeek-R1T2-Chimera

Input:$0.24/M
Output:$0.24/M
A 671B parameter Mixture of Experts text generation model, merged from DeepSeek-AI's R1-0528, R1, and V3-0324, supporting up to 60k tokens of context.
D

DeepSeek-Reasoner

Input:$0.44/M
Output:$1.75/M
DeepSeek-Reasoner is DeepSeek’s reasoning-first family of LLMs and API endpoints designed to (1) expose internal chain-of-thought (CoT) reasoning to callers and (2) operate in “thinking” modes tuned for multi-step planning, math, coding and agent/tool use.
D

DeepSeek-OCR

Per Request:$0.04
DeepSeek-OCR is an optical character recognition model for extracting text from images and documents. It processes scanned pages, photos, and UI screenshots to produce transcriptions with layout cues such as line breaks. Common uses include document digitization, invoice and receipt intake, search indexing, and enabling RPA pipelines. Technical highlights include image-to-text processing, support for scanned and photographed content, and structured text output for downstream parsing.
D

DeepSeek-Chat

Context:64K
Input:$0.22/M
Output:$0.88/M
The most popular and cost-effective DeepSeek-V3 model. 671B full-blood version. This model supports a maximum context length of 64,000 tokens.
Q

Qwen Image

Per Request:$0.03
Qwen-Image is an image generation foundation model released by Alibaba's Tongyi Qianwen (Qwen) team in 2025. With 20 billion parameters, it is based on the MMDiT (Multimodal Diffusion Transformer) architecture. The model achieves significant breakthroughs in complex text rendering and precise image editing, with particularly strong performance on Chinese text rendering.
M

Kimi-K2

Input:$0.45/M
Output:$1.79/M
Moonshot AI's Kimi K2 series, 0905 version (**kimi-k2-250905**), supporting ultra-long context (up to 256K tokens), frontend tasks, and tool calls.
- Enhanced tool calling: high accuracy and seamless integration, suited to complex tasks and integration optimization.
- More efficient performance: 60 to 100 tokens per second on the standard API and higher in Turbo mode, with faster responses, improved inference capabilities, and a knowledge cutoff in mid-2025.
Q

qwen3-max-preview

Input:$0.24/M
Output:$2.42/M
Alibaba's Tongyi Qianwen team's latest Qwen3-Max-Preview model (**qwen3-max-preview**), positioned as the performance peak of the series.
- Powerful multimodal and reasoning capabilities: supports ultra-long context (up to 128K tokens) and multimodal input, excelling at complex reasoning, code generation, translation, and creative content.
- Breakthrough improvements: significantly optimized across multiple technical indicators, with faster response speed and a knowledge cutoff in 2025, suitable for enterprise-grade high-precision AI applications.
Q

qwen3-coder-plus-2025-07-22

Input:$0.24/M
Output:$0.97/M
Qwen3 Coder Plus stable version, released on July 22, 2025, provides higher stability, suitable for production deployment.
Q

qwen3-coder-plus

Input:$0.24/M
Output:$0.97/M
Q

qwen3-coder-480b-a35b-instruct

Input:$0.24/M
Output:$0.97/M
Q

qwen3-coder

Input:$0.24/M
Output:$0.97/M
Q

qwen3-8b

Input:$0.04/M
Output:$0.16/M
Q

qwen3-32b

Input:$1.60/M
Output:$6.40/M
Q

qwen3-30b-a3b

Input:$0.12/M
Output:$0.48/M
A 30B-parameter Mixture of Experts model with about 3 billion activated parameters per token, balancing performance and resource requirements for enterprise-level applications. The MoE design suits scenarios requiring efficient processing of complex tasks, such as intelligent customer service and content generation.
Q

qwen3-235b-a22b

Input:$0.22/M
Output:$2.22/M
Qwen3-235B-A22B is the flagship model of the Qwen3 series, with 235 billion total parameters and about 22 billion activated per token, using a Mixture of Experts (MoE) architecture. It is particularly suitable for complex tasks requiring high-performance inference, such as coding, mathematics, and multimodal applications.
Q

qwen3-14b

Input:$0.80/M
Output:$3.20/M
Q

qwen2.5-vl-72b-instruct

Input:$2.40/M
Output:$7.20/M
Q

qwen2.5-vl-72b

Input:$2.40/M
Output:$7.20/M
Q

qwen2.5-vl-32b-instruct

Input:$2.40/M
Output:$7.20/M
Q

qwen2.5-omni-7b

Input:$60.00/M
Output:$60.00/M
Q

qwen2.5-math-72b-instruct

Input:$3.20/M
Output:$3.20/M
Q

qwen2.5-coder-7b-instruct

Input:$0.80/M
Output:$0.80/M
Q

qwen2.5-coder-32b-instruct

Input:$0.80/M
Output:$0.80/M
Q

qwen2.5-7b-instruct

Input:$0.80/M
Output:$0.80/M
Q

qwen2.5-72b-instruct

Input:$3.20/M
Output:$3.20/M
Q

qwen2.5-32b-instruct

Input:$0.96/M
Output:$0.96/M
Q

qwen2.5-14b-instruct

Input:$3.20/M
Output:$3.20/M
Q

qwen2-vl-7b-instruct

Input:$1.60/M
Output:$1.60/M
Q

qwen2-vl-72b-instruct

Input:$1.60/M
Output:$1.60/M
Q

qwen2-7b-instruct

Input:$0.16/M
Output:$0.16/M
Q

qwen2-72b-instruct

Input:$8.00/M
Output:$8.00/M
Q

qwen2-57b-a14b-instruct

Input:$3.20/M
Output:$3.20/M
Q

qwen2-1.5b-instruct

Input:$0.16/M
Output:$0.16/M
Q

qwen1.5-7b-chat

Input:$0.16/M
Output:$0.16/M
Q

Qwen2.5-72B-Instruct-128K

Input:$3.20/M
Output:$3.20/M
M

mj_turbo_zoom

Per Request:$0.17
M

mj_turbo_variation

Per Request:$0.17
M

mj_turbo_upscale_subtle

Per Request:$0.17
M

mj_turbo_upscale_creative

Per Request:$0.17
M

mj_turbo_upscale

Per Request:$0.02
M

mj_turbo_upload

Per Request:$0.01
M

mj_turbo_shorten

Per Request:$0.17
M

mj_turbo_reroll

Per Request:$0.17
M

mj_turbo_prompt_analyzer_extended

Per Request:$0.00
M

mj_turbo_prompt_analyzer

Per Request:$0.00
M

mj_turbo_pic_reader

Per Request:$0.00
M

mj_turbo_pan

Per Request:$0.17
M

mj_turbo_modal

Per Request:$0.17
Submits the content of the modal popup; used for partial redraw (inpainting) and the zoom features.
M

mj_turbo_low_variation

Per Request:$0.17
M

mj_turbo_inpaint

Per Request:$0.08
M

mj_turbo_imagine

Per Request:$0.17
M

mj_turbo_high_variation

Per Request:$0.17
M

mj_turbo_describe

Per Request:$0.00
M

mj_turbo_custom_zoom

Per Request:$0.00
M

mj_turbo_blend

Per Request:$0.17
M

mj_fast_zoom

Per Request:$0.06
M

mj_fast_video

Per Request:$0.60
Midjourney video generation
M

mj_fast_variation

Per Request:$0.06
M

mj_fast_upscale_subtle

Per Request:$0.06
M

mj_fast_upscale_creative

Per Request:$0.06
M

mj_fast_upscale

Per Request:$0.01
M

mj_fast_upload

Per Request:$0.01
M

mj_fast_shorten

Per Request:$0.06
M

mj_fast_reroll

Per Request:$0.06
M

mj_fast_prompt_analyzer_extended

Per Request:$0.00
M

mj_fast_prompt_analyzer

Per Request:$0.00
M

mj_fast_pic_reader

Per Request:$0.00
M

mj_fast_pan

Per Request:$0.06
M

mj_fast_modal

Per Request:$0.06
M

mj_fast_low_variation

Per Request:$0.06
M

mj_fast_inpaint

Per Request:$0.06
M

mj_fast_imagine

Per Request:$0.06
Midjourney image generation
M

mj_fast_high_variation

Per Request:$0.06
M

mj_fast_edits

Per Request:$0.06
M

mj_fast_describe

Per Request:$0.00
M

mj_fast_custom_zoom

Per Request:$0.00
M

mj_fast_blend

Per Request:$0.06
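Because the Midjourney actions above are flat per-request fees, a workflow's cost is just the sum of its steps. A sketch for a hypothetical fast-mode flow, using prices from this listing:

```python
# Fast-mode per-request prices from the listing above (USD)
FAST_PRICES = {"imagine": 0.06, "variation": 0.06, "upscale": 0.01}

# Hypothetical flow: one imagine, two variations, one upscale
flow = ["imagine", "variation", "variation", "upscale"]
total = sum(FAST_PRICES[step] for step in flow)
print(f"${total:.2f}")  # 0.06 + 0.06 + 0.06 + 0.01 = $0.19
```

The same tally applies to turbo mode by swapping in the turbo prices (e.g. $0.17 per imagine).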
S

suno_uploads

Per Request:$0.02
Upload music
S

suno_persona_create

Per Request:$0.01
Create a persona (personal style profile)
S

suno_music

Per Request:$0.14
Generate music
S

suno_lyrics

Per Request:$0.02
Generate lyrics
S

suno_concat

Per Request:$0.04
Concatenate (splice) song segments
S

suno_act_wav

Per Request:$0.01
Get WAV format files
S

suno_act_timing

Per Request:$0.01
Retrieve timing data: lyric and audio timeline alignment
S

suno_act_stems

Per Request:$0.01
S

suno_act_mp4

Per Request:$0.01
Generate an MP4 music video
K

kling_virtual_try_on

Per Request:$0.20
K

kling_video

Per Request:$0.40
K

kling_tts

Per Request:$0.02
[Speech synthesis] Newly launched: online text-to-speech with a preview function. Also returns an audio_id that can be used with any Kling API.
K

kling_multi_image2image

Per Request:$0.32
K

kling_multi_elements_submit

Per Request:$0.40
K

kling_multi_elements_preview

Per Request:$0.00
K

kling_multi_elements_init

Per Request:$0.00
K

kling_multi_elements_delete

Per Request:$0.00
K

kling_multi_elements_clear

Per Request:$0.00
K

kling_multi_elements_add

Per Request:$0.00
K

kling_lip_sync

Per Request:$0.20
K

kling_image_recognize

Per Request:$0.04
Kling image-element recognition API, usable for multi-image-reference video generation and multimodal video-editing features. Recognizes subjects, faces, clothing, and more; each request can return up to four result sets (when available).
K

kling_image_expand

Per Request:$0.16
K

kling_image

Per Request:$0.02
K

kling_identify_face

Per Request:$0.02
K

kling_extend

Per Request:$0.40
K

kling_effects

Per Request:$0.40
K

kling_avatar_image2video

Per Request:$0.16
K

kling_audio_video_to_audio

Per Request:$0.20
K

kling_audio_text_to_audio

Per Request:$0.20
K

kling_advanced_lip_syn

Per Request:$0.20
D

Doubao Seedream 4-5

Per Request:$0.04
Seedream 4.5 is ByteDance/Seed’s multimodal image model (text→image + image editing) that focuses on production-grade image fidelity, stronger prompt adherence, and much-improved editing consistency (subject preservation, text/typography rendering, and facial realism).
D

doubao-seedream-4-0-250828

Per Request:$0.02
D

doubao-seedream-3-0-t2i-250415

Per Request:$0.02
D

doubao-seededit-3-0-i2i-250628

Per Request:$0.02
D

doubao-seed-1-6-thinking-250715

Input:$0.04/M
Output:$1.07/M
D

doubao-seed-1-6-flash-250615

Input:$0.04/M
Output:$1.07/M
D

doubao-seed-1-6-250615

Input:$0.04/M
Output:$1.07/M
D

doubao-1.5-vision-pro-250328

Input:$0.33/M
Output:$1.00/M
D

doubao-1.5-vision-lite-250315

Input:$0.17/M
Output:$0.50/M
D

doubao-1.5-pro-32k-250115

Input:$0.18/M
Output:$0.44/M
D

doubao-1.5-pro-256k

Input:$1.10/M
Output:$1.99/M
D

doubao-1-5-vision-pro-32k

Input:$0.33/M
Output:$1.00/M
D

doubao-1-5-thinking-vision-pro-250428

Input:$0.33/M
Output:$1.00/M
D

doubao-1-5-thinking-pro-250415

Input:$0.45/M
Output:$1.79/M
D

doubao-1-5-pro-32k-250115

Input:$0.18/M
Output:$0.44/M
D

doubao-1-5-pro-32k

Input:$0.18/M
Output:$0.44/M
D

doubao-1-5-pro-256k-250115

Input:$0.56/M
Output:$1.00/M
D

doubao-1-5-pro-256k

Input:$1.10/M
Output:$1.99/M
D

doubao-1-5-lite-32k-250115

Input:$0.03/M
Output:$0.07/M
D

Doubao-Seed-1.6-thinking

Input:$0.04/M
Output:$1.07/M
D

Doubao-Seed-1.6-flash

Input:$0.04/M
Output:$1.07/M
D

Doubao-Seed-1.6

Input:$0.04/M
Output:$1.07/M
D

Doubao-1.5-vision-pro-32k

Input:$0.33/M
Output:$1.00/M
D

Doubao-1.5-vision-pro

Input:$0.33/M
Output:$1.00/M
D

Doubao-1.5-vision-lite

Input:$0.17/M
Output:$0.50/M
D

Doubao-1.5-thinking-vision-pro

Input:$0.33/M
Output:$1.00/M
D

Doubao-1.5-thinking-pro

Input:$0.45/M
Output:$1.79/M
D

Doubao-1.5-pro-32k

Input:$0.18/M
Output:$0.44/M
D

Doubao-1.5-lite-32k

Input:$0.07/M
Output:$0.13/M
R

runwayml_video_to_video

Per Request:$0.96
R

runwayml_upscale_video

Per Request:$0.16
R

runwayml_text_to_image

Per Request:$0.32
R

runwayml_image_to_video

Per Request:$0.32
R

runwayml_character_performance

Per Request:$0.40
R

runway_video2video

Per Request:$0.20
R

runway_video

Per Request:$0.20
R

runway_act_one

Per Request:$0.40
R

Black Forest Labs/FLUX 2 PRO

Per Request:$0.06
FLUX 2 PRO is the flagship commercial model in the FLUX 2 series, delivering state-of-the-art image generation with unprecedented quality and detail. Built for professional and enterprise applications, it offers superior prompt adherence, photorealistic outputs, and exceptional artistic capabilities. This model represents the cutting edge of AI image synthesis technology.
R

Black Forest Labs/FLUX 2 FLEX

Per Request:$0.19
FLUX 2 FLEX is the versatile, adaptable model designed for flexible deployment across various use cases and hardware configurations. It offers scalable performance with adjustable quality settings, making it ideal for applications requiring dynamic resource allocation. This model provides the best balance between quality, speed, and resource efficiency.
R

Black Forest Labs/FLUX 2 DEV

Per Request:$0.06
FLUX 2 DEV is the development-friendly version optimized for research, experimentation, and non-commercial applications. It provides developers with powerful image generation capabilities while maintaining a balance between quality and computational efficiency. Perfect for prototyping, academic research, and personal creative projects.
R

stability-ai/stable-diffusion-3.5-medium

Per Request:$0.11
R

stability-ai/stable-diffusion-3.5-large-turbo

Per Request:$0.13
R

stability-ai/stable-diffusion-3.5-large

Per Request:$0.21
R

stability-ai/stable-diffusion-3

Per Request:$0.11
R

stability-ai/stable-diffusion

Per Request:$0.02
R

stability-ai/sdxl

Per Request:$0.03
R

recraft-ai/recraft-v3-svg

Per Request:$0.26
R

recraft-ai/recraft-v3

Per Request:$0.13
R

ideogram-ai/ideogram-v2-turbo

Per Request:$0.06
R

ideogram-ai/ideogram-v2

Per Request:$0.10
R

bria/remove-background

Input:$60.00/M
Output:$60.00/M
R

bria/increase-resolution

Input:$60.00/M
Output:$60.00/M
R

bria/image-3.2

Input:$60.00/M
Output:$60.00/M
R

bria/genfill

Input:$60.00/M
Output:$60.00/M
R

bria/generate-background

Input:$60.00/M
Output:$60.00/M
R

bria/expand-image

Input:$60.00/M
Output:$60.00/M
R

bria/eraser

Input:$60.00/M
Output:$60.00/M
R

black-forest-labs/flux-schnell

Per Request:$0.01
black-forest-labs/flux-schnell is a text-to-image generative model from Black Forest Labs, designed for rapid sampling and iterative prompt exploration. It synthesizes varied styles and compositions from short prompts, supports negative prompting and seed control, and can produce high‑resolution outputs suitable for product mockups, concept art, and marketing visuals. Typical uses include interactive ideation, thumbnail and banner generation, and automated creative variants in content pipelines. Technical highlights include compatibility with the Hugging Face Diffusers stack, flexible resolution control, and an efficient sampler tuned for speed on common GPUs.
R

black-forest-labs/flux-pro

Per Request:$0.18
black-forest-labs/flux-pro is a text-to-image generative model from Black Forest Labs for high-fidelity image synthesis across styles and subjects. It turns detailed prompts into coherent compositions with controllable attributes such as aspect ratio and style via standard generation parameters. Typical uses include concept art, product visualization, marketing creatives, and photorealistic scenes in design workflows. Technical highlights include text-to-image modality, instruction-like prompt following, and integration in common image generation toolchains.
R

black-forest-labs/flux-kontext-pro

Per Request:$0.05
black-forest-labs/flux-kontext-pro is a multimodal diffusion model for context-aware image generation. It synthesizes images from text prompts and optional reference images, preserving composition and style cues for grounded results. Typical uses include brand asset creation, product visuals, and visual ideation using mood boards or example shots. Technical highlights include text and image inputs, reference-conditioned sampling, and reproducible outputs via seed control.
R

black-forest-labs/flux-kontext-max

Per Request:$0.10
black-forest-labs/flux-kontext-max is a context-conditioned image generation model in the FLUX line, built to create images from text with optional reference inputs. It enables grounded synthesis, style or subject preservation, and controlled variations guided by supplied visual context. Typical applications include brand-consistent creatives, product mockups, character continuity, and moodboard-driven ideation. Technical highlights include diffusion-based generation and multimodal conditioning with text and reference inputs suitable for reference-guided workflows.
R

black-forest-labs/flux-dev

Per Request:$0.08
black-forest-labs/flux-dev is an open-weights text-to-image model from Black Forest Labs for generating images from natural language prompts. It produces photorealistic and stylized results from detailed prompts and works with common control options in diffusion toolchains. Typical uses include concept art, product visualization, marketing imagery, and rapid creative exploration in design workflows. Technical highlights include a transformer-based rectified-flow design, integration with the Hugging Face Diffusers library, and deployment via standard GPU inference stacks.
R

black-forest-labs/flux-1.1-pro-ultra

Per Request:$0.19
black-forest-labs/flux-1.1-pro-ultra is a text-to-image diffusion transformer designed for production image synthesis from natural language prompts. It generates detailed outputs from complex instructions with controls for style, composition, aspect ratio, negative prompts, and seed reproducibility. Typical uses include marketing creatives, product visualization, concept art, and content ideation. Technical highlights include transformer-based diffusion, text-encoder guidance, and deployment via common inference APIs with scheduler and guidance parameters.
R

black-forest-labs/flux-1.1-pro

Per Request:$0.13
black-forest-labs/flux-1.1-pro is a text-to-image generation model from Black Forest Labs for controllable, high-fidelity visuals. It interprets detailed prompts to produce compositions across styles and subjects, with support for iterative refinement and image variations in common diffusion workflows. Typical uses include concept art, product mockups, marketing imagery, and scene exploration. Technical highlights include text-conditioned image synthesis and integration with standard inference toolchains used for diffusion models.
F

FLUX 2 PRO

Per Request:$0.08
FLUX 2 PRO is the flagship commercial model in the FLUX 2 series, delivering state-of-the-art image generation with unprecedented quality and detail. Built for professional and enterprise applications, it offers superior prompt adherence, photorealistic outputs, and exceptional artistic capabilities. This model represents the cutting edge of AI image synthesis technology.
F

FLUX 2 FLEX

Per Request:$0.01
FLUX 2 FLEX is the versatile, adaptable model designed for flexible deployment across various use cases and hardware configurations. It offers scalable performance with adjustable quality settings, making it ideal for applications requiring dynamic resource allocation. This model provides the best balance between quality, speed, and resource efficiency.
L

Llama-4-Scout

Input:$0.22/M
Output:$1.15/M
Llama-4-Scout is a general-purpose language model for assistant-style interaction and automation. It handles instruction following, reasoning, summarization, and transformation tasks, and can support light code-related assistance. Typical uses include chat orchestration, knowledge-augmented QA, and structured content generation. Technical highlights include compatibility with tool/function calling patterns, retrieval-augmented prompting, and schema-constrained outputs for integration into product workflows.
L

Llama-4-Maverick

Input:$0.48/M
Output:$1.44/M
Llama-4-Maverick is a general-purpose language model for text understanding and generation. It supports conversational QA, summarization, structured drafting, and basic coding assistance, with options for structured outputs. Common applications include product assistants, knowledge retrieval front-ends, and workflow automation that require consistent formatting. Technical details such as parameter count, context window, modality, and tool or function calling vary by distribution; integrate according to the deployment’s documented capabilities.
M

minimax_video-01

Per Request:$1.44
M

minimax_minimax-hailuo-02

Per Request:$2.88
M

minimax_files_retrieve

Per Request:$0.00
M

minimax-m2

Input:$0.24/M
Output:$0.96/M
minimax-m2 is a compact, efficient large language model optimized for end-to-end programming and agent workflows, with 10 billion active parameters (230 billion total). It performs near state-of-the-art in general reasoning, tool use, and multi-step task execution while maintaining low latency and high deployment efficiency. The model excels at code generation, multi-file editing, compile-run-fix loops, and test-verified defect repair, with strong results on benchmarks such as SWE-Bench Verified, Multi-SWE-Bench, and Terminal-Bench, and it is competitive at long-horizon task planning, information retrieval, and recovery from execution errors in agent evaluations like BrowseComp and GAIA. As rated by Artificial Analysis, MiniMax-M2 ranks in the top tier of open-source models for composite intelligence across mathematics, scientific reasoning, and instruction following. Its small active-parameter count enables fast inference, high concurrency, and better unit economics, making it well suited to large-scale agent deployment, developer-assistance tools, and reasoning-driven applications where response speed and cost efficiency matter.
F

flux-pro-finetuned

Per Request:$0.07
F

flux-pro-1.1-ultra-finetuned

Per Request:$0.10
F

flux-pro-1.1-ultra

Per Request:$0.07
F

flux-pro-1.1

Per Request:$0.05
F

flux-pro-1.0-fill-finetuned

Per Request:$0.10
F

flux-pro-1.0-fill

Per Request:$0.06
F

flux-pro-1.0-depth-finetuned

Per Request:$0.10
F

flux-pro-1.0-depth

Per Request:$0.06
F

flux-pro-1.0-canny-finetuned

Per Request:$0.10
F

flux-pro-1.0-canny

Per Request:$0.06
F

flux-pro

Per Request:$0.05
F

flux-kontext-pro

Per Request:$0.05
F

flux-kontext-max

Per Request:$0.10
F

flux-finetune

Per Request:$0.05
F

flux-dev

Per Request:$0.03
H

hunyuan-vision

Input:$2.01/M
Output:$2.01/M
H

hunyuan-turbos-vision-20250619

Input:$0.33/M
Output:$1.00/M
H

hunyuan-turbos-vision

Input:$0.33/M
Output:$1.00/M
H

hunyuan-turbos-longtext-128k-20250325

Input:$0.17/M
Output:$0.67/M
H

hunyuan-turbos-latest

Input:$0.09/M
Output:$0.22/M
H

hunyuan-turbos-20250604

Input:$0.09/M
Output:$0.22/M
H

hunyuan-turbos-20250515

Input:$0.09/M
Output:$0.22/M
H

hunyuan-turbos-20250416

Input:$0.09/M
Output:$0.22/M
H

hunyuan-turbos-20250313

Input:$0.09/M
Output:$0.22/M
H

hunyuan-t1-vision-20250619

Input:$0.11/M
Output:$0.45/M
H

hunyuan-t1-vision

Input:$0.11/M
Output:$0.45/M
H

hunyuan-t1-latest

Input:$0.11/M
Output:$0.45/M
H

hunyuan-t1-20250711

Input:$0.11/M
Output:$0.45/M
H

hunyuan-t1-20250529

Input:$0.11/M
Output:$0.45/M
H

hunyuan-t1-20250521

Input:$0.11/M
Output:$0.45/M
H

hunyuan-t1-20250403

Input:$0.11/M
Output:$0.45/M
H

hunyuan-t1-20250321

Input:$0.11/M
Output:$0.45/M
H

hunyuan-standard-256K

Input:$0.06/M
Output:$0.22/M
H

hunyuan-standard

Input:$0.09/M
Output:$0.22/M
H

hunyuan-role

Input:$0.45/M
Output:$0.89/M
H

hunyuan-pro

Input:$1.60/M
Output:$1.60/M
H

hunyuan-lite

Input:$1.60/M
Output:$1.60/M
H

hunyuan-large-vision

Input:$0.45/M
Output:$1.34/M
H

hunyuan-large

Input:$0.45/M
Output:$1.34/M
H

hunyuan-functioncall

Input:$0.45/M
Output:$0.89/M
H

hunyuan-embedding

Input:$0.08/M
Output:$0.08/M
H

hunyuan-code

Input:$0.39/M
Output:$0.78/M
H

hunyuan-all

Input:$0.11/M
Output:$0.22/M
H

hunyuan-a13b

Input:$0.06/M
Output:$0.22/M
H

hunyuan

Input:$0.11/M
Output:$0.11/M
Z

glm-zero-preview

Input:$60.00/M
Output:$60.00/M
Z

glm-4v-plus

Input:$4.80/M
Output:$4.80/M
Z

glm-4v

Input:$24.00/M
Output:$24.00/M
Z

GLM 4.6

Context:200,000
Input:$0.64/M
Output:$2.56/M
Zhipu's latest flagship model, GLM-4.6: 355B total parameters, 32B active. Core capabilities surpass GLM-4.5 across the board. Coding: on par with Claude Sonnet 4, the strongest among Chinese models. Context: extended from 128K to 200K tokens. Reasoning: improved, with support for tool calls. Search: optimized tool and agent frameworks. Writing: better aligned with human preferences in style and role-play. Multilingual: improved translation quality.
Z

glm-4.5-x

Input:$3.20/M
Output:$12.80/M
A high-performance model with strong reasoning and extremely fast responses, optimized for scenarios that demand ultra-fast inference and robust logical capability, delivering a millisecond-level response experience.
Z

glm-4.5-flash

Input:$0.16/M
Output:$0.64/M
GLM-4.5-Flash is an artificial intelligence model provided by ZhipuAI.
Z

glm-4.5-airx

Input:$1.60/M
Output:$6.40/M
A lightweight, high-performance, ultra-fast model that combines the cost advantage of Air with the speed advantage of X; a good choice when balancing performance and efficiency.
Z

glm-4.5-air

Input:$0.16/M
Output:$1.07/M
GLM-4.5-Air is an artificial intelligence model provided by ZhipuAI.
Z

glm-4.5

Input:$0.48/M
Output:$1.92/M
GLM-4.5 is an artificial intelligence model provided by ZhipuAI.
Z

glm-4-plus

Input:$24.00/M
Output:$24.00/M
Z

glm-4-long

Input:$0.48/M
Output:$0.48/M
Z

glm-4-flash

Input:$0.05/M
Output:$0.05/M
Z

glm-4-airx

Input:$4.80/M
Output:$4.80/M
Z

glm-4-air

Input:$0.48/M
Output:$0.48/M
Z

glm-4-0520

Input:$24.00/M
Output:$24.00/M
Z

glm-4

Input:$24.00/M
Output:$24.00/M
Z

glm-3-turbo

Input:$1.60/M
Output:$1.60/M