GPT-5.1

Input:$1/M
Output:$8/M
GPT-5.1 is a general-purpose instruction-tuned language model focused on text generation and reasoning across product workflows. It supports multi-turn dialogue, structured output formatting, and code-oriented tasks such as drafting, refactoring, and explanation. Typical uses include chat assistants, retrieval-augmented QA, data transformation, and agent-style automation with tools or APIs when supported. Technical highlights include text-centric modality, instruction following, JSON-style outputs, and compatibility with function calling in common orchestration frameworks.

GPT-5.1 Thinking is the advanced reasoning variant of OpenAI’s GPT-5.1 family. It prioritizes adaptive, higher-quality reasoning while giving developers explicit control over the latency/compute trade-off.

Basic features

  • Adaptive reasoning: the model dynamically adjusts thinking depth per request, answering faster on routine tasks and staying more persistent on complex ones. This reduces latency and token use for common queries, while explicitly allocating more reasoning time to complex, multi-step prompts; hard tasks can be slower but receive deeper answers.
  • Reasoning modes: none / low / medium / high (GPT-5.1 defaults to none for low-latency cases; choose higher levels for more demanding tasks). The Responses API exposes a reasoning parameter to control this.
  • Default tone & style: written to be clearer on complex topics (less jargon), more explanatory and “patient.”
  • Context window (long context): GPT-5.1 Thinking supports a much larger context window of 400K tokens on paid tiers.
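
The reasoning levels above can be sketched as a small request builder. This is a minimal sketch assuming the `reasoning: {"effort": ...}` object shape described for the Responses API; `build_request` is a hypothetical helper, not part of any SDK.

```python
# A minimal sketch of per-request reasoning control. The payload shape
# (a `reasoning` object with an `effort` field) follows the Responses API
# parameter described above; `build_request` itself is a hypothetical helper.
def build_request(prompt: str, effort: str = "none") -> dict:
    """Build a Responses API payload; effort is none/low/medium/high."""
    allowed = {"none", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    payload = {"model": "gpt-5.1", "input": prompt}
    if effort != "none":  # "none" is GPT-5.1's default, so omit the block
        payload["reasoning"] = {"effort": effort}
    return payload

# Routine query: keep the low-latency default.
quick = build_request("List the top-level keys of a package.json file.")

# Demanding multi-step task: request deeper internal reasoning.
deep = build_request("Refactor this module and explain each change.", effort="high")
```

In practice the same dict would be passed to `client.responses.create(...)`; keeping the default for routine traffic is what captures the latency and token savings described above.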

Key technical details

  • Adaptive compute allocation — training and inference design causes the model to expend fewer reasoning tokens on trivial tasks and proportionally more on difficult tasks. This is not a separate “think engine” but a dynamic allocation within the reasoning pipeline.
  • Reasoning parameter in the Responses API — clients pass a reasoning object (for example reasoning: { "effort": "high" }) to request deeper internal reasoning; setting reasoning: { "effort": "none" } effectively disables the extended internal reasoning pass for lower latency. The Responses API also returns reasoning/token metadata, which is helpful for cost tracking and debugging.
  • Tools & parallel tool calls — GPT-5.1 improves parallel tool calling and includes named tools (like apply_patch) that reduce failure modes for programmatic edits; parallelization increases end-to-end throughput for tool-heavy workflows.
  • Prompt cache and persistence — prompt_cache_retention='24h' is supported on Responses and Chat Completions endpoints to retain context across multi-turn sessions (reduces repeated token encoding).
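
The reasoning/token metadata mentioned above can feed simple cost accounting. A minimal sketch: the field names (`usage`, `input_tokens`, `output_tokens`, `output_tokens_details.reasoning_tokens`) follow the Responses API usage object but should be verified against your SDK version, and `summarize_usage` is a hypothetical helper using the per-million prices listed on this page.

```python
# A sketch of cost accounting from the usage metadata the Responses API
# returns. Field names (`usage`, `input_tokens`, `output_tokens`,
# `output_tokens_details.reasoning_tokens`) are assumptions to verify
# against your SDK version; prices are the CometAPI rates listed above.
def summarize_usage(response: dict, input_usd_per_m=1.0, output_usd_per_m=8.0) -> dict:
    usage = response.get("usage", {})
    in_tok = usage.get("input_tokens", 0)
    out_tok = usage.get("output_tokens", 0)
    reasoning_tok = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    cost = in_tok / 1e6 * input_usd_per_m + out_tok / 1e6 * output_usd_per_m
    return {
        "input_tokens": in_tok,
        "output_tokens": out_tok,
        "reasoning_tokens": reasoning_tok,  # billed as part of output tokens
        "usd_cost": cost,
    }

# Mocked response dict, standing in for `client.responses.create(...)` output.
mock_response = {"usage": {"input_tokens": 120, "output_tokens": 500,
                           "output_tokens_details": {"reasoning_tokens": 350}}}
summary = summarize_usage(mock_response)
```

Tracking reasoning tokens separately makes it easy to see how much of your output spend comes from higher effort settings.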

Benchmark performance

Latency / token efficiency examples (vendor-provided): on routine queries, OpenAI reports dramatic reductions in tokens/time (example: an npm listing command that took ~10s / ~250 tokens on GPT-5 now takes ~2s / ~50 tokens on GPT-5.1 in their representative test). Third-party early testers (e.g., asset managers, coding firms) reported 2–3× speedups on many tasks and token-efficiency gains in tool-heavy flows.

OpenAI and early partners published representative benchmark claims and measured improvements:

Evaluation | GPT-5.1 (high) | GPT-5 (high)
SWE-bench Verified (all 500 problems) | 76.3% | 72.8%
GPQA Diamond (no tools) | 88.1% | 85.7%
AIME 2025 (no tools) | 94.0% | 94.6%
FrontierMath (with Python tool) | 26.7% | 26.3%
MMMU | 85.4% | 84.2%
Tau2-bench Airline | 67.0% | 62.6%
Tau2-bench Telecom* | 95.6% | 96.7%
Tau2-bench Retail | 77.9% | 81.1%
BrowseComp Long Context 128k | 90.0% | 90.0%

Limitations & safety considerations

  • Hallucination risk persists. Adaptive reasoning helps on complex problems but does not eliminate hallucinations; higher reasoning_effort improves checks but does not guarantee correctness. Always validate high-stakes outputs.
  • Resource and cost tradeoffs: while GPT-5.1 can be far more token-efficient on simple flows, enabling high reasoning effort or long agentic tool-use can increase token consumption and latency. Use prompt caching to mitigate repeated costs where appropriate.
  • Tool safety: apply_patch and shell tools increase automation power (and risk). Production deployments should gate tool execution (review diffs / commands before execution), use least privilege, and ensure robust CI/CD and operational guardrails.
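
The gating advice above can be sketched as a simple approval layer between model-proposed tool calls and execution. This is illustrative only: `review_gate` and the allow-list policy are hypothetical stand-ins for whatever review step (human approval, CI checks) a real deployment uses.

```python
# Illustrative gate between model-proposed shell commands and execution.
# `review_gate` and the allow-list policy are hypothetical stand-ins for
# whatever review step (human approval, CI checks) your deployment uses.
def review_gate(proposed_commands, approver):
    """Split proposed commands into executed vs. held-for-review."""
    executed, held = [], []
    for cmd in proposed_commands:
        (executed if approver(cmd) else held).append(cmd)
    return executed, held

# Toy least-privilege policy: block destructive or network commands.
def allow(cmd: str) -> bool:
    return not cmd.strip().startswith(("rm", "curl", "wget"))

ran, held = review_gate(["ls -la", "rm -rf build", "git diff"], allow)
```

The same pattern applies to apply_patch output: hold generated diffs in the "held" queue until a reviewer approves them, rather than applying edits directly.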

Comparison with other models

  • vs GPT-5: GPT-5.1 improves adaptive reasoning and instruction adherence; OpenAI reports faster response times on easy tasks and better persistence on hard tasks. GPT-5.1 also adds the none reasoning option and extended prompt caching.
  • vs GPT-4.x / 4.1: GPT-5.1 is designed for more agentic, tool-heavy, and coding tasks; OpenAI and partners report gains on coding benchmarks and multi-step reasoning. For many standard conversational tasks, GPT-5.1 Instant may be comparable to earlier GPT-4.x chat models but with improved steerability and personality presets.
  • vs Anthropic / Claude / other LLMs: GPT-5.1’s MoA architecture is reported to give it a distinct edge in tasks requiring complex, multi-step reasoning. It scored 98.20 on the HELM benchmark for complex reasoning, compared to Claude 4’s 95.60 and Gemini 2.0 Ultra’s 94.80.

Features for GPT-5.1

Explore the key features of GPT-5.1, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for GPT-5.1

Explore competitive pricing for GPT-5.1, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how GPT-5.1 can enhance your projects while keeping costs manageable.
Comet Price (USD / M Tokens) | Official Price (USD / M Tokens) | Discount
Input: $1/M, Output: $8/M | Input: $1.25/M, Output: $10/M | -20%
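
As a quick arithmetic check, the -20% figure follows directly from the two price schedules above; the sketch below also estimates the saving on a hypothetical call (the token counts are made up for illustration).

```python
# Sanity-checking the listed discount from the two price schedules above
# (USD per 1M tokens). Token counts in the example are made up.
comet = {"input": 1.00, "output": 8.00}
official = {"input": 1.25, "output": 10.00}

# Both line items are discounted by the same fraction (20%, up to
# floating-point rounding), matching the -20% shown in the table.
discounts = {k: 1 - comet[k] / official[k] for k in comet}

def call_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6 * prices["input"]
            + output_tokens / 1e6 * prices["output"])

# Example call: 200K input tokens, 50K output tokens.
saving = call_cost(official, 200_000, 50_000) - call_cost(comet, 200_000, 50_000)
```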

Sample code and API for GPT-5.1

GPT-5.1 is available through CometAPI’s OpenAI-compatible API. The samples below call the Responses API using the official OpenAI SDKs for Python and JavaScript, and with curl.
Endpoints: POST /v1/responses and POST /v1/chat/completions

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)
response = client.responses.create(
    model="gpt-5.1", input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY;
const base_url = "https://api.cometapi.com/v1";

const openai = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const response = await openai.responses.create({
  model: "gpt-5.1",
  input: "Tell me a three sentence bedtime story about a unicorn.",
});

console.log(response);

Curl Code Example

curl https://api.cometapi.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -d '{
    "model": "gpt-5.1",
    "input": "Tell me a three sentence bedtime story about a unicorn."
  }'
