gpt-realtime-mini

Input:$0.48/M
Output:$0.96/M
An economical version of the realtime GPT model, capable of responding to audio and text input in real time via WebRTC, WebSocket, or SIP connections.
New
Commercial Use

Technical Specifications of gpt-realtime-mini

Specification: Details
Model ID: gpt-realtime-mini
Model type: Realtime multimodal model
Description: An economical version of the realtime GPT model, capable of responding to audio and text inputs in real time via WebRTC, WebSocket, or SIP connections.
Input modalities: Text, audio, image
Output modalities: Text, audio
Context window: 32,000 tokens
Max output tokens: 4,096 tokens
Supported interfaces: WebRTC, WebSocket, SIP
Supported features: Function calling supported; structured outputs, fine-tuning, distillation, and predicted outputs not supported
Recommended use: Low-latency voice agents, realtime multimodal applications, and cost-sensitive interactive experiences

What is gpt-realtime-mini?

gpt-realtime-mini is a cost-efficient realtime model designed for applications that need fast, natural interaction with users through live audio and text. It is intended for low-latency multimodal experiences, allowing developers to build assistants that can listen, respond, and stream output in realtime rather than relying on slower multi-step pipelines.

Compared with larger realtime variants, gpt-realtime-mini is positioned as the economical option for developers who want realtime speech and text capabilities while managing cost and maintaining responsive performance. It works across browser, server, and telephony-style connection patterns through WebRTC, WebSocket, and SIP.

Main features of gpt-realtime-mini

  • Realtime audio and text interaction: Supports low-latency conversations with streaming input and output, making it suitable for live assistants, voice bots, and interactive agents.
  • Cost-efficient deployment: Positioned as an economical version of the realtime model family, making it attractive for high-volume or budget-sensitive applications.
  • Multiple connection methods: Can be integrated through WebRTC for browser clients, WebSocket for server-side systems, and SIP for telephony or VoIP scenarios.
  • Multimodal input support: Accepts text, audio, and image input, enabling richer user interactions and more flexible application design.
  • Speech-capable output: Produces both text and audio output, which is useful for conversational interfaces and spoken response systems.
  • Function calling support: Supports function calling, allowing applications to connect the model to tools, workflows, or backend actions during realtime sessions.
  • Built for voice agents: Well suited for speech-to-speech assistants and realtime customer interaction experiences where interruption handling and fast turn-taking matter.
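As a rough illustration of the function calling feature, the sketch below builds a session.update event that registers one tool. It assumes CometAPI forwards the OpenAI-style Realtime tool schema unchanged (a flat object with type, name, description, and parameters); the get_weather tool and the buildToolSessionUpdate helper are hypothetical names used only for this example.

```javascript
// Hypothetical tool definition; the name and parameters are illustrative only.
const weatherTool = {
  type: "function",
  name: "get_weather",
  description: "Look up the current weather for a city.",
  parameters: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"]
  }
};

// Build a session.update event that registers tools for the session.
// Assumes the session object accepts `tools` and `tool_choice` fields.
function buildToolSessionUpdate(tools) {
  return JSON.stringify({
    type: "session.update",
    session: { tools, tool_choice: "auto" }
  });
}
```

Once a WebSocket connection is open, the result could be sent with ws.send(buildToolSessionUpdate([weatherTool])).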

How to access and integrate gpt-realtime-mini

Step 1: Sign Up for an API Key

To get started, sign up on CometAPI and generate your API key from the dashboard. Once you have your key, keep it secure and store it in your environment variables for server-side use.
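A minimal sketch of reading the key from the environment at startup, assuming the variable is named COMETAPI_API_KEY as in the connection example in Step 2. The helper names loadApiKey and authHeader are hypothetical.

```javascript
// Hypothetical helper: read the CometAPI key from an environment-like object.
// Failing fast avoids sending an "Authorization: Bearer undefined" header later.
function loadApiKey(env = process.env) {
  const key = (env.COMETAPI_API_KEY || "").trim();
  if (!key) {
    throw new Error("COMETAPI_API_KEY is not set");
  }
  return key;
}

// Build the Authorization header value used when connecting.
function authHeader(env = process.env) {
  return "Bearer " + loadApiKey(env);
}
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process.env.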

Step 2: Connect to the gpt-realtime-mini API

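The example below connects over WebSocket from a Node.js server. Connect to CometAPI's WebSocket endpoint:

// Requires the "ws" package (npm install ws); the browser WebSocket
// constructor does not accept custom headers.
const WebSocket = require("ws");

const ws = new WebSocket(
  "wss://api.cometapi.com/v1/realtime?model=gpt-realtime-mini",
  {
    headers: {
      "Authorization": "Bearer " + process.env.COMETAPI_API_KEY,
      "OpenAI-Beta": "realtime=v1"
    }
  }
);

// Configure the session once the connection is open.
ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      modalities: ["text", "audio"],
      instructions: "You are a helpful assistant."
    }
  }));
});

// Each message is one JSON-encoded server event.
ws.on("message", (data) => {
  console.log(JSON.parse(data.toString()));
});

// Surface connection and authentication failures.
ws.on("error", (err) => {
  console.error("Realtime connection error:", err.message);
});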

Step 3: Retrieve and Verify Results

The Realtime API streams responses through the WebSocket connection as JSON-encoded server events. Listen for response.audio.delta events for audio output and response.text.delta events for text output. Verify that the session is established and that responses are streaming correctly.
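One way to handle these events is a small collector that accumulates output as it arrives. This is a sketch under assumptions: the delta field carrying the text or base64-encoded audio chunk, and the response.done and error event types, follow the OpenAI-style Realtime event format and may differ on your endpoint. createCollector is a hypothetical helper name.

```javascript
// Create a collector that accumulates streamed output from server events.
// The `delta` payload field and the "response.done" / "error" event types
// are assumptions about the event format, not confirmed by this page.
function createCollector() {
  const state = { text: "", audioChunks: [], done: false, errors: [] };

  function handle(raw) {
    // `raw` may be a string or a Buffer, depending on the WebSocket client.
    const event = JSON.parse(raw.toString());
    switch (event.type) {
      case "response.text.delta":
        state.text += event.delta;
        break;
      case "response.audio.delta":
        // Assumed to be a base64-encoded audio chunk.
        state.audioChunks.push(Buffer.from(event.delta, "base64"));
        break;
      case "response.done":
        state.done = true;
        break;
      case "error":
        state.errors.push(event.error);
        break;
      default:
        break; // Ignore event types this sketch does not track.
    }
    return state;
  }

  return { state, handle };
}
```

With the connection from Step 2, this could be wired up as const collector = createCollector(); ws.on("message", collector.handle);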

Features for gpt-realtime-mini

Explore the key features of gpt-realtime-mini, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for gpt-realtime-mini

Explore competitive pricing for gpt-realtime-mini, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how gpt-realtime-mini can enhance your projects while keeping costs manageable.
Comet Price (USD / M Tokens): Input $0.48/M, Output $0.96/M
Official Price (USD / M Tokens): Input $0.6/M, Output $1.2/M
Discount: -20%
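To see how the per-million-token prices translate into a bill, the sketch below estimates cost from token counts and checks the listed discount. The prices are copied from the pricing figures above; the function names are hypothetical, and the token counts in the usage note are made up for illustration.

```javascript
// Per-million-token prices in USD, copied from the pricing figures above.
const COMET_PRICE = { input: 0.48, output: 0.96 };
const OFFICIAL_PRICE = { input: 0.6, output: 1.2 };

// Estimate the cost in USD for a given number of input and output tokens.
function estimateCost(price, inputTokens, outputTokens) {
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Discount of one price list relative to a reference list, as a fraction,
// computed from the input price.
function discount(price, reference) {
  return 1 - price.input / reference.input;
}
```

For example, estimateCost(COMET_PRICE, 1_000_000, 500_000) comes to about $0.96, and discount(COMET_PRICE, OFFICIAL_PRICE) comes to about 0.2, matching the listed 20% discount.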

Sample code and API for gpt-realtime-mini

Access comprehensive sample code and API resources for gpt-realtime-mini to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of gpt-realtime-mini in your projects.

Versions of gpt-realtime-mini

gpt-realtime-mini may have multiple snapshots for several reasons: output can vary after updates, so older snapshots are kept for consistency; developers get a transition period to adapt and migrate; and different snapshots may correspond to global or regional endpoints to optimize user experience. For detailed differences between versions, please refer to the official documentation.
Version: gpt-realtime-mini
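In the connection example in Step 2, the model is selected through the model query parameter of the endpoint URL. The sketch below builds that URL for a given model ID. It assumes that a snapshot ID, where one exists, is passed the same way as the base model ID; realtimeUrl is a hypothetical helper name.

```javascript
// Base endpoint taken from the connection example in Step 2.
const REALTIME_ENDPOINT = "wss://api.cometapi.com/v1/realtime";

// Build the WebSocket URL for a given model or snapshot ID.
// Assumption: snapshots are selected via the same `model` query parameter.
function realtimeUrl(model = "gpt-realtime-mini") {
  const url = new URL(REALTIME_ENDPOINT);
  url.searchParams.set("model", model);
  return url.toString();
}
```

Keeping the model ID in one place makes it easier to pin a specific version and to switch versions later.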

More Models

Nano Banana 2

Input:$0.4/M
Output:$2.4/M
Core Capabilities Overview: Resolution: Up to 4K (4096×4096), on par with Pro. Reference Image Consistency: Up to 14 reference images (10 objects + 4 characters), maintaining style/character consistency. Extreme Aspect Ratios: New 1:4, 4:1, 1:8, 8:1 ratios added, suitable for long images, posters, and banners. Text Rendering: Advanced text generation, suitable for infographics and marketing poster layouts. Search Enhancement: Integrated Google Search + Image Search. Grounding: Built-in thinking process; complex prompts are reasoned before generation.

Claude Opus 4.6

Input:$4/M
Output:$20/M
Claude Opus 4.6 is Anthropic’s “Opus”-class large language model, released February 2026. It is positioned as a workhorse for knowledge-work and research workflows — improving long-context reasoning, multi-step planning, tool use (including agentic software workflows), and computer-use tasks such as automated slide and spreadsheet generation.

Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

GPT-5.4 nano

Input:$0.16/M
Output:$1/M
GPT-5.4 nano is designed for tasks where speed and cost matter most like classification, data extraction, ranking, and sub-agents.

GPT-5.4 mini

Input:$0.6/M
Output:$3.6/M
GPT-5.4 mini brings the strengths of GPT-5.4 to a faster, more efficient model designed for high-volume workloads.

Claude Mythos Preview

Coming soon
Input:$60/M
Output:$240/M
Claude Mythos Preview is our most capable frontier model to date, and shows a striking leap in scores on many evaluation benchmarks compared to our previous frontier model, Claude Opus 4.6.