ModelsSupportEnterpriseBlog
500+ AI Model API, All In One API.Just In CometAPI
Models API
Developer
Quick StartDocumentationAPI Dashboard
Resources
AI ModelsBlogEnterpriseChangelogAbout
2025 CometAPI. All right reserved.Privacy PolicyTerms of Service
Home/Models/OpenAI/GPT-4o Realtime
O

GPT-4o Realtime

Input:$60/M
Output:$240/M
The Realtime API allows developers to build low-latency, Multimodal experiences, including speech-to-speech functionality. Text and Audio processed by the Realtime API are priced separately. This model supports a maximum context length of 128,000 tokens.
Commercial Use
Overview
Features
Pricing
API
Versions

Technical Specifications of gpt-4o-realtime

SpecificationDetails
Model IDgpt-4o-realtime
Model typeRealtime multimodal model
Primary use casesLow-latency multimodal interactions, speech-to-speech experiences, real-time text and audio applications
Context length128,000 tokens
Input modalitiesText, audio
Output modalitiesText, audio
Latency profileOptimized for low-latency realtime experiences
Pricing noteText and audio processed by the Realtime API are priced separately

What is gpt-4o-realtime?

gpt-4o-realtime is a realtime multimodal model available through CometAPI for developers building highly responsive AI applications. It is designed for scenarios where low latency matters, such as live voice assistants, interactive speech-to-speech systems, and applications that need to process text and audio in the same workflow.

This model supports multimodal communication, allowing applications to send text or audio inputs and receive text or audio outputs. With a maximum context length of 128,000 tokens, gpt-4o-realtime can also support longer interactions and more context-aware conversations than smaller-session realtime systems.

Main features of gpt-4o-realtime

  • Low-latency interaction: Built for realtime use cases where fast response times are essential for smooth user experiences.
  • Multimodal input and output: Supports both text and audio workflows, enabling flexible application design.
  • Speech-to-speech support: Well suited for conversational voice interfaces that take spoken input and return spoken output.
  • Large context window: Supports up to 128,000 tokens of context for more coherent extended sessions.
  • Flexible realtime application support: Useful for live assistants, interactive tools, customer support agents, and other responsive multimodal products.
  • Separate text and audio pricing: Developers should account for text and audio usage independently when estimating costs.

How to access and integrate gpt-4o-realtime

Step 1: Sign Up for API Key

To get started, sign up on CometAPI and generate your API key from the dashboard. After that, store the key securely and use it to authenticate every request to the API.

Step 2: Connect to gpt-4o-realtime API

The Realtime API uses WebSocket connections. Connect to CometAPI's WebSocket endpoint:

const ws = new WebSocket(
  "wss://api.cometapi.com/v1/realtime?model=gpt-4o-realtime",
  {
    headers: {
      "Authorization": "Bearer " + process.env.COMETAPI_API_KEY,
      "OpenAI-Beta": "realtime=v1"
    }
  }
);

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      modalities: ["text", "audio"],
      instructions: "You are a helpful assistant."
    }
  }));
});

ws.on("message", (data) => {
  console.log(JSON.parse(data));
});

Step 3: Retrieve and Verify Results

The Realtime API streams responses through the WebSocket connection as server-sent events. Listen for response.audio.delta events for audio output and response.text.delta for text. Verify the session is established and responses are streaming correctly.

Features for GPT-4o Realtime

Explore the key features of GPT-4o Realtime, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for GPT-4o Realtime

Explore competitive pricing for GPT-4o Realtime, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how GPT-4o Realtime can enhance your projects while keeping costs manageable.
Comet Price (USD / M Tokens)Official Price (USD / M Tokens)Discount
Input:$60/M
Output:$240/M
Input:$75/M
Output:$300/M
-20%

Sample code and API for GPT-4o Realtime

Access comprehensive sample code and API resources for GPT-4o Realtime to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of GPT-4o Realtime in your projects.

Versions of GPT-4o Realtime

The reason GPT-4o Realtime has multiple snapshots may include potential factors such as variations in output after updates requiring older snapshots for consistency, providing developers a transition period for adaptation and migration, and different snapshots corresponding to global or regional endpoints to optimize user experience. For detailed differences between versions, please refer to the official documentation.
version
gpt-4o-realtime-preview
gpt-4o-realtime-preview-2024-12-17
gpt-4o-realtime-preview-2025-06-03
gpt-4o-realtime-preview-2024-10-01

More Models

O

gpt-realtime-1.5

Input:$3.2/M
Output:$12.8/M
The best voice model for audio in, audio out.
O

gpt-audio-1.5

Input:$2/M
Output:$8/M
The best voice model for audio in, audio out with Chat Completions.
O

Whisper-1

Input:$24/M
Output:$24/M
Speech to text, creating translations
O

TTS

Input:$12/M
Output:$12/M
OpenAI Text-to-Speech
K

Kling TTS

Per Request:$0.006608
[Speech Synthesis] Newly launched: text-to-broadcast audio online, with preview function ● Can simultaneously generate audio_id, usable with any Keling API.
K

Kling video-to-audio

K

Kling video-to-audio

Per Request:$0.03304
Kling video-to-audio