
GPT-4o Realtime

Input: $60/M tokens
Output: $240/M tokens
The Realtime API lets developers build low-latency, multimodal experiences, including speech-to-speech functionality. Text and audio processed by the Realtime API are priced separately. This model supports a maximum context length of 128,000 tokens.

Technical Specifications of gpt-4o-realtime

| Specification | Details |
| --- | --- |
| Model ID | gpt-4o-realtime |
| Model type | Realtime multimodal model |
| Primary use cases | Low-latency multimodal interactions, speech-to-speech experiences, real-time text and audio applications |
| Context length | 128,000 tokens |
| Input modalities | Text, audio |
| Output modalities | Text, audio |
| Latency profile | Optimized for low-latency realtime experiences |
| Pricing note | Text and audio processed by the Realtime API are priced separately |

What is gpt-4o-realtime?

gpt-4o-realtime is a realtime multimodal model available through CometAPI for developers building highly responsive AI applications. It is designed for scenarios where low latency matters, such as live voice assistants, interactive speech-to-speech systems, and applications that need to process text and audio in the same workflow.

This model supports multimodal communication: applications can send text or audio inputs and receive text or audio outputs. With a maximum context length of 128,000 tokens, gpt-4o-realtime can also sustain longer, more context-aware conversations than realtime systems with smaller context windows.

Main features of gpt-4o-realtime

  • Low-latency interaction: Built for realtime use cases where fast response times are essential for smooth user experiences.
  • Multimodal input and output: Supports both text and audio workflows, enabling flexible application design.
  • Speech-to-speech support: Well suited for conversational voice interfaces that take spoken input and return spoken output.
  • Large context window: Supports up to 128,000 tokens of context for more coherent extended sessions.
  • Flexible realtime application support: Useful for live assistants, interactive tools, customer support agents, and other responsive multimodal products.
  • Separate text and audio pricing: Developers should account for text and audio usage independently when estimating costs.

How to access and integrate gpt-4o-realtime

Step 1: Sign Up for an API Key

To get started, sign up on CometAPI and generate your API key from the dashboard. After that, store the key securely and use it to authenticate every request to the API.

Step 2: Connect to the gpt-4o-realtime API

The Realtime API uses WebSocket connections. Connect to CometAPI's WebSocket endpoint:

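// Node.js example using the "ws" package (npm install ws).
// The headers option and ws.on() are not available on the browser WebSocket.
const WebSocket = require("ws");

const ws = new WebSocket(
  "wss://api.cometapi.com/v1/realtime?model=gpt-4o-realtime",
  {
    headers: {
      // API key from Step 1, read from the environment
      "Authorization": "Bearer " + process.env.COMETAPI_API_KEY,
      "OpenAI-Beta": "realtime=v1"
    }
  }
);

// Configure the session once the connection is open
ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      modalities: ["text", "audio"],
      instructions: "You are a helpful assistant."
    }
  }));
});

// Each incoming message is a JSON-encoded server event
ws.on("message", (data) => {
  console.log(JSON.parse(data.toString()));
});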
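
Once the session is configured, a text turn can be sent over the same socket. The following is a minimal sketch, assuming CometAPI relays the client events of OpenAI's Realtime API (beta) unchanged: conversation.item.create adds a user message and response.create asks the model to reply. The helper names buildTextTurn and sendTextTurn are hypothetical and exist only for illustration.

```javascript
// Build the two client events that make up one user text turn.
// Event names follow OpenAI's Realtime API (beta); that CometAPI
// accepts them as-is is an assumption of this sketch.
function buildTextTurn(text) {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text }],
      },
    },
    // Ask the model to generate a response to the conversation so far
    { type: "response.create" },
  ];
}

// Hypothetical helper: serialize and send each event on an open socket.
// `socket` is any object with a send(string) method, such as the ws client above.
function sendTextTurn(socket, text) {
  for (const event of buildTextTurn(text)) {
    socket.send(JSON.stringify(event));
  }
}
```

With the connection from Step 2, this could be called as sendTextTurn(ws, "Hello") after the session.update message has been sent.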

Step 3: Retrieve and Verify Results

The Realtime API streams responses over the WebSocket connection as JSON-encoded server events (not HTTP Server-Sent Events). Listen for response.audio.delta events for audio output and response.text.delta events for text. To verify the setup, confirm that the session is established and that delta events arrive after you request a response.
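
The event handling described above can be sketched as a small accumulator. This assumes the event shapes of OpenAI's Realtime API (beta), where each delta event carries a delta field (plain text for response.text.delta, base64-encoded audio for response.audio.delta), response.done ends a response, and failures arrive as an error event. The names createResponseState and applyServerEvent are hypothetical.

```javascript
// Fresh accumulator for one streamed response
function createResponseState() {
  return { text: "", audioChunks: [], done: false };
}

// Fold one parsed server event into the accumulator.
// Event names and fields are assumed to match OpenAI's Realtime API (beta).
function applyServerEvent(state, event) {
  switch (event.type) {
    case "response.text.delta":
      state.text += event.delta; // incremental text
      break;
    case "response.audio.delta":
      // Audio arrives as base64; decode each chunk to raw bytes
      state.audioChunks.push(Buffer.from(event.delta, "base64"));
      break;
    case "response.done":
      state.done = true;
      break;
    case "error":
      throw new Error(event.error?.message ?? "Realtime API error");
    default:
      break; // ignore events this sketch does not handle
  }
  return state;
}
```

In the message handler from Step 2, each parsed message would be passed to applyServerEvent; once state.done is true, Buffer.concat(state.audioChunks) yields the full audio payload.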

Features of GPT-4o Realtime

Explore the core capabilities of GPT-4o Realtime, which help improve performance, usability, and the overall experience.

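Pricing for GPT-4o Realtime

Review the competitive pricing for GPT-4o Realtime, designed to fit different budgets and usage needs. Flexible plans scale with your requirements.

| Token type | Comet price (USD / M tokens) | Official price (USD / M tokens) | Discount |
| --- | --- | --- | --- |
| Input | $60 | $75 | -20% |
| Output | $240 | $300 | -20% |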
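
To make the per-million-token rates concrete, here is a rough cost estimate. It is a sketch under assumptions: the page lists a single input/output rate pair while also noting that text and audio are priced separately, so the rates are passed as a parameter and should be replaced with whichever rates apply to your traffic. The function name estimateCostUsd is hypothetical.

```javascript
// Listed rates in USD per one million tokens
const COMET_RATES = { input: 60, output: 240 };
const OFFICIAL_RATES = { input: 75, output: 300 };

// Estimate the cost of a workload from its token counts.
// `rates` defaults to the Comet rates listed above.
function estimateCostUsd({ inputTokens, outputTokens }, rates = COMET_RATES) {
  const perToken = 1_000_000;
  return (inputTokens / perToken) * rates.input +
         (outputTokens / perToken) * rates.output;
}

const usage = { inputTokens: 1_000_000, outputTokens: 500_000 };
console.log(estimateCostUsd(usage));                 // 180
console.log(estimateCostUsd(usage, OFFICIAL_RATES)); // 225
```

For this workload the difference is $45, which matches the 20% discount shown above.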

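Sample code and API for GPT-4o Realtime

Access complete sample code and API resources to simplify integration with GPT-4o Realtime. Step-by-step guidance helps you get the most out of the model.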

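Versions of GPT-4o Realtime

GPT-4o Realtime may exist as multiple snapshots. Reasons include keeping older versions available so behavior stays consistent after updates, giving developers a migration window, and optimization differences between global and regional endpoints. For the specific differences, refer to the official documentation.
Version
gpt-4o-realtime-preview
gpt-4o-realtime-preview-2024-12-17
gpt-4o-realtime-preview-2025-06-03
gpt-4o-realtime-preview-2024-10-01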
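
Pinning a snapshot keeps behavior stable across model updates. The sketch below builds the WebSocket URL for a chosen version. It assumes the snapshot ID is passed in the same model query parameter used in the connection example; that is an assumption rather than documented CometAPI behavior. The names KNOWN_MODELS and realtimeUrl are hypothetical.

```javascript
// Model IDs that appear on this page
const KNOWN_MODELS = [
  "gpt-4o-realtime",
  "gpt-4o-realtime-preview",
  "gpt-4o-realtime-preview-2024-10-01",
  "gpt-4o-realtime-preview-2024-12-17",
  "gpt-4o-realtime-preview-2025-06-03",
];

// Build the realtime endpoint URL for a given model or snapshot.
// Rejects IDs not listed above to catch typos early.
function realtimeUrl(model = "gpt-4o-realtime") {
  if (!KNOWN_MODELS.includes(model)) {
    throw new Error(`Unknown model ID: ${model}`);
  }
  const url = new URL("wss://api.cometapi.com/v1/realtime");
  url.searchParams.set("model", model);
  return url.toString();
}
```

The returned string can be passed as the first argument to the WebSocket constructor in place of the hard-coded URL.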

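More models

gpt-realtime-1.5

Input: $3.2/M
Output: $12.8/M
The best voice model for audio input and audio output.

gpt-audio-1.5

Input: $2/M
Output: $8/M
The best voice model for audio input and audio output in Chat Completions.

Whisper-1

Input: $24/M
Output: $24/M
Converts speech to text and creates translations.

TTS

Input: $12/M
Output: $12/M
OpenAI text-to-speech.

Kling TTS

Per request: $0.006608
[Speech synthesis] New: generate broadcast-quality audio from text online, with a preview feature. It can also generate an audio_id at the same time, which can be used with any Kling API.

Kling video-to-audio

Per request: $0.03304
Kling video to audio.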