模型支援企業部落格
500+ AI 模型 API,全部整合在一個 API 中。就在 CometAPI
模型 API
開發者
快速入門說明文件API 儀表板
資源
AI模型部落格企業更新日誌關於
2025 CometAPI. 保留所有權利。隱私政策服務條款
Home/Models/OpenAI/GPT-4o Realtime
O

GPT-4o Realtime

輸入:$60/M
輸出:$240/M
Realtime API 讓開發者打造低延遲的多模態體驗,包括語音轉語音功能。Realtime API 所處理的文字與音訊將分別計費。此模型支援的最大上下文長度為 128,000 個 token。
商業用途
概覽
功能
定價
API
版本

Technical Specifications of gpt-4o-realtime

SpecificationDetails
Model IDgpt-4o-realtime
Model typeRealtime multimodal model
Primary use casesLow-latency multimodal interactions, speech-to-speech experiences, real-time text and audio applications
Context length128,000 tokens
Input modalitiesText, audio
Output modalitiesText, audio
Latency profileOptimized for low-latency realtime experiences
Pricing noteText and audio processed by the Realtime API are priced separately

What is gpt-4o-realtime?

gpt-4o-realtime is a realtime multimodal model available through CometAPI for developers building highly responsive AI applications. It is designed for scenarios where low latency matters, such as live voice assistants, interactive speech-to-speech systems, and applications that need to process text and audio in the same workflow.

This model supports multimodal communication, allowing applications to send text or audio inputs and receive text or audio outputs. With a maximum context length of 128,000 tokens, gpt-4o-realtime can also support longer interactions and more context-aware conversations than smaller-session realtime systems.

Main features of gpt-4o-realtime

  • Low-latency interaction: Built for realtime use cases where fast response times are essential for smooth user experiences.
  • Multimodal input and output: Supports both text and audio workflows, enabling flexible application design.
  • Speech-to-speech support: Well suited for conversational voice interfaces that take spoken input and return spoken output.
  • Large context window: Supports up to 128,000 tokens of context for more coherent extended sessions.
  • Flexible realtime application support: Useful for live assistants, interactive tools, customer support agents, and other responsive multimodal products.
  • Separate text and audio pricing: Developers should account for text and audio usage independently when estimating costs.

How to access and integrate gpt-4o-realtime

Step 1: Sign Up for API Key

To get started, sign up on CometAPI and generate your API key from the dashboard. After that, store the key securely and use it to authenticate every request to the API.

Step 2: Connect to gpt-4o-realtime API

The Realtime API uses WebSocket connections. Connect to CometAPI's WebSocket endpoint:

const ws = new WebSocket(
  "wss://api.cometapi.com/v1/realtime?model=gpt-4o-realtime",
  {
    headers: {
      "Authorization": "Bearer " + process.env.COMETAPI_API_KEY,
      "OpenAI-Beta": "realtime=v1"
    }
  }
);

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      modalities: ["text", "audio"],
      instructions: "You are a helpful assistant."
    }
  }));
});

ws.on("message", (data) => {
  console.log(JSON.parse(data));
});

Step 3: Retrieve and Verify Results

The Realtime API streams responses through the WebSocket connection as server-sent events. Listen for response.audio.delta events for audio output and response.text.delta for text. Verify the session is established and responses are streaming correctly.

GPT-4o Realtime 的功能

探索 GPT-4o Realtime 的核心功能,專為提升效能和可用性而設計。了解這些功能如何為您的專案帶來效益並改善使用者體驗。

GPT-4o Realtime 的定價

探索 GPT-4o Realtime 的競爭性定價,專為滿足各種預算和使用需求而設計。我們靈活的方案確保您只需為實際使用量付費,讓您能夠隨著需求增長輕鬆擴展。了解 GPT-4o Realtime 如何在保持成本可控的同時提升您的專案效果。
彗星價格 (USD / M Tokens)官方價格 (USD / M Tokens)折扣
輸入:$60/M
輸出:$240/M
輸入:$75/M
輸出:$300/M
-20%

GPT-4o Realtime 的範例程式碼和 API

存取完整的範例程式碼和 API 資源,以簡化您的 GPT-4o Realtime 整合流程。我們詳盡的文件提供逐步指引,協助您在專案中充分發揮 GPT-4o Realtime 的潛力。

GPT-4o Realtime的版本

GPT-4o Realtime擁有多個快照的原因可能包括:更新後輸出結果存在差異需保留舊版快照以確保一致性、為開發者提供適應與遷移的過渡期,以及不同快照對應全球或區域端點以優化使用者體驗等潛在因素。各版本間的具體差異請參閱官方文件說明。
version
gpt-4o-realtime-preview-2024-12-17
gpt-4o-realtime-preview-2025-06-03
gpt-4o-realtime-preview-2024-10-01
gpt-4o-realtime-preview

更多模型

O

gpt-realtime-1.5

輸入:$3.2/M
輸出:$12.8/M
用於音訊輸入、音訊輸出的最佳語音模型。
O

gpt-audio-1.5

輸入:$2/M
輸出:$8/M
搭配 Chat Completions 進行音訊輸入、音訊輸出的最佳語音模型。
O

Whisper-1

輸入:$24/M
輸出:$24/M
語音轉文字,生成翻譯
O

TTS

輸入:$12/M
輸出:$12/M
OpenAI 文字轉語音
K

Kling TTS

每次請求:$0.006608
[語音合成] 全新上線:線上文字轉廣播級音訊,支援預覽功能 ● 可同時生成 audio_id,適用於任何 Keling API。
K

Kling video-to-audio

K

Kling video-to-audio

每次請求:$0.03304
Kling 影片轉音訊