© 2026 CometAPI · All rights reserved

GPT-4o mini Realtime Preview

Input: $60/M tokens
Output: $240/M tokens
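GPT-4o mini Realtime Preview is a real-time multimodal model for interactive voice and vision experiences. It supports streaming input and output, handles speech, text, and images, and can carry out real actions through tool/function calling. Typical use cases include voice assistants, live call handling, real-time captioning, and visual Q&A over camera or screen content. Technical highlights include bidirectional audio, visual understanding, streaming responses, and structured output via function calls.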

Technical Specifications of gpt-4o-mini-realtime-preview

Model ID: gpt-4o-mini-realtime-preview
Provider: OpenAI, via CometAPI
Modalities: Text, audio, image
Input types: Streaming audio, text messages, image inputs
Output types: Streaming text, synthesized/streamed audio, structured function calls
Core strengths: Low-latency interaction, multimodal understanding, real-time conversation, tool use
Best for: Voice assistants, live support calls, captioning, visual Q&A, interactive agents
Function calling: Supported
Streaming: Supported
Realtime sessions: Supported
Typical interaction pattern: Continuous bidirectional session with incremental input and output

What is gpt-4o-mini-realtime-preview?

gpt-4o-mini-realtime-preview is a real-time multimodal model designed for fast, interactive experiences where users speak, type, or share visual input and expect immediate responses. It is well suited for applications that need live back-and-forth communication rather than standard single-turn request/response workflows.

The model can process speech, text, and images within the same experience, making it useful for assistants that listen to a caller, inspect on-screen or camera content, and respond in natural language or audio. Because it supports streaming input and output, developers can build systems that feel responsive during ongoing interactions instead of waiting for a full completion.

It also supports tool or function calling, which allows the model to trigger structured actions such as looking up data, calling backend services, or executing workflow steps. This makes gpt-4o-mini-realtime-preview a strong choice for grounded, action-oriented agents in customer support, operations, productivity, and multimodal assistant scenarios.
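A minimal sketch of the function-calling round trip over a Realtime session, assuming the beta (`OpenAI-Beta: realtime=v1`) event shapes and that CometAPI forwards them unchanged. The `lookup_order` tool and its backend stub are hypothetical; the code only builds and handles JSON events and does not open a connection. A completed call is read from a `response.output_item.done` event whose item has type `function_call`, and the result is returned as a `function_call_output` item followed by `response.create`.

```javascript
// Hypothetical tool definition: look up an order's status by ID.
const lookupOrderTool = {
  type: "function",
  name: "lookup_order",
  description: "Look up the shipping status of an order by its ID.",
  parameters: {
    type: "object",
    properties: { order_id: { type: "string" } },
    required: ["order_id"]
  }
};

// Registers the tool on the session (send once after the socket opens).
function toolSessionUpdate() {
  return { type: "session.update", session: { tools: [lookupOrderTool], tool_choice: "auto" } };
}

// Hypothetical backend stub; replace with a real service lookup.
function lookupOrder({ order_id }) {
  return { order_id, status: "shipped" };
}

// Given one server event, returns the client events to send back (possibly none).
function handleFunctionCall(event) {
  const item = event.item;
  if (event.type !== "response.output_item.done" || !item || item.type !== "function_call") {
    return [];
  }
  if (item.name !== "lookup_order") return [];
  const args = JSON.parse(item.arguments); // arguments arrive as a JSON string
  const result = lookupOrder(args);
  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: item.call_id, output: JSON.stringify(result) }
    },
    { type: "response.create" } // ask the model to continue using the tool result
  ];
}

// Usage with a simulated server event:
const replies = handleFunctionCall({
  type: "response.output_item.done",
  item: { type: "function_call", name: "lookup_order", call_id: "call_123", arguments: '{"order_id":"A-42"}' }
});
console.log(replies.map((e) => e.type)); // [ 'conversation.item.create', 'response.create' ]
```

In a live session, each returned event would be sent with `ws.send(JSON.stringify(event))`. Keeping the handler free of I/O makes the tool logic testable without a connection.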

Main features of gpt-4o-mini-realtime-preview

  • Real-time multimodal interaction: Accepts and responds across speech, text, and images for fluid live experiences.
  • Bidirectional audio: Supports conversational voice interfaces where audio can be streamed in and responses can be streamed back out.
  • Streaming responses: Delivers partial outputs incrementally, reducing perceived latency and improving responsiveness.
  • Vision understanding: Interprets visual inputs such as camera frames, screenshots, or other images during a live session.
  • Function and tool calling: Produces structured calls that let your application connect the model to business logic, databases, or external tools.
  • Interactive agent behavior: Works well for assistants that must maintain turn-by-turn context during active sessions.
  • Live call handling: Useful for phone or web-call scenarios involving fast speech understanding and immediate replies.
  • Real-time captioning and transcription workflows: Can support experiences that convert ongoing speech into usable text in near real time.
  • Structured outputs for actions: Helps applications turn conversational intent into reliable machine-readable instructions.
  • Low-latency user experiences: Optimized for scenarios where responsiveness matters, such as support, coaching, monitoring, and guided workflows.
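Bidirectional audio and live captioning are configured per session. A minimal sketch of a `session.update` payload for a voice session, assuming the beta Realtime session fields `voice`, `input_audio_format`, `output_audio_format`, `turn_detection`, and `input_audio_transcription`. The specific values (voice `alloy`, transcription model `whisper-1`, the VAD thresholds) are illustrative assumptions, not settings documented by CometAPI.

```javascript
// Builds a session.update event for a voice session with optional live captions.
function voiceSessionUpdate({ instructions, captions = true } = {}) {
  const session = {
    modalities: ["text", "audio"],
    instructions: instructions || "You are a helpful assistant.",
    voice: "alloy",                 // illustrative voice choice
    input_audio_format: "pcm16",    // 16-bit PCM, 24 kHz, mono, little-endian
    output_audio_format: "pcm16",
    // Server-side voice activity detection decides when the user has finished speaking.
    turn_detection: { type: "server_vad", threshold: 0.5, prefix_padding_ms: 300, silence_duration_ms: 500 }
  };
  if (captions) {
    // Ask the server to transcribe the caller's speech so it can be shown as captions.
    session.input_audio_transcription = { model: "whisper-1" };
  }
  return { type: "session.update", session };
}

console.log(JSON.stringify(voiceSessionUpdate()));
```

With server VAD enabled, the server detects the end of each user turn and starts a response on its own, which suits the live-call and captioning scenarios listed above.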

How to access and integrate gpt-4o-mini-realtime-preview

Step 1: Sign Up for an API Key

First, create an account on CometAPI and generate your API key from the dashboard. This key is required to authenticate every request. Store it securely and avoid exposing it in client-side code or public repositories.
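A minimal sketch of reading the key on the server side and failing fast when it is missing, assuming the `COMETAPI_API_KEY` environment variable name used in the Step 2 connection example. The helper name `realtimeHeaders` is hypothetical.

```javascript
// Reads the CometAPI key from the environment (server side only) and builds auth headers.
function realtimeHeaders(env = process.env) {
  const key = env.COMETAPI_API_KEY;
  if (!key || !key.trim()) {
    throw new Error("COMETAPI_API_KEY is not set; export it before starting the server.");
  }
  return {
    Authorization: "Bearer " + key.trim(),
    "OpenAI-Beta": "realtime=v1"
  };
}

// Usage: pass an explicit env object in tests, or rely on process.env in production.
const headers = realtimeHeaders({ COMETAPI_API_KEY: "sk-example" });
console.log(Object.keys(headers)); // [ 'Authorization', 'OpenAI-Beta' ]
```

Injecting `env` as a parameter keeps the key out of source code and lets tests run without touching the real environment.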

Step 2: Connect to the gpt-4o-mini-realtime-preview API

The Realtime API uses WebSocket connections. Connect to CometAPI's WebSocket endpoint:

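// Node.js example; requires the "ws" package (npm install ws)
const WebSocket = require("ws");

const ws = new WebSocket(
  "wss://api.cometapi.com/v1/realtime?model=gpt-4o-mini-realtime-preview",
  {
    headers: {
      "Authorization": "Bearer " + process.env.COMETAPI_API_KEY,
      "OpenAI-Beta": "realtime=v1"
    }
  }
);

// Configure the session once the socket is open
ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      modalities: ["text", "audio"],
      instructions: "You are a helpful assistant."
    }
  }));
});

// Each message is one JSON-encoded server event (delivered as a Buffer)
ws.on("message", (data) => {
  const event = JSON.parse(data.toString());
  console.log(event.type, event);
});

ws.on("error", (err) => console.error("WebSocket error:", err));
ws.on("close", (code) => console.log("Connection closed:", code));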
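The Step 2 example only configures the session. A minimal sketch of the next step, streaming audio in, assuming the beta client events `input_audio_buffer.append`, `input_audio_buffer.commit`, and `response.create`, and the default `pcm16` format (16-bit little-endian PCM, 24 kHz, mono). The helper converts Float32 samples in [-1, 1], as produced by Web Audio, to base64 PCM16; capturing and resampling audio to 24 kHz is out of scope. The helper names are hypothetical.

```javascript
// Converts Float32 samples in [-1, 1] to base64-encoded 16-bit little-endian PCM.
function floatToPcm16Base64(samples) {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid integer overflow
    buf.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return buf.toString("base64");
}

// Client event carrying one chunk of captured audio.
function audioAppendEvent(samples) {
  return { type: "input_audio_buffer.append", audio: floatToPcm16Base64(samples) };
}

// With turn detection disabled, end the user's turn manually and request a reply.
function endTurnEvents() {
  return [{ type: "input_audio_buffer.commit" }, { type: "response.create" }];
}

// Usage: ws.send(JSON.stringify(audioAppendEvent(chunk))) for each captured chunk.
const evt = audioAppendEvent([0, 1, -1]);
console.log(evt.type, evt.audio); // input_audio_buffer.append AAD/fwCA
```

When server VAD is active, the server commits the buffer and starts a response itself, so `endTurnEvents` is only needed for push-to-talk style sessions.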

Step 3: Retrieve and Verify Results

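The Realtime API returns output over the same WebSocket connection as a stream of JSON server events (not HTTP server-sent events). Listen for response.text.delta events for text output and response.audio.delta events for audio output, whose delta field carries base64-encoded audio. A session.created event confirms that the session is established, and response.done marks the end of each response.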
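A minimal sketch of collecting streamed output per response, assuming the beta server events `response.text.delta`, `response.audio.delta`, `response.audio_transcript.delta` (the text transcript of a spoken reply), `response.done`, and `error`, each delta carrying a `response_id`. Audio deltas are decoded from base64 and concatenated; playback is left out. The `createCollector` name is hypothetical.

```javascript
// Accumulates streamed text, audio transcript, and audio bytes, keyed by response_id.
function createCollector() {
  const responses = new Map();
  const get = (id) => {
    if (!responses.has(id)) responses.set(id, { text: "", transcript: "", audio: [], done: false });
    return responses.get(id);
  };
  return {
    handle(event) {
      switch (event.type) {
        case "response.text.delta":
          get(event.response_id).text += event.delta;
          break;
        case "response.audio_transcript.delta":
          get(event.response_id).transcript += event.delta;
          break;
        case "response.audio.delta":
          get(event.response_id).audio.push(Buffer.from(event.delta, "base64"));
          break;
        case "response.done":
          get(event.response.id).done = true; // response.done carries the full response object
          break;
        case "error":
          console.error("Server error:", event.error);
          break;
      }
    },
    result(id) {
      const r = get(id);
      return { text: r.text, transcript: r.transcript, audio: Buffer.concat(r.audio), done: r.done };
    }
  };
}

// Usage inside the ws "message" handler: collector.handle(JSON.parse(data.toString()));
const collector = createCollector();
collector.handle({ type: "response.text.delta", response_id: "resp_1", delta: "Hel" });
collector.handle({ type: "response.text.delta", response_id: "resp_1", delta: "lo" });
collector.handle({ type: "response.done", response: { id: "resp_1" } });
console.log(collector.result("resp_1").text); // Hello
```

Keying by `response_id` keeps overlapping or interrupted responses from mixing their text and audio.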

Pricing for GPT-4o mini Realtime Preview

GPT-4o mini Realtime Preview is priced to fit a range of budgets and usage levels. Billing is pay-as-you-go, so you pay only for what you use and can scale as your needs grow while keeping costs under control.
CometAPI price (USD / M tokens): Input $60/M, Output $240/M
Official price (USD / M tokens): Input $75/M, Output $300/M
Discount: -20%
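Because both prices are quoted per million tokens, the cost of a request is tokens × price / 1,000,000 for each direction. A minimal sketch using the rates listed above; the token counts in the usage example are hypothetical, and real accounting should use the token usage the API reports (for example in the `response.done` event) rather than estimates.

```javascript
// Per-million-token prices in USD, as listed in the pricing table.
const PRICES = {
  cometapi: { input: 60, output: 240 },
  official: { input: 75, output: 300 }
};

// Cost in USD for a given number of input and output tokens.
function estimateCost(inputTokens, outputTokens, tier = "cometapi") {
  const p = PRICES[tier];
  return (inputTokens * p.input + outputTokens * p.output) / 1e6;
}

// Hypothetical usage: 10,000 input tokens and 2,000 output tokens.
const comet = estimateCost(10000, 2000);
const official = estimateCost(10000, 2000, "official");
console.log(comet.toFixed(2), official.toFixed(2)); // 1.08 1.35
console.log("discount:", ((1 - comet / official) * 100).toFixed(0) + "%"); // discount: 20%
```

Since both rates are exactly 80% of the official ones, the discount is 20% regardless of the input/output mix.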

Sample Code and API for GPT-4o mini Realtime Preview

Access complete sample code and API resources to streamline your GPT-4o mini Realtime Preview integration. The documentation provides step-by-step guidance for using GPT-4o mini Realtime Preview in your projects.

Versions of GPT-4o mini Realtime Preview

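GPT-4o mini Realtime Preview may have multiple snapshots for several possible reasons: older snapshots can be kept for consistency when an update changes outputs, developers get a transition period to adapt and migrate, and different snapshots may serve global or regional endpoints to optimize the user experience. See the official documentation for the specific differences between versions.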
Versions:
  • gpt-4o-mini-realtime-preview
  • gpt-4o-mini-realtime-preview-2024-12-17
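To keep outputs stable across updates, a client can pin the dated snapshot instead of the alias. A minimal sketch that builds the connection URL for either identifier, assuming CometAPI accepts the dated ID in the same `model` query parameter used in the Step 2 example; the `realtimeUrl` helper is hypothetical.

```javascript
// Model identifiers listed on this page.
const MODELS = {
  alias: "gpt-4o-mini-realtime-preview",            // alias; may be repointed to newer snapshots
  pinned: "gpt-4o-mini-realtime-preview-2024-12-17" // fixed dated snapshot
};

// Builds the Realtime WebSocket URL for a given model ID.
function realtimeUrl(model) {
  const url = new URL("wss://api.cometapi.com/v1/realtime");
  url.searchParams.set("model", model); // URL-encodes the value if needed
  return url.toString();
}

console.log(realtimeUrl(MODELS.pinned));
// wss://api.cometapi.com/v1/realtime?model=gpt-4o-mini-realtime-preview-2024-12-17
```

Pinning trades automatic improvements for reproducibility; switching back to the alias is a one-line change.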