2025 CometAPI. All rights reserved.

GPT-4o Realtime

Input: $60/M
Output: $240/M
The Realtime API lets developers build low-latency multimodal experiences, including speech-to-speech functionality. Text and audio processed by the Realtime API are priced separately. This model supports a maximum context length of 128,000 tokens.
Commercial use

Technical Specifications of gpt-4o-realtime

Specification | Details
Model ID | gpt-4o-realtime
Model type | Realtime multimodal model
Primary use cases | Low-latency multimodal interactions, speech-to-speech experiences, real-time text and audio applications
Context length | 128,000 tokens
Input modalities | Text, audio
Output modalities | Text, audio
Latency profile | Optimized for low-latency realtime experiences
Pricing note | Text and audio processed by the Realtime API are priced separately

What is gpt-4o-realtime?

gpt-4o-realtime is a realtime multimodal model available through CometAPI for developers building highly responsive AI applications. It is designed for scenarios where low latency matters, such as live voice assistants, interactive speech-to-speech systems, and applications that need to process text and audio in the same workflow.

This model supports multimodal communication: applications can send text or audio inputs and receive text or audio outputs. With a maximum context length of 128,000 tokens, gpt-4o-realtime can sustain longer, more context-aware conversations than realtime systems with smaller context windows.

Main features of gpt-4o-realtime

  • Low-latency interaction: Built for realtime use cases where fast response times are essential for smooth user experiences.
  • Multimodal input and output: Supports both text and audio workflows, enabling flexible application design.
  • Speech-to-speech support: Well suited for conversational voice interfaces that take spoken input and return spoken output.
  • Large context window: Supports up to 128,000 tokens of context for more coherent extended sessions.
  • Flexible realtime application support: Useful for live assistants, interactive tools, customer support agents, and other responsive multimodal products.
  • Separate text and audio pricing: Developers should account for text and audio usage independently when estimating costs.

How to access and integrate gpt-4o-realtime

Step 1: Sign Up for an API Key

To get started, sign up on CometAPI and generate your API key from the dashboard. After that, store the key securely and use it to authenticate every request to the API.
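
A minimal sketch of the key handling described above, assuming the key is kept in an environment variable named COMETAPI_API_KEY (the name used in the connection example). The helpers loadApiKey and authHeader are hypothetical and not part of any CometAPI SDK; they only illustrate failing fast when the key is missing.

```javascript
// Hypothetical helper: read the CometAPI key from the environment
// and fail fast if it is missing or blank.
function loadApiKey(env = process.env) {
  const key = (env.COMETAPI_API_KEY || "").trim();
  if (!key) {
    throw new Error("COMETAPI_API_KEY is not set");
  }
  return key;
}

// Hypothetical helper: build the Authorization header used to
// authenticate requests.
function authHeader(env = process.env) {
  return { Authorization: "Bearer " + loadApiKey(env) };
}
```

Reading the key from the environment keeps it out of source control; the env parameter exists only so the helpers can be exercised without touching the real process environment.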

Step 2: Connect to the gpt-4o-realtime API

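The Realtime API uses WebSocket connections. Connect to CometAPI's WebSocket endpoint. The example below targets Node.js with the ws package, because it passes custom headers and uses the ws.on event style: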

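// Requires the Node.js "ws" package (npm install ws).
// The browser WebSocket API does not accept custom headers.
const WebSocket = require("ws");

const ws = new WebSocket(
  "wss://api.cometapi.com/v1/realtime?model=gpt-4o-realtime",
  {
    headers: {
      "Authorization": "Bearer " + process.env.COMETAPI_API_KEY,
      "OpenAI-Beta": "realtime=v1"
    }
  }
);

// Configure the session once the connection is open.
ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      modalities: ["text", "audio"],
      instructions: "You are a helpful assistant."
    }
  }));
});

// Each message is one JSON-encoded server event.
ws.on("message", (data) => {
  console.log(JSON.parse(data.toString()));
});

// Surface connection and authentication failures.
ws.on("error", (err) => {
  console.error("WebSocket error:", err.message);
});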
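
Once the session is configured, a text turn can be submitted over the same connection. The sketch below assumes CometAPI accepts the client events defined by OpenAI's Realtime API (beta), namely conversation.item.create followed by response.create. The helpers buildTextTurn and sendTextTurn are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: build the two client events that submit a text turn.
// Event shapes follow OpenAI's Realtime API (beta); compatibility on the
// CometAPI endpoint is an assumption.
function buildTextTurn(text) {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text }],
      },
    },
    // Ask the model to generate a response to the conversation so far.
    { type: "response.create" },
  ];
}

// Send each event as a JSON string over any object that exposes send(),
// such as an open WebSocket.
function sendTextTurn(socket, text) {
  for (const event of buildTextTurn(text)) {
    socket.send(JSON.stringify(event));
  }
}
```

With the connection above, this could be called as sendTextTurn(ws, "Hello") after the open event has fired.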

Step 3: Retrieve and Verify Results

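The Realtime API streams responses over the WebSocket connection as JSON-encoded server events (not HTTP Server-Sent Events). Listen for response.audio.delta events for audio output and response.text.delta events for text output. Before relying on the output, confirm that the session is established (for example, by waiting for a session.created event, as in OpenAI's Realtime API) and that delta events are arriving.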
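
The event handling described above can be sketched as a small reducer. It assumes the event shapes of OpenAI's Realtime API (beta), where each delta event carries a delta field, audio deltas are base64-encoded, and response.done marks the end of a response. The names initialState and applyServerEvent are hypothetical.

```javascript
// Hypothetical accumulator for one streamed response.
const initialState = { text: "", audioChunks: [], errors: [], done: false };

// Fold one parsed server event into the accumulated state.
function applyServerEvent(state, event) {
  switch (event.type) {
    case "response.text.delta":
      // Text arrives in fragments; concatenate them in order.
      return { ...state, text: state.text + event.delta };
    case "response.audio.delta":
      // Assumption: delta is a base64-encoded audio chunk.
      return {
        ...state,
        audioChunks: [...state.audioChunks, Buffer.from(event.delta, "base64")],
      };
    case "response.done":
      return { ...state, done: true };
    case "error":
      return { ...state, errors: [...state.errors, event.error] };
    default:
      // Ignore event types this sketch does not handle.
      return state;
  }
}
```

In a message handler, each parsed event would be passed through applyServerEvent, and Buffer.concat(state.audioChunks) would yield the full audio payload once done is true.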

Features of GPT-4o Realtime

Explore the key features of GPT-4o Realtime, designed to improve performance and usability, and see how these capabilities can benefit your projects and improve the user experience.

Pricing for GPT-4o Realtime

Explore competitive pricing for GPT-4o Realtime, designed to fit different budgets and usage needs. Flexible plans mean you pay only for what you use, which makes it easy to scale as your requirements grow while keeping costs manageable.

Comet price (USD / M tokens) | Official price (USD / M tokens) | Discount
Input: $60/M | Input: $75/M | -20%
Output: $240/M | Output: $300/M | -20%
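
As a rough illustration of how the listed rates translate into a bill, the sketch below estimates cost from token counts. The page states that text and audio are priced separately but lists only one input/output pair, so this example applies that single pair and should be treated as an approximation. The names COMET_RATES, OFFICIAL_RATES, estimateCostUsd, and discountPercent are hypothetical.

```javascript
// Rates copied from the listed prices, in USD per million tokens.
const COMET_RATES = { input: 60, output: 240 };
const OFFICIAL_RATES = { input: 75, output: 300 };

// Estimate the cost in USD of a request with the given token counts.
function estimateCostUsd(inputTokens, outputTokens, rates = COMET_RATES) {
  return (inputTokens * rates.input + outputTokens * rates.output) / 1_000_000;
}

// Discount of one rate relative to another, rounded to a whole percent.
function discountPercent(cometRate, officialRate) {
  return Math.round((1 - cometRate / officialRate) * 100);
}
```

For example, 10,000 input tokens and 2,000 output tokens come to $1.08 at the Comet rates and $1.35 at the official rates, consistent with the 20% discount shown.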

Sample code and API for GPT-4o Realtime

Access sample code and API resources for GPT-4o Realtime to streamline your integration. The documentation provides step-by-step guidance to help you make full use of GPT-4o Realtime in your projects.

Versions of GPT-4o Realtime

GPT-4o Realtime may have multiple snapshots for several reasons: output can vary after updates, so older snapshots are kept for consistency; developers get a transition period to adapt and migrate; and different snapshots may correspond to global or regional endpoints to optimize the user experience. For detailed differences between versions, refer to the official documentation.

Version
gpt-4o-realtime-preview
gpt-4o-realtime-preview-2024-12-17
gpt-4o-realtime-preview-2025-06-03
gpt-4o-realtime-preview-2024-10-01
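
To pin a specific snapshot, the version name can be placed in the model query parameter of the WebSocket URL. The sketch below assumes the endpoint accepts the snapshot names listed above as model values, which this page does not confirm. The names VERSIONS, latestDatedSnapshot, and realtimeUrl are hypothetical.

```javascript
// Snapshot names as listed on this page.
const VERSIONS = [
  "gpt-4o-realtime-preview",
  "gpt-4o-realtime-preview-2024-12-17",
  "gpt-4o-realtime-preview-2025-06-03",
  "gpt-4o-realtime-preview-2024-10-01",
];

// Pick the most recent dated snapshot. The names share a prefix and end
// in an ISO date, so lexicographic order matches chronological order.
function latestDatedSnapshot(versions) {
  const dated = versions.filter((v) => /-\d{4}-\d{2}-\d{2}$/.test(v));
  return dated.sort().at(-1);
}

// Build the WebSocket URL for a given model or snapshot name.
function realtimeUrl(model) {
  const url = new URL("wss://api.cometapi.com/v1/realtime");
  url.searchParams.set("model", model);
  return url.toString();
}
```

Pinning a dated snapshot keeps behavior stable across model updates, while the undated name presumably tracks whichever snapshot the provider currently serves.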

More models

gpt-realtime-1.5
Input: $3.2/M
Output: $12.8/M
The best voice model for audio in, audio out.

gpt-audio-1.5
Input: $2/M
Output: $8/M
The best voice model for audio in, audio out with Chat Completions.

Whisper-1
Input: $24/M
Output: $24/M
Speech-to-text and translation.

TTS
Input: $12/M
Output: $12/M
OpenAI text-to-speech.

Kling TTS
Per request: $0.006608
[Speech synthesis] Newly launched: online text-to-broadcast audio with preview. It can also generate an audio_id for use with any Keling API.

Kling video-to-audio
Per request: $0.03304
Kling video-to-audio.