
gpt-audio-1.5

Input: $2/M
Output: $8/M
The best voice model for audio in, audio out with Chat Completions.

Technical specifications of gpt-audio-1.5

Model family: GPT Audio family (audio-first variant)
Input types: text, audio (speech in)
Output types: text, audio (speech out), structured outputs (function calling supported)
Context window: 128,000 tokens
Max output tokens: 16,384 (documented in the related gpt-audio listing)
Performance tier: higher intelligence; medium speed (balanced)
Latency profile: optimized for voice interactions (mid-to-low latency depending on endpoint)
Availability: Chat Completions API (audio in/out) and platform playgrounds; integrated across realtime/voice surfaces
Safety / usage notes: guardrails for voice content; apply the usual safety review and output verification for production voice agents

Note: gpt-realtime-1.5 is a closely related realtime audio/voice-first variant optimized for lower latency and realtime sessions; compare below.


What is gpt-audio-1.5?

gpt-audio-1.5 is an audio-capable GPT model that supports both speech input and speech output through the Chat Completions and related audio-capable APIs. It is positioned as the main generally available audio model for building voice agents and speech-first experiences while balancing quality and speed.


Main features

  1. Speech-in / speech-out support: handles spoken input and returns spoken or textual responses for natural voice flows.
  2. Large context for audio workflows: a documented 128k-token context window enables long multi-turn conversation history and large multimodal sessions.
  3. Streaming & Chat Completions compatibility: works inside Chat Completions with streaming audio responses and structured outputs / function calls.
  4. Balanced performance and latency: tuned for high-quality audio responses at medium throughput, suitable for chatbots and voice assistants where quality matters.
  5. Ecosystem & integrations: supported in the platform's playgrounds and available across official realtime/voice endpoints and partner integrations (Azure / Microsoft Foundry notes reference similar audio models).
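The speech-in support above means audio arrives as a content part in the messages array. A minimal sketch, assuming the OpenAI-style `input_audio` content-part schema (verify the exact field names against your provider's API reference):

```python
import base64

def build_audio_user_message(wav_bytes: bytes, prompt: str) -> dict:
    """Wrap raw WAV bytes as an `input_audio` content part for Chat Completions."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "input_audio",
                "input_audio": {
                    # Audio is sent base64-encoded alongside its container format
                    "data": base64.b64encode(wav_bytes).decode("ascii"),
                    "format": "wav",
                },
            },
        ],
    }

# Placeholder bytes stand in for a real recording:
msg = build_audio_user_message(b"RIFF...WAVE", "Please transcribe this clip.")
print(msg["content"][1]["type"])  # → input_audio
```

Pass the resulting dict in the `messages` list of a `chat.completions.create` call exactly as you would a plain text message.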

gpt-audio-1.5 vs related audio models

  • Primary focus: gpt-audio-1.5 offers high-quality audio in/out for Chat Completions and conversational flows; gpt-realtime-1.5 targets realtime speech-to-speech (S2S) with lower latency for live voice agents and streaming scenarios.
  • Context window: 128k tokens for gpt-audio-1.5; 32k tokens documented for the realtime variant.
  • Max output tokens: 16,384 (documented) for gpt-audio-1.5; gpt-realtime-1.5 is typically configured for shorter realtime responses (docs list a smaller maximum).
  • Best use: gpt-audio-1.5 for chatbots and voice-enabled assistants where full chat semantics plus audio are required; gpt-realtime-1.5 for live voice agents, kiosks, and low-latency conversational interfaces.

Representative use cases

  • Conversational voice agents for customer support and internal help desks.
  • Voice-enabled assistants embedded in apps, devices, and kiosks.
  • Hands-free workflows (dictation, voice search, accessibility).
  • Multimodal experiences that mix audio with text / images via Chat Completions.

Limitations & operational considerations

  • Not a drop-in replacement for human QA: Always validate speech outputs and downstream actions with human review in production flows.
  • Resource planning: Large context and audio I/O can increase compute and latency—design streaming/segmentation strategies for long sessions.
  • Safety & policy constraints: Voice outputs can carry persuasive power; follow platform safety guidelines and guardrails when deploying at scale.
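For the resource-planning point above, one common mitigation is segmenting long recordings before sending them. A minimal sketch, assuming 16 kHz, 16-bit mono PCM (both values are illustrative defaults, not requirements of the model):

```python
def chunk_pcm(pcm: bytes, sample_rate: int = 16000, sample_width: int = 2,
              seconds: int = 30) -> list[bytes]:
    """Split raw mono PCM audio into fixed-duration segments so each
    request stays small; the last chunk may be shorter than the rest."""
    step = sample_rate * sample_width * seconds
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

# 65 seconds of silence at the assumed format splits into three chunks:
chunks = chunk_pcm(b"\x00" * (16000 * 2 * 65))
print(len(chunks))  # → 3
```

Each chunk can then be wrapped in its own request (or streamed), keeping per-call latency and payload size predictable.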
How to access GPT Audio 1.5 API

Step 1: Sign Up for API Key

Log in at cometapi.com (register first if you do not have an account yet). In your CometAPI console, open the API token section of the personal center, click “Add Token”, and copy the generated key (it starts with sk-). This key is your access credential for the API.


Step 2: Send Requests to GPT Audio 1.5 API

Select the “gpt-audio-1.5” model and set the request body. The request method and body schema are documented in our website’s API doc, and an Apifox test page is provided for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account, and use the Chat Completions base URL (https://api.cometapi.com/v1).

Insert your question or request into the content field; this is what the model will respond to.
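As a sketch of what that request body looks like (the voice and format values are illustrative; confirm the authoritative schema in the API doc):

```python
import json

def build_request_body(question: str) -> dict:
    """Minimal Chat Completions request body for gpt-audio-1.5."""
    return {
        "model": "gpt-audio-1.5",
        # Ask for both a text transcript and synthesized audio back
        "modalities": ["text", "audio"],
        "audio": {"voice": "alloy", "format": "wav"},
        "messages": [{"role": "user", "content": question}],
    }

body = build_request_body("Is a golden retriever a good family dog?")
print(json.dumps(body, indent=2))
# POST this JSON to https://api.cometapi.com/v1/chat/completions with the
# header "Authorization: Bearer <YOUR_API_KEY>".
```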

Step 3: Retrieve and Verify Results

Parse the API response to extract the generated answer. The response includes the completion status along with the text and/or audio output data.
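A minimal sketch of that parsing step, assuming the response layout shown in the SDK sample on this page (transcript and base64 audio under `message.audio`):

```python
import base64

def extract_outputs(response: dict) -> tuple[str, bytes]:
    """Pull the transcript and decoded audio bytes out of a
    Chat Completions response dict."""
    message = response["choices"][0]["message"]
    audio = message.get("audio") or {}
    # Fall back to plain text content when no audio was returned
    transcript = audio.get("transcript") or message.get("content") or ""
    wav = base64.b64decode(audio["data"]) if "data" in audio else b""
    return transcript, wav

# Stubbed response standing in for a real API reply:
stub = {"choices": [{"message": {"audio": {
    "transcript": "Yes, golden retrievers are famously family-friendly.",
    "data": base64.b64encode(b"fake-wav-bytes").decode("ascii"),
}}}]}
text, wav = extract_outputs(stub)
print(text)
print(len(wav))  # → 14
```

In production, wrap this in error handling for non-200 responses and missing fields before writing the audio bytes to disk.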

FAQ

What are the official context and output token limits for gpt-audio-1.5 API?

gpt-audio-1.5 supports a 128,000-token context window, and documents list a max output token configuration around 16,384; verify exact limits per endpoint in the developer docs.

Can gpt-audio-1.5 handle both speech-to-text and text-to-speech in the API?

Yes: it accepts audio inputs and can return audio outputs or textual responses via the Chat Completions/audio endpoints.

When should I use gpt-audio-1.5 vs gpt-realtime-1.5 for a voice agent?

Choose gpt-audio-1.5 for higher-quality audio in Chat Completions flows where larger context is required; choose gpt-realtime-1.5 for low-latency, live streaming voice interactions.

Does gpt-audio-1.5 support streaming and function calling for tool integrations?

Yes: the model supports streaming audio responses and structured outputs/function calling to integrate external tools and workflows.
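As an illustration, a tool definition can be passed alongside the audio options in the same `chat.completions.create` call; the `get_weather` function below is hypothetical and exists only to show the schema:

```python
def weather_tool() -> dict:
    """A standard Chat Completions tool definition (JSON Schema parameters)."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }

# Pass tools=[weather_tool()] to client.chat.completions.create(...); when the
# model emits a tool call, run it and send the result back as a "tool" message.
print(weather_tool()["function"]["name"])  # → get_weather
```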

Is gpt-audio-1.5 suitable for production customer support voice agents?

Yes: it's designed for voice assistants and conversational agents, but you should add human review/QA, logging, and safety controls before production deployment.

What are the main limitations to consider when deploying gpt-audio-1.5?

Key considerations are compute/latency tradeoffs for large-context audio sessions, safety guardrails for voice content, and the need to validate ASR/TTS outputs in your domain.

Features for gpt-audio-1.5

Explore the key features of gpt-audio-1.5, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for gpt-audio-1.5

Explore competitive pricing for gpt-audio-1.5, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how gpt-audio-1.5 can enhance your projects while keeping costs manageable.
Comet Price (USD / M tokens): Input $2 / Output $8
Official Price (USD / M tokens): Input $2.50 / Output $10
Discount: -20%

Sample code and API for gpt-audio-1.5

Access comprehensive sample code and API resources for gpt-audio-1.5 to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of gpt-audio-1.5 in your projects.
Python
from openai import OpenAI
import os
import base64

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="gpt-audio-1.5",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ],
)

# Print the text response
print(completion.choices[0].message.audio.transcript)

# Save the audio response to a file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
output_path = "gpt-audio-1.5-output.wav"
with open(output_path, "wb") as f:
    f.write(wav_bytes)
print(f"Audio saved to {output_path}")
