GPT-4o mini TTS

Input: $9.6/M tokens
Output: $9.6/M tokens
GPT-4o mini TTS is a neural text-to-speech model designed to generate natural-sounding speech with low latency for user-facing applications. It converts text into speech with selectable voices, multiple output formats, and streaming synthesis for responsive experiences. Typical use cases include voice assistants, IVR and contact flows, product read-aloud features, and media narration. Technical highlights include API-based streaming and export to common audio formats such as MP3 and WAV.
Commercial use

Technical Specifications of gpt-4o-mini-tts

gpt-4o-mini-tts is a text-to-speech model exposed through the audio speech API for generating natural-sounding spoken audio from text. It is positioned for intelligent realtime applications and supports prompt-based control over speech characteristics such as accent, emotional range, intonation, impressions, speed of speech, tone, and whispering.
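As a rough illustration of prompt-based control, the sketch below builds a request body that carries a free-text style prompt. It assumes the endpoint accepts an `instructions` field in the same way OpenAI's speech API does; whether CometAPI forwards that field unchanged is an assumption worth verifying. The helper name `build_speech_payload` is hypothetical.

```python
import json


def build_speech_payload(text, voice="alloy", instructions=None):
    """Build the JSON body for a speech request (hypothetical helper)."""
    payload = {"model": "gpt-4o-mini-tts", "input": text, "voice": voice}
    if instructions:
        # Assumption: the endpoint passes `instructions` through to the model.
        payload["instructions"] = instructions
    return payload


payload = build_speech_payload(
    "Your order has shipped.",
    voice="coral",
    instructions="Speak in a warm, upbeat tone at a relaxed pace.",
)
print(json.dumps(payload, indent=2))
```

Keeping the style prompt optional means the same helper serves both plain and styled requests.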

From the API perspective, gpt-4o-mini-tts is used with the speech generation endpoint and accepts core inputs including the model ID, input text, and a selected voice. The input text limit is 4096 characters per request. Supported built-in voices include alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse, marin, and cedar, with support for custom voice objects where available.
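Because each request is capped at 4096 characters, longer documents have to be split before synthesis. The following is a minimal sketch of one way to do that, breaking on spaces where possible. The function name `split_for_tts` is hypothetical, and splitting on sentence boundaries instead would likely give more natural pauses.

```python
MAX_INPUT_CHARS = 4096  # per-request input limit stated above


def split_for_tts(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring spaces."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Find the last space that still keeps the chunk within the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space in the window: fall back to a hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


long_text = "word " * 2000  # roughly 10,000 characters
parts = split_for_tts(long_text)
print(len(parts), [len(p) for p in parts])
```

Each chunk can then be sent as its own request and the resulting audio segments concatenated or played in order.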

The model supports multiple output formats for generated audio, including mp3, opus, aac, flac, wav, and pcm. It also supports a configurable speech speed from 0.25 to 4.0, with 1.0 as the default. For delivery behavior, the API supports direct audio output as well as streaming options, including SSE streaming for responsive playback workflows.
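A small client-side check can catch unsupported formats or out-of-range speeds before a request is sent. The sketch below uses the formats and speed range listed above and assumes the request fields are named `response_format` and `speed`, as in OpenAI's speech API. Whether `speed` is honored for this particular model is worth confirming against the provider's documentation.

```python
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def speech_options(response_format="mp3", speed=1.0):
    """Validate output options and return them as request fields."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {"response_format": response_format, "speed": speed}


options = speech_options("wav", speed=1.25)
print(options)
```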

Typical implementation scenarios include voice assistants, IVR and contact flows, product read-aloud experiences, accessibility narration, and media voice generation where low latency and natural voice output matter. This fits the model’s documented positioning for realtime audio generation.

What is gpt-4o-mini-tts?

gpt-4o-mini-tts is a neural text-to-speech model that converts written text into expressive, natural audio for user-facing applications. It is designed for teams that need fast voice generation without building and training a custom speech stack from scratch.

In practical terms, developers send text plus a chosen voice to the speech API, and the model returns synthesized audio that can be saved, streamed, or played back in an application. Because it supports multiple voices, common audio export formats, and streaming-friendly delivery, it is well suited to production interfaces that need spoken responses with minimal delay.
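To make the save-or-stream choice concrete, here is a sketch that writes audio to disk as chunks arrive rather than buffering the whole response. It assumes the endpoint returns the audio as a plain HTTP body that can be read incrementally with the `requests` library (`stream=True` and `iter_content`); SSE framing is not handled here. Both function names are hypothetical.

```python
import os
import tempfile


def write_audio_chunks(chunks, path):
    """Write an iterable of byte chunks to `path`; return the bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                total += len(chunk)
    return total


def stream_speech_to_file(api_key, payload, path):
    """POST the payload and stream the response body to disk."""
    import requests  # imported lazily so the helper above needs only the stdlib

    url = "https://api.cometapi.com/v1/audio/speech"
    headers = {"Authorization": f"Bearer {api_key}"}
    with requests.post(
        url, headers=headers, json=payload, stream=True, timeout=60
    ) as resp:
        resp.raise_for_status()
        return write_audio_chunks(resp.iter_content(chunk_size=8192), path)


# Offline demonstration with fake chunks instead of a network response.
demo_path = os.path.join(tempfile.gettempdir(), "tts_demo.bin")
written = write_audio_chunks([b"ab", b"", b"cd"], demo_path)
print(written)
```

In a playback scenario, the same chunk iterator could feed an audio player instead of a file.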

Compared with basic TTS pipelines, gpt-4o-mini-tts is especially useful when the experience needs more than robotic narration. The documented controls over tone, pacing, accent, and expressive style make it a strong option for assistants, guided workflows, customer service automation, and branded voice experiences.

Main features of gpt-4o-mini-tts

  • Natural speech generation: Converts text into human-like spoken audio intended for user-facing and realtime experiences.
  • Low-latency delivery: Designed for intelligent realtime applications, making it suitable for conversational interfaces and responsive playback flows.
  • Selectable voices: Supports a range of built-in voices such as alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse, marin, and cedar.
  • Expressive control: Can be prompted to shape accent, emotional range, intonation, impressions, tone, whispering, and speed of speech.
  • Multiple audio formats: Exports generated speech in mp3, opus, aac, flac, wav, and pcm formats for different application and playback needs.
  • Streaming synthesis support: Supports streaming-oriented response behavior, including SSE, for applications that need progressive audio delivery.
  • Simple API integration: Works through a straightforward speech generation API using model, input text, and voice parameters.
  • Custom voice pathway: Can be paired with custom voice objects where account eligibility and voice-creation workflows are available.

How to access and integrate gpt-4o-mini-tts

Step 1: Sign Up for an API Key

To start using gpt-4o-mini-tts, first create an account on CometAPI and generate your API key from the dashboard. After signing in, copy the key and store it securely, since you will use it to authenticate every request to the API.
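One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes the variable is named `COMETAPI_API_KEY`, matching the curl example on this page; the helper name `auth_headers` is hypothetical.

```python
import os


def auth_headers(env_var="COMETAPI_API_KEY"):
    """Build request headers from an API key stored in the environment."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early with a clear message avoids sending unauthenticated requests that would only return an error.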

Step 2: Send Requests to the gpt-4o-mini-tts API

Once you have your API key, you can call CometAPI’s OpenAI-compatible endpoint and specify the model as gpt-4o-mini-tts.

curl https://api.cometapi.com/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_API_KEY" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Welcome to our voice assistant. How can I help you today?",
    "voice": "alloy",
    "response_format": "mp3"
  }' \
  --output speech.mp3
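
The equivalent request in Python, using the requests library:

import os

import requests

url = "https://api.cometapi.com/v1/audio/speech"
headers = {
    # Read the key from the environment instead of hard-coding it.
    "Authorization": f"Bearer {os.environ['COMETAPI_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-4o-mini-tts",
    "input": "Welcome to our voice assistant. How can I help you today?",
    "voice": "alloy",
    "response_format": "mp3",
}

response = requests.post(url, headers=headers, json=payload, timeout=60)
# Fail loudly on HTTP errors rather than saving an error body as audio.
response.raise_for_status()

# The response body is the raw audio; write it out as bytes.
with open("speech.mp3", "wb") as f:
    f.write(response.content)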

Step 3: Retrieve and Verify Results

After sending the request, CometAPI returns the generated audio output for gpt-4o-mini-tts. Save the returned file or stream it directly into your application, then verify that the selected voice, format, pacing, and overall audio quality match your product requirements. If needed, adjust the input text, voice choice, output format, or speech settings and resend the request until the result fits your use case.
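Part of the verification can be automated by checking that the returned bytes look like audio rather than, say, a JSON error body saved under an audio filename. The sketch below inspects well-known file signatures for a few of the supported formats. The function name `sniff_audio_format` is hypothetical, and raw `pcm` has no header, so it cannot be detected this way.

```python
def sniff_audio_format(data):
    """Guess the audio container from the leading bytes; None if unrecognized."""
    if data[:3] == b"ID3":
        return "mp3"  # MP3 file starting with an ID3v2 tag
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"OggS":
        return "ogg"  # Opus audio is commonly delivered in an Ogg container
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        # MPEG audio frame sync; layer bits 00 are reserved (ADTS AAC uses them).
        if (data[1] >> 1) & 0x03 != 0:
            return "mp3"
    return None


print(sniff_audio_format(b"ID3\x04\x00\x00"))
print(sniff_audio_format(b'{"error": "invalid key"}'))
```

Listening tests are still needed to judge voice, pacing, and overall quality; this check only confirms that the file is plausibly audio.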

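Features of GPT-4o mini TTS

Explore the core features of GPT-4o mini TTS, designed to improve performance and usability. Learn how these features can benefit your projects and improve the user experience.

Pricing for GPT-4o mini TTS

Explore competitive pricing for GPT-4o mini TTS, designed to fit a range of budgets and usage needs. Flexible plans mean you pay only for what you use, so you can scale easily as demand grows. See how GPT-4o mini TTS can improve your projects while keeping costs under control.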
CometAPI price (USD / M tokens) | Official price (USD / M tokens) | Discount
Input: $9.6/M                   | Input: $12/M                    | -20%
Output: $9.6/M                  | Output: $12/M                   |
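The listed discount can be checked with a line of arithmetic, and the same per-million rate gives a rough cost estimate for a given token volume. This is a sketch using only the prices listed above; the 250,000-token volume is an arbitrary illustration, and how tokens are counted for billing is not specified here.

```python
COMET_PRICE = 9.6      # USD per million tokens, from the pricing listed above
OFFICIAL_PRICE = 12.0  # USD per million tokens, from the pricing listed above


def discount_percent(price, reference):
    """Percentage difference of `price` relative to `reference`."""
    return round((price - reference) / reference * 100, 2)


def cost_usd(tokens, price_per_million):
    """Cost in USD for `tokens` tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


print(discount_percent(COMET_PRICE, OFFICIAL_PRICE))  # -20.0
print(cost_usd(250_000, COMET_PRICE))                 # 2.4
```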

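Sample code and API for GPT-4o mini TTS

Access complete sample code and API resources to simplify your GPT-4o mini TTS integration. The documentation provides step-by-step guidance to help you make full use of GPT-4o mini TTS in your projects.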
