Technical Specifications of tts-1-hd
| Specification | Details |
|---|---|
| Model ID | tts-1-hd |
| Provider | OpenAI |
| Model type | Text-to-speech (TTS) |
| Primary use case | Converting text into high-quality spoken audio |
| Optimization focus | Higher-quality speech synthesis rather than lowest-latency generation |
| API endpoint | /v1/audio/speech |
| Supported alongside | tts-1, gpt-4o-mini-tts |
| Input | Text |
| Output | Generated speech audio |
| Voice selection | Supported via selectable preset voices |
| Common controls | Voice, response format, and speed |
| Integration pattern | Request-based audio generation through the Audio API |
What is tts-1-hd?
tts-1-hd is OpenAI’s higher-quality text-to-speech model for generating natural-sounding spoken audio from text. It is designed for use cases where output quality matters more than minimizing latency, such as narration, voice assistants, accessibility features, educational content, and media production. OpenAI documents tts-1-hd as a model optimized for high-quality text-to-speech and indicates that it is used through the Speech endpoint in the Audio API.
Compared with lighter or faster TTS options, tts-1-hd is generally positioned as the quality-focused choice in the OpenAI speech stack. Developers typically send input text, choose a voice, optionally configure speed or output format, and receive an audio file or streamable audio response suitable for playback or storage.
Main features of tts-1-hd
- High-quality speech synthesis: tts-1-hd is specifically optimized for higher-quality audio generation, making it suitable when clarity and naturalness matter more than the fastest response time.
- Simple text-to-audio workflow: The model accepts text input and returns synthesized speech through OpenAI’s Audio API, which keeps implementation straightforward for developers building narration or voice output features.
- Preset voice support: Developers can select from supported built-in voices when generating speech, enabling different tones or presentation styles without training a custom voice model.
- Configurable output behavior: The API supports practical controls such as response format and speech speed, helping teams tailor generated audio for playback, downloads, accessibility tools, or content pipelines.
- Fits production audio use cases: Because it is exposed through the standard speech endpoint, tts-1-hd can be integrated into apps, backend workflows, automation systems, and multi-step voice pipelines with predictable API-based access.
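As a sketch of how those controls fit together, the common options map directly onto fields of the JSON body sent to the speech endpoint. The specific values below are illustrative, and the accepted formats and speed range follow OpenAI's published Audio API parameters:

```python
# The common controls map one-to-one onto JSON fields in the speech request.
# "alloy" is one of the preset voices; in the OpenAI Audio API, speed is
# accepted roughly in the range 0.25-4.0, with 1.0 as normal pacing.
payload = {
    "model": "tts-1-hd",
    "input": "Configurable output: format and speed.",
    "voice": "alloy",
    "response_format": "wav",  # e.g. mp3, opus, aac, flac, wav
    "speed": 0.9,              # slightly slower than normal pacing
}
```

A request body like this is what the curl and SDK examples later in this article ultimately send to /v1/audio/speech.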
How to access and integrate tts-1-hd
Step 1: Sign Up for API Key
To start using tts-1-hd, first register on CometAPI and generate your API key from the dashboard. After logging in, create a key for your project and store it securely, since it will be required to authenticate every API request.
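One common way to keep the key out of source control is to read it from an environment variable at startup. A minimal sketch, assuming the variable is named COMETAPI_API_KEY (the name is illustrative):

```python
import os

# Read the CometAPI key from the environment instead of hard-coding it.
# COMETAPI_API_KEY is an assumed variable name used for illustration.
api_key = os.environ.get("COMETAPI_API_KEY", "")
if not api_key:
    print("Warning: COMETAPI_API_KEY is not set; requests will fail to authenticate")
```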
Step 2: Send Requests to tts-1-hd API
Once you have your API key, send a POST request to CometAPI’s OpenAI-compatible audio speech endpoint (/v1/audio/speech), specifying the model as tts-1-hd in the request body.
Using curl:

```shell
curl https://api.cometapi.com/v1/audio/speech \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_API_KEY" \
  -d '{
    "model": "tts-1-hd",
    "input": "Hello! This is a sample speech generation request.",
    "voice": "alloy"
  }' \
  --output speech.mp3
```
The same request with the official OpenAI Python SDK, pointing the client at CometAPI’s base URL:

```python
from openai import OpenAI

# Point the OpenAI client at CometAPI's OpenAI-compatible base URL
client = OpenAI(
    api_key="YOUR_COMETAPI_API_KEY",
    base_url="https://api.cometapi.com/v1"
)

response = client.audio.speech.create(
    model="tts-1-hd",
    voice="alloy",
    input="Hello! This is a sample speech generation request."
)

# Write the returned audio bytes to an MP3 file
response.stream_to_file("speech.mp3")
```
Step 3: Retrieve and Verify Results
After submitting your request, CometAPI will return the generated audio result for tts-1-hd. Save the output file, play it back, and verify that the speech quality, voice selection, pacing, and pronunciation match your application requirements. If needed, iterate by adjusting the input text, voice, or other supported parameters.
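A lightweight way to automate part of that verification is a basic file check before playback. This is a minimal sketch, not part of the API itself; "speech.mp3" matches the output path used in the examples above, and the size threshold is an arbitrary illustrative value:

```python
import os

def audio_file_looks_valid(path, min_bytes=1024):
    """Return True if the file exists and is larger than min_bytes.

    A near-empty file usually indicates a failed request (for example,
    an error JSON body saved in place of audio), so it is worth
    inspecting before playback.
    """
    return os.path.isfile(path) and os.path.getsize(path) > min_bytes

if audio_file_looks_valid("speech.mp3"):
    print("speech.mp3 saved; play it back to confirm voice and pacing")
```

Quality judgments such as pronunciation and pacing still require listening to the output; this check only catches obviously broken downloads.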