Sora API Access in 2026: Pricing, Rate Limits, and What's Actually Available Through Aggregators

Sora 2 is the first generally available text-to-video model from OpenAI, accessible programmatically through both the official OpenAI API and a growing set of aggregator routes. The pricing model is unusual compared to text models (billing is per second of generated video rather than per token), and the practical questions developers ask before integrating are different to those for an LLM API. What does a clip actually cost? How long does generation take? What are the rate limits? What changes when you access Sora through an aggregator instead of OpenAI directly?

This article is the reference we wish had existed when we started scoping our own video-generation features. It is written for the developer who is past "is Sora interesting?" and now needs to answer "what will it cost, what will it take to integrate, and what do I need to know before I commit?"

Quick read: Sora 2 (the standard model) costs $0.10 per second of generated video at 720p. Sora 2 Pro costs $0.30 per second at 720p or $0.50 per second at 1024p. Ten seconds of video therefore works out to $1.00 on the standard model and $5.00 on Pro at 1024p. Generation is asynchronous; expect 30–90 seconds of wall-clock time for a short clip on the standard model. Access requires a paid OpenAI account at usage tier 2 or above.

The state of Sora API access in 2026

Sora 2 launched in the OpenAI API on October 7, 2025, and access has been continuously available since. The model identifier is sora-2 (with a current snapshot ID of sora-2-2025-12-08), and the higher-fidelity variant is sora-2-pro. Both support text-to-video and image-to-video generation, with synchronised audio output. As of January 10, 2026, free-tier consumer access through the ChatGPT product was discontinued, which has concentrated developer-grade Sora usage onto either paid ChatGPT subscriptions or direct API access.

There are three pathways to use Sora programmatically:

OpenAI direct API. The canonical route. Per-second billing, paid only, requires a minimum $10 top-up to reach usage tier 2 which unlocks Sora model access. SDK and REST API both supported.
Azure OpenAI. Microsoft's enterprise route, mirroring OpenAI's official rates with the addition of Azure subscription overhead and enterprise compliance features. Same per-second pricing; different operational surface.
Aggregators. Services that expose Sora behind their own unified API. Most aggregators pass through OpenAI's per-second pricing at parity; the value is operational (one credential, one bill, the same SDK as your text-model traffic). Some aggregators offer their own tariff structures, which we discuss later in the article.

Sora 2 pricing per second of video

Sora pricing is structured by model tier and output resolution, with a per-second rate that is multiplied by the clip duration to give the generation cost. The figures below were checked against OpenAI's official pricing page as of May 2026:

Model	Resolution	Supported durations	Price per second	Cost per 10 seconds
Sora 2 (standard)	720p	4s, 8s, 12s	$0.10	$1.00
Sora 2 Pro	720p	10s, 15s, 25s	$0.30	$3.00
Sora 2 Pro	1024p (1792×1024)	10s, 15s, 25s	$0.50	$5.00

Notes on the pricing structure. Pricing is by output, not by input; there is no token-based input billing for Sora as there is for text models. Image conditioning (passing a reference image to anchor the generation) does not change the per-second rate. The duration options for each model tier are fixed: you cannot request a 7-second clip on the standard model, only 4, 8, or 12 seconds.

Two practical implications are worth making explicit. First, the pricing model is closer to a video-rendering bill than to an LLM bill: cost is driven by output duration, not by how complex your prompt is or how many tokens it contains. Second, Sora 2 Pro at 1024p costs 5x as much per second as Sora 2 standard: ten seconds of video costs $1.00 on standard and $5.00 on Pro at 1024p. Choosing the right tier for the task is the single biggest cost lever you have, and it is worth being deliberate about which workloads genuinely need Pro's higher fidelity.
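The rate-times-duration rule is simple enough to encode as a lookup. The following is a minimal sketch, assuming the rates quoted in the table above; the `RATES_PER_SECOND` table and the `clip_cost` helper are illustrative names, not part of any SDK, and the figures should be re-checked against OpenAI's pricing page before being used for budgeting.

```python
from decimal import Decimal

# Per-second rates in USD, copied from the pricing table in this article.
# Keys are (model, resolution). These are assumptions to re-verify, not an API.
RATES_PER_SECOND = {
    ("sora-2", "720p"): Decimal("0.10"),
    ("sora-2-pro", "720p"): Decimal("0.30"),
    ("sora-2-pro", "1024p"): Decimal("0.50"),
}


def clip_cost(model: str, resolution: str, seconds: int) -> Decimal:
    """Return the generation cost in USD: per-second rate times duration."""
    try:
        rate = RATES_PER_SECOND[(model, resolution)]
    except KeyError:
        raise ValueError(f"no rate listed for {model} at {resolution}") from None
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rate * seconds


if __name__ == "__main__":
    print(clip_cost("sora-2", "720p", 8))        # 0.80
    print(clip_cost("sora-2-pro", "1024p", 10))  # 5.00
```

Using `Decimal` rather than floats keeps the totals exact when many clips are summed.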

Rate limits and quotas

Sora rate limits are organised around OpenAI's standard usage-tier system. The salient details for Sora specifically:

Minimum tier requirement: Tier 2, reached by topping up at least $10 of API credit. Tier 1 (default for new accounts) does not include Sora model access.
Concurrent generation limits: Per OpenAI's rate-limit documentation, concurrent video generation is restricted by tier, typically a small number of in-flight generations at lower tiers, scaling with usage tier. The exact ceiling is set on a per-account basis and visible in the OpenAI dashboard. For high-volume workloads, plan for tier-3 or tier-4 access from day one.
Quota requests: Higher concurrency limits beyond the default tier ceilings can be requested through the OpenAI rate-limit increase form. Approval is workload-specific and not instant; for production launches with predictable demand spikes, request the increase several weeks ahead of launch.
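Whatever your account's ceiling turns out to be, it helps to enforce it on the client side so that a batch never submits more jobs than the tier allows. The sketch below caps in-flight work with a thread pool; `generate` is a caller-supplied blocking function and `make_fake_generate` is a hypothetical stand-in used only to demonstrate the cap, not a real Sora call. The value of `max_in_flight` is an assumption you would take from your own dashboard.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_concurrency_cap(prompts, generate, max_in_flight):
    """Run generate(prompt) for every prompt with at most max_in_flight active.

    Results are returned in the same order as the input prompts.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    # The pool size is the cap: no more than max_in_flight calls run at once.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(generate, prompts))


def make_fake_generate(stats, seconds=0.02):
    """Hypothetical stand-in for a blocking generation call; tracks peak concurrency."""
    lock = threading.Lock()

    def generate(prompt):
        with lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(seconds)  # pretend the job is rendering
        with lock:
            stats["active"] -= 1
        return f"video-for:{prompt}"

    return generate


if __name__ == "__main__":
    stats = {"active": 0, "peak": 0}
    results = run_with_concurrency_cap(
        ["a", "b", "c", "d", "e"], make_fake_generate(stats), max_in_flight=2
    )
    print(results, "peak in flight:", stats["peak"])
```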

Worth knowing: rate limits on Sora are pooled differently to the text-model rate limits on the same account. A team running heavy Sora traffic does not affect their available rate budget for GPT-5.5 calls. Conversely, large GPT-5.5 traffic does not eat into the Sora budget. Plan the two as separate capacity questions.

Generation time: what to actually expect

Sora is async by design. You submit a generation request, get back a job ID, and poll (or webhook back) for completion. The wall-clock time between request and completion depends on the duration and resolution of the output, current load on the OpenAI infrastructure, and whether the job is queued behind others on your account.

Realistic expectations based on observed behaviour:

Output	Typical wall-clock time	Notes
Sora 2 standard, 4s @ 720p	20–45 seconds	Fastest path; good for iteration
Sora 2 standard, 8s @ 720p	40–90 seconds	Most common production duration
Sora 2 standard, 12s @ 720p	60–120 seconds	Long-form social content
Sora 2 Pro, 10s @ 720p	60–150 seconds	Premium quality; ~3x cost of standard
Sora 2 Pro, 15s @ 1024p	120–240 seconds	Highest resolution; longer queueing observed at peak times
Sora 2 Pro, 25s @ 1024p	200–360 seconds	Maximum duration; price scales linearly

Two operational consequences:

User-facing latency budgets need rethinking. If your product expects video generation to feel responsive to a user action, the 30–90 second range for short clips means you need a UX that handles the wait: progress indicators, parallel work the user can do while the video generates, or pre-generation for predictable scenarios. Treating Sora like a synchronous API call is the most common architecture mistake teams make.
Polling versus webhooks matters. Naive polling (a tight loop hitting the status endpoint) burns your rate-limit budget and puts needless load on the API without making the job finish any sooner. Use exponential backoff with jitter, or set up webhook callbacks if your environment supports them. The polling pattern that works well in production is to poll at 10-second intervals for the first minute, then 30-second intervals beyond that, with a hard timeout at the model's expected upper bound for the requested duration.
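The stepped schedule described above (10-second intervals for the first minute, 30-second intervals after that, and a hard timeout) can be sketched as a generator of sleep durations. This is an illustrative sketch, not SDK code: `poll_delays` and `wait_for_job` are hypothetical helpers, `check_status` is any caller-supplied function that returns the job's status string, and the terminal status names are the two used in this article's example.

```python
import random
import time

TERMINAL_STATUSES = ("completed", "failed")


def poll_delays(timeout_s, fast_interval=10.0, slow_interval=30.0,
                fast_window=60.0, jitter=0.1, rng=random.random):
    """Yield sleep durations whose sum never exceeds timeout_s.

    Uses fast_interval until fast_window seconds have elapsed, then
    slow_interval. Each delay is varied by +/- jitter (a fraction below 1).
    """
    elapsed = 0.0
    while elapsed < timeout_s:
        base = fast_interval if elapsed < fast_window else slow_interval
        delay = base * (1.0 + jitter * (2.0 * rng() - 1.0))
        delay = min(delay, timeout_s - elapsed)  # never sleep past the deadline
        elapsed += delay
        yield delay


def wait_for_job(check_status, timeout_s, sleep=time.sleep):
    """Poll check_status() until it reports a terminal status or time runs out."""
    for delay in poll_delays(timeout_s):
        status = check_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(delay)
    status = check_status()  # one last look at the deadline
    if status in TERMINAL_STATUSES:
        return status
    raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in production the default `time.sleep` applies.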

Supported parameters and prompt structure

Sora’s API surface is intentionally simple compared to image generation models like DALL-E 3. There are fewer dials to turn, but the dials that exist matter. The salient parameters:

model: sora-2 or sora-2-pro. The choice drives both pricing and the available duration/resolution options as shown in the pricing table above.
prompt: Free-form text describing the scene. Sora handles cinematic direction (camera angles, movement, lighting), character actions, and environmental details. The model is sensitive to prompt structure: leading with the scene establishment, then the action, then the technical direction, produces more reliable results than a single dense paragraph.
image: Optional reference image for image-to-video generation. The reference acts as the first frame anchor; the model generates motion outward from that starting point. Useful for product demos, character continuity, and any scenario where the static appearance of the subject is non-negotiable.
duration: Duration in seconds. Constrained to the discrete options for the chosen model (4/8/12 for sora-2, 10/15/25 for sora-2-pro). Cost scales linearly with duration.
size: Resolution. 720x1280 (portrait) or 1280x720 (landscape) on the standard model; adds 1024x1792 / 1792x1024 on Pro. Aspect ratio is implicit in the size selection.
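Because the duration and size options are discrete and differ by model, it can be worth validating a request before it is submitted. The sketch below encodes the options listed above and assembles the prompt in the scene, action, technical-direction order recommended for `prompt`. The `ALLOWED` table and `build_request` helper are hypothetical names, and the returned dict simply mirrors the parameter names used in this article rather than a confirmed SDK signature.

```python
# Allowed options per model, as listed in this article. Re-verify before use.
ALLOWED = {
    "sora-2": {
        "durations": (4, 8, 12),
        "sizes": ("720x1280", "1280x720"),
    },
    "sora-2-pro": {
        "durations": (10, 15, 25),
        "sizes": ("720x1280", "1280x720", "1024x1792", "1792x1024"),
    },
}


def build_request(model, scene, action, direction, duration, size):
    """Validate the options and return a dict of request parameters."""
    if model not in ALLOWED:
        raise ValueError(f"unknown model: {model}")
    options = ALLOWED[model]
    if duration not in options["durations"]:
        raise ValueError(f"{model} supports durations {options['durations']}, got {duration}")
    if size not in options["sizes"]:
        raise ValueError(f"{model} supports sizes {options['sizes']}, got {size}")
    # Scene establishment first, then the action, then technical direction.
    parts = [p.strip() for p in (scene, action, direction) if p and p.strip()]
    if not parts:
        raise ValueError("prompt is empty")
    return {"model": model, "prompt": " ".join(parts), "duration": duration, "size": size}


if __name__ == "__main__":
    print(build_request(
        "sora-2",
        scene="A snow-capped mountain at sunrise.",
        action="The first light hits the peak.",
        direction="Wide-angle shot, slow track left, golden hour.",
        duration=8,
        size="1280x720",
    ))
```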

Notable absences. Sora does not currently expose seed control through the public API (so reproducibility across runs is not guaranteed), nor does it expose individual style controls in the way Midjourney or other image models do. The model is opinionated; prompt engineering is the primary lever, not parameter tuning.

A simple example of a Sora 2 generation request, using the OpenAI Python SDK:

    from openai import OpenAI
    import time

    client = OpenAI(api_key="YOUR_API_KEY")

    # Create the video generation job.
    # The parameter and response field names used here (duration, job.output[0].url)
    # follow this article's description; confirm them against the current SDK reference.
    job = client.videos.create(
        model="sora-2",
        prompt=(
            "A wide-angle shot of a snow-capped mountain at sunrise. "
            "The camera slowly tracks left as the first light hits the peak. "
            "Cinematic, golden hour, 4K-quality lighting."
        ),
        size="1280x720",
        duration=8,
    )

    # Poll for completion
    while True:
        job = client.videos.retrieve(job.id)
        if job.status == "completed":
            video_url = job.output[0].url
            break
        elif job.status == "failed":
            raise RuntimeError(f"Generation failed: {job.error}")
        print(f"Current status: {job.status}")
        time.sleep(10)

    print(f"Video ready: {video_url}")

Worked cost examples

The per-second pricing makes cost predictable, but only once you are clear about your workload shape. Three representative scenarios:

Scenario 1: A short product demo for a SaaS landing page

A 5-second clip showing the product UI in action, generated once and used as the hero video on the marketing site. You expect to iterate 5–10 times to get a clip you are happy with before publishing.

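Cost on Sora 2 standard at 720p: 5s × $0.10 = $0.50 per generation. With 8 iterations to land on the final cut: $4.00. Cost on Sora 2 Pro at 1024p for the final published version: 5s × $0.50 = $2.50 (single take). Total project cost: roughly $6.50 for the iteration runs plus the 1024p final. Note that 5 seconds is not one of the fixed durations listed in the pricing table, so treat this as per-second arithmetic: in practice you would request 4 or 8 seconds on standard ($0.40 or $0.80 per generation) and at least 10 seconds on Pro ($5.00 at 1024p).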

Scenario 2: A batch of 50 clips for a marketing campaign

50 unique 8-second product clips, each based on a different feature description, all on Sora 2 standard at 720p. As a baseline, assume you accept the first generation of each clip.

Baseline cost: 50 × 8s × $0.10 = $40.00. Then add a 30% retry budget for the clips that don't land first time (50 × 0.30 = 15 retries × 8s × $0.10 = $12.00). Total: roughly $52.00 for the campaign.

Scenario 3: A user-generated video feature in a consumer product

Users in your app generate 6-second clips on demand, on Sora 2 standard at 720p. Average usage: 1,000 clips per day. You charge users $0.50 per generation and accept the cost differential as the unit margin.

Cost per user clip: 6s × $0.10 = $0.60. With user pricing at $0.50, the workload is loss-making at the standard tier: every generation costs $0.10 more than the user pays. The 720p standard tier requires user pricing of at least $0.60 to break even before infrastructure overhead, and more once that overhead is included. At 30,000 clips per month, the monthly Sora bill is $18,000. (Six seconds is not one of the fixed standard durations; at the nearest real options of 4 or 8 seconds, the per-clip cost is $0.40 or $0.80.) This is the kind of unit-economics check worth doing before launching any user-facing video feature.

The takeaway across the three scenarios: video generation is genuinely affordable for marketing and one-off content workloads, where the iteration count is bounded and the cost-per-final-asset is what matters. It is meaningfully more challenging for user-facing features at scale, where the cost-per-generation has to clear the user-paid price plus product overhead. Be explicit about which workload you are pricing before committing.
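The arithmetic behind the campaign and user-facing scenarios is worth keeping in a small script so it can be rerun when prices or volumes change. The sketch below reproduces the figures from Scenarios 2 and 3; the function names are illustrative, and the rates and volumes passed in are the ones assumed in this article.

```python
from decimal import Decimal, ROUND_CEILING


def campaign_cost(clips, seconds, rate, retry_rate=Decimal("0")):
    """Cost of a batch of clips plus a retry budget, in USD."""
    # Round the number of retried clips up so the budget is never understated.
    retries = (clips * retry_rate).to_integral_value(rounding=ROUND_CEILING)
    return (clips + retries) * seconds * rate


def unit_margin(user_price, seconds, rate):
    """Per-clip margin: what the user pays minus the generation cost."""
    return user_price - seconds * rate


def monthly_bill(clips_per_month, seconds, rate):
    """Total generation spend for a month of on-demand clips."""
    return clips_per_month * seconds * rate


if __name__ == "__main__":
    rate = Decimal("0.10")  # Sora 2 standard at 720p, per this article
    print(campaign_cost(50, 8, rate))                    # 40.00
    print(campaign_cost(50, 8, rate, Decimal("0.30")))   # 52.00
    print(unit_margin(Decimal("0.50"), 6, rate))         # -0.10
    print(monthly_bill(30_000, 6, rate))                 # 18000.00
```

A negative `unit_margin` is the signal that the feature loses money on every generation at the chosen price.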

OpenAI direct access versus aggregator access

With Sora available through multiple routes, the practical question for most teams is which one to integrate against. The honest answer depends on the rest of your stack.

What's the same

Output quality, generation time at the model layer, supported parameters, and per-second pricing are typically identical regardless of route, since most aggregators pass through OpenAI’s pricing at parity, and the model itself is the same model. If you are choosing a route purely on output quality, the choice is a wash.

What's different

Billing surface. Direct OpenAI access bills through your OpenAI account; aggregators bill through their own credit or subscription system. For teams that already manage OpenAI billing for text-model usage, the direct route adds nothing new. For teams running multi-provider workloads (LLMs from Anthropic, image models from Black Forest Labs, video from Sora), an aggregator consolidates all of that onto one invoice.
Observability. OpenAI's dashboard surfaces request-level Sora usage cleanly. Aggregator dashboards vary in how well they handle video-generation workloads specifically; some have purpose-built video observability; others treat video as a generic API call. Worth checking before committing if observability is a priority.
Rate-limit pooling. On direct OpenAI, your Sora rate limits are tied to your OpenAI account and tier. On an aggregator, the limits are pooled across the aggregator's customer base in some cases, or assigned per-customer in others. For high-volume production workloads, ask the aggregator how they handle rate-limit allocation before integrating.
Geographic and compliance posture. Direct OpenAI is processed through OpenAI's infrastructure with the data residency options OpenAI provides. Some aggregators are based in jurisdictions where data residency rules differ; others route requests through OpenAI's US infrastructure regardless. For regulated workloads, this is decisive, and it is the kind of thing worth asking the aggregator's sales team to put in writing.
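If you want to keep the choice of route reversible, one option is to drive the client configuration from the environment so that switching between direct access and an aggregator is a deployment change rather than a code change. The sketch below is one way to do that; the environment variable names `VIDEO_API_KEY` and `VIDEO_API_BASE_URL` are hypothetical, and the aggregator's actual endpoint URL must come from its own documentation. The resulting dict is meant to be passed to the OpenAI Python client, which accepts `api_key` and `base_url` keyword arguments.

```python
import os


def client_kwargs(env=None):
    """Build keyword arguments for an OpenAI-compatible client from the environment.

    With no base URL set, the client falls back to its default endpoint
    (direct access). Setting a base URL routes traffic to an aggregator.
    """
    env = os.environ if env is None else env
    api_key = env.get("VIDEO_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("VIDEO_API_KEY is not set")
    kwargs = {"api_key": api_key}
    base_url = env.get("VIDEO_API_BASE_URL", "").strip()
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs())
```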

How CometAPI fits in

CometAPI exposes Sora 2 and Sora 2 Pro alongside 500+ other models behind a single OpenAI-compatible endpoint, with one credential and unified billing. Pricing on Sora through CometAPI tracks OpenAI's per-second rates; the operational value is consolidating Sora usage with the rest of your model traffic on a single invoice. For teams running a mixed workload (text models from multiple providers, image generation, and Sora video), this is the core argument. For teams using only Sora and only one or two text models, the operational saving is smaller and direct OpenAI access is a defensible choice.

Production considerations

A few patterns worth getting right before Sora goes near production traffic:

Async job lifecycle handling. Treat each Sora generation as a long-running job, not a request. Persist the job ID immediately on creation; survive a server restart by being able to resume polling for in-flight jobs; handle the case where the job completes while your worker is offline. This is standard distributed-systems hygiene but is often skipped at first because Sora is the first async API the team has integrated.
Webhooks first, polling as fallback. If the platform supports webhooks for completion events (the OpenAI API does), use them. Webhooks remove the need for polling, which reduces your rate-limit pressure and eliminates redundant status checks. Polling is the fallback for environments that cannot expose a webhook endpoint.
Failure modes that cost money. OpenAI does not bill for failed generations, but partial completions and retried requests that succeed on the second attempt do incur cost. In production, log the cost of each retry and alert if your retry rate exceeds expectations, since that’s usually a signal of a content-policy issue with the prompts you're sending, which is cheaper to fix at the prompt layer than to absorb in the bill.
Content policy and production deployment. Sora is bounded by OpenAI's usage policies, which restrict certain categories of content. For production deployments (especially user-facing ones where the prompt is partly under user control), review OpenAI’s official content policy documentation and design upstream guardrails accordingly. Treat that documentation as the source of truth; it changes more often than this article will.
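The job-lifecycle point above comes down to writing the job ID to durable storage before doing anything else, so that a restarted worker can find what was in flight. The sketch below shows one minimal way to do that with SQLite from the standard library. `JobStore` is a hypothetical helper, not part of any SDK, and the terminal status names are the two used in this article's example.

```python
import sqlite3


class JobStore:
    """Durable record of generation jobs so polling can resume after a restart."""

    def __init__(self, path=":memory:"):
        # Use a file path in production; ":memory:" is only for demonstration.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "job_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )
        self.db.commit()

    def record(self, job_id, status="queued"):
        """Persist a job immediately after creation; safe to call twice."""
        self.db.execute(
            "INSERT OR IGNORE INTO jobs (job_id, status) VALUES (?, ?)",
            (job_id, status),
        )
        self.db.commit()

    def update(self, job_id, status):
        """Store the latest status seen from polling or a webhook."""
        self.db.execute(
            "UPDATE jobs SET status = ? WHERE job_id = ?", (status, job_id)
        )
        self.db.commit()

    def in_flight(self):
        """Return IDs of jobs that have not reached a terminal status."""
        rows = self.db.execute(
            "SELECT job_id FROM jobs "
            "WHERE status NOT IN ('completed', 'failed') ORDER BY rowid"
        )
        return [row[0] for row in rows]


if __name__ == "__main__":
    store = JobStore()
    store.record("job_a")
    store.record("job_b")
    store.update("job_a", "completed")
    print(store.in_flight())  # ['job_b']
```

On startup, a worker would call `in_flight()` and resume checking each returned ID, which also covers jobs that completed while the worker was offline.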

What to build first

The honest read on which Sora workloads are ready for production today, which are on the edge, and which are premature:

Production-ready today

Marketing and creative content workloads where iteration is bounded and the cost-per-final-asset is the right metric: product demo videos, social media campaign content, hero videos for landing pages, and internal training material. The economics work, the failure modes are well understood, and the latency story (30–90 seconds for short clips) is acceptable when the human in the loop is a member of the content team rather than an end user.

On the edge

User-facing video generation features where the per-clip cost has to clear user-paid pricing. This is workable but requires careful unit economics: bound the duration users can request, use Sora 2 standard at 720p as the default, and charge a price that leaves margin over the per-clip cost. The early-2026 wave of consumer video generation apps is mostly in this category, and the ones with sustainable economics have all been deliberate about constraining what users can generate.

Premature

Long-form video at scale (anything over 25 seconds, since that is Sora's current duration ceiling), high-volume real-time scenarios where wall-clock latency matters more than dollars, and applications expecting frame-level control or seed-based reproducibility. These are workloads to revisit when Sora's capability surface expands, not to force-fit today.

The framing: Sora 2 is genuinely production-ready for content workloads with a human in the loop. It is workable for user-facing features with deliberate unit economics. It is premature for long-form video and for use cases that require parameters Sora doesn't yet expose. Build for what's ready today; track the ones that aren't yet.

Trying it on your workload: All Sora 2 and Sora 2 Pro variants are available on CometAPI alongside the text models you may already be using. The free trial credit lets you generate a handful of clips at standard pricing without any setup beyond pointing your existing OpenAI-compatible client at the CometAPI endpoint.