OpenAI’s gpt-oss-120b marks the organization’s first open-weight release since GPT-2, offering developers transparent, customizable, and high-performance AI capabilities under the Apache 2.0 license. Designed for sophisticated reasoning and agentic applications, the model democratizes access to advanced large language model technology, enabling on-premises deployment and in-depth fine-tuning.
Core Features and Design Philosophy
GPT‑OSS models are designed as general-purpose, text-only LLMs. They support high-level cognitive tasks, including mathematical reasoning, structured analysis, and language comprehension. Unlike closed commercial models such as GPT‑4, GPT‑OSS allows full download and use of model weights, giving researchers and developers unprecedented access to inspect, fine-tune, and deploy models entirely on their infrastructure.
Basic Information
- Parameters: 117 billion total, 5.1 billion active via Mixture-of-Experts (MoE)
- License: Apache 2.0 for unrestricted commercial and academic use
- Context Window: Up to 128K tokens, supporting long-form inputs and multi-document reasoning
- Chain-of-Thought: Full CoT outputs for auditability and fine-grained control
- Structured Outputs: Native support for JSON, XML, and custom schemas (see the sketch after this list)
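As a quick illustration of structured outputs, the sketch below requests a JSON object through the OpenAI-compatible `response_format` parameter. Whether a given hosting endpoint honors this parameter is an assumption to verify against its documentation.

```python
# Sketch: requesting JSON output via the OpenAI-compatible `response_format`
# parameter (assumes the serving endpoint supports it; verify in its docs).
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.cometapi.com/v1",  # or any OpenAI-compatible host
                api_key=os.environ["COMETAPI_API_KEY"])
resp = client.chat.completions.create(
    model="gpt-oss-120b",
    response_format={"type": "json_object"},
    messages=[{"role": "user",
               "content": "Return a JSON object with keys 'topic' and "
                          "'summary' describing quantum tunneling."}],
)
print(resp.choices[0].message.content)  # a JSON string
```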
Technical Details
GPT-OSS leverages a Transformer backbone augmented with a Mixture-of-Experts (MoE) architecture to achieve sparse activation and reduce inference costs. The gpt-oss-120b model contains 128 experts distributed across 36 layers, activating 4 experts per token (5.1 B active parameters), while gpt-oss-20b utilizes 32 experts over 24 layers, activating 4 experts per token (3.6 B active parameters). Both models employ alternating dense and locally banded sparse attention, grouped multi-query attention (group size 8), and support a 128K-token context window, unmatched in open-weight offerings to date. Memory efficiency is further enhanced via **4-bit mixed-precision quantization**, enabling larger contexts on commodity hardware.
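To make the sparse-activation idea concrete, here is a minimal, illustrative sketch of top-4 expert routing in an MoE layer. The toy dimensions, router, and expert functions are invented for demonstration and do not reflect OpenAI’s actual implementation.

```python
# Illustrative top-k expert routing (hypothetical toy example, not
# OpenAI's implementation): only k experts run per token, which is
# why only ~5.1B of 117B parameters are active at inference time.
import numpy as np

def moe_layer(x, experts, router_weights, k=4):
    """Route token embedding `x` to its top-k experts and mix the outputs."""
    logits = router_weights @ x                # one routing score per expert
    top_k = np.argsort(logits)[-k:]            # indices of the k best experts
    gates = np.exp(logits[top_k])
    gates /= gates.sum()                       # softmax over selected experts only
    return sum(g * experts[i](x) for g, i in zip(gates, top_k))

# Toy setup: 128 "experts", each a small linear map over a 16-dim embedding.
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(16, 16)): W @ x for _ in range(128)]
router_weights = rng.normal(size=(128, 16))
print(moe_layer(rng.normal(size=16), experts, router_weights).shape)  # (16,)
```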
GPT‑OSS models have undergone rigorous benchmarking against well-known datasets, revealing competitive—if not superior—performance when compared to similarly sized proprietary models.
Benchmarking and Performance Evaluation
On standard benchmarks, gpt-oss-120b matches or exceeds OpenAI’s proprietary o4-mini model:
- MMLU (Massive Multitask Language Understanding): ~88% accuracy
- Codeforces Elo (coding reasoning): ~2205
- AIME (math competition with tools): ~87.9%
- HealthBench: Significantly outperforms o4-mini in clinical QA and diagnosis tasks
- Tau-Bench (Retail + Reasoning tasks): ~62% on average
Model Version
- Default Variant: gpt-oss-120b (v1.0)
- Active Parameters: 5.1 B (dynamic MoE selection)
- Follow-Up Releases: Planned patches to improve safety filters and specialized domain fine-tuning
Limitations
Despite their power, GPT‑OSS models come with certain limitations:
- Text-only interface: Unlike GPT-4o or Gemini, GPT‑OSS does not support multimodal inputs (images, audio, video).
- No training set transparency: OpenAI has not released details on specific datasets used, which may raise concerns for academic reproducibility or bias auditing.
- Performance inconsistency: Some community benchmarks (e.g., Simple-Bench) report poor results in specific reasoning tests (~22% on some tasks for 120b), suggesting performance may vary significantly across domains.
- Hardware limitations: The 120B model requires significant compute for local inference, making it inaccessible for casual developers without GPU access.
- Safety tradeoffs: Although tested under adversarial fine-tuning scenarios, the open-weight nature means these models can still be misused—e.g., for spam, misinformation, or model jailbreaks—if not properly governed.
Nevertheless, OpenAI reports that gpt‑oss models do not raise current frontier-level safety risks, especially in biorisk or cybersecurity domains.
How to call the gpt-oss-120b API from CometAPI
gpt-oss-120b API pricing in CometAPI (20% off the official price):

| Token Type | Price |
| --- | --- |
| Input Tokens | $0.16 |
| Output Tokens | $0.80 |
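As a rough illustration, the snippet below estimates the cost of a single request. It assumes the listed prices are per one million tokens; the table does not state units, so verify against the official pricing page.

```python
# Hypothetical cost estimate, assuming the listed prices are per 1M tokens
# (an assumption: the table above does not state units).
INPUT_PRICE = 0.16   # USD per 1M input tokens (assumed unit)
OUTPUT_PRICE = 0.80  # USD per 1M output tokens (assumed unit)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return input_tokens / 1e6 * INPUT_PRICE + output_tokens / 1e6 * OUTPUT_PRICE

# e.g. a 2,000-token prompt that yields a 500-token answer:
print(f"${estimate_cost(2_000, 500):.6f}")  # -> $0.000720
```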
Required Steps
- Log in to cometapi.com. If you are not a user yet, please register first.
- Get your API key: in the personal center, open the API token page, click “Add Token” to generate a key (sk-xxxxx), and submit.
- Note the base URL of this site: https://api.cometapi.com/
Use Method
- Select the `gpt-oss-120b` endpoint to send the API request and set the request body. The request method and request body format are documented in our website’s API doc; the site also provides an Apifox test page for convenience.
- Replace <YOUR_API_KEY> with your actual CometAPI key from your account.
- Insert your question or request into the content field; this is what the model will respond to.
- Process the API response to get the generated answer (a sketch follows the parameter list below).
CometAPI provides a fully compatible REST API for seamless migration. Key details (see the API doc):
- Endpoint: https://api.cometapi.com/v1/chat/completions
- Model Parameter: `gpt-oss-120b`
- Authentication: `Bearer YOUR_CometAPI_API_KEY`
- Content-Type: `application/json`
- Core Parameters: `prompt`, `max_tokens_to_sample`, `temperature`, `stop_sequences`
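Putting those details together, here is a minimal raw-REST sketch using Python’s `requests` library. The message-based body follows the OpenAI chat-completions format that CometAPI mirrors; the `max_tokens` and `temperature` field names are assumptions drawn from that format, so check the API doc for the exact fields.

```python
# Raw REST call to CometAPI's chat-completions endpoint (a sketch based on
# the details listed above; field names beyond "model" and "messages" are
# assumed from the OpenAI chat-completions format).
import os
import requests

resp = requests.post(
    "https://api.cometapi.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['COMETAPI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-oss-120b",
        "messages": [{"role": "user",
                      "content": "Summarize the MoE design of gpt-oss-120b."}],
        "max_tokens": 256,    # assumed field name
        "temperature": 0.7,   # assumed field name
    },
    timeout=60,
)
resp.raise_for_status()
# Extract the generated answer from the response body.
print(resp.json()["choices"][0]["message"]["content"])
```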
While GPT‑OSS can be used entirely offline, it also supports OpenAI-compatible chat APIs when hosted on services like Hugging Face or AWS Bedrock.
Here’s a sample integration using Python:

```python
from openai import OpenAI
import os

client = OpenAI(
    # The OpenAI client appends /chat/completions itself, so the base URL
    # stops at /v1 (point it at another OpenAI-compatible provider if needed).
    base_url="https://api.cometapi.com/v1",
    api_key=os.environ["COMETAPI_API_KEY"],  # your sk-... key from CometAPI
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Explain how quantum tunneling works."}
    ],
)

print(response.choices[0].message.content)
```
Alternatively, you can run the models locally using tools like LMDeploy, Text Generation Inference (TGI), or vLLM.
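For instance, a minimal offline-inference sketch using vLLM’s Python API might look like the following; it assumes the openai/gpt-oss-120b weights are available and that your hardware has enough GPU memory to load them.

```python
# Minimal local-inference sketch with vLLM (assumes the model weights are
# downloadable/cached and sufficient GPU memory is available).
from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-120b")  # assumed Hugging Face model id
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain how quantum tunneling works."], params)
print(outputs[0].outputs[0].text)
```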
See also: GPT-OSS-20B