GLM‑4.5 API

CometAPI
Anna · Jul 28, 2025

Zhipu’s GLM‑4.5 API is a unified RESTful service on the Z.ai (global) and Zhipu AI Open (Mainland China) platforms that exposes the 355 billion‑parameter, hybrid‑expert GLM‑4.5 model—capable of complex reasoning, coding, and agentic tasks—with configurable options (e.g., temperature, max tokens, streaming).


Basic Features

GLM‑4.5 is designed as a unified agentic model, integrating reasoning, coding, and autonomous decision‑making capabilities within a single architecture. It natively supports two operational modes—thinking for complex reasoning and tool usage, and non‑thinking for rapid, on‑demand responses—making it ideal for versatile agent workflows.
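The two modes can be selected per request. A minimal sketch of building payloads for each, assuming the platform accepts a `thinking` field in the chat‑completions body (the field name follows Z.ai's documented convention; verify it against the official API reference before relying on it):

```python
# Sketch: request payloads for the two operational modes.
# The "thinking" field is an assumption based on Z.ai's API convention;
# confirm the exact parameter name in the official API reference.

def build_payload(prompt: str, thinking: bool) -> dict:
    """Build a chat-completions payload for GLM-4.5."""
    return {
        "model": "glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        # "enabled": deep reasoning / tool use; "disabled": rapid responses
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

deep = build_payload("Prove that sqrt(2) is irrational.", thinking=True)
fast = build_payload("What is the capital of France?", thinking=False)
print(deep["thinking"]["type"])  # enabled
print(fast["thinking"]["type"])  # disabled
```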


Technical Details

  • Parameter Scale: The flagship GLM‑4.5 comprises 355 billion total parameters with 32 billion active parameters.
  • Quantization: GLM‑4.5 employs an FP8 quantization strategy to optimize inference efficiency without substantially sacrificing accuracy.
  • Parameter Efficiency: Activates only 32 B of its 355 B parameters per token, minimizing hardware load during inference.
  • Layer Optimization: Components are pruned and redistributed into deeper layers, enhancing logical reasoning without ballooning model size.
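To put the parameter efficiency in perspective, only a small fraction of the model's weights participate in any single forward pass:

```python
# Active-parameter fraction of the mixture-of-experts model,
# using the figures stated above (32 B active of 355 B total).
total_params = 355e9
active_params = 32e9

fraction = active_params / total_params
print(f"{fraction:.1%}")  # 9.0%
```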

Training Workflow

Multi‑Stage Training:

  1. Foundation Pre‑training on ~15 trillion tokens.
  2. Reasoning Fine‑tuning on >7 trillion curated tokens to sharpen decision‑making and code synthesis.

Benchmark Performance

On a suite of 12 industry‑standard benchmarks covering agentic, reasoning, and coding tasks, GLM‑4.5 achieved an overall score of 63.2, ranking third globally behind proprietary titans such as GPT‑4 and Grok 4. Highlights include:

| Benchmark | GLM‑4.5 Score | Top Proprietary Comparison |
| --- | --- | --- |
| BrowseComp (web) | 26.4 % | Claude 4 Opus: 18.8 % |
| MATH 500 | 98.2 % | GPT‑4 Turbo |
| AIME24 | 91.0 % | Claude 4 Sonnet |
| GPQA | 79.1 % | Gemini 2.5 Pro |

Across these tests, GLM‑4.5 matches or surpasses leading proprietary models such as Claude 4 Sonnet and Gemini 2.5 Pro on tasks like SWE‑bench and AIME24.



Model Versions

The GLM‑4.5 family includes several specialized variants accessible via API:

  • GLM‑4.5 (355 B total parameters; 32 B active)
  • GLM‑4.5‑Air (106 B total; lightweight, faster inference)
  • GLM‑4.5‑X, GLM‑4.5‑AirX (ultra‑fast inference)
  • GLM‑4.5‑Flash (free, optimized for coding & reasoning)
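In practice, variant selection is just a matter of which model name goes into the request. A small illustrative helper (the priority labels are hypothetical; the model names come from the list above):

```python
# Sketch: mapping a deployment priority to a GLM-4.5 variant name.
# The priority keys are illustrative; the model identifiers are the
# API model names from the list above.
VARIANTS = {
    "quality": "glm-4.5",       # flagship: strongest reasoning
    "balanced": "glm-4.5-air",  # lightweight, faster inference
    "speed": "glm-4.5-x",       # ultra-fast inference
    "free": "glm-4.5-flash",    # free tier, coding & reasoning
}

def pick_model(priority: str) -> str:
    """Return the API model name for a given priority."""
    return VARIANTS[priority]

print(pick_model("balanced"))  # glm-4.5-air
```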

How to call GLM‑4.5 API from CometAPI

GLM‑4.5 series API pricing on CometAPI (20% off the official price):

| Model | Introduction | Price |
| --- | --- | --- |
| glm-4.5 | Our most powerful reasoning model, with 355 billion parameters | Input Tokens $0.48 / Output Tokens $1.92 |
| glm-4.5-air | Cost-effective, lightweight, strong performance | Input Tokens $0.16 / Output Tokens $1.07 |
| glm-4.5-x | High performance, strong reasoning, ultra-fast response | Input Tokens $1.60 / Output Tokens $6.40 |
| glm-4.5-airx | Lightweight, strong performance, ultra-fast response | Input Tokens $0.02 / Output Tokens $0.06 |
| glm-4.5-flash | Strong performance, excellent for reasoning, coding & agents | Input Tokens $3.20 / Output Tokens $12.80 |
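Assuming the prices above are quoted in USD per million tokens (the usual convention for LLM APIs; confirm the unit on the pricing page), a per-request cost estimate can be sketched as:

```python
# Sketch: per-request cost estimate, assuming prices are USD per 1M tokens
# (a common convention; verify the unit on CometAPI's pricing page).

PRICES = {  # (input $/1M tokens, output $/1M tokens) from the table above
    "glm-4.5":     (0.48, 1.92),
    "glm-4.5-air": (0.16, 1.07),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply on glm-4.5:
print(round(estimate_cost("glm-4.5", 2000, 500), 6))  # 0.00192
```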

Required Steps

  • Log in to cometapi.com. If you are not a user yet, please register first.
  • Obtain an API key: in the personal center, click “Add Token” under API tokens to generate a key of the form sk-xxxxx, then submit.
  • Use the site’s base URL: https://api.cometapi.com/

Use Method

  1. Select the “glm-4.5” endpoint and set the request body. The request method and body format are described in our website’s API doc; the website also provides an Apifox test for your convenience.
  2. Replace <YOUR_API_KEY> with your actual CometAPI key from your account.
  3. Insert your question or request into the content field—this is what the model will respond to.
  4. Process the API response to extract the generated answer.
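Step 4 amounts to pulling the generated text out of the JSON body. Assuming the OpenAI-compatible response shape (a `choices` list containing a `message` object), extraction looks like this; the sample response below is illustrative:

```python
# Sketch: extracting the answer from an OpenAI-style chat-completion
# response. Field names (choices -> message -> content) follow the
# standard OpenAI-compatible format that CometAPI exposes.

def extract_answer(response_json: dict) -> str:
    """Return the assistant's reply text from a chat-completion response."""
    return response_json["choices"][0]["message"]["content"]

# Illustrative response body (values are made up for the example):
sample = {
    "id": "chatcmpl-123",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Hello!"},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11},
}
print(extract_answer(sample))  # Hello!
```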

CometAPI provides a fully compatible REST API for seamless migration; see the API doc for key details.

API Integration & Examples

Below is a Python snippet demonstrating how to invoke GLM‑4.5 via CometAPI’s API. Replace <API_KEY> and <PROMPT> accordingly:

```python
import requests

API_URL = "https://api.cometapi.com/v1/chat/completions"

headers = {
    "Authorization": "Bearer <API_KEY>",  # your CometAPI key
    "Content-Type": "application/json",
}
payload = {
    "model": "glm-4.5",                   # GLM-4.5 variant to call
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": "<PROMPT>"},
    ],
    "max_tokens": 512,                    # cap on generated tokens
    "temperature": 0.7,                   # creativity vs. determinism
}

response = requests.post(API_URL, json=payload, headers=headers)
response.raise_for_status()               # fail fast on HTTP errors
print(response.json())
```

Key Parameters:

  • model: Specifies the GLM‑4.5 variant
  • max_tokens: Controls output length
  • temperature: Adjusts creativity vs. determinism
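The API also supports streaming, as noted in the introduction. A minimal sketch of consuming a streamed response, assuming CometAPI follows the OpenAI server-sent-events format (`"stream": true` in the payload, `data:` lines ending with `data: [DONE]`); confirm the exact format against the API doc:

```python
# Sketch: parsing a streamed (SSE) chat-completion response, assuming the
# OpenAI streaming format: lines of the form "data: {json}" carrying
# incremental "delta" chunks, terminated by "data: [DONE]".
import json

def collect_stream(lines):
    """Concatenate content deltas from server-sent-event lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue                      # skip keep-alives / blank lines
        data = line[len("data: "):]
        if data.strip() == "[DONE]":
            break                         # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content", "")
        parts.append(delta)
    return "".join(parts)

# Simulated stream; a real one would come from response.iter_lines()
# on a requests.post(..., stream=True) call:
events = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream(events))  # Hello!
```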

See Also [GLM-4.5 Air API](https://www.cometapi.com/grok-4-api/)

