
GLM 5

Input:$0.672/M
Output:$2.688/M
GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. With advanced agentic planning, deep backend reasoning, and iterative self-correction, GLM-5 moves beyond code generation to full-system construction and autonomous execution.

Technical specifications of GLM-5

| Item | GLM-5 (reported) |
|---|---|
| Model family | GLM (Z.ai / Zhipu AI), flagship generation |
| Architecture | Mixture-of-Experts (MoE) with sparse attention (DeepSeek-style DSA optimizations) |
| Total parameters | ≈744–745B (MoE pool) |
| Active / routed parameters (per token) | ~40–44B active (depends on routing/experts) |
| Pre-training tokens | ~28.5T tokens (reported) |
| Context window (input) | Up to 200,000 tokens (long-context mode) |
| Max output tokens | 128,000 tokens (max generation per call, reported) |
| Input modalities | Text only (primary); engineered for rich-text outputs (doc/xlsx generation via tools) |

What is GLM-5

GLM-5 is Zhipu AI’s next-generation foundation model that scales the GLM line with an MoE routing design and sparse attention optimizations to deliver long-context reasoning and agentic workflows (multi-step planning, code & system orchestration). It’s explicitly positioned to be an open-weights contender for agentic and engineering tasks, with enterprise accessibility via APIs and self-hosting.

🚀 Main Features of GLM-5

1. Agentic Intelligence & Reasoning

GLM-5 is optimized for workflows where the model breaks a long, complex task into ordered steps with reduced hallucination, a marked improvement over prior GLM versions. It leads several open-weights benchmarks on knowledge reliability and task productivity.
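To make this concrete, here is a minimal plan-and-execute sketch against the CometAPI endpoint shown later on this page; the two-call decomposition is illustrative scaffolding, not GLM-5's internal planning mechanism, and the example task is hypothetical.

from openai import OpenAI

client = OpenAI(base_url="https://api.cometapi.com/v1", api_key="<YOUR_COMETAPI_KEY>")

# First call: ask GLM-5 to decompose the task into ordered steps.
plan = client.chat.completions.create(
    model="glm-5",
    messages=[{"role": "user", "content": "Break 'add rate limiting to a Flask API' into short numbered steps."}],
).choices[0].message.content

# Follow-up calls: execute each non-empty step in order.
for step in (line for line in plan.splitlines() if line.strip()):
    result = client.chat.completions.create(
        model="glm-5",
        messages=[{"role": "user", "content": f"Carry out this step and show the code: {step}"}],
    )
    print(result.choices[0].message.content)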

2. Long Context Support

With a 200K token context window, GLM-5 can sustain very long conversations, large documents, and extended reasoning chains without losing coherence — an increasingly critical capability for real-world professional applications.
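As a sketch, using that window is simply a matter of placing a document's text in a message; large_spec.txt is a placeholder for your own file, and very large inputs are billed at the per-token rates on this page.

from openai import OpenAI

client = OpenAI(base_url="https://api.cometapi.com/v1", api_key="<YOUR_COMETAPI_KEY>")

# Read a large document; 200K tokens is roughly several hundred pages of English text.
with open("large_spec.txt") as f:
    document = f.read()

completion = client.chat.completions.create(
    model="glm-5",
    messages=[
        {"role": "system", "content": "Answer strictly from the provided document."},
        {"role": "user", "content": f"{document}\n\nQuestion: summarize the open risks."},
    ],
)
print(completion.choices[0].message.content)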

3. DeepSeek Sparse Attention

By integrating a sparse attention mechanism, GLM-5 keeps per-token attention cost nearly constant, so compute and memory grow close to linearly with sequence length rather than quadratically, making much longer sequences practical.
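The intuition can be shown with a toy mask: each token attends to a local window plus a few global positions, so the number of attended pairs grows linearly with sequence length instead of quadratically. This is an illustrative sketch, not Z.ai's published DSA kernel.

import numpy as np

def sparse_attention_mask(seq_len, window=4, n_global=2):
    # Each token attends to a local window plus a handful of global tokens,
    # so the attended positions per token stay roughly constant.
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True      # local window
        mask[i, :n_global] = True  # global tokens visible everywhere
    return mask

m = sparse_attention_mask(64)
print(int(m.sum()), "attended pairs vs", 64 * 64, "for dense attention")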

4. Tool Integration & Output Formats

Native support for structured outputs and external tool integrations (JSON, API calls, dynamic tool use) makes GLM-5 practical for enterprise applications like spreadsheets, reports, and automated coding assistants.
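For example, a tool can be declared in the standard OpenAI function-calling format against the CometAPI-compatible endpoint; get_cell here is a hypothetical spreadsheet helper, and exact pass-through behavior should be confirmed against the API doc.

from openai import OpenAI

client = OpenAI(base_url="https://api.cometapi.com/v1", api_key="<YOUR_COMETAPI_KEY>")

# Declare a tool in OpenAI function-calling format; GLM-5 can decide to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_cell",  # hypothetical spreadsheet helper
        "description": "Read one cell from a spreadsheet",
        "parameters": {
            "type": "object",
            "properties": {"sheet": {"type": "string"}, "cell": {"type": "string"}},
            "required": ["sheet", "cell"],
        },
    },
}]

completion = client.chat.completions.create(
    model="glm-5",
    messages=[{"role": "user", "content": "What is in cell B2 of the 'Q3' sheet?"}],
    tools=tools,
)
print(completion.choices[0].message.tool_calls)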

5. Cost Efficiency

GLM-5 is positioned as cost-competitive with proprietary counterparts, with input/output pricing substantially lower than major closed-source offerings, making it attractive for large-scale deployment.
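As a rough illustration at the CometAPI rates quoted on this page, per-request cost is a simple linear function of token counts:

# Cost of one request at the CometAPI rates quoted above.
INPUT_PER_M = 0.672   # USD per 1M input tokens
OUTPUT_PER_M = 2.688  # USD per 1M output tokens

def request_cost(input_tokens, output_tokens):
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a 150K-token document plus a 2K-token answer costs about a dime:
print(f"${request_cost(150_000, 2_000):.4f}")  # -> $0.1062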

Benchmark Performance of GLM-5

Multiple independent evaluations and early industry benchmarks show GLM-5 performing strongly among open-weight models:

  • It achieved record-low hallucination rates on the Artificial Analysis Intelligence Index — a measure of reliability and truthfulness — outperforming prior models by a wide margin.
  • Agent-centric benchmarks indicate substantial gains in complex task execution compared to GLM-4.7 and other open models.
  • Cost-to-performance metrics place GLM-5 in the fourth (slowest) quartile for speed, but top tier for both intelligence and price among open-weight models.

Quantitative Scores (Example from ranking platform):

  • Intelligence Index: #1 among open-weight models.
  • Pricing Efficiency: High ratings for low input/output costs.

How to access and use GLM-5 API

Step 1: Sign Up for API Key

Log in to cometapi.com; if you don't have an account yet, register first. In your CometAPI console, open the API token section of the personal center, click “Add Token”, and copy the generated key (it starts with sk-).

Step 2: Send Requests to glm-5 API

Send your request to the “glm-5” model using the chat-completions format; the request method and body are documented in our website API doc, and an Apifox test page is provided for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account.

Put your question or instruction in the content field of a user message; this is what the model will respond to.
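A minimal raw-HTTP sketch of that request, assuming the OpenAI-compatible chat/completions route used by the Python sample further down this page:

import requests

# Minimal raw-HTTP version of the chat request; replace <YOUR_API_KEY>.
resp = requests.post(
    "https://api.cometapi.com/v1/chat/completions",
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={
        "model": "glm-5",
        "messages": [{"role": "user", "content": "Summarize MoE routing in two sentences."}],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])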

Step 3: Retrieve and Verify Results

Parse the API response to extract the generated answer. The response also includes status and token-usage data you can use to confirm the call completed as expected.
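Continuing the requests sketch from Step 2, a minimal verification pass might check the finish reason and token usage before trusting the output:

# Verify the response before using it: check finish reason and token usage.
data = resp.json()
choice = data["choices"][0]

if choice.get("finish_reason") == "length":
    print("Warning: output was truncated at the token limit")

print(choice["message"]["content"])
print("tokens used:", data.get("usage"))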

FAQ

What distinguishes GLM-5’s architecture from earlier GLM models?

GLM-5 uses a Mixture of Experts (MoE) architecture with ~745B total parameters and 8 active experts per token (~44B active), enabling efficient large-scale reasoning and agentic workflows compared to previous GLM series.
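A toy top-k router illustrates the idea; the gate below uses random stand-in weights purely for demonstration, not GLM-5's trained router.

import numpy as np

def route_tokens(token_states, n_experts=64, k=8):
    # Toy top-k MoE router: a linear gate scores every expert per token,
    # and only the top-k experts' parameters are activated for that token.
    gate = np.random.randn(token_states.shape[-1], n_experts)  # stand-in gating weights
    scores = token_states @ gate
    return np.argsort(scores, axis=-1)[:, -k:]  # indices of experts that run

tokens = np.random.randn(4, 512)  # 4 tokens, hidden size 512
print(route_tokens(tokens))  # each row: the 8 experts activated for one token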

How long of a context window does GLM-5 support via its API?

GLM-5 supports a 200K token context window with up to 128K output tokens, making it suitable for extended reasoning and document tasks.

Can GLM-5 handle complex agentic and engineering tasks?

Yes — GLM-5 is explicitly optimized for long-horizon agent tasks and complex systems engineering workflows, with deep reasoning and planning capabilities beyond standard chat models.

Does GLM-5 support tool calling and structured output?

Yes — GLM-5 supports function calling, structured JSON outputs, context caching, and real-time streaming to integrate with external tools and systems.
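For instance, streaming works through the standard OpenAI-client flag against the CometAPI endpoint, printing tokens as they arrive instead of waiting for the full generation:

from openai import OpenAI

client = OpenAI(base_url="https://api.cometapi.com/v1", api_key="<YOUR_COMETAPI_KEY>")

# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="glm-5",
    messages=[{"role": "user", "content": "Explain context caching in one paragraph."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)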

How does GLM-5 compare to proprietary models like GPT and Claude?

GLM-5 is competitive with top proprietary models in benchmarks, performing close to Claude Opus 4.5 and offering significantly lower per-token costs and open-weight availability, though closed-source models may still lead in some fine-grained benchmarks.

Is GLM-5 open source and what license does it use?

Yes — GLM-5 is released under a permissive MIT license, enabling open-weight access and community development.

What are typical use cases where GLM-5 excels?

GLM-5 is well suited for long-sequence reasoning, agentic automation, coding assistance, creative writing at scale, and backend system design tasks that demand coherent multi-step outputs.

What are known limitations of GLM-5?

While powerful, GLM-5 is primarily text-only (no native multimodal support) and may be slower or more resource-intensive than smaller models, especially for shorter tasks.

Features for GLM 5

Explore the key features of GLM 5, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for GLM 5

Explore competitive pricing for GLM 5, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how GLM 5 can enhance your projects while keeping costs manageable.
| | Comet Price (USD / M Tokens) | Official Price (USD / M Tokens) | Discount |
|---|---|---|---|
| Input | $0.672 | $0.84 | -20% |
| Output | $2.688 | $3.36 | -20% |

Sample code and API for GLM 5

Access comprehensive sample code and API resources for GLM 5 to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of GLM 5 in your projects.
Python
from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

# glm-5: Zhipu GLM-5 model via chat/completions
completion = client.chat.completions.create(
    model="glm-5",
    messages=[{"role": "user", "content": "Hello! Tell me a short joke."}],
)

print(completion.choices[0].message.content)
