© 2026 CometAPI · All rights reserved

GLM 5

Input:$0.8/M
Output:$3.2/M
GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. With advanced agentic planning, deep backend reasoning, and iterative self-correction, GLM-5 moves beyond code generation to full-system construction and autonomous execution.
New
Commercial Use

Technical specifications of GLM-5

Item | GLM-5 (reported)
Model family | GLM (Z.ai / Zhipu AI), flagship generation
Architecture | Mixture-of-Experts (MoE) + sparse attention (DeepSeek Sparse Attention, DSA)
Total parameters | ≈744–745B (MoE pool)
Active / routed params (per token) | ~40–44B active (depends on routing/experts)
Pre-training tokens | ~28.5T tokens (reported)
Context window (input) | Up to 200,000 tokens (long-context mode)
Max output tokens | 128,000 tokens (max generation per call, reported)
Input modalities | Text only (primary); engineered for text-to-rich-output workflows (doc/xlsx generation via tools)

What is GLM-5

GLM-5 is Zhipu AI’s next-generation foundation model that scales the GLM line with an MoE routing design and sparse attention optimizations to deliver long-context reasoning and agentic workflows (multi-step planning, code & system orchestration). It’s explicitly positioned to be an open-weights contender for agentic and engineering tasks, with enterprise accessibility via APIs and self-hosting.

🚀 Main Features of GLM-5

1. Agentic Intelligence & Reasoning

GLM-5 is optimized for workflows in which the model breaks long, complex tasks into ordered steps with reduced hallucination, a major improvement over prior GLM versions. It leads certain open-weight model benchmarks on knowledge reliability and task productivity.

2. Long Context Support

With a 200K token context window, GLM-5 can sustain very long conversations, large documents, and extended reasoning chains without losing coherence — an increasingly critical capability for real-world professional applications.
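A rough pre-flight check can help keep requests inside these limits. The sketch below is illustrative only: it assumes a crude four-characters-per-token estimate (not GLM-5's actual tokenizer) and assumes the prompt and the requested output share the 200K window, which this page does not confirm. `estimate_tokens` and `fits_context` are hypothetical helper names.

```python
CONTEXT_WINDOW = 200_000  # context window reported on this page
MAX_OUTPUT = 128_000      # max output tokens reported on this page
CHARS_PER_TOKEN = 4       # assumption: crude heuristic, not GLM-5's tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_context(prompt: str, max_tokens: int) -> bool:
    """Check a request against the reported limits.

    Assumption: prompt and requested output share the 200K window.
    """
    if max_tokens > MAX_OUTPUT:
        return False
    return estimate_tokens(prompt) + max_tokens <= CONTEXT_WINDOW
```

For production use, counting tokens with the model's real tokenizer would be more reliable than the character heuristic.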

3. DeepSeek Sparse Attention

By integrating a sparse attention mechanism, GLM-5 keeps its compute and memory footprint manageable as sequences grow, allowing longer sequences without the quadratic cost growth of dense attention.
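To get a feel for the scaling, one can count the query-key pairs that attention has to score. This is a back-of-the-envelope sketch, not a description of GLM-5's implementation: it assumes causal attention, assumes each query attends to at most `k` tokens under a top-k sparse scheme, and ignores the cost of selecting those tokens. The value `k = 2048` is a hypothetical illustration, not a GLM-5 figure.

```python
def dense_pairs(n: int) -> int:
    """Query-key pairs under causal dense attention: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def sparse_pairs(n: int, k: int) -> int:
    """Pairs when each query attends to at most k visible tokens."""
    if n <= k:
        return dense_pairs(n)
    # The first k queries see fewer than k tokens; the rest see exactly k.
    return dense_pairs(k) + (n - k) * k


n, k = 200_000, 2048  # k is a hypothetical selection budget
ratio = dense_pairs(n) / sparse_pairs(n, k)
print(f"dense/sparse pair ratio at n={n}: {ratio:.1f}x")
```

Under these assumptions the dense count grows with the square of the sequence length, while the sparse count grows roughly linearly once the sequence is longer than `k`.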

4. Tool Integration & Output Formats

Native support for structured outputs and external tool integrations (JSON, API calls, dynamic tool use) makes GLM-5 practical for enterprise applications like spreadsheets, reports, and automated coding assistants.
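A tool-calling round trip might be wired up roughly as follows. This sketch assumes the endpoint accepts an OpenAI-style `tools` parameter for `glm-5`, which this page implies but does not document in detail. `get_weather`, `REGISTRY`, `build_request`, and `run_tool_call` are hypothetical names used only for illustration.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: stands in for any function your application exposes."""
    return f"Sunny in {city}"


# OpenAI-style tool schema (assumed to be accepted for glm-5).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

REGISTRY = {"get_weather": get_weather}


def build_request(user_text: str) -> dict:
    """Keyword arguments for client.chat.completions.create(**kwargs)."""
    return {
        "model": "glm-5",
        "messages": [{"role": "user", "content": user_text}],
        "tools": TOOLS,
    }


def run_tool_call(call_id: str, name: str, arguments_json: str) -> dict:
    """Execute one requested tool call and wrap the result as a 'tool' message."""
    func = REGISTRY.get(name)
    if func is None:
        content = f"unknown tool: {name}"
    else:
        content = func(**json.loads(arguments_json))
    return {"role": "tool", "tool_call_id": call_id, "content": content}
```

In a real loop, the tool message would be appended to `messages` and sent back so the model can produce its final answer.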

5. Cost Efficiency

GLM-5 is positioned as cost-competitive compared to proprietary counterparts, with input/output pricing substantially lower than major offerings, making it attractive for large-scale deployment.

Benchmark Performance of GLM-5

Multiple independent evaluations and early industry benchmarks show GLM-5 performing strongly among open-weight models:

  • It achieved record-low hallucination rates on the Artificial Analysis Intelligence Index — a measure of reliability and truthfulness — outperforming prior models by a wide margin.
  • Agent-centric benchmarks indicate substantial gains in complex task execution compared to GLM-4.7 and other open models.
  • Cost-to-performance metrics place GLM-5 in the fourth quartile for speed but in the top tier for intelligence and price among open-weight models.

Quantitative Scores (Example from ranking platform):

  • Intelligence Index: #1 among open-weight models.
  • Pricing Efficiency: High ratings for low input/output costs.

How to access and use GLM-5 API

Step 1: Sign Up for API Key

Log in to cometapi.com; if you do not have an account yet, register first. In the CometAPI console, go to the API token section of the personal center, click “Add Token”, and copy the generated key, which has the form sk-xxxxx.

Step 2: Send Requests to glm-5 API

Set the model to “glm-5” and send the request in chat format (POST /v1/chat/completions). The request method and request body are described in the API doc on our website, which also provides an Apifox test for your convenience. Replace <YOUR_COMETAPI_KEY> with the actual CometAPI key from your account.

Insert your question or request into the content field; this is what the model will respond to.

Step 3: Retrieve and Verify Results

After processing the request, the API responds with the task status and output data. Parse the response to get the generated answer.
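Handling the response defensively could look like the sketch below. It assumes the response follows the OpenAI chat-completions shape (`choices[0].message.content` plus a `finish_reason`), consistent with the sample code on this page. `extract_answer` is a hypothetical helper, and the sample dictionary is made up for illustration.

```python
def extract_answer(response: dict) -> tuple[str, bool]:
    """Return (text, truncated) from a chat-completions style response dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    text = (first.get("message") or {}).get("content") or ""
    # "length" conventionally means the output hit the max_tokens limit.
    truncated = first.get("finish_reason") == "length"
    return text, truncated


# Made-up response for illustration only.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hi there."},
         "finish_reason": "stop"}
    ]
}
answer, was_truncated = extract_answer(sample)
```

When using the OpenAI Python SDK, the completion object can typically be converted to a dictionary with `completion.model_dump()` before passing it to a helper like this.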

FAQ

What distinguishes GLM-5’s architecture from earlier GLM models?

GLM-5 uses a Mixture of Experts (MoE) architecture with ~745B total parameters and 8 active experts per token (~44B active), enabling efficient large-scale reasoning and agentic workflows compared to previous GLM series.
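The idea of activating only a few experts per token can be illustrated with a generic top-k gate. This is a toy sketch of the general technique, not Z.ai's router: the 64-expert pool and the random logits are hypothetical, and only the "8 active experts" and the ~44B / ~745B figures come from this page.

```python
import math
import random


def top_k_route(logits: list[float], k: int = 8) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(64)]  # 64 experts: hypothetical
weights = top_k_route(router_logits)  # expert index -> mixing weight

# Share of parameters touched per token, using the figures reported above.
active_fraction = 44 / 745
```

With the reported figures, roughly 6% of the total parameters are active for any given token, which is where the efficiency claim comes from.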

How long of a context window does GLM-5 support via its API?

GLM-5 supports a 200K token context window with up to 128K output tokens, making it suitable for extended reasoning and document tasks.

Can GLM-5 handle complex agentic and engineering tasks?

Yes — GLM-5 is explicitly optimized for long-horizon agent tasks and complex systems engineering workflows, with deep reasoning and planning capabilities beyond standard chat models.

Does GLM-5 support tool calling and structured output?

Yes — GLM-5 supports function calling, structured JSON outputs, context caching, and real-time streaming to integrate with external tools and systems.
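For streaming, the text arrives in small pieces that the client has to stitch together. The sketch below assumes OpenAI-style stream chunks, where each chunk exposes `choices[0].delta.content`. `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only imitate that shape so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas from an OpenAI-style chunk stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices at all
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skip None or empty deltas
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build an object shaped like a stream chunk, for offline illustration."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hel"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("lo")]
full_text = collect_stream(demo)
```

With the OpenAI Python SDK, passing `stream=True` to `client.chat.completions.create(...)` returns an iterable of chunks that could be fed to a function like this one.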

How does GLM-5 compare to proprietary models like GPT and Claude?

GLM-5 is competitive with top proprietary models in benchmarks, performing close to Claude Opus 4.5 and offering significantly lower per-token costs and open-weight availability, though closed-source models may still lead in some fine-grained benchmarks.

Is GLM-5 open source and what license does it use?

Yes — GLM-5 is released under a permissive MIT license, enabling open-weight access and community development.

What are typical use cases where GLM-5 excels?

GLM-5 is well suited for long-sequence reasoning, agentic automation, coding assistance, creative writing at scale, and backend system design tasks that demand coherent multi-step outputs.

What are known limitations of GLM-5?

While powerful, GLM-5 is primarily text-only (no native multimodal support) and may be slower or more resource-intensive than smaller models, especially for shorter tasks.

Pricing for GLM 5

Explore competitive pricing for GLM 5, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how GLM 5 can enhance your projects while keeping costs manageable.
Comet Price (USD / M Tokens) | Official Price (USD / M Tokens) | Discount
Input: $0.8/M, Output: $3.2/M | Input: $1/M, Output: $4/M | -20%
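Using the prices listed above, the cost of a single request can be estimated with simple arithmetic. The token counts in this sketch are made-up example values, and `request_cost` is a hypothetical helper.

```python
# USD per million tokens, as listed on this page.
COMET_PRICE = {"input": 0.8, "output": 3.2}
OFFICIAL_PRICE = {"input": 1.0, "output": 4.0}


def request_cost(input_tokens: int, output_tokens: int, price: dict = COMET_PRICE) -> float:
    """Estimated cost in USD for one request at the given per-million prices."""
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example values only: a 150K-token prompt with an 8K-token answer.
comet = request_cost(150_000, 8_000)
official = request_cost(150_000, 8_000, OFFICIAL_PRICE)
print(f"CometAPI: ${comet:.4f}  official: ${official:.4f}")
```

Because both the input and output prices are 20% below the official prices, the estimate is 20% lower whatever the mix of input and output tokens.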

Sample code and API for GLM 5

Access comprehensive sample code and API resources for GLM 5 to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of GLM 5 in your projects.
POST
/v1/chat/completions

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

# glm-5: Zhipu GLM-5 model via chat/completions
completion = client.chat.completions.create(
    model="glm-5",
    messages=[{"role": "user", "content": "Hello! Tell me a short joke."}],
)

print(completion.choices[0].message.content)

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://api.cometapi.com/console/token
const COMETAPI_KEY = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const BASE_URL = "https://api.cometapi.com/v1";

const client = new OpenAI({
  apiKey: COMETAPI_KEY,
  baseURL: BASE_URL,
});

// glm-5: Zhipu GLM-5 model via chat/completions
const completion = await client.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "Hello! Tell me a short joke." }],
});

console.log(completion.choices[0].message.content);

Curl Code Example

#!/bin/bash

# Get your CometAPI key from https://api.cometapi.com/console/token
COMETAPI_KEY="${COMETAPI_KEY:-<YOUR_COMETAPI_KEY>}"

# glm-5: Zhipu GLM-5 model via chat/completions
curl -s https://api.cometapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -d '{
    "model": "glm-5",
    "messages": [
      {
        "role": "user",
        "content": "Hello! Tell me a short joke."
      }
    ]
  }'

More Models

Claude Opus 4.7

Input:$4/M
Output:$20/M
Claude Opus 4.7 is a hybrid reasoning model designed specifically for frontier-level coding, AI agents, and complex multi-step professional work. Unlike lighter models (e.g., Sonnet or Haiku variants), Opus 4.7 prioritizes depth, consistency, and autonomy on the hardest tasks.

Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is Anthropic's most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

Grok 4.3

Input:$1/M
Output:$2/M
Excels at agentic reasoning, knowledge work, and tool use.

GPT 5.5 Pro

Input:$24/M
Output:$144/M
An advanced model engineered for extremely complex logic and professional demands, representing the highest standard of deep reasoning and precise analytical capabilities.

GPT 5.5

Input:$4/M
Output:$24/M
A next-generation multimodal flagship model balancing exceptional performance with efficient response, dedicated to providing comprehensive and stable general-purpose AI services.

GPT Image 2 ALL

Per Request:$0.04
GPT Image 2 is OpenAI's state-of-the-art image generation model for fast, high-quality image generation and editing. It supports flexible image sizes and high-fidelity image inputs.

Related Blog

 GLM-5.1 + Claude Code Guide (2026): Setup, Benchmarks, Cost Comparison, and the Best API Strategy for Developers
Apr 28, 2026

GLM-5.1 can be used with Claude Code by connecting it through an OpenAI-compatible or Anthropic-compatible API bridge, allowing developers to leverage Claude Code's agentic workflow while using GLM-5.1's lower-cost, high-performance coding model. This setup gives teams access to long-horizon autonomous coding, stronger terminal task execution, and significantly reduced API costs compared with Claude Opus, while preserving the Claude Code developer experience.
How to Use GLM-5.1 API
Apr 19, 2026

GLM-5.1 is Z.ai’s flagship open-source model (released April 7, 2026) optimized for long-horizon agentic tasks like autonomous coding and multi-step reasoning. To use the GLM-5.1 API, use CometAPI for cheaper unified access, get your API key
What is GLM-5.1? All You need to know
Apr 8, 2026

GLM-5.1 is Z.ai’s (formerly Zhipu AI) next-generation open-source flagship large language model, released on April 7, 2026. Built for agentic engineering and long-horizon tasks, it delivers state-of-the-art performance on SWE-Bench Pro (58.4 score), leads all open-source models, ranks #3 globally across major coding benchmarks, and achieves 94.6% of Claude Opus 4.6’s coding performance in independent evaluations (45.3 vs. 47.9). It sustains productive optimization over 600+ iterations and 8+ hour autonomous runs—far beyond GLM-5—while remaining significantly cheaper than Western frontier models.
Google Gemma 4: The Complete Guide to Google's Open-Source AI Model (2026)
Apr 5, 2026

Gemma 4 is Google DeepMind’s latest open model family, launched on March 31, 2026 and announced publicly on April 2, 2026. It is designed for advanced reasoning, agentic workflows, multimodal understanding, and efficient deployment across phones, laptops, workstations, and edge devices. Google says the family ships in four versions — E2B, E4B, 26B A4B, and 31B Dense — with up to 256K context, support for more than 140 languages, open weights, and an Apache 2.0 license.
GLM-5V-Turbo: Turns Design Drafts into Executable Code in Seconds – 2026 Full Review
Apr 4, 2026

GLM-5V-Turbo is Zhipu AI’s (Z.ai) first native multimodal coding foundation model, released April 1-2, 2026. It natively processes images, videos, design drafts, screenshots, and text to generate complete, runnable frontend code, debug interfaces, and power GUI agents. Key specs include 200K token context, up to 128K output tokens, and leading benchmarks such as 94.8 on Design2Code (vs. Claude Opus 4.6’s 77.3). Pricing starts at $1.20 per million input tokens and $4 per million output tokens via API. It excels at “design-to-code” workflows while maintaining top-tier pure-text coding performance.