GLM 4.6

Input: $0.96/M
Output: $3.84/M
Context: 200,000
Max output: 128,000
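Zhipu's latest flagship model, GLM-4.6, has been released: 355B total parameters, 32B active. Its overall core capabilities exceed those of GLM-4.5. Coding: on par with Claude Sonnet 4, best among Chinese domestic models. Context: extended to 200K (from 128K). Reasoning: improved, with support for tool calling. Search: optimized tool use and agent frameworks. Writing: better aligned with human preferences, writing style, and role-playing. Multilingual: improved translation quality.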

GLM-4.6 is the latest major release in the GLM family from Z.ai (formerly Zhipu AI): a fourth-generation Mixture-of-Experts (MoE) large language model tuned for agentic workflows, long-context reasoning, and real-world coding. The release emphasizes practical agent and tool integration, a 200K-token context window, and open weights for local deployment.

Key features

  • Long context: native 200K-token context window (expanded from 128K). (docs.z.ai)
  • Coding & agentic capability: Z.ai reports improvements on real-world coding tasks and better tool invocation for agents.
  • Efficiency: Z.ai reports ~30% lower token consumption than GLM-4.5 in its own tests (the CC-Bench comparison in the benchmark section cites ~15%).
  • Deployment & quantization: first announced FP8 and Int4 integration for Cambricon chips; native FP8 support on Moore Threads GPUs via vLLM.
  • Model size & tensor type: the published artifacts on Hugging Face list ~357B parameters (BF16 / F32 tensors); Z.ai's stated figure is 355B total parameters with 32B active.

Technical details

Modalities & formats. GLM-4.6 is a text-only LLM (input and output modalities: text). Context length = 200K tokens; max output = 128K tokens.
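
The two limits above interact when a request is sized. The following is a minimal sketch of that budgeting, assuming (the source does not confirm this) that prompt and output tokens share the same 200K window; `output_budget` is a hypothetical helper name, not part of any SDK.

```python
CONTEXT_WINDOW = 200_000  # tokens, from the figures above
MAX_OUTPUT = 128_000      # tokens, from the figures above


def output_budget(prompt_tokens: int, requested_output: int) -> int:
    """Return how many output tokens can be requested for a given prompt size.

    Assumption: prompt and output tokens count against one shared window.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        return 0  # the prompt alone already fills the window
    return min(requested_output, MAX_OUTPUT, remaining)


print(output_budget(50_000, 128_000))   # 128000
print(output_budget(150_000, 128_000))  # 50000
```

Under this assumption, a 150K-token prompt leaves room for at most 50K output tokens, well below the 128K cap.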

Quantization & hardware support. The team reports FP8/Int4 quantization on Cambricon chips and native FP8 execution on Moore Threads GPUs using vLLM for inference — important for lowering inference cost and allowing on-prem and domestic cloud deployments.
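
To see why lower-precision formats matter for deployment, here is a rough back-of-the-envelope sketch of weight storage alone, using the 355B total / 32B active figures stated elsewhere on this page. It deliberately ignores the KV cache, activations, quantization scale metadata, and runtime overhead, so the numbers are lower bounds rather than serving requirements.

```python
TOTAL_PARAMS = 355e9   # total parameters, as stated by Z.ai
ACTIVE_PARAMS = 32e9   # parameters active per token (MoE routing)

# Nominal storage cost per parameter for each format.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "Int4": 0.5}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Weights-only memory in decimal gigabytes for the given format."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: ~{weight_memory_gb(TOTAL_PARAMS, fmt):.1f} GB of weights")

print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Halving the bytes per parameter halves the weight footprint, which is the main lever behind the FP8 and Int4 paths described above.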

Tooling & integrations. GLM-4.6 is distributed through Z.ai's API and third-party provider networks (e.g., CometAPI), and is integrated into coding agents (Claude Code, Cline, Roo Code, Kilo Code).

Benchmark performance

  • Published evaluations: GLM-4.6 was tested on eight public benchmarks covering agents, reasoning and coding and shows clear gains over GLM-4.5. On human-evaluated, real-world coding tests (extended CC-Bench), GLM-4.6 uses ~15% fewer tokens vs GLM-4.5 and posts a ~48.6% win rate vs Anthropic’s Claude Sonnet 4 (near-parity on many leaderboards).
  • Positioning: results claim GLM-4.6 is competitive with leading domestic and international models (examples cited include DeepSeek-V3.1 and Claude Sonnet 4).

Limitations & risks

  • Hallucinations & mistakes: like all current LLMs, GLM-4.6 can and does make factual errors — Z.ai’s docs explicitly warn outputs may contain mistakes. Users should apply verification & retrieval/RAG for critical content.
  • Model complexity & serving cost: a 200K context and very long outputs sharply increase memory and latency demands and can raise inference costs; quantization and inference engineering are required to run at scale.
  • Domain gaps: while GLM-4.6 reports strong agent/coding performance, some public reports note it still lags certain versions of competing models in specific microbenchmarks (e.g., some coding metrics vs Sonnet 4.5). Assess per-task before replacing production models.
  • Safety & policy: open weights increase accessibility but also raise stewardship questions (mitigations, guardrails, and red-teaming remain the user’s responsibility).

Use cases

  • Agentic systems & tool orchestration: long agent traces, multi-tool planning, dynamic tool invocation; the model’s agentic tuning is a key selling point.
  • Real-world coding assistants: multi-turn code generation, code review and interactive IDE assistants (integrated in Claude Code, Cline, Roo Code—per Z.ai). Token efficiency improvements make it attractive for heavy-use developer plans.
  • Long-document workflows: summarization, multi-document synthesis, long legal/technical reviews due to the 200K window.
  • Content creation & virtual characters: extended dialogues, consistent persona maintenance in multi-turn scenarios.
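
For the agentic and tool-orchestration use case above, the client-side half of a tool-calling loop might look like the sketch below. It uses the OpenAI-compatible `tools` schema; the tool `get_word_count`, the registry, and `run_tool_call` are hypothetical names introduced for illustration, and whether a given provider endpoint returns tool calls for `glm-4.6` in exactly this shape is an assumption.

```python
import json


# Hypothetical local tool; the name and behavior are illustrative only.
def get_word_count(text: str) -> int:
    return len(text.split())


TOOL_REGISTRY = {"get_word_count": get_word_count}

# Tool description in the OpenAI-compatible "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_word_count",
            "description": "Count whitespace-separated words in a text.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    }
]


def run_tool_call(name: str, arguments_json: str) -> dict:
    """Execute one model-requested tool call and return a JSON-serializable result."""
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(arguments_json)
        return {"result": TOOL_REGISTRY[name](**args)}
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the tool signature.
        return {"error": str(exc)}


# Simulated call, shaped like the name/arguments pair of a returned tool call.
print(run_tool_call("get_word_count", '{"text": "long agent traces"}'))
```

In a live loop, `TOOLS` would be passed as the `tools` argument of `client.chat.completions.create(...)`, and each returned tool call's function name and argument string would be handed to `run_tool_call` before the result is sent back as a `tool` message.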

How GLM-4.6 compares to other models

  • GLM-4.5 → GLM-4.6: step change in context size (128K → 200K) and token efficiency (~15% fewer tokens on CC-Bench); improved agent/tool use.
  • GLM-4.6 vs Claude Sonnet 4 / Sonnet 4.5: Z.ai reports near parity on several leaderboards and a ~48.6% win rate on the CC-Bench real-world coding tasks (i.e., close competition, with some microbenchmarks where Sonnet still leads). For many engineering teams, GLM-4.6 is positioned as a cost-efficient alternative.
  • GLM-4.6 vs other long-context models (DeepSeek, Gemini variants, GPT-4 family): GLM-4.6 emphasizes large context & agentic coding workflows; relative strengths depend on metric (token efficiency/agent integration vs raw code synthesis accuracy or safety pipelines). Empirical selection should be task-driven.

Zhipu AI’s latest flagship model GLM-4.6 released: 355B total params, 32B active. Surpasses GLM-4.5 in all core capabilities.

  • Coding: Aligns with Claude Sonnet 4, best in China.
  • Context: Expanded to 200K (from 128K).
  • Reasoning: Improved, supports tool calling during inference.
  • Search: Enhanced tool calling and agent performance.
  • Writing: Better aligns with human preferences in style, readability, and role-playing.
  • Multilingual: Boosted cross-language translation.

FAQ

What are the context window and output limits for GLM-4.6?

GLM-4.6 supports a 200,000-token context window (extended from 128K in GLM-4.5) with up to 128,000 output tokens, enabling extensive document analysis and long-form generation.

How does GLM-4.6 compare to Claude Sonnet 4 in coding?

According to Zhipu, GLM-4.6's coding capabilities are on par with Claude Sonnet 4, making it the strongest coding model among Chinese domestic models.

Does GLM-4.6 support tool calling and agent workflows?

Yes. GLM-4.6 offers improved reasoning with enhanced tool-calling support and an optimized agent framework for complex multi-step task automation.

What is the architecture of GLM-4.6?

GLM-4.6 is a Mixture-of-Experts model with 355B total parameters and 32B active parameters, balancing capability with efficiency.

What makes GLM-4.6 different from GLM-4.5?

GLM-4.6 offers extended context (200K vs 128K), improved reasoning and tool calling, writing that is better aligned with human preferences, better multilingual translation, and improved role-playing.

Is GLM-4.6 suitable for enterprise Chinese language applications?

Yes. GLM-4.6 is particularly strong on Chinese language tasks, including translation, content writing, and conversational AI, and has enhanced multilingual capabilities.

When should I choose GLM-4.6 over GPT-5.2 or Claude?

Choose GLM-4.6 for Chinese-first applications, for cost-effective 200K-context workloads, or when you need a strong domestic alternative with coding capabilities comparable to frontier models.

GLM 4.6 pricing

Pricing for GLM 4.6 through CometAPI compared with the official price, in USD per million tokens.

          Comet price (USD / M tokens)   Official price (USD / M tokens)   Discount
Input     $0.96                          $1.20                             -20%
Output    $3.84                          $4.80                             -20%
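
As a worked example of the pricing above, the sketch below estimates the cost of a single request at both price points. The request size (150,000 input tokens, 4,000 output tokens) is an arbitrary illustration, and `estimate_cost` is a hypothetical helper rather than part of any SDK.

```python
# USD per million tokens, taken from the pricing figures above.
COMET_PRICE = {"input": 0.96, "output": 3.84}
OFFICIAL_PRICE = {"input": 1.20, "output": 4.80}


def estimate_cost(input_tokens: int, output_tokens: int, price: dict) -> float:
    """Estimated USD cost of one request at the given per-million-token prices."""
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


comet = estimate_cost(150_000, 4_000, COMET_PRICE)
official = estimate_cost(150_000, 4_000, OFFICIAL_PRICE)

print(f"Comet: ${comet:.4f}, official: ${official:.4f}")  # Comet: $0.1594, official: $0.1992
print(f"discount: {comet / official - 1:.0%}")            # discount: -20%
```

Because input and output prices are both discounted by the same factor, the 20% saving holds regardless of the input/output mix.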

GLM 4.6 sample code and API

from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="glm-4.6",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)
