GLM-5 的技術規格

項目	GLM-5（據報）
模型家族	GLM（Z.ai / Zhipu AI）— 旗艦世代
架構	專家混合（MoE）+ 稀疏注意力（DeepSeek/DSA 優化）。
參數總量	≈744–745B（MoE 池）。
活躍 / 路由參數（每個 token）	~40–44B 活躍（取決於路由/專家）。
預訓練 token 數	~28.5T tokens（據報）。
上下文視窗（輸入）	最多 200,000 tokens（長上下文模式）。
最大輸出 tokens	128,000 tokens（據報每次呼叫的最長生成）。
輸入模態	僅文字（為主）；針對豐富文字 → 輸出進行工程化（透過工具生成 doc/xlsx）。

什麼是 GLM-5

GLM-5 是 Zhipu AI 的下一代基礎模型，透過 MoE 路由設計與稀疏注意力優化擴展 GLM 系列，以提供長上下文推理與具代理性的工作流程（多步規劃、程式碼與系統協同編排）。其明確定位為面向代理與工程任務的開放權重競爭者，並可透過 API 與自託管供企業存取。

🚀 GLM-5 的主要特性

1. 代理式智能與推理

GLM-5 針對將冗長且複雜的任務拆解為有序步驟的工作流程進行最佳化，同時降低幻覺——相較先前 GLM 版本為一大改進。它在部分開放權重模型基準上，於知識可靠性與任務生產力方面領先。

2. 長上下文支援

憑藉 200K token 的上下文視窗，GLM-5 能夠在不喪失連貫性的情況下，維持超長對話、大型文件與延展推理鏈——這對真實世界的專業應用愈發關鍵。

3. DeepSeek 稀疏注意力

透過整合稀疏注意力機制，GLM-5 能高效擴展記憶體佔用，使更長序列在成本不線性增加的前提下得以處理。

4. 工具整合與輸出格式

原生支援結構化輸出與外部工具整合（JSON、API 呼叫、動態工具使用），使 GLM-5 能實際應用於企業場景，如試算表、報告與自動化程式設計助理。

5. 成本效率

GLM-5 的定位具有與專有產品競爭的成本優勢，輸入/輸出定價明顯低於主要對手，適合大規模部署。

GLM-5 的基準表現

多項獨立評估與業界早期基準顯示，GLM-5 在開放權重模型中表現強勁：

在 Artificial Analysis Intelligence Index（衡量可靠性與真實性）上達成史上最低的幻覺率，遠勝以往模型。
以代理為中心的基準顯示，相較 GLM-4.7 與其他開放模型，在複雜任務執行上有顯著提升。
依照成本-效能指標，GLM-5 在速度上位於第 4 四分位，但在智能與價格方面於開放權重模型中名列頂尖。

量化分數（來自排名平台的示例）：

Intelligence Index: 開放權重模型中排名第 #1。
Pricing Efficiency: 以低輸入/輸出成本獲得高評等。

如何存取並使用 GLM-5 API

步驟 1：註冊取得 API 金鑰

登入 cometapi.com。若您尚非我們的使用者，請先註冊。登入您的 CometAPI console。取得介面的存取憑證 API 金鑰。在個人中心的 API token 處點選「Add Token」，取得金鑰：sk-xxxxx 並提交。

步驟 2：向 `glm-5` API 發送請求

選擇「glm-5」端點以發送 API 請求並設定請求本文。請求方法與請求本文可從我們網站的 API 文件取得。我們的網站也提供 Apifox 測試以供便利。將 <YOUR_API_KEY> 替換為您帳戶中的實際 CometAPI 金鑰。可呼叫位置：Chat 格式。

將您的問題或請求填入 content 欄位——模型將回應該內容。處理 API 回應以取得生成的答案。

步驟 3：擷取並驗證結果

處理 API 回應以取得生成的答案。處理後，API 將回傳任務狀態與輸出資料。

GLM-5 uses a Mixture of Experts (MoE) architecture with ~745B total parameters and 8 active experts per token (~44B active), enabling efficient large-scale reasoning and agentic workflows compared to previous GLM series.

GLM-5 supports a 200K token context window with up to 128K output tokens, making it suitable for extended reasoning and document tasks.

Yes — GLM-5 is explicitly optimized for long-horizon agent tasks and complex systems engineering workflows, with deep reasoning and planning capabilities beyond standard chat models.

Yes — GLM-5 supports function calling, structured JSON outputs, context caching, and real-time streaming to integrate with external tools and systems.

GLM-5 is competitive with top proprietary models in benchmarks, performing close to Claude Opus 4.5 and offering significantly lower per-token costs and open-weight availability, though closed-source models may still lead in some fine-grained benchmarks.

Yes — GLM-5 is released under a permissive MIT license, enabling open-weight access and community development.

GLM-5 is well suited for long-sequence reasoning, agentic automation, coding assistance, creative writing at scale, and backend system design tasks that demand coherent multi-step outputs.

While powerful, GLM-5 is primarily text-only (no native multimodal support) and may be slower or more resource-intensive than smaller models, especially for shorter tasks.