Can DeepSeek-V4-Pro handle 1M-token documents in the API?

Yes. DeepSeek-V4-Pro has a 1M-token context length and supports up to 384K output tokens, so it is built for very long documents and multi-file workflows.

Does DeepSeek-V4-Pro support thinking mode and tool calls?

Yes. DeepSeek-V4-Pro supports both thinking and non-thinking modes, plus JSON output and tool calls.

When should I use DeepSeek-V4-Pro instead of DeepSeek-V4-Flash?

Use DeepSeek-V4-Pro when accuracy and agentic coding matter more than speed. DeepSeek says V4-Flash is the faster, more economical option, while V4-Pro is stronger on coding and broader agent evaluations.

Is DeepSeek-V4-Pro good for coding agents like Claude Code or OpenCode?

Yes. DeepSeek-V4-Pro can be configured for Claude Code and OpenCode, with `reasoningEffort` set to `max` and thinking enabled.

How do I integrate DeepSeek-V4-Pro with OpenAI-compatible SDKs?

Use the CometAPI base URL `https://api.cometapi.com/v1` (the value used in the code examples on this page) with the model name `deepseek-v4-pro`.

Is DeepSeek-V4-Pro suitable for search-heavy research workflows?

Yes. V4-Pro performs strongly on search and retrieval-style tasks, and it outperforms DeepSeek-V3.2 by a substantial margin in both objective and subjective Q&A categories.

Affordable DeepSeek V4 Pro API | text-to-text

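Technical specifications

Item	DeepSeek-V4-Pro
Provider	DeepSeek
API model name	deepseek-v4-pro
Base URL	https://api.deepseek.com and https://api.deepseek.com/anthropic
Input type	Text
Output type	Text, tool calls, reasoning output
Context length	1,000,000 tokens
Max output	384,000 tokens
Reasoning modes	Non-thinking, thinking (default)
Agent/coding defaults	reasoning_effort can be set to high; complex agent requests may use max
Supported features	JSON output, tool calls, chat prefix completion (beta), FIM completion (beta, non-thinking mode only)
Local/open-weight release	1.6T total parameters, 49B activated parameters, FP4 + FP8 mixed precision
License (model card)	MIT
Reference model card	DeepSeek-V4-Pro preview on Hugging Face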
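
As a rough illustration of how the context and output limits in the table interact, the sketch below checks a request against both numbers before it is sent. It assumes that prompt tokens and generated tokens share the 1,000,000-token window, which the table does not state. `fits_limits` and its parameters are hypothetical names, and the token counts would have to come from a tokenizer of your choice.

```python
# Limits taken from the specification table above.
CONTEXT_LIMIT = 1_000_000   # context length, in tokens
MAX_OUTPUT = 384_000        # maximum output, in tokens


def fits_limits(prompt_tokens: int, max_tokens: int) -> bool:
    """Return True if a request stays within both documented limits.

    Assumption (not stated in the table): prompt and output tokens
    share the same context window.
    """
    if prompt_tokens < 0 or max_tokens <= 0:
        raise ValueError("prompt_tokens must be >= 0 and max_tokens must be > 0")
    if max_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + max_tokens <= CONTEXT_LIMIT


print(fits_limits(700_000, 200_000))  # within both limits
print(fits_limits(700_000, 384_000))  # prompt plus output exceeds the window
```

A check like this is cheap to run client-side and avoids sending a very large prompt that would be rejected or cut short.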

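What is DeepSeek-V4-Pro?

DeepSeek-V4-Pro is the more capable member of DeepSeek's V4 preview family. The official model card describes it as an MoE model with 1.6T total parameters, 49B activated parameters, and a million-token context window, aimed at long-horizon knowledge work, code generation, and agentic tasks. The API is exposed through the standard DeepSeek chat completions interface and supports both OpenAI-style and Anthropic-style SDKs.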

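Key features

Million-token context: DeepSeek documents a 1M-token context length, which makes the model suitable for very large document sets, codebases, and multi-step agent workflows.
Two reasoning modes: The API supports non-thinking and thinking modes. Thinking is the default, and the documentation notes that complex agent requests such as those from Claude Code or OpenCode may automatically use the `max` effort level.
Tool calling: DeepSeek's thinking mode supports tool calls, which matters for agents that need search, file operations, or external functions.
Long-context efficiency: The model card says V4 uses a hybrid attention design that combines Compressed Sparse Attention and Heavily Compressed Attention, reducing long-context compute and KV-cache cost relative to V3.2.
Coding and reasoning focus: DeepSeek says the V4-Pro-Max reasoning mode improves on coding benchmarks and narrows the gap with leading closed-source models on reasoning and agentic tasks.
SDK flexibility: Call the model through standard OpenAI-compatible chat completions, or use DeepSeek's Anthropic-compatible endpoint for tool-oriented workflows.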
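
The mode and tool-calling features above can be sketched as a request body. The `thinking` and `reasoning_effort` fields mirror the code examples on this page. The `"disabled"` value for non-thinking mode, the OpenAI-style `tools` schema, and the `search_docs` tool are assumptions for illustration, not confirmed API details.

```python
import json


def build_request(prompt, thinking=True, effort="high", tools=None):
    """Build a chat-completions request body as a plain dict."""
    body = {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        # "enabled" appears in this page's examples; "disabled" is an assumption.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }
    if thinking:
        # The spec table mentions "high" and "max" as effort levels.
        body["reasoning_effort"] = effort
    if tools:
        body["tools"] = tools
    return body


# Hypothetical tool definition in the OpenAI function-calling format.
search_tool = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search project documentation.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

request = build_request("Find the retry policy.", tools=[search_tool])
print(json.dumps(request, indent=2))
```

Building the body as a plain dict keeps the sketch independent of any SDK; the same dict could be passed as JSON to an HTTP client or unpacked into an SDK call.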

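Benchmark performance

The official DeepSeek model card reports the following evaluation results for the base model family and for the V4-Pro-Max comparison set. In the base model table, V4-Pro outperforms V3.2-Base on several knowledge and long-context benchmarks, including MMLU-Pro (73.5 vs. 65.5), FACTS Parametric (62.6 vs. 27.1), and LongBench-V2 (51.5 vs. 40.2).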

Benchmark	V3.2-Base	V4-Flash-Base	V4-Pro-Base
MMLU-Pro (EM)	65.5	68.3	73.5
FACTS Parametric (EM)	27.1	33.9	62.6
HumanEval (Pass@1)	62.8	69.5	76.8
LongBench-V2 (EM)	40.2	44.7	51.5
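
To make the size of the gaps easier to compare, the following sketch computes the absolute gain of V4-Pro-Base over V3.2-Base from the figures in the table. It uses only the numbers above; the variable names are illustrative.

```python
# Scores copied from the base-model table above:
# (V3.2-Base, V4-Flash-Base, V4-Pro-Base).
scores = {
    "MMLU-Pro (EM)": (65.5, 68.3, 73.5),
    "FACTS Parametric (EM)": (27.1, 33.9, 62.6),
    "HumanEval (Pass@1)": (62.8, 69.5, 76.8),
    "LongBench-V2 (EM)": (40.2, 44.7, 51.5),
}

# Absolute gain of V4-Pro-Base over V3.2-Base, in points.
# Rounding to one decimal removes floating-point noise.
gains = {name: round(pro - v32, 1) for name, (v32, _flash, pro) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

On these figures the largest jump is on FACTS Parametric and the smallest on MMLU-Pro.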

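The same model card also shows V4-Pro-Max remaining competitive with top frontier models on specific tasks. For example, in the published comparison table it scores 87.5 on MMLU-Pro, 57.9 on SimpleQA-Verified, 90.1 on GPQA Diamond, and 67.9 on Terminal Bench 2.0.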

DeepSeek-V4-Pro vs. DeepSeek-V4-Flash vs. DeepSeek-V3.2

Model	Best for	Context	Notes
DeepSeek-V4-Pro	Heavy reasoning, coding, agents, long documents	1M	Largest model in the V4 family, 49B activated parameters, strongest overall capability.
DeepSeek-V4-Flash	Faster, lighter general-purpose use	1M	Smaller 284B/13B model that still supports thinking mode and tool calls.
DeepSeek-V3.2	Previous-generation long-context baseline	128K in earlier API documentation; V4 uses a different 1M-context design	Useful as a reference point for efficiency gains; the V4-Pro model card reports large reductions in long-context FLOPs and KV cache relative to V3.2.

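Best use cases

Repository-scale coding assistants and refactoring tools
Long-document analysis and synthesis
Tool-using agents that need multi-turn reasoning
Technical support workflows that benefit from long-term memory and structured output
Chinese and multilingual knowledge tasks, where the model card reports strong results on the relevant benchmarks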

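How to access and use the DeepSeek V4 Pro API

Step 1: Get an API key

Log in to cometapi.com. If you are not yet a user, register first. Sign in to your CometAPI console and obtain the API key that serves as the access credential for the interface. Under API token in the personal center, click "Add Token" to get a token key of the form sk-xxxxx, then submit.

Step 2: Send requests to the DeepSeek V4 Pro API

Select the "deepseek-v4-pro" endpoint to send the API request and set the request body. The request method and request body are described in the API documentation on our website, which also provides Apifox tests for convenience. Replace <YOUR_API_KEY> with the actual CometAPI key from your account. Where to call it: the Anthropic Messages format and the Chat format.

Insert your question or request into the content field; this is what the model responds to.

Step 3: Retrieve and verify the results

Process the API response to get the generated answer. The API returns the task status and the output data. Features such as streaming, prompt caching, or long-context handling can be enabled through standard parameters.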
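
Step 3 can be sketched as a small helper that pulls the answer out of a non-streaming response and flags truncation. The response shape follows the OpenAI chat-completions format. The presence of `reasoning_content` on the message is an assumption extrapolated from the streaming examples on this page, and `extract_answer` and the sample payload are hypothetical.

```python
def extract_answer(response: dict) -> dict:
    """Split a chat-completions response into reasoning, answer, and a truncation flag."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    choice = choices[0]
    message = choice.get("message") or {}
    return {
        # Assumed field name, mirroring the streaming delta in the examples.
        "reasoning": message.get("reasoning_content") or "",
        "answer": message.get("content") or "",
        # In the OpenAI format, finish_reason "length" means max_tokens was hit.
        "truncated": choice.get("finish_reason") == "length",
    }


# Hand-written sample payload for illustration, not real API output.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "Compare 9.11 and 9.80.",
                "content": "9.8 is greater than 9.11.",
            },
            "finish_reason": "stop",
        }
    ]
}

result = extract_answer(sample)
print(result["answer"])
```

Checking the truncation flag is worth the extra line when thinking is enabled, since reasoning tokens may use up a small `max_tokens` budget before the answer is complete.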

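DeepSeek V4 Pro pricing

Explore competitive pricing for DeepSeek V4 Pro, designed to fit a range of budgets and usage needs. Our flexible plans mean you pay only for what you use, so you can scale easily as your needs grow. See how DeepSeek V4 Pro can improve your projects while keeping costs under control.

CometAPI price (USD / M tokens)	Official price (USD / M tokens)	Discount
Input: $0.416/M Output: $0.832/M	Input: $0.52/M Output: $1.04/M	-20%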
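
For a quick budget estimate, the per-million-token rates in the table can be turned into a per-request cost. This is a minimal sketch that assumes billing is linear in token count and ignores any caching discounts or tiers, none of which the table specifies; `estimate_cost` is a hypothetical helper.

```python
# CometAPI prices from the table above, in USD per million tokens.
INPUT_PRICE_PER_M = 0.416
OUTPUT_PRICE_PER_M = 0.832


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed CometAPI rates."""
    cost = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    return round(cost, 6)


# Example: a 200K-token prompt with a 4K-token reply.
print(f"${estimate_cost(200_000, 4_000):.4f}")
```

Because output tokens cost twice as much as input tokens at these rates, long reasoning traces can dominate the bill even when the prompt is large.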

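Sample code and API for DeepSeek V4 Pro

Access complete sample code and API resources to streamline your DeepSeek V4 Pro integration. Our detailed documentation provides step-by-step guidance to help you get the most out of DeepSeek V4 Pro in your projects.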

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Which number is greater, 9.11 or 9.8? Answer with one sentence."},
    ],
    stream=True,
    max_tokens=256,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

thinking = False
for chunk in stream:
    delta = chunk.choices[0].delta
    reasoning = (delta.model_extra or {}).get("reasoning_content") or ""
    content = delta.content or ""

    if reasoning:
        if not thinking:
            print("<reasoning>")
            thinking = True
        print(reasoning, end="", flush=True)

    if content:
        if thinking:
            print("\n</reasoning>\n\n<answer>")
            thinking = False
        print(content, end="", flush=True)

print()

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1";

const client = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-pro",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Which number is greater, 9.11 or 9.8? Answer with one sentence." },
  ],
  thinking: { type: "enabled" },
  reasoning_effort: "high",
  max_tokens: 256,
  stream: true,
});

let thinking = false;
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta ?? {};
  const reasoning = delta.reasoning_content ?? "";
  const content = delta.content ?? "";

  if (reasoning) {
    if (!thinking) {
      process.stdout.write("<reasoning>\n");
      thinking = true;
    }
    process.stdout.write(reasoning);
  }

  if (content) {
    if (thinking) {
      process.stdout.write("\n</reasoning>\n\n<answer>\n");
      thinking = false;
    }
    process.stdout.write(content);
  }
}

process.stdout.write("\n");

Curl Code Example

#!/usr/bin/env bash
# Get your CometAPI key from https://www.cometapi.com/console/token
# Export it as: export COMETAPI_KEY="your-key-here"

if ! command -v jq >/dev/null 2>&1; then
  echo "jq is required to parse streamed reasoning_content in this shell example." >&2
  exit 1
fi

thinking=false

curl --silent --no-buffer --location --request POST "https://api.cometapi.com/v1/chat/completions" \
  --header "Authorization: Bearer $COMETAPI_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Which number is greater, 9.11 or 9.8? Answer with one sentence."}
    ],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high",
    "max_tokens": 256,
    "stream": true
  }' | while IFS= read -r line; do
    case "$line" in
      data:\ *) data=${line#data: } ;;
      *) continue ;;
    esac

    [ "$data" = "[DONE]" ] && break

    reasoning=$(printf '%s' "$data" | jq -r '.choices[0].delta.reasoning_content // empty')
    content=$(printf '%s' "$data" | jq -r '.choices[0].delta.content // empty')

    if [ -n "$reasoning" ]; then
      if [ "$thinking" = false ]; then
        printf '<reasoning>\n'
        thinking=true
      fi
      printf '%s' "$reasoning"
    fi

    if [ -n "$content" ]; then
      if [ "$thinking" = true ]; then
        printf '\n</reasoning>\n\n<answer>\n'
        thinking=false
      fi
      printf '%s' "$content"
    fi
  done

printf '\n'
