Can DeepSeek-V4-Pro handle 1M-token documents in the API?

Yes. DeepSeek-V4-Pro supports a 1M-token context length and up to 384K output tokens, so it is built for very long documents and multi-file workflows.
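A quick pre-flight check can catch oversized requests before they are sent. The sketch below assumes a rough heuristic of about 4 characters per token (real tokenizer counts will differ, especially for Chinese text). It also assumes, conservatively, that the prompt and the requested output must fit in the 1M window together. `CHARS_PER_TOKEN`, `estimate_tokens`, and `fits_budget` are hypothetical names.

```python
# Limits quoted above for DeepSeek-V4-Pro.
CONTEXT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 384_000

# Assumption: ~4 characters per token is only a rough heuristic;
# use a real tokenizer for accurate counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of `text` (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_budget(document: str, max_output_tokens: int) -> bool:
    """Return True if the document plus the requested output should fit.

    Conservatively assumes prompt and completion share the 1M-token window.
    """
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return estimate_tokens(document) + max_output_tokens <= CONTEXT_TOKENS


# ~500,000 estimated prompt tokens plus 384,000 output tokens.
print(fits_budget("x" * 2_000_000, 384_000))
```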

Does DeepSeek-V4-Pro support thinking mode and tool calls?

Yes. DeepSeek-V4-Pro supports both thinking and non-thinking modes, plus JSON output and tool calls.
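As a sketch of JSON output, the request body below uses the OpenAI-style `response_format={"type": "json_object"}` setting that DeepSeek documents for JSON Output. As we understand DeepSeek's guide, the prompt should also mention JSON and show the expected shape. The helper names, prompt, and `max_tokens` value are illustrative assumptions.

```python
import json


def build_json_request(question: str) -> dict:
    """Build a chat-completions body that asks for a JSON object reply."""
    return {
        "model": "deepseek-v4-pro",
        "messages": [
            # The prompt names JSON explicitly and shows the expected shape.
            {"role": "system",
             "content": 'Reply in json like {"answer": "...", "confidence": 0.0}.'},
            {"role": "user", "content": question},
        ],
        "response_format": {"type": "json_object"},
        "max_tokens": 512,
    }


def parse_json_reply(response: dict) -> dict:
    """Extract and decode the JSON object from a non-streaming response."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)


body = build_json_request("Which number is greater, 9.11 or 9.8?")
print(json.dumps(body, indent=2))
```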

When should I use DeepSeek-V4-Pro instead of DeepSeek-V4-Flash?

Use DeepSeek-V4-Pro when accuracy and agentic coding matter more than speed. DeepSeek says V4-Flash is the faster, more economical option, while V4-Pro is stronger on coding and broader agent evaluations.
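One way to act on this guidance is a small routing helper. This is only a sketch. The `deepseek-v4-flash` model name is an assumption not confirmed on this page, and the routing rule is illustrative.

```python
PRO_MODEL = "deepseek-v4-pro"
# Assumption: hypothetical model name for V4-Flash; check the provider's model list.
FLASH_MODEL = "deepseek-v4-flash"


def choose_model(agentic_coding: bool, latency_sensitive: bool) -> str:
    """Route agentic coding to Pro, latency-sensitive work to Flash, else Pro."""
    if agentic_coding:
        return PRO_MODEL
    if latency_sensitive:
        return FLASH_MODEL
    return PRO_MODEL
```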

Is DeepSeek-V4-Pro good for coding agents like Claude Code or OpenCode?

Yes. DeepSeek-V4-Pro can be configured for Claude Code and OpenCode, with `reasoningEffort` set to `max` and thinking enabled.
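Here is a minimal sketch of pointing Claude Code at DeepSeek's Anthropic-compatible base URL, `https://api.deepseek.com/anthropic`, which appears in the specifications. `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` are environment variables that Claude Code reads. `DEEPSEEK_API_KEY` is an assumed variable name for your key, and both helpers are hypothetical. The effort level is not set here.

```python
import os
import subprocess


def claude_code_env(api_key: str, model: str = "deepseek-v4-pro") -> dict:
    """Return a copy of the current environment configured for DeepSeek."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = "https://api.deepseek.com/anthropic"
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    env["ANTHROPIC_MODEL"] = model
    return env


def launch_claude_code() -> int:
    """Start the `claude` CLI with the DeepSeek settings (assumes it is on PATH)."""
    key = os.environ["DEEPSEEK_API_KEY"]  # assumed variable name
    return subprocess.run(["claude"], env=claude_code_env(key)).returncode
```

Calling `launch_claude_code()` starts an interactive session. Only the environment dictionary is built by `claude_code_env`, so it is safe to inspect first.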

How do I integrate DeepSeek-V4-Pro with OpenAI-compatible SDKs?

Point the SDK at the CometAPI base URL `https://api.cometapi.com/v1` and set the model name to `deepseek-v4-pro`.
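The same endpoint can also be called without any SDK, using a plain HTTP POST. This standard-library sketch only builds the request. It assumes the `/v1/chat/completions` path and the Bearer-token header used in the cURL example on this page. `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://api.cometapi.com/v1"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat-completions request."""
    body = {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("<YOUR_COMETAPI_KEY>", "Hello")
# To send it: json.load(urllib.request.urlopen(req))["choices"][0]["message"]["content"]
```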

Is DeepSeek-V4-Pro suitable for search-heavy research workflows?

Yes. According to DeepSeek, V4-Pro performs strongly on search and retrieval-style tasks and outperforms DeepSeek-V3.2 by a substantial margin in both objective and subjective Q&A categories.
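In search-heavy workflows, the model is typically given a search tool, and the application executes the calls the model requests. The sketch below uses the OpenAI-style tool schema and `tool` reply messages. `web_search` and its stub implementation are hypothetical stand-ins for a real search backend.

```python
import json

# Hypothetical tool schema advertised to the model in the `tools` field.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return short result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def web_search(query: str) -> list:
    """Stub search backend; replace with a real search API."""
    return [f"result for {query!r}"]


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each requested tool call and build the `tool` reply messages."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        args = json.loads(call["function"]["arguments"])
        if call["function"]["name"] == "web_search":
            result = web_search(args["query"])
        else:
            result = {"error": "unknown tool"}
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return replies
```

After each model turn, append the assistant message and the returned `tool` messages to `messages`, then call the API again so the model can read the results.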

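Affordable DeepSeek V4 Pro API | text-to-text

Technical Specifications

Item	DeepSeek-V4-Pro
Provider	DeepSeek
API model name	deepseek-v4-pro
Base URL	https://api.deepseek.com and https://api.deepseek.com/anthropic
Input type	Text
Output type	Text, tool calls, reasoning output
Context length	1,000,000 tokens
Max output	384,000 tokens
Reasoning modes	Non-thinking, thinking (default)
Agent/coding defaults	reasoning_effort can be set to high; complex agent requests may use max
Supported features	JSON Output, Tool Calls, Chat Prefix Completion (beta), FIM Completion (beta, non-thinking mode)
Local/open-weight release	1.6T total parameters, 49B activated parameters, FP4 + FP8 mixed precision
License (model card)	MIT
Reference model card	DeepSeek-V4-Pro preview on Hugging Face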
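The second base URL accepts Anthropic Messages-format requests. In this sketch, the `/v1/messages` path assumes the Anthropic SDK convention of appending that path to the base URL. The `x-api-key` and `anthropic-version` headers follow Anthropic's API and are assumed to be accepted by DeepSeek's endpoint. `build_messages_request` is a hypothetical helper.

```python
import json
import urllib.request

ANTHROPIC_BASE = "https://api.deepseek.com/anthropic"


def build_messages_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an Anthropic Messages-format request."""
    body = {
        "model": "deepseek-v4-pro",
        "max_tokens": 1024,  # required in the Messages format
        "system": "You are a helpful assistant.",  # top-level field, not a message
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        # Assumption: path follows the Anthropic SDK convention (base + /v1/messages).
        f"{ANTHROPIC_BASE}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
```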

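What is DeepSeek-V4-Pro?

DeepSeek-V4-Pro is the stronger member of DeepSeek's V4 preview family. The official model card describes it as a 1.6T-parameter MoE model with 49B activated parameters and a one-million-token context window, aimed at long-horizon knowledge work, code generation, and agent tasks. DeepSeek's API documentation exposes it through the standard chat-completions interface and supports both OpenAI- and Anthropic-style SDKs.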
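The MoE figures imply that only a small slice of the weights runs for each token. This quick calculation uses the numbers on this page: 1.6T total and 49B activated parameters for V4-Pro, and 284B and 13B for V4-Flash.

```python
# Total and activated parameter counts (in billions) quoted on this page.
MODELS = {
    "DeepSeek-V4-Pro": (1600, 49),
    "DeepSeek-V4-Flash": (284, 13),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of parameters activated per token in an MoE model."""
    return active_b / total_b


for name, (total, active) in MODELS.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active per token")
```

This works out to about 3.1% for V4-Pro and 4.6% for V4-Flash.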

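Key Features

1M-token context: DeepSeek's documentation lists a 1M-token context length, suited to very large document sets, code repositories, and multi-step agent sessions.
Two reasoning modes: The API supports non-thinking and thinking modes, with thinking enabled by default. The documentation notes that complex agent requests, such as those from Claude Code or OpenCode, may automatically use max effort.
Tool calls: DeepSeek's thinking mode supports tool calls, which matters for agents that need search, file operations, or external functions.
Long-context efficiency: The model card says V4 uses a hybrid attention design that combines Compressed Sparse Attention and Heavily Compressed Attention, reducing long-context compute and KV-cache cost compared with V3.2.
Coding and reasoning focus: DeepSeek says the V4-Pro-Max reasoning mode improves on coding benchmarks and substantially narrows the gap with leading closed-source models on reasoning and agent tasks.
SDK flexibility: Use either standard OpenAI-compatible chat completions or DeepSeek's Anthropic-compatible endpoint, depending on the needs of your tool-oriented workflow.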
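The two reasoning modes map onto request parameters. This sketch reuses the `thinking` and `reasoning_effort` fields from the code examples on this page. The `"disabled"` value for non-thinking mode is an assumption to verify against DeepSeek's API reference. The hypothetical `build_body` helper accepts only the two effort levels mentioned on this page.

```python
def build_body(prompt: str, thinking: bool = True, effort: str = "high") -> dict:
    """Build a chat-completions body for thinking or non-thinking mode."""
    body = {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: "disabled" switches off thinking mode (thinking is the default).
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }
    if thinking:
        if effort not in ("high", "max"):
            raise ValueError("effort must be 'high' or 'max'")
        body["reasoning_effort"] = effort
    return body
```

`thinking` is not a standard OpenAI parameter. With the OpenAI Python SDK, pass it through `extra_body`, as the example code on this page does.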

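Benchmark Performance

The official DeepSeek model card reports the following results for the base-model series and for the V4-Pro-Max comparison set. In the base-model table, V4-Pro outperforms V3.2-Base on several knowledge and long-context benchmarks, including MMLU-Pro (73.5 vs. 65.5), FACTS Parametric (62.6 vs. 27.1), and LongBench-V2 (51.5 vs. 40.2).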

Benchmark	V3.2-Base	V4-Flash-Base	V4-Pro-Base
MMLU-Pro (EM)	65.5	68.3	73.5
FACTS Parametric (EM)	27.1	33.9	62.6
HumanEval (Pass@1)	62.8	69.5	76.8
LongBench-V2 (EM)	40.2	44.7	51.5
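The relative gains are easier to see as point differences. The following computes them from the table above as a worked arithmetic check. It adds no new results.

```python
# Base-model scores from the table: (V3.2-Base, V4-Flash-Base, V4-Pro-Base).
SCORES = {
    "MMLU-Pro (EM)": (65.5, 68.3, 73.5),
    "FACTS Parametric (EM)": (27.1, 33.9, 62.6),
    "HumanEval (Pass@1)": (62.8, 69.5, 76.8),
    "LongBench-V2 (EM)": (40.2, 44.7, 51.5),
}


def gains(scores: dict) -> dict:
    """Point gain of V4-Pro-Base over V3.2-Base, rounded to one decimal."""
    return {bench: round(pro - v32, 1) for bench, (v32, _flash, pro) in scores.items()}


for bench, gain in gains(SCORES).items():
    print(f"{bench}: +{gain} points for V4-Pro-Base over V3.2-Base")
```

FACTS Parametric shows the largest jump, at +35.5 points.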

The same model card shows V4-Pro-Max remaining competitive with top frontier models on selected tasks. For example, it scores 87.5 on MMLU-Pro, 57.9 on SimpleQA-Verified, 90.1 on GPQA Diamond, and 67.9 on Terminal Bench 2.0.

DeepSeek-V4-Pro vs DeepSeek-V4-Flash vs DeepSeek-V3.2

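Model	Best fit	Context	Notes
DeepSeek-V4-Pro	Heavy reasoning, coding, agents, very large documents	1M	Largest model in the V4 family, with 49B activated parameters and the strongest overall capability.
DeepSeek-V4-Flash	Faster, lighter general-purpose use	1M	Smaller 284B/13B model that still supports thinking and tool calls.
DeepSeek-V3.2	Previous-generation long-context baseline	128K in earlier API docs; V4 uses a different 1M-context design	Useful as a reference point for efficiency gains; the V4-Pro model card reports significant reductions in long-context FLOPs and KV cache compared with V3.2.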

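Best Use Cases

Repository-scale coding assistants and refactoring tools
Long-document analysis and synthesis
Tool-using agents that need multi-turn reasoning
Technical support workflows that benefit from long memory and structured output
Chinese and multilingual knowledge tasks, where the model card shows strong results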
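For repository-scale work, one common approach is to pack source files into a single prompt with path headers. The hypothetical helper below is not a DeepSeek or CometAPI feature. It reuses the rough 4-characters-per-token assumption and stops before a chosen token budget. The 900K default is an arbitrary choice that leaves headroom for the reply.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic; use a real tokenizer for accuracy


def pack_repo(root: str, suffixes=(".py", ".md"), budget_tokens=900_000) -> str:
    """Concatenate matching files under `root`, each preceded by its path."""
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        chunk = f"=== {path.relative_to(root)} ===\n{text}\n"
        cost = -(-len(chunk) // CHARS_PER_TOKEN)  # estimated tokens, rounded up
        if used + cost > budget_tokens:
            break  # stop before exceeding the budget
        parts.append(chunk)
        used += cost
    return "".join(parts)
```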

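How to Access and Use the DeepSeek V4 Pro API

Step 1: Get an API Key

Log in to cometapi.com. If you are not a user yet, register first. Then sign in to your CometAPI console and obtain an API key: under API token in the personal center, click "Add Token" and submit to receive a key of the form sk-xxxxx.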
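Keeping the token out of source code is safer. The examples on this page read it from a `COMETAPI_KEY` environment variable. A minimal helper that fails loudly when the key is missing might look like this. `get_api_key` is a hypothetical name.

```python
import os


def get_api_key(var: str = "COMETAPI_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set {var} to your CometAPI token (sk-...).")
    return key
```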

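Step 2: Send Requests to the DeepSeek V4 Pro API

Select the "deepseek-v4-pro" endpoint, then set the request body for your API request. The request method and request body are described in the API documentation on our website, which also offers Apifox testing for convenience. Replace <YOUR_COMETAPI_KEY> with the actual CometAPI key from your account. Supported call formats: Anthropic Messages and Chat.

Put your question or request in the content field, and the model will respond to it.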
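In a non-streaming call, the answer is returned in `choices[0].message.content`. In thinking mode, the reasoning is returned alongside it as `reasoning_content`, the same field that the streaming examples read from each delta. Here is a minimal extraction sketch over the parsed JSON, assuming that response shape. `extract_answer` is a hypothetical helper.

```python
def extract_answer(response: dict) -> tuple:
    """Return (reasoning, answer) from a non-streaming chat-completions response."""
    message = response["choices"][0]["message"]
    # reasoning_content is only expected in thinking mode; default to "".
    return message.get("reasoning_content") or "", message.get("content") or ""
```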

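Step 3: Retrieve and Verify the Results

Process the API response to get the generated answer. After processing, the API returns the task status and output data. Features such as streaming, prompt caching, and long-context handling can be enabled through standard parameters.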
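When streaming is enabled, the response arrives as server-sent events. Each event is a `data:` line carrying a JSON chunk, and the stream ends with `data: [DONE]`; the cURL example on this page parses these lines with `jq`. The standard-library sketch below separates the streamed reasoning from the answer. Skipping chunks with no `choices` is a defensive assumption, and `parse_sse_lines` is a hypothetical helper.

```python
import json


def parse_sse_lines(lines):
    """Split a chat-completions SSE stream into (reasoning, answer) text."""
    reasoning, answer = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if not chunk.get("choices"):
            continue  # e.g. a trailing usage-only chunk
        delta = chunk["choices"][0].get("delta", {})
        reasoning.append(delta.get("reasoning_content") or "")
        answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer)
```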

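Sample Code and API for DeepSeek V4 Pro

Get complete sample code and API resources to streamline your DeepSeek V4 Pro integration. We provide step-by-step guidance to help you get the most out of the model.

CometAPI price (USD / M tokens)	Official price (USD / M tokens)	Discount
Input: $0.416/M, Output: $0.832/M	Input: $0.52/M, Output: $1.04/M	-20%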
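To turn the per-million-token prices into a per-request estimate, multiply each token count by its price and divide by one million. This is a sketch. It ignores prompt caching and any other discounts, which this page does not price.

```python
# Per-million-token prices (USD) from the table above.
PRICES = {
    "cometapi": {"input": 0.416, "output": 0.832},
    "official": {"input": 0.52, "output": 1.04},
}


def estimate_cost(input_tokens: int, output_tokens: int, plan: str = "cometapi") -> float:
    """Estimated USD cost of one request at the listed per-million-token prices."""
    p = PRICES[plan]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Example: a full 1M-token prompt with a 100K-token answer.
print(f"${estimate_cost(1_000_000, 100_000):.4f}")
```

For that example, the CometAPI price comes to about $0.4992, compared with $0.624 at the official rate.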

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Which number is greater, 9.11 or 9.8? Answer with one sentence."},
    ],
    stream=True,
    max_tokens=256,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

thinking = False
for chunk in stream:
    # Some chunks (e.g. a final usage-only chunk) may carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    reasoning = (delta.model_extra or {}).get("reasoning_content") or ""
    content = delta.content or ""

    if reasoning:
        if not thinking:
            print("<reasoning>")
            thinking = True
        print(reasoning, end="", flush=True)

    if content:
        if thinking:
            print("\n</reasoning>\n\n<answer>")
            thinking = False
        print(content, end="", flush=True)

print()

JavaScript Code Example

// Run as an ES module (e.g. a .mjs file) so that top-level await is allowed.
import OpenAI from "openai";

// Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1";

const client = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-pro",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Which number is greater, 9.11 or 9.8? Answer with one sentence." },
  ],
  thinking: { type: "enabled" },
  reasoning_effort: "high",
  max_tokens: 256,
  stream: true,
});

let thinking = false;
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta ?? {};
  const reasoning = delta.reasoning_content ?? "";
  const content = delta.content ?? "";

  if (reasoning) {
    if (!thinking) {
      process.stdout.write("<reasoning>\n");
      thinking = true;
    }
    process.stdout.write(reasoning);
  }

  if (content) {
    if (thinking) {
      process.stdout.write("\n</reasoning>\n\n<answer>\n");
      thinking = false;
    }
    process.stdout.write(content);
  }
}

process.stdout.write("\n");

Curl Code Example

#!/usr/bin/env bash
# Get your CometAPI key from https://www.cometapi.com/console/token
# Export it as: export COMETAPI_KEY="your-key-here"

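if [ -z "$COMETAPI_KEY" ]; then
  echo "COMETAPI_KEY is not set; export it before running this example." >&2
  exit 1
fi

if ! command -v jq >/dev/null 2>&1; then
  echo "jq is required to parse streamed reasoning_content in this shell example." >&2
  exit 1
fi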

thinking=false

curl --silent --no-buffer --location --request POST "https://api.cometapi.com/v1/chat/completions" \
  --header "Authorization: Bearer $COMETAPI_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Which number is greater, 9.11 or 9.8? Answer with one sentence."}
    ],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high",
    "max_tokens": 256,
    "stream": true
  }' | while IFS= read -r line; do
    case "$line" in
      data:\ *) data=${line#data: } ;;
      *) continue ;;
    esac

    [ "$data" = "[DONE]" ] && break

    reasoning=$(printf '%s' "$data" | jq -r '.choices[0].delta.reasoning_content // empty')
    content=$(printf '%s' "$data" | jq -r '.choices[0].delta.content // empty')

    if [ -n "$reasoning" ]; then
      if [ "$thinking" = false ]; then
        printf '<reasoning>\n'
        thinking=true
      fi
      printf '%s' "$reasoning"
    fi

    if [ -n "$content" ]; then
      if [ "$thinking" = true ]; then
        printf '\n</reasoning>\n\n<answer>\n'
        thinking=false
      fi
      printf '%s' "$content"
    fi
  done

printf '\n'
