
Doubao-Seed-1.8

Input:$0.2/M
Output:$1.6/M
Context:256k
Max Output:224k
Doubao-Seed-1.8 is optimized for multimodal agent scenarios. On the agent side, tool use and complex instruction following have been significantly enhanced. On the multimodal side, core visual capabilities have been markedly improved, enabling low-frame-rate understanding of extremely long videos; video motion understanding, complex spatial understanding, and document structure parsing have also been optimized. Intelligent context management is now natively supported, allowing users to configure context strategies.

Technical specifications of Seed 1.8 API

  • Model name / family: Doubao-Seed-1.8 (Seed1.8) — ByteDance Seed / Volcano Engine
  • Modalities supported: Text, images, video (multimodal VLM capabilities); audio tooling in the ecosystem (separate models for audio/video generation).
  • Context window (text): 256K tokens
  • Video / visual capacity: Designed for long-video reasoning; supports efficient visual encoding and large video-token budgets (the model card reports video-token experiments and long-video benchmarks).
  • Input formats: Free-text prompts; image uploads (screenshots, charts, photos); video as tokenized frames / video tools for segment inspection; file uploads (documents).
  • Output formats: Natural-language text, structured outputs (structured-output beta), function calls / tool calls, code, and multimodal outputs via orchestration.
  • Thinking / inference modes: no_think, think-low, think-medium, think-high — trade accuracy vs. latency/cost.

What is Doubao Seed 1.8?

Doubao Seed 1.8 is the Seed team’s 1.8 release: a unified LLM+VLM that explicitly targets generalized real-world agency — i.e., perception (images/video), reasoning, tool orchestration (search, function calls, code execution, GUI grounding) and multi-step decision making inside a single model. The design emphasizes configurable “thinking modes” (tradeoffs between latency and depth), efficient visual encoding and native support for long context and multimodal inputs so the model can operate as an autonomous assistant/agent in production workflows.

Main features of Seed 1.8 API

  1. Unified multimodal agentic model. Integrates perception (image/video), reasoning (LLM), and action (tool/GUI calls, code execution) in a single model rather than a split pipeline. This enables compact agent workflows and lower orchestration complexity.
  2. Ultra-long context & long-video handling. Long context (product support up to 256K tokens) and strong results on long-video benchmarks (Seed1.8 shows strong long-video token efficiency). The model supports selective video tools (VideoCut) to focus reasoning on specific timestamps.
  3. Agentic GUI automation & tool use. Benchmarks and internal tests (OSWorld, AndroidWorld, LiveCodeBench, GUI grounding benchmarks) show improvements in GUI agent tasks and multi-step automation. The model can output GUI grounding commands and operate within simulated OS/web/mobile contexts.
  4. Configurable thinking modes for latency/cost control. Four inference modes let developers tune compute at test-time for interactive vs. high-quality batch tasks. This is useful for production systems with strict latency budgets.
  5. Improved token efficiency (multimodal). Seed 1.8 demonstrates stronger token efficiency on multimodal benchmarks versus its predecessors (Seed-1.5/1.6 series), achieving high accuracy with smaller token budgets in several long-video tasks.
  6. Technical capabilities
  • Token efficiency: Seed1.8 shows marked token efficiency vs predecessors (Seed-1.5/1.6), delivering stronger accuracy at lower token budgets on long video tasks (e.g., achieving competitive accuracy even at 32K video tokens). This enables lower inference cost for long inputs.
  • Multimodal reasoning & perception: The model reaches SOTA on several multi-image VQA and motion/perception tasks and obtains second-place or near-SOTA on many multimodal reasoning benchmarks; specifically it outperforms its predecessor on nearly every visual/video dimension measured.
  • Agentic tool use & GUI grounding: Documented support for GUI grounding and screen-based operation benchmarks (ScreenSpot-Pro, GUI agenting) with strong grounding scores (e.g., improvements over Seed-1.5-VL on ScreenSpot-Pro).
  • Parallel / stepped reasoning: Increasing test-time compute (parallel thinking) yields measurable gains on math, coding, and multimodal reasoning benchmarks.

Selected public benchmark highlights of Seed1.8

  • VCRBench (visual commonsense reasoning): Seed1.8 scored 59.8 (Pass@1, as reported in the model card table), an improvement over Seed-1.5-VL and competitive with top models.
  • VideoHolmes (video reasoning): Seed1.8 65.5, outperforming Seed-1.5-VL and approaching pro-grade competitor models.
  • MMLB-NIAH (multimodal long-context, 128k): Seed1.8 achieved 72.2 Pass@1 at 128k context in MMLB-NIAH, surpassing some contemporary pro models.
  • Motion & Perception suite: SOTA in 5 of 6 evaluated tasks; examples include TVBench, TempCompass and TOMATO where Seed1.8 shows substantial gains in temporal perception.
  • Agentic workflows: On BrowseComp and other agentic search/code benchmarks, Seed1.8 often ranks near or above competing pro models.

Seed 1.8 vs prior Seed models, Gemini 3 Pro, and GPT-5.x

  • Seed1.8 vs Seed-1.5-VL / Seed-1.6: Clear improvements in multimodal perception, token efficiency for long videos, and agentic execution.
  • Seed1.8 vs Gemini 3 Pro / GPT-5.x: On many multimodal benchmarks Seed1.8 matches or exceeds Gemini 3 Pro (SOTA on several VQA / motion tasks; better on MMLB-NIAH 128k run). However, the card also shows areas where Gemini family models retain advantages on certain disciplinary knowledge tasks — so the relative ordering is benchmark-dependent.
  • Seed-Code variant (Doubao-Seed-Code): specialized for programming/agentic code tasks (large context for codebases; specialized SWE benchmarks). Seed1.8 is the generalist agentic multimodal model, while Seed-Code is the programming-focused variant.

Practical use-cases of the Seed 1.8 API on CometAPI

  • Multimodal research assistants & document analysis: extract, summarize, and reason across long documents, slide decks, and multi-page reports.
  • Long-video comprehension & monitoring: security/sports broadcasting analytics, long meeting summarization, and streaming analysis where the model’s long-video token efficiency matters.
  • Agentic workflows / automation: multi-step web search + code execution + data extraction scenarios (e.g., automated competitive analysis, travel planning, research pipelines demonstrated in internal benchmarks).
  • Developer tooling (if using Seed-Code): large codebase analysis, IDE assistants, and agentic code execution for testing & repair (Seed-Code is the recommended specialized variant).
  • GUI automation & RPA: screen grounding and GUI agent benchmarks indicate the model can perform structured GUI tasks better than prior Seed releases.

How to Use doubao Seed 1.8 API via CometAPI

Doubao Seed 1.8 is now exposed commercially through CometAPI as a hosted inference API. The API supports multimodal payloads (text + images + video fragments / timestamps) and configurable inference modes that trade latency and compute against answer quality.

Call patterns: The API supports standard chat/completion style requests, streaming responses, and agentic flows where the model issues tool calls (search, code execution, GUI actions) and ingests tool outputs as subsequent context.
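As a sketch of that agentic round trip, assuming CometAPI follows the standard OpenAI-compatible Chat Completions tool-calling wire format (the helper name below is ours, not part of the API):

```python
import json

def build_tool_followup(messages, assistant_message, tool_outputs):
    """Append the model's tool calls and their results so the next request
    continues the agent loop with the tool output as context."""
    followup = list(messages)
    # Echo the assistant turn that requested the tool calls.
    followup.append({
        "role": "assistant",
        "content": assistant_message.get("content"),
        "tool_calls": assistant_message["tool_calls"],
    })
    # Feed each tool result back, keyed by the call id the model issued.
    for call in assistant_message["tool_calls"]:
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(tool_outputs[call["id"]]),
        })
    return followup
```

The returned list is passed as `messages` in the next chat completion request, and the loop repeats until the model answers without requesting another tool.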

Streaming & long-context handling: The API supports streaming and has built-in context management primitives for long sessions (to enable 100K+ contexts / multi-step agent traces).
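With an OpenAI-compatible SDK, streaming is enabled with `stream=True` and the answer arrives as content deltas that the client concatenates. A minimal accumulator, with plain dicts standing in for the SDK's chunk objects:

```python
def accumulate_stream(chunks):
    """Join the content deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:  # role-only or empty deltas carry no text
            parts.append(delta)
    return "".join(parts)
```

In a real session you would iterate the generator returned by `client.chat.completions.create(..., stream=True)` and read `chunk.choices[0].delta.content`.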

Step 1: Sign Up for API Key

Log in to cometapi.com; if you do not have an account yet, register first. In your CometAPI console, open the API token page in the personal center, click “Add Token”, and copy the generated key (sk-xxxxx). This key is the access credential for the interface.

Step 2: Send Requests to doubao Seed 1.8 API

Select the “doubao-seed-1-8-251228” endpoint, set the request body, and send the API request. The request method and body format are documented in our website's API doc; the site also provides an Apifox test for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account. The endpoint is compatible with the standard Chat APIs.

Insert your question or request into the content field; this is what the model will respond to.

Step 3: Retrieve and Verify Results

Parse the API response to extract the generated answer; the response also carries the task status and usage data.
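As a sketch of that step, with field names following the standard Chat Completions response shape and a plain dict standing in for the SDK object:

```python
def extract_answer(response):
    """Return the generated text plus total token usage from a chat completion."""
    answer = response["choices"][0]["message"]["content"]
    usage = response.get("usage", {})
    return answer, usage.get("total_tokens")
```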

FAQ

What variants exist of Seed 1.8 and when to use each?

Seed1.8 is the generalist multimodal agent. Related variants include:
  • Seed-Code / Doubao-Seed-Code: specialized for very large code contexts (some SKUs claim 256K contexts) and coding workflows.
  • Seedance / Seedream: media/generation specialized variants (video/image generation).
Pick Seed-Code for IDE/codebase tasks; pick Seed1.8 for broad multimodal agent tasks. Confirm SKU context windows and capabilities in the product docs.

How does Seed1.8 differ from prior Seed versions?

Seed1.8 emphasizes agentic integration (tool use, GUI agenting, multi-step workflows), improved long-context handling and better long-video/motion perception vs earlier Seed 1.x models. It is positioned as the multimodal/agent upgrade in the Seed line.

What input/output modalities does Seed1.8 support?

Native multimodal support: text + images + video. Outputs include natural language answers, structured outputs (JSON/action plans), code, and references to visual segments/timestamps for agentic workflows. The model is explicitly designed for multimodal perception → reasoning → action.
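As a sketch, multimodal input maps onto a content list of typed parts in the Chat Completions format, the same shape the sample code on this page uses (the helper below is ours):

```python
def multimodal_content(text, image_urls=()):
    """Build a chat message content list mixing image parts and a text part."""
    parts = [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    parts.append({"type": "text", "text": text})
    return parts
```

The result is used as the `content` of a `"role": "user"` message.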

What are the “thinking” or inference modes of Seed1.8?

There are tunable “thinking” modes — designed to trade off latency/compute vs. depth of reasoning (useful when you must balance interactivity vs. solution quality). Use the modes to tune for interactive UIs or deeper batch reasoning.
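One way to operationalize this is a small policy that picks a mode per workload. The mode names come from the spec table above; how the chosen mode is passed to the API (the sample code on this page uses an extra_body reasoning setting) should be confirmed in the API doc.

```python
THINKING_MODES = ("no_think", "think-low", "think-medium", "think-high")

def pick_mode(interactive: bool, hard_problem: bool) -> str:
    """Heuristic mode choice: spend test-time compute only when the task
    warrants it, and cap latency for interactive traffic."""
    if interactive:
        return "think-low" if hard_problem else "no_think"
    return "think-high" if hard_problem else "think-medium"
```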

Features for Doubao-Seed-1.8

Explore the key features of Doubao-Seed-1.8, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for Doubao-Seed-1.8

Explore competitive pricing for Doubao-Seed-1.8, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how Doubao-Seed-1.8 can enhance your projects while keeping costs manageable.
  • Comet Price (USD / M Tokens): Input $0.2/M, Output $1.6/M
  • Official Price (USD / M Tokens): Input $0.25/M, Output $2/M
  • Discount: -20%

Sample code and API for Doubao-Seed-1.8

The sample below calls Doubao-Seed-1.8 through CometAPI's hosted, OpenAI-compatible endpoint with a mixed image + text prompt and a configurable reasoning setting.
Python
from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="doubao-seed-1-8-251228",
    max_completion_tokens=65535,
    extra_body={"reasoning_effort": "medium"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://ark-project.tos-cn-beijing.ivolces.com/images/view.jpeg"
                    },
                },
                {"type": "text", "text": "What is the main idea of the picture?"},
            ],
        }
    ],
)

print(completion.choices[0].message.content)
