How to Debug Failed AI API Generations

AI API failures are different from regular API failures. A 200 response doesn't mean your generation succeeded. A null content field isn't always an error. And the same prompt that worked yesterday might fail today because a provider updated their content policy.

This guide covers how to read AI API errors, what each failure mode actually means, and how to build error handling that tells you what broke instead of just that something broke.

Note: Model names like gpt-5.4 and gpt-5.4-mini used in this article are CometAPI's platform identifiers. They work through https://api.cometapi.com/v1 only — not through OpenAI or Anthropic's APIs directly. See the full model list.

Why AI API debugging is harder than regular API debugging

With a typical REST API, a 200 means success and a 4xx means you did something wrong. AI APIs add a third category: soft failures — responses that return 200 but contain no usable content.

Three things can go wrong:

Hard failure — HTTP error (4xx, 5xx). The request didn't complete.
Soft failure — HTTP 200, but finish_reason is content_filter or length, or content is null.
Silent failure — HTTP 200, content looks fine, but the output is wrong in a way you only catch at the application layer.

Most error handling only covers hard failures. Soft and silent failures are where most production bugs live.
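Only the first two categories can be detected mechanically. Here is one way to sort a chat completion outcome into those buckets from the HTTP status, finish_reason, and content. It is a minimal sketch: classify_failure is a hypothetical helper, not part of any SDK.

```python
from typing import Optional


def classify_failure(
    status_code: int,
    finish_reason: Optional[str] = None,
    content: Optional[str] = None,
) -> str:
    """Hypothetical helper: map a chat completion outcome to a failure category.

    Returns "hard", "soft", or "unverified". "unverified" means nothing is wrong
    at the transport level, but a silent failure is still possible.
    """
    # Hard failure: the request itself did not complete.
    if status_code >= 400:
        return "hard"
    # Soft failure: HTTP 200, but the generation was blocked or truncated.
    if finish_reason in ("content_filter", "length"):
        return "soft"
    # Empty content is only expected when the model called a tool.
    if not content and finish_reason != "tool_calls":
        return "soft"
    # Transport looks fine; only application-layer checks can go further.
    return "unverified"
```

Anything classified as "unverified" still needs the application-layer checks described later before you trust it.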

Understand the error response format

The chat completions endpoint returns a consistent error structure:

```json
{
  "error": {
    "message": "human-readable description (often includes request id)",
    "type": "comet_api_error",
    "param": "the_problematic_parameter_or_null",
    "code": "error_code_or_null"
  }
}
```

Image and video endpoints return different error formats — always parse the raw response body rather than assuming a fixed structure across endpoints.

The message field usually tells you exactly what's wrong. The param field tells you which parameter caused it. Always log both.
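The body shape varies by endpoint, so it helps to extract these fields defensively. The sketch below is a hypothetical helper (extract_error_fields is not a CometAPI or SDK function). It assumes you have the raw response body as a string. When the body isn't JSON or doesn't follow the {"error": {...}} shape, it falls back to a truncated copy of the raw text.

```python
import json
from typing import Any, Dict


def extract_error_fields(raw_body: str) -> Dict[str, Any]:
    """Hypothetical helper: pull message/type/param/code out of an error body."""
    fields: Dict[str, Any] = {"message": None, "type": None, "param": None, "code": None}
    try:
        body = json.loads(raw_body)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all (e.g. an HTML gateway page): keep a truncated raw copy.
        fields["message"] = (raw_body or "")[:500]
        return fields

    if not isinstance(body, dict):
        fields["message"] = raw_body[:500]
        return fields

    # Chat completions nest details under "error"; other endpoints may be flat.
    err = body.get("error", body)
    if isinstance(err, str):
        # Some APIs return {"error": "plain string"}.
        fields["message"] = err
    elif isinstance(err, dict):
        for key in fields:
            fields[key] = err.get(key)
    return fields
```

With requests you would pass response.text. With the OpenAI Python SDK, an APIStatusError exposes the underlying HTTP response as e.response, so e.response.text works the same way.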

Know what each HTTP status code means

Status	Meaning	Common cause	Fix
400	Bad request	Missing model, wrong parameter for this model	Check error.param in the response
401	Unauthorized	Wrong or missing API key	Verify Authorization: Bearer <key> format
429	Rate limited	Too many requests	Exponential backoff (see "Add exponential backoff for rate limits" below)
500	Server error	Provider-side issue, or malformed request body	Retry with backoff; check request format
504	Gateway timeout	Provider took too long	Retry; consider a faster model

Source: CometAPI chat completions docs

The 400 vs 500 distinction matters for retry logic. A 400 means your request is wrong, so retrying the same request won't help. A 500 or 504 means the server had a problem, so retrying makes sense. 429 is the one 4xx worth retrying: it signals a temporary limit, not a bad request.
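The rule is small when written as code. This sketch uses a hypothetical name, should_retry. It treats every 5xx as retryable, which matches the retry loop later in this article.

```python
def should_retry(status_code: int) -> bool:
    """Sketch: decide whether an HTTP status is worth retrying.

    Most 4xx codes mean the request itself is wrong, so resending it unchanged
    fails again. 429 is the exception: it only means "slow down".
    """
    if status_code == 429:
        return True
    return status_code >= 500
```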

Check `finish_reason` — the most overlooked field

A 200 response with finish_reason: "content_filter" means your generation was blocked. The content field will be null or empty. If you don't check this, your app silently returns nothing.

finish_reason	Meaning	What to do
stop	Normal completion	Nothing, this is success
length	Hit token limit	Increase max_tokens or shorten the prompt
content_filter	Blocked by safety policy	Rephrase the prompt; avoid specific names/topics
tool_calls	Model called a tool instead of returning text	Handle the tool call; content will be null

Source: CometAPI chat completions docs

```python
import os
import logging

from openai import OpenAI, APIStatusError, APIConnectionError, APITimeoutError
from dotenv import load_dotenv

load_dotenv()

api_key = os.environ.get("COMETAPI_KEY")
if not api_key:
    raise ValueError("COMETAPI_KEY is not set")

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key=api_key,
)


def safe_complete(messages: list, model: str = "gpt-5.4-mini", **kwargs) -> dict:
    """
    Complete a chat request with full error and finish_reason handling.
    Returns {"content": str, "finish_reason": str, "tool_calls": list | None}
    Raises on API errors.
    """
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **kwargs
        )
    except APIStatusError as e:
        # HTTP error (4xx/5xx): log the structured error fields, then re-raise
        error_body = {}
        try:
            error_body = e.response.json().get("error", {})
        except Exception:
            pass  # body wasn't JSON; the status code is still logged below
        if not isinstance(error_body, dict):
            # Some gateways return {"error": "plain string"}
            error_body = {"message": str(error_body)}
        logging.error(
            f"API error status={e.status_code} "
            f"message={error_body.get('message')} "
            f"param={error_body.get('param')}"
        )
        raise
    except (APIConnectionError, APITimeoutError) as e:
        logging.error(f"Network/timeout error: {e}")
        raise

    choice = response.choices[0]
    finish_reason = choice.finish_reason

    if finish_reason == "content_filter":
        raise ValueError(
            f"Generation blocked by content filter. "
            f"Model: {model}. Rephrase the prompt."
        )

    if finish_reason == "length":
        used = response.usage.completion_tokens if response.usage else "unknown"
        logging.warning(f"Output truncated at token limit. Used {used} tokens.")

    # Return structured result so callers can handle tool_calls explicitly
    return {
        "content": choice.message.content or "",
        "finish_reason": finish_reason,
        "tool_calls": choice.message.tool_calls,
    }


# Usage
result = safe_complete(
    messages=[{"role": "user", "content": "Summarize this article: [text]"}],
    model="gpt-5.4-mini"
)

if result["finish_reason"] == "tool_calls":
    # Handle the tool call; content will be empty
    print("Model wants to call a tool:", result["tool_calls"])
else:
    print(result["content"])
```

Detect silent failures at the application layer

Silent failures are the hardest to catch. The API returns 200, finish_reason is stop, but the output is semantically wrong. You can only catch these at the application layer.

Common patterns:

```python
import json
import logging


def validate_completion(result: dict, task: str) -> str:
    """
    Application-layer validation for silent failures.
    Raises ValueError if the output doesn't meet basic expectations.
    """
    content = result["content"].strip()

    # Empty output that isn't a tool call
    if not content and result["finish_reason"] != "tool_calls":
        raise ValueError(
            f"Empty output for task '{task}' with finish_reason='{result['finish_reason']}'"
        )

    # Task-specific checks
    if task == "classify":
        valid_labels = {"positive", "negative", "neutral"}
        if content.lower() not in valid_labels:
            logging.warning(
                f"Unexpected classification output: '{content}'. "
                f"Expected one of {valid_labels}. "
                f"Model may have returned explanation instead of label."
            )

    if task == "json_extract":
        try:
            json.loads(content)
        except json.JSONDecodeError:
            raise ValueError(
                f"Expected JSON output but got: '{content[:100]}...'. "
                f"Try adding 'respond with valid JSON only' to the prompt, "
                f"or use response_format={{\"type\": \"json_object\"}}."
            )

    if task == "summarize" and len(content.split()) < 10:
        logging.warning(
            f"Suspiciously short summary ({len(content.split())} words). "
            f"Check if the input was too short or the model misunderstood the task."
        )

    return content


# Full flow with silent failure detection
result = safe_complete(
    messages=[{"role": "user", "content": "Classify as positive/negative/neutral: 'Great product!'"}],
    model="claude-haiku-4-5"
)
label = validate_completion(result, task="classify")
```

Silent failures usually come from one of three sources: the prompt is ambiguous, the model ignored your format instructions, or the input was too short/long for the task. Logging the full output when validation fails is the fastest way to diagnose which one.
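One way to make that logging automatic is to wrap whichever validator you use. The sketch below assumes the validator behaves like validate_completion above: it returns the content or raises ValueError. validate_and_log and its prompt argument are hypothetical additions for illustration.

```python
import logging
from typing import Any, Callable, Dict


def validate_and_log(
    result: Dict[str, Any],
    task: str,
    validator: Callable[[Dict[str, Any], str], str],
    prompt: str = "",
) -> str:
    """Run a validator; on failure, log everything needed to diagnose it, then re-raise."""
    try:
        return validator(result, task)
    except ValueError:
        # %r formats values with repr(), so stray whitespace and quotes stay visible.
        logging.error(
            "Validation failed for task=%r finish_reason=%r prompt=%r full_output=%r",
            task,
            result.get("finish_reason"),
            prompt,
            result.get("content"),
        )
        raise
```

Logging the prompt next to the full output makes it easier to tell an ambiguous prompt apart from ignored format instructions when you read the log later.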

Add exponential backoff for rate limits

Rate limit errors (429) are temporary. The right response is to wait and retry with increasing delays — a standard practice for any API with rate limits:

```python
import time
import random
import logging

from openai import (
    APIConnectionError,
    APIStatusError,
    APITimeoutError,
    RateLimitError,
)


def complete_with_retry(
    messages: list,
    model: str = "gpt-5.4-mini",
    max_retries: int = 3,
    **kwargs
) -> dict:
    """Retry on rate limits and server errors with exponential backoff."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return safe_complete(messages, model=model, **kwargs)
        except RateLimitError as e:
            # 429 must be caught before APIStatusError (its parent class);
            # otherwise the 4xx check below re-raises it without retrying
            last_error = e
        except APIStatusError as e:
            if e.status_code < 500:
                raise  # other 4xx: don't retry, the request itself is wrong
            last_error = e
        except (APIConnectionError, APITimeoutError) as e:
            last_error = e

        if attempt < max_retries - 1:
            wait = (2 ** attempt) + random.random()  # jitter prevents thundering herd
            logging.warning(f"Attempt {attempt + 1} failed. Waiting {wait:.1f}s before retry.")
            time.sleep(wait)

    raise RuntimeError(f"All {max_retries} attempts failed") from last_error
```

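Don't retry on 400 or 401: those are client errors that won't resolve on their own. Two details are easy to miss. First, in the OpenAI Python SDK RateLimitError is a subclass of APIStatusError, so it has to be caught first. If the generic APIStatusError clause comes first, its status_code < 500 check re-raises every 429 instead of retrying it. Second, the SDK client already retries some failures on its own (its max_retries option defaults to 2), so those retries stack with this loop. Pass max_retries=0 to OpenAI(...) if you want this function to be the only retry layer.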
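The fixed 2 ** attempt schedule above can be refined in two ways. You can cap the delay so late retries don't wait for minutes, and you can honor a Retry-After header when the server sends one. It is an assumption here that CometAPI includes Retry-After on 429 responses. If the header is absent or isn't a number of seconds, the sketch falls back to plain backoff. backoff_delay is a hypothetical helper.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Prefers a server-supplied Retry-After value (in seconds) when present;
    otherwise uses base * 2**attempt plus up to 1s of random jitter, capped.
    """
    if retry_after is not None:
        try:
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    return min(base * (2 ** attempt) + random.random(), cap)
```

With the OpenAI Python SDK, you would read the header from the caught exception as e.response.headers.get("retry-after") and pass it in as retry_after. time.sleep(backoff_delay(attempt, retry_after)) then replaces the fixed wait.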

Debug image generation failures

Image generation has its own failure modes on top of the standard HTTP errors:

```python
import base64
import logging
import os

import requests


def generate_image_safe(prompt: str, model: str = "dall-e-3") -> dict:
    """
    Generate an image with full error handling.
    Returns {"url": str | None, "bytes": bytes | None, "blocked": bool}
    """
    api_key = os.environ.get("COMETAPI_KEY")
    if not api_key:
        raise ValueError("COMETAPI_KEY is not set")

    # Models that take output_format instead of response_format
    BASE64_MODELS = {"gpt-image-2", "qwen-image"}

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {"model": model, "prompt": prompt, "size": "1024x1024"}
    if model in BASE64_MODELS:
        payload["output_format"] = "png"
    else:
        payload["response_format"] = "url"

    try:
        response = requests.post(
            "https://api.cometapi.com/v1/images/generations",
            json=payload,
            headers=headers,
            timeout=60
        )
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        logging.error(f"Image generation HTTP error: {e.response.status_code} {e.response.text}")
        raise
    except requests.exceptions.Timeout:
        logging.error("Image generation timed out after 60s")
        raise

    data = response.json().get("data") or []
    if not data:
        logging.warning("Image generation returned empty data; prompt may have been filtered.")
        return {"url": None, "bytes": None, "blocked": True}

    item = data[0]
    if item.get("revised_prompt"):
        logging.info(f"Provider revised prompt to: {item['revised_prompt']}")

    # URL-style response
    if item.get("url"):
        return {"url": item["url"], "bytes": None, "blocked": False}

    # Base64-style response
    b64 = item.get("b64_json")
    if not b64:
        raise ValueError(f"Image response has neither url nor b64_json: {list(item)}")
    return {
        "url": None,
        "bytes": base64.b64decode(b64),
        "blocked": False
    }
```

Image-specific issues to watch for:

Symptom	Cause	Fix
Empty data array	Prompt filtered	Check revised_prompt; rephrase
response_format error on GPT Image 2	Parameter not supported	Use output_format instead
n > 1 error on Qwen Image	Model limitation	Loop requests instead
URL returns 403 later	URL expired	Download immediately after generation

Source: CometAPI image generation docs
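Generated URLs can expire, so the safest pattern is to persist the image as soon as generate_image_safe() returns. Here is a minimal sketch using only the standard library. save_image_now is a hypothetical helper, and the 30-second download timeout is an arbitrary choice.

```python
import urllib.request
from pathlib import Path
from typing import Any, Dict, Union


def save_image_now(result: Dict[str, Any], path: Union[str, Path], timeout: float = 30) -> Path:
    """Persist a generate_image_safe()-style result to disk right away."""
    path = Path(path)
    if result.get("blocked"):
        raise ValueError("Image was blocked; nothing to save.")
    if result.get("bytes") is not None:
        # Base64 models: the bytes were already decoded by the caller.
        path.write_bytes(result["bytes"])
    elif result.get("url"):
        # URL models: download immediately, before the link can expire (403).
        with urllib.request.urlopen(result["url"], timeout=timeout) as resp:
            path.write_bytes(resp.read())
    else:
        raise ValueError("Result has neither image bytes nor a URL.")
    return path
```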

Debug video generation failures

Video generation fails differently because it's asynchronous: you submit a task, then poll until it succeeds, fails, or runs out of time. Initialize the status variables before the polling loop so the timeout error message is always well-formed:

```python
import logging
import os
import time

import requests


def submit_and_poll_video(
    prompt: str,
    model: str = "veo3-fast",
    max_wait: int = 600
) -> str:
    """Submit video task and poll to completion. Returns video URL."""
    api_key = os.environ.get("COMETAPI_KEY")
    if not api_key:
        raise ValueError("COMETAPI_KEY is not set")
    headers = {"Authorization": f"Bearer {api_key}"}

    try:
        response = requests.post(
            "https://api.cometapi.com/v1/videos",
            headers=headers,
            files={  # sent as multipart/form-data fields
                "prompt": (None, prompt),
                "model": (None, model),
                "size": (None, "16x9")
            },
            timeout=30
        )
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        logging.error(f"Video submit failed: {e.response.status_code} {e.response.text}")
        raise

    task_id = response.json()["id"]
    logging.info(f"Video task submitted: {task_id}")

    poll_url = f"https://api.cometapi.com/v1/videos/{task_id}"
    elapsed = 0
    interval = 10
    status = "unknown"   # initialize before loop
    progress = 0         # initialize before loop

    while elapsed < max_wait:
        try:
            poll_response = requests.get(poll_url, headers=headers, timeout=30)
            poll_response.raise_for_status()
        except requests.exceptions.HTTPError as e:
            logging.error(f"Poll request failed: {e.response.status_code}")
            raise

        result = poll_response.json()
        status = result.get("status", "unknown")
        progress = result.get("progress", 0)
        logging.info(f"Task {task_id}: status={status} progress={progress}%")

        # Kling reports "succeed"; Veo and Runway report "succeeded"
        if status in ("succeeded", "succeed"):
            return result["output"][0]
        elif status in ("failed", "cancelled"):
            error_detail = result.get("error") or "no error detail returned"
            raise RuntimeError(f"Video task {task_id} failed: {error_detail}")

        time.sleep(interval)
        elapsed += interval

    raise TimeoutError(
        f"Video task {task_id} did not complete within {max_wait}s. "
        f"Last status: {status}, progress: {progress}%"
    )
```

Video-specific issues:

Symptom	Cause	Fix
Task stuck in queued 10+ min	Server load	Retry with a different model
failed with no error detail	Prompt filtered or model error	Rephrase prompt
Video URL returns 403	URL expired	Download immediately
task_not_exist on Runway first poll	Task still initializing (CometAPI-documented behavior)	Wait 5s and retry
Kling returns "succeed" not "succeeded"	Kling's API uses non-standard status string	Handle both in polling logic

Source: CometAPI video generation docs, Kling Video docs
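A small normalization step keeps provider quirks like Kling's "succeed" out of your polling logic. This sketch assumes statuses arrive as plain strings. The alias table only contains the case documented above; extend it as you run into others. normalize_status and is_terminal are hypothetical names.

```python
# Hypothetical alias table: provider-specific status -> canonical status.
_STATUS_ALIASES = {"succeed": "succeeded"}  # Kling's non-standard string
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def normalize_status(raw_status) -> str:
    """Lower-case a provider's status and map known aliases to a canonical value."""
    status = str(raw_status or "unknown").strip().lower()
    return _STATUS_ALIASES.get(status, status)


def is_terminal(raw_status) -> bool:
    """True once polling can stop (success, failure, or cancellation)."""
    return normalize_status(raw_status) in TERMINAL_STATUSES
```

Inside submit_and_poll_video, status = normalize_status(result.get("status")) would let the success check compare against "succeeded" alone.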

Node.js version

```javascript
import OpenAI from 'openai';

const apiKey = process.env.COMETAPI_KEY;
if (!apiKey) throw new Error('COMETAPI_KEY is not set');

const client = new OpenAI({
  baseURL: 'https://api.cometapi.com/v1',
  apiKey,
});

// Marks content-filter blocks so the retry loop below does not resend them
class ContentFilterError extends Error {}

async function safeComplete(messages, model = 'gpt-5.4-mini', options = {}) {
  let response;
  try {
    response = await client.chat.completions.create({ model, messages, ...options });
  } catch (err) {
    if (err.status && err.status < 500) {
      console.error(`Client error ${err.status}: ${err.message}`);
    } else {
      console.error(`Server/network error: ${err.message}`);
    }
    throw err;
  }

  const choice = response.choices[0];
  const finishReason = choice.finish_reason;

  if (finishReason === 'content_filter') {
    throw new ContentFilterError(`Generation blocked by content filter. Model: ${model}`);
  }
  if (finishReason === 'length') {
    console.warn(`Output truncated. Used ${response.usage?.completion_tokens ?? 'unknown'} tokens.`);
  }

  return {
    content: choice.message.content ?? '',
    finishReason,
    toolCalls: choice.message.tool_calls ?? null,
  };
}

async function completeWithRetry(messages, model = 'gpt-5.4-mini', maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await safeComplete(messages, model);
    } catch (err) {
      // Don't retry content-filter blocks or 4xx client errors (429 excepted)
      if (err instanceof ContentFilterError) throw err;
      if (err.status && err.status < 500 && err.status !== 429) throw err;
      lastError = err;
      if (attempt < maxRetries - 1) {
        const wait = (2 ** attempt + Math.random()) * 1000;
        console.warn(`Attempt ${attempt + 1} failed. Retrying in ${(wait / 1000).toFixed(1)}s`);
        await new Promise((r) => setTimeout(r, wait));
      }
    }
  }
  throw new Error(`All ${maxRetries} attempts failed: ${lastError?.message}`);
}

// Usage (top-level await requires an ES module)
const result = await safeComplete(
  [{ role: 'user', content: 'Classify as positive/negative/neutral: "Great product!"' }],
  'claude-haiku-4-5'
);

if (result.finishReason === 'tool_calls') {
  console.log('Tool call requested:', result.toolCalls);
} else {
  console.log(result.content);
}
```

A debugging checklist

When a generation fails and you're not sure where to start:

For text generation:

Is the API key set and in Authorization: Bearer <key> format?
Is finish_reason something other than stop?
Is content null? Check if finish_reason is tool_calls
Did the output get truncated? Check finish_reason: "length" and usage.completion_tokens
Is the error a 4xx (fix the request) or 5xx (retry)? The exception is 429: back off and retry.
Does the output pass your application-layer validation? (silent failure)

For image generation:

Is data array empty? (content filter)
Did you use response_format on GPT Image 2? (not supported — use output_format)
Did you set n > 1 on Qwen Image? (not supported)
Did you download the image before the URL expired?

For video generation:

Is the task stuck in queued? (try a different model)
Did you check the error field in the failed task response?
Did you download the video before the URL expired?
Are you handling both "succeed" (Kling) and "succeeded" (Veo, Runway)?

FAQ

Q: My request returns 200 but there's no content. What happened?

Check finish_reason. If it's content_filter, the generation was blocked — the request succeeded but the output was suppressed. If it's tool_calls, the model called a tool instead of returning text, and content is null by design. If finish_reason is stop but content is still empty, that's a silent failure — log the full response and check your prompt.

Q: How do I know if my prompt is being filtered?

For text: check finish_reason === "content_filter". For images: check if the data array is empty. For video: check if the task reaches failed status shortly after submission with no error detail. In all cases, try rephrasing the prompt to be more neutral.
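These rules of thumb can be folded into one heuristic check. The sketch below assumes you pass the raw JSON body for each modality. The 60-second "fast failure" threshold for video is an arbitrary assumption, not a documented value. looks_filtered is a hypothetical helper.

```python
from typing import Any, Dict, Optional

# Assumption: a video task that fails this soon after submission, with no
# error detail, is treated as "probably filtered". Tune for your workload.
FAST_FAIL_SECONDS = 60.0


def looks_filtered(
    modality: str,
    payload: Dict[str, Any],
    seconds_since_submit: Optional[float] = None,
) -> bool:
    """Heuristic filter check per modality, following the FAQ's rules of thumb."""
    if modality == "text":
        choices = payload.get("choices") or [{}]
        return choices[0].get("finish_reason") == "content_filter"
    if modality == "image":
        return not payload.get("data")
    if modality == "video":
        failed_quietly = payload.get("status") == "failed" and not payload.get("error")
        fast = seconds_since_submit is not None and seconds_since_submit <= FAST_FAIL_SECONDS
        return failed_quietly and fast
    raise ValueError(f"Unknown modality: {modality!r}")
```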

Q: When should I retry a failed request?

Retry on 429 and 5xx using exponential backoff. Don't retry on other 4xx codes: a bad request won't fix itself. The exception is 401 if you're rotating API keys.

Q: What's exponential backoff and why does it matter?

Instead of retrying immediately, you wait progressively longer: 1s, 2s, 4s. Adding random jitter (+ random.random()) prevents multiple clients from retrying in sync. This is a standard practice for any API with rate limits — not specific to CometAPI.

Q: The video task is stuck in `queued` for 10 minutes. Is it failed?

Not necessarily — queues can back up under load. Wait up to your max_wait threshold, then raise a TimeoutError and retry with a different model. Log the task ID so you can check status manually if needed.
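If you do want to fall back to another model after a timeout, a thin wrapper around submit_and_poll_video is enough. This is a sketch: generate_with_fallback is hypothetical, and any model names you pass besides veo3-fast are your own choice.

```python
import logging
from typing import Callable, Sequence


def generate_with_fallback(
    prompt: str,
    models: Sequence[str],
    submit_and_poll: Callable[..., str],
) -> str:
    """Try each model in order; move on to the next one only when a task times out."""
    last_error = None
    for model in models:
        try:
            return submit_and_poll(prompt, model=model)
        except TimeoutError as e:
            # submit_and_poll_video's timeout message includes the task ID,
            # so logging it keeps a trail for manual status checks.
            logging.warning("Model %s timed out: %s", model, e)
            last_error = e
    raise TimeoutError(f"All models timed out: {list(models)}") from last_error
```

Pass submit_and_poll_video as the third argument and your ordered list of video models as the second. Only TimeoutError triggers a fallback. A failed task (RuntimeError) still surfaces immediately, so the wrapper doesn't mask a filtered or broken request.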

Q: How do I catch silent failures?

Silent failures require application-layer validation — the API won't tell you the output is semantically wrong. Check that the output matches the expected format (valid JSON, expected label, minimum length). Log the full output when validation fails. The most common causes are ambiguous prompts, ignored format instructions, or inputs that are too short or too long for the task.

