ModelsSupportEnterpriseBlog
500+ AI Model API, All In One API.Just In CometAPI
Models API
Developer
Quick StartDocumentationAPI Dashboard
Resources
AI ModelsBlogEnterpriseChangelogAbout
2025 CometAPI. All right reserved.Privacy PolicyTerms of Service
Home/Models/Kling/Kling Image Recognize
K

Kling Image Recognize

Per Request:$0.013216
Keling image element recognition API, usable for multi-image reference video generation, Multimodal video editing features ● Can recognize subjects, faces, clothing, etc., and can obtain 4 sets of results (if available) per request.
New
Commercial Use
Overview
Features
Pricing
API

Technical Specifications of kling-image-recognize

SpecificationDetails
Model IDkling-image-recognize
CategoryImage recognition / multimodal analysis
Primary CapabilityRecognizes image elements for downstream creative workflows, including multi-image reference video generation and multimodal video editing
Input TypeImage input
Output TypeStructured recognition results
Recognition ScopeSubjects, faces, clothing, and other visual elements
Result VolumeCan return up to 4 sets of results per request, if available
Use CasesVisual asset analysis, reference preparation for video generation, content understanding for editing pipelines, subject and apparel recognition

What is kling-image-recognize?

kling-image-recognize is a Keling image element recognition API designed to analyze visual content and identify important elements within an image. It is especially useful in workflows that require multi-image reference video generation or multimodal video editing, where understanding the contents of source images is an important preprocessing step.

The model can recognize a range of visual attributes such as subjects, faces, clothing, and related image components. Depending on the input, it can provide up to 4 sets of recognition results in a single request, helping developers capture multiple possible detections or interpretations when available.

Main features of kling-image-recognize

  • Image element recognition: Detects and identifies important visual elements contained in an input image.
  • Subject analysis: Recognizes primary subjects that can be used in downstream media generation or editing workflows.
  • Face recognition support: Extracts face-related recognition results when faces are present in the image.
  • Clothing identification: Detects apparel and clothing-related elements to support more detailed visual understanding.
  • Multi-image reference workflow support: Useful for preparing and analyzing image references used in video generation pipelines.
  • Multimodal video editing compatibility: Helps power editing scenarios where image content needs to be understood before transformation or composition.
  • Multiple result sets per request: Can obtain up to 4 sets of results per request, if available, enabling richer recognition output.
  • Integration-friendly API usage: Suitable for developers building automated media analysis and creative application pipelines.

How to access and integrate kling-image-recognize

Step 1: Sign Up for API Key

To get started, sign up on the CometAPI platform and generate your API key from the dashboard. After obtaining your key, store it securely and use it to authenticate every request to the kling-image-recognize API.

Step 2: Send Requests to kling-image-recognize API

Once you have your API key, send requests to the CometAPI endpoint using kling-image-recognize as the model ID. Include your authentication headers and provide the required image input payload based on your application workflow.

curl --request POST \
  --url https://api.cometapi.com/v1/responses \
  --header "Authorization: Bearer YOUR_COMETAPI_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "kling-image-recognize",
    "input": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_text",
            "text": "Recognize the main visual elements in this image."
          },
          {
            "type": "input_image",
            "image_url": "YOUR_IMAGE_URL"
          }
        ]
      }
    ]
  }'

Step 3: Retrieve and Verify Results

After submission, the API returns recognition results generated by kling-image-recognize. Parse the response in your application, verify the detected subjects or attributes, and store the returned data for use in video generation, editing, or other downstream automation tasks.

Features for Kling Image Recognize

Explore the key features of Kling Image Recognize, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for Kling Image Recognize

Explore competitive pricing for Kling Image Recognize, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how Kling Image Recognize can enhance your projects while keeping costs manageable.
Comet Price (USD / M Tokens)Official Price (USD / M Tokens)Discount
Per Request:$0.013216
Per Request:$0.01652
-20%

Sample code and API for Kling Image Recognize

Access comprehensive sample code and API resources for Kling Image Recognize to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of Kling Image Recognize in your projects.

More Models

O

Sora 2 Pro

Per Second:$0.24
Sora 2 Pro is our most advanced and powerful media generation model, capable of generating videos with synchronized Audio. It can create detailed, dynamic video clips from natural language or images.
O

Sora 2

Per Second:$0.08
Super powerful video generation model, with sound effects, supports chat format.
M

mj_fast_video

Per Request:$0.6
Midjourney video generation
X

Grok Imagine Video

Per Second:$0.04
Generate videos from text prompts, animate still images, or edit existing videos with natural language. The API supports configurable duration, aspect ratio, and resolution for generated videos — with the SDK handling the asynchronous polling automatically.
G

Veo 3.1 Pro

Per Second:$0.25
Veo 3.1-Pro refers to the high-capability access/configuration of Google’s Veo 3.1 family — a generation of short-form, audio-enabled video models that add richer native audio, improved narrative/editing controls and scene-extension tools.
G

Veo 3.1

Per Second:$0.05
Veo 3.1 is Google’s incremental-but-significant update to its Veo text-and-image→video family, adding richer native audio, longer and more controllable video outputs, and finer editing and scene-level controls.