gen4_image is Runway’s flagship multimodal image-generation model in the Gen-4 family. It supports prompted generation plus visual references (you can “@mention” reference images in the prompt) to produce highly controllable, stylistically consistent outputs for image and image→video pipelines.
Introduction — what Gen-4 Image is
gen4_image is the image model in Runway’s fourth-generation (Gen-4) family of visual generative models. It is engineered to take text prompts plus visual references and produce high-fidelity still images or media-ready frames that preserve identity and style across angles and lighting. The model is presented as part of a broader Gen-4 suite (including video variants such as gen4_turbo) and is explicitly designed for creative production, e.g., consistent character rendering, product photography at scale, virtual try-on, and game asset generation.
Key features
- Reference-based generation (1–3 refs). Use up to three reference images so the model can preserve identity, style, or location while transforming pose, lighting, background, etc.
- High visual fidelity (production-ready outputs). Outputs target high resolution (1080p options available) with strong detail and stylistic control.
- Identity & scene consistency. Designed to keep the same character(s) or environment consistent across multiple generations — useful for multi-shot visuals or character-centric assets.
- Multimodal (text + images) prompts. Combine natural language instructions with reference images to steer composition, mood, clothing, camera angle, etc.
- Image → image plus text → image workflows. Works as image-to-image (edit/transform) and as text-to-image using references to maintain continuity.
- Performance tier (Turbo) available. A “Gen-4 Image Turbo” variant optimizes for lower cost and higher speed (e.g., ~2.5× faster) while keeping the reference-driven features.
- Controls & reproducibility. Typical API options include aspect ratio presets, resolution (720p/1080p), seed for reproducibility, and reference tags to point to specific inputs.
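The controls above can be checked on the client before a request is sent. The sketch below is a set of hypothetical shell helpers, not part of any CometAPI or Runway tool. The limits it enforces (seed as an unsigned 32-bit integer, ratio written as WIDTH:HEIGHT, at most three references) are assumptions drawn from this page and its example request, so confirm them against the API doc. It checks only the shape of the ratio string, not whether the ratio is one of the supported presets.

```shell
# Hypothetical client-side checks for the gen4_image controls listed above.
# Assumed limits (verify against the API doc): seed is an unsigned 32-bit
# integer, ratio is "WIDTH:HEIGHT", and at most 3 reference images are sent.

# validate_seed SEED: succeeds if SEED is an integer in 0..4294967295.
validate_seed() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;      # empty, or contains a non-digit (e.g. "-1")
  esac
  [ "${#1}" -le 10 ] || return 1    # more than 10 digits cannot fit in 32 bits
  [ "$1" -le 4294967295 ]
}

# validate_ratio RATIO: succeeds for strings like "1920:1080" (two positive integers).
validate_ratio() {
  w=${1%%:*}                        # text before the first colon
  h=${1#*:}                         # text after the first colon
  [ "$w" != "$1" ] || return 1      # no colon present
  case "$w" in '' | *[!0-9]*) return 1 ;; esac
  case "$h" in '' | *[!0-9]*) return 1 ;; esac
  [ "$w" -gt 0 ] && [ "$h" -gt 0 ]
}

# validate_refs URI...: succeeds if at most 3 reference URIs were given.
validate_refs() {
  [ "$#" -le 3 ]
}
```

For example, `validate_ratio 1920:1080 && validate_seed 42 && echo ok` prints `ok`, while `validate_refs a b c d` fails.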
Technical details
Input: Text/Image
Output: Image
Workflow:
- User supplies: text prompt + 0–3 reference images (and optional masks, keyframes, camera motion instructions).
- Preprocess: references are normalized and encoded; text is tokenized. Identity/style embeddings are extracted and cached for reuse.
- Conditioning: text and reference embeddings are fused in the multimodal backbone; optional control signals (pose, depth, mask) are attached.
- Sampling / denoising: the decoder runs iterative denoising (diffusion) steps to produce an image (or a sequence of frames for video).
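From the caller's side, the workflow above reduces to one JSON body: a text prompt, optional tagged references, and the controls. The build_body function below is a hypothetical helper, not part of any CometAPI or Runway SDK. It assumes the field names used in the example request on this page and performs no JSON escaping, so it is only suitable for simple, trusted strings.

```shell
# build_body PROMPT RATIO SEED [URI TAG]...
# Hypothetical helper: prints a gen4_image request body with 0-3 reference images.
# Assumption: no argument contains quotes, backslashes or control characters,
# because values are inserted without JSON escaping (use jq for untrusted input).
build_body() {
  [ "$#" -ge 3 ] || { echo "build_body: need PROMPT RATIO SEED" >&2; return 1; }
  prompt=$1 ratio=$2 seed=$3
  shift 3
  refs='' count=0
  while [ "$#" -ge 2 ]; do
    count=$((count + 1))
    if [ "$count" -gt 3 ]; then
      echo "build_body: at most 3 reference images are supported" >&2
      return 1
    fi
    ref=$(printf '{"uri":"%s","tag":"%s"}' "$1" "$2")
    refs=${refs:+$refs,}$ref          # comma-separate entries after the first
    shift 2
  done
  if [ "$#" -ne 0 ]; then
    echo "build_body: every reference URI needs a tag" >&2
    return 1
  fi
  if [ -n "$refs" ]; then
    printf '{"model":"gen4_image","promptText":"%s","ratio":"%s","seed":%s,"referenceImages":[%s]}\n' \
      "$prompt" "$ratio" "$seed" "$refs"
  else
    # Text-only request: omit referenceImages entirely.
    printf '{"model":"gen4_image","promptText":"%s","ratio":"%s","seed":%s}\n' \
      "$prompt" "$ratio" "$seed"
  fi
}
```

References stay optional: with no URI/TAG pairs the referenceImages field is omitted, matching the 0-reference case. The output can be passed to curl with `--data-raw "$(build_body ...)"`, and a tag such as `cat` is what an `@cat` mention in the prompt would point to.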
gen4_image — concrete limits
- Temporal / motion edge cases. Reviewers and creators report occasional motion artifacts, odd temporal dynamics (glitches early or late in generated clips), and failures on very complex multi-actor choreography; test with your target scenes.
- Compute, cost & queuing. High-quality image→video generation is GPU-heavy; users report that queue times and per-render costs can be significant for mass production. Plan budget and throughput accordingly.
- Creative tradeoffs vs. pure artistry models. Gen-4’s strength is consistency; if you need highly stylized, painterly, or “surprising” aesthetic outputs, Midjourney or tuned SDXL checkpoints may better suit that art direction.
Canonical use cases
- Pre-production & storyboarding: rapidly create style-consistent character/scene variants from reference photos.
- Marketing & content generation: fast production of hero images, animated social clips, and campaign assets with consistent brand characters. (Runway lists enterprise examples including live tours and music videos.)
- Game/asset prototyping & virtual try-on: generate multiple camera angles, outfit variants and environment concepts from a small set of references.
Comparison to other models
- gen4_image → best when you need reference/identity consistency (a single character or object kept the same across shots) and when you want image→video and multi-shot pipelines.
- DALL·E 3 → best for tight prompt-to-image fidelity, a conversational ChatGPT-driven editing flow, and built-in safety and provenance features.
- SDXL (Stable Diffusion family) → best when you want open models, local/custom fine-tuning, and cost-flexible deployment.
- Midjourney → best for highly stylized, artistically pleasing renders and strong community-driven presets / “stylize” controls.
- Runway Gen-4 vs. ByteDance Seedream 4.0 / Google “Nano Banana” type models: recent competitor launches (e.g., Seedream 4.0) emphasize ultra-fast rendering and multi-reference handling aimed at commercial creators; Runway’s advantage is a tightly integrated image→video pipeline and production-oriented controls plus a mature API and SDK ecosystem.
How to call the gen4_image API from CometAPI
Price: $0.32
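For budgeting (high-volume runs can get expensive, as noted under the limits), a quick estimate is count × unit price. The sketch assumes the listed $0.32 is charged once per generated image, which this page does not state explicitly; estimate_cost is a hypothetical helper, and you can pass a different unit price as the second argument if your plan differs.

```shell
# estimate_cost COUNT [UNIT_PRICE]
# Prints COUNT x UNIT_PRICE rounded to cents. The default unit price is the listed
# $0.32, assumed here (not confirmed) to be charged once per generated image.
estimate_cost() {
  case "$1" in
    '' | *[!0-9]*)
      echo "estimate_cost: COUNT must be a non-negative integer" >&2
      return 1 ;;
  esac
  awk -v n="$1" -v p="${2:-0.32}" 'BEGIN { printf "%.2f\n", n * p }'
}
```

For example, `estimate_cost 100` prints `32.00`.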
Required Steps
- Log in to cometapi.com. If you don't have an account yet, register first.
- Get an API key: in the personal center, click “Add Token” under API tokens, submit, and copy the generated token key (it looks like sk-xxxxx).
- Note the base URL of the API: https://api.cometapi.com/
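One way to keep the key out of scripts is to read it from an environment variable and fail early if it is missing. In this sketch the variable name COMETAPI_KEY and the function require_api_key are hypothetical choices, and the check for the sk- prefix is an assumption based on the sk-xxxxx example above, not a documented rule.

```shell
# require_api_key: succeeds if COMETAPI_KEY (hypothetical variable name) is set
# and looks like the "sk-..." token key copied from the personal center.
require_api_key() {
  case "${COMETAPI_KEY:-}" in
    '')
      echo "COMETAPI_KEY is not set" >&2
      return 1 ;;
    sk-?*)
      return 0 ;;                   # non-empty and has the expected prefix
    *)
      echo "COMETAPI_KEY does not look like an sk-... token" >&2
      return 1 ;;
  esac
}
```

Typical use: `export COMETAPI_KEY=sk-...` once per session, then call `require_api_key || exit 1` at the top of any script that sends requests.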
Usage
- Select the “gen4_image” endpoint and set the request body. The request method and body schema are listed in the CometAPI API doc; the site also provides an Apifox test console for convenience.
- Replace <YOUR_API_KEY> with your actual CometAPI key from your account.
- Put your text prompt in the promptText field; this is what the model will render.
- Process the API response to retrieve the generated image.
CometAPI provides a fully compatible REST API for seamless migration. Key details from the API doc:
- Endpoint: https://api.cometapi.com/runwayml/v1/text_to_image
- Model parameter: gen4_image
- Authentication: Bearer <YOUR_API_KEY>
- Content-Type: application/json
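The key details above combine into a request target and an Authorization header. These two hypothetical helpers (not part of a published CometAPI client) join the base URL from the required steps with the endpoint path, tolerating a trailing or leading slash, and format the Bearer header.

```shell
# endpoint_url BASE PATH: joins BASE and PATH with exactly one "/" between them.
endpoint_url() {
  base=${1%/}      # strip one trailing slash, if present
  path=${2#/}      # strip one leading slash, if present
  printf '%s/%s\n' "$base" "$path"
}

# auth_header KEY: formats the header described under "Authentication".
auth_header() {
  printf 'Authorization: Bearer %s\n' "$1"
}
```

For example, `curl --header "$(auth_header "$COMETAPI_KEY")" "$(endpoint_url https://api.cometapi.com/ runwayml/v1/text_to_image)" ...` targets the endpoint listed above, where COMETAPI_KEY is an assumed environment variable holding your key.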
```shell
# Replace <YOUR_API_KEY> with your CometAPI key (sk-...).
# "@cat" in promptText points the model at the reference image tagged "cat";
# a fixed seed keeps repeated requests reproducible.
curl --location --request POST 'https://api.cometapi.com/runwayml/v1/text_to_image' \
  --header 'X-Runway-Version: 2024-11-06' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "promptText": "@cat yawning on a sunny windowsill",
    "ratio": "1920:1080",
    "seed": 4294967295,
    "model": "gen4_image",
    "referenceImages": [
      {
        "uri": "https://cdn.britannica.com/70/234870-050-D4D024BB/Orange-colored-cat-yawns-displaying-teeth.jpg",
        "tag": "cat"
      }
    ],
    "contentModeration": {
      "publicFigureThreshold": "auto"
    }
  }'
```
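The curl call above only submits the request. Below is a minimal sketch for handling the reply. It assumes, without confirmation for CometAPI, that the response mirrors the upstream Runway API, where task creation returns a JSON object with an id that is later used to fetch the finished image. send_request and extract_task_id are hypothetical names, COMETAPI_KEY is an assumed environment variable, and the sed-based parsing is only adequate for a small flat object (prefer jq for real use). Polling for the final image is not shown because that endpoint is not documented on this page.

```shell
# send_request BODY: POSTs BODY to the gen4_image endpoint and prints the response
# body. Returns non-zero if curl fails or the HTTP status is 400 or higher.
# Assumes COMETAPI_KEY holds your sk-... key (hypothetical variable name).
send_request() {
  resp=$(curl --silent --show-error --location --request POST \
    'https://api.cometapi.com/runwayml/v1/text_to_image' \
    --header 'X-Runway-Version: 2024-11-06' \
    --header "Authorization: Bearer $COMETAPI_KEY" \
    --header 'Content-Type: application/json' \
    --data-raw "$1" \
    --write-out '\n%{http_code}') || return 1
  status=$(printf '%s\n' "$resp" | tail -n 1)   # last line: HTTP status code
  printf '%s\n' "$resp" | sed '$d'               # everything else: response body
  [ "$status" -lt 400 ]
}

# extract_task_id JSON: prints the value of an "id" string field, or nothing.
# Assumption: JSON is a small flat object such as {"id":"..."}; sed is not a JSON parser.
extract_task_id() {
  printf '%s\n' "$1" | sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Typical use: `body=$(send_request "$json") && task_id=$(extract_task_id "$body")`, then look up the task with whatever retrieval endpoint the CometAPI doc specifies.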
See also Runway/Act_two