Veo 3.1 Blog

- Can it do audio?
  - Yes. When called through the Gemini/Vertex (Veo) endpoints, Veo 3.1 generates synchronized audio with the video by default. Describe dialogue, sound effects, and ambience in the text prompt to steer it. For finished work you can still replace or augment the generated track in post (e.g., recorded or TTS narration, sound-design libraries, or licensed music).

- Professional usage guidelines
  - Define specs up front: aspect ratio, resolution, fps, duration, safe margins, target platforms, brand/style constraints.
  - Prompt like a shot list: subject, action, composition, camera move, lens/focal length, lighting, color mood, era, texture, and pacing. Avoid vague adjectives; include negatives for what to exclude.
  - Use references: upload style frames, color palettes, logos, product shots, or short reference clips to enforce look and continuity.
  - Ensure consistency: lock character, wardrobe, props, and environments; reuse reference frames; keep a seed for reproducibility.
  - Iterate and select: generate multiple variants, review for artifacts (hands, text, physics), then refine with targeted re-prompts or masked edits.
  - Post-production pipeline: stabilize, denoise, de-flicker, retime to 24/25/30 fps, color manage (LUTs/ACES if needed), upscale if required, conform to a mezzanine codec (e.g., ProRes/DNxHR), then deliver H.264/H.265.
  - Audio workflow: script VO first, produce/record VO, then cut picture to VO; add music/FX; mix to the platform's loudness target (around -14 LUFS integrated for most streaming platforms; -23 LUFS for EBU R 128 broadcast), use a 48 kHz sample rate, and export stereo unless the spec says otherwise.
  - Compliance and rights: clear rights for likenesses, brands, and references; follow platform usage policies; disclose AI use if required; avoid deceptive or sensitive content without consent.
  - Documentation: keep prompts, seeds, versions, source assets, and approvals for auditability and future updates.
  - Use cases that work well: ideation and pre-viz, B‑roll, motion backgrounds, social shorts, product loops, concept explorations; reserve manual VFX or traditional production for critical hero shots when needed.
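
The "prompt like a shot list" and "documentation" guidelines above can be combined in a small helper. The following is a minimal sketch, not an official Veo format: the `Shot` fields, the `Exclude:` phrasing, and the record layout are all hypothetical names chosen for illustration, and whether a given endpoint honors a seed is an assumption to check against its documentation.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Shot:
    """One shot-list entry; every field name here is illustrative."""
    subject: str
    action: str
    composition: str
    camera_move: str
    lens: str
    lighting: str
    color_mood: str
    exclude: list[str] = field(default_factory=list)


def build_prompt(shot: Shot) -> str:
    """Join the shot fields into one comma-separated prompt string."""
    parts = [
        f"{shot.subject} {shot.action}",
        shot.composition,
        shot.camera_move,
        shot.lens,
        f"{shot.lighting} lighting",
        f"{shot.color_mood} color mood",
    ]
    prompt = ", ".join(parts)
    if shot.exclude:
        # Negatives are appended as plain text (an assumed convention).
        prompt += ". Exclude: " + ", ".join(shot.exclude)
    return prompt


def build_record(shot: Shot, seed: int, model_version: str) -> str:
    """Serialize prompt, seed, and version so a render can be traced later."""
    record = {
        "model_version": model_version,
        "seed": seed,
        "prompt": build_prompt(shot),
        "shot": asdict(shot),
    }
    return json.dumps(record, indent=2, sort_keys=True)


shot = Shot(
    subject="a ceramic mug",
    action="rotating slowly on a turntable",
    composition="centered close-up",
    camera_move="slow push-in",
    lens="85mm",
    lighting="soft window",
    color_mood="warm neutral",
    exclude=["text overlays", "hands"],
)
print(build_prompt(shot))
print(build_record(shot, seed=1234, model_version="veo-3.1"))
```

Storing the structured shot alongside the flattened prompt makes it easier to change one field (say, the lens) and regenerate while keeping everything else fixed.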
Mar 30, 2026
Veo 3.1


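When you call the Gemini/Vertex (Veo) endpoints, Veo 3.1 generates synchronized audio alongside the video by default. The audio can be steered through the text prompt (audio cues, dialogue, SFX, ambience), and the same generation job returns a downloadable MP4. If you prefer a single unified API that aggregates multiple providers, Veo 3.1 is also available through CometAPI (call CometAPI with your Comet key and request veo3.1/veo3.1-pro). The release is positioned as a direct competitor to other media models (for example, OpenAI's Sora 2), with improvements focused on audio realism, narrative control, and multi-shot continuity.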
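
Because the audio is driven by the same text prompt as the picture, it can help to keep the audio cues explicit and separate from the visual description. Below is a minimal sketch of one way to compose such a prompt. The `with_audio_cues` helper and the `SFX:` / `Ambience:` labels are hypothetical conventions for illustration, not a documented Veo syntax, and the actual request to the Gemini/Vertex or CometAPI endpoint is deliberately left out.

```python
from typing import Optional, Sequence, Tuple


def with_audio_cues(
    visual: str,
    dialogue: Optional[Tuple[str, str]] = None,
    sfx: Sequence[str] = (),
    ambience: Optional[str] = None,
) -> str:
    """Append labeled audio cues to a visual prompt; labels are illustrative."""
    parts = [visual.rstrip(".") + "."]
    if dialogue:
        speaker, line = dialogue
        # Spoken lines are quoted so they stand apart from the description.
        parts.append(f'{speaker} says: "{line}"')
    if sfx:
        parts.append("SFX: " + ", ".join(sfx) + ".")
    if ambience:
        parts.append(f"Ambience: {ambience}.")
    return " ".join(parts)


prompt = with_audio_cues(
    "A barista slides a latte across a wooden counter, medium shot",
    dialogue=("The barista", "Your order is ready."),
    sfx=["cup sliding on wood", "espresso machine hiss"],
    ambience="quiet cafe chatter",
)
print(prompt)
```

The resulting string would then be sent as the prompt of a generation request; how closely the model follows each cue is something to verify by listening to the returned MP4.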