Lip Sync — Prompting Guide (OmniHuman 1.5)
This article explains how to write effective prompts for Lip Sync, based on the official OmniHuman-1.5 prompt guidelines.
The goal is to help you unlock the full expressive power of the model: emotional acting, realistic movements, multi-style performance, and context-aware reactions.
1. How Prompts Work in Lip Sync
Lip Sync uses three inputs:
- Image — identity, appearance, environment reference.
- Audio — timing + speech + emotional cues + semantics.
- Prompt — everything the audio cannot explicitly define: style, mood, acting instructions, camera motion, scenario, personality, intensity.
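As a minimal sketch, the three inputs can be kept together in one small structure. The class name LipSyncRequest, its field names, and the file names below are hypothetical and chosen only for illustration; they are not part of any official Lip Sync or OmniHuman API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LipSyncRequest:
    """Hypothetical container for the three Lip Sync inputs (not an official type)."""

    image_path: str  # identity, appearance, environment reference
    audio_path: str  # timing, speech, emotional cues, semantics
    prompt: str      # style, mood, acting instructions, camera motion, and so on


# Example values are placeholders, not real assets.
request = LipSyncRequest(
    image_path="portrait.png",
    audio_path="voiceover.wav",
    prompt="Warm, sincere tone, soft smile, thoughtful eye movement, slight nod.",
)
```

Keeping the prompt as a separate field is a reminder that it only adds direction on top of the image and audio and does not replace either.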
OmniHuman 1.5 reads the prompt as direction for an actor, much like instructions given on a movie set.
Prompts can influence:
- emotional style
- expression intensity
- micro-gestures
- acting logic
- camera behavior
- character behavior
- style (film / social media / dramatic / natural)
- posture
- mood
- environment feeling
They cannot override:
- identity from the input image
- timing of the lip motion
- semantic correctness of speech
2. Core Principles from OmniHuman-1.5
✓ Principle 1: Prompt = Acting Instructions
Prompts work best when they describe tone, attitude, emotional state, intent, and subtle behaviors. Example:
“Warm, sincere tone, soft smile, thoughtful eye movement, slight nod.”
✓ Principle 2: Respect the Audio
If the audio is sad and slow but the prompt says “energetic influencer style,” the result looks unnatural. Prompts must match the emotion and rhythm of the audio.
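One rough way to catch this kind of mismatch before generating is a keyword check. The sketch below assumes you already know whether your audio is high or low energy and pass that label in yourself; the keyword sets and function names are hypothetical and are not something the model exposes.

```python
# Hypothetical keyword lists; extend them to fit your own prompts.
HIGH_ENERGY = {"energetic", "excited", "fast", "loud", "hyped"}
LOW_ENERGY = {"sad", "slow", "calm", "soft", "gentle", "quiet"}


def prompt_energy(prompt: str) -> str:
    """Classify a prompt as 'high', 'low', or 'neutral' by counting keywords."""
    words = {w.strip(".,!?\"'").lower() for w in prompt.split()}
    high = len(words & HIGH_ENERGY)
    low = len(words & LOW_ENERGY)
    if high > low:
        return "high"
    if low > high:
        return "low"
    return "neutral"


def conflicts_with_audio(prompt: str, audio_energy: str) -> bool:
    """Return True when the prompt's energy contradicts the audio's energy.

    audio_energy is 'high' or 'low' and is supplied by the caller (an assumption).
    """
    energy = prompt_energy(prompt)
    return energy != "neutral" and energy != audio_energy
```

With these lists, conflicts_with_audio("energetic influencer style", "low") returns True, while a prompt such as "Soft natural expression, gentle smile" passes for the same audio. A keyword check is crude, so treat a flagged prompt as a hint to reread it, not as a verdict.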
✓ Principle 3: Don’t Over-direct
Avoid complex or contradictory instructions.
❌ “Fast camera spin, walking through a forest, laughing loudly, whispering seductively.”
✔ “Soft natural expression, slight head movement, gentle smile.”
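The difference between the two example prompts above can be approximated with a small lint function. This is only a sketch: the directive limit, the list of conflicting pairs, and the function name are hypothetical choices, and the substring matching is deliberately simple.

```python
MAX_DIRECTIVES = 5  # hypothetical limit; pick whatever suits your workflow

# Hypothetical pairs of directions that rarely make sense together.
CONFLICTING_PAIRS = [
    ("laughing", "whispering"),
    ("fast", "slow"),
    ("loud", "quiet"),
]


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for prompts that look over-directed or contradictory."""
    text = prompt.lower()
    # Treat commas and periods as separators between individual directions.
    directives = [p.strip() for p in text.replace(".", ",").split(",") if p.strip()]
    warnings = []
    if len(directives) > MAX_DIRECTIVES:
        warnings.append(f"too many directions: {len(directives)}")
    for first, second in CONFLICTING_PAIRS:
        # Plain substring match, so "loud" also matches "loudly".
        if first in text and second in text:
            warnings.append(f"conflicting directions: {first!r} and {second!r}")
    return warnings
```

Run on the ❌ prompt, this returns one warning about 'laughing' and 'whispering'; run on the ✔ prompt, it returns an empty list.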
✓ Principle 4: Use Prompts to Enhance, Not Replace
The audio defines the core behavior; the prompt defines the style.
3. Prompt Structure (Recommended Template)
A best-practice prompt should include these blocks: