The goal is to help you unlock the full expressive power of the model: emotional acting, realistic movements, multi-style performance, and context-aware reactions.

1. How Prompts Work in Lip Sync

Lip Sync uses three inputs:

Image — identity, appearance, environment reference.
Audio — timing + speech + emotional cues + semantics.
Prompt — everything the audio cannot explicitly define: style, mood, acting instructions, camera motion, scenario, personality, intensity.

OmniHuman 1.5 reads prompts as direction for the actor — similar to giving instructions on a movie set.

Prompts can influence:

emotional style
expression intensity
micro-gestures
acting logic
camera behavior
character behavior
style (film / social media / dramatic / natural)
posture
mood
environment feeling

They cannot override:

identity from the input image
timing of the lip motion
semantic correctness of speech
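
As a rough illustration of this split, the three inputs can be pictured as a single request object. This is a minimal sketch only: `LipSyncRequest`, its field names, the `FIXED_BY_OTHER_INPUTS` mapping, and the file paths are hypothetical names chosen for the example and do not correspond to any documented OmniHuman 1.5 API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LipSyncRequest:
    """Hypothetical container for the three inputs (not a real OmniHuman API)."""
    image_path: str  # identity, appearance, environment reference
    audio_path: str  # timing, speech, emotional cues, semantics
    prompt: str      # style, mood, acting instructions, camera motion


# Aspects the prompt cannot override, mapped to the input that fixes them.
FIXED_BY_OTHER_INPUTS = {
    "identity": "image",
    "lip timing": "audio",
    "speech semantics": "audio",
}

request = LipSyncRequest(
    image_path="portrait.png",   # placeholder path
    audio_path="voiceover.wav",  # placeholder path
    prompt="Warm, sincere tone, soft smile, thoughtful eye movement, slight nod.",
)
```

Keeping the prompt as its own field makes it easy to iterate on the acting direction while the image and audio stay unchanged.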

2. Core Principles from OmniHuman-1.5

✓ Principle 1: Prompt = Acting Instructions

Prompts work best when they describe tone, attitude, emotional state, intent, and subtle behaviors. Example:

“Warm, sincere tone, soft smile, thoughtful eye movement, slight nod.”

✓ Principle 2: Respect the Audio

Prompts must match the emotion and rhythm of the audio. If the audio is sad and slow but the prompt says “energetic influencer style,” the result looks unnatural.

✓ Principle 3: Don’t Over-direct

Avoid complex or contradictory instructions.

❌ “Fast camera spin, walking through a forest, laughing loudly, whispering seductively.”

✔ “Soft natural expression, slight head movement, gentle smile.”
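
One way to catch over-direction before submitting a prompt is a small pre-check. The sketch below is an assumption-laden heuristic, not part of OmniHuman 1.5: the `lint_prompt` helper, the `CONFLICTING_PAIRS` list, and the `MAX_DIRECTIVES` threshold are all hypothetical and would need tuning for real use.

```python
# Hypothetical pairs of directions that pull against each other.
CONFLICTING_PAIRS = [
    ("laughing", "whispering"),
    ("energetic", "calm"),
    ("fast", "gentle"),
]
MAX_DIRECTIVES = 5  # arbitrary threshold chosen for this sketch


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for a prompt that may be over-directed."""
    text = prompt.lower()
    # Treat each comma- or period-separated phrase as one directive.
    parts = text.replace(".", ",").split(",")
    directives = [part.strip() for part in parts if part.strip()]

    warnings = []
    if len(directives) > MAX_DIRECTIVES:
        warnings.append(f"too many directives: {len(directives)}")
    # Simple substring match; crude, but enough to flag obvious clashes.
    for first, second in CONFLICTING_PAIRS:
        if first in text and second in text:
            warnings.append(f"possible conflict: {first} vs {second}")
    return warnings


over_directed = "Fast camera spin, walking through a forest, laughing loudly, whispering seductively."
restrained = "Soft natural expression, slight head movement, gentle smile."

print(lint_prompt(over_directed))
print(lint_prompt(restrained))
```

With these example rules, the over-directed prompt is flagged for the laughing/whispering clash, while the restrained prompt produces no warnings.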

✓ Principle 4: Use Prompts to Enhance, Not Replace

The audio defines the core behavior; the prompt defines the style.

3. Prompt Structure (Recommended Template)

A best-practice prompt should include these blocks: