Talking Head

Talking head generation is the fastest way to turn a character into a speaking presenter. It combines a source image, text, and the configured character voice pipeline to produce a video job you can preview or queue.

Inputs

A talking head request accepts the following fields:

  • content_item_id
  • model_id
  • source_image_url
  • text

You can use the preview endpoint for quick validation or the generation endpoint when you are ready to create a full queued job.
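
As a minimal sketch of how a request body might be assembled from the fields listed above: the helper name, the choice of which fields are required, and treating content_item_id as optional are all assumptions made for illustration, not documented behavior.

```python
from typing import Optional

# Field names come from the list above; which ones are required is an assumption.
REQUIRED_FIELDS = ("model_id", "source_image_url", "text")


def build_talking_head_payload(
    model_id: str,
    source_image_url: str,
    text: str,
    content_item_id: Optional[str] = None,
) -> dict:
    """Assemble a request body, rejecting empty required values."""
    payload = {
        "model_id": model_id,
        "source_image_url": source_image_url,
        "text": text.strip(),
    }
    missing = [name for name in REQUIRED_FIELDS if not payload[name]]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # content_item_id is treated as optional here and omitted when not given.
    if content_item_id is not None:
        payload["content_item_id"] = content_item_id
    return payload
```

Validating locally before calling either endpoint catches an empty script or a missing image URL without spending a preview.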

Preview vs. full generation

Preview

Use preview when you want to test:

  • whether the script sounds right
  • whether the source image works
  • whether the overall motion feels natural

Preview returns a generation object immediately and skips the full queue flow.

Full generation

Use full generation when you want the result to enter the normal content pipeline. The system creates a talking-head generation and a linked job, then enqueues it.
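
The choice between the two modes can be sketched as a single flag that selects the endpoint. The base URL, both paths, and the bearer-token header below are placeholders assumed for illustration; take the real values from the API reference.

```python
import json
import urllib.request

# Hypothetical base URL and paths; substitute the real ones from the API reference.
BASE_URL = "https://api.example.com"
PREVIEW_PATH = "/talking-head/preview"
GENERATE_PATH = "/talking-head/generate"


def build_request(payload: dict, preview: bool, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the preview or generation endpoint."""
    path = PREVIEW_PATH if preview else GENERATE_PATH
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Because the payload is identical in both modes, a script that previews well can be resubmitted for full generation by flipping the flag.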

[screenshot: Talking head generator showing text script, source image picker, and preview player]

Good scripts

Talking head videos work best with script text that sounds spoken, not written:

  • short sentences
  • natural rhythm
  • one clear message
  • direct audience framing

Example:

Three things I do before every morning Pilates session, because how you start changes everything.
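
The "short sentences" guideline can be roughly checked before submitting a script. The helper below is a hypothetical local check, not part of Influgen, and the 20-word threshold is an arbitrary value chosen for illustration.

```python
import re
from typing import List

MAX_WORDS_PER_SENTENCE = 20  # arbitrary threshold chosen for illustration


def long_sentences(script: str, max_words: int = MAX_WORDS_PER_SENTENCE) -> List[str]:
    """Return the sentences in `script` that exceed `max_words` words."""
    # Naive split on ., ! or ? followed by whitespace; abbreviations will confuse it.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]
```

The example script above is a single 15-word sentence, so it passes with the default threshold. A word count cannot judge rhythm or framing, so reading the script aloud is still the better test.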

Linking to content items

Attach a content_item_id when the talking head is part of a larger content plan. This keeps the speech asset tied to the queue, approvals, and scheduling logic you already use elsewhere.

Reviewing history

Influgen stores talking-head generations and lets you:

  • list them per character
  • filter by content_item_id
  • include or exclude previews
  • fetch a specific generation by ID

This is especially useful when you are iterating on the same script with multiple source images or voices.
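
The history filters above could map onto query parameters along these lines. The parameter name include_previews and its true/false encoding are assumptions; only content_item_id appears in this page.

```python
from typing import Optional
from urllib.parse import urlencode


def history_query(
    content_item_id: Optional[str] = None,
    include_previews: bool = False,
) -> str:
    """Build a query string for listing a character's talking-head generations."""
    # include_previews is a hypothetical parameter name.
    params = {"include_previews": "true" if include_previews else "false"}
    if content_item_id is not None:
        params["content_item_id"] = content_item_id
    return urlencode(params)
```

The resulting string would be appended to whatever per-character listing path the API exposes. Filtering by content_item_id with previews included gives one view of every attempt made for a given script.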