Voice Cloning

Voice cloning lets a character sound as consistent as it looks. Influgen routes voice requests through a provider layer and exposes ElevenLabs-style controls for stability, similarity, style, and speaker boost whenever the configured backend supports them.

What voice cloning is used for

Once a voice clone is attached to a character, it can be used for:

voice previews
generated speech clips
talking head videos
narration layered onto content items

The same settings can also become the default for future voice generations.

Two ways to create a voice clone

Upload samples

Use a multipart form upload when you have raw audio files. Influgen accepts the files under either of two form field names: samples or sample.

This is the best choice when:

you have clean WAV or MP3 files on hand
you want the browser workflow
you are building the voice directly from the web app
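As a rough illustration of the upload path, the sketch below encodes one audio file and a name field as a multipart/form-data body using only the standard library. The field names samples and name come from this page; the helper build_multipart, the file name, and the MIME type are hypothetical, and the endpoint URL is deliberately left out because this page does not state it.

```python
import uuid


def build_multipart(fields, files, field_name="samples"):
    """Encode text fields and audio files as multipart/form-data.

    fields: dict of plain text form fields, e.g. {"name": "Ava"}
    files:  list of (filename, raw_bytes, mime_type) tuples
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    for filename, data, mime in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    # The closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Placeholder bytes stand in for a real WAV file read from disk.
body, content_type = build_multipart(
    {"name": "Ava"},
    [("ava-01.wav", b"RIFF....WAVE", "audio/wav")],
)
```

The body and the Content-Type value would then be sent in a POST request to whichever voice clone endpoint your Influgen deployment exposes. Most HTTP client libraries can build this encoding for you; it is written out by hand here only to show where the samples field name goes.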

Provide sample URLs

Use sample_urls when the audio already lives remotely. This is especially useful in mobile workflows, where the voice lab expects direct audio sample URLs if you are not uploading files.

This is the best choice when:

you are using the mobile app
your asset pipeline already stores speech samples in cloud storage
you want to clone voices from a backend automation flow
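For the URL path, a small pre-flight check can catch links that are not direct audio files before they reach the voice lab. This is only a sketch: the extension list is an assumption based on the WAV and MP3 formats mentioned above, and both helper names are hypothetical rather than part of Influgen.

```python
from urllib.parse import urlparse

# Assumption: only the formats this page mentions; extend as your backend allows.
AUDIO_EXTENSIONS = (".wav", ".mp3")


def looks_like_direct_audio_url(url):
    """Return True for http(s) URLs whose path ends in a known audio extension."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        and parsed.path.lower().endswith(AUDIO_EXTENSIONS)
    )


def build_sample_urls_payload(name, urls):
    """Build a request body that uses sample_urls instead of file uploads."""
    rejected = [u for u in urls if not looks_like_direct_audio_url(u)]
    if rejected:
        raise ValueError(f"not direct audio URLs: {rejected}")
    return {"name": name, "sample_urls": list(urls)}


payload = build_sample_urls_payload(
    "Ava",
    ["https://cdn.example.com/voices/ava-01.wav"],
)
```

An extension check is a heuristic: signed storage URLs keep the extension in the path and pass, but a share page that wraps the audio in HTML would be rejected, which is usually the desired outcome.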

[screenshot: Voice clone panel with sample upload area, tuning sliders, and preview player]

Voice clone inputs

You can provide:

name
description
labels
sample_urls
model_id
stability
similarity_boost
style
use_speaker_boost

Keep the labels practical. They are most useful for tracking accents, language, pacing, or intended use, not for writing long descriptions.
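One way to keep these inputs organized in client code is a small typed container that drops unset fields before sending. The field names below match the list above; the types are assumptions (in particular, labels as a string-to-string mapping and the tuning values as floats), and VoiceCloneInput is a hypothetical name, not an Influgen class.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class VoiceCloneInput:
    name: str
    description: Optional[str] = None
    labels: dict = field(default_factory=dict)       # assumed: short key/value tags
    sample_urls: list = field(default_factory=list)
    model_id: Optional[str] = None
    stability: Optional[float] = None
    similarity_boost: Optional[float] = None
    style: Optional[float] = None
    use_speaker_boost: Optional[bool] = None

    def to_payload(self):
        """Return only the fields that were actually set."""
        payload = {}
        for key, value in asdict(self).items():
            # Skip unset values and empty containers, but keep False and 0.0.
            if value is None or value == {} or value == []:
                continue
            payload[key] = value
        return payload


clone = VoiceCloneInput(
    name="Ava",
    labels={"accent": "british", "use": "narration"},
    stability=0.5,
    use_speaker_boost=False,
)
payload = clone.to_payload()
```

Short key/value labels like the ones in this example follow the advice above: they record accent and intended use without turning into a second description field.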

Best practices for sample quality

Use dry speech without music or room echo.
Keep the microphone quality reasonably consistent.
Avoid stacking multiple speakers in one sample.
Aim for natural pacing rather than exaggerated performance.
Use the same language and accent you expect to publish with.

Poor samples create a clone that sounds unstable no matter how much tuning you do later.

Preview vs. generate

Influgen exposes two voice actions:

Preview: generate a short test clip from text
Generate: create speech for a full asset or a linked content item

If you generate without a content_item_id, the call runs synchronously and returns the voice output immediately. If you include a content item, Influgen queues the generation so it can move through the job system alongside the rest of your content workflow.
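The branching rule above can be summarized in a few lines. This sketch only models which mode a request would take; the text field and both function names are assumptions for illustration, and the real request and response shapes are not documented on this page.

```python
from typing import Optional


def build_generate_request(text: str, content_item_id: Optional[str] = None) -> dict:
    """Build a generate request, attaching content_item_id only when given."""
    request = {"text": text}
    if content_item_id is not None:
        request["content_item_id"] = content_item_id
    return request


def expected_mode(request: dict) -> str:
    """Mirror the documented rule: a linked content item means a queued job."""
    return "queued" if "content_item_id" in request else "synchronous"


standalone = build_generate_request("Welcome back to the channel.")
linked = build_generate_request("Welcome back to the channel.", "item_123")

print(expected_mode(standalone))  # synchronous
print(expected_mode(linked))      # queued
```

In practice this means client code should be ready for two outcomes: audio returned directly from the call, or a job to track through the job system.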

Tuning controls

Different providers interpret these values slightly differently, but the working intent is consistent:

stability: reduces variation between takes
similarity_boost: pushes the output closer to the cloned identity
style: increases expressive shaping
use_speaker_boost: helps preserve speaker character

If a clone sounds flat, raise style before adding more samples. If it sounds unstable, improve sample quality before increasing similarity_boost.
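The troubleshooting order above can be written down as a small helper. It is a sketch under stated assumptions: values are treated as floats in the range 0.0 to 1.0 (typical of ElevenLabs-style controls, but not confirmed for every provider), the step size is arbitrary, and suggest_adjustment is a hypothetical name.

```python
STYLE_STEP = 0.1  # assumption: an arbitrary, small increment


def suggest_adjustment(symptom, settings):
    """Return (updated_settings, advice) for a 'flat' or 'unstable' clone."""
    updated = dict(settings)  # never mutate the caller's settings
    if symptom == "flat":
        current = updated.get("style", 0.0)
        # Clamp to the assumed upper bound of 1.0.
        updated["style"] = round(min(1.0, current + STYLE_STEP), 2)
        return updated, "Raised style; add more samples only if this is not enough."
    if symptom == "unstable":
        # Tuning cannot rescue poor samples, so leave the settings untouched.
        return updated, "Re-record cleaner samples before raising similarity_boost."
    raise ValueError(f"unknown symptom: {symptom!r}")


settings, advice = suggest_adjustment("flat", {"style": 0.3, "stability": 0.5})
```

The unstable branch deliberately changes nothing, which reflects the point that sample quality, not tuning, is the first thing to fix.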

Where voice shows up next

After cloning a voice, the next high-leverage workflows are: