Voice Cloning
Voice cloning lets a character sound as consistent as it looks. Influgen routes voice requests through a voice provider layer and exposes ElevenLabs-style controls for stability, similarity, style, and speaker boost when the configured backend supports them.
What voice cloning is used for
Once a voice clone is attached to a character, it can be used for:
- voice previews
- generated speech clips
- talking head videos
- narration layered onto content items
The clone's tuning settings can also become the default for future voice generations.
Two ways to create a voice clone
Upload samples
Use a multipart form upload when you have raw audio files. Influgen accepts sample files in form fields named samples or sample.
This is the best choice when:
- you have clean WAV or MP3 files on hand
- you want the browser workflow
- you are building the voice directly from the web app
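As a rough illustration of the upload path, the sketch below encodes audio files as multipart/form-data under the samples field. The helper name build_multipart and the extra name text field are hypothetical. This page does not give the endpoint URL, so the sketch only builds the request body and leaves sending it to your HTTP client.

```python
import uuid


def build_multipart(fields, files, field_name="samples"):
    """Encode text fields and audio files as multipart/form-data.

    fields: dict of str -> str (for example {"name": "Narrator"}; hypothetical)
    files: list of (filename, data_bytes, content_type) tuples
    Returns (content_type_header, body_bytes).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in fields.items():
        parts.append(
            (
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
                f'{value}\r\n'
            ).encode("utf-8")
        )
    for filename, data, content_type in files:
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'
            f'Content-Type: {content_type}\r\n\r\n'
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

Pass the returned header as Content-Type and the bytes as the request body. Repeating the samples field once per file is how a multipart form carries several samples.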
Provide sample URLs
Use sample_urls when the audio already lives remotely. This is especially useful in mobile workflows, where the voice lab expects direct audio sample URLs if you are not uploading files.
This is the best choice when:
- you are using the mobile app
- your asset pipeline already stores speech samples in cloud storage
- you want to clone voices from a backend automation flow
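Because this path expects direct audio URLs, a quick client-side check can catch share-page links before they reach the voice lab. The sketch below is only a heuristic: the function name and the extension list (WAV and MP3, the formats mentioned on this page) are assumptions, not rules enforced by Influgen.

```python
from urllib.parse import urlparse

# Assumption: only the formats named on this page; extend as needed.
AUDIO_EXTENSIONS = (".wav", ".mp3")


def looks_like_direct_audio_url(url):
    """Return True if the URL is http(s) and its path ends in an audio extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Query strings (for example signed-URL tokens) are ignored by checking the path only.
    return parsed.path.lower().endswith(AUDIO_EXTENSIONS)
```

A signed cloud-storage link such as https://cdn.example.com/voices/take1.wav?sig=abc passes, while a link to a sharing page without a file extension does not.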
[screenshot: Voice clone panel with sample upload area, tuning sliders, and preview player]
Voice clone inputs
You can provide:
- name
- description
- labels
- sample_urls
- model_id
- stability
- similarity_boost
- style
- use_speaker_boost
Keep the labels practical. They are most useful for tracking accents, language, pacing, or intended use, not for writing long descriptions.
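One way to assemble these inputs into a JSON body is sketched below. The field names come from this page. The types, the choice of name as the only required field, and labels as a key-value map are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class VoiceCloneRequest:
    name: str  # assumed to be the only required field
    description: Optional[str] = None
    labels: Dict[str, str] = field(default_factory=dict)  # assumed key-value map
    sample_urls: List[str] = field(default_factory=list)
    model_id: Optional[str] = None
    stability: Optional[float] = None
    similarity_boost: Optional[float] = None
    style: Optional[float] = None
    use_speaker_boost: Optional[bool] = None

    def to_json(self):
        # Drop unset fields so the backend can apply its own defaults.
        payload = {
            key: value
            for key, value in asdict(self).items()
            if value is not None and value != {} and value != []
        }
        return json.dumps(payload)
```

A request with short, practical labels might look like VoiceCloneRequest(name="Narrator", labels={"accent": "en-GB", "use": "narration"}, sample_urls=[...]).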
Best practices for sample quality
- Use dry speech without music or room echo.
- Keep the microphone quality reasonably consistent.
- Avoid stacking multiple speakers in one sample.
- Aim for natural pacing rather than exaggerated performance.
- Use the same language and accent you expect to publish with.
Poor samples create a clone that sounds unstable no matter how much tuning you do later.
Preview vs. generate
Influgen exposes two voice actions:
- Preview: generate a short test clip from text
- Generate: create speech for a full asset or a linked content item
If you generate without a content_item_id, the call runs synchronously and returns the voice output immediately. If you include a content item, Influgen queues the generation so it can move through the job system alongside the rest of your content workflow.
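The branching described above can be sketched as a small request builder. Only content_item_id comes from this page. The text and voice_id field names and the function itself are hypothetical.

```python
def build_generate_request(text, voice_id, content_item_id=None):
    """Build a generate body and report how the call is expected to run.

    Returns (body, mode) where mode is "synchronous" or "queued".
    """
    body = {"text": text, "voice_id": voice_id}  # hypothetical field names
    if content_item_id is not None:
        # Linking a content item sends the generation through the job system.
        body["content_item_id"] = content_item_id
        return body, "queued"
    # No content item: the voice output comes back in the same call.
    return body, "synchronous"
```

Callers can use the returned mode to decide whether to read the audio from the response directly or to watch the job system for completion.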
Tuning controls
Different providers interpret these values slightly differently, but the working intent is consistent:
- stability: reduces variation between takes
- similarity_boost: pushes the output closer to the cloned identity
- style: increases expressive shaping
- use_speaker_boost: helps preserve speaker character
If a clone sounds flat, adjust style before adding more samples. If it sounds unstable, improve sample quality before increasing similarity.
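A small helper makes it easier to nudge one control at a time while testing previews. This sketch assumes the numeric controls range from 0.0 to 1.0, as ElevenLabs voice settings do. Influgen's accepted range is not stated on this page, and the starting values are placeholders.

```python
def adjust(settings, key, delta):
    """Return a copy of settings with one numeric control nudged and clamped to [0, 1]."""
    updated = dict(settings)
    updated[key] = round(min(1.0, max(0.0, updated[key] + delta)), 2)
    return updated


# Placeholder starting point, not recommended defaults.
baseline = {
    "stability": 0.5,
    "similarity_boost": 0.75,
    "style": 0.0,
    "use_speaker_boost": True,
}

# Flat-sounding clone: raise style first and leave the other controls untouched.
more_expressive = adjust(baseline, "style", 0.2)
```

Changing one value per preview keeps it clear which control caused the difference you hear.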
Where voice shows up next
After cloning a voice, the next high-leverage workflows are:
- Talking Head
- Video Generation
- Mobile Overview for on-the-go previews