First of all, I’d like to express my sincere gratitude to the Unsloth team for providing such an accessible environment where everyone can train and run language models, even with limited hardware resources. 🙇‍♂️
Thanks to your excellent notebook and documentation, I was able to train a TTS model smoothly, even as a junior developer with relatively little background knowledge in LLMs.
I trained a TTS model (orpheus-3b-0.1-ft) on a custom interview dataset that contains a large number of filler sounds such as “ah”, “um”, and “eh”.
As a result, the trained model sometimes unintentionally generates filler sounds in sentences where they don’t belong.
That is why I’m posting this discussion to kindly ask for your advice.
In this case, would it help the model learn better if I explicitly annotated filler sounds in the text dataset with custom tokens such as <filler_um> or <filler_ah>?
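For illustration, this is roughly what I mean by annotating; the `text` field name and the exact tag names are just hypothetical examples, not my actual dataset schema:

```python
# Hypothetical dataset entry, before and after annotating fillers with tags.
# The "text" field and the tag names are assumptions for illustration only.
before = {"text": "Um, I think the answer is, ah, forty-two."}
after = {"text": "<filler_um> I think the answer is <filler_ah> forty-two."}
```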
If I add such custom tags (e.g., <filler_um>, <filler_ah>, ...) to the text dataset, should I also manually update the tokenizer configuration, such as in tokenizer_config.json, to ensure they are properly recognized during training?
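For context, here is the approach I was considering: a minimal sketch using plain Hugging Face transformers (not Unsloth-specific), where the model id and the tag list are my assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id for orpheus-3b-0.1-ft; adjust as needed.
model_id = "canopylabs/orpheus-3b-0.1-ft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Register the filler tags as special tokens so the tokenizer keeps each
# tag as a single, unsplit token id.
filler_tags = ["<filler_um>", "<filler_ah>", "<filler_eh>"]
num_added = tokenizer.add_special_tokens({"additional_special_tokens": filler_tags})

# The embedding matrix needs new rows for the new token ids; these rows
# start randomly initialized and are learned during fine-tuning.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

# Sanity check: each tag should map to exactly one token id.
print(tokenizer.convert_tokens_to_ids(filler_tags))
```

If this is roughly right, my understanding is that `tokenizer.save_pretrained(...)` would then write the updated special-token entries (including tokenizer_config.json) automatically, so I wouldn’t need to edit the file by hand; please correct me if that’s wrong.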
Thank you very much for taking the time to read my discussion.
I deeply appreciate all the work the Unsloth team has done to make this remarkable project available to the community. 🙏