My last blog post was about Watson Speech to Text language model customization; this blog post is about IBM Cloud Watson Text to Speech (TTS) custom voice model configuration, because now it’s time to have some fun with the Watson TTS service. I created a fun customization of the service so that the German pronunciation sounds a little bit like the Palatinate dialect.
Here are the differences, with two wav files I created with a custom Watson Text to Speech voice model:
- German standard
- German which sounds a bit like the Palatinate dialect
You can get the code and a quick technical overview of how to customize a voice model using cURL inside a bash script in the GitHub project watson-tts-invocation.
This is an extract from the IBM Cloud Catalog:
IBM Cloud documentation
Simplified architecture dependencies
Let’s have a short look at the simplified architecture dependencies in the following image.

The TTS service offers the possibility to customize a language voice model with a custom voice model containing custom words.
A custom voice model is used to extend the base model, for example with a new accent or jargon for a region. The custom words extend the custom voice model with written text that contains unknown words, or map known words to a different pronunciation. Therefore, “IBM Watson™ Text to Speech applies language-dependent pronunciation rules to synthesize text. The service applies the rules to convert the ordinary (orthographic) spelling of each word to a phonetic spelling. A word’s phonetic spelling uses phoneme symbols to define how the word is pronounced. These symbols are the distinct units of sound that distinguish words in a language, the boundaries between syllables, and the stress marks for the syllables.” (partly from the IBM Cloud documentation)
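As a small sketch of what such a translation can look like, a custom word entry can use either a plain sounds-like spelling or a phoneme-based one. The IPA value below is only an illustrative assumption, not taken from the project:

```json
{
  "words": [
    { "word": "rede", "translation": "babbel" },
    { "word": "Pfalz",
      "translation": "<phoneme alphabet=\"ipa\" ph=\"pals\"></phoneme>" }
  ]
}
```

The first entry simply respells the word the way it should sound; the second defines the pronunciation explicitly with phoneme symbols.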
The service has a new beta feature called “Tune by Example”. This feature lets you control exactly how specified text is spoken by the service: you can dictate the intonation, stress, tempo, cadence, rhythm, and pauses of the synthesized text. These aspects of speech are collectively referred to as prosody.
Now you can create “speaker models” and configure them with custom prompts, which combine a sound recording with the matching text.
The automation example
The links in the following text point directly to the related source code in the GitHub project watson-tts-invocation.
The bash automation script includes two flows; here is one of them:
- Creation of a custom model, with these steps:
- Create a custom voice model
- List all customized models
- Create words
- Use the custom voice model
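The steps above can be sketched with cURL roughly as follows. This is only a sketch under assumptions: the variables `TTS_URL` and `API_KEY` are hypothetical placeholders for your own service instance, and the voice name `de-DE_BirgitV3Voice` is just one possible German voice; the real wiring lives in the bash script of the GitHub project.

```shell
#!/bin/bash
# Hypothetical placeholders for your own Watson TTS service instance
TTS_URL="https://api.eu-de.text-to-speech.watson.cloud.ibm.com"
API_KEY="<your-tts-api-key>"

# 1. Create a custom voice model for German
createCustomModel () {
  curl -s -X POST -u "apikey:$API_KEY" \
    --header "Content-Type: application/json" \
    --data '{"name":"pfaelzer-model","language":"de-DE"}' \
    "$TTS_URL/v1/customizations"
}

# 2. List all customized models
listCustomModels () {
  curl -s -X GET -u "apikey:$API_KEY" "$TTS_URL/v1/customizations"
}

# 3. Create words: upload the word/translation pairs
createWords () {  # $1 = customization_id
  curl -s -X POST -u "apikey:$API_KEY" \
    --header "Content-Type: application/json" \
    --data @pfaelzer-words.json \
    "$TTS_URL/v1/customizations/$1/words"
}

# 4. Use the custom voice model to synthesize a wav file
synthesize () {   # $1 = customization_id, $2 = text to speak
  curl -s -X POST -u "apikey:$API_KEY" \
    --header "Content-Type: application/json" \
    --header "Accept: audio/wav" \
    --data "{\"text\":\"$2\"}" \
    --output output.wav \
    "$TTS_URL/v1/synthesize?voice=de-DE_BirgitV3Voice&customization_id=$1"
}
```

The `createCustomModel` call returns a `customization_id` in its JSON response, which the later calls then reference.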
The following code is an extract of the words configuration in the pfaelzer-words.json file of the GitHub project.
{
  "words": [
    { "word": "ich",     "translation": "isch" },
    { "word": "Ich",     "translation": "isch" },
    { "word": "rede",    "translation": "babbel" },
    { "word": "ein",     "translation": "ähn" },
    { "word": "komme",   "translation": "kumm" },
    { "word": "der",     "translation": "dä" },
    { "word": "Pfalz",   "translation": "Pallz" },
    { "word": "Pfälzer", "translation": "Pällzer" }
  ]
}
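After uploading this file, you can check that the words actually arrived in the model by listing them. Again a sketch under assumptions: the three variables are hypothetical placeholders for your own service instance and model.

```shell
# Hypothetical placeholders for your own service instance and model
API_KEY="<your-tts-api-key>"
TTS_URL="https://api.eu-de.text-to-speech.watson.cloud.ibm.com"
CUSTOMIZATION_ID="<your-customization-id>"

# List all words of the custom voice model
listWords () {
  curl -s -X GET -u "apikey:$API_KEY" \
    "$TTS_URL/v1/customizations/$CUSTOMIZATION_ID/words"
}
```

The response is a JSON document with the same word/translation structure as the uploaded file.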
Summary
The Watson TTS service is easy to configure with the curl command and needs no additional interfaces. The API documentation is great, and here is a list of the API calls I used in the example:
By the way, the getting started guide in the IBM Cloud documentation is very good.
I hope this was useful for you. Let’s see what’s next!
Greetings,
Thomas
#watsontexttospeech, #bashscript, #tts, #ai, #ibmcloud