English Amiga Board - View Single Post

FromWithin · 21 August 2011, 18:00

Can't tell if the original version worked exactly the same way, but the synthesis was created by SoftVoice, Inc.

"The SoftVoice system is built around the concept of formant synthesis in which we mathematically model the human speech production mechanism and, in particular, the acoustic resonances (formants) of the vocal tract. As opposed to the time-domain methods (demi-syllable, diphone, etc.) in which pre-recorded pieces of speech are spliced together, formant synthesis is not restricted to a single sounding (or minimally modifiable) vocal personality, nor is it constrained to a narrow pitch range. With the SoftVoice system, users can customize vocal personalities to suit their individual needs, everything from a singing choir boy to a menacing Martian. In addition, the SoftVoice formant synthesis algorithm, being continuous and splice free, does not suffer from such artifacts as glitches, gurgling, false consonants, chorusing, etc. that reduce intelligibility and increase listener fatigue."

The above description applies to version 5 of the synthesis (Amiga was probably version 1). But it's likely to be doing formant synthesis in the same way, with some form of algorithm that generates the root waveform from the user's voice parameters and then applies the filters to it. The examples on their page sound pretty similar to the Amiga synthesis.

Formant synthesis is pretty easy. You basically have three band-pass filters applied in parallel to the root waveform. With the filters set to different combinations of frequencies, they make the "aaa", "eeee", etc. sounds. If you record yourself going "aaaaa" and have a look at the spectral graph of it, you'll clearly see the formant peaks. You could record yourself making different vowel sounds and soft consonants and note down the formant frequencies. Applying the band-pass filters at those same frequencies to any old waveform will make it sound like that letter/sound. Slide the frequencies around to smoothly transition between different vowels. Alternatively, of course, you can do it without filters and just add together sine waves at the right frequencies. Then you need to apply bits of noise in the right place for stuff like "s", "f", "t" and all that.
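To make the parallel band-pass idea concrete, here's a rough Python sketch. It's only an illustration under my own assumptions, not how SoftVoice or the Amiga code actually does it: the function names, the sample rate, the sawtooth root waveform, the filter Q and the formant values (approximate textbook figures for an adult male voice) are all picked for the example. The filter is a standard two-pole biquad band-pass.

```python
import math

SAMPLE_RATE = 22050  # Hz; arbitrary choice for this sketch

# Approximate textbook formant frequencies (Hz) for an adult male voice.
# Illustrative assumptions only; measure your own from a spectrum.
VOWELS = {
    "aa": (730.0, 1090.0, 2440.0),
    "ee": (270.0, 2290.0, 3010.0),
}

def sawtooth(pitch_hz, n_samples):
    """Harmonically rich root waveform in the range [-1, 1)."""
    return [2.0 * ((i * pitch_hz / SAMPLE_RATE) % 1.0) - 1.0
            for i in range(n_samples)]

def band_pass(samples, centre_hz, q=10.0):
    """Two-pole biquad band-pass with unity gain at the centre frequency."""
    w0 = 2.0 * math.pi * centre_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # the b1 coefficient is zero here
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def vowel(name, pitch_hz=110.0, seconds=0.5):
    """Average three parallel band-pass filters fed by the same root waveform."""
    root = sawtooth(pitch_hz, int(SAMPLE_RATE * seconds))
    bands = [band_pass(root, f) for f in VOWELS[name]]
    return [sum(vals) / len(bands) for vals in zip(*bands)]
```

`vowel("aa")` gives you a list of float samples you can scale and write out as audio. To slide between vowels you'd interpolate the three centre frequencies over time and recompute the filter coefficients as you go, which this sketch leaves out.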

Last edited by FromWithin; 21 August 2011 at 18:27. Reason: More info