Speech Synthesis (SAY)

Dunny · 21 August 2011, 17:30

Hi all -

As some of you are probably aware, I'm currently writing my own BASIC interpreter, SpecBAS. One thing that the Amiga's OS had that I would like to add is a speech synthesiser!

So my request is - is there any documentation of how the SAY command worked - not the actual "BASIC Manual" kind of docs, but how it actually did its job - does it use an algorithm to generate the speech, does it use samples, how was the syntax of the SAY command parsed and interpreted etc. As it was part of the ROM, I'm guessing that minimal use of samples was employed due to space restrictions?

Anyone got any pointers?

D.

FromWithin · 21 August 2011, 18:00

Can't tell if the original version worked exactly the same way, but the synthesis was created by SoftVoice, Inc.

"The SoftVoice system is built around the concept of formant synthesis in which we mathematically model the human speech production mechanism and, in particular, the acoustic resonances (formants) of the vocal tract. As opposed to the time-domain methods (demi-syllable, diphone, etc.) in which pre-recorded pieces of speech are spliced together, formant synthesis is not restricted to a single sounding (or minimally modifiable) vocal personality, nor is it constrained to a narrow pitch range. With the SoftVoice system, users can customize vocal personalities to suit their individual needs, everything from a singing choir boy to a menacing Martian. In addition, the SoftVoice formant synthesis algorithm, being continuous and splice free, does not suffer from such artifacts as glitches, gurgling, false consonants, chorusing, etc. that reduce intelligibility and increase listener fatigue."

The above description applies to version 5 of the synthesis (Amiga was probably version 1). But it's likely to be doing formant synthesis in the same way with some form of algorithm based on the user voice parameters for generating the root waveform to apply the filters to. The examples on their page sound pretty similar to the Amiga synthesis.

Formant synthesis is pretty easy. You basically have three band-pass filters applied in parallel to the root waveform. At various frequency ratios, they make the "aaa", "eeee", etc. sounds. If you record yourself going "aaaaa" and have a look at the spectral graph of it, you'll clearly see the formant peaks. You could record yourself making different vowel sounds and soft consonants and note down the formant frequencies. Applying band-pass filters at those same frequencies to any old waveform will make it sound like that letter/sound. Slide the frequencies around to smoothly transition between different vowels. Alternatively, of course, you can do it without filters and just add together sine waves at the right frequencies. Then you need to add bits of noise in the right places for stuff like "s", "f", "t" and all that.
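A minimal sketch of the three-parallel-band-pass idea described above, in Python with only the standard library. This is an illustration, not how narrator.device or SoftVoice actually does it. The sample rate, the sawtooth root waveform, the filter Q, the helper names, and the formant frequencies (rough textbook values for an adult male voice) are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 8000  # Hz; arbitrary choice for this sketch

# Approximate textbook formant frequencies in Hz (illustrative assumptions,
# not values taken from narrator.device).
FORMANTS = {
    "aa": (730.0, 1090.0, 2440.0),
    "ee": (270.0, 2290.0, 3010.0),
}

def sawtooth(pitch_hz, n_samples, rate=SAMPLE_RATE):
    """Harmonically rich root waveform in the range [-1, 1)."""
    return [2.0 * ((i * pitch_hz / rate) % 1.0) - 1.0 for i in range(n_samples)]

def band_pass(signal, centre_hz, q=10.0, rate=SAMPLE_RATE):
    """Second-order (biquad) band-pass filter with unity gain at centre_hz."""
    w0 = 2.0 * math.pi * centre_hz / rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Normalised coefficients; the b1 term is zero for this filter shape.
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def vowel(name, pitch_hz=110.0, seconds=0.5, rate=SAMPLE_RATE):
    """Filter the root waveform through three parallel band-passes and mix."""
    source = sawtooth(pitch_hz, int(seconds * rate), rate)
    bands = [band_pass(source, f, rate=rate) for f in FORMANTS[name]]
    return [sum(samples) / len(bands) for samples in zip(*bands)]
```

Calling `vowel("aa")` or `vowel("ee")` returns a list of float samples that can be scaled to 16-bit integers and written out with the standard `wave` module to listen to. Sliding between vowels would mean recomputing the filter coefficients as the centre frequencies move, and the noise-based consonants are not covered here.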

vidarh · 22 August 2011, 10:50

Look at eSpeak.

eSpeak is an open source speech synthesizer that uses formant synthesis just like say/narrator.device.

There are a number of other ones, such as Festival (which is huge, but there's a trimmed-down one called Festival Lite), MBROLA and others that are also available for free and/or open source.

thomas · 22 August 2011, 11:01

Quote:

Originally Posted by Dunny

As it was part of the ROM

It's not part of the ROM. Say is a frontend for translation.library and narrator.device. Both are disk-based.

In WB 1.3 there was also speak-handler and the SPEAK DOS device. Text copied to this device was converted into speech by the same interface.

Photon · 01 September 2011, 21:22

Yep, 'just' use narrator.device. Docs should be in Libraries & Devices, online at amigadev.elowar.com.
