Paula FM synthesis - Page 2

Estrayk · 21 November 2023, 21:19

I have always wondered how the famous Workbench 1.x "say" program works internally, because it is obvious that there are no PCM samples inside the executable itself. What does it actually do? Does it generate little bits of waves and modulate them to create the voice imitation?

Karlos · 21 November 2023, 21:26

Quote:

Originally Posted by Estrayk

I have always wondered how the famous Workbench 1.x "say" program works internally, because it is obvious that there are no PCM samples inside the executable itself. What does it actually do? Does it generate little bits of waves and modulate them to create the voice imitation?

I don't know, but what I do know is that speech is generally broken into noise (all the fricatives, hissing and so on) and tuned sounds (generally vowels). The latter come from subtractive synthesis: your vocal cords buzz, rich in harmonics, and this sound passes through a tunable, resonant filter (your vocal tract), which is approximated well using formants. So you can definitely do this using a combination of subtractive and additive synthesis techniques, but I'd be lying if I said I knew that's how the narrator device does it. Sounds expensive for a vanilla 7 MHz 68000.
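
A minimal sketch of the buzz-plus-resonant-filter idea described above, not of what the narrator device actually does. The 8 kHz sample rate, the 100 Hz pitch and the two formant settings (roughly an "ah" vowel) are illustrative assumptions, as are the function names.

```python
import math

def buzz(f0_hz, num_samples, sample_rate):
    """Harmonic-rich source: one impulse per pitch period, a crude glottal buzz."""
    period = int(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]

def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Two-pole resonant filter standing in for one formant of the vocal tract."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1 so the filter is stable
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / sample_rate)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

SAMPLE_RATE = 8000  # assumed, not taken from the narrator device
source = buzz(100.0, 800, SAMPLE_RATE)
# Two formants in cascade; centre frequencies and bandwidths are illustrative values only.
vowel = resonator(resonator(source, 730.0, 90.0, SAMPLE_RATE), 1090.0, 110.0, SAMPLE_RATE)
```

Changing the two centre frequencies while keeping the same buzz is what moves the result from one vowel colour to another.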

Joel_w · 21 November 2023, 23:17

Quote:

Originally Posted by Estrayk

I have always wondered how the famous Workbench 1.x "say" program works internally, because it is obvious that there are no PCM samples inside the executable itself. What does it actually do? Does it generate little bits of waves and modulate them to create the voice imitation?

I think early speech synthesizers worked with phonemes, the vowel and consonant sounds that are used in a language. In English there are about 50 phonemes needed to pronounce any word. But I have no idea how Say does it, if they’re stored as short samples or generated in some way.

Estrayk · 22 November 2023, 00:11

Karlos and Joel_w,

Thank you for your explanations, but I think my question was not well formulated. I really meant that it is possible that the narrator.library uses some kind of wavetable generator with Paula in real time, because it does it too fast. Has anyone monitored the memory at the time of the speech? There should be PCM samples in chip RAM, even if they are short, to be able to run the whole speech synthesis algorithm. If there is no PCM audio in memory, it is possible that ross is right and Paula has some way to generate PSG, FM or any other kind of waveform in real time.

Karlos · 22 November 2023, 00:42

I think it probably just creates wavetable data for the phonemes based on the current voice parameters and then blends those over time, which could be done in software (it doesn't sound like much more than a few kHz sample rate, maybe 8 kHz). If you change the voice properties, it perhaps just recalculates whatever set it needs.

22 kHz according to this: https://wiki.amigaos.net/wiki/Narrat...audio%20device.

I guess it's all synthesised on the fly, given the number of parameters it has had since v37.
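
To make the "blend wavetables over time" guess concrete, here is a small sketch of a linear crossfade between two single-cycle tables, quantised to signed 8-bit samples. The table shapes, sizes and function names are hypothetical; nothing here is taken from the narrator code.

```python
import math

def sine_table(size):
    """One cycle of a sine wave."""
    return [math.sin(2.0 * math.pi * i / size) for i in range(size)]

def square_table(size):
    """One cycle of a square wave."""
    return [1.0 if i < size // 2 else -1.0 for i in range(size)]

def blend_tables(table_a, table_b, num_cycles):
    """Play num_cycles cycles, moving linearly from table_a to table_b."""
    out = []
    for cycle in range(num_cycles):
        mix = cycle / (num_cycles - 1) if num_cycles > 1 else 0.0
        for a, b in zip(table_a, table_b):
            out.append((1.0 - mix) * a + mix * b)
    return out

def to_signed_8bit(samples):
    """Quantise floats in [-1, 1] to the signed 8-bit range that Paula plays."""
    return [max(-128, min(127, round(s * 127))) for s in samples]

pcm = to_signed_8bit(blend_tables(sine_table(32), square_table(32), 8))
```

The first cycle of `pcm` is the pure sine table and the last is the pure square table, with six intermediate mixes in between.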

ross · 22 November 2023, 09:07

I quickly checked out the Da Jormas demo and it uses both volume and period modulation (I don't think together to modulate the same channel, but take it with a pinch of salt, I actually read the code for 2 minutes).

FM synthesis is carried out using the modulator (channel 0) with a fixed table of periods, something like:

Code:

00E1 00E8 00EF 00F6 00FD 0104 010A 0110
0116 011A 011F 0123 0126 0128 012A 012B
012B 012B 012A 0128 0126 0123 011F 011A
0116 0110 010A 0104 00FD 00F6 00EF 00E8
00E1 00D9 00D2 00CB 00C4 00BD 00B7 00B1
00AB 00A7 00A2 009E 009B 0099 0097 0096
0096 0096 0097 0099 009B 009E 00A2 00A7
00AB 00B1 00B7 00BD 00C4 00CB 00D2 00D9

Channel 0 fetches values in a loop, and the period of the fetches is also changed.

The modulated waveform (channel 1) is a much longer sample; its frequency therefore changes continuously (and irregularly) generating the various distortions that are audible in the demo.

Of course there is no attempt to keep the signals in phase, and the modulated and modulating waveforms have no harmonic relationship (the modulator is not even a waveform*..), but this is a brutal and dirty FM synthesis.

--
EDIT: *this too is not actually correct, because the period table in fact has a sinusoidal trend, with the 0 point centered on a frequency of 15764 Hz and with +1/-1 at 23646 Hz and 11863 Hz. So in effect the modulator oscillates between these frequencies over a variable time, defined by the period of channel 0.
As Karlos said, the arrangement is very different from 'natural' FM..
Maybe this part could have been managed in a softer way (with a wider table, closer frequencies and a lower table period) to avoid overly strong distortion in the sound; but the effect is probably intentional.
--
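
The frequencies quoted in the EDIT can be reproduced from the table, assuming the PAL Paula clock of 3546895 Hz and the usual rate = clock / period relation. The comparison against an ideal sine with centre 225 and amplitude 75 is only a fit chosen for this sketch, not something stated in the post.

```python
import math

PAL_PAULA_CLOCK_HZ = 3_546_895  # assumption: PAL machine

# The 64-entry period table exactly as posted above.
PERIOD_TABLE = [
    0x00E1, 0x00E8, 0x00EF, 0x00F6, 0x00FD, 0x0104, 0x010A, 0x0110,
    0x0116, 0x011A, 0x011F, 0x0123, 0x0126, 0x0128, 0x012A, 0x012B,
    0x012B, 0x012B, 0x012A, 0x0128, 0x0126, 0x0123, 0x011F, 0x011A,
    0x0116, 0x0110, 0x010A, 0x0104, 0x00FD, 0x00F6, 0x00EF, 0x00E8,
    0x00E1, 0x00D9, 0x00D2, 0x00CB, 0x00C4, 0x00BD, 0x00B7, 0x00B1,
    0x00AB, 0x00A7, 0x00A2, 0x009E, 0x009B, 0x0099, 0x0097, 0x0096,
    0x0096, 0x0096, 0x0097, 0x0099, 0x009B, 0x009E, 0x00A2, 0x00A7,
    0x00AB, 0x00B1, 0x00B7, 0x00BD, 0x00C4, 0x00CB, 0x00D2, 0x00D9,
]

def period_to_hz(period):
    """Playback rate in Hz for a given Paula period value."""
    return PAL_PAULA_CLOCK_HZ / period

centre_hz = round(period_to_hz(PERIOD_TABLE[0]))      # table starts at the centre value
highest_hz = round(period_to_hz(min(PERIOD_TABLE)))   # smallest period, fastest playback
lowest_hz = round(period_to_hz(max(PERIOD_TABLE)))    # largest period, slowest playback

# Largest difference between the table and the fitted sine (centre 225, amplitude 75).
max_deviation = max(
    abs(p - (225 + 75 * math.sin(2.0 * math.pi * i / len(PERIOD_TABLE))))
    for i, p in enumerate(PERIOD_TABLE)
)
print(centre_hz, highest_hz, lowest_hz, max_deviation)
```

The table never strays more than one period unit from that fitted sine, which is consistent with the "sinusoidal trend" described in the EDIT.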

From the definition:
Frequency modulation synthesis (or FM synthesis) is a form of sound synthesis whereby the frequency of a waveform is changed by modulating its frequency with a modulator.

Source waveform? Yes.
Changed by a modulator? Yes.
Real time and continuous? Yes.
Completely in hardware (DMA)? Yes.
No samples modded from code? Yes.

Mathematically and physically incorrect? Yes

Does it work? Somehow
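
A rough software model of the arrangement described in this post: a looping period table (channel 0) keeps rewriting the playback period of a carrier sample (channel 1). This is a sketch only. It assumes PAL timing and that one table entry is consumed per tick of channel 0's own period, which may not match the real DMA fetch timing, and the table, carrier and function name are hypothetical stand-ins rather than the demo's data.

```python
import math

PAL_PAULA_CLOCK_HZ = 3_546_895  # assumption: PAL timing

def render_period_modulated(carrier, period_table, table_period, num_samples, out_rate=44_100):
    """Render channel 1 while a looping period table (channel 0) rewrites its period."""
    out = []
    carrier_pos = 0.0  # fractional read position in the carrier sample
    table_pos = 0.0    # fractional read position in the period table
    # Assumption: one table entry is consumed per tick of channel 0's own period.
    table_step = PAL_PAULA_CLOCK_HZ / table_period / out_rate
    for _ in range(num_samples):
        period = period_table[int(table_pos) % len(period_table)]
        out.append(carrier[int(carrier_pos) % len(carrier)])
        # A smaller period means the carrier is read faster, so its pitch rises.
        carrier_pos += PAL_PAULA_CLOCK_HZ / period / out_rate
        table_pos += table_step
    return out

# Hypothetical stand-ins: a sine-shaped period table and a 256-byte sine carrier.
period_table = [int(225 + 75 * math.sin(2 * math.pi * i / 64)) for i in range(64)]
carrier = [round(127 * math.sin(2 * math.pi * i / 256)) for i in range(256)]
modulated = render_period_modulated(carrier, period_table, table_period=2000, num_samples=1000)
```

Lowering `table_period` makes the period table cycle faster, which corresponds to the "variable time" of the modulator mentioned in the EDIT.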

Thomas Richter · 22 November 2023, 10:16

Quote:

Originally Posted by Estrayk

Thank you for your explanations, but I think my question was not well formulated. I really meant that it is possible that the narrator.library uses some kind of wavetable generator with Paula in real time, because it does it too fast. Has anyone monitored the memory at the time of the speech?

It is a parametric sample generator where the samples are generated on the fly from parameters that come from the phonemes. There is even some additional trickery to create smooth transitions between one phoneme and the next, and some parametrized digital filters. It is by no means FM synthesis, and it just uses the Paula DAC generator. It is really not that complex, after all, and most parts of it are even written in C.
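
Since the source is not available, the following is only a guess at what "smooth transitions between one phoneme and the next" could look like at the parameter level: each phoneme is a small set of synthesis parameters, and the frames in between are linearly interpolated. The parameter names and values are hypothetical.

```python
def interpolate_frames(start, end, num_frames):
    """Linearly interpolate every parameter from start to end over num_frames frames."""
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        frames.append({name: start[name] + t * (end[name] - start[name]) for name in start})
    return frames

# Hypothetical parameter sets, roughly the first two formants of "ah" and "ee".
AH = {"f1_hz": 730.0, "f2_hz": 1090.0, "amplitude": 1.0}
EE = {"f1_hz": 270.0, "f2_hz": 2290.0, "amplitude": 0.8}

transition = interpolate_frames(AH, EE, 5)
```

Each frame in `transition` would then drive the sample generator for a few milliseconds, so the filters glide between phonemes instead of jumping.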

All of this comes from SoftVoice, and it was a follow-up product from S.A.M, which was even able to generate synthetic voice on the 8-bit machines (both Atari and C64 were supported), though it took a while to generate an entire sample - not exactly real-time back then.

ross · 22 November 2023, 10:49

Quote:

Originally Posted by Thomas Richter

It is a parametric sample generator where the samples are generated on the fly from parameters that come from the phonemes.

Is the entire phoneme generated or are there small parts of samples?
I guess if it comes from the 8-bit era, then using simple basic waveforms and tones, everything is generated...
(but maybe improvements have been made for chips with DAC..)

I have no idea how complex it actually is, but it's certainly fascinating.
Are there any examples of simple enough (source) code to generate this in real time on a 68000 at 7MHz?

Thomas Richter · 22 November 2023, 17:27

Quote:

Originally Posted by ross

Is the entire phoneme generated or are there small parts of samples?

No, there are no samples at all.

Quote:

Originally Posted by ross

I have no idea how complex it actually is, but it's certainly fascinating.
Are there any examples of simple enough (source) code to generate this in real time on a 68000 at 7MHz?

The source is unfortunately not available, but there are open source speech synthesizers.

nogginthenog · 22 November 2023, 19:26

For those that are interested the way 'say' works (phonemes etc) is actually documented in the manual that comes with the Amiga!

I found this fascinating when I was young.

copse · 22 November 2023, 20:04

Quote:

Originally Posted by nogginthenog

For those that are interested the way 'say' works (phonemes etc) is actually documented in the manual that comes with the Amiga!

I found this fascinating when I was young.

Maybe you mean Using the System Software. Here's a link to the Say command documentation starting on page 284.

paraj · 22 November 2023, 20:22

Quote:

Originally Posted by Karlos

I designed an alternative to 14-bit replay that instead uses Paula volume modulation to gain adjust a preprocessed 8 bit stream (generated from a 16-bit source) composed of frames where the volume of a frame is set by the modulator channel running at some fixed divisor. I mocked it up in software (encode/decode) but someone here - grond maybe? - actually implemented the replay proof of concept on hardware.

https://eab.abime.net/showthread.php?p=1611140
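
The per-frame gain idea in the quoted post can be sketched as a simple block encoder: each frame of 16-bit samples gets one Paula volume (1..64) chosen from its peak, and the samples are rescaled to 8 bits relative to that volume. This is an illustration under assumed details (frame size, rounding, scaling constants, function names), not the implementation from the linked thread.

```python
def encode_frame(frame16):
    """Return (volume, samples8) for one frame of signed 16-bit samples."""
    peak = max(abs(s) for s in frame16)
    # Smallest Paula volume (1..64) whose full-scale output still covers the peak.
    volume = max(1, min(64, -(-peak * 64 // 32768)))
    samples8 = [
        max(-128, min(127, round(s * 64 * 127 / (volume * 32768))))
        for s in frame16
    ]
    return volume, samples8

def decode_frame(volume, samples8):
    """Approximate the 16-bit values that the 8-bit samples reproduce at this volume."""
    return [round(s8 * volume * 32768 / (64 * 127)) for s8 in samples8]

def encode(samples16, frame_size=16):
    """Split a 16-bit stream into frames (hypothetical size) and encode each one."""
    return [
        encode_frame(samples16[i:i + frame_size])
        for i in range(0, len(samples16), frame_size)
    ]
```

Quiet frames end up with a low volume and therefore a much finer step between adjacent 8-bit codes, which is where the extra resolution over plain 8-bit replay would come from.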

Karlos · 22 November 2023, 20:49

Sorry squire, getting folks muddled up. It was indeed your good self.

I wonder if someone with the right equipment could get a quantitative measure of the quality.

nogginthenog · 23 November 2023, 11:12

Quote:

Originally Posted by copse

Maybe you mean Using the System Software. Here's a link to the Say command documentation starting on page 284.

No, it broke down how all the phonemes work etc. Might have been in the AmigaBASIC manual?

Found it! Pages 299..307

https://archive.org/details/Amiga_BA.../n298/mode/1up

copse · 23 November 2023, 21:03

Quote:

Originally Posted by nogginthenog

No, it broke down how all the phonemes work etc. Might have been in the AmigaBASIC manual?

Found it! Pages 299..307

https://archive.org/details/Amiga_BA.../n298/mode/1up

Nice, thanks! I bought a copy of that a few months ago and even read through it, and only your reference to it reminded me of having seen it.. I blame old age!

KONEY · 24 November 2023, 15:09

Quote:

Originally Posted by Paulee_Alex_Bow

I talk about it in my Amiga softsynths video, starting at 7:16

I was going to add how powerful OctaMED SoundStudio Hybrid/SynthSounds are: they feature a scripting language which allows modulation of volume and frequency, even with LFOs. A brief reading of the docs will give an idea: https://docs.google.com/document/d/1...Uf_R8kh1A/edit

@Paulee_Alex_Bow care to share your ASM piece of code?

Paulee_Alex_Bow · 24 November 2023, 16:16

Quote:

Originally Posted by KONEY

I was going to add how powerful OctaMED SoundStudio Hybrid/SynthSounds are: they feature a scripting language which allows modulation of volume and frequency, even with LFOs. A brief reading of the docs will give an idea: https://docs.google.com/document/d/1...Uf_R8kh1A/edit

@Paulee_Alex_Bow care to share your ASM piece of code?

Hey

So I can probably find and paste the code as text later, but for now, if you pause my video at 7:40, it’s on the screen then.

