English Amiga Board - Has anyone ever reverse engineered narrator.device?

Page 1 of 3


- - Has anyone ever reverse engineered narrator.device? (https://eab.abime.net/showthread.php?t=116548)

Nightfox

26 December 2023 14:27

Has anyone ever reverse engineered narrator.device?

Hi all

I was wondering if anyone has ever reverse engineered (or considered reverse engineering) Workbench 1.3's narrator.device to find out exactly how it works, and perhaps to extend it to do things such as output to a file instead of straight to audio, or even to port it to modern platforms?

Also, who actually owns it anymore? It was taken out of later workbenches due to some ownership issue was it not? Is it now public domain?

gulliver

26 December 2023 16:38

Hi,

Speaking from the point of view of AmigaOS development, I can tell you that we still have the framework (translator.library) that ThoR kept lubricating about three or four years ago.

The narrator.device is proprietary software. Its makers are still around, developing voice synthesis applications on modern platforms.

I contacted them (I was not the only one) regarding this old code we have that we could still reuse to build narrator.device, but unfortunately they only wanted to do business, and that means $$$ we do not have.

Someone could reverse engineer the code cleanly, slap an MIT license on it and set it free, and we might be able to pick it up and reuse it for a next iteration of AmigaOS. No promises, but certainly a possibility.

jotd	26 December 2023 17:05

you mean that one ? :)

Code:

        TTL     'NARRATOR.ASM'

*************************************************************************
*                                                                       *
*   Copyright 1990, 1991 Joseph Katz/Mark Barton.  All rights reserved. *
*   No part of this program may be reproduced, transmitted, or stored   *
*   in any language or computer system, in any form, whatsoever,        *
*   without the prior written permission of the authors.                *
*                                                                       *
*   Modification History                                                *
*                                                                       *
*       3/4/91  JK --   Added new subroutine FormSentence to break the  *
*                       input into sentences.  Less pause before        *
*                       speaking and smaller buffers need be allocated.*
*************************************************************************

Zack	26 December 2023 17:10

Related, and could be useful?

A narrator.device emulator has just been released on Github.

StingRay

26 December 2023 17:35

Quote:

Originally Posted by jotd (Post 1660771)

you mean that one ? :)

Code:

    TTL     'NARRATOR.ASM'

*************************************************************************
*                                                                       *
*   Copyright 1990, 1991 Joseph Katz/Mark Barton.  All rights reserved. *
*   No part of this program may be reproduced, transmitted, or stored   *
*   in any language or computer system, in any form, whatsoever,        *
*   without the prior written permission of the authors.                *
*                                                                       *
*   Modification History                                                *
*                                                                       *
*       3/4/91  JK --   Added new subroutine FormSentence to break the  *
*                       input into sentences.  Less pause before        *
*                       speaking and smaller buffers need be allocated.*
*************************************************************************

That's not exactly reverse engineered though. :)

gulliver

26 December 2023 18:25

That is not clean reverse engineering!!!

jotd	26 December 2023 18:27

Quote:

Originally Posted by Zack (Post 1660772)

Related, and could be useful?

A narrator.device emulator has just been released on Github.

nice but it requires musashi (68000 emulator) and the original devices... not exactly open source :)

Quote:

Similar to vamos (Virtual AmigaOS runtime), this is a tool that will run the code from the narrator.device and translator.library. The actual code is emulated using the Musashi 680x0 CPU emulator, while just enough of the OS is simulated by trapping the exec.library calls.
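To make the "trapping the exec.library calls" idea concrete, here is a toy sketch in Python. It is not how the GitHub tool is actually written (the thread does not say); the `FakeExec` class, the register dictionary and the bump allocator are all made up for illustration. The only real-world detail assumed is that exec's AllocMem and FreeMem sit at library offsets -198 and -210, with the size in D0 and the block address in A1.

```python
class FakeExec:
    """Hypothetical host-side stand-in for a few exec.library entry points."""

    LVO_ALLOCMEM = -198  # AllocMem(byteSize in D0, requirements in D1) -> D0
    LVO_FREEMEM = -210   # FreeMem(memoryBlock in A1, byteSize in D0)

    def __init__(self, heap_start=0x10000):
        self.next_free = heap_start  # trivial bump allocator, never reuses memory
        self.live = {}               # address -> size of blocks still allocated
        self.traps = {
            self.LVO_ALLOCMEM: self.alloc_mem,
            self.LVO_FREEMEM: self.free_mem,
        }

    def alloc_mem(self, regs):
        size = regs["d0"]
        addr = self.next_free
        self.next_free += (size + 7) & ~7  # round each block up to 8 bytes
        self.live[addr] = size
        regs["d0"] = addr                  # result goes back in D0

    def free_mem(self, regs):
        self.live.pop(regs["a1"], None)

    def call(self, lvo, regs):
        """Called by the (not shown) CPU emulator when emulated code jumps to lvo(A6)."""
        handler = self.traps.get(lvo)
        if handler is None:
            raise NotImplementedError(f"untrapped exec call at offset {lvo}")
        handler(regs)
        return regs
```

In a real setup the CPU emulator would detect the jump into the library's jump table and hand the emulated registers to something like `call()`; everything the device never calls can stay unimplemented.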

Quote:

I contacted them (I was not the only one) regarding this old code we have that we could still reuse to build narrator.device, but unfortunately they only wanted to do business, and that means $$$ we do not have.

Seriously? Some modern text-to-speech tools are also free and open source (Kaldi? VOSK?). Screw them. Unless you want to create Amiga software. In which case there is no $$$ to make... Use the original leaked source!

Thomas Richter

26 December 2023 20:12

Quote:

Originally Posted by Nightfox (Post 1660758)

Also, who actually owns it anymore? It was taken out of later workbenches due to some ownership issue was it not? Is it now public domain?

The narrator.device is proprietary source code and owned by SoftVoice. CBM, for some reason, never had a valid source code license according to SoftVoice. (CBM, at its best, of course). SoftVoice is still in business. And no, "screw them" is definitely *not* the approach to follow. If anyone thinks that it is "quite obvious" what the device does, then please write an implementation yourself and provide it.

The narrator.device creates human voices from phonemes: base frequencies filtered by a model of the human vocal tract, plus quite some postprocessing to smooth the transitions between the phonemes as they are used in the English language. The trick seems to be less about generating the right signals; to make the voice sound halfway realistic, more than just generating the phonemes is necessary.
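As a rough illustration of "base frequencies filtered by a model of the vocal tract", here is a minimal source-filter sketch in Python. It is a generic textbook technique (an impulse train fed through cascaded two-pole resonators), not the narrator.device algorithm, and all names and numbers in it are assumptions: the formant frequencies are rough textbook values for an "ah" vowel and the bandwidths are just plausible guesses.

```python
import math

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (two-pole resonator)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a = 1.0 - b - c  # unity gain at 0 Hz
    return a, b, c

def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Run the signal through one resonator (one formant)."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(pitch_hz, duration_s, sample_rate):
    """Crude glottal source: one impulse per pitch period."""
    period = round(sample_rate / pitch_hz)
    n = round(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synth_vowel(formants, pitch_hz=110.0, duration_s=0.3, sample_rate=22050):
    """Source signal filtered by a cascade of formant resonators, peak-normalised."""
    signal = impulse_train(pitch_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]

# Illustrative (frequency, bandwidth) pairs in Hz for an "ah"-like vowel.
AH_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]
samples = synth_vowel(AH_FORMANTS)
```

This yields a static, buzzy vowel. Everything described above as the hard part (transitions between phonemes, pitch contour, timing) is exactly what such a sketch leaves out.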

Thus, the device as such is *not* suitable to speak in any other language. It seems that SoftVoice also offers Spanish for their later products, but that is of course not sufficient for full international support.

This being said, I would rather say that the narrator.device was more of a showcase for the Amiga (as it was for the Mac: same code, also from SoftVoice) than something that provided a valuable service. It was a nice feature, but is not really critical.

Thomas Richter

26 December 2023 20:13

Quote:

Originally Posted by jotd (Post 1660779)

Use the original leaked source!

No, definitely don't. Or how would you like it if I would just steal your property?

Karlos

26 December 2023 20:18

Could something like festival be ported and given a narrator.device compatible interface?

jotd	26 December 2023 20:39

Quote:

Originally Posted by Thomas Richter (Post 1660796)

No, definitely don't. Or how would you like it if I would just steal your property?

Yeah right, reverse engineering the original binary isn't stealing anyone's property.

Neither is zoning IPF images or cracked ADFs, or creating a huge T***N server with all games & apps on it.

If someone rebuilt a valid ROM file from the original Kickstart sources, that would be a problem under this site's conventions (which forbid sharing ROM images because they're still sold), but rebuilding & tuning a tiny part of it? Come on.

redblade

26 December 2023 21:24

Quote:

Originally Posted by jotd (Post 1660800)

Neither is zoning IPF images or cracked ADFs, or creating a huge T***@N server with all games & apps on it.

Please don't paint a target on a useful source.

paraj

26 December 2023 21:27

A "traditional" "clean room" RE of narrator.device seems infeasible, as far as I understand the process, since you can't just resource it and call it a day. You'd need someone to RE the original device and document what it does in enough detail for someone else to re-create it from that documentation. Seems like too much effort for something - that while cool for v1.x - hasn't really been used since. Using another text-to-speech engine seems beside the point to me - we'd want it for the original (nostalgic) sound, right? If it's a new engine just create a new library/device. Copying say+translator.library+narrator.device from KS1.3 still works (and same sort of thing works in whdload).

Matt_H

26 December 2023 22:53

There’s flite.device for OS4, which is functionally similar. I don’t know how much work would be required (or if it would even be possible) to turn it into a binary-/API-compatible drop-in replacement for narrator.device. But it’s available under a much more permissive license.

Bruce Abbott

27 December 2023 08:54

Quote:

Originally Posted by paraj (Post 1660805)

A "traditional" "clean room" RE of narrator.device seems infeasible, as far as I understand the process, since you can't just resource it and call it a day.

I think it's quite feasible. Reverse-engineering is generally legal. According to the EU Computer Programs Directive a program may be decompiled 'if this is necessary to ensure it operates with another program or device'. It could also fall under 'fair use' if used for educational purposes, e.g. to find out how such things work.

Distributing the disassembly would probably be considered a copyright violation because it could be reassembled to identical object code, but you could distribute an explanation of how the code works as this would be all your own work.

However...

Quote:

Using another text-to-speech engine seems beside the point to me - we'd want it for the original (nostalgic) sound, right?

Yes, we would. To ensure that the original sound is preserved the device should not be modified. If someone wants to make an 'improved' narrator device then they should do it from scratch, creating a unique (and hopefully better) implementation which will not sound the same. This could be an interesting project.

However the OP asks about reverse engineering the narrator device to 'extend it' to do things such as 'output to file instead of just outputting straight to audio'. For something like that the device code may not have to be touched at all. I'm betting it outputs through the audio device, so to redirect the audio output to a file you would just have to capture the audio device commands and data. You could do this by patching DoIO etc. without touching the narrator code.
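Assuming the sample data has already been captured somehow (the patching itself is not shown here), turning it into a file is the easy part. The sketch below is host-side Python, not Amiga code: `write_capture`, the chunk list and the period value are hypothetical. It relies on two facts about the Amiga's audio hardware: samples are signed 8-bit, and the playback rate is the Paula clock divided by the period.

```python
import wave

PAL_CLOCK_HZ = 3546895  # Paula clock on PAL machines (NTSC would be 3579545)

def period_to_rate(period, clock_hz=PAL_CLOCK_HZ):
    """Convert an audio period value to a sample rate in Hz."""
    return round(clock_hz / period)

def signed8_to_unsigned8(data):
    """Amiga samples are signed 8-bit; 8-bit WAV is unsigned, so flip the sign bit."""
    return bytes(b ^ 0x80 for b in data)

def write_capture(target, chunks, period):
    """Write captured sample chunks to a mono 8-bit WAV.

    target may be a filename or a writable binary file object;
    chunks is an iterable of bytes objects in playback order.
    """
    rate = period_to_rate(period)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(1)
        wav.setframerate(rate)
        for chunk in chunks:
            wav.writeframes(signed8_to_unsigned8(chunk))
    return rate
```

For example, `write_capture("speech.wav", captured_chunks, period=160)` would write a file at roughly 22 kHz. Whether the device really sends plain write requests with a fixed period is the assumption being bet on above, so the capture side would have to confirm that first.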

For other things you might need to modify the code. When I want to do this I prefer to patch it rather than changing the source code and reassembling it (which might cause bugs). If code needs to be added I append a hunk to the executable and apply patches that jump into the code in the extra hunk. This could even be done on-the-fly so that no changes are made to the device file on disk.

Minuous

27 December 2023 09:35

Quote:

Originally Posted by Thomas Richter (Post 1660795)

Thus, the device as such is *not* suitable to speak in any other language.

I would imagine that translator.library which works at the ASCII text level would need localization more than narrator.device which can speak just about any phoneme and can already be given accents.

rzookol

27 December 2023 10:17

Quote:

Originally Posted by Minuous (Post 1660856)

I would imagine that translator.library which works at the ASCII text level would need localization more than narrator.device which can speak just about any phoneme and can already be given accents.

But there is already: http://aminet.net/util/libs/translator42.lha with localization support.

Thomas Richter

27 December 2023 10:29

Quote:

Originally Posted by Minuous (Post 1660856)

I would imagine that translator.library which works at the ASCII text level would need localization more than narrator.device which can speak just about any phoneme and can already be given accents.

No, I'm afraid that is not correct. As stated before, the narrator *cannot* just generate *any* phoneme. It can only generate English phonemes (first problem), and second, the processing it does on the phonemes is only suitable for English (or rather, American English). It is very much bound to American speech.

For the first point: it definitely lacks phonemes that are present in German. German has two types of "ch" sounds (not just the one present in the narrator), and German has "umlaute"; in particular the "ü" is missing. Nor is there a German "r" (which is probably unspeakable for Americans). But the narrator has many diphones that are not present in German. German has very few diphones compared to English. I have heard the funniest sounds when Americans attempt to pronounce my last name, as it combines all the difficulties of German: two German "R"s and a German "ch", plus a glottal stop - not possible, unless you are German. American works differently.

For the second: the narrator aims at generating a more or less "natural" melody (altering the pitch) in the sentence. This is built into narrator, and not into translator. It also alters the sound of phonemes according to their neighboring phonemes, which is quite typical for English, but rather untypical for German.

There is, actually on Aminet, an "internationalized" version of translator you can try to generate phonemes from text in another language. I played a lot with it, but the result sounds more or less like an American attempting to speak German. Funny, but not realistic. Much more needs to be done to make narrator speak in any other language, and even its makers (SoftVoice, Inc) only partially support other languages (Spanish they do, but not German). It requires some research and some support from native speakers to get the voice melody right, and you do not quite have the right feeling for a language without having spoken it for quite a while.

My spoken English ("American" actually) is probably quite ok, but you would still recognize my accent, so I would not be the right person to create algorithms for American speech, same as SoftVoice not being in the right position (without native support) for German.

Thomas Richter

27 December 2023 10:42

Quote:

Originally Posted by Bruce Abbott (Post 1660847)

I think it's quite feasible. Reverse-engineering is generally legal.

No, it is "generally illegal" in an appropriate (legal) use of the word "general". It is only legal in certain exceptions (thus not "general"), and only if you want to establish an interface to an otherwise closed system. Thus, for example, it would be legal to reverse engineer a network protocol such as SMB to create an SMB client on Amiga (except that SMB is documented) or to reverse engineer the Word file format (except that it is now documented) to be able to read Word documents.

But the "narrator" does not have such an interface. The two interfaces it has (phonemes in - audio out) are perfectly open and documented, so nothing in that direction would require any type of reverse engineering.

What you could do is a patent research and see whether any of the algorithms the narrator depends upon were patented. If so, they should be usable without fee by now as the patents would have run out.

If that is not the case, you are stuck. It is a proprietary product by a private entity, and it is theirs, not yours. If you want to have a text to speech engine, there are (as already pointed out) alternatives you can pick from with more liberal licenses.

Quote:

Originally Posted by Bruce Abbott (Post 1660847)

Distributing the disassembly would probably be considered a copyright violation because it could be reassembled to identical object code, but you could distribute an explanation of how the code works as this would be all your own work.

The act of obtaining that information would still be illegal since it does not fall under the above exception clause. Anyhow, this would help little. The tricky part is really not getting "how" the code works, but "why" it works the way it works, and the disassembly (and even the source code) provides very few details of that. It requires the knowledge of one of its authors or any other expert in speech synthesis to make this code suitable for producing other languages, or to extend it.

Quote:

Originally Posted by Bruce Abbott (Post 1660847)

However the OP asks about reverse engineering the narrator device to 'extend it' to do things such as 'output to file instead of just outputting straight to audio'. For something like that the device code may not have to be touched at all.

Indeed, and that is a simple exercise. It is an "audio-lookalike" (or "hear-alike"?) of the CMD utility which redirects printer output to disk, and that is a relatively easy exercise.

jotd	27 December 2023 10:50

If you don't like illegal stuff, stay away from EAB, as most of the stuff exchanged here is "abandonware".

All times are GMT +2. The time now is 17:02.
