Sound overhaul for TKG

Karlos · 05 June 2024, 14:00

Hi,

I decided to split this off from the mega-thread for TKG (https://eab.abime.net/showthread.php?t=111090) to try and keep it a bit on topic. For some time now, I've thought that the reworked TKG engine would benefit from an overhaul to the sound system. To date, it's pretty much a carbon copy of the AB3D1 original, with the following attributes:

* 8 kHz, 7-bit samples
* 1 channel module playback for music
* 4 or 8 channels (the 8 channel option is where the 7-bit sample restriction comes from)
* Completely coupled to Paula for playback.

All the recent audio tech talk courtesy of @saimo and friends in the Hertz Overload has inspired me to revisit this area.

What I would like, in no particular order, is:
* Fire and forget programming model that is abstract enough to allow alternative outputs to be implemented while thin enough not to waste CPU through too much indirection.
* Paula first, AHI in the future.
* Faster sample rates
* Better dynamics
* 8-bit sample resolution
* Stream-based audio for music
* Moar channels
* Acceptable 3Dish stereo positioning.

It goes without saying that this will increase CPU requirements and as such, should be optional. I'm probably going to target 68040 as a minimum requirement for this, in part because that's pretty much the minimum realistic requirement for the game, but also because I have some thoughts on leveraging the caches better.

My thoughts are that the method for triggering a sound should include the following key parameters:
* Sound Data Pointer
* Sound Length
* Volume
* Priority
* Stationary vs Moving source type
* Coordinate Pointer

These attributes will be put into a structure, one per channel, along with other state information such as the current data position, emitter coordinates, etc. When all channels are busy, the channel with the lowest priority, or the one closest to completion, can be reallocated.
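A minimal sketch of the per-channel structure and the allocation rule above, in C. The field names, the `allocate_channel` helper and the exact tie-break (lowest priority first, then fewest samples remaining, and never stealing from a higher-priority sound) are assumptions for illustration, not anything taken from the TKG source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel state; names are illustrative only. */
typedef struct {
    const int8_t  *data;      /* sound data pointer (8-bit signed) */
    uint32_t       length;    /* sound length in samples */
    uint32_t       position;  /* current play position in samples */
    uint8_t        volume;    /* base volume */
    uint8_t        priority;  /* higher value = more important */
    uint8_t        moving;    /* 0 = stationary, 1 = moving source */
    const int16_t *coords;    /* coordinate pointer (format assumed) */
    int16_t        x, z;      /* cached emitter coordinates */
} Channel;

static int channel_active(const Channel *c) {
    return c->data != NULL && c->position < c->length;
}

/* Pick a channel for a new sound: an idle one if available, otherwise the
   active channel with the lowest priority, ties broken by fewest samples
   remaining. Returns -1 if every active channel outranks the new sound. */
int allocate_channel(const Channel *ch, int count, uint8_t priority) {
    int best = -1;
    uint32_t best_left = 0;
    for (int i = 0; i < count; i++) {
        if (!channel_active(&ch[i]))
            return i;
        uint32_t left = ch[i].length - ch[i].position;
        if (best < 0 ||
            ch[i].priority < ch[best].priority ||
            (ch[i].priority == ch[best].priority && left < best_left)) {
            best = i;
            best_left = left;
        }
    }
    if (best >= 0 && ch[best].priority > priority)
        return -1;
    return best;
}
```

Whether "closest to completion" should outrank priority or only break ties is a design choice; this sketch treats it as a tie-break.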

Assumptions:
* All samples are the same sample rate as the mixer.
* All source sound data, buffers and tables will be allocated 16-byte aligned and be a multiple of 16 bytes in length.
* The minimum time granularity for any new sound effect starting will be 16 samples (8-bit), which is 2ms at 8kHz (the minimum rate).
* Sound data will be mixed into a Packet that is some multiple of 16 in length that is a good fit for the 50Hz fixed interrupt, e.g. if the sound rate is 16kHz, we'd be generating a Packet of 320 output samples, which is 20 blocks of 16. TBC.
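The packet arithmetic above can be checked with a couple of small helpers. This is a sketch; rounding a packet up to a whole number of 16-sample blocks is an assumption, since the post leaves the exact sizing TBC. Rates that are a multiple of 800 Hz (50 × 16) fit exactly, as 8kHz and 16kHz do.

```c
#include <stdint.h>

#define BLOCK_SAMPLES 16u   /* mixing granularity from the post */
#define TICK_HZ       50u   /* fixed interrupt rate */

/* Output samples per 50 Hz tick, rounded up to whole 16-sample blocks. */
uint32_t packet_samples(uint32_t rate_hz) {
    uint32_t per_tick = (rate_hz + TICK_HZ - 1) / TICK_HZ;
    return (per_tick + BLOCK_SAMPLES - 1) & ~(BLOCK_SAMPLES - 1);
}

/* Duration of one 16-sample block in microseconds. */
uint32_t block_us(uint32_t rate_hz) {
    return (BLOCK_SAMPLES * 1000000u) / rate_hz;
}
```

For example, `packet_samples(16000)` gives 320 (20 blocks) and `block_us(8000)` gives 2000, matching the 2ms figure. A non-multiple such as 22050 Hz (441 samples per tick) gets rounded up to 448, so any surplus would have to be absorbed somewhere.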

The mixing buffer will be an L/R pair of 16-bit values, and there will be a fetch buffer for the next 16 8-bit samples. The approach to mixing a Packet shall be:

* If there is a music stream, fill the L/R pair with the next samples, otherwise fill with silence.
* For each active channel:
  * If the source is moving, update the emitter coordinates from the pointer.
  * Determine the distance and position relative to the player, in order to derive a pair of left/right volume values in the range 0-N. TBC on the value of N, but 15 is probably granular enough.
  * Transfer the next 16 8-bit samples from the source into the fetch buffer.
  * For each sample in the fetch buffer, look up the 16-bit value for the given L/R volumes in a set of precalculated tables.
  * Accumulate the calculated 16-bit samples into the L/R mix buffers.
  * Update active channel state data (position etc).
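The inner per-channel steps could be sketched roughly like this in C. Assumptions: `mix_block` and its parameters are hypothetical names, the volume-table rows are indexed by the raw sample byte, `memcpy` stands in for the fetch (the post suggests move16 later), and the post doesn't say how mix overflow is handled, so this version saturates.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Saturating 16-bit add (overflow handling is an assumption). */
static int16_t sat_add16(int16_t a, int16_t b) {
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Mix the next BLOCK samples of one channel into the L/R mix buffers.
   row_l/row_r are the 256-entry volume-table rows for this channel's
   current left/right volumes, indexed by the raw sample byte.
   Returns the number of source samples consumed (0 once finished). */
int mix_block(int16_t mix_l[BLOCK], int16_t mix_r[BLOCK],
              const int8_t *src, uint32_t *position, uint32_t length,
              const int16_t *row_l, const int16_t *row_r) {
    int8_t fetch[BLOCK];
    uint32_t remaining = length - *position;
    int n = remaining < BLOCK ? (int)remaining : BLOCK;

    /* Fetch stage; a short final block is padded with silence. */
    memset(fetch, 0, sizeof fetch);
    memcpy(fetch, src + *position, (size_t)n);

    for (int i = 0; i < BLOCK; i++) {
        uint8_t idx = (uint8_t)fetch[i];
        mix_l[i] = sat_add16(mix_l[i], row_l[idx]);
        mix_r[i] = sat_add16(mix_r[i], row_r[idx]);
    }
    *position += (uint32_t)n;   /* update channel state */
    return n;
}
```

With source data padded to multiples of 16 bytes, as assumed above, the short-block path would never actually trigger.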

Note that the volume tables that convert the incoming 8-bit data to 16-bit can be recalculated based on some notion of a global sound effects volume control.
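A sketch of regenerating those tables from a global effects volume. The 16 volume levels (N = 15) follow the post; the linear scaling and the 0..256 range for the global control (256 = unity) are assumptions.

```c
#include <stdint.h>

#define VOL_LEVELS 16   /* volumes 0..15 (N = 15) */

/* Fill tab[v][(uint8_t)s] with sample s scaled by v/15 and by a global
   effects volume in 0..256 (256 = unity). Linear scaling is assumed. */
void build_vol_tables(int16_t tab[VOL_LEVELS][256], int global) {
    for (int v = 0; v < VOL_LEVELS; v++) {
        for (int s = -128; s < 128; s++) {
            int32_t full = (int32_t)s * 256;   /* 8-bit sample -> 16-bit range */
            int32_t val = full * v / (VOL_LEVELS - 1) * global / 256;
            tab[v][(uint8_t)s] = (int16_t)val;
        }
    }
}
```

At 16 x 256 x 2 bytes the full set is 8KB, so a few of the rows in use at any moment would sit comfortably in a 68040 data cache.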

For the distance and position, we can use basic Pythagoras to determine the squared distance and then look that up in our existing 1/N table that's used for a bunch of other stuff. Assuming an inverse-square attenuation law, we don't need any square roots. For the angle, we can use basic diamond angle calculation, which doesn't need any trig and calculates a value in the range 0-4 (in fixed point, scaled by some higher power of 2 for precision). These two numbers can be used to evaluate the left/right volume based on the sound origin relative to the player at this moment in time. We can probably take a fair few shortcuts here if our left/right volume levels have restricted precision.
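A sketch of the distance and angle step in C. Assumptions: the emitter offset has already been rotated into the listener's frame (+x right, +y ahead), a plain division replaces the existing 1/N table lookup, `REF_DIST2` is a hypothetical full-volume radius, and the pan law (full volume on the near side, linear falloff on the far side) is just one option.

```c
#include <stdint.h>

#define VOL_MAX 15                       /* N = 15, per the post */
#define REF_DIST2 ((int64_t)512 * 512)   /* hypothetical full-volume radius, squared */

/* Diamond angle in fixed point: 0..1023 stands for 0..4 around the
   circle (0 = +x, 256 = +y, 512 = -x, 768 = -y). No trig needed.
   Coordinates are assumed to be well within +/-2^23. */
static int32_t diamond_angle(int32_t x, int32_t y) {
    if (x == 0 && y == 0)
        return 256;                      /* on top of the listener: dead ahead */
    if (y >= 0)
        return x >= 0 ? (y * 256) / (x + y)
                      : 256 + (-x * 256) / (-x + y);
    return x < 0 ? 512 + (-y * 256) / (-x - y)
                 : 768 + (x * 256) / (x - y);
}

/* dx/dy: emitter position relative to the listener (+x right, +y ahead). */
void spatialise(int32_t dx, int32_t dy, int *vol_l, int *vol_r) {
    int64_t d2 = (int64_t)dx * dx + (int64_t)dy * dy;   /* Pythagoras, no sqrt */
    int32_t gain = d2 <= REF_DIST2
                 ? VOL_MAX
                 : (int32_t)(VOL_MAX * REF_DIST2 / d2); /* inverse-square falloff */
    int32_t a = diamond_angle(dx, dy);
    int32_t pan = a <= 512 ? 256 - a : a - 768;         /* +256 right, -256 left */
    *vol_l = (int)(gain * (pan > 0 ? 256 - pan : 256) / 256);
    *vol_r = (int)(gain * (pan < 0 ? 256 + pan : 256) / 256);
}
```

Because the diamond angle is piecewise linear, the pan it produces is only an approximation of a true angular pan, which fits the restricted 0-15 volume precision.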

Interestingly, if the 8-bit sample data were delta encoded, then the 16-bit volume lookups would tend to cluster around the centre of that row in the lookup table, increasing the cache hit probability. Applying the volume to the delta or to the absolute value has the same effect as long as the volume is unchanging. The problem arises when it does change: you'd need to account for that in your running value, which might make the whole thing more complicated than needed. Might be a fun optimisation to figure out later.
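A sketch of the delta idea, showing that a constant volume applied to each delta and accumulated gives the same result as applying it to the absolute sample. Assumptions: neighbouring samples differ by at most 127 (a wrapped 8-bit delta would break the scaled sum), the volume is in 0..256, and a multiply stands in for the table lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Delta-encode 8-bit samples (first delta is relative to 0). Returns 0
   if any step doesn't fit in an int8_t. */
int delta_encode(const int8_t *in, int8_t *out, size_t n) {
    int prev = 0;
    for (size_t i = 0; i < n; i++) {
        int d = in[i] - prev;
        if (d < INT8_MIN || d > INT8_MAX)
            return 0;
        out[i] = (int8_t)d;
        prev = in[i];
    }
    return 1;
}

/* Decode with the volume applied to each delta rather than to the
   absolute sample. With a constant vol, out[i] == vol * in[i]. If vol
   changed mid-stream, acc would still hold the old scaling: that is
   the extra bookkeeping mentioned above. */
void decode_scaled(const int8_t *delta, int16_t *out, size_t n, int vol) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)vol * delta[i];   /* a table lookup would replace this */
        out[i] = (int16_t)acc;
    }
}
```

With table lookups instead of exact multiplies, per-entry rounding would also accumulate in the running value, which is another reason this is best left as a later optimisation.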

Next is where it gets potentially interesting for Paula:

For the 16 samples just calculated in the packet after mixing all active channels, determine how many arithmetic left shifts are needed to approximately normalise those 16 samples, rounding the peak up to the next power of 2. For example, if the largest absolute sample value was 7000, the next power of 2 is 8192, which needs multiplying by 4 to normalise, so each of the 16 values is arithmetically shifted left 2 places. Finally we take 64, which is our full playback volume, and right shift it by the same number of places (here giving 16). This gives us a playback volume attenuation to account for the normalisation. We have to limit the maximum shift to 6 (64 >> 6 = 1) and accept that 1 is the lowest volume the packet can be played at, letting the precision die off as the 16-bit value approaches 0.
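The normalisation step sketched in C. `normalise_block` is a hypothetical name; it picks the largest shift (capped at 6) that cannot overflow 16 bits, which gives the same answer as the 7000 example above. A multiply is used instead of `<<` because left-shifting a negative value is undefined in C, whereas on the 68k an `asl` would do the job directly.

```c
#include <stdint.h>

#define BLOCK 16
#define MAX_SHIFT 6   /* 64 >> 6 == 1, the quietest Paula volume */

/* Normalise one 16-sample block in place. Returns the shift applied and
   stores the matching Paula volume (64 >> shift) in *paula_vol. */
int normalise_block(int16_t s[BLOCK], int *paula_vol) {
    int32_t peak = 0;
    for (int i = 0; i < BLOCK; i++) {
        int32_t a = s[i] < 0 ? -(int32_t)s[i] : s[i];
        if (a > peak) peak = a;
    }
    int shift = 0;
    while (shift < MAX_SHIFT && (peak << (shift + 1)) <= INT16_MAX)
        shift++;
    for (int i = 0; i < BLOCK; i++)
        s[i] = (int16_t)(s[i] * (1 << shift));
    *paula_vol = 64 >> shift;
    return shift;
}
```

A silent block ends up with shift 6 and volume 1, and a block containing -32768 is left untouched at volume 64.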

The plan for Paula playback is that all 4 channels are used, all DMA mode. Two for the L/R audio and the remaining pair as channel volume modulators. The latter will be fed an effective 1/16th rate of volume data. The upper 8 bits of the normalised sample values will be transferred to the Chip RAM buffers Paula will play from next and the corresponding volume words transferred to the buffers the modulator channels are using. This approach is a simplification of one I've experimented with previously. The hope is that this will be less demanding than 14-bit playback while delivering sound of comparable quality.
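A sketch of handing one normalised block over to the Paula buffers: the upper 8 bits of each sample go to the byte buffer an audio channel plays from, and one volume word goes to the buffer a modulator channel reads. `emit_block` is a hypothetical name, one volume word per 16-sample block is an assumption about how the "1/16th rate" will be arranged, and the modulator channel setup itself is left out.

```c
#include <stdint.h>

#define BLOCK 16

/* Write the upper 8 bits of each normalised sample to the Chip RAM
   sample buffer and one Paula volume word (0..64) to the modulator
   buffer. Adding 32768 first keeps the shift on a non-negative value,
   giving floor(n / 256) without relying on implementation-defined
   right shifts of negative numbers. */
void emit_block(const int16_t norm[BLOCK], int paula_vol,
                int8_t *chip_samples, uint16_t *chip_vol) {
    for (int i = 0; i < BLOCK; i++)
        chip_samples[i] = (int8_t)((((int32_t)norm[i] + 32768) >> 8) - 128);
    *chip_vol = (uint16_t)paula_vol;
}
```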

All the emphasis on 16-byte alignment for buffers opens up the possibility of using non-cache-altering move16 transfers for source reads from the sample data, while keeping the transfer and mix buffers entirely in the data cache during the process.

For all this to work, the game properties file (the mod extension) will require some updates that allow the engine to choose the appropriate sound system and initialise it accordingly. Ideally I'd like to at least double the playback rate to 16kHz, but in theory there's no reason why those on PiStorm couldn't just .

I appreciate this all sounds a bit ambitious, but it does have the advantage that it can be implemented entirely greenfield as a separate project and then retrofitted, so I am hoping it's achievable.

Karlos · 05 June 2024, 14:19

I realise I wasn't explicit before, but you can assume that the audio stream for music is already 16-bit. If it's 8-bit, we'd have to look it up in the volume table. I suspect a basic 8-bit audio stream would not sound so great, however, unless it's some cleverly compressed format.

roondar · 05 June 2024, 17:35

Just in case it is helpful as a starting point, you could check out my Audio Mixer (https://powerprograms.nl/projects/audio_mixer.html). It provides several of the features you're asking for and might save you some time coding, as the full source code is included.

Of course, it may not be an exact fit, but I figured it wouldn't hurt to share the option.

Karlos · 05 June 2024, 17:43

Thanks! As it goes, I think this is something I want to have a first principles stab at because the problem domain interests me. When I get bogged down and close to quitting it's good to know there's a proven solution already!

Karlos · 05 June 2024, 18:52

@roondar

Your mixer looks really good btw. If I get the abstraction and decoupling right, it might be an ideal replacement for the existing audio system irrespective of anything else I decide to do.

pipper · 05 June 2024, 19:31

DoomSound.library might also be worth a look?

Karlos · 05 June 2024, 19:41

Yeah, that's another established option.

The thing that is motivating me right now is DMA mode volume modulated Paula output - specifically the idea of getting better dynamic range out of it. If it turns out to be a turkey, we can go for a more conventional solution.

roondar · 05 June 2024, 21:27

Quote:

Originally Posted by Karlos

@roondar Your mixer looks really good btw. If I get the abstraction and decoupling right, it might be an ideal replacement for the existing audio system irrespective of anything else I decide to do.

Cool, I hope it fits your requirements well if you do end up using it.