English Amiga Board


Old 05 June 2024, 14:00   #1
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
Sound overhaul for TKG

Hi,

I decided to split this off from the mega-thread for TKG (https://eab.abime.net/showthread.php?t=111090) to try and keep it a bit on topic. For some time now, I've thought that the reworked TKG engine would benefit from an overhaul to the sound system. To date, it's pretty much a carbon copy of the AB3D1 original, with the following attributes:

* 8 kHz, 7-bit samples
* 1 channel module playback for music
* 4 or 8 channels (the 8 channel option is where the 7-bit sample restriction comes from)
* Completely coupled to Paula for playback.

All the recent audio tech talk courtesy @saimo and friends in the Hertz Overload has inspired me to revisit this area.

What I would like, in no particular order, is:
* Fire and forget programming model that is abstract enough to allow alternative outputs to be implemented while thin enough not to waste CPU through too much indirection.
* Paula first, AHI in the future.
* Faster sample rates
* Better dynamics
* 8-bit sample resolution
* Stream-based audio for music
* Moar channels
* Acceptable 3Dish stereo positioning.

It goes without saying that this will increase CPU requirements and as such, should be optional. I'm probably going to target 68040 as a minimum requirement for this, in part because that's pretty much the minimum realistic requirement for the game, but also because I have some thoughts on leveraging the caches better.

My thoughts are that the method for triggering a sound should include the following key parameters:
* Sound Data Pointer
* Sound Length
* Volume
* Priority
* Stationary vs Moving source type
* Coordinate Pointer

These attributes will be put into a structure, one per channel, along with other state information, such as the current data position, emitter coordinates, etc. When all channels are full, the channel that has the lowest priority / closest to completion can be allocated.
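
As a rough illustration of the allocation rule above, here is a minimal C sketch. The `Channel` layout, its field names and `pick_channel` are all hypothetical, and it assumes that a larger priority value means a more important sound; none of it is taken from the game's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel state; field names are illustrative only. */
typedef struct {
    const int8_t  *data;      /* current sample position, NULL when idle */
    uint16_t       remaining; /* samples left to play */
    uint8_t        volume;
    uint8_t        priority;  /* assumption: higher value = more important */
    uint8_t        moving;    /* nonzero if the emitter coordinates must be re-read */
    const int16_t *coords;    /* pointer to the emitter coordinates */
} Channel;

/* Pick a channel for a new sound: any idle channel first, otherwise the
 * lowest-priority one, breaking ties by fewest samples remaining. */
static int pick_channel(const Channel ch[], int count)
{
    assert(count > 0);
    int best = 0;
    for (int i = 0; i < count; ++i) {
        if (ch[i].data == NULL) {
            return i; /* free channel, no need to steal */
        }
        if (ch[i].priority < ch[best].priority ||
            (ch[i].priority == ch[best].priority &&
             ch[i].remaining < ch[best].remaining)) {
            best = i;
        }
    }
    return best;
}
```

Whether a new sound should be allowed to steal a channel from a higher-priority one is left open here; the sketch always returns a candidate and leaves that decision to the caller.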

Assumptions:
* All samples are the same sample rate as the mixer.
* All source sound data, buffers and tables will be allocated 16-byte aligned and be a multiple of 16 bytes in length.
* The minimum time granularity for any new sound effect starting will be 16 samples (8 bit), which is 2ms at 8kHz (the minimum rate).
* Sound data will be mixed into a Packet that is some multiple of 16 in length that is a good fit for the 50Hz fixed interrupt, e.g. if the sound rate is 16kHz, we'd be generating a Packet of 320 output samples, which is 20 blocks of 16. TBC.
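
The packet arithmetic in these assumptions can be sketched in C as follows. The function names are made up for illustration, and rounding down to a whole number of 16-sample blocks is an assumption, since the exact packet sizing is still TBC.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SAMPLES 16u /* one 16-byte line of 8-bit samples */

/* Samples per packet for a given mixer rate and interrupt rate, rounded
 * down to a whole number of 16-sample blocks (assumed rounding). */
static uint32_t packet_size(uint32_t sample_rate_hz, uint32_t update_rate_hz)
{
    assert(update_rate_hz > 0);
    uint32_t samples = sample_rate_hz / update_rate_hz;
    return samples & ~(BLOCK_SAMPLES - 1u);
}

/* Duration of one 16-sample block in microseconds. */
static uint32_t block_duration_us(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (BLOCK_SAMPLES * 1000000u) / sample_rate_hz;
}
```

For example, `packet_size(16000, 50)` gives the 320 samples mentioned above and `block_duration_us(8000)` gives the 2ms granularity.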

The mixing buffer will be a L/R pair of 16-bit values and there will be a fetch buffer for the next 16 8-bit samples. The approach to mixing a Packet shall be:

* If there is a music stream, fill the L/R pair with the next samples, otherwise fill with silence.
* For each active channel:
  * If the source is moving, update the emitter coordinates from the pointer
  * Determine the distance and position relative to the player, in order to derive a pair of left/right volume values, in the range 0-N. TBC on the value of N, but 15 is probably granular enough.
  * Transfer the next 16 8-bit samples from the source into the fetch buffer.
  * For each sample in the fetch buffer, look up the 16-bit value for the given L/R volumes in a set of precalculated tables.
  * Accumulate the calculated 16-bit samples into the LR mix buffers.
  * Update active channel state data (position etc).

Note that the volume tables that convert the incoming 8-bit data to 16-bit can be recalculated based on some notion of a global sound effects volume control.
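
The table lookup and accumulate steps could look roughly like this in C. It is only a sketch: the table scaling (full volume maps a sample to 1/16 of the 16-bit range, so 16 channels cannot overflow) is an assumption, as are the names `vol_table`, `build_tables` and `mix_block`.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK  16 /* samples per fetch */
#define LEVELS 15 /* assumed: volume levels 1..15, level 0 is silent and skipped */

/* vol_table[v - 1][s] is the 16-bit value of sample byte s at volume v. */
static int16_t vol_table[LEVELS][256];

/* Build the tables. Assumed scale: level 15 multiplies the sample by 16. */
static void build_tables(void)
{
    for (int v = 1; v <= LEVELS; ++v) {
        for (int s = 0; s < 256; ++s) {
            int sample = (s < 128) ? s : s - 256; /* byte index -> signed value */
            vol_table[v - 1][s] = (int16_t)((sample * 16 * v) / LEVELS);
        }
    }
}

/* Accumulate one block of 8-bit samples into the L/R 16-bit mix buffers. */
static void mix_block(const int8_t src[BLOCK], int left, int right,
                      int16_t accum_l[BLOCK], int16_t accum_r[BLOCK])
{
    assert(left >= 0 && left <= LEVELS && right >= 0 && right <= LEVELS);
    for (int i = 0; i < BLOCK; ++i) {
        uint8_t index = (uint8_t)src[i];
        if (left)  accum_l[i] = (int16_t)(accum_l[i] + vol_table[left - 1][index]);
        if (right) accum_r[i] = (int16_t)(accum_r[i] + vol_table[right - 1][index]);
    }
}
```

Rebuilding the tables with a different scale factor would be one way to apply a global sound effects volume without touching the mixing loop.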

For the distance and position, we can use basic Pythagoras to determine the square distance and then look that up in our existing 1/N table that's used for a bunch of other stuff. Assuming an inverse-square attenuation law, we don't need any square roots. For the angle, we can use basic diamond angle calculation, which doesn't need any trig and calculates a value in the range 0-4 (but as a fixed point we will have some higher power of 2 for precision). These two numbers can be used to evaluate the left/right volume based on the sound origin relative to the player at this moment in time. We can probably take a fair few shortcuts here if our left/right volume levels have restricted precision.
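
A minimal C sketch of both calculations follows. Note that it swaps the existing 1/N table for a plain division to stay self-contained, and that `DA_ONE`, `ref_dist` and the function names are assumptions made for the example. The diamond angle is returned as a fixed-point value where a full turn spans 0 to 4*DA_ONE.

```c
#include <assert.h>
#include <stdint.h>

#define DA_ONE 256 /* fixed-point 1.0; a full turn spans 0 .. 4*DA_ONE */

/* Squared distance between emitter and player; no square root needed. */
static uint64_t dist_squared(int16_t ex, int16_t ez, int16_t px, int16_t pz)
{
    int64_t dx = (int64_t)ex - px;
    int64_t dz = (int64_t)ez - pz;
    return (uint64_t)(dx * dx + dz * dz);
}

/* Inverse-square attenuation: full volume inside ref_dist, then volume
 * falls off with 1/d^2. A real version would use a reciprocal table. */
static uint32_t attenuate(uint32_t volume, uint64_t dist_sq, uint32_t ref_dist)
{
    assert(ref_dist > 0);
    uint64_t ref_sq = (uint64_t)ref_dist * ref_dist;
    if (dist_sq <= ref_sq) {
        return volume;
    }
    return (uint32_t)((volume * ref_sq) / dist_sq);
}

/* Diamond angle of the vector (x, y): monotonic in the true angle,
 * computed with one divide and no trig. Assumes |x|, |y| fit in 17 bits. */
static int32_t diamond_angle(int32_t x, int32_t y)
{
    if (x == 0 && y == 0) {
        return 0; /* undefined direction, treat as straight ahead */
    }
    if (y >= 0) {
        return (x >= 0) ? (y * DA_ONE) / (x + y)
                        : DA_ONE - (x * DA_ONE) / (y - x);
    }
    return (x < 0) ? 2 * DA_ONE - (y * DA_ONE) / (-x - y)
                   : 3 * DA_ONE + (x * DA_ONE) / (x - y);
}
```

How the attenuated volume and the angle are combined into the final left/right pair is deliberately left out, since that mapping has not been decided yet.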

Interestingly, if the 8-bit sample data were delta encoded, then the 16 bit volume lookups will tend to cluster around the centre of that row in the lookup table, increasing cache hit probability. Applying the volume to the delta or the absolute value has the same effect as long as the volume is unchanging. The problem happens when it does change - you'd need to account for that in your running value, which might make the whole thing more complicated than needed. Might be a fun optimisation to figure out later.
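
To make the constant-volume case concrete, here is a small C sketch. It assumes every delta fits in 8 bits without wrapping and uses a plain multiply in place of the table lookup; `delta_encode` and `mix_delta` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Delta-encode a sample run: out[0] = in[0], out[i] = in[i] - in[i-1]. */
static void delta_encode(const int8_t *in, int8_t *out, size_t n)
{
    int prev = 0;
    for (size_t i = 0; i < n; ++i) {
        int d = in[i] - prev;
        assert(d >= -128 && d <= 127); /* assumption: deltas never wrap */
        out[i] = (int8_t)d;
        prev = in[i];
    }
}

/* Scale each delta by a fixed volume, then integrate. With a constant
 * volume this equals scaling the decoded samples directly. */
static void mix_delta(const int8_t *delta, size_t n, int vol, int16_t *out)
{
    int32_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += delta[i] * vol;
        out[i] = (int16_t)running;
    }
}
```

If the volume changed part way through, `running` would hold a value scaled by the old volume, which is exactly the bookkeeping problem described above.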

Next is where it gets potentially interesting for Paula:

For the 16 samples just calculated in the packet after mixing all active channels, determine, to the nearest power of 2, how many arithmetic left shifts are needed to approximately normalise those 16 samples. For example, if the largest absolute sample value was 7000, the nearest power of 2 is 8192, which needs multiplying by 4 to normalise (8192 x 4 = 32768, the 16-bit full scale). Each of the 16 values is then arithmetically shifted left 2 places. Finally, we take 64, which is our full playback volume, and right shift it by the same number of places. This gives us a playback volume attenuation to account for the normalisation. We have to limit the maximum shift to 6 and accept that 1 is the lowest volume the packet can be played at, letting the precision die off as the 16-bit value approaches 0.
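
A minimal C sketch of this normalisation step, assuming "nearest power of 2" means the largest shift that keeps the peak within 16 bits (which matches the 7000 example). `norm_shift` and `normalise_block` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK     16
#define MAX_SHIFT 6 /* 64 >> 6 = 1, the lowest usable Paula volume */

/* Largest left shift (0..MAX_SHIFT) that keeps the block's peak in 16 bits. */
static int norm_shift(const int16_t block[BLOCK])
{
    int32_t peak = 0;
    for (int i = 0; i < BLOCK; ++i) {
        int32_t a = block[i] < 0 ? -(int32_t)block[i] : block[i];
        if (a > peak) peak = a;
    }
    int shift = 0;
    while (shift < MAX_SHIFT && (peak << (shift + 1)) <= 32767) {
        ++shift;
    }
    return shift;
}

/* Normalise a mixed block to signed 8-bit output and return the Paula
 * volume (1..64) that undoes the normalisation on playback. */
static int normalise_block(const int16_t block[BLOCK], int8_t out[BLOCK])
{
    assert(block && out);
    int shift = norm_shift(block);
    for (int i = 0; i < BLOCK; ++i) {
        int32_t scaled = (int32_t)block[i] * (1 << shift);
        /* Bias to 0..65535 so the divide floors like an arithmetic shift,
         * then keep the upper 8 bits. */
        out[i] = (int8_t)((scaled + 32768) / 256 - 128);
    }
    return 64 >> shift;
}
```

With a peak of 7000 this yields a shift of 2 and a playback volume of 16; a silent block hits the shift limit of 6 and plays at volume 1.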

The plan for Paula playback is that all 4 channels are used, all DMA mode. Two for the L/R audio and the remaining pair as channel volume modulators. The latter will be fed an effective 1/16th rate of volume data. The upper 8 bits of the normalised sample values will be transferred to the Chip RAM buffers Paula will play from next and the corresponding volume words transferred to the buffers the modulator channels are using. This approach is a simplification of one I've experimented with previously. The hope is that this will be less demanding than 14-bit playback while delivering sound of comparable quality.

All the emphasis on 16-byte alignments for buffers opens up the possibility of using non-cache altering move16 transfers for source reads from the sample data, while keeping the transfer and mix buffers entirely in the datacache during the process.

For all this to work, the game properties file (the mod extension) will require some updates that will allow the engine to choose the appropriate sound system and initialise it accordingly. Ideally I'd like to at least double the playback rate to 16kHz, but in theory there's no reason why those on PiStorm couldn't just .

I appreciate this all sounds a bit ambitious but it does have the advantage that it can be implemented entirely greenfield as a separate project and then retrofitted, so I am hoping it's achievable.

Last edited by Karlos; 05 June 2024 at 14:05.
Old 05 June 2024, 14:19   #2
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
I realise I wasn't explicit before, but you can assume that the audio stream for music is already 16-bit. If it's 8-bit, we'd have to look it up in the volume table. I suspect a basic 8 bit audio stream would not sound so great however, unless it's some cleverly compressed format.

Last edited by Karlos; 05 June 2024 at 17:58.
Old 05 June 2024, 17:35   #3
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
Just in case it is helpful as a starting point, you could check out my Audio Mixer (https://powerprograms.nl/projects/audio_mixer.html). It provides several of the features you're asking for and might save you some time coding, as the full source code is included.

Of course, it may not be an exact fit, but I figured it wouldn't hurt to share the option.
Old 05 June 2024, 17:43   #4
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
Thanks! As it goes, I think this is something I want to have a first principles stab at because the problem domain interests me. When I get bogged down and close to quitting it's good to know there's a proven solution already!
Old 05 June 2024, 18:52   #5
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
@roondar

Your mixer looks really good btw. If I get the abstraction and decoupling right, it might be an ideal replacement for the existing audio system irrespective of anything else I decide to do.
Old 05 June 2024, 19:31   #6
pipper
Registered User
 
Join Date: Jul 2017
Location: San Jose
Posts: 676
DoomSound.library might also be worth a look?
Old 05 June 2024, 19:41   #7
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
Yeah, that's another established option.

The thing that is motivating me right now is DMA mode volume modulated Paula output - specifically the idea of getting better dynamic range out of it. If it turns out to be a turkey, we can go for a more conventional solution.
Old 05 June 2024, 21:27   #8
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
Quote:
Originally Posted by Karlos View Post
@roondar Your mixer looks really good btw. If I get the abstraction and decoupling right, it might be an ideal replacement for the existing audio system irrespective of anything else I decide to do.
Cool, I hope it fits well with your requirements if you do end up using it.
Old 07 June 2024, 20:38   #9
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
Making some progress here but I have a Heisenbug that literally has me stumped.

I have the following C structures (that are replicated in assembler):

Code:
// CPU cache line size
#define CACHE_LINE_SIZE 16

// Number of volume levels when converting 8 bit to 16 for a given volume level
#define AUD_8_TO_16_LEVELS 16

#define AUD_NUM_CHANNELS 16

#define MIN_SAMPLE_RATE 8000
#define MAX_SAMPLE_RATE 22050
#define MIN_UPDATE_RATE 10
#define MAX_UPDATE_RATE 100

typedef struct {
    BYTE*   ac_SamplePtr;   // The current sample address, or NULL
    UWORD   ac_SamplesLeft; // Number of unplayed samples remaining
    UBYTE   ac_LeftVolume;
    UBYTE   ac_RightVolume;
} Aud_ChannelState;

typedef struct {
    Aud_ChannelState am_ChannelState[AUD_NUM_CHANNELS];

    BYTE am_FetchBuffer[CACHE_LINE_SIZE];
    WORD am_AccumL[CACHE_LINE_SIZE];
    WORD am_AccumR[CACHE_LINE_SIZE];

    // Chip RAM Buffers
    BYTE*  am_LeftPacketSamplePtr;  // contains am_PacketSize normalised 8-bit sample data for the left channel
    UWORD* am_LeftPacketVolumePtr;  // contains am_PacketSize/16 6-bit volume modulation data for the left channel
    BYTE*  am_RightPacketSamplePtr; // contains am_PacketSize normalised 8-bit sample data for the right channel
    UWORD* am_RightPacketVolumePtr; // contains am_PacketSize/16 6-bit volume modulation data for the right channel

    ULONG  am_LinesProcessed;

    UWORD  am_SampleRateHz;
    UWORD  am_UpdateRateHz;
    UWORD  am_PacketSize;
    UWORD  am_TableOffset;
} Aud_Mixer;
The structure is allocated cache aligned along with a cache aligned set of tables (15 tables of 256 WORDs) that each map an 8-bit sample value to a 16-bit one for a given volume level (volume level 0 is just skipped). The start of this set of tables is the address of the structure plus am_TableOffset and each table follows on. All the allocation and initialisation works just fine, I've dumped all the data and validated it.

I have implemented a single ASM function so far, just to scale and mix 1 cache line worth of samples from each of the channels. This works by looping through the am_ChannelState array (one entry per channel) and, for any that have a valid ac_SamplePtr and nonzero ac_SamplesLeft (as in remaining), moving the next 16 samples into am_FetchBuffer.
These are then mixed, first into am_AccumL with the left volume, then into am_AccumR with the right volume. The code looks like this:

Code:
        xdef _Aud_MixLine

; a0 points at mixer
_Aud_MixLine::
Aud_MixLine:
        movem.l d2/d3/d4/d5/a2/a3/a4,-(sp)

    ; clear out both left/right buffers
.clear_accum_buffers:
        move.w  #CACHE_LINE_SIZE-1,d2
        lea     am_AccumL_vw(a0),a1

.clear_loop:
        clr.l   (a1)+ 
        dbra    d2,.clear_loop

        ; Fixed number of channels to mix in d2
        moveq   #AUD_NUM_CHANNELS-1,d2

        ; Get channelstate array into a1
        lea     am_ChannelState(a0),a1

.next_channel:
        ; Get the channel sample data pointer in a2, skip if null
        move.l  ac_SamplePtr_l(a1),a2
        beq.s   .done_channel

        ; Check there is data left to process. This really should never happen
        tst.w   ac_SamplesLeft_w(a1)
        beq.s   .done_channel

        ; Get the left/right volume pair, each of which should be 0-15, with 0 being a silence skip
        move.w   ac_LeftVol_b(a1),d5

        ; Enforce the range 0-15 for each channel
        and.w    #$0F0F,d5

        ; If both are zero, just update the channel state and move along
        beq.s   .update_channel

.not_silent:
        ; swap the bytes in d5 to get the left volume in the lower byte first. Endian fail, lol.
        rol.w   #8,d5

        ; grab the next 16 samples
        lea     am_FetchBuffer_vb(a0),a3

        ; The theory goes, we won't be crapflooding the datacache with the sample data this way...
        move16  (a2)+,(a3)+

        ; Two step loop. The first iteration handles the left channel, the second iteration handles the right
        move.w  #1,d3
        lea     am_AccumL_vw(a0),a4 ; note that the right accumulator immediately follows
        clr.l   d0

.mix_samples:
        move.b  d5,d0   ; d0 = 0-15, 0 silence, 1-15 are volume table selectors
        beq.s   .update_channel

        sub.w   #1,d0   ; d0 = 0-14, now we need to multiply by 512 to get the table start
        lsl.w   #8,d0   ;
        add.w   d0,d0   ; d0 = table position = vol * 256 * sizeof(WORD)

        ; Add the structure offset and put the effective address into a2
        add.w   am_TableOffset_w(a0),d0
        lea     (a0,d0.w),a2

        ; Point a3 at the cache line of samples we loaded
        lea     am_FetchBuffer_vb(a0),a3

        move.w  #CACHE_LINE_SIZE-1,d1    ; num samples in d1

        ; Index the table by sample value (as unsigned word)
        clr.w   d0

.next_sample:
        move.b  (a3)+,d0         ; next 8-bit sample.
        move.w  (a2,d0.w*2),d4   ; look up the volume adjusted word
        add.w   d4,(a4)+         ; accumulate onto the target buffer
        dbra    d1,.next_sample

        ; Now do the second step for the opposite side...
        lsr.w   #8,d5
        dbra    d3,.mix_samples

.update_channel:
        sub.w   #CACHE_LINE_SIZE,ac_SamplesLeft_w(a1)
        bne.s   .inc_sample_ptr

        ; zero out the remaining channel state
        clr.l   ac_SamplePtr_l(a1)
        clr.w   ac_LeftVol_b(a1)
        bra.s   .done_channel

.inc_sample_ptr:
        add.l   #CACHE_LINE_SIZE,ac_SamplePtr_l(a1)

.done_channel:
        add.w   #Aud_ChanelState_SizeOf_l,a1

        ; WEIRD!!! No idea why, but unless I trigger a write here, nothing happens
        add.l  #1,am_LinesProcessed_l(a0)

        dbra    d2,.next_channel

.finished:
        movem.l (sp)+,d2/d3/d4/d5/a2/a3/a4
        rts
This all works fine, unless I skip the line indicated with "WEIRD".

I did think this could be a move16 related issue, so I swapped that out for four move.l, but the issue still happens. I've tried that in conjunction with all the combinations of instruction and datacache enabled/disabled.

If I do not execute the entirely unnecessary instruction to increment the LinesProcessed field, it's as if the function was never called and I am completely stumped.

Last edited by Karlos; 07 June 2024 at 20:44.
Old 07 June 2024, 21:33   #10
saimo
Registered User
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Just had a cursory look at the assembly code, without even trying to understand it (I'm allergic to that process) and spotted this:

Code:
   move.l  ac_SamplePtr_l(a1),a2
   beq.s   .done_channel
That won't work: a move.l into an address register is a movea, and movea doesn't touch the condition codes.
If you have a spare data register:
Code:
   move.l  ac_SamplePtr_l(a1),dx
   beq.s   .done_channel
   movea.l dx,a2
If not, a tst.l a2 after the movea will do.

EDIT: you're probably focused on getting the code to work, but anyway I thought it wouldn't hurt to mention these little optimizations:

move.w #CACHE_LINE_SIZE-1,d2 -> moveq
move.w #1,d3 -> moveq
sub.w #1,d0 -> subq
add.w #Aud_ChanelState_SizeOf_l,a1 -> lea
add.l #1,am_LinesProcessed_l(a0) -> addq

Last edited by saimo; 07 June 2024 at 21:40.
Old 07 June 2024, 21:48   #11
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
Woo, I forgot about that!
Old 07 June 2024, 21:54   #12
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
@saimo

Excellent spot. That was it. The code is now safely in github so that it doesn't get lost. Obviously, it's only just starting off, but if anyone is interested, it's over here: https://github.com/0xABADCAFE/tkg-mixer
Old 07 June 2024, 21:55   #13
saimo
Registered User
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Quote:
Originally Posted by Karlos View Post
@saimo Excellent spot. That was it.
Cool! If only debugging were always this easy...
Old 07 June 2024, 21:58   #14
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
Quote:
Originally Posted by saimo View Post
Cool! If only debugging were always this easy...
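It was "working" with the count increment because movea leaves the flags alone, so the beq in the bad null check was actually testing the flags left by that add (Z clear, since the count is nonzero) and the branch was never taken.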
Old 07 June 2024, 22:02   #15
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
Quote:
Originally Posted by saimo View Post
EDIT: you're probably focused on getting the code to work, but anyway I thought it wouldn't hurt to mention these little optimizations:

move.w #CACHE_LINE_SIZE-1,d2 -> moveq
move.w #1,d3 -> moveq
sub.w #1,d0 -> subq
add.w #Aud_ChanelState_SizeOf_l,a1 -> lea
add.l #1,am_LinesProcessed_l(a0) -> addq
Yeah, getting it working is first and foremost. The function just implemented now accumulates 16 samples from each of the 16 channels, applying their left and right volume and calculates the (absolute) maximum for the left and right side.

That maximum value will be used to determine the Paula volume adjustments for the 16 values and to normalise the buffer. The plan for 040 is to use power of 2 volume modulation only. On an 060, multiplication normalisation is probably affordable, giving all 64 Paula levels as viable adjustments.

I might make that a "high" quality option.
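
For comparison with the power of 2 approach, a multiplication-based version might look like the following C sketch. It is purely illustrative: `pick_volume` and `scale_sample` are hypothetical, and the rule used (the smallest Paula volume at which the scaled peak still fits in 16 bits) is an assumption about what the "high" quality option would do.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest Paula volume (1..64) such that peak * 64 / volume still fits
 * in a signed 16-bit value. peak is the absolute maximum of the block. */
static int pick_volume(int32_t peak)
{
    assert(peak >= 0 && peak <= 32768);
    int vol = (int)((peak * 64 + 32766) / 32767); /* ceil(peak * 64 / 32767) */
    if (vol < 1)  vol = 1;
    if (vol > 64) vol = 64;
    return vol;
}

/* Scale one mixed sample so that playback at the given volume restores
 * its original level: out * vol / 64 is approximately the input. */
static int16_t scale_sample(int16_t sample, int vol)
{
    assert(vol >= 1 && vol <= 64);
    int32_t v = ((int32_t)sample * 64) / vol;
    if (v > 32767)  v = 32767;
    if (v < -32768) v = -32768;
    return (int16_t)v;
}
```

A peak of 7000 would play at volume 14 with the sample scaled to 32000, against volume 16 and 28000 with shifts only, so the multiply recovers some of the headroom the power of 2 scheme gives up.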
Old 07 June 2024, 22:22   #16
saimo
Registered User
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Gotta love the devotion and love you're pouring into this game. Keep up the good work!
(I don't mean to rain on the parade, but in all honesty I have to admit that, unfortunately, I can't appreciate the game itself, as this sort of game is just not my cup of tea.)
Old 07 June 2024, 22:58   #17
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
Quote:
Originally Posted by saimo View Post
Gotta love the devotion and love you're pouring into this game. Keep up the good work!
(I don't mean to rain on the parade, but in all honesty I have to admit that, unfortunately, I can't appreciate the game itself, as this sort of game is just not my cup of tea.)
I probably spend more time editing/building than playing. I might actually enjoy that more!
Old 07 June 2024, 23:01   #18
saimo
Registered User
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Quote:
Originally Posted by Karlos View Post
I probably spend more time editing/building than playing. I might actually enjoy that more!
I perfectly understand that. In my life, I've been developing 99.9% of the time and dedicated the rest to playing (and I love playing).
Old 08 June 2024, 13:26   #19
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
I am at the point where I have the absolute maximum and derived left/right normalisation shifts but I'm wondering if that's going to be too crude. It'll work ok when things are fairly quiet, but only being able to normalise by a power of 2 feels too imprecise. It would mean that a 16 bit signal just above half peak intensity would be effectively played as a 7 bit value at full volume and that's not in the spirit of the AM mechanism at all. The original mechanism always tries to get 8 bits of precision and attenuate the volume to the appropriate level.

I didn't want to go down the multiplication based normalisation route, with the possible exception of the 68060. I feel like there must be a lookup solution to this that doesn't require an entry for every possible 16 bit input.
Old 08 June 2024, 16:14   #20
grond
Registered User
 
Join Date: Jun 2015
Location: Germany
Posts: 1,924
I guess precision will bite you in one place or another. This is just a thought that popped up in my head and probably doesn't work for many reasons. If your samples are stored as log() of their actual values, you could scale them by adding a constant to them. You could thus look up the inverse log() of the scaled sample by using an index register with the scale factor (or just move the address pointer accordingly) and using the log-value as index into the table. This might even be quite cache efficient, too, if samples aren't too far from each other as the scale factor remains constant for an entire sample. This would remove lots of MUL operations.
 

