English Amiga Board


Old 06 July 2024, 17:31   #141
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
It's early days mulling over the data, but I think the problem with the delta lookup on the 040 is the extra work per sample. However, there is a super simple solution to that: we just pre-encode the samples into the expected 1:15 linear/delta frames. This will save some cycles in the mix/accumulate loop.
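A minimal sketch of what such a pre-encoding pass could look like. It assumes "1:15" means that each 16-sample frame holds one linear 8-bit sample followed by 15 deltas from the previous sample; that layout and the names `encode_frames` / `decode_frames` are assumptions for illustration, not taken from the repository.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SAMPLES 16 /* assumed: 1 linear sample + 15 deltas per frame */

/* Encode signed 8-bit samples into linear/delta frames. out must not alias in. */
void encode_frames(const int8_t *in, int8_t *out, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (i % FRAME_SAMPLES == 0) {
            out[i] = in[i]; /* frame start: store the sample as-is */
        } else {
            /* Delta taken modulo 256 so that wrapping addition decodes exactly. */
            out[i] = (int8_t)(uint8_t)((uint8_t)in[i] - (uint8_t)in[i - 1]);
        }
    }
}

/* Reverse of encode_frames: rebuild linear samples with a running sum. */
void decode_frames(const int8_t *in, int8_t *out, size_t count)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i % FRAME_SAMPLES == 0) {
            acc = (uint8_t)in[i];               /* restart from the linear sample */
        } else {
            acc = (uint8_t)(acc + (uint8_t)in[i]); /* apply the delta, wrapping */
        }
        out[i] = (int8_t)acc;
    }
}
```

Doing the conversion once at load time would leave the mixer with only the decode side, which is the part that matters for the inner loop.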
Old 08 July 2024, 01:43   #142
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
I plotted the data earlier for the 060 with datacache enabled and the trends were quite interesting. All three modes (040 linear, 040 delta and 060 muls) followed a slightly quadratic curve, where the time taken increases slightly more than linearly. The curvature is quite conspicuous.
Old 08 July 2024, 11:41   #143
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
Right, the data for these two systems is now charted.

The X axis is the number of channels and the Y axis is the time, in milliseconds, per packet of audio. The configuration is for a 16kHz mixing rate and 50Hz update, so we need a new packet of 320 samples (L/R pairs) every 20ms. Therefore anything above 20ms is going to be a problem.
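The packet size and time budget follow directly from the two rates. A small sketch of that arithmetic (the function names are made up for illustration):

```c
#include <stdint.h>

/* Samples (L/R pairs) needed per update at a given mixing rate. */
uint32_t samples_per_packet(uint32_t mix_rate_hz, uint32_t update_hz)
{
    return mix_rate_hz / update_hz;
}

/* Time available to produce one packet, in microseconds. */
uint32_t packet_budget_us(uint32_t update_hz)
{
    return 1000000u / update_hz;
}

/* Non-zero if a measured mix time fits inside one update period. */
int fits_budget(uint32_t mix_time_us, uint32_t update_hz)
{
    return mix_time_us < packet_budget_us(update_hz);
}
```

With 16000 and 50 this gives 320 samples and a 20000 microsecond budget, matching the figures above.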

The cost of normalising the data and writing the chip ram buffers is basically invariant with respect to the number of channels.

The most obvious initial conclusion is that my code sucks. Just sucks. Either that, or I have measured something incorrectly.

The effect of the delta lookup on the 040 is only apparent after 8 channels and is marginal. However, I haven't tested with samples that are preconverted into the proposed 1:15 linear/delta format; all of that is happening on the fly.
Attachment: chart.png
Old 08 July 2024, 12:10   #144
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
Some basic maths:

In the worst case, we are reading 16 sets of 320 samples from fast memory in order to produce 640 samples written to chip memory. We are using move16 to perform the reading as we know we won't be reusing the same input data again soon* and the location being written to should already be in the datacache.

* Note that's not actually true in the test case, but it is true in the general case we are aiming for.

To hit the target update rate of 50Hz, that would imply reading 16x320x50 = 256,000 bytes/s from Fast Ram and writing 640x50 = 32,000 bytes/s to Chip Ram. Our chip writes are all long and long aligned (well, almost: we do have the volume modulation packets that get 1 word written every 16 samples).
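The same bandwidth estimate as a sketch, assuming one byte per sample as the arithmetic above does (function names are illustrative only):

```c
#include <stdint.h>

/* Bytes read per second from Fast RAM: every channel supplies one packet per update. */
uint32_t fast_read_bytes_per_sec(uint32_t channels,
                                 uint32_t samples_per_packet,
                                 uint32_t update_hz)
{
    return channels * samples_per_packet * update_hz;
}

/* Bytes written per second to Chip RAM for the mixed output. */
uint32_t chip_write_bytes_per_sec(uint32_t output_samples_per_packet,
                                  uint32_t update_hz)
{
    return output_samples_per_packet * update_hz;
}
```

Plugging in 16 channels, 320 samples and 50Hz reproduces the 256,000 and 32,000 bytes/s figures.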

This doesn't seem too egregious. Moreover, I've run OctaMED at 28kHz with 20+ channels of audio many times in the past, with 14-bit replay on a 68040 with a much more complex (varying/different sample rates per channel) mixer.

I wonder if I have a loop in the wrong place, lol...

Last edited by Karlos; 08 July 2024 at 12:24.
Old 08 July 2024, 12:28   #145
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
Maybe it would be an idea to initialize this variable:

https://github.com/0xABADCAFE/tkg-mi...47/main.c#L203
Old 08 July 2024, 12:34   #146
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
Quote:
Originally Posted by paraj View Post
Maybe it would be an idea to initialize this variable:

https://github.com/0xABADCAFE/tkg-mi...47/main.c#L203
Funny you say that, I just did. It doesn't look like it changes anything here, though, but yeah.
Old 08 July 2024, 12:37   #147
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
Pushed an update to fix the uninitialised variable. See if it makes a difference?

Would be amusing if it was accumulating each previous run as well.
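To illustrate why that would matter: if a tick total carried over from one test to the next, a per-test cost that is really linear in the channel count would be reported as a running sum, which grows quadratically. The sketch below is purely hypothetical; the cost model and its constants are made up and it does not reproduce the actual code in `main.c`.

```c
#include <stdint.h>

/* Made-up cost model: mixing time grows linearly with channel count. */
static uint32_t fake_mix_ticks(uint32_t channels)
{
    return 30000u + 18000u * channels;
}

/* Buggy harness: the tick total is never reset between tests, so the
 * figure reported for N channels is the sum of tests 1..N. */
void run_tests_buggy(uint32_t max_channels, uint32_t *reported)
{
    uint32_t total = 0;
    for (uint32_t ch = 1; ch <= max_channels; ++ch) {
        total += fake_mix_ticks(ch);
        reported[ch - 1] = total;
    }
}

/* Fixed harness: the total starts from zero for every test. */
void run_tests_fixed(uint32_t max_channels, uint32_t *reported)
{
    for (uint32_t ch = 1; ch <= max_channels; ++ch) {
        uint32_t total = 0;
        total += fake_mix_ticks(ch);
        reported[ch - 1] = total;
    }
}
```

In the buggy variant the gap between consecutive results keeps widening; in the fixed one it stays constant.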
Old 08 July 2024, 12:55   #148
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
I think this looks more reasonable (no longer quadratic):

Code:
Using 68040 linear lookup code path
Got Timer, frequency is 709379 Hz
Loaded sounds/airstrike.raw [60460 bytes] at 0x68bf2060
Testing with 1 channel(s)
Mixed 189 Packets in 51077 EClockVal ticks (709379/s)
Testing with 2 channel(s)
Mixed 189 Packets in 72777 EClockVal ticks (709379/s)
Testing with 3 channel(s)
Mixed 189 Packets in 90963 EClockVal ticks (709379/s)
Testing with 4 channel(s)
Mixed 189 Packets in 109273 EClockVal ticks (709379/s)
Testing with 5 channel(s)
Mixed 189 Packets in 127572 EClockVal ticks (709379/s)
Testing with 6 channel(s)
Mixed 189 Packets in 146921 EClockVal ticks (709379/s)
Testing with 7 channel(s)
Mixed 189 Packets in 167578 EClockVal ticks (709379/s)
Testing with 8 channel(s)
Mixed 189 Packets in 186947 EClockVal ticks (709379/s)
Testing with 9 channel(s)
Mixed 189 Packets in 205100 EClockVal ticks (709379/s)
Testing with 10 channel(s)
Mixed 189 Packets in 223904 EClockVal ticks (709379/s)
Testing with 11 channel(s)
Mixed 189 Packets in 243475 EClockVal ticks (709379/s)
Testing with 12 channel(s)
Mixed 189 Packets in 261216 EClockVal ticks (709379/s)
Testing with 13 channel(s)
Mixed 189 Packets in 279599 EClockVal ticks (709379/s)
Testing with 14 channel(s)
Mixed 189 Packets in 297640 EClockVal ticks (709379/s)
Testing with 15 channel(s)
Mixed 189 Packets in 315759 EClockVal ticks (709379/s)
Testing with 16 channel(s)
Mixed 189 Packets in 328309 EClockVal ticks (709379/s)
Using 68040 delta lookup code path
Got Timer, frequency is 709379 Hz
Loaded sounds/airstrike.raw [60460 bytes] at 0x68bf2060
Testing with 1 channel(s)
Mixed 189 Packets in 51248 EClockVal ticks (709379/s)
Testing with 2 channel(s)
Mixed 189 Packets in 75004 EClockVal ticks (709379/s)
Testing with 3 channel(s)
Mixed 189 Packets in 94780 EClockVal ticks (709379/s)
Testing with 4 channel(s)
Mixed 189 Packets in 115296 EClockVal ticks (709379/s)
Testing with 5 channel(s)
Mixed 189 Packets in 135090 EClockVal ticks (709379/s)
Testing with 6 channel(s)
Mixed 189 Packets in 155458 EClockVal ticks (709379/s)
Testing with 7 channel(s)
Mixed 189 Packets in 178751 EClockVal ticks (709379/s)
Testing with 8 channel(s)
Mixed 189 Packets in 200930 EClockVal ticks (709379/s)
Testing with 9 channel(s)
Mixed 189 Packets in 219283 EClockVal ticks (709379/s)
Testing with 10 channel(s)
Mixed 189 Packets in 238803 EClockVal ticks (709379/s)
Testing with 11 channel(s)
Mixed 189 Packets in 259180 EClockVal ticks (709379/s)
Testing with 12 channel(s)
Mixed 189 Packets in 279343 EClockVal ticks (709379/s)
Testing with 13 channel(s)
Mixed 189 Packets in 298529 EClockVal ticks (709379/s)
Testing with 14 channel(s)
Mixed 189 Packets in 320269 EClockVal ticks (709379/s)
Testing with 15 channel(s)
Mixed 189 Packets in 338412 EClockVal ticks (709379/s)
Testing with 16 channel(s)
Mixed 189 Packets in 351083 EClockVal ticks (709379/s)
Using 68060 code path
Got Timer, frequency is 709379 Hz
Loaded sounds/airstrike.raw [60460 bytes] at 0x68bf2060
Testing with 1 channel(s)
Mixed 189 Packets in 47733 EClockVal ticks (709379/s)
Testing with 2 channel(s)
Mixed 189 Packets in 65388 EClockVal ticks (709379/s)
Testing with 3 channel(s)
Mixed 189 Packets in 79558 EClockVal ticks (709379/s)
Testing with 4 channel(s)
Mixed 189 Packets in 93729 EClockVal ticks (709379/s)
Testing with 5 channel(s)
Mixed 189 Packets in 108285 EClockVal ticks (709379/s)
Testing with 6 channel(s)
Mixed 189 Packets in 122527 EClockVal ticks (709379/s)
Testing with 7 channel(s)
Mixed 189 Packets in 136714 EClockVal ticks (709379/s)
Testing with 8 channel(s)
Mixed 189 Packets in 150958 EClockVal ticks (709379/s)
Testing with 9 channel(s)
Mixed 189 Packets in 164968 EClockVal ticks (709379/s)
Testing with 10 channel(s)
Mixed 189 Packets in 179377 EClockVal ticks (709379/s)
Testing with 11 channel(s)
Mixed 189 Packets in 193682 EClockVal ticks (709379/s)
Testing with 12 channel(s)
Mixed 189 Packets in 207930 EClockVal ticks (709379/s)
Testing with 13 channel(s)
Mixed 189 Packets in 222006 EClockVal ticks (709379/s)
Testing with 14 channel(s)
Mixed 189 Packets in 237397 EClockVal ticks (709379/s)
Testing with 15 channel(s)
Mixed 189 Packets in 250223 EClockVal ticks (709379/s)
Testing with 16 channel(s)
Mixed 189 Packets in 259709 EClockVal ticks (709379/s)
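To compare these tick counts against the 20ms packet period, they can be converted to milliseconds per packet. A small sketch (the function name is made up; the tick frequency and packet count come from the log above):

```c
#include <stdint.h>

/* Convert a total EClock tick count for a test into milliseconds per packet. */
double ms_per_packet(uint32_t ticks, uint32_t tick_hz, uint32_t packets)
{
    return (1000.0 * (double)ticks) / ((double)tick_hz * (double)packets);
}
```

For example, `ms_per_packet(328309, 709379, 189)`, the 16-channel linear lookup run, comes out at roughly 2.45ms per packet.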
Old 08 July 2024, 13:10   #149
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
SMH, lol.

Those numbers do look a bit healthier.

@abu any chance of a rerun?
Old 08 July 2024, 13:18   #150
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,056
I will rerun when I get home tonight.
Old 08 July 2024, 13:51   #151
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
Quote:
Originally Posted by abu_the_monkey View Post
I will rerun when I get home tonight.
Actually, it should be fine for me to convert your numbers into deltas to recover what the additional time per channel was.

Looking at @paraj's new numbers, I need a different vertical scale, lol
Old 08 July 2024, 13:57   #152
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
Using @paraj's new values and extracting the differences from @abu's old ones (to isolate the time added per test), the results make a lot more sense.
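Extracting the differences is just a first-difference pass over the reported totals. A minimal sketch (the function name is illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* out[i] = totals[i + 1] - totals[i]; writes count - 1 entries. */
void successive_deltas(const uint32_t *totals, size_t count, uint32_t *out)
{
    for (size_t i = 0; i + 1 < count; ++i) {
        out[i] = totals[i + 1] - totals[i];
    }
}
```

Applied to the first four 68040 linear lookup totals from post #148 (51077, 72777, 90963, 109273), the same pass yields 21700, 18186 and 18310 ticks, which in that case is the marginal cost of each extra channel.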

It's clear the cache improvements for delta lookup are not great but it does have a few more operations per sample. I will throw in a preconverted version to see if it's better with the simpler inner loop.

10ms to mix 20ms of audio is still not awesome.
Attachment: mixer.png
Old 08 July 2024, 14:11   #153
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
Does anyone have a 68040 with proper local fast memory that can test this?
Old 08 July 2024, 18:39   #154
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
If my math is right, it works out to roughly 20 cycles/sample on the 060 at the limit for extra channels. Seems decent enough.
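A sketch of how such an estimate can be derived from the logged tick counts. The CPU clock is not stated in the thread, so `cpu_hz` is an assumption the reader has to supply; the function name is illustrative.

```c
#include <stdint.h>

/* Estimate CPU cycles spent per mixed sample for one extra channel.
 * delta_ticks: EClock ticks added by that channel over the whole test. */
double cycles_per_sample(uint32_t delta_ticks, uint32_t packets,
                         uint32_t samples_per_packet,
                         double tick_hz, double cpu_hz)
{
    double seconds = (double)delta_ticks / tick_hz;
    double samples = (double)packets * (double)samples_per_packet;
    return seconds * cpu_hz / samples;
}
```

Taking the step from 3 to 4 channels on the 68060 path in post #148 (93729 - 79558 = 14171 ticks over 189 packets of 320 samples) and an assumed 50MHz clock gives about 16.5 cycles; the result scales linearly with whatever clock is plugged in.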

For the 040, I think it might be useful to have a test case that only uses volume steps that can be done with shifts (just clamp to the nearest one or whatever; it doesn't need to be precise), just to have a lower bound.
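One way the "clamp to nearest" step could be sketched, assuming a linear volume in the range 0 to 256 where 256 is full scale. The range and the name `volume_to_shift` are assumptions for illustration.

```c
/* Map a linear volume (assumed 0..256, 256 = full scale) to the right-shift
 * count whose power-of-two level is nearest. Ties pick the louder level. */
unsigned volume_to_shift(unsigned volume)
{
    unsigned best_shift = 0;
    unsigned best_err = ~0u;
    for (unsigned shift = 0; shift <= 8; ++shift) {
        unsigned level = 256u >> shift; /* 256, 128, 64, ... 1 */
        unsigned err = level > volume ? level - volume : volume - level;
        if (err < best_err) {
            best_err = err;
            best_shift = shift;
        }
    }
    return best_shift;
}
```

Muting (volume 0) would still need handling outside this mapping, since no shift in this range produces silence for every input.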

(And remember the thing about automatically running all tests.)
Old 08 July 2024, 18:41   #155
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
Yeah, the lookup performance is rather disappointing, even with taking cache hit rate into consideration.

One of the side effects of moving to shift based L/R channel volume is that you will lose the global mix level control. Currently, this is encoded into the lookup tables for 040 and factored into the multiplier in the 060 path.
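A rough sketch of how a global level can be folded into a per-volume lookup table, which is the part a pure shift has no room for. The value ranges (volume and global level both 0..256) and the table layout are assumptions, not the repository's actual format.

```c
#include <stdint.h>

/* Build a 256-entry table indexed by the raw sample byte.
 * table[b] = sample * volume * global_level / 256, with volume and
 * global_level both assumed to lie in 0..256 so the result fits 16 bits. */
void build_volume_table(int16_t table[256], unsigned volume, unsigned global_level)
{
    for (int i = 0; i < 256; ++i) {
        int sample = (i < 128) ? i : i - 256; /* reinterpret byte as signed */
        table[i] = (int16_t)((sample * (int)volume * (int)global_level) / 256);
    }
}
```

Changing the global level then only costs a table rebuild, while the inner loop stays a plain lookup.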

Last edited by Karlos; 08 July 2024 at 18:59.
Old 08 July 2024, 19:21   #156
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
It would be purely to see the speed for comparison purposes at first (and similarly it might be good to include a "0 channel" test just to get a baseline). If it can't be fast enough that way you need to re-think the approach. I wonder though, how many samples would realistically be playing at once? There's probably "meta level" stuff you could do, like limiting enemy shots to N channels at engine level or something like that (if enemy shots are what is pushing it).

If "shift-only" performance is good, I'm sure the global volume issue could be solved (multiple functions/SMC/etc.)
Old 08 July 2024, 19:43   #157
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
We can just fudge that with a fixed shift version for now. Also, a channel is only mixed when it is not muted, has a data pointer and has remaining samples to mix, so that aspect is already dealt with. It's the worst case I'm looking to set limits for.
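The three conditions can be sketched as a simple predicate. The `MixChannel` layout below is hypothetical and only exists to make the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical channel state; the real structure may differ. */
typedef struct {
    const int8_t *data;      /* next sample to mix, NULL when unassigned */
    uint32_t      remaining; /* samples left to mix */
    uint8_t       muted;     /* non-zero when muted */
} MixChannel;

/* A channel is mixed only if it is not muted, has data and has samples left. */
int channel_is_active(const MixChannel *ch)
{
    return !ch->muted && ch->data != NULL && ch->remaining > 0;
}

/* Count how many channels would actually be mixed this packet. */
size_t count_active(const MixChannel *chans, size_t n)
{
    size_t active = 0;
    for (size_t i = 0; i < n; ++i) {
        if (channel_is_active(&chans[i])) {
            ++active;
        }
    }
    return active;
}
```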

After this there needs to be a whole prioritisation and assignment layer that decides which channel to recycle if all are in use.
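One possible shape for that decision, as a loudly hypothetical sketch: prefer an idle channel, otherwise steal from the least important sound, breaking ties in favour of the one closest to finishing. The `ChannelSlot` fields and the policy itself are assumptions; nothing like this exists in the thread yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel bookkeeping for the assignment layer. */
typedef struct {
    uint32_t remaining; /* samples left to play; 0 means the channel is idle */
    uint8_t  priority;  /* higher value = more important (assumed convention) */
} ChannelSlot;

/* Returns the index of the channel to use for a new sound, or -1 if every
 * busy channel is playing something more important than the new sound. */
int pick_channel(const ChannelSlot *slots, size_t n, uint8_t new_priority)
{
    int victim = -1;
    for (size_t i = 0; i < n; ++i) {
        if (slots[i].remaining == 0) {
            return (int)i; /* an idle channel always wins */
        }
        if (slots[i].priority > new_priority) {
            continue; /* never steal from a more important sound */
        }
        if (victim < 0 ||
            slots[i].priority < slots[victim].priority ||
            (slots[i].priority == slots[victim].priority &&
             slots[i].remaining < slots[victim].remaining)) {
            victim = (int)i;
        }
    }
    return victim;
}
```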

Last edited by Karlos; 08 July 2024 at 21:44.
Old 09 July 2024, 01:50   #158
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
Quote:
Originally Posted by paraj View Post
If my math is right, it works out to roughly 20 cycles/sample on the 060 at the limit for extra channels. Seems decent enough.

For the 040, I think it might be useful to have a test case that only uses volume steps that can be done with shifts (just clamp to the nearest one or whatever; it doesn't need to be precise), just to have a lower bound.

(And remember the thing about automatically running all tests.)
I've pushed a version that runs all the test cases. The only CLI params now are for verbosity and buffer dumping.

I have included a mock "shift only" mixer version that applies a fixed level shift (but using a register count operand to be a better fit) too.
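For readers without the repository to hand, a C approximation of what a fixed-shift mock mixer does. This is not the actual 68k routine; the 16-bit accumulator, the 8-bit source format and the name `mix_shift_only` are assumptions. Passing `shift` at run time stands in for the register count operand.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate one 8-bit channel into a 16-bit mix buffer with a fixed
 * attenuation of 2^shift (shift assumed to be in 0..8). Division by a power
 * of two is used instead of >> so negative samples stay portable in C;
 * a 68k loop would use an arithmetic shift instead. */
void mix_shift_only(int16_t *accum, const int8_t *src, size_t count, unsigned shift)
{
    const int divisor = 1 << shift;
    for (size_t i = 0; i < count; ++i) {
        int scaled = ((int)src[i] * 256) / divisor; /* widen to 16-bit range, attenuate */
        accum[i] = (int16_t)(accum[i] + scaled);
    }
}
```

There is no overflow protection here; a real mixer would rely on the normalisation stage to bring the sum back into range.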
Old 09 July 2024, 20:10   #159
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
For reference: the shift-only mock is slightly slower than multiplication on the 060 (you really can program like it's 1994 for that).

Code:
Got Timer, tick frequency is 709379 Hz
Loaded sounds/airstrike.raw [60460 bytes] at 0x68bf4900
Test case 0:
	Mix : Multiplication
	Norm: Multiplication/Shift
	Info: Move16 fetch, target 68060

	Mixing  1 channel(s):    48073 ticks 189 packets
	Mixing  2 channel(s):    65622 ticks 189 packets
	Mixing  3 channel(s):    79522 ticks 189 packets
	Mixing  4 channel(s):    93702 ticks 189 packets
	Mixing  5 channel(s):   107951 ticks 189 packets
	Mixing  6 channel(s):   122454 ticks 189 packets
	Mixing  7 channel(s):   136699 ticks 189 packets
	Mixing  8 channel(s):   152201 ticks 189 packets
	Mixing  9 channel(s):   165239 ticks 189 packets
	Mixing 10 channel(s):   179373 ticks 189 packets
	Mixing 11 channel(s):   193857 ticks 189 packets
	Mixing 12 channel(s):   207912 ticks 189 packets
	Mixing 13 channel(s):   223480 ticks 189 packets
	Mixing 14 channel(s):   236706 ticks 189 packets
	Mixing 15 channel(s):   251025 ticks 189 packets
	Mixing 16 channel(s):   259951 ticks 189 packets
Test case 1:
	Mix : Lookup
	Norm: Multiplication/Shift
	Info: Move16 fetch, target 68040/60

	Mixing  1 channel(s):    50986 ticks 189 packets
	Mixing  2 channel(s):    72846 ticks 189 packets
	Mixing  3 channel(s):    90913 ticks 189 packets
	Mixing  4 channel(s):   108888 ticks 189 packets
	Mixing  5 channel(s):   127573 ticks 189 packets
	Mixing  6 channel(s):   146995 ticks 189 packets
	Mixing  7 channel(s):   166136 ticks 189 packets
	Mixing  8 channel(s):   184136 ticks 189 packets
	Mixing  9 channel(s):   203982 ticks 189 packets
	Mixing 10 channel(s):   221018 ticks 189 packets
	Mixing 11 channel(s):   239165 ticks 189 packets
	Mixing 12 channel(s):   259844 ticks 189 packets
	Mixing 13 channel(s):   277125 ticks 189 packets
	Mixing 14 channel(s):   294621 ticks 189 packets
	Mixing 15 channel(s):   314103 ticks 189 packets
	Mixing 16 channel(s):   325783 ticks 189 packets
Test case 2:
	Mix : Delta Lookup
	Norm: Multiplication/Shift
	Info: Move16 fetch, target 68040/60

	Mixing  1 channel(s):    51709 ticks 189 packets
	Mixing  2 channel(s):    75265 ticks 189 packets
	Mixing  3 channel(s):    94933 ticks 189 packets
	Mixing  4 channel(s):   114056 ticks 189 packets
	Mixing  5 channel(s):   134962 ticks 189 packets
	Mixing  6 channel(s):   154481 ticks 189 packets
	Mixing  7 channel(s):   175378 ticks 189 packets
	Mixing  8 channel(s):   196536 ticks 189 packets
	Mixing  9 channel(s):   217971 ticks 189 packets
	Mixing 10 channel(s):   236321 ticks 189 packets
	Mixing 11 channel(s):   255657 ticks 189 packets
	Mixing 12 channel(s):   275072 ticks 189 packets
	Mixing 13 channel(s):   295820 ticks 189 packets
	Mixing 14 channel(s):   316234 ticks 189 packets
	Mixing 15 channel(s):   334639 ticks 189 packets
	Mixing 16 channel(s):   348876 ticks 189 packets
Test case 3:
	Mix : Shift Only
	Norm: Multiplication/Shift
	Info: Move16 fetch, target 68040

	Mixing  1 channel(s):    48939 ticks 189 packets
	Mixing  2 channel(s):    67538 ticks 189 packets
	Mixing  3 channel(s):    82385 ticks 189 packets
	Mixing  4 channel(s):    96897 ticks 189 packets
	Mixing  5 channel(s):   111948 ticks 189 packets
	Mixing  6 channel(s):   126232 ticks 189 packets
	Mixing  7 channel(s):   141217 ticks 189 packets
	Mixing  8 channel(s):   155738 ticks 189 packets
	Mixing  9 channel(s):   170333 ticks 189 packets
	Mixing 10 channel(s):   185093 ticks 189 packets
	Mixing 11 channel(s):   199483 ticks 189 packets
	Mixing 12 channel(s):   214748 ticks 189 packets
	Mixing 13 channel(s):   228799 ticks 189 packets
	Mixing 14 channel(s):   243839 ticks 189 packets
	Mixing 15 channel(s):   258147 ticks 189 packets
	Mixing 16 channel(s):   267639 ticks 189 packets
Old 09 July 2024, 20:55   #160
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,660
We need to see the shift-only mixing mock on an 040, really. I am surprised it's slower than multiplication on the 060, albeit not by much.