06 July 2024, 17:31 | #141 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
It's early days mulling over the data, but I think the issue with the delta lookup on the 040 is the extra work per sample. However, there is a super simple solution to that: we just pre-encode the samples into the expected 1:15 linear/delta frames. This will save some cycles in the mix/accumulate loop.
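A minimal C sketch of that pre-encoding. It assumes "1:15" means one absolute 8-bit sample followed by 15 deltas, with each delta stored wrapped modulo 256 so it always fits in a byte. The real mixer may lay its frames out differently (for example with clamped or pre-scaled deltas), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed frame layout: 1 absolute ("linear") sample + 15 deltas. */
#define FRAME_LEN 16

/* Pre-encode n signed 8-bit samples into linear/delta frames.
 * Must not run in place: each delta reads the previous source sample. */
static void encode_linear_delta(const int8_t *src, int8_t *dst, size_t n)
{
    assert(src != dst);
    for (size_t i = 0; i < n; ++i) {
        if (i % FRAME_LEN == 0) {
            dst[i] = src[i];   /* frame start: absolute sample */
        } else {
            /* difference to the previous sample, wrapped modulo 256 */
            dst[i] = (int8_t)(uint8_t)((uint8_t)src[i] - (uint8_t)src[i - 1]);
        }
    }
}

/* Reverse the encoding (useful to check that it is lossless). */
static void decode_linear_delta(const int8_t *src, int8_t *dst, size_t n)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i % FRAME_LEN == 0) {
            acc = (uint8_t)src[i];                   /* reset at frame start */
        } else {
            acc = (uint8_t)(acc + (uint8_t)src[i]);  /* wraps modulo 256 */
        }
        dst[i] = (int8_t)acc;
    }
}
```

Because every 16th byte is absolute, playback can start at any frame boundary without decoding from the beginning of the sample, and any rounding drift introduced when the deltas are scaled cannot build up over more than 15 samples.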
|
08 July 2024, 01:43 | #142 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
I plotted the data earlier for the 060 with the data cache enabled, and the trends were quite interesting. All three modes (040 linear lookup, 040 delta lookup and 060 muls) followed a slightly quadratic curve, where the time taken increases somewhat faster than linearly with the number of channels. The curvature is quite conspicuous.
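One simple way to put a number on that curvature is to look at second differences of the timings: for a cost that is linear in the channel count they hover around zero, while a consistently positive value means each extra channel costs more than the one before. A small C sketch (the function name is hypothetical; it takes timings for 1..n channels in any unit):

```c
#include <assert.h>
#include <stddef.h>

/* Mean of t[i+1] - 2*t[i] + t[i-1] over the interior points.
 * ~0 for linear growth, consistently > 0 for super-linear growth. */
static double mean_second_difference(const long *t, size_t n)
{
    assert(n >= 3);
    double sum = 0.0;
    for (size_t i = 1; i + 1 < n; ++i) {
        sum += (double)(t[i + 1] - 2 * t[i] + t[i - 1]);
    }
    return sum / (double)(n - 2);
}
```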
|
08 July 2024, 11:41 | #143 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
Right, the data for these two systems is now charted.
The X axis is the number of channels and the Y axis is the time, in milliseconds, per packet of audio. The configuration is a 16 kHz mixing rate with a 50 Hz update, so we need a new packet of 320 samples (L/R pairs) every 20 ms. Therefore anything above 20 ms is going to be a problem. The cost of normalising the data and writing the Chip RAM buffers is basically invariant with respect to the number of channels.
The most obvious initial conclusion is that my code sucks. Just sucks. Either that, or I have measured something incorrectly. The effect of the delta lookup on the 040 is marginal and only becomes apparent after 8 channels. However, I haven't tested with samples that are preconverted into the proposed 1:15 linear/delta format; all of that conversion is currently happening on the fly.
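The budget arithmetic above as a small C sketch. The constants are the figures quoted in the post; the helper names are hypothetical.

```c
#include <assert.h>

/* Figures from the post: 16 kHz mixing rate, 50 Hz update. */
#define MIX_RATE_HZ    16000u
#define UPDATE_RATE_HZ 50u

/* L/R sample pairs needed per packet: 16000 / 50 = 320. */
static unsigned samples_per_packet(unsigned mix_rate, unsigned update_rate)
{
    assert(update_rate != 0 && mix_rate % update_rate == 0);
    return mix_rate / update_rate;
}

/* Time available to produce one packet: 1000 / 50 = 20 ms. */
static double packet_budget_ms(unsigned update_rate)
{
    return 1000.0 / (double)update_rate;
}

/* Fraction of the budget a measured mix time uses; above 1.0 cannot keep up. */
static double budget_fraction(double mix_ms, unsigned update_rate)
{
    return mix_ms / packet_budget_ms(update_rate);
}
```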
08 July 2024, 12:10 | #144 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
Some basic maths:
In the worst case, we are reading 16 sets of 320 samples from Fast RAM in order to produce 640 samples written to Chip RAM. We are using move16 to perform the reading, as we know we won't be reusing the same input data again soon* and the location being written to should already be in the data cache.
* Note that's not actually true in the test case, but it is true in the general case we are aiming for.
To hit the target update rate of 50 Hz, that would imply reading 16 x 320 x 50 = 256,000 bytes/s from Fast RAM and writing 640 x 50 = 32,000 bytes/s to Chip RAM. Our Chip RAM writes are all longword-sized and longword-aligned (well, almost: the volume modulation packets get 1 word written every 16 samples). This doesn't seem too egregious.
Moreover, I've run OctaMED at 28 kHz with 20+ channels of audio many times in the past, with 14-bit replay on a 68040 and a much more complex mixer (varying/different sample rates per channel). I wonder if I have a loop in the wrong place, lol...
Last edited by Karlos; 08 July 2024 at 12:24.
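The same numbers as a quick C check. It assumes one byte per source sample and per output sample, as the figures above imply, and ignores the occasional word written for the volume modulation packets. At longword granularity, the Chip RAM side comes to only 8,000 writes per second. Function names are hypothetical.

```c
#include <assert.h>

/* Fast RAM reads: channels x samples per packet x packets per second,
 * at one byte per 8-bit source sample. */
static unsigned long fast_read_bytes_per_sec(unsigned channels,
                                             unsigned samples_per_packet,
                                             unsigned update_hz)
{
    return (unsigned long)channels * samples_per_packet * update_hz;
}

/* Chip RAM writes at one byte per output sample. */
static unsigned long chip_write_bytes_per_sec(unsigned out_samples_per_packet,
                                              unsigned update_hz)
{
    return (unsigned long)out_samples_per_packet * update_hz;
}

/* The same Chip RAM traffic expressed as 32-bit longword writes. */
static unsigned long chip_long_writes_per_sec(unsigned out_samples_per_packet,
                                              unsigned update_hz)
{
    assert(out_samples_per_packet % 4 == 0);   /* whole longwords only */
    return chip_write_bytes_per_sec(out_samples_per_packet, update_hz) / 4;
}
```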
08 July 2024, 12:28 | #145 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
|
Maybe it would be an idea to initialize this variable:
https://github.com/0xABADCAFE/tkg-mi...47/main.c#L203
08 July 2024, 12:34 | #146 | |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
Quote:
|
|
08 July 2024, 12:37 | #147 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
Pushed an update to fix the uninitialised variable. Can you see if it makes a difference?
It would be amusing if it was accumulating each previous run as well.
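For what it's worth, a running total that carries over between runs would explain the quadratic shape on its own: if each run costs roughly a + b*n for n channels, the reported figure becomes the sum of all runs so far, a*n + b*n(n+1)/2, which is quadratic in n. The C sketch below models this with a hypothetical cost function and hypothetical names. It is not the code behind the truncated link above, and it models "carried over" rather than truly uninitialised, since reading an uninitialised variable in C is undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-run cost: fixed overhead plus a per-channel cost.
 * Illustrative numbers only, not measurements. */
static unsigned long run_cost(unsigned channels)
{
    return 1000UL + 500UL * channels;
}

/* Buggy pattern: the total is set up once and never reset,
 * so every reported figure includes all previous runs. */
static void measure_without_reset(unsigned long *reported, unsigned max_channels)
{
    assert(reported != NULL);
    unsigned long total = 0;              /* initialised once, never reset */
    for (unsigned ch = 1; ch <= max_channels; ++ch) {
        total += run_cost(ch);
        reported[ch - 1] = total;
    }
}

/* Fixed pattern: the accumulator starts from zero for every run. */
static void measure_with_reset(unsigned long *reported, unsigned max_channels)
{
    assert(reported != NULL);
    for (unsigned ch = 1; ch <= max_channels; ++ch) {
        unsigned long total = 0;          /* fresh for each channel count */
        total += run_cost(ch);
        reported[ch - 1] = total;
    }
}
```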
08 July 2024, 12:55 | #148 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
|
I think this looks more reasonable (no longer quadratic):
Code:
Using 68040 linear lookup code path
Got Timer, frequency is 709379 Hz
Loaded sounds/airstrike.raw [60460 bytes] at 0x68bf2060
Testing with 1 channel(s) Mixed 189 Packets in 51077 EClockVal ticks (709379/s)
Testing with 2 channel(s) Mixed 189 Packets in 72777 EClockVal ticks (709379/s)
Testing with 3 channel(s) Mixed 189 Packets in 90963 EClockVal ticks (709379/s)
Testing with 4 channel(s) Mixed 189 Packets in 109273 EClockVal ticks (709379/s)
Testing with 5 channel(s) Mixed 189 Packets in 127572 EClockVal ticks (709379/s)
Testing with 6 channel(s) Mixed 189 Packets in 146921 EClockVal ticks (709379/s)
Testing with 7 channel(s) Mixed 189 Packets in 167578 EClockVal ticks (709379/s)
Testing with 8 channel(s) Mixed 189 Packets in 186947 EClockVal ticks (709379/s)
Testing with 9 channel(s) Mixed 189 Packets in 205100 EClockVal ticks (709379/s)
Testing with 10 channel(s) Mixed 189 Packets in 223904 EClockVal ticks (709379/s)
Testing with 11 channel(s) Mixed 189 Packets in 243475 EClockVal ticks (709379/s)
Testing with 12 channel(s) Mixed 189 Packets in 261216 EClockVal ticks (709379/s)
Testing with 13 channel(s) Mixed 189 Packets in 279599 EClockVal ticks (709379/s)
Testing with 14 channel(s) Mixed 189 Packets in 297640 EClockVal ticks (709379/s)
Testing with 15 channel(s) Mixed 189 Packets in 315759 EClockVal ticks (709379/s)
Testing with 16 channel(s) Mixed 189 Packets in 328309 EClockVal ticks (709379/s)
Using 68040 delta lookup code path
Got Timer, frequency is 709379 Hz
Loaded sounds/airstrike.raw [60460 bytes] at 0x68bf2060
Testing with 1 channel(s) Mixed 189 Packets in 51248 EClockVal ticks (709379/s)
Testing with 2 channel(s) Mixed 189 Packets in 75004 EClockVal ticks (709379/s)
Testing with 3 channel(s) Mixed 189 Packets in 94780 EClockVal ticks (709379/s)
Testing with 4 channel(s) Mixed 189 Packets in 115296 EClockVal ticks (709379/s)
Testing with 5 channel(s) Mixed 189 Packets in 135090 EClockVal ticks (709379/s)
Testing with 6 channel(s) Mixed 189 Packets in 155458 EClockVal ticks (709379/s)
Testing with 7 channel(s) Mixed 189 Packets in 178751 EClockVal ticks (709379/s)
Testing with 8 channel(s) Mixed 189 Packets in 200930 EClockVal ticks (709379/s)
Testing with 9 channel(s) Mixed 189 Packets in 219283 EClockVal ticks (709379/s)
Testing with 10 channel(s) Mixed 189 Packets in 238803 EClockVal ticks (709379/s)
Testing with 11 channel(s) Mixed 189 Packets in 259180 EClockVal ticks (709379/s)
Testing with 12 channel(s) Mixed 189 Packets in 279343 EClockVal ticks (709379/s)
Testing with 13 channel(s) Mixed 189 Packets in 298529 EClockVal ticks (709379/s)
Testing with 14 channel(s) Mixed 189 Packets in 320269 EClockVal ticks (709379/s)
Testing with 15 channel(s) Mixed 189 Packets in 338412 EClockVal ticks (709379/s)
Testing with 16 channel(s) Mixed 189 Packets in 351083 EClockVal ticks (709379/s)
Using 68060 code path
Got Timer, frequency is 709379 Hz
Loaded sounds/airstrike.raw [60460 bytes] at 0x68bf2060
Testing with 1 channel(s) Mixed 189 Packets in 47733 EClockVal ticks (709379/s)
Testing with 2 channel(s) Mixed 189 Packets in 65388 EClockVal ticks (709379/s)
Testing with 3 channel(s) Mixed 189 Packets in 79558 EClockVal ticks (709379/s)
Testing with 4 channel(s) Mixed 189 Packets in 93729 EClockVal ticks (709379/s)
Testing with 5 channel(s) Mixed 189 Packets in 108285 EClockVal ticks (709379/s)
Testing with 6 channel(s) Mixed 189 Packets in 122527 EClockVal ticks (709379/s)
Testing with 7 channel(s) Mixed 189 Packets in 136714 EClockVal ticks (709379/s)
Testing with 8 channel(s) Mixed 189 Packets in 150958 EClockVal ticks (709379/s)
Testing with 9 channel(s) Mixed 189 Packets in 164968 EClockVal ticks (709379/s)
Testing with 10 channel(s) Mixed 189 Packets in 179377 EClockVal ticks (709379/s)
Testing with 11 channel(s) Mixed 189 Packets in 193682 EClockVal ticks (709379/s)
Testing with 12 channel(s) Mixed 189 Packets in 207930 EClockVal ticks (709379/s)
Testing with 13 channel(s) Mixed 189 Packets in 222006 EClockVal ticks (709379/s)
Testing with 14 channel(s) Mixed 189 Packets in 237397 EClockVal ticks (709379/s)
Testing with 15 channel(s) Mixed 189 Packets in 250223 EClockVal ticks (709379/s)
Testing with 16 channel(s) Mixed 189 Packets in 259709 EClockVal ticks (709379/s)
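To turn the raw log figures into the milliseconds per packet used in the charts, divide by the EClock frequency and the packet count printed in the log above (709379 Hz, 189 packets). A least-squares slope over the 1 to 16 channel runs then gives the marginal cost of each extra channel. A minimal C sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Values printed in the log: EClock frequency and packets per run. */
#define ECLOCK_HZ       709379.0
#define PACKETS_PER_RUN 189.0

/* Average time to mix one packet, in milliseconds. */
static double ticks_to_ms_per_packet(unsigned long ticks)
{
    return (double)ticks * 1000.0 / ECLOCK_HZ / PACKETS_PER_RUN;
}

/* Least-squares slope of ticks against channel count (1..n):
 * the extra ticks per run that each additional channel costs. */
static double ticks_per_extra_channel(const unsigned long *ticks, size_t n)
{
    assert(n >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double x = (double)(i + 1);   /* channel count */
        double y = (double)ticks[i];
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    double dn = (double)n;
    return (dn * sxy - sx * sy) / (dn * sxx - sx * sx);
}
```

Dividing the slope by 189 packets and 320 samples per packet brings it down to a per-sample figure for one extra channel.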
08 July 2024, 13:10 | #149 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
SMH, lol.
Those numbers do look a bit healthier. @abu, any chance of a rerun?
08 July 2024, 13:18 | #150 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,056
|
I will rerun when I get home tonight.
|
08 July 2024, 13:51 | #151 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
|
08 July 2024, 13:57 | #152 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
Using @paraj's new values and extracting the differences from @abu's old ones (to isolate the time added per test), the results make a lot more sense.
It's clear the cache improvements for the delta lookup are not great, but it does have a few more operations per sample. I will throw in a preconverted version to see if it's better with the simpler inner loop. 10 ms to mix 20 ms of audio is still not awesome.
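Extracting per-test times from cumulative figures is just a first difference: each reported value minus the one before it. A C sketch, assuming the old figures were pure running totals with nothing else carried over (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* per[0] = cum[0], per[i] = cum[i] - cum[i-1] */
static void per_test_from_cumulative(const unsigned long *cum,
                                     unsigned long *per, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (i == 0) {
            per[i] = cum[0];
        } else {
            assert(cum[i] >= cum[i - 1]);   /* running totals never decrease */
            per[i] = cum[i] - cum[i - 1];
        }
    }
}
```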
08 July 2024, 14:11 | #153 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
Does anyone have a 68040 with proper local fast memory that can test this?
|
08 July 2024, 18:39 | #154 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
|
If my math is right, it seems to be about 20 cycles/sample on the 060 at the limit for extra channels. Seems decent enough.
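The cycles-per-sample estimate can be reproduced along these lines: take the extra ticks per run for one more channel, convert to seconds with the EClock frequency, divide by the samples mixed per run (189 packets x 320 samples) and multiply by the CPU clock. The CPU clock is an input here because it isn't stated in the thread, so the result is only as good as that assumption. A C sketch with a hypothetical function name:

```c
#include <assert.h>

/* delta_ticks        : extra EClock ticks per run for one more channel
 * tick_hz            : EClock frequency (709379 Hz in the logs)
 * packets            : packets per run (189 in the logs)
 * samples_per_packet : 320 at 16 kHz / 50 Hz
 * cpu_hz             : CPU clock, an assumption (not given in the thread) */
static double cycles_per_sample(double delta_ticks, double tick_hz,
                                double packets, double samples_per_packet,
                                double cpu_hz)
{
    assert(tick_hz > 0.0 && packets > 0.0 && samples_per_packet > 0.0);
    double seconds = delta_ticks / tick_hz;
    double seconds_per_sample = seconds / (packets * samples_per_packet);
    return seconds_per_sample * cpu_hz;
}
```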
For the 040, I think it might be useful to have a test case that only uses volume steps that can be done with shifts (just clamp to the nearest one or whatever; it doesn't need to be precise), just to have a lower bound. (And remember the thing about automatically running all tests.)
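Clamping an arbitrary volume to the nearest shift could look something like this. It assumes a 0..64 volume scale with 64 as unity gain (an assumption; the mixer's actual volume range isn't given here), picks the nearest of 64, 32, 16, ..., 1, and returns -1 for silence. The name is hypothetical.

```c
#include <assert.h>

#define VOL_MAX 64   /* assumed volume scale: 0..64, 64 = unity gain */

/* Right-shift count whose gain (VOL_MAX >> s) is nearest to vol.
 * Ties resolve to the louder step. Returns -1 for silence. */
static int volume_to_shift(int vol)
{
    if (vol <= 0) return -1;
    if (vol > VOL_MAX) vol = VOL_MAX;
    int best_shift = 0;
    int best_err = VOL_MAX;   /* larger than any possible error */
    for (int s = 0; (VOL_MAX >> s) > 0; ++s) {
        int approx = VOL_MAX >> s;
        int err = vol > approx ? vol - approx : approx - vol;
        if (err < best_err) {
            best_err = err;
            best_shift = s;
        }
    }
    return best_shift;
}
```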
08 July 2024, 18:41 | #155 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
Yeah, the lookup performance is rather disappointing, even with taking cache hit rate into consideration.
One of the side effects of moving to shift-based L/R channel volume is that you will lose the global mix level control. Currently, this is encoded into the lookup tables for the 040 and factored into the multiplier in the 060 path.
Last edited by Karlos; 08 July 2024 at 18:59.
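To illustrate why the global level comes for free in the other two paths: a per-volume lookup table can have the global level multiplied in when it is built, and the multiply path can fold it into one per-channel factor computed once per packet. A hedged C sketch; the 0..64 scales, the /64 normalisation and the names are assumptions, not the mixer's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Lookup path: one table per channel volume, indexed by the raw sample
 * byte, with the global level baked in at build time. */
static void build_volume_table(int16_t table[256], int channel_vol, int global_level)
{
    assert(channel_vol >= 0 && channel_vol <= 64);
    assert(global_level >= 0 && global_level <= 64);
    for (int s = -128; s < 128; ++s) {
        table[(uint8_t)s] = (int16_t)((s * channel_vol * global_level) / 64);
    }
}

/* Multiply path: the same combined factor, one multiply per sample. */
static int16_t scale_with_multiply(int8_t sample, int channel_vol, int global_level)
{
    int factor = channel_vol * global_level;   /* computed once per channel */
    return (int16_t)((sample * factor) / 64);
}
```

With a pure shift there is no table or factor to fold the global level into, unless the global level is itself restricted to a shift and added to each channel's shift count (two successive arithmetic right shifts by a and b equal one shift by a + b).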
08 July 2024, 19:21 | #156 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
|
It would be purely to see the speed for comparison purposes at first (and similarly, it might be good to include a "0 channel" test just to get a baseline). If it can't be fast enough that way, you need to rethink the approach. I wonder, though, how many samples would realistically be playing at once? There's probably "meta level" stuff you could do, like limiting enemy shots to N channels at the engine level or something like that (if enemy shots are what is pushing it).
If "shift-only" performance is good, I'm sure the global volume issue could be solved (multiple functions/SMC/etc.).
08 July 2024, 19:43 | #157 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
We can just fudge that with a fixed shift version for now. Also, a channel is only mixed when it is not muted, has a data pointer and has samples remaining to mix, so that aspect is already dealt with. It's the worst case I'm looking to set limits for.
After this, there needs to be a whole prioritisation and assignment layer that decides which channel to recycle if all are in use.
Last edited by Karlos; 08 July 2024 at 21:44.
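A sketch of the mixing condition described above, plus one possible recycling policy for when every channel is busy: reuse a free channel if there is one, otherwise steal the lowest-priority channel (oldest on a tie), but only if the new sound is at least as important. The struct, field names and policy are hypothetical, purely to illustrate the shape of such a layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical channel state; field names are illustrative only. */
typedef struct {
    const int8_t *data;   /* sample data, NULL if unassigned */
    uint32_t remaining;   /* samples left to mix */
    int muted;
    int priority;         /* higher = more important (assumed) */
    uint32_t started;     /* start time, used to pick the oldest on ties */
} Channel;

/* Mixed only if not muted, has a data pointer and has samples left. */
static int channel_is_active(const Channel *c)
{
    return !c->muted && c->data != NULL && c->remaining > 0;
}

/* Free for reuse: nothing assigned or nothing left to play.
 * A muted channel that still has data counts as busy. */
static int channel_is_free(const Channel *c)
{
    return c->data == NULL || c->remaining == 0;
}

/* Returns the channel index to use for a new sound, or -1 to drop it. */
static int pick_channel(const Channel *ch, size_t n, int new_priority)
{
    int victim = -1;
    for (size_t i = 0; i < n; ++i) {
        if (channel_is_free(&ch[i])) return (int)i;
        if (victim < 0
            || ch[i].priority < ch[victim].priority
            || (ch[i].priority == ch[victim].priority
                && ch[i].started < ch[victim].started)) {
            victim = (int)i;
        }
    }
    return (victim >= 0 && ch[victim].priority <= new_priority) ? victim : -1;
}
```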
09 July 2024, 01:50 | #158 | |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
Quote:
I have included a mock "shift-only" mixer version too, which applies a fixed level shift (but using a register count operand, to be a more representative fit).
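In C terms, a shift-only accumulate with the shift held in a variable (the C analogue of a register count operand) could look like the following. It is a hypothetical sketch, not the mock in the repository: each 8-bit sample is widened to a 16-bit range and attenuated by a per-side right shift into an interleaved L/R 32-bit accumulator. It assumes an arithmetic right shift for negative values, which C leaves implementation-defined but common compilers (and the 68k `asr` instruction) provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* acc_lr is interleaved L,R and holds 2 * n entries. */
static void mix_shift_only(int32_t *acc_lr, const int8_t *src, size_t n,
                           unsigned shift_l, unsigned shift_r)
{
    assert(shift_l < 32 && shift_r < 32);
    for (size_t i = 0; i < n; ++i) {
        int32_t s = (int32_t)src[i] * 256;   /* widen to a 16-bit range */
        acc_lr[2 * i]     += s >> shift_l;   /* assumes arithmetic shift */
        acc_lr[2 * i + 1] += s >> shift_r;
    }
}
```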
|
09 July 2024, 20:10 | #159 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,261
|
For reference: slightly slower on the 060 (you really can program like it's 1994 for that).
Code:
Got Timer, tick frequency is 709379 Hz
Loaded sounds/airstrike.raw [60460 bytes] at 0x68bf4900
Test case 0:
Mix : Multiplication
Norm: Multiplication/Shift
Info: Move16 fetch, target 68060
Mixing 1 channel(s): 48073 ticks 189 packets
Mixing 2 channel(s): 65622 ticks 189 packets
Mixing 3 channel(s): 79522 ticks 189 packets
Mixing 4 channel(s): 93702 ticks 189 packets
Mixing 5 channel(s): 107951 ticks 189 packets
Mixing 6 channel(s): 122454 ticks 189 packets
Mixing 7 channel(s): 136699 ticks 189 packets
Mixing 8 channel(s): 152201 ticks 189 packets
Mixing 9 channel(s): 165239 ticks 189 packets
Mixing 10 channel(s): 179373 ticks 189 packets
Mixing 11 channel(s): 193857 ticks 189 packets
Mixing 12 channel(s): 207912 ticks 189 packets
Mixing 13 channel(s): 223480 ticks 189 packets
Mixing 14 channel(s): 236706 ticks 189 packets
Mixing 15 channel(s): 251025 ticks 189 packets
Mixing 16 channel(s): 259951 ticks 189 packets
Test case 1:
Mix : Lookup
Norm: Multiplication/Shift
Info: Move16 fetch, target 68040/60
Mixing 1 channel(s): 50986 ticks 189 packets
Mixing 2 channel(s): 72846 ticks 189 packets
Mixing 3 channel(s): 90913 ticks 189 packets
Mixing 4 channel(s): 108888 ticks 189 packets
Mixing 5 channel(s): 127573 ticks 189 packets
Mixing 6 channel(s): 146995 ticks 189 packets
Mixing 7 channel(s): 166136 ticks 189 packets
Mixing 8 channel(s): 184136 ticks 189 packets
Mixing 9 channel(s): 203982 ticks 189 packets
Mixing 10 channel(s): 221018 ticks 189 packets
Mixing 11 channel(s): 239165 ticks 189 packets
Mixing 12 channel(s): 259844 ticks 189 packets
Mixing 13 channel(s): 277125 ticks 189 packets
Mixing 14 channel(s): 294621 ticks 189 packets
Mixing 15 channel(s): 314103 ticks 189 packets
Mixing 16 channel(s): 325783 ticks 189 packets
Test case 2:
Mix : Delta Lookup
Norm: Multiplication/Shift
Info: Move16 fetch, target 68040/60
Mixing 1 channel(s): 51709 ticks 189 packets
Mixing 2 channel(s): 75265 ticks 189 packets
Mixing 3 channel(s): 94933 ticks 189 packets
Mixing 4 channel(s): 114056 ticks 189 packets
Mixing 5 channel(s): 134962 ticks 189 packets
Mixing 6 channel(s): 154481 ticks 189 packets
Mixing 7 channel(s): 175378 ticks 189 packets
Mixing 8 channel(s): 196536 ticks 189 packets
Mixing 9 channel(s): 217971 ticks 189 packets
Mixing 10 channel(s): 236321 ticks 189 packets
Mixing 11 channel(s): 255657 ticks 189 packets
Mixing 12 channel(s): 275072 ticks 189 packets
Mixing 13 channel(s): 295820 ticks 189 packets
Mixing 14 channel(s): 316234 ticks 189 packets
Mixing 15 channel(s): 334639 ticks 189 packets
Mixing 16 channel(s): 348876 ticks 189 packets
Test case 3:
Mix : Shift Only
Norm: Multiplication/Shift
Info: Move16 fetch, target 68040
Mixing 1 channel(s): 48939 ticks 189 packets
Mixing 2 channel(s): 67538 ticks 189 packets
Mixing 3 channel(s): 82385 ticks 189 packets
Mixing 4 channel(s): 96897 ticks 189 packets
Mixing 5 channel(s): 111948 ticks 189 packets
Mixing 6 channel(s): 126232 ticks 189 packets
Mixing 7 channel(s): 141217 ticks 189 packets
Mixing 8 channel(s): 155738 ticks 189 packets
Mixing 9 channel(s): 170333 ticks 189 packets
Mixing 10 channel(s): 185093 ticks 189 packets
Mixing 11 channel(s): 199483 ticks 189 packets
Mixing 12 channel(s): 214748 ticks 189 packets
Mixing 13 channel(s): 228799 ticks 189 packets
Mixing 14 channel(s): 243839 ticks 189 packets
Mixing 15 channel(s): 258147 ticks 189 packets
Mixing 16 channel(s): 267639 ticks 189 packets
09 July 2024, 20:55 | #160 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,660
|
We need to see the shift-only mixing mock on an 040, really. I am surprised it's slower than multiplication on the 060, albeit not by much.
|