11 July 2024, 18:38 | #181 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,286
|
|
11 July 2024, 18:49 | #182 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
What I really should use for it is a SCSI bridge. The BlizzPPC has a SCSI controller.
|
11 July 2024, 19:27 | #183 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,286
|
You don't have to get everything set up perfectly. Just enough to run timing tests. Even just a gotek drive would probably do for that if it's otherwise working OK.
Having some way of doing really cycle accurate measurements yourself is just so good/necessary/satisfying when you're theory crafting. For "classic" A500 stuff I really wish I had a real (working) machine, but WinUAE is good enough for that. Even for plain A1200 it's just not accurate enough if you /really/ want the actual numbers. |
11 July 2024, 22:31 | #184 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
Any possible attempt to start up the 1200 would have to wait until the weekend. It did boot up last time I tried it a few years ago, but my old CRT has seen better days.
|
11 July 2024, 23:39 | #185 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,078
|
I do have an A1200T with an apollo 1240@40mhz with 32mb fast memory in storage.
if I get the time I will dig it out. it also has a CV643D so should work with the CF card from the stock A4000. |
12 July 2024, 00:23 | #186 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,078
|
@Karlos
what is the minimum required to run the test? will it fit on a floppy disk? if the disk drive on the 1200T still works it would save pulling it apart. |
12 July 2024, 00:47 | #187 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
Sure. Just the binary and the airstrike.raw sound file I think. It'll need to be in a directory called sounds but none of the other sounds are needed atm
Last edited by Karlos; 12 July 2024 at 01:50. |
12 July 2024, 20:08 | #188 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,078
|
as requested, Apollo 1240@40 32mb
|
12 July 2024, 20:12 | #189 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,286
|
Great that you're doing these things, @abu_the_monkey!
I haven't looked at the code (yet), but the "Shift only" code path must be testing something other than what I imagined. |
12 July 2024, 21:21 | #190 | |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
Quote:
See: https://github.com/0xABADCAFE/tkg-mi...040_asm.s#L714
@abu_the_monkey Looks like my delta code is worse now. I think it's the extra work per sample, so I will have to see what it is like if the samples are pre-encoded for it. That would make the innermost loop a few instructions shorter. |
|
12 July 2024, 21:25 | #191 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
Maybe it would go faster if I used the full register rather than .w, and maybe LSL rather than ASL. After all, once the 8-bit sample has been sign extended, it shouldn't matter.
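To illustrate why the choice of word versus long, or LSL versus ASL, should not change the result: on the 68k, ASL and LSL leave the same bits in the register and differ only in how they set the overflow flag. The following is a minimal C model, not the actual mixer code. The function names are hypothetical, and it assumes a shift count of 0 to 8 so the scaled sample still fits in 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: scale a signed 8-bit sample by 2^shift using a
   logical (unsigned) shift on the sign-extended 16-bit value.
   Converting back to int16_t assumes the usual two's complement wrap. */
static int16_t shift_scale_w(int8_t sample, unsigned shift)
{
    uint16_t bits = (uint16_t)(int16_t)sample;  /* models ext.w d0          */
    return (int16_t)(uint16_t)(bits << shift);  /* models lsl.w #shift,d0   */
}

/* Same idea using the full 32-bit register, keeping only the low word. */
static int16_t shift_scale_l(int8_t sample, unsigned shift)
{
    uint32_t bits = (uint32_t)(int32_t)sample;  /* models extb.l d0         */
    return (int16_t)(uint16_t)(bits << shift);  /* models lsl.l, low word   */
}
```

Both variants give the same word as a signed multiply by a power of two, so any speed difference on real hardware would come from instruction timing rather than from the arithmetic.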
|
12 July 2024, 21:27 | #192 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
If it turns out that for reasons unknown shifting is not as good as lookup (crazy but...), I say lookup has the key advantage of being functionally equivalent to the multiplication.
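As a rough illustration of that equivalence, here is a small C sketch of a per-volume lookup table. It is an assumption-laden model rather than the project's code: the names are hypothetical, and the volume is assumed to be in the range 0 to 256 so that every product fits in a signed 16-bit word.

```c
#include <stdint.h>

/* Hypothetical table builder: one 16-bit entry per possible 8-bit sample,
   indexed by the raw (unsigned) byte value. Assumes 0 <= volume <= 256. */
static void build_volume_table(int16_t table[256], int volume)
{
    for (int i = 0; i < 256; ++i) {
        int sample = (i < 128) ? i : i - 256;   /* reinterpret byte as signed */
        table[i] = (int16_t)(sample * volume);
    }
}

/* Scaling a sample is then a single indexed load, for any volume. */
static int16_t lookup_scale(const int16_t table[256], int8_t sample)
{
    return table[(uint8_t)sample];
}
```

The table reproduces the multiplication for any volume in that range, whereas a pure shift can only reproduce volumes that are powers of two.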
Last edited by Karlos; 13 July 2024 at 00:56. Reason: Lol, typo |
12 July 2024, 22:02 | #193 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,078
|
I can run it again to check it wasn't something I did during the test, but, I would have to wait till later this evening/tomorrow morning.
|
13 July 2024, 00:45 | #194 | |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
Quote:
I just realised I managed to type sh*tting and not shifting last time, lol Last edited by Karlos; 13 July 2024 at 00:57. |
|
13 July 2024, 11:58 | #195 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,286
|
Don't think there's anything wrong with the testing, and can't see anything wrong with the code either.
I just don't understand how the shift version could possibly be slower than the lookup method. It's a bit faster for 1 and 2 channels, but slower for 3+. I guess the cache is really helping the lookup for more channels, and there must be some kind of pipeline effect going on, but still. Guess it really shows how one's intuition can be very wrong as soon as the CPU is even a bit more complicated than plain 68000. |
13 July 2024, 12:46 | #196 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
I'm going to add a null mix test and a preconverted delta version. Hopefully the latter is even better than the straight lookup. Only one way to find out.
|
13 July 2024, 19:30 | #197 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
Here's the proposed mixing code for the pre-encoded L1 : D15 format
Code:
.mix_first_sample:
    move.b  (a3)+,d0            ; next 8-bit sample.
    move.w  (a2,d0.w*2),d4      ; look up the volume adjusted word
    add.w   d4,(a4)+            ; accumulate onto the target buffer
    ; d4 contains the current 16-bit value
.mix_next_sample:
    move.b  (a3)+,d0            ; next delta in d0
    add.w   (a2,d0.w*2),d4      ; add lookup adjusted delta to current
    add.w   d4,(a4)+            ; Accumulate
    dbra    d1,.mix_next_sample
Last edited by Karlos; 13 July 2024 at 19:43. |
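For anyone following along without 68k assembly, the loop above can be modelled roughly in C as below. This is only a sketch under stated assumptions, not the real encoder or mixer. The function names are hypothetical, the table is assumed to be a plain linear volume table, the whole buffer is treated as one run (the L1 : D15 layout is not modelled), and every difference between neighbouring samples is assumed to fit in a signed byte. How larger jumps are handled is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linear table: entry for byte b is (signed b) * volume. */
static void build_linear_table(int16_t table[256], int volume)
{
    for (int i = 0; i < 256; ++i) {
        int sample = (i < 128) ? i : i - 256;
        table[i] = (int16_t)(sample * volume);
    }
}

/* Hypothetical pre-encoder: the first byte stays a plain sample, every later
   byte becomes the difference from its predecessor.
   ASSUMPTION: each difference fits in a signed byte. */
static void delta_encode(const int8_t *src, uint8_t *dst, size_t n)
{
    if (n == 0) return;
    dst[0] = (uint8_t)src[0];
    for (size_t i = 1; i < n; ++i) {
        dst[i] = (uint8_t)(src[i] - src[i - 1]);
    }
}

/* Mixer model: cur plays the role of d4, table of (a2), out of (a4). */
static void mix_delta(const uint8_t *enc, const int16_t table[256],
                      int16_t *out, size_t n)
{
    int16_t cur = 0;
    for (size_t i = 0; i < n; ++i) {
        cur = (int16_t)(cur + table[enc[i]]);   /* running volume-scaled value */
        out[i] = (int16_t)(out[i] + cur);       /* accumulate onto the target  */
    }
}
```

Because the table is linear, summing the looked-up deltas reproduces the volume-scaled sample at each step, which is why the inner loop needs only one lookup and two adds per sample.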
13 July 2024, 20:00 | #198 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
I've just pushed an updated version to test. It reorders the tests slightly to be a bit more logical
Test 0: Null Mixer - All source data fetches and chip data writes are performed (in the expected sizes and sequence), but no actual mixing or normalisation. This will hopefully give some sort of baseline for the IO aspect.
Test 1: The full multiplication path (060 target)
Test 2: The shift mixing path
Test 3: The 040 linear lookup path
Test 4: The 040 delta lookup path (on the fly delta)
Test 5: The 040 pre-encoded delta lookup
This last test preconverts the loaded data to ensure that the interaction with the tables (and thus the cache) is a realistic model. |
13 July 2024, 20:34 | #199 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
I should point out that I haven't strictly validated that the new method works but even if it has bugs, they're probably in the pre-encoder
|
14 July 2024, 00:06 | #200 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,841
|
There was a bug in the pre-encoding, but not the actual mixer. In a completely unscientific test locally in UAE (no JIT), the pre-encoded version is basically the same speed as the linear version and faster than the on-the-fly delta. This is what I expect based purely on the number of operations per loop. How it performs on actual hardware remains to be seen.
Changes are pushed. |