21 October 2020, 16:26 | #21 |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
|
Ah, ok. AFAIK the only CPU WinUAE emulates cycle-exact is the 68000; for the 68020 and upwards the emulation is less precise (because it is much harder and mostly undocumented). But I don't know if the difference is significant in your case.
Haha, been there, done that so many times. |
21 October 2020, 16:53 | #22 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
|
Quote:
Second, this routine is called four times in a row, if I remember right. In that case you can do the following once before the loop, not inside your loop routine:
Code:
move.w d7,d4
rol.w #7,d4
and.b #127,d4
|
21 October 2020, 16:54 | #23 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
No I can't, because the fraction changes for every output sample!
EDIT: OK, the LUT method is faster on my 50MHz 68030 Amiga, so that's good news! I also edited the code again, as I had to replace ext.w d3 with and.w #$ff,d3. So the thing to focus on now is optimizing this further, if possible:
Code:
move.w (a3,d2.l),d3 ; read 2x signed 8-bit PCM samples
move.b d3,d5
ext.w d5
asr.w #8,d3
sub.w d3,d5
lsl.w #7,d5
move.w d7,d4 ; copy of sampling position fraction (16-bit)
rol.w #7,d4
and.b #127,d4
or.b d4,d5
add.b (a6,d5.w),d3
and.w #$ff,d3 ; d3.b = -128..127 (ready for volume LUT)
Also, sorry for not listening much to the suggestions; I just thought they were not suitable (changing frac etc.).
Last edited by 8bitbubsy; 21 October 2020 at 17:34.
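For anyone following along, here is a rough Python model of what the sequence above computes. It is only a sketch: the thread never shows how the table behind a6 is built, so the layout (one byte per delta/fraction pair, 64 KiB in total), the floor rounding, and the names `build_interp_lut` and `interpolate` are all assumptions made for illustration.

```python
def to_s8(value):
    """Interpret the low 8 bits of value as a signed byte (-128..127)."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def build_interp_lut():
    """Assumed layout: one byte per (delta, frac7) pair, 64 KiB in total.

    The 68k code uses (delta << 7) | frac7 as a signed 16-bit offset from a6;
    here the entry is stored under the unsigned 16-bit form of that offset.
    """
    lut = bytearray(65536)
    for delta in range(-255, 256):      # sample2 - sample1
        for frac7 in range(128):        # top 7 bits of the fraction
            index = ((delta << 7) | frac7) & 0xFFFF
            lut[index] = ((delta * frac7) >> 7) & 0xFF  # assumed rounding: floor
    return lut


def interpolate(lut, s1, s2, frac16):
    """s1, s2 are signed bytes; frac16 is the 16-bit position fraction."""
    delta = s2 - s1                     # sub.w d3,d5
    d5 = (delta << 7) & 0xFFFF          # lsl.w #7,d5
    d5 |= (frac16 & 0xFFFF) >> 9        # move.w / rol.w #7 / and.b #127 / or.b
    d3 = (s1 + lut[d5]) & 0xFF          # add.b (a6,d5.w),d3 / and.w #$ff,d3
    return to_s8(d3)
```

With a table like this, the add.b is effectively sample1 + ((sample2 - sample1) * frac7) / 128, i.e. plain linear interpolation with a 7-bit fraction and no multiply in the loop.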
21 October 2020, 17:38 | #24 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
|
Quote:
MIXCF MACRO
  move.w (a3,d2.l),d3
  move.b d3,d5
  ext.w d5
  asr.w #8,d3
  sub.w d3,d5
  move.w d7,d4
  lsr.w #8,d4
  muls.w d4,d5
  asr.w #8,d5
  add.w d5,d3
  move.w (a1,d3.w*2),d5
  add.w d5,(a5)+
  add.w d5,(a5)+
  add.w d6,d7
  addx.l d1,d2
  ENDM
Perhaps after some modification this can be the fastest, or at least the same speed (I don't remember the 68020 timings); d4 is free now too.
Code:
move.w (a3,d2.l),d3
move.b d3,d5
ext.w d5
asr.w #8,d3
sub.w d3,d5
lsl.w #7,d5
; move.w d7,d4
; rol.w #7,d4
; and.b #127,d4
; or.b d4,d5
rol.l #7,d7
or.b d7,d5
ror.l #7,d7 ; restore d7
add.b (a6,d5.w),d3
and.w #$ff,d3
.....
add.l d6,d7 ; the original d6/d7 word values must be in the high word, with the low word cleared, before the loop
addx.l d1,d2
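A small Python sketch of why the fraction has to sit in the high word with the low word cleared. The helper names `step` and `frac7` are made up for illustration; the 32-bit masking just imitates the 68k register width.

```python
MASK32 = 0xFFFFFFFF


def step(pos, frac_hi, delta_int, delta_frac_hi):
    """Model of one 'add.l d6,d7 / addx.l d1,d2' pair.

    frac_hi and delta_frac_hi hold a 16-bit fraction in the HIGH word of a
    32-bit register, with the low word cleared.
    """
    total = frac_hi + delta_frac_hi
    carry = total >> 32                         # X flag set by add.l
    frac_hi = total & MASK32
    pos = (pos + delta_int + carry) & MASK32    # addx.l d1,d2
    return pos, frac_hi


def frac7(frac_hi):
    """Low byte of d7 after rol.l #7, i.e. what or.b d7,d5 picks up."""
    rotated = ((frac_hi << 7) | (frac_hi >> 25)) & MASK32
    return rotated & 0xFF
```

Because the low word is zero, bit 7 of the rotated low byte is also zero, so the or.b cannot disturb the delta bits in d5, and the carry out of bit 31 still feeds addx.l exactly as the 16-bit add.w did before.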
|
21 October 2020, 18:07 | #25 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
Thanks, that was slightly faster. I decided to store $00ff in d4.w for the and.w, which additionally made it a tiny bit faster.
Code:
move.w (a3,d2.l),d3
move.b d3,d5
ext.w d5
asr.w #8,d3
sub.w d3,d5
lsl.w #7,d5
rol.l #7,d7
or.b d7,d5
ror.l #7,d7
add.b (a6,d5.w),d3
and.w d4,d3
move.w (a1,d3.w*2),d5
add.w d5,(a5)+
add.w d5,(a5)+
add.l d6,d7
addx.l d1,d2
21 October 2020, 19:25 | #26 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,044
|
If you are using only the upper 16 bits of d6/d7, you can use the lower 16 bits instead of d5. Then you only have to roll d7 left and back right, and d5 is free. Something like:
Code:
move.w (a3,d2.l),d3
move.b d3,d7
ext.w d7
asr.w #8,d3
sub.w d3,d7
rol.l #7,d7
add.b (a6,d7.w),d3
ror.l #7,d7
and.w d4,d3
move.w (a1,d3.w*2),d7
add.w d7,(a5)+
add.w d7,(a5)+
add.l d6,d7
addx.l d1,d2
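To see why a single rotate is enough here, a quick Python model of d7 may help. It assumes the fraction lives in the high word and the sample delta has just been written to the low word; `rol32`, `ror32` and `lut_index` are illustrative names, not anything from the actual source.

```python
MASK32 = 0xFFFFFFFF


def rol32(x, n):
    """32-bit rotate left, like rol.l #n."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def ror32(x, n):
    """32-bit rotate right, like ror.l #n."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def lut_index(frac16, delta):
    """Return (d7.w after rol.l #7, fraction after ror.l #7)."""
    d7 = ((frac16 & 0xFFFF) << 16) | (delta & 0xFFFF)  # frac high, delta low
    d7 = rol32(d7, 7)
    index = d7 & 0xFFFF              # equals (delta << 7) | (frac16 >> 9)
    d7 = ror32(d7, 7)                # lossless, so the fraction comes back
    return index, d7 >> 16
```

The rotate shifts the delta up by 7 and drops the top 7 fraction bits into bits 0-6 in one go, which is exactly the index the lsl/or sequence built before. Whatever ends up in the low word afterwards does not matter for add.l d6,d7, as long as the low word of d6 is zero.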
21 October 2020, 19:31 | #27 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
Awesome! That worked and gave a nice speed improvement.
|
21 October 2020, 19:47 | #28 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,044
|
Laaaag ;p.
And if you have no use for d5, I think move.l d7,d5 followed by rol.l #7,d5 should be faster than the rol/ror pair on d7 on a 020/030.
21 October 2020, 19:56 | #29 | |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
Quote:
EDIT: d5.l can be used for the center mixer. Now, I'm a bit confused as to what you meant I could do with d5. Here are the current mixers:
Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF MACRO
  move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
  move.b d3,d7
  ext.w d7
  asr.w #8,d3
  sub.w d3,d7 ; d7.w = sample2-sample1
  rol.l #7,d7
  add.b (a6,d7.w),d3
  and.w d4,d3 ; d3.w = $00xx = 8-bit signed interpolated sample
  ror.l #7,d7
  move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
  swap d5
  move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
  add.l d5,(a5)+
  add.l d6,d7 ; increase sampling position
  addx.l d1,d2
  ENDM
Code:
; d0.w = bytes to mix
MIXCF MACRO
  move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
  move.b d3,d7
  ext.w d7
  asr.w #8,d3
  sub.w d3,d7 ; d7.w = sample2-sample1
  rol.l #7,d7
  add.b (a6,d7.w),d3
  and.w d4,d3 ; d3.w = $00xx = 8-bit signed interpolated sample
  ror.l #7,d7
  move.w (a1,d3.w*2),d3 ; d3.w = output sample (from volume LUT)
  add.w d3,(a5)+
  add.w d3,(a5)+
  add.l d6,d7 ; increase sampling position
  addx.l d1,d2
  ENDM
Last edited by 8bitbubsy; 21 October 2020 at 20:10.
|
21 October 2020, 20:36 | #30 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
|
Quote:
Code:
; d0.w = bytes to mix
MIXSF MACRO
  move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
  move.b d3,d7
  ext.w d7
  asr.w #8,d3
  sub.w d3,d7 ; d7.w = sample2-sample1
  move.l d7,d5
  rol.l #7,d5
  add.b (a6,d5.w),d3
  and.w d4,d3 ; d3.w = $00xx = 8-bit signed interpolated sample
  move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
  swap d5
  move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
  add.l d5,(a5)+
  add.l d6,d7 ; increase sampling position
  addx.l d1,d2
  ENDM
|
21 October 2020, 21:12 | #31 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
Ah, like that! Yes, it made it slightly faster.
So now we're left with:
Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF MACRO
  move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
  move.b d3,d7
  ext.w d7
  asr.w #8,d3
  sub.w d3,d7 ; d7.w = sample2-sample1
  move.l d7,d5
  rol.l #7,d5
  add.b (a6,d5.w),d3
  and.w d4,d3 ; d3.w = $00xx = 8-bit signed interpolated sample
  move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
  swap d5
  move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
  add.l d5,(a5)+
  add.l d6,d7 ; increase sampling position
  addx.l d1,d2
  ENDM
Code:
; d0.w = bytes to mix
MIXCF MACRO
  move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
  move.b d3,d7
  ext.w d7
  asr.w #8,d3
  sub.w d3,d7 ; d7.w = sample2-sample1
  move.l d7,d5
  rol.l #7,d5
  add.b (a6,d5.w),d3
  and.w d4,d3 ; d3.w = $00xx = 8-bit signed interpolated sample
  move.w (a1,d3.w*2),d3 ; d3.w = output sample (from volume LUT)
  add.w d3,(a5)+
  add.w d3,(a5)+
  add.l d6,d7 ; increase sampling position
  addx.l d1,d2
  ENDM
Last edited by 8bitbubsy; 22 October 2020 at 14:28.
21 October 2020, 21:35 | #32 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
|
Quote:
|
|
21 October 2020, 21:36 | #33 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
I'm mixing to a 16-bit fastmem stereo buffer, then in the post-mixing stage I use a post-mixing table to convert it to pre-clamped, normalized 14-bit values for Paula (yes, I use 14-bit output).
I played around with trying to make it use a longword add for the center mix, but it turned out to be slower. E.g.:
Code:
move.w d3,d5
swap d5
move.w d3,d5
add.l d5,(a5)+
21 October 2020, 22:15 | #34 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,044
|
It might do nothing, since right-shift is extra fast on 020, but just in case... Replace the first four with:
Code:
move.w (a3,d2.l),d7 ; d7.w = 2x signed 8-bit samples
bfexts d7{16:8},d3
ext.w d7
21 October 2020, 22:20 | #35 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
Just benchmarked it on my 68030 50MHz A1200, and it's about 2-4% slower.
|
21 October 2020, 22:29 | #36 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
|
Quote:
Code:
; d0.w = bytes to mix
MIXCF MACRO
  move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
  move.b d3,d7
  ext.w d7
  asr.w #8,d3
  sub.w d3,d7 ; d7.w = sample2-sample1
  move.l d7,d5
  rol.l #7,d5
  add.b (a6,d5.w),d3
  and.w d4,d3 ; d3.w = $00xx = 8-bit signed interpolated sample
  add.w (a1,d3.w*2),(a5)+
  add.l d6,d7 ; increase sampling position
  addx.l d1,d2
  add.w (a1,d3.w*2),(a5)+
  ENDM
|
21 October 2020, 22:30 | #37 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
That's an extra look-up + word read from memory, can that possibly be faster?
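Also, there's no need to put instructions in between the audio buffer writes; I'm not using chipmem here. add (An,Dn),(An)+ is also not a valid opcode: add needs a data register as either its source or its destination. You can only do a memory-to-memory operation like that with move, I think.
Last edited by 8bitbubsy; 21 October 2020 at 22:40.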
21 October 2020, 23:25 | #38 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,044
|
OK, another idea...
Bits 8-15 of d3 are either all 0 or all 1. So if you make the a1/a4 tables twice as large (512 words instead of 256), covering indices -256 to 255 (with entry -256 equal to entry 0, -255 equal to 1, ... -1 equal to 255) and with a1/a4 pointing at index 0, you can drop:
Code:
and.w d4,d3
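A sketch of the doubled table in Python, to show that the lookup gives the same result whether the high byte of d3 is $00 or $FF. The volume table contents (`sample * volume`) are a placeholder assumption; only the mirroring matters. All function names are hypothetical.

```python
def to_s8(value):
    """Interpret the low 8 bits of value as a signed byte (-128..127)."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def build_volume_table(volume):
    """Placeholder 256-entry table, indexed by the unsigned sample byte."""
    return [to_s8(i) * volume for i in range(256)]


def build_doubled_table(table):
    """512 entries for indices -256..255; entry i mirrors table[i & 0xFF].

    List position 256 corresponds to index 0, i.e. where a1/a4 would point.
    """
    return [table[i & 0xFF] for i in range(-256, 256)]


def lookup_doubled(doubled, d3_word):
    """Index with d3.w taken as a signed word, with no masking first."""
    signed = d3_word - 0x10000 if d3_word & 0x8000 else d3_word
    return doubled[signed + 256]
```

The cost is 512 extra bytes per table, in exchange for one instruction fewer per output sample.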
21 October 2020, 23:42 | #39 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
|
Quote:
Code:
; d0.w = bytes to mix
MIXCF MACRO
  move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
  move.b d3,d7
  ext.w d7
  asr.w #8,d3
  sub.w d3,d7 ; d7.w = sample2-sample1
  move.l d7,d5
  rol.l #7,d5
  add.b (a6,d5.w),d3
  and.w d4,d3 ; d3.w = $00xx = 8-bit signed interpolated sample
  move.w (a1,d3.w*2),d3 ; d3.w = output sample (from volume LUT)
  add.w d3,(a5)+
  add.l d6,d7 ; increase sampling position
  addx.l d1,d2
  add.w d3,(a5)+
  ENDM
|
22 October 2020, 12:20 | #40 | ||
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
Quote:
Quote:
Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF MACRO
  move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
  move.b d3,d7
  ext.w d7
  asr.w #8,d3
  sub.w d3,d7 ; d7.w = sample2-sample1
  move.l d7,d5
  rol.l #7,d5
  add.w (a6,d5.w*2),d3
  move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
  swap d5
  move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
  add.l d5,(a5)+
  add.l d6,d7 ; increase sampling position
  addx.l d1,d2
  ENDM
Code:
; d0.w = bytes to mix
MIXCF MACRO
  move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
  move.b d3,d7
  ext.w d7
  asr.w #8,d3
  sub.w d3,d7 ; d7.w = sample2-sample1
  move.l d7,d5
  rol.l #7,d5
  add.w (a6,d5.w*2),d3
  move.w (a1,d3.w*2),d3 ; d3.w = output sample (from volume LUT)
  add.w d3,(a5)+
  add.w d3,(a5)+
  add.l d6,d7 ; increase sampling position
  addx.l d1,d2
  ENDM
Last edited by 8bitbubsy; 22 October 2020 at 14:28.