English Amiga Board


Old 21 October 2020, 16:26   #21
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by 8bitbubsy View Post
WinUAE + stock, cycle-exact A1200 config w/ some fastmem.
Ah, ok. AFAIK the only CPU WinUAE emulates cycle-exact is the 68000; for the 68020 and upwards the emulation is less precise (because it is much harder and mostly undocumented). But I don't know if the difference is significant in your case.
Quote:
Originally Posted by 8bitbubsy View Post
EDIT: ARGH! I still managed to compile the previous version thinking I was compiling the LUT version, and apparently it still doesn't work like it should. Haha
Haha, been there, done that so many times.
Old 21 October 2020, 16:53   #22
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
Quote:
Originally Posted by 8bitbubsy View Post
I managed to calculate a lerp LUT with 9-bit delta precision and 7-bit frac precision, and it works... but... it's about the same speed as the muls code on a 68020! So I was right to begin with, the instruction overhead is slow.

Here's how I did it:
Code:
	move.w	(a3,d2.l),d3
	move.b	d3,d5
	ext.w	d5
	asr.w	#8,d3
	sub.w	d3,d5
	lsl.w	#7,d5
	move.w	d7,d4
	rol.w	#7,d4
	and.b	#127,d4
	or.b	d4,d5
	add.b	(a6,d5.w),d3
	ext.w	d3
vs. old muls method:

Code:
	move.w	(a3,d2.l),d3
	move.b	d3,d5
	ext.w	d5
	asr.w	#8,d3
	sub.w	d3,d5
	move.w	d7,d4
	lsr.w	#8,d4
	muls.w	d4,d5
	asr.w	#8,d5
	add.w	d5,d3
Generating the lut:
Code:
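#include <math.h>
#include <stdint.h>

int8_t lerpLUT[65536]; // 512 deltas (-256..255) x 128 fraction steps

void generateLerpLUT(void)
{
	int8_t *ptr8 = lerpLUT;
	for (int32_t smp = -256; smp < 256; smp++)
	{
		for (int32_t frac = 0; frac < 128; frac++)
			// only the low 8 bits are wanted (add.b wraps), so go through int32_t:
			// converting an out-of-range double straight to int8_t is undefined
			*ptr8++ = (int8_t)(int32_t)round(smp * (frac / 128.0));
	}
}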
I could change the LUT to use 8-bit frac precision, and then eliminate the AND'ing, but then the upper part of d5.l has to be cleared (longword LUT access), which probably doesn't make it much faster after all...
First, you should test on a real Amiga with a 68020, not WinUAE.
Second, this routine is called 4 times in a row, if I remember right. So you can do

move.w d7,d4
rol.w #7,d4
and.b #127,d4
before the loop rather than inside your loop routine.
Old 21 October 2020, 16:54   #23
8bitbubsy
Registered User
 
Join Date: Sep 2009
Location: Norway
Posts: 1,712
No I can't, because the fraction changes for every output sample!

EDIT: Ok, the LUT method is faster on my 68030 50MHz Amiga! So that's good news. Also I edited the code again as I had to replace ext.w d3 with and.w #$ff,d3

So I think the thing to focus on now is to try and optimize this any further, if possible:
Code:
	move.w	(a3,d2.l),d3 ; read 2x signed 8-bit PCM samples
	move.b	d3,d5
	ext.w	d5
	asr.w	#8,d3
	sub.w	d3,d5
	lsl.w	#7,d5
	move.w	d7,d4 ; copy of sampling position fraction (16-bit)
	rol.w	#7,d4
	and.b	#127,d4
	or.b	d4,d5
	add.b	(a6,d5.w),d3
	and.w	#$ff,d3 ; d3.b = -128..127 (ready for volume LUT)
Maybe one can use a bitfield instruction to get the LUT index calculated...

Also sorry for not listening too much to the suggestions, I just thought they were not suitable (changing frac etc).
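
For reference, here is a rough C model of what the LUT listing above computes. It is only a sketch: `build_lerp_lut`, `scaled_delta` and `lerp_lookup` are made-up names, the table is stored as unsigned bytes to keep the C arithmetic well defined, and a6 is assumed to point at the middle of the table (the entries for delta 0). It shows why a byte table is enough even though the delta is 9 bits wide: only the low byte of delta*frac is kept, and the add.b wraps the sum back into range.

```c
#include <stdint.h>

/* 512 deltas (-256..255) x 128 fraction steps, same layout as lerpLUT. */
static uint8_t lerp_lut[512 * 128];

/* delta*frac7/128 rounded to nearest, halves away from zero (like round()). */
static int32_t scaled_delta(int32_t delta, int32_t frac7)
{
    int32_t n = delta * frac7;
    return (n + (n >= 0 ? 64 : -64)) / 128;
}

void build_lerp_lut(void)
{
    uint8_t *p = lerp_lut;
    for (int32_t delta = -256; delta < 256; delta++)
        for (int32_t frac7 = 0; frac7 < 128; frac7++)
            *p++ = (uint8_t)scaled_delta(delta, frac7); /* keep the low 8 bits only */
}

/* s1/s2 are two adjacent signed 8-bit samples, frac16 is the 16-bit fraction (d7.w). */
uint8_t lerp_lookup(int8_t s1, int8_t s2, uint16_t frac16)
{
    int32_t delta = (int32_t)s2 - (int32_t)s1; /* sub.w d3,d5 -> -255..255 */
    int32_t frac7 = frac16 >> 9;               /* rol.w #7 / and.b #127 */
    int32_t index = delta * 128 + frac7;       /* lsl.w #7 / or.b, used as signed word */
    /* add.b: the sum wraps modulo 256, which undoes the truncation of the entry */
    return (uint8_t)((uint8_t)s1 + lerp_lut[32768 + index]);
}
```

The returned byte is the unsigned 0..255 value that and.w #$ff leaves in d3 as the volume LUT index.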

Last edited by 8bitbubsy; 21 October 2020 at 17:34.
Old 21 October 2020, 17:38   #24
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
Quote:
Originally Posted by 8bitbubsy View Post
No I can't, because the fraction changes for every output sample!

EDIT: Ok, the LUT method is faster on my 68030 50MHz Amiga! So that's good news. Also I edited the code again as I had to replace ext.w d3 with and.w #$ff,d3

So I think the thing to focus on now is to try and optimize this any further, if possible:
Code:
	move.w	(a3,d2.l),d3
	move.b	d3,d5
	ext.w	d5
	asr.w	#8,d3
	sub.w	d3,d5
	lsl.w	#7,d5
	move.w	d7,d4
	rol.w	#7,d4
	and.b	#127,d4
	or.b	d4,d5
	add.b	(a6,d5.w),d3
	and.w	#$ff,d3
Maybe one can use a bitfield instruction to get the LUT index calculated...
ok, right. Original code:

MIXCF MACRO
move.w (a3,d2.l),d3
move.b d3,d5
ext.w d5
asr.w #8,d3
sub.w d3,d5
move.w d7,d4
lsr.w #8,d4
muls.w d4,d5
asr.w #8,d5
add.w d5,d3
move.w (a1,d3.w*2),d5
add.w d5,(a5)+
add.w d5,(a5)+
add.w d6,d7
addx.l d1,d2
ENDM

Perhaps with some modification this can run faster, or at least at the same speed (I don't remember the 68020 timings). d4 is also free now.

Code:
	move.w	(a3,d2.l),d3
	move.b	d3,d5
	ext.w	d5
	asr.w	#8,d3
	sub.w	d3,d5
	lsl.w	#7,d5
;	move.w	d7,d4
;	rol.w	#7,d4
;	and.b	#127,d4
;	or.b	d4,d5
	rol.l	#7,d7
	or.b	d7,d5
	ror.l	#7,d7		; restore d7
	add.b	(a6,d5.w),d3
	and.w	#$ff,d3

.....

	add.l	d6,d7	; the original d6/d7 word values must be in the high word, and the low word must be cleared before the loop
	addx.l	d1,d2
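
A small C sketch of the idea in the listing above, in case it helps: the fraction lives in the high word of d7, so add.l produces the carry that addx.l then adds to the integer position, and rol.l #7 drops the top 7 fraction bits into the low byte. `SamplePos`, `advance` and `frac7_via_rotate` are hypothetical names, and the low words of d6/d7 are assumed to be zero, as the comment in the listing requires.

```c
#include <stdint.h>

typedef struct {
    uint32_t frac_hi; /* d7: fraction in bits 31..16, low word cleared */
    uint32_t pos;     /* d2: integer sample position */
} SamplePos;

/* add.l d6,d7 / addx.l d1,d2 (step_frac_hi models d6, step_int models d1) */
void advance(SamplePos *p, uint32_t step_frac_hi, uint32_t step_int)
{
    uint32_t sum = p->frac_hi + step_frac_hi;
    uint32_t carry = sum < p->frac_hi; /* X flag: carry out of bit 31 */
    p->frac_hi = sum;
    p->pos += step_int + carry;
}

/* rol.l #7,d7 followed by or.b d7,d5: returns the byte that gets OR'ed in */
uint32_t frac7_via_rotate(uint32_t frac_hi)
{
    uint32_t rotated = (frac_hi << 7) | (frac_hi >> 25);
    /* bit 7 comes from the cleared low word, so only 7 fraction bits remain */
    return rotated & 0xFFu;
}
```

Because the low word is zero, nothing but the 7 fraction bits can reach the low byte, which is why the and.b #127 from the earlier version is no longer needed.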
Old 21 October 2020, 18:07   #25
8bitbubsy
Registered User
 
Join Date: Sep 2009
Location: Norway
Posts: 1,712
Thanks, that was slightly faster. I decided to store $00ff in d4.w for the and.w, which additionally made it a tiny bit faster.

Code:
	move.w	(a3,d2.l),d3
	move.b	d3,d5
	ext.w	d5
	asr.w	#8,d3
	sub.w	d3,d5
	lsl.w	#7,d5
	rol.l	#7,d7
	or.b	d7,d5
	ror.l	#7,d7
	add.b	(a6,d5.w),d3
	and.w	d4,d3
	move.w	(a1,d3.w*2),d5
	add.w	d5,(a5)+
	add.w	d5,(a5)+
	add.l	d6,d7
	addx.l	d1,d2
Old 21 October 2020, 19:25   #26
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,044
If you are using only the upper 16 bits of d6/d7, you can use the lower 16 bits instead of d5. Then you only have to roll d7 left and back right, and d5 is free. Something like:
Code:
	move.w	(a3,d2.l),d3
	move.b	d3,d7
	ext.w	d7
	asr.w	#8,d3
	sub.w	d3,d7
	rol.l	#7,d7
	add.b	(a6,d7.w),d3
	ror.l	#7,d7
	and.w	d4,d3
	move.w	(a1,d3.w*2),d7
	add.w	d7,(a5)+
	add.w	d7,(a5)+
	add.l	d6,d7
	addx.l	d1,d2
Old 21 October 2020, 19:31   #27
8bitbubsy
Registered User
 
Join Date: Sep 2009
Location: Norway
Posts: 1,712
Awesome! That worked and gave a nice speed improvement.
Old 21 October 2020, 19:47   #28
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,044
Laaaag ;p.
And if you have no use for d5, I think move d7,d5 with ror d5 should be faster than ror/rol d7 on a 020/030.
Old 21 October 2020, 19:56   #29
8bitbubsy
Registered User
 
Join Date: Sep 2009
Location: Norway
Posts: 1,712
Quote:
Originally Posted by a/b View Post
Laaaag ;p.
And if you have no use for d5, I think move d7,d5 with ror d5 should be faster than ror/rol d7 on a 020/030.
d5 is sadly in use.
EDIT: d5.l can be used for the center mixer. Now, I'm a bit confused as to what you meant I could do with d5.

Here's the current mixers:

Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF MACRO
    move.w (a3,d2.l),d3   ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7          ; d7.w = sample2-sample1
    rol.l  #7,d7
    add.b  (a6,d7.w),d3    
    and.w  d4,d3          ; d3.w = $00xx = 8-bit signed interpolated sample
    ror.l  #7,d7
    move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
    swap   d5
    move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
    add.l  d5,(a5)+
    add.l  d6,d7          ; increase sampling position
    addx.l d1,d2
    ENDM
Center mix (slightly faster when channel pan is in center):

Code:
; d0.w = bytes to mix
MIXCF MACRO
    move.w (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7           ; d7.w = sample2-sample1
    rol.l  #7,d7
    add.b  (a6,d7.w),d3
    and.w  d4,d3           ; d3.w = $00xx = 8-bit signed interpolated sample
    ror.l  #7,d7
    move.w (a1,d3.w*2),d3  ; d3.w = output sample (from volume LUT)
    add.w  d3,(a5)+
    add.w  d3,(a5)+
    add.l  d6,d7           ; increase sampling position
    addx.l d1,d2
    ENDM

Last edited by 8bitbubsy; 21 October 2020 at 20:10.
Old 21 October 2020, 20:36   #30
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
Quote:
Originally Posted by 8bitbubsy View Post
d5 is sadly in use.
EDIT: d5.l can be used for the center mixer. Now, I'm a bit confused as to what you meant I could do with d5.

Here's the current mixers:

Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF MACRO
    move.w (a3,d2.l),d3   ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7          ; d7.w = sample2-sample1
    rol.l  #7,d7
    add.b  (a6,d7.w),d3    
    and.w  d4,d3          ; d3.w = $00xx = 8-bit signed interpolated sample
    ror.l  #7,d7
    move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
    swap   d5
    move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
    add.l  d5,(a5)+
    add.l  d6,d7          ; increase sampling position
    addx.l d1,d2
    ENDM
Center mix (slightly faster when channel pan is in center):

Code:
; d0.w = bytes to mix
MIXCF MACRO
    move.w (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7           ; d7.w = sample2-sample1
    rol.l  #7,d7
    add.b  (a6,d7.w),d3
    and.w  d4,d3           ; d3.w = $00xx = 8-bit signed interpolated sample
    ror.l  #7,d7
    move.w (a1,d3.w*2),d3  ; d3.w = output sample (from volume LUT)
    add.w  d3,(a5)+
    add.w  d3,(a5)+
    add.l  d6,d7           ; increase sampling position
    addx.l d1,d2
    ENDM
Perhaps something like this:

Code:
; d0.w = bytes to mix
MIXSF MACRO
    move.w (a3,d2.l),d3   ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7          ; d7.w = sample2-sample1
    move.l d7,d5
    rol.l  #7,d5
    add.b  (a6,d5.w),d3    
    and.w  d4,d3          ; d3.w = $00xx = 8-bit signed interpolated sample
    move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
    swap   d5
    move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
    add.l  d5,(a5)+
    add.l  d6,d7          ; increase sampling position
    addx.l d1,d2
    ENDM
Old 21 October 2020, 21:12   #31
8bitbubsy
Registered User
 
Join Date: Sep 2009
Location: Norway
Posts: 1,712
Ah, like that! Yes, it made it slightly faster.

So now we're left with:

Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF MACRO
    move.w (a3,d2.l),d3   ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7          ; d7.w = sample2-sample1
    move.l d7,d5 
    rol.l  #7,d5
    add.b  (a6,d5.w),d3    
    and.w  d4,d3          ; d3.w = $00xx = 8-bit signed interpolated sample
    move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
    swap   d5
    move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
    add.l  d5,(a5)+
    add.l  d6,d7          ; increase sampling position
    addx.l d1,d2
    ENDM
Center mix (slightly faster when channel pan is in center):

Code:
; d0.w = bytes to mix
MIXCF MACRO
    move.w (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7           ; d7.w = sample2-sample1
    move.l d7,d5    
    rol.l  #7,d5
    add.b  (a6,d5.w),d3
    and.w  d4,d3           ; d3.w = $00xx = 8-bit signed interpolated sample
    move.w (a1,d3.w*2),d3  ; d3.w = output sample (from volume LUT)
    add.w  d3,(a5)+
    add.w  d3,(a5)+
    add.l  d6,d7           ; increase sampling position
    addx.l d1,d2
    ENDM

Last edited by 8bitbubsy; 22 October 2020 at 14:28.
Old 21 October 2020, 21:35   #32
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
Quote:
Originally Posted by 8bitbubsy View Post
Ah, like that! Yes, it made it slightly faster.

So now we're left with:

Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF MACRO
    move.w (a3,d2.l),d3   ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7          ; d7.w = sample2-sample1
    move.l d7,d5 
    rol.l  #7,d5
    add.b  (a6,d5.w),d3    
    and.w  d4,d3          ; d3.w = $00xx = 8-bit signed interpolated sample
    move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
    swap   d5
    move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
    add.l  d5,(a5)+
    add.l  d6,d7          ; increase sampling position
    addx.l d1,d2
    ENDM
Center mix (slightly faster when channel pan is in center):

Code:
; d0.w = bytes to mix
MIXCF MACRO
    move.w (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7           ; d7.w = sample2-sample1
    move.l d7,d5    
    rol.l  #7,d5
    add.b  (a6,d5.w),d3
    and.w  d4,d3           ; d3.w = $00xx = 8-bit signed interpolated sample
    move.w (a1,d3.w*2),d3  ; d3.w = output sample (from volume LUT)
    add.w  d3,(a5)+
    add.w  d3,(a5)+
    add.l  d6,d7           ; increase sampling position
    addx.l d1,d2
    ENDM
Perhaps. But if a5 points to chip RAM, the writes in MIXCF could perhaps be pipelined on the 68030; meynaf is the expert on 68030 pipelining. Or maybe one longword ADD would be faster than two word ADDs?
Old 21 October 2020, 21:36   #33
8bitbubsy
Registered User
 
Join Date: Sep 2009
Location: Norway
Posts: 1,712
I'm mixing to a 16-bit fastmem stereo buffer, then in the post-mixing stage I use a post-mixing table to convert it to pre-clamped, normalized 14-bit values for Paula (yes, I use 14-bit output).
I played around with trying to make it use a longword add for the center mix, but it turned out to be slower. E.g. move.w d3,d5 / swap d5 / move.w d3,d5 / add.l d5,(a5)+
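
For readers unfamiliar with 14-bit Paula output, here is a rough C sketch of what a post-mixing step has to produce. It is not the table-based code from this post: it uses plain arithmetic instead of a LUT, `Paula14` and `split_14bit` are made-up names, and it assumes the usual scheme of one channel at volume 64 carrying the upper 8 bits and a second channel at volume 1 carrying the next 6 bits.

```c
#include <stdint.h>

typedef struct {
    int8_t  hi; /* upper 8 bits, for the channel played at volume 64 */
    uint8_t lo; /* next 6 bits (0..63), for the channel played at volume 1 */
} Paula14;

Paula14 split_14bit(int32_t mixed)
{
    /* clamp to the 16-bit range first (the post does this with a table) */
    if (mixed > 32767)  mixed = 32767;
    if (mixed < -32768) mixed = -32768;

    uint16_t u = (uint16_t)mixed; /* two's complement bit pattern */
    int32_t hi = u >> 8;          /* 0..255 */

    Paula14 out;
    out.hi = (int8_t)(hi >= 128 ? hi - 256 : hi);
    out.lo = (uint8_t)((u >> 2) & 0x3Fu);
    return out;
}
```

With these two values, hi*64 + lo equals the clamped sample divided by 4 (rounded down), which is the 14-bit result.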
Old 21 October 2020, 22:15   #34
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,044
It might do nothing, since right-shift is extra fast on 020, but just in case... Replace the first four with:
Code:
	move.w	(a3,d2.l),d7	; d7.w = 2x signed 8-bit samples
	bfexts	d7{16:8},d3
	ext.w	d7
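
A quick C model of what the bitfield version extracts, for anyone not used to the 68020 bitfield numbering. `bfexts_reg` is a hypothetical helper; it assumes a data-register operand, a width of 1..31 and a field that does not wrap around the register. The offset counts from bit 31, so {16:8} means bits 15..8, i.e. the first sample of the word.

```c
#include <stdint.h>

/* Sign-extended bitfield extract from a 32-bit register value.
   offset counts from the MSB (bit 31); offset + width must be <= 32. */
int32_t bfexts_reg(uint32_t reg, unsigned offset, unsigned width)
{
    unsigned shift = 32u - offset - width;
    uint32_t field = (reg >> shift) & ((1u << width) - 1u);
    uint32_t sign = 1u << (width - 1u);
    /* flip the sign bit, then subtract it again: standard sign extension */
    return (int32_t)(field ^ sign) - (int32_t)sign;
}
```

In this model `bfexts_reg(d7, 16, 8)` is the first sample (what asr.w #8 produced before) and `bfexts_reg(d7, 24, 8)` is what ext.w d7 leaves in d7.w.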
Old 21 October 2020, 22:20   #35
8bitbubsy
Registered User
 
Join Date: Sep 2009
Location: Norway
Posts: 1,712
Quote:
Originally Posted by a/b View Post
It might do nothing, since right-shift is extra fast on 020, but just in case... Replace the first four with:
Code:
    move.w    (a3,d2.l),d7    ; d7.w = 2x signed 8-bit samples
    bfexts    d7{16:8},d3
    ext.w    d7
Just benchmarked it on my 68030 50MHz A1200, and it's about 2-4% slower.
Old 21 October 2020, 22:29   #36
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
Quote:
Originally Posted by 8bitbubsy View Post
I'm mixing to a 16-bit fastmem stereo buffer, then in the post-mixing stage I use a post-mixing table to convert it to pre-clamped, normalized 14-bit values for Paula (yes, I use 14-bit output).
I played around with trying to make it use a longword add for the center mix, but it turned out to be slower. E.g. move.w d3,d5 / swap d5 / move.w d3,d5 / add.l d5,(a5)+
If you want, you can check this:
Code:
; d0.w = bytes to mix
MIXCF MACRO
    move.w (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7           ; d7.w = sample2-sample1
    move.l d7,d5    
    rol.l  #7,d5
    add.b  (a6,d5.w),d3
    and.w  d4,d3           ; d3.w = $00xx = 8-bit signed interpolated sample
    add.w (a1,d3.w*2),(a5)+ 
    add.l  d6,d7           ; increase sampling position
    addx.l d1,d2
    add.w (a1,d3.w*2),(a5)+ 
   ENDM
Old 21 October 2020, 22:30   #37
8bitbubsy
Registered User
 
Join Date: Sep 2009
Location: Norway
Posts: 1,712
That's an extra look-up + word read from memory, can that possibly be faster?
Also no need to put instructions inbetween audio buffer writes, I'm not using chipmem here.

add (An,Dn),(An)+ is also not a valid instruction: add needs a data register as either source or destination. You can only do memory-to-memory like that with move, I think.

Last edited by 8bitbubsy; 21 October 2020 at 22:40.
Old 21 October 2020, 23:25   #38
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,044
OK, another idea...
d3 bits 8-15 are either all 0 or all 1, so if you make the a1/a4 tables twice as large (512 words instead of 256) with indices -256 to 255 (entry -256 = entry 0, -255 = 1, ... -1 = 255) and a1/a4 pointing to index 0, you can drop:
Code:
	and.w	d4,d3
And d4 is now free.
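
A C sketch of the doubled table, under the assumption that the existing 256-entry table is indexed by the unsigned sample byte. `vol_table` and `build_doubled_table` are made-up names. After asr.w #8 and add.b, d3.w has a high byte of $00 or $FF, so as a signed word it is somewhere in -256..255, and every such index has to land on the entry for its low byte.

```c
#include <stdint.h>

static int16_t vol_table_storage[512];
/* Points at index 0, like a1/a4 would; valid indices are -256..255. */
static int16_t *const vol_table = vol_table_storage + 256;

/* unsigned_indexed is the existing table, indexed by the sample byte 0..255 */
void build_doubled_table(const int16_t unsigned_indexed[256])
{
    for (int i = -256; i < 256; i++)
        vol_table[i] = unsigned_indexed[(uint8_t)i]; /* low byte selects the entry */
}
```

With the table built like this, d3.w can be used as the index directly and the masking instruction goes away, at the cost of 512 extra bytes per table.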
Old 21 October 2020, 23:42   #39
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,018
Quote:
Originally Posted by 8bitbubsy View Post
That's an extra look-up + word read from memory, can that possibly be faster?
Also no need to put instructions inbetween audio buffer writes, I'm not using chipmem here.

add (An,Dn),(An)+ is also not a valid instruction: add needs a data register as either source or destination. You can only do memory-to-memory like that with move, I think.
Right, there is no such opcode. But you can try this. Writes to fast RAM are pipelined too, if I remember right.

Code:
; d0.w = bytes to mix
MIXCF MACRO
    move.w (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7           ; d7.w = sample2-sample1
    move.l d7,d5    
    rol.l  #7,d5
    add.b  (a6,d5.w),d3
    and.w  d4,d3           ; d3.w = $00xx = 8-bit signed interpolated sample
    move.w (a1,d3.w*2),d3  ; d3.w = output sample (from volume LUT)
    add.w  d3,(a5)+
    add.l  d6,d7           ; increase sampling position
    addx.l d1,d2
    add.w  d3,(a5)+
    ENDM
Old 22 October 2020, 12:20   #40
8bitbubsy
Registered User
 
Join Date: Sep 2009
Location: Norway
Posts: 1,712
Quote:
Originally Posted by Don_Adan View Post
Right, there is no such opcode. But you can try this. Writes to fast RAM are pipelined too, if I remember right.

Code:
; d0.w = bytes to mix
MIXCF MACRO
    move.w (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7           ; d7.w = sample2-sample1
    move.l d7,d5    
    rol.l  #7,d5
    add.b  (a6,d5.w),d3
    and.w  d4,d3           ; d3.w = $00xx = 8-bit signed interpolated sample
    move.w (a1,d3.w*2),d3  ; d3.w = output sample (from volume LUT)
    add.w  d3,(a5)+
    add.l  d6,d7           ; increase sampling position
    addx.l d1,d2
    add.w  d3,(a5)+
    ENDM
This was actually a bit slower on my 68030 50MHz A1200 benchmark, for some reason??

Quote:
Originally Posted by a/b View Post
OK, another idea...
d3 bits 8-15 are either all 0 or all 1, so if you make the a1/a4 tables twice as large (512 words instead of 256) with indices -256 to 255 (entry -256 = entry 0, -255 = 1, ... -1 = 255) and a1/a4 pointing to index 0, you can drop:
Code:
    and.w    d4,d3
And d4 is now free.
I pre-centered the volume LUT pointers so that they can handle a signed look-up (still the same LUT size), then I doubled the size of the lerp LUT so that it holds signed word values. Now d4 is indeed free and the code is slightly faster. It's currently like this:

Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF MACRO
    move.w (a3,d2.l),d3   ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7          ; d7.w = sample2-sample1
    move.l d7,d5 
    rol.l  #7,d5
    add.w  (a6,d5.w*2),d3
    move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
    swap   d5
    move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
    add.l  d5,(a5)+
    add.l  d6,d7          ; increase sampling position
    addx.l d1,d2
    ENDM
Center mix (slightly faster when channel pan is in center):

Code:
; d0.w = bytes to mix
MIXCF MACRO
    move.w (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
    move.b d3,d7
    ext.w  d7
    asr.w  #8,d3
    sub.w  d3,d7           ; d7.w = sample2-sample1
    move.l d7,d5    
    rol.l  #7,d5
    add.w  (a6,d5.w*2),d3
    move.w (a1,d3.w*2),d3  ; d3.w = output sample (from volume LUT)
    add.w  d3,(a5)+
    add.w  d3,(a5)+
    add.l  d6,d7           ; increase sampling position
    addx.l d1,d2
    ENDM
Getting quite fast now, but the binary is getting big. 433kB as of now.
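
For reference, a C sketch of how the word-sized lerp LUT and the pre-centered volume pointer fit together. The names (`lerp_lut16`, `vol_lut`, `build_tables`, `mix_sample`) and the plain `sample * volume` scaling are assumptions for illustration only; the real volume LUT contents are not shown in this thread. a6 is again assumed to point at the middle of the lerp table.

```c
#include <stdint.h>

static int16_t lerp_lut16[512 * 128]; /* word entries: 128 KB instead of 64 KB */
static int16_t vol_lut_storage[256];
/* Pre-centered pointer: valid indices are -128..127. */
static const int16_t *const vol_lut = vol_lut_storage + 128;

void build_tables(int32_t volume) /* volume scale is hypothetical */
{
    int16_t *p = lerp_lut16;
    for (int32_t delta = -256; delta < 256; delta++)
        for (int32_t frac7 = 0; frac7 < 128; frac7++) {
            int32_t n = delta * frac7;
            /* rounded to nearest, no truncation to 8 bits this time */
            *p++ = (int16_t)((n + (n >= 0 ? 64 : -64)) / 128);
        }
    for (int32_t s = -128; s < 128; s++)
        vol_lut_storage[s + 128] = (int16_t)(s * volume);
}

int16_t mix_sample(int8_t s1, int8_t s2, uint16_t frac16)
{
    int32_t delta = (int32_t)s2 - (int32_t)s1;
    int32_t index = delta * 128 + (frac16 >> 9);           /* move.l d7,d5 / rol.l #7,d5 */
    int32_t interpolated = s1 + lerp_lut16[32768 + index]; /* add.w (a6,d5.w*2),d3 */
    return vol_lut[interpolated];                          /* move.w (a1,d3.w*2),d5 */
}
```

Since the word add already yields a correctly sign-extended value in -128..127, no masking is needed before the volume look-up.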

Last edited by 8bitbubsy; 22 October 2020 at 14:28.
 

