Optimizing linear interpolation routine for a live resampler - Page 4

8bitbubsy · 24 October 2020, 12:09

Ok, good to know.

Regarding having two mixer sets, well... I'd have to make sure that the integer sampling position wouldn't change inside the mixing loop (which is not guaranteed if delta is < 1), which requires some calculations before entering the loop. I want to keep the outer mix loop simple, don't want too much overhead in there. I already have a 32-bit division in there to figure out the max amount of samples to mix before eventually having to handle the sample end/loop end.
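For readers following along, the division mentioned above can be sketched in C. This is only a model under assumptions that are not stated in the post: a 16.16 fixed-point sampling position, and the hypothetical helper names `samples_until_end` and `samples_to_mix`. It uses a 64-bit intermediate for clarity, whereas the real mixer is said to use a 32-bit division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of output samples that can be mixed
   before the integer sampling position reaches `end`.
   pos/end are integer sample indices, frac is the 16-bit fraction,
   delta is the 16.16 fixed-point step per output sample. */
static uint32_t samples_until_end(uint32_t pos, uint32_t frac,
                                  uint32_t end, uint32_t delta)
{
    assert(delta > 0 && pos < end && frac < 65536u);
    /* Distance left to the end, in 16.16 units. */
    uint64_t distance = ((uint64_t)(end - pos) << 16) - frac;
    /* Round up: a partial step still produces one more output sample. */
    return (uint32_t)((distance + delta - 1) / delta);
}

/* Clamp to the number of samples still wanted in this mix call. */
static uint32_t samples_to_mix(uint32_t wanted, uint32_t pos, uint32_t frac,
                               uint32_t end, uint32_t delta)
{
    uint32_t limit = samples_until_end(pos, frac, end, delta);
    return wanted < limit ? wanted : limit;
}
```

With a delta of 1.5 (0x18000) and 10 samples left, this yields 7 output samples before the end/loop handling has to run.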

chb · 24 October 2020, 12:48

Quote:

Originally Posted by 8bitbubsy

Ok, good to know.

Regarding having two mixer sets, well... I'd have to make sure that the integer sampling position wouldn't change inside the mixing loop (which is not guaranteed if delta is < 1), which requires some calculations before entering the loop. I want to keep the outer mix loop simple, don't want too much overhead in there. I already have a 32-bit division in there to figure out the max amount of samples to mix before eventually having to handle the sample end/loop end.

I was rather thinking of something more straightforward: a simple test in the inner loop to check whether you need to load and process a new sample or can continue with the old ones, without assuming that the integer sampling position stays constant (that's unlikely). So one routine with that test and one without (which is probably exactly the one you have now), and you decide which one to use before entering the mixing loop based on the sample read delta, using a simple threshold that's probably around 0.7 or so. Of course, that only makes sense if you often have samples at low sample rates.

EDIT: something like this, which is slower when delta > 1, but probably faster when samples are repeated:

Code:

; d0.w = bytes to mix

MIXSF MACRO
    move.w (a3,d2.l),d3   ; d3.w = 2x signed 8-bit samples
    move.b d3,d4
    ext.w  d4
    asr.w  #8,d3
    sub.w  d3,d4          ; d4.w = sample2-sample1
    lsl.l  #7,d4
nc:
    move.l d7,d5
    rol.l  #7,d5
    or.w   d4,d5

    add.w  (a6,d5.w*2),d3
    move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
    swap   d5
    move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
    add.l  d5,(a5)+
    add.l  d6,d7          ; increase sampling position
    bcc    nc             ; branch if carry clear = integer sampling position unchanged
    addx.l d1,d2
    ENDM
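The idea of reusing the loaded sample pair can also be modelled in C. This is a rough sketch, not a translation of the macro: it assumes a 16.16 fixed-point position, uses a multiply where the assembly uses an interpolation LUT, and the name `mix_reuse` is made up for illustration. Unlike the macro, it counts every output sample against `count`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linearly interpolated resampling of signed 8-bit data into `out`.
   pos is a 16.16 fixed-point position, delta the 16.16 step.
   The sample pair is reloaded only when the integer position changes.
   Returns the number of pair reloads (for illustration only).
   The caller must guarantee that src[last integer position + 1] is readable. */
static size_t mix_reuse(const int8_t *src, uint32_t pos, uint32_t delta,
                        int16_t *out, size_t count)
{
    assert(src != NULL && out != NULL);
    size_t reloads = 0;
    uint32_t loaded = UINT32_MAX;   /* integer position of the cached pair */
    int s1 = 0, diff = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t ipos = pos >> 16;
        if (ipos != loaded) {       /* the "do I need a new sample?" test */
            s1 = src[ipos];
            diff = src[ipos + 1] - s1;
            loaded = ipos;
            reloads++;
        }
        int frac = (int)(pos & 0xFFFFu);
        /* s1 + diff * frac, with frac scaled to 0..1 (truncating division). */
        out[i] = (int16_t)(s1 + (diff * frac) / 65536);
        pos += delta;
    }
    return reloads;
}
```

At a delta of 0.25, eight output samples need only two pair loads, which is where the saving would come from.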

8bitbubsy · 24 October 2020, 14:44

Yeah, that could maybe help.

I think most songs will have an average sampling frequency above ~20kHz (28603.99(mixrate)*0.7), but it might still help for some songs.

I'll try this when I get the time, on Monday or so.
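As a worked example of the numbers above: 28603.99 Hz * 0.7 is about 20022.8 Hz, so only samples played below roughly 20 kHz would pick the variant with the extra test. A minimal C sketch of that selection follows, assuming a 16.16 fixed-point delta; `delta_from_freq` and `use_reuse_variant` are hypothetical names, and 0.7 is only the rough guess from the thread, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

#define MIX_RATE_HZ 28603.99   /* mixing rate quoted in the post */
#define THRESHOLD   0.7        /* rough guess from the thread, not measured */

/* Sample read delta in 16.16 fixed point for a given playback frequency. */
static uint32_t delta_from_freq(double freq_hz)
{
    assert(freq_hz > 0.0 && freq_hz < 65535.0 * MIX_RATE_HZ);
    return (uint32_t)(freq_hz / MIX_RATE_HZ * 65536.0);
}

/* Nonzero if the variant that reuses the sample pair should be chosen. */
static int use_reuse_variant(uint32_t delta)
{
    return delta < (uint32_t)(THRESHOLD * 65536.0);
}
```

The decision is made once, before entering the mixing loop, so it adds no cost per output sample.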

Don_Adan · 24 October 2020, 15:20

Quote:

Originally Posted by 8bitbubsy

Oh no, I totally forgot that this has a possible word access misalignment! If only one could use addx on an address register, then I could add the sampling position to a3 before the loop, then read two bytes, then addx on a3. And when I leave the loop, I subtract the sample base from a3 to get the new sampling position, before I handle sample end/loop end.

I could still do that method by using d2 as the relative sampling position (d2 = a3+d2 before loop). Then I do "move.l d3,a3" as the first instruction in the loop.
That's one move instruction extra, so probably slower in the end?

This can be faster, because the asr.w #8,d3 (4c) instruction goes away, and perhaps one ext.w (4c) too. I don't remember the exact 68020/68030 timings.

You can simply try this version as well:
move.l d2,a3
move.b (a3),d3
move.b 1(a3),d7
That is, if you want to reach maximum speed. The same applies to the version without the swap instruction.

8bitbubsy · 24 October 2020, 15:21

Yes, that's what I was thinking of. Though you still have to ext.w both of them to calculate the delta sample (-256..254).
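The two ways of fetching the sample pair can be compared with a small C model. It is a sketch only: the function names are invented, and the word variant assumes big-endian byte order as on the 68k (first sample in the high byte). Either way, both values have to be sign-extended before the subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 8 bits of v, like ext.w does for a byte. */
static int sign_extend8(unsigned v)
{
    int s = (int)(v & 0xFFu);
    return s >= 128 ? s - 256 : s;
}

/* Word read: sample1 sits in the high byte, sample2 in the low byte. */
static int delta_from_word(uint16_t w)
{
    int s1 = sign_extend8((unsigned)w >> 8);  /* what asr.w #8 leaves in d3 */
    int s2 = sign_extend8(w);                 /* move.b + ext.w */
    return s2 - s1;
}

/* Byte reads: two separate loads, each widened before subtracting. */
static int delta_from_bytes(const int8_t *p)
{
    assert(p != NULL);
    int s1 = p[0];
    int s2 = p[1];
    return s2 - s1;
}
```

Both functions return the same delta for the same two bytes; the difference on the real hardware is only in instruction count and memory access pattern.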

Don_Adan · 24 October 2020, 15:23

Quote:

Originally Posted by chb

I was rather thinking of something more straightforward: a simple test in the inner loop to check whether you need to load and process a new sample or can continue with the old ones, without assuming that the integer sampling position stays constant (that's unlikely). So one routine with that test and one without (which is probably exactly the one you have now), and you decide which one to use before entering the mixing loop based on the sample read delta, using a simple threshold that's probably around 0.7 or so. Of course, that only makes sense if you often have samples at low sample rates.

EDIT: something like this, which is slower when delta > 1, but probably faster when samples are repeated:

Code:

; d0.w = bytes to mix

MIXSF MACRO
    move.w (a3,d2.l),d3   ; d3.w = 2x signed 8-bit samples
    move.b d3,d4
    ext.w  d4
    asr.w  #8,d3
    sub.w  d3,d4          ; d4.w = sample2-sample1
    lsl.l  #7,d4
nc:
    move.l d7,d5
    rol.l  #7,d5
    or.w   d4,d5

    add.w  (a6,d5.w*2),d3
    move.w (a1,d3.w*2),d5 ; d5.w = left output sample (from volume LUT)
    swap   d5
    move.w (a4,d3.w*2),d5 ; d5.l = (leftSample << 16) | rightSample
    add.l  d5,(a5)+
    add.l  d6,d7          ; increase sampling position
    bcc    nc             ; branch if carry clear = integer sampling position unchanged
    addx.l d1,d2
    ENDM

Good idea, but I don't think it will work, because the main loop runs on the d0 counter. I think it will trash memory via the "add.l d5,(a5)+" instruction.

8bitbubsy · 24 October 2020, 15:26

Hm yes, you are right. This will not work correctly because it keeps branching to nc until the integer sampling position advances (i.e. the d0 counter is not respected until that has happened). Also, this macro is unrolled 4 times inside the actual inner loop.

chb · 24 October 2020, 16:12

Quote:

Originally Posted by Don_Adan

Good idea, but I don't think it will work, because the main loop runs on the d0 counter. I think it will trash memory via the "add.l d5,(a5)+" instruction.

Yes, that's true. You'd either have to set up the main loop accordingly (first do a number of iterations of the modified loop with the branch, then without), or just reserve 1/(lowest sample read delta) longwords at the end of the buffer, or check a5 against some end position during each iteration. Might be worth the hassle or not; it probably depends on how many bytes you typically mix, and on how slow memory reads are compared to instructions, but those repeated samples come from the data cache anyway on 030+. Hmmm.
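To put a number on the 1/(sample read delta) figure, here is a small C sketch of how many output longwords one pass through the "bcc nc" loop writes before the fraction carries. It assumes a 16-bit fraction and a fractional delta below 1.0, and `outputs_until_carry` is a hypothetical name; the real macro's fraction width may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Number of output samples written before adding delta_frac to frac
   overflows 16 bits: ceil((65536 - frac) / delta_frac).
   With frac == 0 this is the worst case, ceil(1 / delta). */
static uint32_t outputs_until_carry(uint32_t frac, uint32_t delta_frac)
{
    assert(frac < 65536u && delta_frac > 0 && delta_frac < 65536u);
    return (65536u - frac + delta_frac - 1) / delta_frac;
}
```

For a delta of 0.25 the worst case is 4 writes per pass instead of 1, so the reserved area at the end of the buffer, or the end-position check, has to account for the smallest delta that can occur.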

8bitbubsy · 28 October 2020, 13:36

I tried to move the two samples byte by byte instead of as a word, and it was slightly slower no matter what I did. I also ran a benchmark on my A1200 68030 to figure out how much slower a misaligned word read is, and the speed penalty seems to be quite small (if I did my tests right).
