English Amiga Board

Thread: https://eab.abime.net/showthread.php?t=106727 (Coders. Asm / Hardware)

DanScott 24 April 2021 17:10

Clamping a word to signed byte range
 
After some great contributions to my "clamped word result of an add" question... does anyone know a good/quicker way to clamp a word to a signed byte range?

i.e. quicker than this:

Code:

        cmp.w        #127,d0
        blt.s        .NoClampMax
        moveq        #127,d0
        bra.s        .NoClampMin
.NoClampMax
        cmp.w        #-128,d0
        bge.s        .NoClampMin
        moveq        #-128,d0
.NoClampMin

I have been trying to think of ways to do this, but I'm struggling to find anything that works.

meynaf 24 April 2021 18:14

That one's more difficult because there's no easy way to check if it's in range or not.
Quickest i could come up with :
Code:

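 move.b d0,d1      ; d1 = low byte of d0
 ext.w d1          ; sign-extend it to a word
 cmp.w d1,d0       ; equal if d0 already fits in a signed byte
 beq.s .done
 slt d0            ; $ff if d0 < d1 (too low), $00 if too high
 eori.b #$7f,d0    ; $ff -> $80 (-128), $00 -> $7f (+127)
.done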

You have to verify it though, I'm not 100% sure it works in all cases.
Also, the upper byte of the word is left incorrect when clamping happens; you get a clamped byte, not a clamped word.
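
Since the post asks for verification, here is a minimal C sketch that models the sequence and checks it against a plain clamp for every 16-bit input. The function names are hypothetical, the model only covers the low byte of d0 (all the routine defines), and it assumes the usual two's-complement narrowing when casting to int8_t.

```c
#include <assert.h>
#include <stdint.h>

/* Plain two-comparison clamp, used as the reference. */
static int16_t clamp_ref(int16_t v)
{
    if (v > 127)  return 127;
    if (v < -128) return -128;
    return v;
}

/* Model of the move.b / ext.w / cmp.w / slt / eori.b sequence.
   Returns the low byte of d0 after the routine. */
static int8_t clamp_ext_trick(int16_t d0)
{
    int16_t d1 = (int8_t)(uint8_t)d0;        /* move.b d0,d1 ; ext.w d1   */
    if (d0 == d1)                            /* cmp.w d1,d0 ; beq.s .done */
        return (int8_t)d0;
    uint8_t mask = (d0 < d1) ? 0xFF : 0x00;  /* slt d0 (signed d0 < d1)   */
    return (int8_t)(uint8_t)(mask ^ 0x7F);   /* eori.b #$7f,d0            */
}

/* Exhaustive check: every signed 16-bit input must match the reference. */
static void check_all(void)
{
    for (long v = -32768; v <= 32767; v++)
        assert(clamp_ext_trick((int16_t)v) == clamp_ref((int16_t)v));
}
```

Calling check_all() from a test program walks all 65536 inputs; it only takes a moment on a modern machine.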

ross 24 April 2021 21:52

Just use a pre-clamped 512 element LUT ;)

Don_Adan 24 April 2021 22:35

Quote:

Originally Posted by meynaf (Post 1479025)
That one's more difficult because there's no easy way to check if it's in range or not.
Quickest i could come up with :
Code:

move.b d0,d1
 ext.w d1
 cmp.w d1,d0
 beq.s .done
 slt d0
 eori.b #$7f,d0
.done

You have to verify though, not 100% sure it works in all cases.
Also higher byte of the word is incorrect ; you get a byte, not a clamped word.

Perhaps adding an ext.w d0 at the end is an option if you need a clamped word.

DanScott 24 April 2021 23:33

Quote:

Originally Posted by meynaf (Post 1479025)
That one's more difficult because there's no easy way to check if it's in range or not.
Quickest i could come up with :
Code:

move.b d0,d1
 ext.w d1
 cmp.w d1,d0
 beq.s .done
 slt d0
 eori.b #$7f,d0
.done

You have to verify though, not 100% sure it works in all cases.
Also higher byte of the word is incorrect ; you get a byte, not a clamped word.

Thanks!!! That's awesome, and an ext.w d0 after the eori should do the trick :)

Sign-extending the low byte and comparing it with the original word to see whether it's within range is a cool trick, I would never have thought of that! :bowdown

Antiriad_UK 25 April 2021 00:06

Yeah, I never think of things this way! I guess you have to watch out for d0 being clear in the top part of the word so the cmp.w doesn't get messed up.

ross 25 April 2021 00:13

Isn't this faster?

Code:

clamp_lut_generator:
    lea clamp_lut(pc),a0
    moveq  #-1,d0
    moveq  #-128,d1
.1  move.b  d1,(a0)+
    subq.b  #1,d0
    bmi.b  .1
.2  move.b  d1,(a0)+
    addq.b  #1,d1
    bvc.b  .2
.3  move.b  d0,(a0)+
    addq.b  #1,d1
    bmi.b  .3
    lea -256(a0),a0

...
clamp:
    move.b  (a0,d0.w),d0
    ext.w  d0
...


clamp_lut:
    ds.b    512

EDIT: on entry to clamp:, d0 is of course assumed to be a full signed word value :)
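
A rough C model of the same idea, for reference: the table holds pre-clamped values and the lookup pointer sits in the middle, so negative inputs index backwards from it. The names clamp_lut_init and clamp_lookup are hypothetical, and the table is filled with a simple loop rather than the three-loop generator above.

```c
#include <assert.h>
#include <stdint.h>

enum { LUT_HALF = 256 };                  /* table covers -256..+255 */
static int8_t clamp_lut[2 * LUT_HALF];

/* Fill the table and return a pointer to the entry for index 0 (the middle).
   That pointer plays the role of a0 after "lea -256(a0),a0". */
static const int8_t *clamp_lut_init(void)
{
    for (int i = -LUT_HALF; i < LUT_HALF; i++) {
        int v = i;
        if (v > 127)  v = 127;
        if (v < -128) v = -128;
        clamp_lut[i + LUT_HALF] = (int8_t)v;
    }
    return clamp_lut + LUT_HALF;
}

/* move.b (a0,d0.w),d0 : a single indexed load, valid only inside the table. */
static int8_t clamp_lookup(const int8_t *mid, int16_t d0)
{
    assert(d0 >= -LUT_HALF && d0 < LUT_HALF);
    return mid[d0];
}
```

The assert marks the one real restriction of the table approach: the input must already be known to lie within the range the table covers.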

DanScott 25 April 2021 00:29

basically, I have up to 4 byte values (each can be in the full range -128 to +127) that I need to add together, and then clamp the result to -128 to +127.

Ross, not sure if what you are suggesting will work for that ?

Actually, I could use a lookup (1024b) as I know that the maximum range after adding 4 bytes is -512 to +508

ross 25 April 2021 00:33

Quote:

Originally Posted by DanScott (Post 1479105)
basically, I have up to 4 byte values (each can be in the full range -128 to +127) that I need to add together, and then clamp the result to -128 to +127.

Ross, not sure if what you are suggesting will work for that ? I'd need a 64k lookup :D

Just enlarge the LUT to 1024 bytes :)

Don_Adan 25 April 2021 00:33

Quote:

Originally Posted by ross (Post 1479103)
Isn't this faster?

Code:

clamp_lut_generator:
    lea clamp_lut(pc),a0
    moveq  #-1,d0
    moveq  #-128,d1
.1  move.b  d1,(a0)+
    subq.b  #1,d0
    bmi.b  .1
.2  move.b  d1,(a0)+
    addq.b  #1,d1
    bvc.b  .2
.3  move.b  d0,(a0)+
    addq.b  #1,d1
    bmi.b  .3
    lea -256(a0),a0

...
clamp:
    ext.w  d0    ;just in case
    move.b  (a0,d0.w),d0
    ext.w  d0    ;just in case
...


clamp_lut:
    ds.b    512

EDIT: The state of d0 depends on your specific needs before and/or after the lut grab, so ext.w could be avoided.
Or use a .w lut :)

Why not ?
ext.w D0
move.b clamp_lut(PC,D0.W),D0
ext.w d0 ;just in case
rts ; or bra or jmp


clamp_lut:
ds.b 512

a/b 25 April 2021 00:39

A simple 14-cycle move.b (ax,d0.w) was the first thing I thought of, yeah. Not sure why you'd complicate it further (other than adding an ext.w if you need a word again). But I assumed Dan had his reasons not to use a table.

And if you want to take it even further, and you have a spare address register and can lay out memory however you like (12 cycles):
Code:

; d0 = pointer to a 64kb aligned table ;P, make sure you don't trash top 16 bits
        move.l        d0,a0
        move.b        (a0),d0
;        ext.w        d0
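
In C terms the 64 KB trick amounts to indexing a 65536-entry table with the raw 16-bit pattern of the word, so negative values land in the upper half of the table. A sketch under that assumption (hypothetical names; on the 68000 the table additionally has to be 64 KB aligned so the high word of d0 can hold its base address):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Build a 64 KiB table with one pre-clamped entry per possible 16-bit word.
   Returns NULL if the allocation fails; the caller frees the table. */
static int8_t *make_table64k(void)
{
    int8_t *t = malloc(65536);
    if (!t)
        return NULL;
    for (long v = -32768; v <= 32767; v++) {
        long c = v > 127 ? 127 : (v < -128 ? -128 : v);
        t[(uint16_t)v] = (int8_t)c;      /* index = low 16 bits, unsigned */
    }
    return t;
}

/* move.l d0,a0 / move.b (a0),d0 : base in the high word, value in the low word. */
static int8_t clamp64k(const int8_t *t, int16_t d0)
{
    assert(t != NULL);
    return t[(uint16_t)d0];
}
```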


ross 25 April 2021 00:41

Generator for the 1024 LUT:
Code:

clamp_lut_generator:
    lea clamp_lut(pc),a0
    move.w  #512-128-1,d0
    moveq  #-128,d1
.1  move.b  d1,(a0)+
    dbra    d0,.1
.2  move.b  d1,(a0)+
    addq.b  #1,d1
    bvc.b  .2
    move.w  #512-128-1,d0
    moveq  #127,d1
.3  move.b  d1,(a0)+
    dbra    d0,.3
    lea -512(a0),a0


EDIT: because there was a chain of 'fast' edits by everyone and it wasn't clear who was responding to whom and about what :D

DanScott 25 April 2021 00:49

but if d0.w is (for example) -450, then it's not going to get a value from the LUT

I guess LUT pointer needs to point to the middle :)

a/b 25 April 2021 00:52

Haha, yeah. Maybe you missed a ";P" in the comment right after 64kb table.
It's very unlikely that such an extreme is needed here, but hey we don't know the full context so maybe that kind of complication is warranted here ;p.

ross 25 April 2021 00:58

Quote:

Originally Posted by DanScott (Post 1479111)
but if d0.w is (for example) -450, then it's not going to get a value from the LUT

I guess LUT pointer needs to point to the middle :)

Yep, notice that a0 points to the middle at the end of the generator :)
    lea -512(a0),a0


Optimized generator:
Code:

clamp_lut_generator:
    lea clamp_lut(pc),a0
    lea 1024(a0),a1
    move.w  #512-128-1,d0
    moveq  #-128,d1
    moveq  #127,d2
.1  move.b  d1,(a0)+
    move.b  d2,-(a1)
    dbra    d0,.1
.2  move.b  d1,(a0)+
    move.b  d2,-(a1)
    addq.b  #1,d1
    dbra    d2,.2

This time too, a0 ends up pointing to the middle :)
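
To check that the two-ended fill really produces a clamp table centred on index 0, it can be modelled in C along these lines (hypothetical names; the loop counts mirror the dbra counters above):

```c
#include <assert.h>
#include <stdint.h>

static int8_t lut[1024];

/* Two-ended fill: a0 walks up from the start of the table, a1 walks down
   from the end, and both meet at the entry for index 0. */
static const int8_t *lut_init_two_ended(void)
{
    int8_t *a0 = lut;
    int8_t *a1 = lut + 1024;
    int d1 = -128, d2 = 127;

    for (int i = 0; i < 512 - 128; i++) {  /* loop .1: 384 passes     */
        *a0++ = (int8_t)d1;                /* low plateau, all -128   */
        *--a1 = (int8_t)d2;                /* high plateau, all +127  */
    }
    for (; d2 >= 0; d2--) {                /* loop .2: 128 passes     */
        *a0++ = (int8_t)d1++;              /* ramp -128..-1, upwards  */
        *--a1 = (int8_t)d2;                /* ramp +127..0, downwards */
    }
    assert(a0 == a1);                      /* both ended at lut + 512 */
    return a0;
}
```

The returned pointer can then be indexed directly with any sum from -512 to +511.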

meynaf 25 April 2021 08:11

I thought about the LUT, but depending on the target CPU the gain could be small or even zero. However, if you're mixing audio samples (what else could be signed bytes?) you can fit a volume boost/reduction into the table for free, which is worth having because too much clipping will result in a big quality loss.

Not for 68000, but :
Code:

moveq #24,d2      ; out of loop
 move.b d0,d1
 asl.l d2,d1
 bvc.s .done


meynaf 25 April 2021 08:35

Another possibility (but untested) : turn signed into unsigned, as unsigned byte clamping is easier.
Code:

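 move.w d2,d1      ; with d2=$80
 add.w d0,d1       ; bias: -128..127 becomes 0..255
 cmp.w a0,d1       ; with a0=$100
 blo.s .done       ; unsigned below $100: already in range
 sge d0            ; $ff if too high, $00 if too low
 ext.w d0
 subi.w #$80,d0    ; low byte is now $7f (too high) or $80 (too low)
.done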
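
The bias idea can be sketched in C as follows. This models the technique rather than the exact instruction sequence: it does the add in a plain int, so nothing wraps for inputs near +32767, and it returns the full clamped value. Names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to -128..127 with a single range test on the biased value. */
static int16_t clamp_biased(int16_t v)
{
    int biased = v + 128;            /* in-range values map to 0..255      */
    if ((unsigned)biased < 256u)     /* one unsigned compare, both limits  */
        return v;
    return biased < 0 ? -128 : 127;  /* sign of the biased value picks end */
}

/* Exhaustive self-check against the obvious two-comparison clamp. */
static void check_clamp_biased(void)
{
    for (long v = -32768; v <= 32767; v++) {
        long want = v > 127 ? 127 : (v < -128 ? -128 : v);
        assert(clamp_biased((int16_t)v) == want);
    }
}
```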


ross 25 April 2021 09:32

Quote:

Originally Posted by meynaf (Post 1479140)
I thought about the LUT but depending on target cpu the gain could be small or even zero.

True. And your solutions are nice! But I'm betting on a bare 68k ;)
The LUT is by far the fastest and simplest, especially if you mix and clamp a whole buffer and you can keep a0 constant.

EDIT: and if you don't need to process the sample any further, store it directly to chip RAM
move.b (a0,d0.w),(a1)+
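
A C sketch of that kind of mix-and-clamp loop, assuming four input streams and a 1024-entry table with the lookup pointer at its middle (all names are hypothetical, and the table is built with a plain loop):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int8_t mix_lut[1024];             /* covers sums -512..+511 */

/* Build the clamp table; the returned pointer addresses the entry for 0. */
static const int8_t *mix_lut_init(void)
{
    for (int i = -512; i < 512; i++)
        mix_lut[i + 512] = (int8_t)(i > 127 ? 127 : (i < -128 ? -128 : i));
    return mix_lut + 512;
}

/* Mix four signed 8-bit streams into out[], clamping through the table. */
static void mix4(int8_t *out, const int8_t *a, const int8_t *b,
                 const int8_t *c, const int8_t *d, size_t n,
                 const int8_t *mid)
{
    assert(mid != NULL);
    for (size_t i = 0; i < n; i++) {
        int sum = a[i] + b[i] + c[i] + d[i];  /* always within -512..+508 */
        out[i] = mid[sum];                    /* move.b (a0,d0.w),(a1)+   */
    }
}
```

The table pointer stays constant for the whole buffer, which is the point made above about keeping a0 constant.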

meynaf 25 April 2021 11:19

Quote:

Originally Posted by ross (Post 1479145)
True. And your solutions are nice! But I bet for a bare 68k.;)

I often forget about 68000 timings. Too much 020+ code probably. ;)


Quote:

Originally Posted by ross (Post 1479145)
The LUT is by far the fastest and simplest, especially if you mix and clamp a whole buffer and you can keep a0 constant.

I'm afraid mixing audio this way can be problematic - two channels with normal amplitude will lead to excessive clipping, and what i've read above suggests it will be 4.
The fastest and simplest way would then be :
Code:

asr.w #2,d0
This way, 100% sure it's in range and no clipping.
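
For comparison, a C sketch of the scale-instead-of-clamp approach for four inputs. It assumes the intended operation is a signed divide by 4 that rounds towards minus infinity, as an arithmetic shift right by 2 would give; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sum four signed bytes and scale the result back into byte range. */
static int8_t mix4_scaled(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int sum = a + b + c + d;                 /* -512..+508              */
    assert(sum >= -512 && sum <= 508);
    /* Adding 512 first keeps the division on non-negative values, so the
       result equals floor(sum / 4) without shifting a negative int. */
    return (int8_t)((sum + 512) / 4 - 128);  /* -128..+127, no clipping */
}
```

The extremes map exactly onto the byte limits: -512 becomes -128 and +508 becomes +127.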


Quote:

Originally Posted by ross (Post 1479145)
EDIT: and if you don't need to further elaborate the sample, store it directly to chip ram
move.b (a0,d0.w),(a1)+

Obviously.
It might be interesting to have better knowledge of what the whole program does. Micro-optimizations are nice, but optimizing gives better results with a broader scope.

DanScott 25 April 2021 11:36

Quote:

Originally Posted by meynaf (Post 1479160)
I often forget about 68000 timings. Too much 020+ code probably. ;)



I'm afraid mixing audio this way can be problematic - two channels with normal amplitude will lead to excessive clipping, and what i've read above suggests it will be 4.
The fastest and simplest way would then be :
Code:

asr.w #2,d0
This way, 100% sure it's in range and no clipping.



Obviously.
It might be interesting to have better knowledge of what the whole program does. Micro-optimizations are nice, but optimizing gives better results with a broader scope.


You're right, it's sample mixing (up to 4 samples mixed down to one), but it's a conversion of an existing sample mixing system written in C, and results need to be consistent with the output of the original.

The original does indeed do a >>1 on each input sample to soften the impact of clipping

EDIT... strictly speaking, that's not true... original shifts bytes up to word range, but <<7 rather than <<8, before adding and clamping to full word range.
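
As a reading of that description, the word-domain mix might look like this in C. This is an assumption based on the post, not the original source: each byte is scaled by 128 (the same value as << 7), up to four are summed, and the sum is clamped to the signed 16-bit range. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mix up to four signed bytes in the word domain, as described above. */
static int16_t mix_words(const int8_t *s, int count)
{
    assert(count >= 0 && count <= 4);
    int32_t sum = 0;
    for (int i = 0; i < count; i++)
        sum += s[i] * 128;               /* same value as s[i] << 7     */
    if (sum > 32767)  sum = 32767;       /* clamp to the full word range */
    if (sum < -32768) sum = -32768;
    return (int16_t)sum;
}
```

With the inputs scaled by 128 rather than 256, two full-scale samples still fit in a word and clipping only starts with the third.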

But doing it as bytes will be more efficient on vanilla 68k, and a small lookup to clamp would certainly speed things up


All times are GMT +2.