Clamping a word to signed byte range

DanScott · 24 April 2021, 17:10

After some great contributions to my "clamped word result of an add" question.... does anyone know a good/quicker way to clamp a word to a signed byte range?

ie.. quicker than this:

Code:

	cmp.w	#127,d0
	blt.s	.NoClampMax
	moveq	#127,d0
	bra.s	.NoClampMin
.NoClampMax
	cmp.w	#-128,d0
	bge.s	.NoClampMin
	moveq	#-128,d0
.NoClampMin

I have been trying to think of ways to do this, but I'm struggling to find anything that works.

meynaf · 24 April 2021, 18:14

That one's more difficult because there is no easy way to check if it's in range or not.
Quickest i could come up with :

Code:

 move.b d0,d1
 ext.w d1
 cmp.w d1,d0
 beq.s .done
 slt d0
 eori.b #$7f,d0
.done

You have to verify though, not 100% sure it works in all cases.
Also higher byte of the word is incorrect ; you get a byte, not a clamped word.
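For anyone who wants to check the sign-extend trick without firing up an emulator, here is a rough C model of it. The function names are made up for this sketch, and it only mirrors the logic of the instruction sequence (with an ext.w d0 added at the end to get a clamped word); it says nothing about timing.

```c
#include <stdint.h>

/* Hypothetical C model of: move.b d0,d1 / ext.w d1 / cmp.w d1,d0 /
   beq.s .done / slt d0 / eori.b #$7f,d0 (plus a final ext.w d0). */
int16_t clamp_sext(int16_t x)
{
    /* move.b d0,d1 + ext.w d1: sign-extend the low byte of x */
    int low = (int)((uint16_t)x & 0xFFu);
    if (low >= 128)
        low -= 256;

    /* cmp.w d1,d0 + beq.s: equal means x already fits in a signed byte */
    if (x == low)
        return x;

    /* slt d0 gives $FF when x < low, else $00; eori.b #$7f maps that
       to $80 (-128) or $7F (+127) */
    unsigned byte = ((x < low) ? 0xFFu : 0x00u) ^ 0x7Fu;

    /* ext.w d0: widen the clamped byte back to a word */
    return (int16_t)(byte >= 128 ? (int)byte - 256 : (int)byte);
}

/* Brute-force comparison against a plain two-sided clamp for every
   16-bit input; returns the number of inputs that disagree. */
int count_mismatches(void)
{
    int bad = 0;
    for (int x = -32768; x <= 32767; x++) {
        int ref = x > 127 ? 127 : (x < -128 ? -128 : x);
        if (clamp_sext((int16_t)x) != ref)
            bad++;
    }
    return bad;
}
```

The reasoning behind it: the sign-extended low byte always lies in -128..127, so if the word differs from it, the word is outside that range and a signed compare against it tells you on which side.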

ross · 24 April 2021, 21:52

Just use a pre-clamped 512 element LUT

Don_Adan · 24 April 2021, 22:35

Quote:

Originally Posted by meynaf

That one's more difficult because there is no easy way to check if it's in range or not.
Quickest i could come up with :

Code:

 move.b d0,d1
 ext.w d1
 cmp.w d1,d0
 beq.s .done
 slt d0
 eori.b #$7f,d0
.done

You have to verify though, not 100% sure it works in all cases.
Also higher byte of the word is incorrect ; you get a byte, not a clamped word.

Perhaps adding ext.w D0 can be option for clamped word.

DanScott · 24 April 2021, 23:33

Quote:

Originally Posted by meynaf

That one's more difficult because there is no easy way to check if it's in range or not.
Quickest i could come up with :

Code:

 move.b d0,d1
 ext.w d1
 cmp.w d1,d0
 beq.s .done
 slt d0
 eori.b #$7f,d0
.done

You have to verify though, not 100% sure it works in all cases.
Also higher byte of the word is incorrect ; you get a byte, not a clamped word.

Thanks!!! That's awesome, and an ext.w d0 after the eori should do the trick

Sign-extending the low byte and comparing it with the original word to see if it's within range is a cool trick. I would never have thought of that!

Antiriad_UK · 25 April 2021, 00:06

Yeah, I never think of things this way! I guess you have to watch out for d0 being clear in the top part of the word so the cmp.w doesn't get messed up.

ross · 25 April 2021, 00:13

Isn't this faster?

Code:

clamp_lut_generator:
    lea clamp_lut(pc),a0
    moveq   #-1,d0
    moveq   #-128,d1
.1  move.b  d1,(a0)+
    subq.b  #1,d0
    bmi.b   .1
.2  move.b  d1,(a0)+
    addq.b  #1,d1
    bvc.b   .2
.3  move.b  d0,(a0)+
    addq.b  #1,d1
    bmi.b   .3
    lea -256(a0),a0

...
clamp:
    move.b  (a0,d0.w),d0
    ext.w   d0
...


clamp_lut:
    ds.b    512

EDIT: on entry to clamp:, d0 is assumed to hold a full signed word value
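As a cross-check of the table layout, this is a small C sketch of the same 512-byte LUT. The names are invented for the example; the three loops correspond to .1, .2 and .3 in the generator, and clamp_mid plays the role of a0 after the final lea -256(a0),a0. It assumes the index stays within -256..255.

```c
#include <stdint.h>

/* Hypothetical C model of the 512-entry pre-clamped table. */
static int8_t clamp_lut[512];

/* Equivalent of a0 after "lea -256(a0),a0": points at the entry for 0. */
static const int8_t *clamp_mid = clamp_lut + 256;

void build_clamp_lut(void)
{
    int8_t *p = clamp_lut;
    int i, v;

    for (i = 0; i < 128; i++)        /* loop .1: 128 entries of -128 */
        *p++ = -128;
    for (v = -128; v <= 127; v++)    /* loop .2: -128..127 unchanged */
        *p++ = (int8_t)v;
    for (i = 0; i < 128; i++)        /* loop .3: 128 entries of +127 */
        *p++ = 127;
}

/* move.b (a0,d0.w),d0 + ext.w d0; only valid for -256 <= x <= 255 */
int16_t clamp_via_lut(int16_t x)
{
    return clamp_mid[x];
}
```

Negative inputs simply index backwards from the middle pointer, which is why the generator ends by rewinding a0 by half the table size.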

DanScott · 25 April 2021, 00:29

basically, I have up to 4 byte values (each can be in the full range -128 to +127) that I need to add together, and then clamp the result to -128 to +127.

Ross, not sure if what you are suggesting will work for that ?

Actually, I could use a lookup (1024b) as I know that the maximum range after adding 4 bytes is -512 to +508

ross · 25 April 2021, 00:33

Quote:

Originally Posted by DanScott

basically, I have up to 4 byte values (each can be in the full range -128 to +127) that I need to add together, and then clamp the result to -128 to +127.

Ross, not sure if what you are suggesting will work for that ? I'd need a 64k lookup

Just enlarge the LUT to 1024 bytes

Don_Adan · 25 April 2021, 00:33

Quote:

Originally Posted by ross

Isn't this faster?

Code:

clamp_lut_generator:
    lea clamp_lut(pc),a0
    moveq   #-1,d0
    moveq   #-128,d1
.1  move.b  d1,(a0)+
    subq.b  #1,d0
    bmi.b   .1
.2  move.b  d1,(a0)+
    addq.b  #1,d1
    bvc.b   .2
.3  move.b  d0,(a0)+
    addq.b  #1,d1
    bmi.b   .3
    lea -256(a0),a0

...
clamp:
    ext.w   d0     ;just in case
    move.b  (a0,d0.w),d0
    ext.w   d0     ;just in case
...


clamp_lut:
    ds.b    512

EDIT: The state of d0 depends on your specific needs before and/or after the lut grab, so ext.w could be avoided.
Or use a .w lut

Why not?

Code:

    ext.w   D0
    move.b  clamp_lut(PC,D0.W),D0
    ext.w   d0     ;just in case
    rts            ; or bra or jmp

clamp_lut:
    ds.b    512

a/b · 25 April 2021, 00:39

A simple 14-cycle move.b (ax,d0.w) was the first thing I thought about, yeah. Not sure why you'd complicate it further (other than adding an ext.w if you need a word again). But I thought Dan had his reasons not to use a table.

And if you want to take it even further, have a spare areg and can mess with memory as much as you want (12 cycles):

Code:

; d0 = pointer to a 64kb aligned table ;P, make sure you don't trash top 16 bits
	move.l	d0,a0
	move.b	(a0),d0
;	ext.w	d0

ross · 25 April 2021, 00:41

Generator for the 1024 LUT:

Code:

clamp_lut_generator:
    lea clamp_lut(pc),a0
    move.w  #512-128-1,d0
    moveq   #-128,d1
.1  move.b  d1,(a0)+
    dbra    d0,.1 
.2  move.b  d1,(a0)+
    addq.b  #1,d1
    bvc.b   .2
    move.w  #512-128-1,d0
    moveq   #127,d1
.3  move.b  d1,(a0)+
    dbra    d0,.3 
    lea -512(a0),a0

EDIT: because there was a chain of 'fast' edits by everyone and it wasn't clear who was responding to whom and for what
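To show how the 1024-byte table would be used for the 4-sample case, here is a hedged C sketch. The function names are hypothetical, and the table is filled with a direct clamp per entry rather than the three loops of the generator above; the resulting contents should be the same. Four signed bytes sum to between 4 × -128 = -512 and 4 × 127 = +508, so 1024 entries centred on 0 cover every possible sum.

```c
#include <stdint.h>

/* Hypothetical 1024-entry table; entry [512 + s] holds clamp(s). */
static int8_t clamp_lut[1024];

void build_clamp_lut(void)
{
    for (int i = 0; i < 1024; i++) {
        int v = i - 512;   /* value this entry stands for */
        clamp_lut[i] = (int8_t)(v < -128 ? -128 : (v > 127 ? 127 : v));
    }
}

/* Add up to four signed 8-bit samples and clamp with one lookup. */
int8_t mix4(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int sum = a + b + c + d;      /* always within -512 .. +508 */
    return clamp_lut[512 + sum];  /* 512 = offset of the middle */
}
```

In the assembly version the +512 disappears because a0 is left pointing at the middle of the table, so the signed sum in d0.w indexes it directly.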

DanScott · 25 April 2021, 00:49

but if d0.w is (for example) -450, then it's not going to get a value from the LUT

I guess LUT pointer needs to point to the middle

a/b · 25 April 2021, 00:52

Haha, yeah. Maybe you missed a ";P" in the comment right after 64kb table.
It's very unlikely that such an extreme is needed here, but hey we don't know the full context so maybe that kind of complication is warranted here ;p.

ross · 25 April 2021, 00:58

Quote:

Originally Posted by DanScott

but if d0.w is (for example) -450, then it's not going to get a value from the LUT

I guess LUT pointer needs to point to the middle

Yep, notice that a0 points to the middle at the end of the generator

    lea -512(a0),a0

Optimized generator:

Code:

clamp_lut_generator:
    lea clamp_lut(pc),a0
    lea 1024(a0),a1
    move.w  #512-128-1,d0
    moveq   #-128,d1
    moveq   #127,d2
.1  move.b  d1,(a0)+
    move.b  d2,-(a1)
    dbra    d0,.1 
.2  move.b  d1,(a0)+
    move.b  d2,-(a1)
    addq.b  #1,d1
    dbra    d2,.2

This time too, a0 ends up pointing to the middle
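The two-ended fill is easy to get wrong by one, so here is a C model of it for checking the boundaries. The function name is invented for this sketch; lo and hi stand in for a0 and a1, and d1/d2 for the data registers.

```c
#include <stdint.h>

/* Hypothetical C model of the optimized generator: fill from both ends
   at once. Returns the pointer to the middle (a0 at the end). */
int8_t *build_clamp_lut_two_ended(int8_t lut[1024])
{
    int8_t *lo = lut;          /* a0: walks up from the start      */
    int8_t *hi = lut + 1024;   /* a1: walks down from the end      */
    int d1 = -128;             /* value written at the low end     */
    int d2 = 127;              /* value written at the high end    */
    int i;

    /* loop .1: 384 saturated entries on each side */
    for (i = 0; i < 384; i++) {
        *lo++ = (int8_t)d1;
        *--hi = (int8_t)d2;
    }

    /* loop .2: 128 passes; low side counts -128..-1 upwards,
       high side counts 127..0 downwards (dbra d2 ends after 0) */
    for (; d2 >= 0; d2--) {
        *lo++ = (int8_t)d1;
        *--hi = (int8_t)d2;
        d1++;
    }

    return lo;   /* lo and hi meet at lut + 512 */
}
```

Both pointers meet at offset 512, which is why no final lea is needed to rewind a0.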

meynaf · 25 April 2021, 08:11

I thought about the LUT but depending on target cpu the gain could be small or even zero. However if you're mixing audio samples (what else could be signed bytes ?) you can fit a volume boost/reduction for free. As too much clipping will result in big quality loss.

Not for 68000, but :

Code:

 moveq #24,d2       ; out of loop
 move.b d0,d1
 asl.l d2,d1
 bvc.s .done

meynaf · 25 April 2021, 08:35

Another possibility (but untested) : turn signed into unsigned, as unsigned byte clamping is easier.

Code:

 move.w d2,d1  ; with d2=$80
 add.w d0,d1
 cmp.w a0,d1   ; with a0=$100
 blo.s .done
 sge d0
 ext.w d0
 subi.w #$80,d0
.done
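Since this one is marked untested, a C sketch of the biasing idea may help with checking it. The function name is made up, and the model assumes x + $80 does not overflow a word (fine for sums in -512..+508). It returns the clamped byte only: as far as I can tell, the assembly leaves $FF7F in d0.w for the high case, so the low byte is right but the word would still need fixing up if a word result is wanted.

```c
#include <stdint.h>

/* Hypothetical C model of the signed-to-unsigned bias trick.
   Assumes x + 128 fits in 16 bits. */
int8_t clamp_biased(int16_t x)
{
    int biased = x + 128;          /* add.w d0,d1 with d1 = $80 */

    /* cmp.w a0,d1 + blo.s: one unsigned compare covers both bounds,
       because a negative biased value looks huge when unsigned */
    if ((unsigned)biased < 256u)
        return (int8_t)x;

    /* out of range: the signed result of the same compare picks the side */
    return biased >= 256 ? 127 : -128;
}
```

The gain over the two-compare version is that the common in-range case needs a single compare and branch.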

ross · 25 April 2021, 09:32

Quote:

Originally Posted by meynaf

I thought about the LUT but depending on target cpu the gain could be small or even zero.

True. And your solutions are nice! But I'd bet this is for a bare 68k.

The LUT is by far the fastest and simplest, especially if you mix and clamp a whole buffer and you can keep a0 constant.

EDIT: and if you don't need to further elaborate the sample, store it directly to chip ram

move.b (a0,d0.w),(a1)+

meynaf · 25 April 2021, 11:19

Quote:

Originally Posted by ross

True. And your solutions are nice! But I'd bet this is for a bare 68k.

I often forget about 68000 timings. Too much 020+ code probably.

Quote:

Originally Posted by ross

The LUT is by far the fastest and simplest, especially if you mix and clamp a whole buffer and you can keep a0 constant.

I'm afraid mixing audio this way can be problematic - two channels with normal amplitude will lead to excessive clipping, and what i've read above suggests it will be 4.
The fastest and simplest way would then be :

Code:

 lsr.w #2,d0

This way, 100% sure it's in range and no clipping.
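A quick C sketch of why the shift is enough, with an invented function name. It models taking the low byte of d0 after lsr.w #2. Because lsr is a logical shift, the upper byte of the word is not sign-correct afterwards, but bits 2..9 of the sum land in the low byte either way; asr.w #2 would be the choice if the whole word has to stay signed.

```c
#include <stdint.h>

/* Hypothetical C model: mix four signed bytes and scale by 1/4
   instead of clamping. */
int8_t mix4_scaled(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int sum = a + b + c + d;            /* -512 .. +508           */
    uint16_t w = (uint16_t)sum;         /* the word held in d0.w  */
    uint8_t low = (uint8_t)(w >> 2);    /* lsr.w #2,d0; low byte  */

    /* reinterpret the low byte as a signed sample */
    return (int8_t)(low >= 128 ? (int)low - 256 : (int)low);
}
```

The extremes map to -512 >> 2 = -128 and 508 >> 2 = 127, so the result always fits, at the cost of two bits of level on every sample.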

Quote:

Originally Posted by ross

EDIT: and if you don't need to further elaborate the sample, store it directly to chip ram

move.b (a0,d0.w),(a1)+

Obviously.
It might be interesting to have better knowledge of what the whole program does. Micro-optimizations are nice, but optimizing gives better results with a broader scope.

DanScott · 25 April 2021, 11:36

Quote:

Originally Posted by meynaf

I often forget about 68000 timings. Too much 020+ code probably.

I'm afraid mixing audio this way can be problematic - two channels with normal amplitude will lead to excessive clipping, and what i've read above suggests it will be 4.
The fastest and simplest way would then be :

Code:

 lsr.w #2,d0

This way, 100% sure it's in range and no clipping.

Obviously.
It might be interesting to have better knowledge of what the whole program does. Micro-optimizations are nice, but optimizing gives better results with a broader scope.

You're right, it's sample mixing (up to 4 samples mixed down to one), but it's a conversion of an existing sample mixing system written in C, and results need to be consistent with the output of the original.

The original does indeed do a >>1 on each input sample to soften the impact of clipping

EDIT... strictly speaking, that's not true... original shifts bytes up to word range, but <<7 rather than <<8, before adding and clamping to full word range.

But doing it as bytes will be more efficient on vanilla 68k, and a small lookup to clamp would certainly speed things up
