English Amiga Board


Old 24 April 2021, 17:10   #1
DanScott
Lemon. / Core Design
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,211
Clamping a word to signed byte range

After some great contributions to my "clamped word result of an add" question, does anyone know a good, quicker way to clamp a word to a signed byte range?

i.e. quicker than this:

Code:
	cmp.w	#127,d0
	blt.s	.NoClampMax
	moveq	#127,d0
	bra.s	.NoClampMin
.NoClampMax
	cmp.w	#-128,d0
	bge.s	.NoClampMin
	moveq	#-128,d0
.NoClampMin
I have been trying to think of ways to do this, but am struggling to find anything that works.
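For checking any faster variant against the routine above, a plain C reference of the same clamp is handy. This is only a minimal sketch; the name `clamp_s8_ref` is hypothetical and does not come from the thread.

```c
#include <stdint.h>

/* Reference clamp mirroring the two compare/branch pairs above.
   clamp_s8_ref is a hypothetical name used only for illustration. */
int16_t clamp_s8_ref(int16_t x)
{
    if (x >= 127)      /* cmp.w #127,d0 : blt.s not taken */
        return 127;
    if (x < -128)      /* cmp.w #-128,d0 : bge.s not taken */
        return -128;
    return x;          /* already inside -128..127 */
}
```

Any candidate replacement can then be compared with this function over all 65536 word values.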
Old 24 April 2021, 18:14   #2
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
That one's more difficult, as there is no easy way to check whether it's in range or not.
Quickest I could come up with:
Code:
 move.b d0,d1
 ext.w d1
 cmp.w d1,d0
 beq.s .done
 slt d0
 eori.b #$7f,d0
.done
You have to verify it though, I'm not 100% sure it works in all cases.
Also, when clamping happens the high byte of the word is left unchanged, so you get a clamped byte, not a clamped word.
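One way to settle the "not 100% sure" part is to model the sequence in C and compare it with a plain range clamp for every 16-bit input. The sketch below assumes the usual 68000 semantics (`slt` writes $FF when the compared value is less, $00 otherwise) and includes the trailing `ext.w d0` suggested in the replies; all function names are hypothetical.

```c
#include <stdint.h>

/* Model of move.b/ext.w/cmp.w/beq/slt/eori, plus a final ext.w d0.
   clamp_s8_signext is a hypothetical name. */
int16_t clamp_s8_signext(int16_t x)
{
    /* move.b d0,d1 / ext.w d1 : sign-extend the low byte of x */
    int lo_bits = x & 0xFF;
    int16_t lo = (int16_t)(lo_bits >= 128 ? lo_bits - 256 : lo_bits);

    if (lo == x)                        /* cmp.w d1,d0 / beq.s .done */
        return x;

    /* slt d0 : $FF if x < lo (x was too small), $00 if x was too big */
    int mask = (x < lo) ? 0xFF : 0x00;
    int byte = mask ^ 0x7F;             /* eori.b #$7f,d0 : $80 or $7F */

    /* ext.w d0 : widen the clamped byte back to a word */
    return (int16_t)(byte >= 128 ? byte - 256 : byte);
}

/* Counts the inputs where the model disagrees with a plain clamp. */
long signext_mismatches(void)
{
    long bad = 0;
    for (long v = -32768; v <= 32767; v++) {
        int16_t want = (int16_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
        if (clamp_s8_signext((int16_t)v) != want)
            bad++;
    }
    return bad;
}
```

The reason it holds: an out-of-range x is always on the same side of its sign-extended low byte as it is of the valid range, so the single signed comparison tells which limit to use.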
Old 24 April 2021, 21:52   #3
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Just use a pre-clamped 512 element LUT
Old 24 April 2021, 22:35   #4
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Quote:
Originally Posted by meynaf View Post
That one's more difficult due no easy way to check if it's in range or not.
Quickest i could come up with :
Code:
 move.b d0,d1
 ext.w d1
 cmp.w d1,d0
 beq.s .done
 slt d0
 eori.b #$7f,d0
.done
You have to verify though, not 100% sure it works in all cases.
Also higher byte of the word is incorrect ; you get a byte, not a clamped word.
Perhaps adding ext.w D0 could be an option for a clamped word.
Old 24 April 2021, 23:33   #5
DanScott
Lemon. / Core Design
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,211
Quote:
Originally Posted by meynaf View Post
That one's more difficult due no easy way to check if it's in range or not.
Quickest i could come up with :
Code:
 move.b d0,d1
 ext.w d1
 cmp.w d1,d0
 beq.s .done
 slt d0
 eori.b #$7f,d0
.done
You have to verify though, not 100% sure it works in all cases.
Also higher byte of the word is incorrect ; you get a byte, not a clamped word.
Thanks!!! That's awesome, and an ext.w d0 after the eori should do the trick.

Sign-extending the low byte and comparing it with the original word to see if it's within range is a cool trick, I would never have thought of that!
Old 25 April 2021, 00:06   #6
Antiriad_UK
OCS forever!
 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
Yeah, I never think of things this way! I guess you have to watch out that the top part of the word in d0 is valid, so the cmp.w doesn't get messed up.
Old 25 April 2021, 00:13   #7
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Isn't this faster?

Code:
clamp_lut_generator:
    lea clamp_lut(pc),a0
    moveq   #-1,d0
    moveq   #-128,d1
.1  move.b  d1,(a0)+
    subq.b  #1,d0
    bmi.b   .1
.2  move.b  d1,(a0)+
    addq.b  #1,d1
    bvc.b   .2
.3  move.b  d0,(a0)+
    addq.b  #1,d1
    bmi.b   .3
    lea -256(a0),a0

...
clamp:
    move.b  (a0,d0.w),d0
    ext.w   d0
...


clamp_lut:
    ds.b    512
EDIT: on entry to clamp:, d0 must hold a full signed word value
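The same idea in C, as a rough sketch: a 512-entry table addressed through a pointer to its middle, so that a signed index in -256..255 can be used directly. The names are hypothetical, and the table is filled with a simple loop rather than with the three counting loops of the generator.

```c
#include <stdint.h>

/* 512-entry clamp table, addressed through a pointer to its middle,
   like a0 after "lea -256(a0),a0". All names are hypothetical. */
static int8_t clamp_lut512[512];
static const int8_t *clamp_mid512 = clamp_lut512 + 256;

void build_clamp_lut512(void)
{
    for (int v = -256; v <= 255; v++) {
        int c = v > 127 ? 127 : (v < -128 ? -128 : v);
        clamp_lut512[v + 256] = (int8_t)c;
    }
}

/* Valid only for -256 <= x <= 255; no range check, as in the asm. */
int16_t clamp_lookup512(int16_t x)
{
    return clamp_mid512[x];   /* move.b (a0,d0.w),d0 / ext.w d0 */
}
```

The table only covers the sum of two signed bytes; wider inputs need a larger table.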

Last edited by ross; 25 April 2021 at 00:29.
Old 25 April 2021, 00:29   #8
DanScott
Lemon. / Core Design
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,211
basically, I have up to 4 byte values (each can be in the full range -128 to +127) that I need to add together, and then clamp the result to -128 to +127.

Ross, not sure if what you are suggesting will work for that ?

Actually, I could use a lookup (1024b) as I know that the maximum range after adding 4 bytes is -512 to +508
Old 25 April 2021, 00:33   #9
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by DanScott View Post
basically, I have up to 4 byte values (each can be in the full range -128 to +127) that I need to add together, and then clamp the result to -128 to +127.

Ross, not sure if what you are suggesting will work for that ? I'd need a 64k lookup
Just enlarge the lut to 1024 byte
Old 25 April 2021, 00:33   #10
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Quote:
Originally Posted by ross View Post
Isn't this faster?

Code:
clamp_lut_generator:
    lea clamp_lut(pc),a0
    moveq   #-1,d0
    moveq   #-128,d1
.1  move.b  d1,(a0)+
    subq.b  #1,d0
    bmi.b   .1
.2  move.b  d1,(a0)+
    addq.b  #1,d1
    bvc.b   .2
.3  move.b  d0,(a0)+
    addq.b  #1,d1
    bmi.b   .3
    lea -256(a0),a0

...
clamp:
    ext.w   d0     ;just in case
    move.b  (a0,d0.w),d0
    ext.w   d0     ;just in case
...


clamp_lut:
    ds.b    512
EDIT: The state of d0 depends on your specific needs before and/or after the lut grab, so ext.w could be avoided.
Or use a .w lut
Why not ?
ext.w D0
move.b clamp_lut(PC,D0.W),D0
ext.w d0 ;just in case
rts ; or bra or jmp


clamp_lut:
ds.b 512
Old 25 April 2021, 00:39   #11
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
A simple 14-cycle move.b (ax,d0.w) was the first thing I thought about, yeah. Not sure why you'd complicate it further (other than adding an ext.w if you need a word again). But I thought Dan had his reasons not to use a table.

And if you want to take it even further, have a spare areg and can mess with memory as much as you want (12 cycles):
Code:
; d0 = pointer to a 64kb aligned table ;P, make sure you don't trash top 16 bits
	move.l	d0,a0
	move.b	(a0),d0
;	ext.w	d0
a/b is offline  
Old 25 April 2021, 00:41   #12
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Generator for the 1024 LUT:
Code:
clamp_lut_generator:
    lea clamp_lut(pc),a0
    move.w  #512-128-1,d0
    moveq   #-128,d1
.1  move.b  d1,(a0)+
    dbra    d0,.1 
.2  move.b  d1,(a0)+
    addq.b  #1,d1
    bvc.b   .2
    move.w  #512-128-1,d0
    moveq   #127,d1
.3  move.b  d1,(a0)+
    dbra    d0,.3 
    lea -512(a0),a0

EDIT: because there was a chain of 'fast' edits by everyone and it wasn't clear who was responding to whom and for what

Last edited by ross; 25 April 2021 at 01:52.
Old 25 April 2021, 00:49   #13
DanScott
Lemon. / Core Design
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,211
but if d0.w is (for example) -450, then it's not going to get a value from the LUT

I guess LUT pointer needs to point to the middle
Old 25 April 2021, 00:52   #14
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Haha, yeah. Maybe you missed a ";P" in the comment right after 64kb table.
It's very unlikely that such an extreme is needed here, but hey we don't know the full context so maybe that kind of complication is warranted here ;p.

Last edited by a/b; 25 April 2021 at 00:54. Reason: typo
Old 25 April 2021, 00:58   #15
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by DanScott View Post
but if d0.w is (for example) -450, then it's not going to get a value from the LUT

I guess LUT pointer needs to point to the middle
Yep, notice that a0 points to the middle at the end of the generator
    lea -512(a0),a0


Optimized generator:
Code:
clamp_lut_generator:
    lea clamp_lut(pc),a0
    lea 1024(a0),a1
    move.w  #512-128-1,d0
    moveq   #-128,d1
    moveq   #127,d2
.1  move.b  d1,(a0)+
    move.b  d2,-(a1)
    dbra    d0,.1 
.2  move.b  d1,(a0)+
    move.b  d2,-(a1)
    addq.b  #1,d1
    dbra    d2,.2
This time too, a0 ends up pointing to the middle
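A C rendering of the two-ended fill may make it easier to see why the write pointer stops exactly at the middle. It is only a sketch with hypothetical names: `lo` and `hi` stand in for a0 and a1, `neg` and `pos` for d1 and d2.

```c
#include <stdint.h>

static int8_t clamp_lut1024[1024];

/* Fills the table from both ends at once and returns a pointer to the
   middle of the table (the entry for value 0). Names are hypothetical. */
const int8_t *build_clamp_lut1024(void)
{
    int8_t *lo = clamp_lut1024;           /* a0, moves upward   */
    int8_t *hi = clamp_lut1024 + 1024;    /* a1, moves downward */
    int neg = -128;                       /* d1 */
    int pos = 127;                        /* d2 */

    for (int i = 0; i < 512 - 128; i++) { /* loop .1: the flat ends */
        *lo++ = (int8_t)neg;
        *--hi = (int8_t)pos;
    }
    for (; pos >= 0; pos--) {             /* loop .2: the two ramps */
        *lo++ = (int8_t)neg++;            /* -128 .. -1 going up    */
        *--hi = (int8_t)pos;              /*  127 ..  0 going down  */
    }
    return lo;  /* 384 + 128 writes: lo has stopped at offset 512 */
}
```

Both pointers meet at offset 512, which is why no final `lea` adjustment is needed in the optimized version.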
Old 25 April 2021, 08:11   #16
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
I thought about the LUT, but depending on the target cpu the gain could be small or even zero. However, if you're mixing audio samples (what else could be signed bytes ?) you can fit a volume boost/reduction in for free, as too much clipping will result in a big quality loss.

Not for 68000, but :
Code:
 moveq #24,d2       ; out of loop
 move.b d0,d1
 asl.l d2,d1
 bvc.s .done
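As far as I can tell, the shift works as a range test because ASL sets V whenever the sign bit changes at any point during the shift, so V stays clear only if every bit from bit 7 upward is already the same. A small C sketch of that condition on a 16-bit value (the function name is hypothetical, and this models only the test, not a full clamp):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when x fits in a signed byte, i.e. bits 15..7 are all equal.
   Under that condition an arithmetic left shift that moves bit 7 up
   to the sign position never changes the sign bit on the way. */
bool fits_s8(int16_t x)
{
    unsigned top = ((uint16_t)x >> 7) & 0x1FFu;   /* bits 15..7 */
    return top == 0 || top == 0x1FFu;
}
```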
Old 25 April 2021, 08:35   #17
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Another possibility (but untested) : turn signed into unsigned, as unsigned byte clamping is easier.
Code:
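 move.w d2,d1    ; with d2=$80
 add.w d0,d1     ; d1 = d0+$80 : in range means 0..$ff unsigned
 cmp.w a0,d1     ; with a0=$100
 blo.s .done     ; in range, d0 left untouched
 sge d0          ; $ff if too big, $00 if too small
 subi.b #$80,d0  ; $7f or $80
 ext.w d0        ; back to a clamped word
.done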
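The bias trick is easy to check in C: adding $80 maps the valid range -128..127 onto 0..$FF, so one unsigned compare detects both overflow directions. The sketch below returns the clamped value as a word and assumes the input is small enough that adding 128 does not overflow a signed word (true for the -512..508 sums discussed in this thread); the function name is hypothetical.

```c
#include <stdint.h>

/* Clamp via an unsigned range check on the biased value.
   Assumes x <= 32639 so that x + 128 still fits in a signed word. */
int16_t clamp_s8_biased(int16_t x)
{
    /* add.w : bias by $80 so the valid range becomes 0..$FF */
    uint16_t biased = (uint16_t)((uint16_t)x + 0x80u);

    if (biased < 0x100u)              /* cmp.w / blo.s .done */
        return x;

    /* Out of range: a biased value with bit 15 set was below -128 */
    return (biased & 0x8000u) ? -128 : 127;
}
```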
Old 25 April 2021, 09:32   #18
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by meynaf View Post
I thought about the LUT but depending on target cpu the gain could be small or even zero.
True. And your solutions are nice! But I'm betting on a bare 68k.
The LUT is by far the fastest and simplest, especially if you mix and clamp a whole buffer and you can keep a0 constant.

EDIT: and if you don't need to process the sample any further, store it directly to chip ram
move.b (a0,d0.w),(a1)+
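Put together for a whole buffer, the table approach could look like the C sketch below: the pointer to the middle of the table stays constant, and each clamped sum is written straight to the output. Buffer layout and all names are assumptions made for the example, not details taken from the thread.

```c
#include <stddef.h>
#include <stdint.h>

static int8_t mix_lut[1024];   /* clamp table for sums in -512..511 */

void mix_init(void)
{
    for (int v = -512; v <= 511; v++)
        mix_lut[v + 512] = (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

/* Mix one sample from each of four signed 8-bit streams. */
int8_t mix4_sample(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int sum = a + b + c + d;          /* always within -512..508 */
    return (mix_lut + 512)[sum];      /* indexed from the middle */
}

/* Mix n samples; the result goes directly to dst with no further
   processing, like move.b (a0,d0.w),(a1)+ in a loop. */
void mix4(int8_t *dst, const int8_t *a, const int8_t *b,
          const int8_t *c, const int8_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = mix4_sample(a[i], b[i], c[i], d[i]);
}
```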

Last edited by ross; 25 April 2021 at 10:04.
Old 25 April 2021, 11:19   #19
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by ross View Post
True. And your solutions are nice! But I bet for a bare 68k.
I often forget about 68000 timings. Too much 020+ code probably.


Quote:
Originally Posted by ross View Post
The LUT is by far the fastest and simplest, especially if you mix and clamp a whole buffer and you can keep a0 constant.
I'm afraid mixing audio this way can be problematic - two channels with normal amplitude will lead to excessive clipping, and what i've read above suggests it will be 4.
The fastest and simplest way would then be :
Code:
 lsr.w #2,d0     ; low byte holds the result ; use asr.w #2,d0 if a signed word is needed
This way, 100% sure it's in range and no clipping.
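In C, the scaling alternative is a division by 4 that rounds toward minus infinity, which is what an arithmetic shift right by 2 produces. A sketch with a hypothetical function name, valid for the -512..508 range of a four-sample sum:

```c
#include <stdint.h>

/* Scale a 4-sample sum (-512..508) into byte range by dividing by 4,
   rounding toward minus infinity like an arithmetic shift right. */
int8_t mix_scale4(int16_t sum)
{
    int q = sum >= 0 ? sum / 4 : -((-sum + 3) / 4);
    return (int8_t)q;    /* q is always within -128..127 here */
}
```

The price is that every channel is attenuated to a quarter of its level, even when only one of them is playing.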


Quote:
Originally Posted by ross View Post
EDIT: and if you don't need to further elaborate the sample, store it directly to chip ram
move.b (a0,d0.w),(a1)+
Obviously.
It might be interesting to have better knowledge of what the whole program does. Micro-optimizations are nice, but optimizing gives better results with a broader scope.
Old 25 April 2021, 11:36   #20
DanScott
Lemon. / Core Design
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,211
Quote:
Originally Posted by meynaf View Post
I often forget about 68000 timings. Too much 020+ code probably.



I'm afraid mixing audio this way can be problematic - two channels with normal amplitude will lead to excessive clipping, and what i've read above suggests it will be 4.
The fastest and simplest way would then be :
Code:
 lsr.w #2,d0
This way, 100% sure it's in range and no clipping.



Obviously.
It might be interesting to have better knowledge of what the whole program does. Micro-optimizations are nice, but optimizing gives better results with a broader scope.

You're right, it's sample mixing (up to 4 samples mixed down to one), but it's a conversion of an existing sample mixing system written in C, and results need to be consistent with the output of the original.

The original does indeed do a >>1 on each input sample to soften the impact of clipping

EDIT... strictly speaking, that's not true... original shifts bytes up to word range, but <<7 rather than <<8, before adding and clamping to full word range.

But doing it as bytes will be more efficient on vanilla 68k, and a small lookup to clamp would certainly speed things up

Last edited by DanScott; 25 April 2021 at 11:42.
 

