24 April 2021, 17:10 | #1 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,209
Clamping a word to the signed byte range
After some great contributions to my "clamped word result of an add" question, does anyone know a good (quicker) way to clamp a word to the signed byte range?
I.e. quicker than this:
Code:
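    cmp.w   #127,d0         ; d0 < 127?
    blt.s   .NoClampMax     ; yes: no upper clamp needed
    moveq   #127,d0         ; no: clamp to +127
    bra.s   .NoClampMin
.NoClampMax
    cmp.w   #-128,d0        ; d0 >= -128?
    bge.s   .NoClampMin     ; yes: already in range
    moveq   #-128,d0        ; no: clamp to -128
.NoClampMin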
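For checking the faster variants in this thread against something, a plain C model of this compare-and-branch clamp may help. This is a sketch, not code from the thread; `clamp_branchy` is a hypothetical name, and it assumes the input is the signed 16-bit word held in d0.

```c
#include <stdint.h>

/* Hypothetical C model of the compare-and-branch clamp: two signed
   compares, mirroring the cmp.w/blt.s and cmp.w/bge.s pairs. */
int16_t clamp_branchy(int16_t x)
{
    if (x >= 127)      /* blt.s not taken: moveq #127,d0 */
        return 127;
    if (x < -128)      /* bge.s not taken: moveq #-128,d0 */
        return -128;
    return x;          /* already in range: d0 unchanged */
}
```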
24 April 2021, 18:14 | #2 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
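That one's more difficult because there's no easy way to check whether it's in range or not.
Quickest I could come up with:
Code:
    move.b  d0,d1           ; low byte of the word
    ext.w   d1              ; sign-extend it back to a word
    cmp.w   d1,d0           ; unchanged by the round trip?
    beq.s   .done           ; yes: already fits in a signed byte
    slt     d0              ; $ff if d0 < d1 (too low), $00 if too high
    eori.b  #$7f,d0         ; $ff -> $80 (-128), $00 -> $7f (+127)
.done
Also, the high byte of the word is incorrect; you get a byte, not a clamped word.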
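A C model of the sign-extension trick, written to mirror each instruction. It returns a byte, matching the note that only the low byte of d0 is valid afterwards; `clamp_ext_trick` is a hypothetical name and this is a sketch for testing, not the poster's code.

```c
#include <stdint.h>

/* Hypothetical model of: move.b d0,d1 / ext.w d1 / cmp.w d1,d0 /
   beq.s .done / slt d0 / eori.b #$7f,d0 */
int8_t clamp_ext_trick(int16_t x)
{
    /* move.b d0,d1 / ext.w d1: keep the low byte and sign-extend it */
    uint8_t lo = (uint8_t)((uint16_t)x & 0xFFu);
    int16_t ext = (lo < 0x80) ? (int16_t)lo : (int16_t)(lo - 256);

    if (ext == x)                 /* beq.s .done: x already fits */
        return (int8_t)ext;

    /* slt d0: $ff if x < ext (x was below -128), else $00 (x above 127) */
    uint8_t mask = (x < ext) ? 0xFFu : 0x00u;
    /* eori.b #$7f,d0: $ff -> $80 (-128), $00 -> $7f (127) */
    uint8_t r = (uint8_t)(mask ^ 0x7Fu);
    return (r < 0x80) ? (int8_t)r : (int8_t)(r - 256);
}
```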
24 April 2021, 21:52 | #3 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Just use a pre-clamped 512-element LUT.
24 April 2021, 22:35 | #4 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Quote:
24 April 2021, 23:33 | #5 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,209
Quote:
That trick of sign-extending the byte and comparing it with the original word to see whether it's within range is cool; I would never have thought of that!
25 April 2021, 00:06 | #6 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
Yeah, I never think of things this way! I guess you have to watch out for d0 being clear in the top part of the word so the cmp.w doesn't get messed up.
25 April 2021, 00:13 | #7 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Isn't this faster?
Code:
clamp_lut_generator:
    lea     clamp_lut(pc),a0
    moveq   #-1,d0
    moveq   #-128,d1
.1  move.b  d1,(a0)+        ; 128 x -128
    subq.b  #1,d0
    bmi.b   .1              ; exits with d0 = 127
.2  move.b  d1,(a0)+        ; -128..127
    addq.b  #1,d1
    bvc.b   .2
.3  move.b  d0,(a0)+        ; 128 x 127
    addq.b  #1,d1
    bmi.b   .3
    lea     -256(a0),a0     ; a0 -> middle of the table
    ...
clamp:
    move.b  (a0,d0.w),d0    ; d0.w in -256..255
    ext.w   d0
    ...
clamp_lut:
    ds.b    512
Last edited by ross; 25 April 2021 at 00:29.
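The same table in C, as a sketch for checking the layout the generator produces (128 entries of -128, the identity range -128..127, 128 entries of +127) and the middle pointer set by lea -256(a0),a0. The names are hypothetical, and the lookup is only valid for inputs in -256..255.

```c
#include <assert.h>
#include <stdint.h>

static int8_t clamp_lut512[512];
static const int8_t *clamp_mid512;

void build_clamp_lut512(void)
{
    for (int i = 0; i < 512; i++) {
        int v = i - 256;              /* the word value this entry stands for */
        if (v < -128) v = -128;       /* entries 0..127: -128 */
        if (v > 127)  v = 127;        /* entries 384..511: 127 */
        clamp_lut512[i] = (int8_t)v;  /* entries 128..383: -128..127 */
    }
    clamp_mid512 = clamp_lut512 + 256;   /* like lea -256(a0),a0 */
}

int16_t clamp_lookup512(int16_t x)
{
    assert(x >= -256 && x <= 255);    /* the range the table covers */
    return clamp_mid512[x];           /* move.b (a0,d0.w),d0 / ext.w d0 */
}
```

build_clamp_lut512() must run once before any lookup, just as the generator runs before clamp: is used.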
25 April 2021, 00:29 | #8 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,209
Basically, I have up to 4 byte values (each can be anywhere in the full range -128 to +127) that I need to add together, and then clamp the result to -128..+127.
Ross, not sure if what you are suggesting will work for that? Actually, I could use a lookup (1024 bytes), as I know that the range after adding 4 bytes is at most -512 to +508.
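Checking the arithmetic: four samples in -128..127 sum to somewhere in 4 × -128 = -512 through 4 × 127 = 508, i.e. 1021 possible values, which is why a 1024-byte table indexed from its middle suffices. A plain C reference for the mix-and-clamp, using straightforward compares rather than a table (names are hypothetical):

```c
#include <stdint.h>

enum {
    MIX_MIN  = 4 * INT8_MIN,            /* -512 */
    MIX_MAX  = 4 * INT8_MAX,            /*  508 */
    MIX_SPAN = MIX_MAX - MIX_MIN + 1    /* 1021 distinct sums */
};

int8_t mix4_clamp(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int s = a + b + c + d;              /* always within MIX_MIN..MIX_MAX */
    if (s > INT8_MAX) s = INT8_MAX;
    if (s < INT8_MIN) s = INT8_MIN;
    return (int8_t)s;
}
```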
25 April 2021, 00:33 | #9 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Just enlarge the LUT to 1024 bytes.
25 April 2021, 00:33 | #10 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Quote:
    ext.w   D0
    move.b  clamp_lut(PC,D0.W),D0
    ext.w   d0              ;just in case
    rts                     ; or bra or jmp
clamp_lut:
    ds.b    512
25 April 2021, 00:39 | #11 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
A simple 14-cycle move.b (ax,d0.w) was the first thing I thought of, yeah. Not sure why you'd complicate it further (other than adding an ext.w if you need a word again). But I figured Dan had his reasons not to use a table.
And if you want to take it even further, and have a spare address register and can mess with memory as much as you want (12 cycles):
Code:
    ; d0 = pointer to a 64kb aligned table ;P, make sure you don't trash the top 16 bits
    move.l  d0,a0
    move.b  (a0),d0
    ; ext.w d0
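A C model of the 64kb table idea: one entry per possible 16-bit word, so the raw word selects its entry with no offset. The hypothetical `entry_address` shows how OR-ing the word into a 64kb aligned base yields the entry address, which is what move.l d0,a0 achieves when the base sits in the top 16 bits of d0. A plain array indexed by the raw 16-bit pattern stands in for the table; all names are hypothetical.

```c
#include <stdint.h>

static int8_t clamp_lut64k[65536];

void build_clamp_lut64k(void)
{
    for (uint32_t w = 0; w < 65536u; w++) {
        /* read the word as a two's-complement signed value */
        int32_t v = (w < 0x8000u) ? (int32_t)w : (int32_t)w - 65536;
        if (v > 127)  v = 127;
        if (v < -128) v = -128;
        clamp_lut64k[w] = (int8_t)v;
    }
}

/* Hypothetical: base in the upper half, word in the lower half. */
uint32_t entry_address(uint32_t base64k, int16_t x)
{
    return (base64k & 0xFFFF0000u) | (uint16_t)x;
}

int8_t clamp_lookup64k(int16_t x)
{
    return clamp_lut64k[(uint16_t)x];   /* move.b (a0),d0 */
}
```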
25 April 2021, 00:41 | #12 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Generator for the 1024 LUT:
Code:
clamp_lut_generator:
    lea     clamp_lut(pc),a0
    move.w  #512-128-1,d0
    moveq   #-128,d1
.1  move.b  d1,(a0)+        ; 384 x -128
    dbra    d0,.1
.2  move.b  d1,(a0)+        ; -128..127
    addq.b  #1,d1
    bvc.b   .2
    move.w  #512-128-1,d0
    moveq   #127,d1
.3  move.b  d1,(a0)+        ; 384 x 127
    dbra    d0,.3
    lea     -512(a0),a0     ; a0 -> middle of the table
EDIT: because there was a chain of quick edits by everyone, and it wasn't clear who was responding to whom and about what.
Last edited by ross; 25 April 2021 at 01:52.
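A C transcription of this generator, useful for checking the layout: 384 entries of -128, the 256 identity values, 384 entries of +127, and a returned pointer to entry 512 so the raw sum (-512..511) indexes it directly. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int8_t clamp_lut1024[1024];

/* Returns the middle pointer (entry 512), like lea -512(a0),a0. */
const int8_t *build_clamp_lut1024(void)
{
    int8_t *p = clamp_lut1024;
    for (int i = 0; i < 512 - 128; i++)   /* .1: 384 entries of -128 */
        *p++ = -128;
    for (int v = -128; v <= 127; v++)     /* .2: -128..127 */
        *p++ = (int8_t)v;
    for (int i = 0; i < 512 - 128; i++)   /* .3: 384 entries of 127 */
        *p++ = 127;
    assert(p == clamp_lut1024 + 1024);    /* filled exactly once */
    return p - 512;
}
```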
25 April 2021, 00:49 | #13 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,209
But if d0.w is (for example) -450, then it's not going to get a value from the LUT.
I guess the LUT pointer needs to point to the middle.
25 April 2021, 00:52 | #14 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
Haha, yeah. Maybe you missed the ";P" in the comment, right after "64kb table".
It's very unlikely that such an extreme is needed here, but hey, we don't know the full context, so maybe that kind of complication is warranted ;p.
Last edited by a/b; 25 April 2021 at 00:54. Reason: typo
25 April 2021, 00:58 | #15 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
    lea     -512(a0),a0
Optimized generator:
Code:
clamp_lut_generator:
    lea     clamp_lut(pc),a0
    lea     1024(a0),a1
    move.w  #512-128-1,d0
    moveq   #-128,d1
    moveq   #127,d2
.1  move.b  d1,(a0)+        ; -128 upwards from the start
    move.b  d2,-(a1)        ; 127 downwards from the end
    dbra    d0,.1
.2  move.b  d1,(a0)+        ; -128..-1 upwards
    move.b  d2,-(a1)        ; 127..0 downwards
    addq.b  #1,d1
    dbra    d2,.2           ; on exit a0 = a1 = middle of the table
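The optimized version fills from both ends at once and needs no final lea, because a0 stops exactly at the middle. A C transcription with hypothetical names, which asserts that the two pointers meet:

```c
#include <assert.h>
#include <stdint.h>

static int8_t clamp_lut1024b[1024];

const int8_t *build_clamp_lut1024_two_ended(void)
{
    int8_t *lo = clamp_lut1024b;          /* a0: filled upwards */
    int8_t *hi = clamp_lut1024b + 1024;   /* a1: filled downwards */
    int d1 = -128;

    for (int i = 0; i < 512 - 128; i++) { /* .1: dbra d0 runs 384 times */
        *lo++ = -128;
        *--hi = 127;
    }
    for (int d2 = 127; d2 >= 0; d2--) {   /* .2: dbra d2 runs 128 times */
        *lo++ = (int8_t)d1;               /* -128 .. -1 upwards */
        *--hi = (int8_t)d2;               /* 127 .. 0 downwards */
        d1++;
    }
    assert(lo == hi);                     /* the two fills meet at entry 512 */
    return lo;                            /* index with the raw sum */
}
```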
25 April 2021, 08:11 | #16 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
I thought about the LUT, but depending on the target CPU the gain could be small or even zero. However, if you're mixing audio samples (what else could be signed bytes?), you can fit in a volume boost/reduction for free, which matters because too much clipping results in a big loss of quality.
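A sketch of folding a volume change into the clamp table, as suggested here: each entry stores clamp(S × gain) instead of clamp(S), so the scaling costs nothing extra per sample. The parameters `gain_num`/`gain_den` and the function name are hypothetical; `gain_den` is assumed positive, and C integer division truncates toward zero, which is only one of several possible rounding choices.

```c
#include <stdint.h>

/* Fills a 1024-entry table for sums -512..511 with the gain folded in
   and returns its middle, so it can be indexed with the raw sum. */
const int8_t *build_gain_lut(int8_t table[1024], int gain_num, int gain_den)
{
    for (int i = 0; i < 1024; i++) {
        int s = i - 512;                   /* the sum this entry stands for */
        int v = s * gain_num / gain_den;   /* truncates toward zero */
        if (v > 127)  v = 127;
        if (v < -128) v = -128;
        table[i] = (int8_t)v;
    }
    return table + 512;
}
```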
Not for 68000, but:
Code:
    moveq   #24,d2          ; out of loop
    move.w  d0,d1
    ext.l   d1              ; whole value, sign-extended to 32 bits
    asl.l   d2,d1           ; V set if bits 31..7 differ, i.e. out of byte range
    bvc.s   .done
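The overflow test relies on how the 68k sets V for ASL: V is set if the most significant bit changes at any time during the shift. For a shift by 24 on a long, that means bits 31..7 must all be equal, which is exactly the condition for the value to fit in a signed byte, provided the register holds the value sign-extended to 32 bits (an assumption). A small C model of that flag, with a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Models V after "asl.l #24,dn": true when bits 31..7 are not all equal. */
bool asl24_overflows(int32_t v)
{
    uint32_t top = (uint32_t)v >> 7;           /* bits 31..7: 25 bits */
    return !(top == 0u || top == 0x1FFFFFFu);  /* all 0s or all 1s: fits */
}
```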
25 April 2021, 08:35 | #17 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
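Another possibility (but untested): turn signed into unsigned, as unsigned byte clamping is easier.
Code:
    move.w  d2,d1           ; with d2=$80
    add.w   d0,d1           ; bias: in range <=> d1 in $00..$ff
    cmp.w   a0,d1           ; with a0=$100
    blo.s   .done           ; unsigned below $100: already in range
    sge     d0              ; $ff if too high, $00 if too low
    ext.w   d0              ; -1 or 0
    eori.w  #$ff80,d0       ; -1 -> +127, 0 -> -128
.done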
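A C model of the bias trick, useful for testing it before running it on the Amiga. Note that sge followed by ext.w leaves -1 (too high) or 0 (too low) in d0, so mapping those to +127 and -128 takes an XOR with $FF80 (-128); subtracting $80 would give -129 in the too-high case. The sketch assumes x + $80 does not wrap in 16 bits, which holds for the -512..508 sums in this thread; `clamp_bias` is a hypothetical name.

```c
#include <stdint.h>

int16_t clamp_bias(int16_t x)
{
    int d1 = x + 0x80;               /* move.w d2,d1 / add.w d0,d1 */
    if ((unsigned)d1 < 0x100u)       /* cmp.w a0,d1 / blo.s .done */
        return x;                    /* -128..127: unchanged */
    int m = (d1 >= 0x100) ? -1 : 0;  /* sge d0 / ext.w d0 */
    return (int16_t)(m ^ -128);      /* eori.w #$ff80,d0: -1 -> 127, 0 -> -128 */
}
```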
25 April 2021, 09:32 | #18 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
The LUT is by far the fastest and simplest, especially if you mix and clamp a whole buffer and can keep a0 constant. EDIT: and if you don't need to process the sample further, store it directly to chip RAM:
    move.b  (a0,d0.w),(a1)+
Last edited by ross; 25 April 2021 at 10:04.
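For the whole-buffer case, a C sketch of the loop shape described here: the table pointer stays fixed, and each clamped sum is stored straight to the output. The names and the four-buffer signature are hypothetical; on the Amiga, `out` would be the chip RAM buffer that a1 walks through.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills a 1024-entry clamp table and returns its middle (entry 512). */
const int8_t *init_clamp1024(int8_t table[1024])
{
    for (int i = 0; i < 1024; i++) {
        int v = i - 512;
        table[i] = (int8_t)(v > 127 ? 127 : v < -128 ? -128 : v);
    }
    return table + 512;
}

/* Mixes four buffers of n signed bytes into out; clamp_mid stays
   constant for the whole loop, like a0 in the 68k version. */
void mix4_buffer(int8_t *out, const int8_t *a, const int8_t *b,
                 const int8_t *c, const int8_t *d, size_t n,
                 const int8_t *clamp_mid)
{
    for (size_t i = 0; i < n; i++) {
        int sum = a[i] + b[i] + c[i] + d[i];   /* always -512..508 */
        *out++ = clamp_mid[sum];               /* move.b (a0,d0.w),(a1)+ */
    }
}
```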
25 April 2021, 11:19 | #19 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
I often forget about 68000 timings. Too much 020+ code, probably.
Quote:
The fastest and simplest way would then be:
Code:
    lsr.w   #2,d0
Quote:
It would be interesting to know more about what the whole program does. Micro-optimizations are nice, but optimizing with a broader scope gives better results.
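Assuming the lsr.w #2 is applied to the word sum of the four samples (the thread does not spell this out), dividing by 4 maps -512..508 onto -128..127, so no clamp is needed at all. Because only the low byte is kept, the logical shift gives the same byte as an arithmetic asr.w #2 would: both leave bits 2..9 of the sum in the low byte. A C sketch with a hypothetical name:

```c
#include <stdint.h>

int8_t mix4_div4(int8_t a, int8_t b, int8_t c, int8_t d)
{
    uint16_t w = (uint16_t)(a + b + c + d);      /* the 16-bit word in d0 */
    uint8_t lo = (uint8_t)((w >> 2) & 0xFFu);    /* lsr.w #2,d0, then use d0.b */
    return (lo < 0x80) ? (int8_t)lo : (int8_t)(lo - 256);
}
```

Note that the shift rounds toward minus infinity, so a sum of -5 becomes -2.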
25 April 2021, 11:36 | #20 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,209
Quote:
You're right, it's sample mixing (up to 4 samples mixed down to one), but it's a conversion of an existing sample mixing system written in C, and the results need to be consistent with the output of the original. The original does indeed do a >>1 on each input sample to soften the impact of clipping.
EDIT... strictly speaking, that's not true... the original shifts the bytes up to word range, but by <<7 rather than <<8, before adding and clamping to the full word range. But doing it as bytes will be more efficient on a vanilla 68k, and a small lookup table to clamp would certainly speed things up.
Last edited by DanScott; 25 April 2021 at 11:42.
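One way to keep a byte-only version consistent with the C original, sketched under assumptions: if the original scales each byte by 128 (<<7), sums, and clamps to the 16-bit range, and the final 8-bit output is the high byte of that word (an assumption, not stated in the thread), then the result depends only on the byte sum S. Since 128 × S fits in 16 bits exactly when -256 ≤ S ≤ 255, the output byte is floor(clamp(S, -256, 255) / 2), a value that could be baked into the 1024-entry table. All function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the original mixer as described: scale each
   byte by 128 (<< 7), add, clamp to the signed 16-bit range. */
int16_t mix4_word_original(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int32_t s = 128 * (a + b + c + d);   /* multiply instead of shifting negatives */
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* High byte of a word as a signed byte (the assumed 8-bit output). */
int8_t high_byte(int16_t w)
{
    int hb = (uint16_t)w >> 8;           /* 0..255 */
    return (int8_t)(hb > 127 ? hb - 256 : hb);
}

/* Byte-only equivalent: floor(clamp(S, -256, 255) / 2). */
int8_t mix4_byte_equivalent(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int s = a + b + c + d;
    if (s > 255)  s = 255;
    if (s < -256) s = -256;
    return (int8_t)(s >= 0 ? s / 2 : -((1 - s) / 2));   /* floor(s / 2) */
}
```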