Clamping a word to signed byte range
After some great contributions to my "clamped word result of an add" question, does anyone know a good/quicker way to clamp a word to a signed byte range?
i.e. quicker than this: Code:
cmp.w #127,d0 |
That one's more difficult, as there's no easy way to check whether it's in range or not.
Quickest I could come up with: Code:
move.b d0,d1
Also, the higher byte of the word is incorrect; you get a byte, not a clamped word. |
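For anyone who wants to check an assembly version against a reference, here is a minimal C sketch of the idea as I read it: sign-extend the low byte of the word and compare it with the whole word; if they match, the value was already in byte range. The function names are made up for illustration, and unlike the byte-only result mentioned above, this model returns a full clamped word.

```c
#include <stdint.h>

/* Hypothetical helper: models move.b + ext.w, i.e. take the low byte of
   the word and sign-extend it back to 16 bits. */
static int16_t sign_extend_low_byte(int16_t w)
{
    int low = (int)((uint16_t)w & 0xFFu);   /* low byte as 0..255 */
    return (int16_t)((low ^ 0x80) - 0x80);  /* now -128..127 */
}

/* Clamp a signed word to -128..127. If sign-extending the low byte gives
   back the original word, it was already in range. */
int16_t clamp_word_to_s8(int16_t w)
{
    if (sign_extend_low_byte(w) == w)
        return w;
    return (w < 0) ? -128 : 127;
}
```

Only the out-of-range case needs the extra sign test, so the common in-range path is a single compare.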
Just use a pre-clamped 512 element LUT ;)
|
Quote:
That trick of sign-extending the byte and comparing it against the word to see if it's within range is cool, I would never have thought of that! :bowdown |
Yeah, I never think of things this way! I guess I have to watch out for the top part of the word in d0 being clear so the cmp.w doesn't get messed up.
|
Isn't this faster?
Code:
clamp_lut_generator: |
Basically, I have up to 4 byte values (each can be in the full range -128 to +127) that I need to add together, and then clamp the result to -128 to +127.
Ross, I'm not sure what you are suggesting will work for that? Actually, I could use a lookup (1024 bytes), as I know that the maximum range after adding 4 bytes is -512 to +508 |
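The range works out as 4 × -128 = -512 and 4 × 127 = +508, so 1024 entries cover every possible sum. A rough C sketch of that lookup, assuming the sum is biased by +512 to index the table from its start (the names `clamp_lut`, `build_clamp_lut` and `mix4_clamped` are invented for this example):

```c
#include <stdint.h>

enum { LUT_BIAS = 512, LUT_SIZE = 1024 };

static int8_t clamp_lut[LUT_SIZE];

/* Fill the table once: entry i holds clamp(i - 512) to -128..127. */
void build_clamp_lut(void)
{
    for (int i = 0; i < LUT_SIZE; i++) {
        int v = i - LUT_BIAS;
        if (v > 127)  v = 127;
        if (v < -128) v = -128;
        clamp_lut[i] = (int8_t)v;
    }
}

/* Add up to four signed bytes (pass 0 for unused ones) and clamp by lookup.
   The sum is always within -512..+508, so the index stays inside the table. */
int8_t mix4_clamped(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int sum = a + b + c + d;
    return clamp_lut[sum + LUT_BIAS];
}
```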
Quote:
ext.w D0
move.b clamp_lut(PC,D0.W),D0
ext.w d0 ;just in case
rts ; or bra or jmp
clamp_lut: ds.b 512 |
A simple 14-cycle move.b (ax,d0.w) was the first thing I thought about, yeah. Not sure why you'd complicate it further (other than adding an ext.w if you need a word again). But I thought Dan had his reasons not to use a table.
And if you want to take it even further, have a spare address register and can mess with memory as much as you want (12 cycles): Code:
; d0 = pointer to a 64kb aligned table ;P, make sure you don't trash top 16 bits |
Generator for the 1024 LUT:
Code:
clamp_lut_generator:
EDIT: because there was a chain of 'fast' edits by everyone and it wasn't clear who was responding to whom and for what :D |
But if d0.w is (for example) -450, then it's not going to get a value from the LUT.
I guess the LUT pointer needs to point to the middle :) |
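To illustrate the "point to the middle" idea: if the pointer sits 512 bytes into the table, a signed sum such as -450 can index it directly with no bias added per lookup. A small C sketch under that assumption (all names here are hypothetical):

```c
#include <stdint.h>

static int8_t clamp_storage[1024];

/* Points at the entry for a sum of 0, so negative sums index backwards
   into the first half of the table. */
static const int8_t *clamp_mid = clamp_storage + 512;

void init_clamp_mid(void)
{
    for (int v = -512; v < 512; v++) {
        int c = v;
        if (c > 127)  c = 127;
        if (c < -128) c = -128;
        clamp_storage[v + 512] = (int8_t)c;
    }
}

/* w must be within -512..511, which always holds for a sum of 4 signed bytes. */
int8_t clamp_via_mid(int16_t w)
{
    return clamp_mid[w];
}
```

This is the C equivalent of keeping the address register pointed at the centre of the table, so the signed index register is used as-is.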
Haha, yeah. Maybe you missed a ";P" in the comment right after 64kb table.
It's very unlikely that such an extreme is needed here, but hey we don't know the full context so maybe that kind of complication is warranted here ;p. |
Quote:
lea -512(a0),a0
Optimized generator: Code:
clamp_lut_generator: |
I thought about the LUT, but depending on the target CPU the gain could be small or even zero. However, if you're mixing audio samples (what else could be signed bytes?) you can fit a volume boost/reduction into the table for free, which is worth having, as too much clipping will result in a big quality loss.
Not for 68000, but : Code:
moveq #24,d2 ; out of loop |
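On the volume boost/reduction point above: since the table is generated up front, a gain can be folded into each entry and the lookup cost stays the same. A hedged C sketch, where the gain is an assumed rational factor `num/den` and the rounding (truncation toward zero) is my choice, not something stated in the thread:

```c
#include <stdint.h>

static int8_t gain_lut[1024];

/* Build a clamp table with a volume factor of num/den baked in.
   Entry for sum v holds clamp(v * num / den) to -128..127. */
void build_gain_lut(int num, int den)
{
    for (int v = -512; v < 512; v++) {
        int s = v * num / den;      /* C integer division truncates toward zero */
        if (s > 127)  s = 127;
        if (s < -128) s = -128;
        gain_lut[v + 512] = (int8_t)s;
    }
}

/* sum must be within -512..511 (always true for 4 signed bytes added). */
int8_t mix_lookup(int sum)
{
    return gain_lut[sum + 512];
}
```

With a gain of 1/2, for example, a sum of 200 comes out as 100 instead of being clipped to 127.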
Another possibility (but untested) : turn signed into unsigned, as unsigned byte clamping is easier.
Code:
move.w d2,d1 ; with d2=$80 |
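My reading of the signed-to-unsigned idea, as a C sketch rather than a transcription of the code above: bias the word by $80 so the valid range becomes 0..255, after which one unsigned compare catches both overflow directions, then remove the bias again. The function name is invented and the approach is as untested on real hardware as the original suggestion.

```c
#include <stdint.h>

/* Clamp a signed word to -128..127 by working in the biased range 0..255. */
int8_t clamp_via_unsigned(int16_t w)
{
    int u = w + 0x80;               /* in-range values map to 0..255 */
    if ((unsigned)u > 255u)         /* negative u wraps to a huge unsigned value */
        u = (u < 0) ? 0 : 255;
    return (int8_t)(u - 0x80);      /* remove the bias again */
}
```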
Quote:
The LUT is by far the fastest and simplest, especially if you mix and clamp a whole buffer and can keep a0 constant. EDIT: and if you don't need to process the sample any further, store it directly to chip RAM: move.b (a0,d0.w),(a1)+ |
Quote:
Quote:
The fastest and simplest way would then be: Code:
lsr.w #2,d0
Quote:
It might be interesting to have better knowledge of what the whole program does. Micro-optimizations are nice, but optimizing gives better results with a broader scope. |
Quote:
You're right, it's sample mixing (up to 4 samples mixed down to one), but it's a conversion of an existing sample mixing system written in C, and results need to be consistent with the output of the original. The original does indeed do a >>1 on each input sample to soften the impact of clipping. EDIT: strictly speaking, that's not true. The original shifts bytes up to word range, but by <<7 rather than <<8, before adding and clamping to full word range. But doing it as bytes will be more efficient on vanilla 68k, and a small lookup to clamp would certainly speed things up |
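For comparing outputs, here is a rough C model of the word-range path as described above. It is not the original source: the function name is made up, unused inputs are assumed to be passed as 0, and how the original turns the word back into a byte isn't stated, so only the clamped word is produced.

```c
#include <stdint.h>

/* Scale each signed byte by 128 (the <<7 step), add, and clamp to the
   full signed word range. Multiplication is used instead of << so that
   negative samples are handled without shifting a negative value. */
int16_t mix4_to_word(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int32_t sum = (int32_t)a * 128 + (int32_t)b * 128
                + (int32_t)c * 128 + (int32_t)d * 128;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}
```

Two full-scale samples (127 + 127) give 32512 and still fit, which is the headroom the <<7 scaling buys compared with <<8.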