PDA

View Full Version : Help coding


alexh
08 August 2006, 17:47
Is there a simple way to Mirror data?

My input data sets are very confined, they are just "one-hot". Meaning that only 1-bit can be set at once.

0001 ($1) = 1000 ($8)
0010 ($2) = 0100 ($4)
0100 ($4) = 0010 ($2)
1000 ($8) = 0001 ($1)

I actually need to Mirror then invert

0001 ($1) = 0111 ($7)
0010 ($2) = 1011 ($B)
0100 ($4) = 1101 ($D)
1000 ($8) = 1110 ($E)

If there isnt a simple arithmetical technique I'll just implement a switch statement in ASM.

I have the usual MOVE, NOT, AND, XOR, OR, SL, SR opcodes etc.

Npl
08 August 2006, 23:03
Im pretty sure they aint an easy way. I`d use a 256-entry table with inverted bytes.

Codetapper
08 August 2006, 23:17
I'm not sure what you mean by one hot, do you mean they are always one bit set? If that is the case, you can get around the table problem, as you know which bit is set so can mirror it by setting the mirrored bit straight away then doing the not operation on it.

But if that isn't the case, a 256 byte table that you generate is probably the best way to go, and you can fill in the mirrored not'd version at the same time meaning a byte lookup will be very quick.

girv
09 August 2006, 02:45
I remember asking this question years ago on csa.games I think. I wanted to reverse the order of bits in a byte or word. The best anyone came up with was a lookup table, just like the others have said already.

Also, avoid the switch statement approach if performance is important.

alexh
09 August 2006, 15:03
I'm not sure what you mean by one hot, do you mean they are always one bit set?

Oops. Sorry for not explaining. One-Hot (in this case) means that only one bit can be set at once. Think of each bit as being a mutually exclusive flag.

If that is the case, you can get around the table problem, as you know which bit is set so can mirror it by setting the mirrored bit straight away then doing the not operation on it.
The data is an input, an interrupt status register. You dont know which of the four bits is set. It could be any of them.

But if that isn't the case, a 256 byte table that you generate is probably the best way to go
Because it is "one-hot" it's effectively just a 4-entry lookup table. But I dont have any memory (other than instruction memory) so I cannot implement a true lookup table.

But I was hoping there was a simple arithmetic operation so that I wouldnt have to do any SUBI+BCC instructions.

girv
09 August 2006, 15:54
The data is an input, an interrupt status register. You dont know which of the four bits is set. It could be any of them.

Because it is "one-hot" it's effectively just a 4-entry lookup table. But I dont have any memory (other than instruction memory) so I cannot implement a true lookup table.

But I was hoping there was a simple arithmetic operation so that I wouldnt have to do any SUBI+BCC instructions.
In your case you'd need only a 9 byte lookup table, using the 4 bits of input information as an index into a byte table holding the output value you need. Can't you spare 9 bytes? It would probably take more in instruction bytes to code an algorithm to do the same job!

If you're on 020-040 you could look at the BFFFO (Bit Field Find First One) instruction. IIRC it exists on 060 as well but is emulated.

I don't know the exact syntax of BFFFO but your code would be something like this.

; > d0.l = input status value

bfffo d0{0:31},d1 ; d1 = bit position of first 1 bit
neg.b d1 ; get d1 = 3-d1
addq.b #3,d1
moveq #0,d0 ; set only bit d1 in d0
bset d1,d0
not.b d0 ; invert d0 to get final value
I now sit back and wait for the Obvious Mistake Police to pounce.

Galahad/FLT
09 August 2006, 16:00
What machine is this for where memory is so critical?

alexh
09 August 2006, 16:36
It's a PicoController inside a hardware RAID5 controller that I designed (using the 68k instructions as a template)

It has no RAM at all. Just 16*48-bit registers and a 512 entry instruction RAM.

One of the team want to do something it wasnt supposed to do and is looking for a workaround using the minimum number of instructions and the minimum number of cycles.

Had I known they would ever have wanted to do anything like this I would have added it to the load instruction.

It would have added little more than 4 x 2:1 mux's and 4 wires (i.e. nothing!) :(

Photon
09 August 2006, 18:29
For this example, you can do it without bit mirroring, by looking at the values:

Flip: ;d0=4-bit input
addq.b #6,d0
cmp.b #7,d0
beq.s .done
cmp.b #14,d0
beq.s .done
addq.b #3,d0
.done: rts

alexh
09 August 2006, 19:05
Top man.

Npl
09 August 2006, 21:22
You could use x = 0x8 / x. but thats likely not supported or slow

If a variable shift is availabe, you could use a special bitmask v = 111100010000 and shift that to the right v=v>>x, then add x>>2 to the result x = x>>2+v. The low 4 bits contain the mirrored pattern.

Im not fluent at m68k assembly, I added some C-style remarks to make it clear

Flip: ;d0=4-bit input
move #F10, d1 ;d1 = 0xF10
lsr.w d0,d1 ; d1 = (d1 >> d0)
lsr.b #2, d0 ; d0 = (d0>>2)
add d1,d0 ; d0 = d0 + d1
andi.b #7, d0 ; d0 = d0&0x7
rts
Photons code of course will do the same, but I consider avoiding branches an art ;)
Dunno which one would be faster/smaller on m68k

Psygore
09 August 2006, 21:31
Here a 4bit mirroring:
input : d0
output : d1

roxr.b #1,d0
addx.b d1,d1
roxr.b #1,d0
addx.b d1,d1
roxr.b #1,d0
addx.b d1,d1
roxr.b #1,d0
addx.b d1,d1
not.b d1 ;invert value
and.b #%1111,d1

AGN
11 August 2006, 00:37
Optimized Photon version:


d0 - input
d1 - output

add.b #6,d0
move.l d0,d1
and.b #4,d0
bne done
add.b #3,d1
done:
only 4 and a half instruction - did I fit in cycles?

ps.
send more puzzles like this one ;)

Photon
14 August 2006, 01:40
Great! Or even:


addq.b #6,d0
btst #2,d0
bne.s .done
addq.b #3,d0
.done:

alexh
15 August 2006, 09:24
@AGN & Photon
Cool. I can do that.

I dont have the instruction BTST in my hardware but unlike the 68k I do have 3 operands in my instructions (src1, src2, dest)

Have to see which is faster. Avoiding branches is good in 99% of situations, but using delay slots can mean zero penalty in some situations. (This isnt a situation where you can use a delay slot on it's own but maybe the other code fits around it)