05 November 2021, 16:13 | #1 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
index of lowest set bit
Time for a small code contest, even though i really need to do that in my code.
If you didn't get it with the title, it's about getting the bit number of the lowest (= rightmost) bit that's equal to 1. Having a value with only that bit set is easier (when you know how) but not enough as it then serves as index. F.e. if input value is $12345678, it must return 3. For $80000000 it returns 31. For 0, don't care - unimportant for my use. Could even loop forever I have my version, but curious to see what others can come up with. 68020+ code allowed (else it's a nightmare !). |
05 November 2021, 17:12 | #2 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,043
|
Code:
move.b (a0,d0.w),d1 bne.b done swap d0 move.b (a0,d0.w),d1 done: |
05 November 2021, 17:20 | #3 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
Not only a joke, but actually it doesn't work. You're using same table for high and low words.
|
05 November 2021, 17:37 | #4 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,043
|
Well, I was trying to supress my megalomania and settle with a single table ;p.
OK, now a bit more serious... How about: Code:
move.l d0,d1 subq.l #1,d1 eor.l d1,d0 bfffo d0{0:32},d0 eor.w #31,d0 edit: EOR is my bane... |
05 November 2021, 17:51 | #5 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
|
05 November 2021, 18:34 | #6 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,479
|
|
05 November 2021, 18:54 | #7 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
Now a small, interesting problem : i need to clear said bit to be able to loop on them. (Didn't think about that when posting...)
Like this : Code:
; d2=input value, d1=tmp, d0=result loop move.w d2,d1 subq.w #1,d1 not.w d1 and.w d2,d1 bfffo d1{0:32},d0 eori.b #31,d0 (do something here with d0=bit#) eor.w d1,d2 bne loop |
05 November 2021, 20:18 | #8 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,627
|
If it's a contest and not a question, maybe specify size or cycles, and which platform? It will affect reply relevance and usefulness to you.
Simple pseudocode version is Code:
shift right bcs x dbf Dn x: move.w Dn,... From your last comment though, I would rather just loop from 0-31 and use bclr and act on the cc. These are 3 major ways. Last edited by Photon; 05 November 2021 at 20:26. |
05 November 2021, 20:30 | #9 | ||
Registered User
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 189
|
Quote:
my 68000 compatible version, "move.l D2,D1" instructions can be removed if D2 does not have to be preserved: Code:
; d2=input value, d1=tmp, d0=result moveq #-1,D0 move.l D2,D1 Loop addq.l #1,D0 lsr.l #1,D1 bcc Loop Quote:
what about "bclr.l D0,D2"? Code:
; d2=input value, d1=tmp, d0=result moveq #-1,D0 move.l D2,D1 Loop addq.l #1,D0 lsr.l #1,D1 bcc Loop bclr.l D0,D2 |
||
05 November 2021, 20:33 | #10 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
Quote:
Size ? Not too much. A few extra code words are ok, but huge table is overkill. Cycles ? As little as possible, considering size must not explode. Which platform ? I said 020+ was ok. Must be as agnostic as possible within this range. No, too many loops for values having few ones will make it even more damned slow. |
|
05 November 2021, 20:36 | #11 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
|
05 November 2021, 21:03 | #12 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,043
|
If you are looping through bits it's better to use dbcs:
Code:
moveq #32-1,d1 loop: lsr.l #1,d0 dbcs d1,loop eor.w #31,d1 |
05 November 2021, 22:11 | #13 | |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,043
|
Quote:
I optimized those two+and with sub+eor before you mentioned you need to clear the bit. |
|
05 November 2021, 22:14 | #14 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,627
|
You are performing a bitwise (non-atomic) operation, like a div. This makes it serial in nature, and this in itself requires a serial solution or a successive approximation (tree) solution. In some CPUs, there's dedicated silicon for the bitwise decision tree instead of branches.
In my 3 major solutions, the 5-word table corresponds to 5 comparisons in that tree, which means you can get a result factorially quicker than linear. This is the quickest canonical way, as in the fewest decisions barring a dedicated instruction. BFFFO scans for the highest bit, which means you must reverse bits, which is a similar desired but not implemented instruction in most CPUs. All this to say, every desire is not always implemented, and 680x0 is already a wishlist family of CISC (and later more formalized RISC) CPU. Much can be done with 256-byte tables, if 64K tables are too big. You can reverse bits, find the last set, fill, invert, and find contours. Lowest set bit is a contour. On different platforms, memory accesses will be more or less punishing, especially off-cache. If CPU is not specified then, there's no way you will get a single correct answer. It may very well be that the long loop in-cache with no memory access will be much faster than the shortest or fastest code with memory accesses. For 68000 and maybe also 68EC020/68EC030, I would LUT a byte at a time backward if nonzero: Code:
LowestBit: move.l d0,(a0)+ ;in: d0.l, (a0) 4 bytes workspace moveq #0,d0 REPT 4 move.b -(a0),d1 bne.s .lu addq.l #8,d0 ENDR RTS .lu: add.b (a2,d1.w),d0 ;a2=LUT .done: RTS ;out: d0.l=0-32 (32=no bit set) Last edited by Photon; 05 November 2021 at 22:32. |
05 November 2021, 22:56 | #15 |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
What about this. I just checked few cases,
Code:
moveq #32-1,d1 loop add.l d0,d0 dbeq d1,loop addq.b #1,d1 |
05 November 2021, 23:10 | #16 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,479
|
Quote:
Code:
; d2=input value, d1=tmp, d0=result loop move.l d2,d1 neg.l d1 and.l d2,d1 bfffo d1{0:32},d0 eori.b #31,d0 ; (do something here with d0=bit#) eor.l d1,d2 bne.b loop |
|
06 November 2021, 02:53 | #17 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,627
|
On 68000-68010, bit field operations are not available. BFFFO is also unusable because it gives the highest bit and OP wants lowest bit.
Last edited by Photon; 06 November 2021 at 03:04. |
06 November 2021, 09:11 | #18 | ||||||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
Quote:
As an example, if value $f0000000 is given, you'll execute your loop 32 times vs 4 in my case, and nothing can be done to cure that. Quote:
Quote:
The (n and n-1) trick can work, but not always. Quote:
Quote:
Quote:
Also in this code 68000 is totally out of question. Quote:
Quote:
Quote:
Quote:
Code:
loop bfffo d2{0:32},d0 eori.b #31,d0 (do something here) bclr d0,d2 tst.w d2 bne loop endm |
||||||||||
06 November 2021, 09:35 | #19 |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
another approach (perhaps useless), better for 68000 (I think)
Code:
;d0 - in, d1 - out lowest: moveq #1,d2 move.w d0,d0 bne.b .go swap d0 moveq #16,d2 .go moveq #16-1,d1 .loop add.w d0,d0 dbeq d1,.loop add.b d2,d1 rts |
06 November 2021, 10:21 | #20 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
Right, but in this example it's clearly not about 68000.
I have other examples of similar code : Code:
; pack the bits of d2 into d7, with mask=d1 moveq #0,d7 .loop bfffo d2{0:32},d4 beq.s .fin addq.l #1,d4 lsl.l d4,d2 lsl.l d4,d1 addx.l d7,d7 bra.s .loop .fin ; unpacks the bits of d2 into d7, with mask=d1 moveq #0,d7 .loop move.l d2,d4 beq.s .fin subq.l #1,d2 and.l d4,d2 ; n and (n-1) -> clr lower bit lsr.l #1,d1 bcc.s .loop eor.l d2,d4 add.l d4,d7 bra.s .loop .fin Or smaller, without making them slower. |
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
Thread Tools | |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Sprites on lowest area of PAL Screen | Tigerskunk | Coders. Asm / Hardware | 6 | 02 December 2017 20:34 |
Is WinUAE optimized for lowest possible input lag? | Dr.Venom | support.WinUAE | 22 | 03 July 2011 02:14 |
Trade - Blizzard 1230IV + SCSI + Cash for Lowest Spec PPC | fitzsteve | Swapshop | 4 | 10 December 2010 14:25 |
Lowest Spec for WinUAE ?? | Methanoid | support.WinUAE | 29 | 20 March 2008 17:23 |
pure bit not set? | amigalizard | support.Apps | 4 | 04 September 2005 05:47 |
|
|