06 November 2021, 10:41 | #21 | |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Quote:
Not that I would use this iterative approach, but since several people mentioned it and either used a dbf or a non-dbf loop, I just pointed out that dbcs is better as it does all of that for the same price. Similarly, Asman's iterative approach to push all the 1s out until you end up with a zero reg is using the same db(EQ) idea to break out of the loop as early as possible. I missed that too. There were 3 instruction in my first "serious" code, sub+not+and, and I was so focused on eor, usually the hardest one for me to figure, that I missed the easy sub+not to neg and edited with not+and changed to eor. The end result being pretty much the same, only a different register is trashed and it happens that you needed the other one. Last edited by a/b; 06 November 2021 at 10:46. |
|
06 November 2021, 11:31 | #22 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Quote:
Now your dbcs does not update the counter when it exits, requiring an extra instruction to fix this before looping back. |
|
06 November 2021, 12:13 | #23 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
My first thought was to generate a mask (for a and) to exclude the upper bits and the simplest way is with the representation of negative numbers in binary (the two's complement), where all leftmost bit are reversed till the last one 1 at the rightmost position. So the neg came naturally, I didn't even think about the sub+not for the mask (I skipped the 3 instructions stage) But I was late, you had already published your alternative equivalent solution. So I only posted my approach as a corollary and it happened was perfect for meynaf's request. Interesting how things randomly evolve and fit together. PS: and the not+and->eor is nice, I've used it in a tricky routine not so much ago |
|
06 November 2021, 12:40 | #24 | |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Quote:
EDIT: Bigger picture... You most likely went through this but we don't know what else is behind. Why exactly do you have to go from lowest to highest 1? Would it be possible to reverse the algorithm or maybe bit order in input data, and simply use bfffo+bclr+tst (or alternative exit instead of tst if your unspecified code happen to allow it)? Last edited by a/b; 06 November 2021 at 13:02. |
|
06 November 2021, 15:35 | #25 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Quote:
|
|
06 November 2021, 16:32 | #26 | |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
I think you are having trouble reading the execution time in these examples? 4 branches LUT solution is somehow not OK because it uses 256b of memory, but 32 iteration branches are OK because the code is 20b smaller?
Typically, more memory comes with faster CPU. My example can be adapted to a 64K bytes LUT easily, so now we're down to 2 branches, twice as fast and only two memory word reads. Quote:
If you want to compare them, you can take my 4-branches example and replace each byte lookup with an 8-bit loop, making sure to keep the same early-exit structure. |
|
06 November 2021, 18:40 | #27 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
But there is an instruction to find the highest bit set, so we can just isolate the lowest one with a simple trick (so we have a value with only one bit set) and use that instruction. No loop to find the lowest bit then.
|
06 November 2021, 20:52 | #28 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
This code does that trick for your example input value in OP.
Code:
move.l d0,d1 subq.l #1,d1 and.l d0,d1 eor.l d1,d0 Code:
move.l d0,d1 subq.l #1,d1 eor.l d1,d0 Last edited by Photon; 06 November 2021 at 21:48. |
07 November 2021, 08:53 | #29 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
For this Ross' code is fine.
Code:
move.l d2,d1 neg.l d1 and.l d2,d1 |
08 November 2021, 17:58 | #30 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
So you have two answers. Plus all the alternatives.
Your competition/quiz has received many answers. Maybe contribute your answer now? Going on might be perceived as trolling by some. Last edited by Photon; 08 November 2021 at 18:07. |
08 November 2021, 19:12 | #31 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
What do you mean by contribute my answer ?
If you mean showing my own version, it's there since post #7. |
08 November 2021, 19:42 | #32 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
Oh. So you had no answer. It seemed you found something wrong with all the different suggestions. Well, now you have multiple smallest + fastest answers, so why not incorporate one into your code and thank the author if it helps you?
Normally there's a need for asking, or it's "for" something, and when either has been satisfied the thread can end. |
09 November 2021, 10:51 | #33 | |||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
@Photon: you've had a bad night or what ?
It's in post #7, i've mentioned this in previous post. So yes i had an answer. That said, i wasn't required to have one. Quote:
Quote:
So, already done. Quote:
So yes the thread can end, and should have at post #25. Not my fault it didn't. Now this post of yours (#32) could be perceived as trolling. I dare to hope it's just a misunderstanding. |
|||
12 November 2021, 23:26 | #34 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
Post #6 is when Ross gave the same answer as I. Either will roll through the bits and supply the decision tree wishlisted into silicon as mentioned in my CSci comments directly related to the problem to be solved.
Post #7 is when you changed topic and "wanted more than you got", alluding to a "simple trick" not shared before in the thread. You have many simple tricks to pick and choose from now. My last piece of code will allow you to clear the bit. |
13 November 2021, 12:08 | #35 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Why do you insist ? Problem is solved, end of story.
|
15 December 2021, 15:36 | #36 |
Registered User
Join Date: Dec 2007
Location: Aarhus / Denmark
Posts: 40
|
If it's for 060, this should be faster than using BFFFO:
Code:
move.l d0,d1 subq.l #1,d1 moveq.l #27,d2 eor.l d0,d1 mulu.l #$78345765,d1 lsr.l d2,d1 move.b table(pc,d1.l),d1 table: dc.b 30,3,12,25,28,10,16,22,18,2,24,9,21,1,20,0 dc.b 31,4,5,13,6,26,14,7,29,11,27,15,17,23,8,19 |
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
Thread Tools | |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Sprites on lowest area of PAL Screen | Tigerskunk | Coders. Asm / Hardware | 6 | 02 December 2017 20:34 |
Is WinUAE optimized for lowest possible input lag? | Dr.Venom | support.WinUAE | 22 | 03 July 2011 02:14 |
Trade - Blizzard 1230IV + SCSI + Cash for Lowest Spec PPC | fitzsteve | Swapshop | 4 | 10 December 2010 14:25 |
Lowest Spec for WinUAE ?? | Methanoid | support.WinUAE | 29 | 20 March 2008 17:23 |
pure bit not set? | amigalizard | support.Apps | 4 | 04 September 2005 05:47 |
|
|