English Amiga Board


Go Back   English Amiga Board > Coders > Coders. Asm / Hardware

 
 
Thread Tools
Old 06 November 2021, 10:41   #21
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Quote:
Originally Posted by meynaf View Post
As an example, if value $f0000000 is given, you'll execute your loop 32 times vs 4 in my case, and nothing can be done to cure that.
Uhhh, no. It's not a dbF/dbRA loop, it's dbCS. It will break out of the loop when you push the first (lowest) 1 out of the register.
Not that I would use this iterative approach, but since several people mentioned it and either used a dbf or a non-dbf loop, I just pointed out that dbcs is better as it does all of that for the same price.
Similarly, Asman's iterative approach to push all the 1s out until you end up with a zero reg is using the same db(EQ) idea to break out of the loop as early as possible.

Quote:
Originally Posted by meynaf View Post
How could I miss this... Thanks !
I missed that too. There were 3 instruction in my first "serious" code, sub+not+and, and I was so focused on eor, usually the hardest one for me to figure, that I missed the easy sub+not to neg and edited with not+and changed to eor. The end result being pretty much the same, only a different register is trashed and it happens that you needed the other one.

Last edited by a/b; 06 November 2021 at 10:46.
a/b is offline  
Old 06 November 2021, 11:31   #22
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by a/b View Post
Uhhh, no. It's not a dbF/dbRA loop, it's dbCS. It will break out of the loop when you push the first (lowest) 1 out of the register.
You will break out of the loop after 29 iterations, then have to execute it 3 more times as i loop over all '1'. So yes you'll do 32 iterations in total, dbcs or not.
Now your dbcs does not update the counter when it exits, requiring an extra instruction to fix this before looping back.
meynaf is offline  
Old 06 November 2021, 12:13   #23
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by a/b View Post
I missed that too. There were 3 instruction in my first "serious" code, sub+not+and, and I was so focused on eor, usually the hardest one for me to figure, that I missed the easy sub+not to neg and edited with not+and changed to eor. The end result being pretty much the same, only a different register is trashed and it happens that you needed the other one.
I suppose we all reasoned the same: find a quick way to isolate the affected bit so we can then use bfffo.
My first thought was to generate a mask (for a and) to exclude the upper bits and the simplest way is with the representation of negative numbers in binary (the two's complement), where all leftmost bit are reversed till the last one 1 at the rightmost position.
So the neg came naturally, I didn't even think about the sub+not for the mask (I skipped the 3 instructions stage)
But I was late, you had already published your alternative equivalent solution.
So I only posted my approach as a corollary and it happened was perfect for meynaf's request.
Interesting how things randomly evolve and fit together.


PS: and the not+and->eor is nice, I've used it in a tricky routine not so much ago
ross is offline  
Old 06 November 2021, 12:40   #24
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Quote:
Originally Posted by meynaf View Post
You will break out of the loop after 29 iterations, then have to execute it 3 more times as i loop over all '1'. So yes you'll do 32 iterations in total, dbcs or not.
Now your dbcs does not update the counter when it exits, requiring an extra instruction to fix this before looping back.
Ah, OK. You were considering the whole thing, including your "outer" loop over all the 1s. I was talking only about a single instance of finding the lowest 1. My bad.

EDIT: Bigger picture... You most likely went through this but we don't know what else is behind. Why exactly do you have to go from lowest to highest 1? Would it be possible to reverse the algorithm or maybe bit order in input data, and simply use bfffo+bclr+tst (or alternative exit instead of tst if your unspecified code happen to allow it)?

Last edited by a/b; 06 November 2021 at 13:02.
a/b is offline  
Old 06 November 2021, 15:35   #25
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by a/b View Post
EDIT: Bigger picture... You most likely went through this but we don't know what else is behind. Why exactly do you have to go from lowest to highest 1? Would it be possible to reverse the algorithm or maybe bit order in input data, and simply use bfffo+bclr+tst (or alternative exit instead of tst if your unspecified code happen to allow it)?
Reversing the bit order or the algorithm would need either to effectively reverse the bits (but we have no bitrev) or counting the '1' to adjust parameters (but we have no popcnt).
meynaf is offline  
Old 06 November 2021, 16:32   #26
Photon
Moderator
 
Photon's Avatar
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
I think you are having trouble reading the execution time in these examples? 4 branches LUT solution is somehow not OK because it uses 256b of memory, but 32 iteration branches are OK because the code is 20b smaller?

Typically, more memory comes with faster CPU. My example can be adapted to a 64K bytes LUT easily, so now we're down to 2 branches, twice as fast and only two memory word reads.

Quote:
Originally Posted by meynaf View Post
Here 000 is out of question. Not only because "too slow" but also due to so many misaligned memory accesses, incredible high amount of scale factors, etc.
Because there's no 68030 instruction to find the lowest bit set, a loop is necessary. It's loop vs LUT, and LUT can be so many times faster than a loop, that it can win on a many times faster CPU.

If you want to compare them, you can take my 4-branches example and replace each byte lookup with an 8-bit loop, making sure to keep the same early-exit structure.
Photon is offline  
Old 06 November 2021, 18:40   #27
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Photon View Post
Because there's no 68030 instruction to find the lowest bit set, a loop is necessary. It's loop vs LUT, and LUT can be so many times faster than a loop, that it can win on a many times faster CPU.
But there is an instruction to find the highest bit set, so we can just isolate the lowest one with a simple trick (so we have a value with only one bit set) and use that instruction. No loop to find the lowest bit then.
meynaf is offline  
Old 06 November 2021, 20:52   #28
Photon
Moderator
 
Photon's Avatar
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
This code does that trick for your example input value in OP.

Code:
move.l d0,d1
subq.l #1,d1
and.l d0,d1
eor.l d1,d0
For the purpose of use with bfffo this can be reduced to

Code:
move.l d0,d1
subq.l #1,d1
eor.l d1,d0

Last edited by Photon; 06 November 2021 at 21:48.
Photon is offline  
Old 07 November 2021, 08:53   #29
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
For this Ross' code is fine.
Code:
    move.l d2,d1
    neg.l d1
    and.l d2,d1
meynaf is offline  
Old 08 November 2021, 17:58   #30
Photon
Moderator
 
Photon's Avatar
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
So you have two answers. Plus all the alternatives.

Your competition/quiz has received many answers. Maybe contribute your answer now? Going on might be perceived as trolling by some.

Last edited by Photon; 08 November 2021 at 18:07.
Photon is offline  
Old 08 November 2021, 19:12   #31
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
What do you mean by contribute my answer ?
If you mean showing my own version, it's there since post #7.
meynaf is offline  
Old 08 November 2021, 19:42   #32
Photon
Moderator
 
Photon's Avatar
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
Oh. So you had no answer. It seemed you found something wrong with all the different suggestions. Well, now you have multiple smallest + fastest answers, so why not incorporate one into your code and thank the author if it helps you?

Normally there's a need for asking, or it's "for" something, and when either has been satisfied the thread can end.
Photon is offline  
Old 09 November 2021, 10:51   #33
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
@Photon: you've had a bad night or what ?

Quote:
Originally Posted by Photon View Post
Oh. So you had no answer.
It's in post #7, i've mentioned this in previous post. So yes i had an answer. That said, i wasn't required to have one.


Quote:
Originally Posted by Photon View Post
It seemed you found something wrong with all the different suggestions.
Nope. The one from Ross is fine.


Quote:
Originally Posted by Photon View Post
Well, now you have multiple smallest + fastest answers, so why not incorporate one into your code and thank the author if it helps you?
I don't have multiple smallest + fastest but one is enough and i took it.
So, already done.


Quote:
Originally Posted by Photon View Post
Normally there's a need for asking, or it's "for" something, and when either has been satisfied the thread can end.
Normally there's also a need for reading before jumping to conclusions.
So yes the thread can end, and should have at post #25. Not my fault it didn't.
Now this post of yours (#32) could be perceived as trolling. I dare to hope it's just a misunderstanding.
meynaf is offline  
Old 12 November 2021, 23:26   #34
Photon
Moderator
 
Photon's Avatar
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
Post #6 is when Ross gave the same answer as I. Either will roll through the bits and supply the decision tree wishlisted into silicon as mentioned in my CSci comments directly related to the problem to be solved.

Post #7 is when you changed topic and "wanted more than you got", alluding to a "simple trick" not shared before in the thread.

You have many simple tricks to pick and choose from now. My last piece of code will allow you to clear the bit.
Photon is offline  
Old 13 November 2021, 12:08   #35
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Why do you insist ? Problem is solved, end of story.
meynaf is offline  
Old 15 December 2021, 15:36   #36
Blueberry
Registered User
 
Join Date: Dec 2007
Location: Aarhus / Denmark
Posts: 40
If it's for 060, this should be faster than using BFFFO:
Code:
    move.l  d0,d1
    subq.l  #1,d1
    moveq.l #27,d2
    eor.l   d0,d1
    mulu.l  #$78345765,d1
    lsr.l   d2,d1
    move.b  table(pc,d1.l),d1


table:
    dc.b    30,3,12,25,28,10,16,22,18,2,24,9,21,1,20,0
    dc.b    31,4,5,13,6,26,14,7,29,11,27,15,17,23,8,19
There are actually 1024 different multiplication constants that have this property that the top 5 bits of the multiplication result become different for these 32 (power of two minus one) values. I just picked the one I liked best.
Blueberry is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Sprites on lowest area of PAL Screen Tigerskunk Coders. Asm / Hardware 6 02 December 2017 20:34
Is WinUAE optimized for lowest possible input lag? Dr.Venom support.WinUAE 22 03 July 2011 02:14
Trade - Blizzard 1230IV + SCSI + Cash for Lowest Spec PPC fitzsteve Swapshop 4 10 December 2010 14:25
Lowest Spec for WinUAE ?? Methanoid support.WinUAE 29 20 March 2008 17:23
pure bit not set? amigalizard support.Apps 4 04 September 2005 05:47

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 21:19.

Top

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Page generated in 0.09063 seconds with 15 queries