06 November 2021, 16:32   #26
Photon
Moderator
 
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,604
I think you are having trouble reading the execution times in these examples? The 4-branch LUT solution is somehow not OK because it uses 256 bytes of memory, but 32 loop-iteration branches are OK because the code is 20 bytes smaller?
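To make the comparison concrete, here is a rough C sketch of the two approaches for finding the lowest set bit of a 32-bit value. It is not the 68k code from earlier in the thread; the function and table names are made up for illustration, and returning -1 for a zero input is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-byte table: lut8[b] = index of the lowest set bit of b.
   lut8[0] is never consulted because zero bytes are skipped. */
static uint8_t lut8[256];

void init_lut8(void) {
    lut8[0] = 0;
    for (int i = 1; i < 256; i++) {
        uint8_t n = 0;
        while (((i >> n) & 1) == 0) n++;
        lut8[i] = n;
    }
}

/* LUT version: at most 4 byte tests, one table read on a hit. */
int lowest_bit_lut8(uint32_t x) {
    uint8_t b;
    if ((b = (uint8_t)x) != 0)         return lut8[b];
    if ((b = (uint8_t)(x >> 8)) != 0)  return 8 + lut8[b];
    if ((b = (uint8_t)(x >> 16)) != 0) return 16 + lut8[b];
    if ((b = (uint8_t)(x >> 24)) != 0) return 24 + lut8[b];
    return -1; /* assumption: -1 means "no bit set" */
}

/* Loop version: smaller code, but up to 32 test-and-branch iterations. */
int lowest_bit_loop(uint32_t x) {
    for (int n = 0; n < 32; n++)
        if (x & (UINT32_C(1) << n)) return n;
    return -1;
}

/* Sanity check: both versions must agree on every single-bit input.
   Call init_lut8() first. */
void check_agreement(void) {
    for (int n = 0; n < 32; n++) {
        uint32_t x = UINT32_C(1) << n;
        assert(lowest_bit_lut8(x) == n);
        assert(lowest_bit_loop(x) == n);
    }
}
```

The worst case is where the two differ most: a value with only bit 31 set costs the loop 32 iterations, while the LUT version takes 4 tests and one table read.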

Typically, more memory comes with a faster CPU. My example can easily be adapted to a 64 KB LUT, so now we're down to 2 branches: twice as fast, with only two memory word reads.
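A C sketch of what the 64 KB variant might look like, assuming one byte per table entry and a 16-bit index. Again, the names are hypothetical and the -1 return for zero is an assumption of the sketch, not something from the original example.

```c
#include <stdint.h>

/* Hypothetical 64 KB table: lut16[w] = index of the lowest set bit of the
   16-bit word w. lut16[0] is never consulted. */
static uint8_t lut16[65536];

void init_lut16(void) {
    lut16[0] = 0;
    for (uint32_t i = 1; i < 65536; i++) {
        uint8_t n = 0;
        while (((i >> n) & 1u) == 0) n++;
        lut16[i] = n;
    }
}

/* Two word tests instead of four byte tests. */
int lowest_bit_lut16(uint32_t x) {
    uint16_t w;
    if ((w = (uint16_t)x) != 0)         return lut16[w];
    if ((w = (uint16_t)(x >> 16)) != 0) return 16 + lut16[w];
    return -1; /* assumption: -1 means "no bit set" */
}
```

The trade is 256 times the table size for half the tests, which only makes sense on a machine that has the memory to spare.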

Quote:
Originally Posted by meynaf View Post
Here 000 is out of question. Not only because "too slow" but also due to so many misaligned memory accesses, incredible high amount of scale factors, etc.
Because there's no 68030 instruction to find the lowest set bit, a loop is necessary. It's loop vs. LUT, and a LUT can be so many times faster than a loop that it can win on a many times faster CPU.

If you want to compare them, you can take my 4-branch example and replace each byte lookup with an 8-bit loop, making sure to keep the same early-exit structure.
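In C terms, that comparison variant could look roughly like the sketch below: the same four byte tests with early exit, but each table read swapped for a loop of at most 8 iterations. The names are hypothetical, and the -1 return for zero is an assumption of the sketch.

```c
#include <stdint.h>

/* Replaces one byte lookup: scan a non-zero byte, at most 8 iterations. */
int lowest_bit_in_byte(uint8_t b) {
    for (int n = 0; n < 8; n++)
        if (b & (1u << n)) return n;
    return -1; /* only reached for b == 0 */
}

/* Same 4-branch early-exit structure as a byte-LUT version,
   with the per-byte loop in place of the table read. */
int lowest_bit_bytewise_loop(uint32_t x) {
    uint8_t b;
    if ((b = (uint8_t)x) != 0)         return lowest_bit_in_byte(b);
    if ((b = (uint8_t)(x >> 8)) != 0)  return 8 + lowest_bit_in_byte(b);
    if ((b = (uint8_t)(x >> 16)) != 0) return 16 + lowest_bit_in_byte(b);
    if ((b = (uint8_t)(x >> 24)) != 0) return 24 + lowest_bit_in_byte(b);
    return -1; /* assumption: -1 means "no bit set" */
}
```

With the structure held constant, any timing difference against the table version comes from the lookup versus the inner loop alone.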