Quote:
Originally Posted by paraj
Aren't 030 and 020 cycle times quite similar (except for minor differences in cache speed)? In that case the best case given by the 020UM is 25 cycles (maybe the 030 can do 24, but either way that's a negligible difference from the worst case).
Your snippet should be faster just counting cycles, but the larger code size may negate the benefit (more instruction cache used). Counting cycles is a good guide, but you really want to measure the full loop on more advanced CPUs.
|
I'm not explicitly targeting the 020 separately. It looks like the 020/030 and the 040 all benefit from this sort of thing. mulu.w takes 16 cycles on the 040 (not including EA).