Quote:
Originally Posted by pmc
Code:
lea vals(pc),An
movem.w (An),Dn-Dn
gets you the ext.l's for free
|
Indeed it does, and
Code:
movem.w Vals(PC),d0-d1
saves a further 4 cycles. (Yeye I know it's obvious.)
I mainly replied to say that movem has a fixed overhead, so it only breaks even with separate moves at a count of 3 registers. Here, 2 registers are faster only because you want the sign extension anyway.
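To put rough numbers on that, here's a cycle count sketch. It assumes word loads through (a0)+ and the standard 68000 timing tables (move.w (An)+,Dn = 8, ext.l Dn = 4, movem.w (An)+,list = 12+4n). The register choices are mine, just for illustration, and other addressing modes shift the totals. The blocks are alternatives to compare, not one sequence to run.
Code:
; --- 2 registers, no sign extension wanted ---
 move.w (a0)+,d0 ; 8
 move.w (a0)+,d1 ; 8 -> 16 total
; versus
 movem.w (a0)+,d0-d1 ; 12+4*2 = 20, movem loses by 4
; --- 3 registers, no sign extension wanted ---
 move.w (a0)+,d0 ; 8
 move.w (a0)+,d1 ; 8
 move.w (a0)+,d2 ; 8 -> 24 total
; versus
 movem.w (a0)+,d0-d2 ; 12+4*3 = 24, break even
; --- 2 registers, sign extension wanted ---
 move.w (a0)+,d0 ; 8
 ext.l d0 ; 4
 move.w (a0)+,d1 ; 8
 ext.l d1 ; 4 -> 24 total
; versus
 movem.w (a0)+,d0-d1 ; 20, movem.w sign-extends each word to 32 bits on load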
The instructions take the cycles they take, and there are no instruction-reordering optimizations on the 68000, apart from the prefetch after the write to BLTSIZE and the hard-to-predict odd-cycle alignment wait of instructions that take 6/10/14 etc. cycles.
A simple trick for when you have a loop loading registers from memory is to back up the memory value, then pre-poke a magic exit value (say, zero or a negative number) in its place, instead of checking the end address or a loop counter with DBF. Since you're loading the registers anyway, a simple bmi.s Done instead of dbf Dn,KeepOn saves 2 cycles.
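A minimal sketch of the shape I mean, under some assumptions of my own: a0 points at a table of words that are all >= 0, a1 points at the word right after the last entry, that word is writable, and d7 is free. The labels are the ones from the text above; the loop is unrolled twice just to show the pattern. Timings are from the standard tables (bmi.s not taken = 8, dbf taken = 10), and how much you actually save depends on the shape of your loop.
Code:
 move.w (a1),d7 ; back up the word the exit value will overwrite
 move.w #-1,(a1) ; pre-poke a negative exit value
KeepOn:
 move.w (a0)+,d0 ; move sets N, so no separate tst is needed
 bmi.s Done ; 8 cycles while not taken
 ; ... work on d0 here ...
 move.w (a0)+,d0
 bmi.s Done
 ; ... work on d0 here ...
 bra.s KeepOn ; 10 cycles, paid once per two entries
Done:
 move.w d7,(a1) ; restore the backed-up word
Note that this needs move for the load. movem doesn't touch the condition codes, so you can't branch directly on what it loaded.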
The same is true for other branches inside loops; you may save 4 cycles on 50% of the branches if all branches jump out of the loop, and more if there is a bias toward either true or false.
I optimized an unrolled loop yesterday from 64 cycles to 51.75 cycles on average.