Point taken. You can go up to 8 longwords without too much messing around (if you have that many free registers available) using two movem instructions. You could go beyond 8, but then the setup becomes trickier:
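.loop movem.l (a0)+,d0-d6/a3 ; read 8 longwords, a0 advances by 32
movem.l d0-d6/a3,(a1) ; write 8 longwords (movem to memory has no (a1)+ mode)
lea 32(a1),a1 ; so step the destination pointer by hand
dbra d7,.loop
becomes 4 instruction fetches (counting one per instruction), 32 data operations, to copy 32 bytes of data per iteration
7.1 MHz / 4 cycles/transfer / 36 transfers/iteration * 32 bytes/iteration = 1.578 MB/s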
Unfortunately this appears to be slower than the prior code on anything above a 68000, as it doesn't trigger the more advanced processors' loop mode (which completely removes instruction fetches from the equation).
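For what it's worth, here is a rough sketch of the setup around that loop, wrapped up as a subroutine. The name copy32, the use of d0 for the byte count and the choice of which registers to save are my own assumptions, not anything from the earlier posts. It assumes the length is a non-zero multiple of 32 and no more than 2 MB, since dbra only counts 16 bits. The store uses plain (a1) plus an lea because movem register-to-memory only accepts predecrement and control addressing modes, not postincrement.

```asm
; copy32 (hypothetical name): copy d0.l bytes from (a0) to (a1)
; Assumes: d0 is a non-zero multiple of 32 and at most 2 MB.
; On exit a0 and a1 point just past the copied data; d0/d1 are trashed.
copy32  movem.l d2-d7/a3,-(sp)      ; save the registers we clobber
        move.l  d0,d7               ; byte count -> loop register
        lsr.l   #5,d7               ; bytes / 32 = number of blocks
        subq.w  #1,d7               ; dbra runs (count + 1) times
.loop   movem.l (a0)+,d0-d6/a3      ; read 8 longwords, a0 += 32
        movem.l d0-d6/a3,(a1)       ; write 8 longwords, a1 unchanged
        lea     32(a1),a1           ; advance destination by hand
        dbra    d7,.loop            ; next block
        movem.l (sp)+,d2-d7/a3      ; restore saved registers
        rts
```

Any leftover bytes (length not a multiple of 32) would need a separate tail loop, which is left out here.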
Last edited by Shadowfire; 23 August 2009 at 07:38.