English Amiga Board - View Single Post

paraj · 28 November 2022, 12:06

Quote:

Originally Posted by koobo

A bit off topic I suppose, but isn't it more likely that both of these instructions have been prefetched into the 96-byte FIFO buffer, as the previous instructions are short? Not sure tho

Perfectly on topic, and it's bitten me before.

A correctly predicted Bcc instruction is free if taken, but it discards the instruction stream that has already been fetched (section 1.4.2.1 of the 68060 User's Manual):

Quote:

If a hit occurs in the branch cache, indicating a branch taken instruction, the current instruction stream is discarded and a new instruction stream is fetched starting at the location indicated by the branch cache.

I just checked by timing the loops with 200 iterations and everything in cache: the original loop takes ~9 cycles per iteration (as I expected) and my first version takes ~6. The d7 version also takes 6 (maybe a stall on a2? I'm not sure; using lea doesn't help, and it also increases code size). However, this variation takes it down to 5:

Code:

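.loop:
    add.l   d0,d2               ; d2 += d0
    move.b  d7,(a2)             ; store the byte loaded at the end of the previous pass
    move.b  (a1,d4.l),d5        ; d5 = byte at a1 + d4
    move.l  d2,d4
    lsr.l   d3,d4               ; d4 = d2 >> d3, used as the index on the next pass
    add.l   d6,a2               ; advance the destination pointer by d6
    move.b  (a0,d5.l),d7        ; d7 = byte at a0 + d5, stored on the next pass
    subq.l  #1,d1               ; count down the remaining iterations
    bne.b   .loop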

But going from 6 to 5 probably isn't going to give a measurable speedup if going from 9 to 6 didn't.