28 November 2022, 12:06   #6
paraj
Quote:
Originally Posted by koobo
A bit off topic I suppose, but isn't it more likely that both of these instructions have been prefetched into the 96-byte FIFO buffer, as the previous instructions are short? Not sure tho
Perfectly on topic, and it's bitten me before. A correctly predicted Bcc instruction is free if taken, but it discards the instruction stream (section 1.4.2.1 of the 68060UM):
Quote:
If a hit occurs in the branch cache, indicating a branch taken instruction, the current instruction stream is discarded and a new instruction stream is fetched starting at the location indicated by the branch cache.
I just checked by timing loops with 200 iterations and everything in cache: the original loop takes ~9 cycles per iteration (as I expected) and my first version takes ~6. The d7 version also takes 6 (maybe a stall on a2? Not sure; lea doesn't help, and it also increases code size). However, this variation takes it down to 5:
Code:
    add.l   d0,d2
    move.b  d7,(a2)
    move.b  (a1,d4.l),d5
    move.l  d2,d4
    lsr.l   d3,d4
    add.l   d6,a2
    move.b  (a0,d5.l),d7
    subq.l  #1,d1
    bne.b   .loop
But going from 6 to 5 cycles probably isn't going to give a measurable speedup if 9->6 didn't.
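To put rough numbers on that last point, here is a minimal sketch using Amdahl's law. It is written in Python purely for the arithmetic. The 1% figure is a made-up stand-in for "not measurable", not something that was measured, and the function names are just for illustration. It also assumes the rest of the program is unaffected by the change.

```python
def amdahl(fraction, old_cycles, new_cycles):
    """Overall speedup when a part taking `fraction` of the run time
    goes from old_cycles to new_cycles per iteration."""
    return 1.0 / ((1.0 - fraction) + fraction * new_cycles / old_cycles)


def fraction_from_speedup(speedup, old_cycles, new_cycles):
    """Invert Amdahl's law: share of run time the loop must have had
    to produce the given overall speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - new_cycles / old_cycles)


# Hypothetical: suppose 9 -> 6 cycles gave only a 1% overall speedup.
f9 = fraction_from_speedup(1.01, 9, 6)

# Share of the (now shorter) run time spent in the 6-cycle loop.
loop_time = f9 * 6 / 9
f6 = loop_time / ((1.0 - f9) + loop_time)

# Expected overall speedup from going 6 -> 5 cycles.
gain_6_to_5 = amdahl(f6, 6, 5)

print(f"loop share at 9 cycles: {f9:.2%}")
print(f"loop share at 6 cycles: {f6:.2%}")
print(f"overall speedup for 6->5: {gain_6_to_5:.4f}x")
```

Under those assumptions the loop would only account for a few percent of the run time, and 6->5 would be worth roughly a third of whatever 9->6 gave.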

Last edited by paraj; 03 December 2022 at 12:28. Reason: Strike out wrong info