What's the typical break-even point, in bytes, at which the efficiency of CMQ with all these patches pays for its own library-call overhead, compared to a locally inlined move.l loop with a spot of unrolling? Let's say I'm most interested in the 68030 and above for this question, and we are dealing with a cacheable source but uncacheable destination (e.g. local Fast RAM to RTG VRAM).
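For concreteness, the kind of inline loop I have in mind is something like this (a hypothetical 8x-unrolled longword copy; register usage and the assumption that the count is a multiple of 32 bytes are mine, not from any particular CMQ patch):

```
; a0 = source (cacheable), a1 = destination (uncacheable)
; d0 = number of 32-byte chunks to copy
.loop:
        move.l  (a0)+,(a1)+     ; 8 longword moves per iteration
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        subq.l  #1,d0
        bne.s   .loop
```

i.e. no setup cost beyond loading three registers, versus CMQ's jump through the library vector plus whatever alignment/size dispatch it does on entry.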