Proper cache usage versus unrolled loop
OK, so I've got a game that really needs one particular routine to run as fast as possible.
I've not had much to do with CPU caches on the 68020, but I'm thinking this is where they are ideal: a tight loop.
The loop I have repeats 934 times; it's a screen conversion routine. The routine easily fits in the 256-byte instruction cache of the 020.
Unrolled, the routine is a massive 44K in size!!!
There are no MULUs or DIVSs involved. It's mostly movem.l with a couple of add.l to address registers, looping over and over until the screen is converted.
So can anyone confirm that a tight loop that repeats 934 times with caches enabled is going to be quicker than unrolling it?
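For a rough sense of scale, here is a back-of-envelope sketch of the sizes involved, using only the figures above. It assumes "44K" means 44 * 1024 bytes and that the loop needs about 4 extra bytes for a dbra-style branch; both are my assumptions, not measurements, and it says nothing about actual cycle counts.

```python
# Back-of-envelope size check using the figures from the post.
# Assumption: "44K" is read as 44 * 1024 bytes.
ITERATIONS = 934
UNROLLED_BYTES = 44 * 1024
ICACHE_BYTES = 256        # 68020 on-chip instruction cache
LOOP_BRANCH_BYTES = 4     # assumed size of a dbra-style loop instruction


def bytes_per_iteration(total_bytes, iterations):
    """Average code size of one unrolled copy of the loop body."""
    return total_bytes / iterations


def fits_in_cache(code_bytes, cache_bytes=ICACHE_BYTES):
    """True if the code could sit entirely inside the instruction cache."""
    return code_bytes <= cache_bytes


body = bytes_per_iteration(UNROLLED_BYTES, ITERATIONS)
looped = body + LOOP_BRANCH_BYTES

print(f"approx. loop body size: {body:.1f} bytes")
print("looped version fits in cache:  ", fits_in_cache(looped))
print("unrolled version fits in cache:", fits_in_cache(UNROLLED_BYTES))
```

So the looped body is a small fraction of the cache, while the unrolled version is far larger than the cache and every instruction in it gets fetched from memory exactly once.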
Fast RAM is NOT an option here. I'm loath to be doing an ST conversion that requires a 68020 in the first place, but this particular game, and the way it is written, would essentially need a complete rewrite for a 68000 Amiga. It's not worth the effort; it's a curiosity more than anything else. If I were prepared to get it working properly on the 68000, I'd rather put the time into writing my own 68000 game instead.