English Amiga Board - View Single Post

roondar · 14 November 2018, 15:19

Quote:

Originally Posted by litwr

Code:

                         68000                   68020
.loop
1    sub.l d4,d6         4                       3?
2    bgt.s .xp           13 (10 or 8+4+4=16)     6?
3    add.l a0,d1            
4    add.l a2,d6
.xp
5    sub.l d5,d7         4                       3?
6    bgt.s .yp          13                       6?

7    add.l a1,d2
8    add.l a2,d7
.yp
9    bsr.s setpixel     18                       8?
0    dbf d0,.loop       10                       6?

I didn't really mean to look at this in much detail, but the 68000 cycle counts you show don't look correct to me. For instance, no 68000 instruction takes an odd number of cycles. I've made an attempt of my own; by my count, the code you show should have the following cycle counts on the 68000.

Code:

                         68000
.loop
1    sub.l d4,d6         8
2    bgt.s .xp           10 if taken / 8 if not
3    add.l a0,d1         8            
4    add.l a2,d6         8
.xp
5    sub.l d5,d7         8
6    bgt.s .yp           10 if taken / 8 if not

7    add.l a1,d2         8
8    add.l a2,d7         8
.yp
9    bsr.s setpixel      18
0    dbf d0,.loop        10

As you can see, the 68000 is somewhat slower than you originally calculated. I'm also not entirely sure why your cycle counts (for all processors) leave some of the instructions out.
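To make the per-iteration cost concrete, here is a small tally of the corrected 68000 table, summed over the four possible branch paths through the loop. It is only a sketch: it counts the `bsr.s` call itself but not the body of `setpixel` or its `rts`, and it assumes `dbf` branches back (10 cycles); on the final iteration, when the counter expires, `dbf` takes 14 cycles instead.

```python
# Per-instruction 68000 cycle counts, taken from the corrected table above.
SUB_L = 8            # sub.l Dn,Dn
ADD_L = 8            # add.l An,Dn
BCC_TAKEN = 10       # bgt.s when the branch is taken
BCC_NOT_TAKEN = 8    # bgt.s when it falls through
BSR_S = 18           # bsr.s setpixel (the call only; setpixel's body and rts excluded)
DBF_LOOP = 10        # dbf when it branches back to .loop

def half_step(taken: bool) -> int:
    """Cost of one 'sub / bgt / (add, add)' group."""
    if taken:
        return SUB_L + BCC_TAKEN                 # the two adds are skipped
    return SUB_L + BCC_NOT_TAKEN + 2 * ADD_L     # falls through into both adds

def iteration_cycles(x_taken: bool, y_taken: bool) -> int:
    """Total cycles for one pass through .loop (excluding setpixel itself)."""
    return half_step(x_taken) + half_step(y_taken) + BSR_S + DBF_LOOP

for x in (True, False):
    for y in (True, False):
        print(f"bgt .xp taken={x}, bgt .yp taken={y}: {iteration_cycles(x, y)} cycles")
```

This gives 64 cycles when both branches are taken, 78 when one is taken and 92 when neither is. For comparison, the quoted 68000 column adds up to 62 cycles, but it uses odd branch timings and leaves the `add.l` instructions out entirely.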

For that matter, the 68k code looks a bit odd: why are you adding address registers to data registers? I might be wrong here, but I think you meant to do the opposite. On a side note, if setpixel takes word-sized x and y coordinates, the longword add/sub instructions can be optimised into word add/sub instructions.
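A rough sketch of what the word-sized optimisation would save, assuming the coordinates and error terms really do fit in 16 bits (that is an assumption, not something verified against the line-drawing algorithm). On the 68000, `sub.w Dn,Dn` and `add.w An,Dn` take 4 cycles versus 8 for the `.l` forms, while the `bgt.s`, `bsr.s` and `dbf` timings don't depend on operand size. As before, `setpixel`'s own body and the final `dbf` exit are not counted.

```python
# 68000 cycles for the register-to-register ALU forms used in the loop:
# sub.x Dn,Dn and add.x An,Dn, for word (.w) and longword (.l) operands.
ALU_CYCLES = {"w": 4, "l": 8}
BCC_TAKEN, BCC_NOT_TAKEN = 10, 8   # bgt.s taken / not taken
BSR_S, DBF_LOOP = 18, 10           # bsr.s call, dbf branching back

def iteration_cycles(size: str, x_taken: bool, y_taken: bool) -> int:
    """Cycles for one pass through .loop with the given operand size."""
    alu = ALU_CYCLES[size]

    def half(taken: bool) -> int:
        # sub + bgt, plus the two adds when the branch falls through
        return alu + (BCC_TAKEN if taken else BCC_NOT_TAKEN + 2 * alu)

    return half(x_taken) + half(y_taken) + BSR_S + DBF_LOOP

for size in ("l", "w"):
    best = iteration_cycles(size, True, True)
    worst = iteration_cycles(size, False, False)
    print(f".{size}: {best} to {worst} cycles per iteration")
```

Under these assumptions the loop overhead drops from 64 to 92 cycles per iteration with longwords to 56 to 68 cycles with words.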

That said, I haven't looked closely at the line-drawing code you discussed, as I find it far too small an algorithm to be useful for accurate comparisons. So it might be correct after all.

The 68020 is much harder to cycle-count because it has a (256-byte) instruction cache, so execution times differ depending on whether the code is inside or outside the cache (code in the cache runs much faster). Moreover, code running from the cache can keep executing while earlier instructions' memory accesses are still in progress, so some instructions can effectively take '0 cycles' by running during a memory access. The Motorola manual has an example like this:

Code:

; This example assumes code is running from cache
4 cycles   move.l d4,(a1)+
0 cycles   add.l d4,d6

; This example assumes code is running from memory
4 cycles   move.l d4,(a1)+
3 cycles   add.l d4,d6
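The overlap idea can be illustrated with a deliberately simplified toy model. This is not Motorola's timing model: the `Instr` type, the `bus_tail` field and the specific numbers below are hypothetical, chosen only so the model reproduces the manual example above. The idea it captures is that a cached instruction needs no opcode fetch, so it can execute while the previous instruction's memory write is still on the bus, whereas an uncached instruction has to fetch its opcode over that same bus and so gets no overlap.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    text: str
    cycles: int    # hypothetical stand-alone cost of the instruction
    bus_tail: int  # hypothetical cycles its memory access keeps the bus busy afterwards

def effective_cycles(program: list[Instr], from_cache: bool) -> list[int]:
    """Toy overlap model (not Motorola's real timing rules).

    From cache, up to `bus_tail` cycles of the next instruction are hidden
    behind the previous instruction's still-running memory access. From
    memory, the opcode fetch needs the bus as well, so nothing is hidden."""
    costs = []
    tail = 0  # bus cycles still outstanding from the previous instruction
    for ins in program:
        hidden = min(ins.cycles, tail) if from_cache else 0
        costs.append(ins.cycles - hidden)
        tail = ins.bus_tail
    return costs

# Hypothetical numbers picked to reproduce the manual example quoted above.
program = [
    Instr("move.l d4,(a1)+", cycles=4, bus_tail=3),
    Instr("add.l d4,d6", cycles=3, bus_tail=0),
]
print("from cache: ", effective_cycles(program, from_cache=True))
print("from memory:", effective_cycles(program, from_cache=False))
```

The real 68020 timing tables distinguish more cases (best case, cache case and worst case, among others), which is exactly why counting by hand gets complicated.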

It's actually even more complicated than this (there are quite a few different cases to account for). Personally, this is why I tend to stay away from cycle counting on processors that use caches and internal concurrency (like the 68020): the results can vary quite a bit depending on whether the cache is involved or not.

The 486 has similar problems: it also has an on-chip cache and runs code from the cache considerably faster than code that isn't cached. I can't say for certain whether the 486 also executes instructions while waiting on memory accesses or uses other internal concurrency, but it probably does, which would make it similarly complicated to count. The 386 has no on-chip cache, so as far as I can find it usually ran without one, although some 386 boards did add an external cache.