Thanks, but it's not that easy.
Quote:
Originally Posted by a/b
- lea (ham8.colorTable.w,pc,d6.w*8),a3 is out of 8-bit range
|
That one might be a bit of a problem, because the table is 32 KB.
Quote:
Originally Posted by a/b
- and.b #$fc,dx as fast as and.b d6,dx? if so, moveq #-4,d6 not needed
|
On the 68020/30, AND immediate is the same speed as AND register plus the moveq.
Quote:
Originally Posted by a/b
- (-6,a3)/(-4,a3)/(-2,a3) faster than subq.l #6,a3 and 3x(a3)+? (postinc as fast as indirect displacement)
|
Auto decrement is 1 cycle slower than auto increment (really). Furthermore, you have to move the write to memory to a place where nothing gets pipelined (the dbra gets partially pipelined now).
Quote:
Originally Posted by a/b
Branching looks OK to me (G's weight is 3 so it makes sense to assume branch not taken when comparing d4 with d3/d5).
|
G's case should happen the most often, so it's done first (if that's what you mean).
Quote:
Originally Posted by a/b
Code:
; move.w #512-1,-(sp)
...
; move.w #640-1,d7
move.l #(512-1)<<16+(640-1),d7
dbf d7,.loopx
..
; subq.w #1,(sp)
sub.l #(2<<16)-640,d7
bge .loopy
; addq.l #2,sp
|
That's very interesting, thanks.
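
For anyone wondering why the constant in the sub.l works: when the dbf loop exits, the low word of d7 is $FFFF and the high word still holds the row counter. Subtracting (2<<16)-640 borrows one from the high word and leaves 639 in the low word, so a single instruction steps y and reloads x. Below is a rough Python model of that register arithmetic, just as a sanity check. It is not 68k code, and the names (`count_iterations`, `to_signed32`, `reload`) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit register value as a signed long."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def count_iterations(rows=512, cols=640):
    """Model the packed d7 counter: high word = y, low word = x."""
    # move.l #(rows-1)<<16+(cols-1),d7
    d7 = (((rows - 1) << 16) + (cols - 1)) & MASK32
    # operand of sub.l #(2<<16)-cols,d7
    reload = ((2 << 16) - cols) & MASK32

    pixels = 0
    finished_rows = 0
    while True:
        # Inner loop: body, then dbf d7,.loopx
        while True:
            pixels += 1
            low = (d7 - 1) & 0xFFFF            # dbf only touches the low word
            d7 = (d7 & 0xFFFF0000) | low
            if low == 0xFFFF:                  # counter hit -1: fall through
                break
        finished_rows += 1
        # sub.l: low word $FFFF -> cols-1, high word y -> y-1
        d7 = (d7 - reload) & MASK32
        # bge .loopy: no signed overflow occurs for these sizes,
        # so the sign of the result decides the branch
        if to_signed32(d7) < 0:
            break
    return finished_rows, pixels


print(count_iterations())  # (512, 327680)
```

With the default 512 and 640 the model runs 512 rows of 640 iterations each, which matches what the two separate counters did.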