A couple of suggestions for CachePreDMA (2 bytes shorter and one fewer branch):
Code:
CachePreDMA:
MOVEM.L A0/A5,-(SP)
MOVE.L A0,D1
ANDI.L #$FFE00000,D1 ;Chip RAM
BEQ.B lbC000068
LEA (lbC000084,PC),A5 ;default routine, moved up from lbC000060
ANDI.B #$A,D0 ;Continue or ReadFromRam
BNE.B lbC000060
MOVE.L A0,D1
OR.L (A1),D1 ;merge in the length
ANDI.B #15,D1 ;address and length both 16-byte aligned?
BEQ.B lbC000060
LEA (lbC000074,PC),A5
; Alternatively (faster on 040, not sure about 060; maybe ADDA.W works better overall):
; LEA (lbC000074-lbC000084,A5),A5
;; BRA.B lbC000064 ;no longer needed
lbC000060
;; LEA (lbC000084,PC),A5 ;moved up, see above
;;lbC000064 ;label no longer needed
JSR (-$1E,A6) ;Call Supervisor
lbC000068