For a new prod I'm working on, I need to decode lores EHB ILBM files. I only need to handle this one type of ILBM, not ILBMs in general, so I've written a decoder for just that purpose and it works perfectly fine.
The two main loops required are one to pull out the colour data and another to decode the RLE graphics data.
Here's my colour extraction loop:
.put_colours: move.b (a0)+,d0
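For comparison, here is a minimal C model of what a colour extraction loop like this typically has to compute. It assumes the palette ends up in 12-bit $0RGB colour registers, so only the high nibble of each 8-bit CMAP component is kept. The function name `cmap_to_rgb4` and its parameters are made up for illustration and are not taken from the routine above.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: convert `count` CMAP entries (3 bytes each, in
   R, G, B order) into 12-bit $0RGB words by keeping the high nibble of
   each component. */
static void cmap_to_rgb4(const uint8_t *cmap, uint16_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t r = cmap[3 * i + 0] >> 4;   /* top 4 bits of red   */
        uint16_t g = cmap[3 * i + 1] >> 4;   /* top 4 bits of green */
        uint16_t b = cmap[3 * i + 2] >> 4;   /* top 4 bits of blue  */
        out[i] = (uint16_t)((r << 8) | (g << 4) | b);
    }
}
```

For an EHB picture only the first 32 entries should need converting, since the hardware derives the other 32 colours as half-brightness copies.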
and here's my RLE decoder loop:
.next_row: moveq.l #screen_bpls-1,d6
.crntrow_allbpls: moveq.l #screen_wd,d4
.rle_decode: moveq.l #0,d7
.copy: move.b (a0)+,(a3)+
.replicate: neg.b d7
.do_replicate: move.b (a0),(a3)+
.next_bpl: subq.b #1,d4
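As a cross-check for the decoder, a plain C reference model of ByteRun1 (the RLE scheme ILBM uses for a compressed BODY) can be handy for verifying output byte for byte. This is only a sketch: the name `byterun1_decode_row` and the bounds checks are my own additions, and it decodes one row of one bitplane at a time rather than mirroring the loop structure above.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reference decoder: unpack one ByteRun1-packed row of
   `row_bytes` bytes from src into dst. Returns the number of source
   bytes consumed, or 0 if the input is truncated or a run would
   overflow the row. */
static size_t byterun1_decode_row(const uint8_t *src, size_t src_len,
                                  uint8_t *dst, size_t row_bytes)
{
    size_t in = 0, out = 0;

    while (out < row_bytes) {
        if (in >= src_len)
            return 0;
        int n = src[in++];
        if (n > 127)
            n -= 256;                       /* treat control byte as signed */

        if (n >= 0) {                       /* 0..127: copy n+1 literal bytes */
            size_t count = (size_t)n + 1;
            if (count > row_bytes - out || count > src_len - in)
                return 0;
            for (size_t i = 0; i < count; i++)
                dst[out++] = src[in++];
        } else if (n != -128) {             /* -127..-1: repeat next byte -n+1 times */
            size_t count = (size_t)(-n) + 1;
            if (count > row_bytes - out || in >= src_len)
                return 0;
            uint8_t value = src[in++];
            for (size_t i = 0; i < count; i++)
                dst[out++] = value;
        }
        /* n == -128 is a no-op and is simply skipped */
    }
    return in;
}
```

Calling this once per bitplane per row, with `row_bytes` set to the bytes per bitplane row, should produce the same data as the assembly loop.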
Now, while this works fine and doesn't take too long, I'd like to be certain I'm doing the ILBM decode as a whole as fast as possible.
So, my question is: is there any way the above routines could be optimised further than I already have or, alternatively, is there a completely different approach that I've missed?
By the way, I should mention that I'm coding specifically for the 68000 processor, not the 68020+.
EDIT: in the RLE decoder, the copy and replicate loops could possibly be sped up by checking the number of bytes in the current copy or replicate operation and moving words or longwords instead of bytes when possible...? Of course, that speedup would need to be traded off against the time taken by the decision logic itself.
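To make the trade-off concrete, here is a rough C model of the decision logic for the replicate case only. It is a sketch of the idea, not measured code. The 68000 cannot do word accesses at odd addresses, so the model writes one leading byte if the destination is odd, then two bytes at a time, then one trailing byte if needed. `replicate_run` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a word-wide replicate: write `count` copies of
   `value` to dst using 2-byte stores wherever dst is even. */
static void replicate_run(uint8_t *dst, size_t count, uint8_t value)
{
    /* One leading byte store if the destination starts on an odd address. */
    if (count > 0 && ((uintptr_t)dst & 1)) {
        *dst++ = value;
        count--;
    }

    /* Both bytes of the pair are identical, so byte order does not matter. */
    uint16_t pair = (uint16_t)((value << 8) | value);
    for (; count >= 2; count -= 2) {
        memcpy(dst, &pair, sizeof pair);    /* stands in for a word store */
        dst += 2;
    }

    /* One trailing byte store if an odd byte is left over. */
    if (count > 0)
        *dst = value;
}
```

The literal copy case looks less promising, because source and destination would both need to be even at the same time before a word move is possible. Whether either variant wins in practice depends on how long the runs in the actual pictures are, so it would need timing on real data.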