English Amiga Board - View Single Post

chb · 03 December 2016, 20:35

Quote:

Originally Posted by Master484

Yes, inline assembly can be freely added to Blitz code, and this is most likely the only way to mirror GFX in real time. But I don't know anything about assembler coding. Such a routine would also need to be compatible with the Blitz "Shape" system, and I don't know how complicated that would be.

So it can be done, but only with ASM, and even then I don't know if it would be possible to keep a game like this running at 50 FPS with constant mirroring of two 32 color large (and multi-part) BOBs.

Mirroring in asm is best done using a flip table, which contains, for every possible value, its bit-reversed counterpart. There are two options: byte-based or word-based. A byte table is 256 bytes; a word table is 65536 words = 128 KB. The tables should reside in fast RAM. The word-based lookup is faster per word than the byte-based one (cycle counts below), at the cost of far more memory.
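The byte table has to be generated once at startup. A minimal sketch of how that could look, assuming a5 points to a free 256-byte buffer (the routine name, register choice and local label syntax are my assumptions and may need adapting to your assembler or to Blitz inline asm):

Code:

; Sketch: build the 256-byte flip table (bit-reversed value for each byte)
; a5 - start of a 256-byte buffer, trashes d0-d3
BuildFlipTable:
    moveq   #0,d0           ; d0 = current byte value, 0..255
.next:
    move.b  d0,d1           ; d1 = working copy of the value
    moveq   #0,d2           ; d2 = reversed result
    moveq   #7,d3           ; dbra runs the bit loop 8 times
.bit:
    lsr.b   #1,d1           ; shift the lowest bit out into X
    roxl.b  #1,d2           ; rotate X into the result from the right
    dbra    d3,.bit
    move.b  d2,(a5,d0.w)    ; table[value] = reversed value
    addq.w  #1,d0
    cmp.w   #256,d0
    bne.s   .next
    rts

This is not speed critical, so it favours clarity over cycles. A word table could be filled the same way with word-sized shifts and 16 passes per entry.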

For sprites the code is quite easy, as they are in 16 bit wide stripes, so it's only necessary to flip the words themselves, not their order, as you can do this by changing the sprite positions:

Code:

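; this flips a 16 bit word in memory using byte table
; a0 - source, a1 - dest, a5 - flip table (byte based)
; d6/d7 must be cleared once before the loop, as move.b only sets the low byte
    moveq #0,d6
    moveq #0,d7
    move.b (a0)+,d6
    move.b (a0)+,d7
    move.b (a5,d7.w),(a1)+
    move.b (a5,d6.w),(a1)+
; this flips a 16 bit word in memory using word table
; a0 - source, a1 - dest, a5 - flip table (word based, 65536 entries)
; the index is doubled as a long: a .w index is sign-extended and unscaled on the 68000
    moveq #0,d6
    move.w (a0)+,d6
    add.l d6,d6
    move.w (a5,d6.l),(a1)+

The byte-based version is 52 CPU cycles per word (not counting the one-time register clear); the word-based version as written is 38 (both at full-speed CPU memory access).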
The byte-based version can also be written to perform only one write access per word, which can speed things up when chip RAM bandwidth is scarce (lots of bitplane/blitter DMA).
It uses two 256-entry word tables that hold the flipped byte in the high or the low byte respectively, e.g. $XX00 and $00XX, because the 68000 has no fast way to swap the two bytes of a word (an 8-bit rotate is slow).

Code:

; this flips a 16 bit word in memory using two 256-entry word tables (single write)
; a0 - source, a1 - dest, a5 - flip table (low byte), a6 - flip table (high byte)
    moveq #0,d7             ; +4 cycles
    move.b (a0)+,d7         ; +8 cycles, high byte of the source word
    add.w d7,d7             ; +4 cycles (table entries are word sized)
    move.w (a5,d7.w),d6     ; +14 cycles, flipped byte lands in the low byte
    moveq #0,d7             ; +4 cycles
    move.b (a0)+,d7         ; +8 cycles, low byte of the source word
    add.w d7,d7             ; +4 cycles (table entries are word sized)
    or.w (a6,d7.w),d6       ; +14 cycles, flipped byte lands in the high byte
    move.w d6,(a1)+         ; +8 cycles

It is a bit slower, however (68 cycles per word), when memory speed is unrestricted.
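The two word tables can be derived from the plain byte table. A rough sketch, assuming a4 points to an already filled 256-byte flip table and a5/a6 to two free 512-byte buffers (the routine name and the use of a4 are my own choices, not part of the code above):

Code:

; Sketch: derive the $00XX and $XX00 word tables from the byte table
; a4 - byte flip table, a5 - low byte table, a6 - high byte table
; trashes d0-d1, and advances a4-a6 past the end of their tables
BuildWordTables:
    move.w  #255,d0         ; dbra runs the loop 256 times
.loop:
    moveq   #0,d1
    move.b  (a4)+,d1        ; d1 = $00XX
    move.w  d1,(a5)+        ; entry for the low byte table
    lsl.w   #8,d1           ; d1 = $XX00
    move.w  d1,(a6)+        ; entry for the high byte table
    dbra    d0,.loop
    rts

As the pointers are advanced, the caller has to reload a5/a6 with the table start addresses before using the flip code.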

For BOBs wider than 16 pixels you need to reverse the word order within each row, too. For a 32 pixel wide BOB, for example, (a,b,c,d,e,f,g,h) becomes (b*,a*,d*,c*,f*,e*,h*,g*), where each letter is a word and * indicates the flipped word. For 64 pixels it becomes (d*,c*,b*,a*,h*,g*,f*,e*).
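The word reordering can be folded into the flip loop by reading the source row forwards and writing the destination row backwards. A sketch for one row of one bitplane using the byte table, assuming separate source and destination buffers (no in-place flip) and a row width passed in d0; the routine name and register usage are only an illustration:

Code:

; Sketch: mirror one row that is d0 words wide
; a0 - source row start, a1 - dest row start, a5 - flip table (byte based)
; d0.w - row width in words (at least 1), trashes d0, d6, d7
; on exit a0 points past the source row, a1 is back at the dest row start
FlipRow:
    moveq   #0,d6           ; upper bits of the index registers must be zero
    moveq   #0,d7
    adda.w  d0,a1           ; a1 += 2 * width in words ...
    adda.w  d0,a1           ; ... so it points just past the dest row
    subq.w  #1,d0           ; dbra counts down to -1
.word:
    move.b  (a0)+,d6        ; high byte of the source word
    move.b  (a0)+,d7        ; low byte of the source word
    move.b  (a5,d6.w),-(a1) ; flipped high byte becomes the low byte
    move.b  (a5,d7.w),-(a1) ; flipped low byte becomes the high byte
    dbra    d0,.word
    rts

With a width of 2 this turns (a,b) into (b*,a*). The caller has to step a1 to the next destination row itself. If the BOB is narrower than its width in words, the padding ends up on the other side after the flip, so the x position needs a matching offset.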

An 86*86 6-plane BOB (5 planes + mask) is 86*86*6/16 = ~2774 words, so at 52 cycles per word it would take ~144k CPU cycles per flip, or 7.2M per second at 50 Hz. That's slightly more than the CPU in the A500 has available (about 7.09 MHz on PAL). A word table reduces that in proportion to its lower per-word cost. That's still quite steep, but do you need all animations at 50 Hz? One could, for example, keep only the basic moves at 50 Hz (and store them pre-flipped), while keeping the special moves unflipped in fast RAM and flipping them at 25 Hz.

Another, slightly unconventional approach would be to use Stephane Dallongeville's lz4w decompressor, which is amazingly fast: it is in fact slightly faster than the word-table-based flip approach, and the code is much smaller (8 KB) than the word table (128 KB). You would keep compressed frames for both directions in memory; as long as the compression ratio is around 2:1 or better, that is both more memory-efficient and faster. The animation data should compress reasonably well. BTW, you'd need to compress every frame on its own, as you can't easily stop the decompression after n bytes. So no inter-frame compression.