30 September 2011, 20:43 | #1 |
Layered tile engine optimizing.
Hi,
I've been working on my Advance Wars 2 conversion, and have written a new layered tile engine (the old one sucked) that needs to run as fast as possible. Basically, the lowest target is an A1200 with some trapdoor fastmem (2 MB?) and an HD.

The question is: can the code below be optimized further? I think it's fast enough already (I haven't tested it yet), but some input on the subject would be greatly appreciated. The code should be optimized for 68020s and 68030s; anything above will run this fast enough anyway.

The code simply reads four layers of 16x16 pixel bitmaps with masks (except the first layer, which has no mask). The masks are interleaved into the bitmap data, so you read 32 mask bits and after that 32 tile bits (the routine reads two lines of 16 pixels at a time). This is done twice, after which there is a simple transpose (from Kalms' c2p), and the two longwords are then written to chipmem.

Note that this routine does not handle movement of individual sprites; everything is simply aligned to 16x16 pixels. The required frame rate is about 6 or 7 frames per second (I'll write other code for things that require super smoothness).

If anyone can see a way to do it better, then let's hear it! Any questions? Please ask, and sorry about the lack of comments.

Code:
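; note: a6 (destination pointer) and d4 (destination step) are used
; below but are not set anywhere in this listing
update
        movem.l d0-a6,-(sp)
        lea     -12(sp),sp              ; reserve 12 bytes (subq only takes 1-8)
        move.l  gfx_bank_table,-(sp)    ; 8(sp) = bank table pointer
        move.l  screen_map,-(sp)        ; 4(sp) = screen map pointer
        move.l  #10240-16*4,d3          ; may be wrong, check
        move.l  #160,-(sp)              ; (sp) = outer loop counter

.loopz
        move.l  4(sp),a5
        move.l  8(sp),a4

        move.l  (a4)+,a0                ; first tile, four layers: a0,a1,a2,d2
        add.l   (a5)+,a0
        move.l  (a4)+,a1
        add.l   (a5)+,a1
        move.l  (a4)+,a2
        add.l   (a5)+,a2
        move.l  (a4)+,d2
        add.l   (a5)+,d2

        move.l  (a4)+,a3                ; second tile, four layers: a3,a4,a5,d5
        add.l   (a5)+,a3
        move.l  (a4)+,d0
        add.l   (a5)+,d0
        move.l  (a4)+,d1
        add.l   (a5)+,d1
        move.l  (a4)+,d5
        add.l   (a5)+,d5

        move.l  d0,a4
        move.l  a5,4(sp)                ; save screen map position
        move.l  d1,a5

        moveq   #8-1,d6
.loopy
        moveq   #8-1,d7
.loopx
        move.l  (a0)+,d0                ; layer 1 (no mask)
        and.l   (a1)+,d0                ; layer 2 mask
        or.l    (a1)+,d0                ; layer 2 bits
        and.l   (a2)+,d0
        or.l    (a2)+,d0
        exg     d2,a2                   ; layer 4 pointer lives in d2
        and.l   (a2)+,d0
        or.l    (a2)+,d0

        move.l  (a3)+,d1
        and.l   (a4)+,d1
        or.l    (a4)+,d1
        and.l   (a5)+,d1
        or.l    (a5)+,d1
        exg     d5,a5                   ; layer 4 pointer lives in d5
        and.l   (a5)+,d1
        or.l    (a5)+,d1

        swap    d1                      ; transpose: exchange the low word of d0
        eor.w   d0,d1                   ; with the high word of d1
        eor.w   d1,d0
        move.l  d0,(a6)
        add.l   d4,a6
        exg     d2,a2
        eor.w   d0,d1
        swap    d1
        move.l  d1,(a6)
        add.l   d4,a6
        exg     d5,a5
.nextx
        dbra    d7,.loopx

        add.l   d3,a6
.nexty
        dbra    d6,.loopy

        sub.l   #81920+40*16-4,a6       ; may be wrong, check
.nextz
        move.l  (sp),d0
        subq.l  #1,d0
        move.l  d0,(sp)
        bne     .loopz
.exit
        lea     24(sp),sp               ; drop counter, two pointers and the 12 reserved bytes
        movem.l (sp)+,d0-a6
        rts

Last edited by Thorham; 01 October 2011 at 00:49.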
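Since the routine has not been tested yet, a small reference model in C may help check the inner loop on a host machine before timing it on real hardware. The following is only a sketch of the two steps described in the post: the and/or chain that stacks the masked layers, and the word exchange done by the swap/eor.w sequence. The names `MaskedPair`, `composite`, `merge_words` and `merge_words_68k` are made up for illustration and are not part of the original code. It also assumes that the high word of each longword holds the first of the two 16-pixel lines.

```c
#include <stdint.h>

/* One interleaved entry of a masked layer: 32 mask bits, then 32 tile bits.
   With the and/or chain, a set mask bit keeps the pixel underneath. */
typedef struct {
    uint32_t mask;
    uint32_t bits;
} MaskedPair;

/* Stack three masked layers on top of an unmasked base layer,
   mirroring the and.l/or.l chain in .loopx. */
static uint32_t composite(uint32_t base, const MaskedPair layers[3])
{
    uint32_t v = base;
    for (int i = 0; i < 3; i++) {
        v &= layers[i].mask;   /* and.l (an)+,d0 */
        v |= layers[i].bits;   /* or.l  (an)+,d0 */
    }
    return v;
}

/* Plain version of the word merge.
   in:  a = A0:A1, b = B0:B1 (high:low words)
   out: *line0 = A0:B0, *line1 = A1:B1 */
static void merge_words(uint32_t a, uint32_t b,
                        uint32_t *line0, uint32_t *line1)
{
    *line0 = (a & 0xFFFF0000u) | (b >> 16);
    *line1 = (a << 16) | (b & 0x0000FFFFu);
}

/* Literal emulation of the swap/eor.w sequence, one statement per
   instruction, so it can be compared against merge_words. */
static void merge_words_68k(uint32_t d0, uint32_t d1,
                            uint32_t *line0, uint32_t *line1)
{
    d1 = (d1 << 16) | (d1 >> 16);   /* swap  d1     */
    d1 ^= d0 & 0xFFFFu;             /* eor.w d0,d1  */
    d0 ^= d1 & 0xFFFFu;             /* eor.w d1,d0  */
    *line0 = d0;                    /* move.l d0,(a6) */
    d1 ^= d0 & 0xFFFFu;             /* eor.w d0,d1  */
    d1 = (d1 << 16) | (d1 >> 16);   /* swap  d1     */
    *line1 = d1;                    /* move.l d1,(a6) */
}
```

Feeding the same test longwords to both merge functions should give identical results, which is a cheap way to confirm the eor.w sequence does what is intended.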