27 November 2022, 11:45 | #1 |
Prototron
Join Date: Mar 2015
Location: Glasgow, Scotland
Posts: 411
Advice to make real-time Bob X-Flipping faster
Hi folks!
I've had to start using a real-time sprite/bob flipping routine to save memory. It works perfectly, but it's dog-slow at the moment, so I'm looking for advice on how to speed it up.

The bobs are all made up of horizontal slices (see image below), which are 16 pixels tall and of varying width. This routine uses a lookup table to flip the slices into an image buffer and a mask buffer, which are then blitted to the screen (it sits inside a wider drawing routine). All the information for source/screen positions, modulos, etc. has been loaded in an earlier part of the drawing routine from a "sprite table" of pre-calculated values.

I've done two versions so far. The first flips all the words in one bitplane sequentially before moving on to the next bitplane. This one does one word, then the same word in the next bitplane, and so on (essentially doing it a "tile" at a time). Both run at around the same speed.

I'm kind of shocked that there's no Blitter function for this, as it seems really cumbersome to have to do such a common graphical task with the CPU. Regardless, any advice on how to speed this up would be most welcome, as I'm out of ideas. Many thanks!
Code:
;------------------------------------------------------------------------------
; XFLIP
;------------------------------------------------------------------------------
; a1 = SLICE MASK ADDRESS
; a2 = SLICE IMAGE ADDRESS
; a4 = LOOKUP TABLE
;---------------------------------------
; d3/d2 = Width counter
; d5 = BLITSIZE (Second BYTE is WORD-WIDTH+1)
; d6 = Lines counter
;---------------------------------------
	if 1=1
DRAWBOB_XFLIP:
;---------------------------------------
	btst	#0,OBJ_FLAGS(a0)	; Test XFLIP flag: 0 = Face LEFT | 1 = Face RIGHT
	bne.w	DRAWBOB_CLIP
;---------------------------------------
	movem.l	d2-a0/a3-a6,-(sp)	; Back Up Registers
;---------------------------------------
	lea	XFLIP_TABLE,a4		; Lookup Table (65,536 words = 128 KB)
	lea	XFLIP_BUFFER,a5		; Flipped Image Buffer
	lea	XFLIP_MASK_BUFFER,a6	; Flipped Mask Buffer
;---------------------------------------
FLIP_WIDTH:
	move.b	d5,d3			; Get WIDTH Counter
	sub.w	#2,d3			; Sub 2 (Don't need masked word & 0 for counter)
	move.w	d3,d2			; Backup
	add.w	d3,d3			; Double for offset value
;---------------------------------------
	adda.w	d3,a5			; Add to Image Flip Buffer
	adda.w	d3,a6			; Add to Mask Flip Buffer
	move.w	d2,d3			; Refresh WIDTH to word value
;---------------------------------------
	move.l	a5,a0			; Back up Image Flip Starting Pos
	move.l	a6,a3			; Back up MASK Flip Starting Pos
	move.l	a1,d5			; Back up MASK Starting Pos (a1 = mask)
	move.l	a2,d7			; Back up Image Starting Pos (a2 = image)
;--------------------------------------
	move.w	#16-1,d6		; LINE Counter (Always 16)
	moveq	#0,d0			; d0.l is used as a table index, so its upper word must be 0 on the first pass
;--------------------------------------
	macro	FLIPIMAGE
;--------------------------------------
	move.w	(a2),d0
	add.l	d0,d0
	move.w	(a4,d0.l),(a5)
	adda.w	#40,a2
	adda.w	#40,a5
	moveq	#0,d0
;--------------------------------------
	endm
;--------------------------------------
	macro	FLIPMASK
;--------------------------------------
	move.w	(a1),d0
	add.l	d0,d0
	move.w	(a4,d0.l),(a6)
	adda.w	#40,a1
	adda.w	#40,a6
	moveq	#0,d0
;--------------------------------------
	endm
BOB_FLIPLOOP:
;---------------------------------------
; FLIP IMAGE WORDS ON ALL 4 BITPLANES
;---------------------------------------
	FLIPIMAGE			; Bitplane 1
	FLIPIMAGE			; Bitplane 2
	FLIPIMAGE			; Bitplane 3
	FLIPIMAGE			; Bitplane 4
;---------------------------------------
; FLIP MASK WORDS ON ALL 4 BITPLANES
;---------------------------------------
	FLIPMASK			; Bitplane 1
	FLIPMASK			; Bitplane 2
	FLIPMASK			; Bitplane 3
	FLIPMASK			; Bitplane 4
;---------------------------------------
	dbf	d6,BOB_FLIPLOOP
	move.w	#16-1,d6		; Refresh Line counter
;---------------------------------------
	move.l	a0,a5			; Refresh Positions
	move.l	a3,a6
	move.l	d5,a1
	move.l	d7,a2
	adda.w	#2,a1			; Apply next WORD to be flipped
	adda.w	#2,a2
	suba.w	#2,a5
	suba.w	#2,a6
	move.l	a5,a0			; Store new Positions
	move.l	a6,a3
	move.l	a1,d5
	move.l	a2,d7
;---------------------------------------
	dbf	d3,BOB_FLIPLOOP		; Dec slice width Counter
;---------------------------------------
; FLIPPING DONE
; - LOAD REGISTERS FOR BLITTING
;---------------------------------------
	lea	XFLIP_MASK_BUFFER,a1
	lea	XFLIP_BUFFER,a2
;---------------------------------------
	movem.l	(sp)+,d2-a0/a3-a6	; Restore Registers
;---------------------------------------
	endif
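A portable C model of what the routine computes can help when checking its output. This is a sketch under assumptions: `flip_table` stands in for XFLIP_TABLE (65,536 word entries, i.e. 128 KB), and `flip_row` is a hypothetical helper that mirrors one bitplane row of `n` words by reversing the word order and passing each word through the table. It ignores the extra shift word and the 40-byte row stride of the real interleaved layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for XFLIP_TABLE: entry v holds v with its 16 bits mirrored. */
static uint16_t flip_table[65536];

/* Mirror a 16-bit word one bit at a time (slow; only used to build the table). */
static uint16_t reverse16_slow(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 16; i++) {
        r = (uint16_t)((r << 1) | (v & 1));  /* lowest remaining bit goes in next */
        v >>= 1;
    }
    return r;
}

static void build_flip_table(void)
{
    for (uint32_t v = 0; v < 65536; v++)
        flip_table[v] = reverse16_slow((uint16_t)v);
}

/* Flip one row of n words horizontally: the leftmost source word becomes the
   rightmost destination word, and each word's 16 pixels are mirrored. */
static void flip_row(const uint16_t *src, uint16_t *dst, size_t n)
{
    assert(src != dst);                      /* not an in-place flip */
    for (size_t i = 0; i < n; i++)
        dst[n - 1 - i] = flip_table[src[i]];
}
```

For example, the two-word row {0xF000, 0x0000} (four pixels at the far left) comes out as {0x0000, 0x000F} (four pixels at the far right).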
27 November 2022, 15:53 | #2 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
Let's start with a few simple micro-optimizations:
Code:
; adda.w #40,ax
	lea	(40,ax),ax
; y = 1 to 8
; adda.w #y,ax
	addq.w	#y,ax
; suba.w #y,ax
	subq.w	#y,ax
; y = -128 to 127
; move.w #y,dx
	moveq	#y,dx		; note: sign-extends into all 32 bits of dx
Code:
	move.w	(a4,d0.l),(<OFFSET>*40,a5)
	...
	move.w	(a4,d0.l),(<OFFSET>*40,a6)
Code:
init:
	move.l	#XFLIP_TABLE/2,d0	; upper word of d0 = upper word of table/2 (table must be 128 KB aligned)
loop:
; move.w (a2),d0
; add.l d0,d0
; move.w (a4,d0.l),(a5)
	move.w	(a2),d0
	move.l	d0,a0		; +4
	add.l	a0,a0
	move.w	(a0),(a5)	; -6
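The last snippet depends on where the table lives. `move.l #XFLIP_TABLE/2,d0` sets the upper word of d0, `move.w (a2),d0` replaces only the low word, and doubling the result gives XFLIP_TABLE + 2*value only when the low word of XFLIP_TABLE/2 is zero, i.e. when the table starts on a 128 KB ($20000) boundary. This C sketch models that register arithmetic; the function names and test addresses are made up for illustration.

```c
#include <stdint.h>

/* Models: move.l #TABLE/2,d0 / move.w (a2),d0 / move.l d0,a0 / add.l a0,a0
   and returns the address the final move.w (a0),(a5) would read from. */
static uint32_t entry_address(uint32_t table, uint16_t value)
{
    uint32_t d0 = table / 2;                   /* move.l #TABLE/2,d0           */
    d0 = (d0 & 0xFFFF0000u) | value;           /* move.w (a2),d0: low word only */
    uint32_t a0 = d0;                          /* move.l d0,a0                 */
    a0 += a0;                                  /* add.l a0,a0                  */
    return a0;
}

/* The shortcut yields table + 2*value for every value only if this is true. */
static int table_is_usable(uint32_t table)
{
    return (table & 0x1FFFFu) == 0;            /* 128 KB ($20000) aligned */
}
```

With a misaligned table the upper-word bits are kept but the low word of TABLE/2 is lost, so every lookup lands in the wrong place.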
27 November 2022, 17:41 | #3 |
Registered User
Join Date: May 2018
Location: Ireland
Posts: 672
Would a LUT of words/bytes help? The value in direction A would be an offset from a base address, and the value stored there would be its opposite, direction B. The LUT would be 256 entries long for bytes and 64K for words, or would that be too slow? (It's a very long time since I did 68k assembly, so I realise my post might be useless, lol.)
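A byte-sized table as suggested would only need 256 entries, at the cost of two lookups and a byte swap per word. A minimal C sketch of the idea, where `rev8` and `flip_word_bytes` are hypothetical names:

```c
#include <stdint.h>

static uint8_t rev8[256];   /* rev8[b] = b with its 8 bits mirrored */

static void build_rev8(void)
{
    for (int b = 0; b < 256; b++) {
        uint8_t r = 0;
        for (int i = 0; i < 8; i++)
            if (b & (1 << i))
                r |= (uint8_t)(0x80 >> i);     /* bit i -> bit 7-i */
        rev8[b] = r;
    }
}

/* Mirror a 16-pixel word: each byte goes through the table, and the two
   bytes swap places (the mirrored low byte becomes the high byte). */
static uint16_t flip_word_bytes(uint16_t v)
{
    return (uint16_t)((rev8[v & 0xFF] << 8) | rev8[v >> 8]);
}
```

It trades speed for memory: the word-sized table does the same job in a single lookup but needs 128 KB.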
27 November 2022, 17:49 | #4 | |
Prototron
Join Date: Mar 2015
Location: Glasgow, Scotland
Posts: 411
I'll give them a try, and definitely unroll that dbf (I tried it with the Macros and it did feel a bit smoother).
27 November 2022, 17:49 | #5 |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
I guessed there was a small typo, but I was wrong: no typo, see a/b's post below.
EDIT: Those adda.w #40 instructions probably eat up quite a few cycles, though. Could you organize your data/code differently to avoid them? For example, if you did the mask and the image one after the other, you could use a separate address register for every plane. I'm not sure I understand your code correctly, but that part looks a bit wasteful to me.
PS: The absence of an easy Blitter flip is annoying, true. You can use the Blitter to flip, but it takes four passes (AB->D type) plus the pass that draws the bob, so in most cases it's probably slower than the table approach.
Last edited by chb; 27 November 2022 at 19:06.
27 November 2022, 18:51 | #6 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
27 November 2022, 19:05 | #7 |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
27 November 2022, 19:13 | #8 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 387
Are these bobs arranged for interleaved blitting, and if so, are those four masks all the same?
If they are, you only need to flip the first one and copy the result to the others. I also wonder whether it'd be less memory overhead to blit each plane separately, so you only need to store one copy of the mask. Then you might not need to do all this flipping, or at least you'd do it for fewer bobs.
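If the four mask rows of an interleaved bob really are identical, only one of them has to go through the flip per row, and the result can be copied to the other three. A sketch under assumptions: `PLANES`, `MAX_WORDS` and `flip_mask_interleaved` are hypothetical, rows are modelled as plain word arrays rather than memory with a 40-byte stride, and `reverse16` stands in for the XFLIP_TABLE lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PLANES    4
#define MAX_WORDS 20   /* hypothetical: a 320-pixel row is 20 words */

/* Stand-in for the XFLIP_TABLE lookup: mirror the 16 bits of v. */
static uint16_t reverse16(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 16; i++, v >>= 1)
        r = (uint16_t)((r << 1) | (v & 1));
    return r;
}

/* src/dst: PLANES consecutive mask rows of n words (interleaved layout).
   Only the first row is flipped; the other rows are plain copies of it. */
static void flip_mask_interleaved(uint16_t src[PLANES][MAX_WORDS],
                                  uint16_t dst[PLANES][MAX_WORDS], size_t n)
{
    assert(n <= MAX_WORDS);
    for (size_t i = 0; i < n; i++)
        dst[0][n - 1 - i] = reverse16(src[0][i]);      /* one flip per word */
    for (int p = 1; p < PLANES; p++)
        memcpy(dst[p], dst[0], n * sizeof dst[0][0]);  /* cheap copies      */
}
```

This cuts the mask lookups per row from four to one; storing only one mask row per line and blitting each plane separately, as suggested, would remove the copies too.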
27 November 2022, 19:19 | #9 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 387
If you have the registers to spare, it might be worth loading one up with #40 and replacing:
Code:
	lea	(40,a0),a0
with:
Code:
	add.w	d0,a0		; d0 = 40, loaded once before the loop
27 November 2022, 20:02 | #10 | |
Prototron
Join Date: Mar 2015
Location: Glasgow, Scotland
Posts: 411
Thanks for all the suggestions, folks. This is great stuff!
27 November 2022, 20:12 | #11 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
A common trick to speed up X-flipping of bobs is to store each bob with half of its lines facing left and half facing right. When blitting, you blit the half that already faces the correct direction as normal and only flip the other half (well, you output the flipped result; you don't actually flip the bob data).
In your case, that would mean storing each slice with half of its lines facing right and half facing left.
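A C sketch of that storage scheme, assuming (hypothetically) that even lines of a 16-line slice are stored facing right and odd lines facing left; any fixed split works the same way. Whichever direction is requested, exactly half the lines are flipped and the other half are used as stored, so both directions cost the same. Rows are simplified to a single word, and `pack_slice`/`render_slice` are illustrative names; in a real routine the as-stored half would presumably be blitted straight from the sheet rather than copied.

```c
#include <stdint.h>

#define LINES 16

/* Stand-in for the table lookup: mirror the 16 bits of v. */
static uint16_t reverse16(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 16; i++, v >>= 1)
        r = (uint16_t)((r << 1) | (v & 1));
    return r;
}

/* Hypothetical layout: line y is stored facing right if y is even. */
static int stored_facing_right(int y) { return (y & 1) == 0; }

/* Done once, offline: build the mixed-direction slice from a right-facing one. */
static void pack_slice(const uint16_t right[LINES], uint16_t packed[LINES])
{
    for (int y = 0; y < LINES; y++)
        packed[y] = stored_facing_right(y) ? right[y] : reverse16(right[y]);
}

/* Produce the slice facing the requested way and return how many lines
   needed flipping (always LINES / 2, whichever way it faces). */
static int render_slice(const uint16_t packed[LINES], uint16_t out[LINES],
                        int want_right)
{
    int flipped = 0;
    for (int y = 0; y < LINES; y++) {
        if (stored_facing_right(y) == (want_right != 0)) {
            out[y] = packed[y];              /* already the right way: use as-is */
        } else {
            out[y] = reverse16(packed[y]);   /* wrong way round: flip */
            flipped++;
        }
    }
    return flipped;
}
```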
28 November 2022, 10:29 | #12 |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
28 November 2022, 11:34 | #13 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Yup, so you have no 'cheap' vs 'expensive' frames to consider (half the lines get flipped whichever way the bob faces).
28 November 2022, 15:17 | #14 | |
Prototron
Join Date: Mar 2015
Location: Glasgow, Scotland
Posts: 411
I've actually got an old routine which flips the slices in the sheets anyway (from when I still stored both left and right versions), so I could easily modify it to do just half of each slice. I'd never thought of trying that myself, so thanks for the suggestion.
28 November 2022, 21:06 | #15 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
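I think there was already a thread, and the fastest was a 32K-word table plus rolling a carry bit around?
Gist:
Code:
	add.w	d0,d0		; bit 15 -> X flag, low word = (v & $7FFF)*2
	move.w	(a0,d0.w),d0	; a0 = middle of table (signed index); move leaves X alone
	addx	d0,d0		; d0*2 + X: old bit 15 becomes bit 0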
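A C model of the 32K-table trick, to check that the arithmetic works out: `add.w d0,d0` pushes bit 15 into the X flag and leaves (v & $7FFF)*2 as the index, the table holds the 15-bit mirror of the low 15 bits, and `addx d0,d0` shifts that left while bringing the old bit 15 in as bit 0. The model uses an unsigned index; on the 68000 the `d0.w` index is sign-extended, so a0 has to point at the middle of the table, which is then addressed from -32 KB to +32 KB around it. `rev15_table` and `flip_word_carry` are hypothetical names.

```c
#include <stdint.h>

/* rev15_table[i] = the low 15 bits of i mirrored within 15 bits (32K words). */
static uint16_t rev15_table[32768];

static void build_rev15_table(void)
{
    for (uint32_t i = 0; i < 32768; i++) {
        uint16_t r = 0;
        for (int b = 0; b < 15; b++)
            if (i & (1u << b))
                r |= (uint16_t)(1u << (14 - b));   /* bit b -> bit 14-b */
        rev15_table[i] = r;
    }
}

/* Mirrors the sequence: add.w d0,d0 / move.w (a0,d0.w),d0 / addx d0,d0 */
static uint16_t flip_word_carry(uint16_t v)
{
    unsigned x = v >> 15;                      /* add.w: bit 15 -> X flag       */
    uint16_t index = (uint16_t)(v << 1);       /* add.w: (v & $7FFF) * 2        */
    uint16_t d0 = rev15_table[index / 2];      /* move.w: X is not affected     */
    return (uint16_t)((d0 << 1) | x);          /* addx: d0 + d0 + X             */
}
```

Checking `flip_word_carry` against a bit-by-bit reversal for all 65,536 inputs is a cheap way to validate a generated table before porting it.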
29 November 2022, 02:39 | #16 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
If you could serialize the reads and writes with movem, it would be faster than what I posted above (which wouldn't work with movem: movem writes whole registers, sign-extending each word with movem.w, so it would wipe the upper words). Otherwise, the carry version is 2+4 cycles slower (the 4 because it can't do a table read and an output write in a single move; it needs an extra opcode) and has one more memory access.
But it has advantages if you have memory constraints, and it's a nice trick overall.
30 November 2022, 16:10 | #17 |
Registered User
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
I didn't read the entire thread, but just in case no one has posted it before, there was quite an interesting conversation about tile flipping here:
https://eab.abime.net/showthread.php...5555555&page=2
It looks like there's no single best code: table-based and logic-based approaches each have their pros and cons.
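For comparison, a logic-based flip needs no table at all: swapping the two bytes, then the nibbles in each byte, then bit pairs, then adjacent bits mirrors a 16-bit word in four mask-and-shift stages. A minimal C sketch (the function name is hypothetical); on a 68000 each stage costs several instructions, so it trades speed for memory compared with a lookup table.

```c
#include <stdint.h>

/* Mirror a 16-bit word with four swap stages and no lookup table. */
static uint16_t flip_word_logic(uint16_t v)
{
    uint32_t x = v;
    x = ((x >> 8) & 0x00FFu) | ((x & 0x00FFu) << 8);   /* swap bytes         */
    x = ((x >> 4) & 0x0F0Fu) | ((x & 0x0F0Fu) << 4);   /* swap nibbles       */
    x = ((x >> 2) & 0x3333u) | ((x & 0x3333u) << 2);   /* swap bit pairs     */
    x = ((x >> 1) & 0x5555u) | ((x & 0x5555u) << 1);   /* swap adjacent bits */
    return (uint16_t)x;
}
```

For example, 0x1234 becomes 0x3412, then 0x4321, then 0x1C84, and finally 0x2C48.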
30 November 2022, 17:02 | #18 |
Prototron
Join Date: Mar 2015
Location: Glasgow, Scotland
Posts: 411
Thanks for all the help, folks!
I've been working my way through the suggestions, and the flipping is noticeably faster now. There's still a bit to go, but I've learned quite a few tricks and tips from this great thread.
02 December 2022, 10:50 | #19 |
Prototron
Join Date: Mar 2015
Location: Glasgow, Scotland
Posts: 411
Just a little additional question while this thread is still relatively warm:
I'm trying to use this instruction:
	move.w	(a4,d0.l*2),(a5)
but the assembler (vasm) complains that d0.l*2 isn't supported, even though I checked and it works in Asm-One. I'm hoping it's just a case of needing a different module or something, because I think this would save all the (many) add.l d0,d0 lines I currently have. I've tried a few different executables from a download pack (vasmm68k_madmac/vasmm68k_mod/vasmm68k_std), but none of them resolves the issue, as I don't really know what the versions do. I find the vasm documentation quite vague, so any advice on what to do or install would be great. Thanks!
02 December 2022, 11:06 | #20 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
That's a 68020+ addressing mode: if you aren't targeting 68000/68010 as well, you need to tell the assembler (for example with a directive or a command line switch) that the code is for (at least) 68020.
|