#1
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,408
C++ to Assembler conversion (speedup) memory copy hack
Hi,
In this game port that I'm working on there is some code that copies dirty rectangles to a chunky buffer, and sometimes updates the entire screen (it's not my code, BTW). Can anyone give me a faster assembly-based version, or any general speed-up tips for plain C? It's for 030+ and AGA only, with a 320x200 8-bit display.
Code:
static byte *backBuffer;

backBuffer = (byte*)AllocMem(64000, MEMF_FAST);

void updateBackBuffer(byte *src, int x, int y, int w, int h)
{
    byte *dst;

    dst = (byte*)backBuffer + y*320 + x;
    do {
        CopyMem(src, dst, w);
        dst += 320;
        src += 320;
    } while (--h);
}
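For experimenting off-Amiga, here is a minimal portable sketch of the same row-by-row copy. It assumes standard memcpy as a stand-in for exec's CopyMem; note the swapped argument order (CopyMem takes the source first, memcpy the destination first). The names copy_rect and SCREEN_W are hypothetical, and unlike the do/while version this loop copies nothing when h == 0 instead of wrapping around.

```c
#include <stddef.h>
#include <string.h>

#define SCREEN_W 320 /* hypothetical name: bytes per row of a 320x200 8-bit chunky buffer */

/* Copy a w x h dirty rectangle into dst at (x, y).
 * As in updateBackBuffer, src is assumed to already point at the
 * rectangle's top-left pixel inside a SCREEN_W-wide source buffer.
 * memcpy stands in for CopyMem(src, dst, w); note the argument order. */
void copy_rect(unsigned char *dst, const unsigned char *src,
               int x, int y, int w, int h)
{
    unsigned char *d = dst + (size_t)y * SCREEN_W + x;

    while (h-- > 0) {            /* h == 0 copies nothing */
        memcpy(d, src, (size_t)w);
        d   += SCREEN_W;         /* step both pointers down one full row */
        src += SCREEN_W;
    }
}
```

Only the inner memcpy/CopyMem call does real work, so any speed-up has to come from that call or from copying fewer bytes.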
#2
Registered User
Join Date: Mar 2009
Location: UK
Posts: 457
Does CopyMem use the blitter for the copy?
It seems CopyMem is called once per horizontal line of the rectangle. The secret to speeding this up is knowing what CopyMem does and can do (for example, whether it uses the blitter). Again, these are just guesses on my part, as I have never coded for the Amiga.
#3
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
#4
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
Code:
; a0: source
; a1: dest
updateBackBuffer_full
    move.w  #320*200/4-1,d0
.loop
    move.l  (a0)+,(a1)+
    dbf     d0,.loop
    rts
#5
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,408
I was hoping you could do something clever with the memory pointers, but maybe that's not possible.
Someone else came up with this, but I'm not sure whether it would actually be any faster:
Code:
src = src + y1 * width + x1;
dst = dst + y2 * width + x2;
for (i = 0; i < copyheight; i++) {
    CopyMem(src, dst, copywidth);
    src += width;
    dst += width;
}
Code:
if(x == 0 & y == 0) {
    CopyMemQuick(src, backBuffer, w*h);
} else {
    src = src + y1 * width + x1;
    dst = dst + y2 * width + x2;
    for (i = 0; i < copyheight; i++) {
        CopyMem(src, dst, copywidth);
        src += width;
        dst += width;
    }
}
#6
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
The first version will not be any faster, since the inner loop (i.e. the time-consuming part) is exactly the same. Your version is fine (once you remove the bug, that is :P), but IMHO you'd be better off spending time optimising other parts of the game anyway.
Last edited by StingRay; 19 January 2010 at 09:57. Reason: some corrections
#7
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
Have you looked at CopyMem to see whether it's worth optimizing, i.e. whether it's anything cleverer than a plain for loop or the equivalent byte-copy loop in assembly?
This is a generic byte-copy loop to compare execution speed with:
Code:
; A0 = source        A1 = destination
; D0 = x             D1 = y
; D2 = width         D3 = height
updateBackBuffer
    mulu.w  #320,d1         ; row offset = y*320
    add.l   d1,a1
    add.w   d0,a1           ; a1 = dest + y*320 + x
    move.w  #320,d1
    sub.w   d2,d1           ; d1 = gap to the next row (320 - width)
    subq.w  #1,d2           ; dbf counters run to -1
    subq.w  #1,d3
.nextrow
    move.w  d2,d0
.copy
    move.b  (a0)+,(a1)+
    dbf     d0,.copy
    add.w   d1,a0
    add.w   d1,a1
    dbf     d3,.nextrow
    rts
#8
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
AFAIR CopyMem has several routines and copies words/longwords when possible, etc. So I still don't think it's worth spending much time optimising the copy loop; there should be LOTS of other things that can (and should) be optimised in such a game.
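To illustrate what "copies words/longwords when possible" means in general, here is a sketch in C. It is not exec's actual CopyMem code; the name copy_aligned is hypothetical, and it assumes the common strategy of byte-copying a few leading bytes so that, when source and destination share the same alignment modulo 4, the bulk can move 32 bits at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-copy until dst is 4-byte aligned, move the middle 4 bytes at a
 * time, then finish the tail byte by byte. The wide path is only taken
 * when src and dst have the same alignment, so after the lead-in both
 * pointers are aligned. */
void copy_aligned(void *dst_v, const void *src_v, size_t n)
{
    unsigned char *dst = dst_v;
    const unsigned char *src = src_v;

    if (((uintptr_t)dst & 3) == ((uintptr_t)src & 3)) {
        while (n > 0 && ((uintptr_t)dst & 3) != 0) {   /* leading bytes */
            *dst++ = *src++;
            n--;
        }
        while (n >= 4) {                               /* longword body */
            memcpy(dst, src, 4);   /* usually compiles to one 32-bit move, without aliasing issues */
            dst += 4;
            src += 4;
            n -= 4;
        }
    }
    while (n > 0) {                /* tail, or mismatched alignment */
        *dst++ = *src++;
        n--;
    }
}
```

If the alignments differ, the sketch falls back to a plain byte loop for everything, which is the slow case a real implementation would try to avoid.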
#9
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
Yeah, the first thing that comes to mind is to convert all graphics to planar and ditch the whole C2P (chunky-to-planar) thing. It may be a lot of work, but the speed improvement would probably be worth it.
Since no chunky buffer is directly visible on screen, can't you just keep a single buffer, restore changes, draw the new graphics and merge the dirty rectangles? Why do you need both a front and a back chunky buffer? Also, the AmigaOS memory allocation functions can be real performance killers, so never allocate memory repeatedly while drawing or updating unless it's absolutely necessary.
#10
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,408
#11
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,408
#12
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
#13
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,650
Doing simple things with the proper method is hacking now? That means I'm finally a hacker!!
Align both buffers (a0, a1) to an even address. Pseudocode:
save stack pointer
REPT copyareabytesize/4/14
    movem.l (a0),d0-d7/a2-a7
    movem.l d0-d7/a2-a7,(a1)+
    lea     14*4(a0),a0
ENDR
restore stack pointer
copy the last few words with any method you like
If source and destination are within 64K of each other (unlikely...), you can use the a0 register as well for a small gain: replace 14 with 15 in the code. You can also remove the lea line and replace (a0) with a calculated offset(a0) if you have a capable assembler.
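C has no equivalent of movem, but the structure of the pseudocode (move 14 longwords = 56 bytes per unrolled step, then mop up the remainder) can be sketched as below. The 56-byte chunk mirrors the 14 registers d0-d7/a2-a7; the name copy_chunked is hypothetical, and whether a compiler turns the fixed-size memcpy into register moves depends on the compiler, so treat this as an illustration of the block/remainder split rather than a movem replacement. For a 64000-byte screen it gives 1142 chunks (63952 bytes) plus 48 leftover bytes.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK (14 * 4)   /* 14 registers x 4 bytes, as in d0-d7/a2-a7 */

/* Copy n bytes in CHUNK-sized blocks, like the REPT ... ENDR body,
 * then copy the last few bytes separately. */
void copy_chunked(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t blocks = n / CHUNK;
    size_t rest   = n % CHUNK;

    while (blocks--) {
        memcpy(dst, src, CHUNK);   /* stands in for one movem.l load + store pair */
        dst += CHUNK;
        src += CHUNK;
    }
    memcpy(dst, src, rest);        /* "copy the last few words with any method you like" */
}
```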
#14
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
#15
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,650
Drunkard revision.
Set a0 and a1 to the end of the buffers
save stack pointer
REPT copyareabytesize/4/14
    lea     -14*4(a0),a0
    movem.l (a0),d0-d7/a2-a7
    movem.l d0-d7/a2-a7,-(a1)
ENDR
restore stack pointer
copy the last few words with any method you like
Beats move.l (a0)+,(a1)+ and dbf anyway. Muhaha.
#16
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
#17
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,650
He can use as few registers as he wants ofc, that's why I explained the 14/15 factor.
#18
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
BTW, what happened with this memory copy thing? Was the CopyMem routine optimized? The asm replacement for updateBackBuffer I wrote is slow and generic, and if the original C function calling CopyMem is faster, then there's probably no performance gain to be found in updateBackBuffer.
There are 2 errors in the if(x == 0 & y == 0) line. First, the single & is the bitwise operator, while the double && is the boolean one. It will still work in this particular case, though: == binds tighter than &, so the line parses as (x == 0) & (y == 0), and in C a comparison yields exactly 1 (true) or 0 (false), so the bitwise AND of two true results is again 1, which evaluates as true. Second, you probably intended to check whether the width was 320: that's when there's no gap between the end of one line and the beginning of the next, so the whole rectangle can be treated and copied as one contiguous block of bytes.
#19
Moderator
Join Date: Nov 2001
Location: Germany
Posts: 873
In general, for performance the 'for' loop is what C compilers handle best, e.g.:
Code:
void updateBackBuffer(byte *src, int x, int y, int w, int h)
{
    int i, j;
    byte *dst;

    dst = (byte*) backBuffer + y*320 + x;
    j = 320 - w;                /* gap from the end of one row to the start of the next */
    for (; h > 0; h--, dst += j, src += j)
        for (i = w; i > 0; i--)
            *dst++ = *src++;
}
#20
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284