21 May 2010, 19:50 | #1 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 339
|
Coders challenge: Memcopy
I have a small and perhaps simple challenge for anyone interested. Assume an asm function that takes 3 parameters:
A0 = Source (fastmem)
A1 = Destination (fastmem)
D0 = Bytes
It's a simple memory-copy function. D0 can have values $1-$FF. What do you think is the fastest way of doing this? A simple move.b loop would do the job, but I guess it's more efficient to use .w/.l where possible (but how to check for that without wasting precious CPU time?)
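For reference, the baseline move.b loop mentioned above might look like the sketch below. The label CopyBytes is made up for illustration, and it assumes d0.w holds the byte count ($1-$FF) and is never zero.

```asm
; Baseline sketch: copy one byte per iteration.
; In:  a0 = source, a1 = destination, d0.w = byte count ($1-$FF, never 0)
; Trashes d0, a0, a1.
CopyBytes                               ; hypothetical label
        subq.w  #1,d0                   ; dbf runs until d0 = -1, so bias by one
.loop   move.b  (a0)+,(a1)+
        dbf     d0,.loop
        rts
```

Every faster variant in the thread has to beat this loop on the CPU it actually runs on, so it is a useful thing to time against.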
21 May 2010, 20:07 | #2 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Code:
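CopyMem
        lsr.w   #1,d0           ; bit 0 of the count -> carry
        bcc.b   .nobyte
        move.b  (a0)+,(a1)+     ; odd count: copy one byte
.nobyte lsr.w   #1,d0           ; bit 1 of the count -> carry
        bcc.b   .noword
        move.w  (a0)+,(a1)+     ; copy one word
.noword subq.w  #1,d0           ; d0 = longword count - 1 for dbf
        bmi.b   .exit           ; no longwords left
.copy   move.l  (a0)+,(a1)+
        dbf     d0,.copy
.exit   rts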
21 May 2010, 20:09 | #3 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,515
|
Lacks important information: does it have to work on 68000/68010?
|
21 May 2010, 20:12 | #4 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
|
21 May 2010, 20:13 | #5 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,515
|
|
21 May 2010, 20:27 | #6 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
|
21 May 2010, 20:53 | #7 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 339
|
020/030/060 allowed
|
21 May 2010, 21:20 | #8 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 339
|
@StingRay:
That's a cute solution. I had an idea about precalculating the entire thing and making a jump table of some sort.
21 May 2010, 21:24 | #9 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Quote:
Code:
CopyMem
        lsr.w   #1,d0
        bcc.b   .nobyte
        move.b  (a0)+,(a1)+
.nobyte lsr.w   #1,d0
        bcc.b   .noword
        move.w  (a0)+,(a1)+
.noword subq.w  #1,d0
        bmi.b   .exit
        moveq   #-1,d2
        sub.w   d0,d2           ; d2 = -(longword count)
        lea     .TAB(pc),a2
        jmp     (a2,d2.w*2)     ; each move.l below is 2 bytes
        REPT    256/4
        move.l  (a0)+,(a1)+
        ENDR
.TAB
.exit   rts
|
21 May 2010, 21:39 | #10 |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
|
Hmm... wasn't an unaligned access quite slow on 020+?
I would also check whether the source and destination pointers are 32-bit aligned, do a few byte copies for alignment when required, and then copy the rest with move.l.
21 May 2010, 21:43 | #11 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Yes, that's the fastest approach. However, since he's only copying max. 255 bytes I didn't bother aligning the pointers as I don't think it'll make much of a difference in this case.
|
21 May 2010, 22:56 | #12 |
Moderator
Join Date: Jan 2002
Location: France
Posts: 491
|
Quote:
Code:
        ...
.noword neg.w   d0
        lea     .TAB(pc),a2
        jmp     (a2,d0.w*2)
        ...
|
22 May 2010, 00:56 | #13 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
|
22 May 2010, 06:53 | #14 |
Banned
Join Date: Jan 2007
Location: France
Posts: 655
|
jmp .TAB(PC,d0.w*2) is shorter!
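Putting the neg.w trick and the PC-relative jmp together, the whole routine could look roughly like the sketch below. It is untested and rests on a few assumptions: a 68020+ (scaled index), a count of $1-$FF in d0.w with the upper byte of the word clear, and an assembler that accepts REPT/ENDR as used earlier in the thread. One thing worth checking in your assembler's listing: .TAB sits about 130 bytes past the jmp, which is outside the signed 8-bit displacement of the short indexed form, so the jmp would need the 68020 full-format extension word with a 16-bit displacement. That makes it 6 bytes, still shorter than lea + jmp at 8 bytes.

```asm
; Sketch only, assumptions as stated in the text above.
; In:  a0 = source, a1 = destination, d0.w = byte count ($1-$FF)
; Trashes d0, a0, a1.
CopyMem
        lsr.w   #1,d0                   ; bit 0 of the count -> carry
        bcc.b   .nobyte
        move.b  (a0)+,(a1)+             ; odd count: copy one byte
.nobyte lsr.w   #1,d0                   ; bit 1 of the count -> carry
        bcc.b   .noword
        move.w  (a0)+,(a1)+             ; copy one word
.noword neg.w   d0                      ; d0 = -(longword count), 0 to -63
        jmp     .TAB(pc,d0.w*2)         ; jump back 2 bytes per longword
        REPT    256/4
        move.l  (a0)+,(a1)+             ; 2 bytes each
        ENDR
.TAB    rts                             ; d0 = 0 lands here directly
```

With no longwords left, d0 is 0 and the jmp goes straight to the rts, which is why the subq/bmi pair can be dropped.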
|
22 May 2010, 11:34 | #15 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
|
22 May 2010, 23:29 | #16 |
Registered User
Join Date: May 2005
Location: Cheshire, UK
Age: 56
Posts: 322
|
What's wrong with movem if it's not just limited to bytes?
|
23 May 2010, 14:02 | #17 |
Moderator
Join Date: Jan 2002
Location: France
Posts: 491
|
|
23 May 2010, 16:45 | #18 |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
movem has setup overhead, so it's only good on big copies. It's not any faster on the 68060 and it's slower on the 68040. move16 for the 68040 and 68060 also has setup overhead and is only good for large copies.
An unrolled loop is good on all of the 68k family except the 68060, where it's the same speed. However, a blind "calculated" jmp into an unrolled loop like the one shown above will not gain much, if anything, on the 68040+ because the branch (jmp) cannot be predicted and is slow. Yes, this even applies to the 68040. I don't know about the 68020/68030, but I would expect it to be better for them. It's always good to test, as it's not always obvious what is fastest.
The 68040+ handles unaligned copies well when the data is in the cache. The 68020/68030 hates unaligned everything. I like to align the data as I go unless the size is guaranteed to be tiny (<16 bytes). It's more important that the destination (writes) is aligned. Something like this...
Code:
        move.l  a1,d1
        btst    #0,d1           ; destination on an odd address?
        beq.b   .daligned2
        subq.l  #1,d0
        addq.l  #1,d1           ; keep d1 in step with a1
        move.b  (a0)+,(a1)+
.daligned2:
        btst    #1,d1           ; word aligned but not longword aligned?
        beq.b   .daligned4
        move.w  (a0)+,(a1)+
        subq.l  #2,d0
.daligned4:
http://aminet.net/util/boot/CopyMem.lha
http://aminet.net/util/boot/CopyMem.readme
I might be a little biased though as I wrote it.
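The alignment fragment above could be extended into a complete routine along the lines below. This is only a sketch under stated assumptions, not the code from the CopyMem archive: the label CopyAligned is made up, the byte count is taken as a full longword in d0.l, and it targets the 68020+ because the source may still be misaligned when the word and longword moves run.

```asm
; Sketch: align the destination, copy longwords, then copy the tail.
; In:  a0 = source, a1 = destination, d0.l = byte count (0-255)
; Trashes d0, d1, a0, a1.
CopyAligned                             ; hypothetical label
        moveq   #4,d1
        cmp.l   d1,d0                   ; fewer than 4 bytes?
        bcs.b   .tail                   ; yes: plain byte copy
        move.l  a1,d1
        btst    #0,d1                   ; destination on an odd address?
        beq.b   .even
        move.b  (a0)+,(a1)+
        addq.l  #1,d1                   ; keep d1 in step with a1
        subq.l  #1,d0
.even   btst    #1,d1                   ; word aligned but not longword aligned?
        beq.b   .aligned
        move.w  (a0)+,(a1)+
        subq.l  #2,d0
.aligned
        move.l  d0,d1
        lsr.l   #2,d1                   ; d1 = longword count, may be 0
        beq.b   .tail
        subq.l  #1,d1                   ; bias for dbf
.long   move.l  (a0)+,(a1)+             ; destination is longword aligned here
        dbf     d1,.long
.tail   and.w   #3,d0                   ; 0-3 bytes remain
        beq.b   .done
        subq.w  #1,d0                   ; bias for dbf
.byte   move.b  (a0)+,(a1)+
        dbf     d0,.byte
.done   rts
```

The up-front check for fewer than 4 bytes keeps the alignment steps from copying more than was asked for. Whether the extra tests pay off for copies this small is exactly the kind of thing that needs measuring on the target CPU.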
24 May 2010, 10:57 | #19 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Of course I am! Can be easily proved:
input: d0.w: 4
        lsr.w   #1,d0           ; d0.w = 2
        lsr.w   #1,d0           ; d0.w = 1
        subq.w  #1,d0           ; d0.w = 0
        neg.w   d0              ; -> d0.w = 0 -> nothing will be copied -> bug!
Edit: And after I wrote all that I finally noticed that you removed the subq instruction already. Totally missed that! So yes, your optimization is fine!
Last edited by StingRay; 24 May 2010 at 11:06. Reason: Edit: facepalm ;D
24 May 2010, 10:59 | #20 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,515
|
There is no generic and fastest possible Amiga 68K memory copy routine.
If you want the fastest possible memory copies, you have to select the one (or multiple ones) that works best with your algorithms (or modify your algorithm). Is the memory aligned? Is the length aligned? Short copy? Long copy? CPU type? Even a simple memory copy isn't simple. (Oops, this was already answered previously.)