English Amiga Board


21 May 2010, 19:50   #1
oRBIT
Zone Friend
 
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 339
Coders challenge: Memcopy

I have a small and perhaps simple challenge for anyone interested. Assume an asm-function that takes 3 parameters:
A0=Source (fastmem)
A1=Destination (fastmem)
D0=Bytes
It's a simple memory-copy function. D0 can have values $1-$FF.
What do you think is the fastest way of doing this? A simple move.b loop would do the job, but I guess it's more efficient to use .w/.l where possible (but how do you check for that without wasting precious CPU time?)
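For reference, the simple move.b loop mentioned above could look something like the sketch below. The label names are made up for illustration, and it assumes the byte count arrives in the low word of D0 and is never zero.

Code:
; Baseline sketch: a0 = source, a1 = destination, d0.w = byte count (1-$FF)
CopyBytes
	subq.w	#1,d0		; dbf runs count+1 times
.loop	move.b	(a0)+,(a1)+
	dbf	d0,.loop
	rts

Byte accesses have no alignment requirement, so this runs on any 68k, which makes it a handy baseline when timing the faster variants.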
21 May 2010, 20:07   #2
StingRay
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
Code:
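CopyMem
	lsr.w	#1,d0		; bit 0 of length -> carry, d0 = length/2
	bcc.b	.nobyte
	move.b	(a0)+,(a1)+	; odd length: copy one byte
.nobyte	lsr.w	#1,d0		; bit 1 of length -> carry, d0 = length/4
	bcc.b	.noword
	move.w	(a0)+,(a1)+	; copy one word
.noword	subq.w	#1,d0		; dbf needs count-1
	bmi.b	.exit		; no longwords to copy
.copy	move.l	(a0)+,(a1)+	; copy the remaining longwords
	dbf	d0,.copy
.exit	rts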
Untested but this code should do what you need.
21 May 2010, 20:09   #3
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,506
Lacks important information: does it have to work on 68000/68010?
21 May 2010, 20:12   #4
StingRay
Quote:
Originally Posted by Toni Wilen View Post
Lacks important information: does it have to work on 68000/68010?
Let me guess, you're thinking of move16?
21 May 2010, 20:13   #5
Toni Wilen
Quote:
Originally Posted by StingRay View Post
Let me guess, you're thinking of move16?
More like possible address error in your solution if it has to work on 68000
21 May 2010, 20:27   #6
StingRay
Quote:
Originally Posted by Toni Wilen View Post
More like possible address error in your solution if it has to work on 68000
Indeed. But I really didn't care (or even think ;D) about 68000 compatibility at all.
21 May 2010, 20:53   #7
oRBIT
020/030/060 allowed
21 May 2010, 21:20   #8
oRBIT
@StingRay:
That's a cute solution.
I had an idea about precalculating the entire thing and make a jumptable of some sort.
21 May 2010, 21:24   #9
StingRay
Quote:
Originally Posted by oRBIT View Post
@StingRay:
That's a cute solution.
I had an idea about precalculating the entire thing and make a jumptable of some sort.
Funny that you say this, I was about to post another version which looks like this:

Code:
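CopyMem
	lsr.w	#1,d0		; bit 0 of length -> carry
	bcc.b	.nobyte
	move.b	(a0)+,(a1)+	; odd length: copy one byte
.nobyte	lsr.w	#1,d0		; bit 1 of length -> carry, d0 = longword count
	bcc.b	.noword
	move.w	(a0)+,(a1)+	; copy one word
.noword	
	subq.w	#1,d0		; d0 = longwords-1
	bmi.b	.exit		; no longwords to copy

	moveq	#-1,d2
	sub.w	d0,d2		; d2 = -longwords
	lea	.TAB(pc),a2
	jmp	(a2,d2.w*2)	; each move.l is 2 bytes: enter 2*longwords bytes before .TAB

	REPT	256/4
	move.l	(a0)+,(a1)+
	ENDR
.TAB
.exit	rts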
I'm not sure if this is really faster than the first version on 020+ though (cache).
21 May 2010, 21:39   #10
phx
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
Hmm... wasn't an unaligned access quite slow on 020+?

I would also check whether the source and destination pointers are 32-bit aligned, do a few byte copies for alignment when required, and then copy the rest with move.l.
21 May 2010, 21:43   #11
StingRay
Quote:
Originally Posted by phx View Post
Hmm... wasn't an unaligned access quite slow on 020+?

I would also check if the source and destination pointer is 32-bit aligned, do a few byte-copies for alignment when required, and then copy the rest with move.l.
Yes, that's the fastest approach. However, since he's only copying max. 255 bytes I didn't bother aligning the pointers as I don't think it'll make much of a difference in this case.
21 May 2010, 22:56   #12
Psygore
Moderator
Join Date: Jan 2002
Location: France
Posts: 491
Quote:
Originally Posted by StingRay View Post
Funny that you say this, I was about to post another version which looks like this:

Code:
CopyMem
	lsr.w	#1,d0
	bcc.b	.nobyte
	move.b	(a0)+,(a1)+
.nobyte	lsr.w	#1,d0
	bcc.b	.noword
	move.w	(a0)+,(a1)+
.noword	
	subq.w	#1,d0
	bmi.b	.exit

	moveq	#-1,d2
	sub.w	d0,d2
	lea	.TAB(pc),a2
	jmp	(a2,d2.w*2)

	REPT	256/4
	move.l	(a0)+,(a1)+
	ENDR
.TAB
.exit	rts
I'm not sure if this is really faster than the first version on 020+ though (cache).
btw the code could be optimized with this:
Code:
...
.noword	
	neg.w	d0
	lea	.TAB(pc),a2
	jmp	(a2,d0.w*2)
...
22 May 2010, 00:56   #13
StingRay
Quote:
Originally Posted by Psygore View Post
btw the code could be optimized with this:
Code:
...
.noword    
    neg.w    d0
    lea    .TAB(pc),a2
    jmp    (a2,d0.w*2)
...
And it would be buggy then. not.w d0 would work though.
22 May 2010, 06:53   #14
Cosmos
Banned
 
Join Date: Jan 2007
Location: France
Posts: 655
jmp .TAB(PC,d0.w*2) is shorter !
22 May 2010, 11:34   #15
StingRay
Quote:
Originally Posted by Cosmos View Post
jmp .TAB(PC,d0.w*2) is shorter !
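Don't you think there is a reason why I didn't use that instruction? It can't be used here since the "table" is too large: the 8-bit displacement of the brief (d8,PC,Xn) form cannot reach .TAB, which sits more than 127 bytes past the jmp. Besides, this is not about size optimizing!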
22 May 2010, 23:29   #16
Ray Norrish
Registered User
Join Date: May 2005
Location: Cheshire, UK
Age: 56
Posts: 322
What's wrong with Movem if it's not just limited to bytes?
23 May 2010, 14:02   #17
Psygore
Quote:
Originally Posted by StingRay View Post
And it would be buggy then. not.w d0 would work though.
Are you sure?
23 May 2010, 16:45   #18
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Ray Norrish View Post
What's wrong with Movem if it's not just limited to bytes?
movem has setup overhead, so it's only good for big copies. It's not any faster on the 68060 and it's slower on the 68040. move16 on the 68040 and 68060 also has setup overhead and is only good for large copies.

An unrolled loop is good on all of the 68k family except the 68060, where it's the same speed. However, a blind "calculated" jmp into an unrolled loop like the one shown above will not gain much, if anything, on the 68040+ because the branch (jmp) cannot be predicted and is slow. Yes, this even applies to the 68040. I don't know about the 68020/68030, but I would expect it to be better for them. It's always good to test, as it's not always obvious what is fastest.

The 68040+ handles unaligned copies well when the data is in the cache. The 68020/68030 hates unaligned everything. I like to align the data as I go unless the size is guaranteed to be tiny (<16 bytes). It's more important that the destination (writes) is aligned. Something like this...

Code:
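   move.l a1,d1          ; d1 = destination address
   btst #0,d1            ; destination odd?
   beq.b .daligned2
   subq.l #1,d0
   addq.l #1,d1          ; keep d1 in step with a1
   move.b (a0)+,(a1)+    ; one byte: destination is now word aligned
.daligned2:
   btst #1,d1            ; destination on a longword boundary?
   beq.b .daligned4
   move.w (a0)+,(a1)+    ; one word: destination is now longword aligned
   subq.l #2,d0
.daligned4: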
The fastest algorithm is going to vary depending on size, CPU, and memory location. If the size can be bigger than 16 bytes, is not static, and the code will run on different 68k processors, then I would recommend installing a fast CPU-specific memory-copy patch into exec.library/CopyMem() and using it. I would recommend CopyMem for the 68040 and 68060 as it's the fastest at small copies...

http://aminet.net/util/boot/CopyMem.lha
http://aminet.net/util/boot/CopyMem.readme

I might be a little biased though as I wrote it.
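
The alignment fragment above stops at .daligned4. One way it might be completed into a whole routine is sketched here; this is an untested illustration, not the code from the CopyMem patch. It assumes an 020+ (the source may still be unaligned), a byte count of 1-$FF in d0.l, and made-up label names. Copies under 16 bytes skip the alignment and go straight to the byte loop.

Code:
; Sketch: a0 = source, a1 = destination, d0.l = byte count (1-$FF assumed)
; Trashes d0/d1/a0/a1.
CopyAligned
	moveq	#16,d1
	cmp.l	d1,d0
	bcs.b	.tail		; fewer than 16 bytes: not worth aligning
	move.l	a1,d1
	btst	#0,d1		; destination odd?
	beq.b	.daligned2
	move.b	(a0)+,(a1)+
	subq.l	#1,d0
	addq.l	#1,d1
.daligned2
	btst	#1,d1		; destination on a longword boundary?
	beq.b	.daligned4
	move.w	(a0)+,(a1)+
	subq.l	#2,d0
.daligned4
	move.l	d0,d1
	lsr.l	#2,d1		; d1 = longwords left (at least 3 here)
	subq.l	#1,d1		; dbf needs count-1
.long	move.l	(a0)+,(a1)+	; destination writes are longword aligned
	dbf	d1,.long
	and.w	#3,d0		; 0-3 bytes left over
.tail	subq.w	#1,d0
	bmi.b	.done		; nothing left
.byte	move.b	(a0)+,(a1)+
	dbf	d0,.byte
.done	rts

Only the destination is aligned, following the remark that aligned writes matter more. Whether this beats the unaligned versions for such short copies would have to be measured on the target CPU.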
24 May 2010, 10:57   #19
StingRay
Quote:
Originally Posted by Psygore View Post
Are you sure?
Of course I am! Can be easily proved:

input: d0.w: 4

lsr.w #1,d0 ; d0.w = 2
lsr.w #1,d0 ; d0.w = 1
subq.w #1,d0 ; d0.w = 0

neg.w d0 ; -> d0.w = 0 -> nothing will be copied -> bug!

Edit: And after I wrote all that I finally noticed that you removed the subq instruction already. Totally missed that! So yes, your optimization is fine!

Last edited by StingRay; 24 May 2010 at 11:06. Reason: Edit: facepalm ;D
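
With the subq/bmi pair removed, the arithmetic works out as follows: after the two shifts d0 holds the longword count n, neg.w turns that into -n, and the scaled index lands 2*n bytes before .TAB, which is the n-th move.l from the end (each move.l (a0)+,(a1)+ is one word long). For n = 0 the jump goes straight to .TAB. A sketch of the second version with that change applied, untested and assuming an 020+ for the scaled index:

Code:
CopyMem
	lsr.w	#1,d0
	bcc.b	.nobyte
	move.b	(a0)+,(a1)+
.nobyte	lsr.w	#1,d0		; d0 = longword count n
	bcc.b	.noword
	move.w	(a0)+,(a1)+
.noword	neg.w	d0		; d0 = -n
	lea	.TAB(pc),a2
	jmp	(a2,d0.w*2)	; n = 0 jumps straight to .TAB

	REPT	256/4
	move.l	(a0)+,(a1)+
	ENDR
.TAB	rts

d2 is no longer needed as scratch; a2 is still trashed.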
24 May 2010, 10:59   #20
Toni Wilen
There is no generic and fastest possible Amiga 68K memory copy routine.

If you want fastest possible memory copies, you have to select the one (or multiple ones) that work best with your algorithms. (or modify your algorithm)

Is the memory aligned? Is the length aligned? Short copy? Long copy? CPU type? Even a simple memory copy isn't simple.

(oops, this was already answered previously)