English Amiga Board


Old 19 April 2014, 18:18   #1
Yulquen74
Registered User
 
Join Date: May 2013
Location: Kleppe / Norway
Posts: 253
fastest possible rom copy loop

I needed a small ROM copy routine to toy around with, so I made a small piece of code which seems to be working. Can this loop be improved to use fewer cycles?

lea.l $f80000,a0
lea.l $e00000,a1

Loop:
move.l (a0),(a1)
add.l #$4,a0
add.l #$4,a1
cmpi.l #$e80000,a1
bne Loop

thanks in advance for suggestions.
Old 19 April 2014, 18:26   #2
Mrs Beanbag
Glastonbridge Software
 
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Code:
lea.l $f80000,a0
lea.l $e00000,a1
move.l #$80000,d0

Loop:
move.l (a0)+,(a1)+
subq.l #4,d0
bgt.s Loop
Old 19 April 2014, 18:35   #3
Mrs Beanbag
Glastonbridge Software
 
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Code:
lea.l $f80000,a0
lea.l $e00000,a1
move.l #$ffff,d0

Loop:
move.l (a0)+,(a1)+
move.l (a0)+,(a1)+
dbra d0,Loop
Old 19 April 2014, 18:40   #4
Yulquen74
Registered User
 
Join Date: May 2013
Location: Kleppe / Norway
Posts: 253
Massive speedup!
Thanks!
Old 19 April 2014, 19:25   #5
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 3,335
You could use a similar DBF loop but using MOVEM.L instead, so something like
Code:
        MOVE.W  #10921,D0            ;(512*1024)/48 - 1
loop:   MOVEM.L (A0)+,D1-D7/A2-A6    ;12 registers = 48 bytes
        MOVEM.L D1-D7/A2-A6,(A1)+
        DBF D0,loop
; There are 32 bytes left over to copy
        MOVEM.L (A0)+,D1-D7/A2      ;8 registers = 32 bytes
        MOVEM.L D1-D7/A2,(A1)
Old 19 April 2014, 19:37   #6
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,505
There is no MOVEM <regs>,(An)+ addressing mode. Only (An) or -(An).

Something like this works (but don't bother with it if CPU is 68020+)

copy
movem.l (A0)+,<regs>
movem.l <regs>,(A1)
add.l d1,a1
dbf d0,copy
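
Filling in the placeholders, a minimal sketch of this loop for the 512 KB copy discussed in this thread might look like the following. The register list, the use of lea 48(a1),a1 in place of add.l d1,a1 (so that d1 stays free for data), and the save/restore around the loop are assumptions made for illustration, not part of the post above.

```asm
        movem.l d1-d7/a2-a6,-(sp)       ; save the registers used to carry data
        lea     $f80000,a0              ; source: ROM
        lea     $e00000,a1              ; destination
        move.w  #10922-1,d0             ; 10922 iterations * 48 bytes = 524256 bytes
copy:   movem.l (a0)+,d1-d7/a2-a6       ; read 12 longwords, a0 advances by 48
        movem.l d1-d7/a2-a6,(a1)        ; write 12 longwords, a1 is not advanced
        lea     48(a1),a1               ; advance the destination by hand
        dbf     d0,copy
        movem.l (a0)+,d1-d7/a2          ; 524288 - 524256 = 32 bytes remain (8 registers)
        movem.l d1-d7/a2,(a1)
        movem.l (sp)+,d1-d7/a2-a6       ; restore the saved registers
```

a7 is left out of the register list because it is the stack pointer.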
Old 19 April 2014, 22:49   #7
Yulquen74
Registered User
 
Join Date: May 2013
Location: Kleppe / Norway
Posts: 253
This one is even faster than your first suggestion

My small program with 2 of those loops takes
only a second to complete at 7 MHz (68000),
and less of course at higher frequencies.


Quote:
Originally Posted by Mrs Beanbag View Post
Code:
lea.l $f80000,a0
lea.l $e00000,a1
move.l #$ffff,d0

Loop:
move.l (a0)+,(a1)+
move.l (a0)+,(a1)+
dbra d0,Loop
Old 20 April 2014, 19:13   #8
Mrs Beanbag
Glastonbridge Software
 
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
why exactly are you copying the ROM to $e00000? This area is marked as "reserved" in the HW reference manual. If you want to be safe you should really reserve some memory from exec.library otherwise who knows what you will be writing on top of.
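
For the general case, reserving the destination through exec.library could be sketched roughly as below. This is an illustration only: it assumes a normally running system, uses the standard AllocMem() library offset (-198) and the MEMF_FAST flag value (4), and the nomem label is a hypothetical name.

```asm
        move.l  4.w,a6                  ; ExecBase pointer lives at address 4
        move.l  #$80000,d0              ; byteSize: 512 KB
        moveq   #4,d1                   ; requirements: MEMF_FAST
        jsr     -198(a6)                ; exec.library AllocMem()
        tst.l   d0                      ; d0 = block address, or 0 on failure
        beq.s   nomem
        move.l  d0,a1                   ; use the block as the copy destination
        ; the copy loop would go here
nomem:  rts
```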
Old 21 April 2014, 01:51   #9
demolition
Unregistered User
 
 
Join Date: Sep 2012
Location: Copenhagen / DK
Age: 43
Posts: 4,190
Quote:
Originally Posted by Yulquen74 View Post
This one is even faster than your first suggestion

My small program with 2 of those loops takes
only a second to complete at 7mhz (68000),
and less of course at higher frequencies.
Are you optimizing for speed or size? You could unroll the loop even further to gain more speed if size was not a major issue, e.g. try adding another two move.l's and halving d0.
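
A minimal sketch of that suggestion, assuming the same source and destination addresses as the loop quoted above: four move.l's per iteration, so the counter drops from $ffff to $7fff.

```asm
        lea     $f80000,a0              ; source
        lea     $e00000,a1              ; destination
        move.w  #$8000-1,d0             ; 32768 iterations * 16 bytes = $80000 bytes
Loop:   move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        dbra    d0,Loop                 ; dbra only uses the low word of d0
```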
Old 21 April 2014, 22:59   #10
Yulquen74
Registered User
 
Join Date: May 2013
Location: Kleppe / Norway
Posts: 253
Quote:
Originally Posted by Mrs Beanbag View Post
why exactly are you copying the ROM to $e00000? This area is marked as "reserved" in the HW reference manual. If you want to be safe you should really reserve some memory from exec.library otherwise who knows what you will be writing on top of.
I have fastram mapped in that area, and it's perfectly safe to use that way if I do it before it is added to the system pool with the addmem command.
The second loop transfers it back to rom address area, only now it is mapped up in fastram, so I get a "fast-rom".

Quote:
Originally Posted by demolition View Post
Are you optimizing for speed or size? You could unroll the loop even further to gain more speed if size was not a major issue, e.g. try adding another two move.l's and halving d0.
Optimizing for speed.
Will try to add more lines and decrease loop counter. Thanks.

Last edited by prowler; 22 April 2014 at 21:20. Reason: Back-to-back posts merged.
Old 29 April 2014, 05:19   #11
Shadowfire
Registered User
 
 
Join Date: Aug 2001
Location: Connecticut USA
Posts: 617
MOVEM supports the following addressing modes:
(Ax)
-(Ax) (register to memory transfer only)
(Ax)+ (memory to register transfer only)
d(Ax)
d(Ax,Rx)
(Abs).L
(Abs).W
If you are shooting for speed, you should be using MOVEM, not MOVE. A MOVEM takes two 16-bit instruction words and can transfer up to 14 longwords per instruction, though a second MOVEM is needed to write the data back out. A MOVE.L takes one 16-bit instruction word but copies only one longword.

If you use a MOVEM with 8 registers, unrolling, you can get a loop like:
Code:
(Instruction word count)
(3)LEA $F80000,A0; source address
(3)LEA $E00000,A1; destination address
(2)MOVE.W #(($80000/128)-1),D6; loop count, copying 128 bytes per iteration
(1)MOVEQ #32,D7; put 32 into d7 (bytes written per MOVEM pair)

loop:
(2)MOVEM.L (A0)+,D0-D5/A2-A3; unrolled loop, 4 execution of movem
(2)MOVEM.L D0-D5/A2-A3,(A1)
(1)ADDA.L D7,A1
(2)MOVEM.L (A0)+,D0-D5/A2-A3
(2)MOVEM.L D0-D5/A2-A3,(A1)
(1)ADDA.L D7,A1
(2)MOVEM.L (A0)+,D0-D5/A2-A3
(2)MOVEM.L D0-D5/A2-A3,(A1)
(1)ADDA.L D7,A1
(2)MOVEM.L (A0)+,D0-D5/A2-A3
(2)MOVEM.L D0-D5/A2-A3,(A1)
(1)ADDA.L D7,A1
(2)DBRA.W D6,loop
This loop fetches 22 instruction words (plus 4 dummy reads, one for each MOVEM.L (A0)+ instruction) to copy 128 bytes of data per iteration. That is 128/26, or 4.92 bytes copied per word fetched.

MRSBEANBAG'S loop of
Code:
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(1)move.l (a0)+,(a1)+
(2)dbra d0,Loop
unrolled to 20 levels, fetches 22 instruction words to copy 80 bytes, or 3.636 bytes/instruction word.

Your original loop
Code:
Loop:
(1)move.l (a0),(a1)
(3)add.l #$4,a0
(3)add.l #$4,a1
(3)cmpi.l #$e80000,a1
(1)bne Loop
fetches 11 instruction words to copy 4 bytes, or 0.36363 bytes/instruction word.

Last edited by Shadowfire; 29 April 2014 at 05:43.
Old 29 April 2014, 06:27   #12
JimDrew
Registered User
 
Join Date: Dec 2013
Location: Lake Havasu City, AZ
Posts: 741
Yep, movem.l is the fastest for non-040/060 CPUs. With 040's I use move16 instead.
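
A rough sketch of what a move16 version of the copy could look like, assuming a 68040 or 68060, an assembler set to that target, and 16-byte aligned source and destination addresses (true for the two addresses used in this thread):

```asm
        lea     $f80000,a0              ; source, 16-byte aligned
        lea     $e00000,a1              ; destination, 16-byte aligned
        move.w  #$8000-1,d0             ; 32768 lines * 16 bytes = $80000 bytes
Loop:   move16  (a0)+,(a1)+             ; copies one 16-byte line per instruction
        dbra    d0,Loop
```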
Old 29 April 2014, 15:14   #13
SpeedGeek
Moderator
 
 
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
Quote:
Originally Posted by Yulquen74 View Post
I have fastram mapped in that area, and it's perfectly safe to use that way if I do it before it is added to the system pool with the addmem command.
The second loop transfers it back to rom address area, only now it is mapped up in fastram, so I get a "fast-rom".
Really? How does your 7MHz 68000 access Fast RAM any faster than the Kickstart ROM?
Old 29 April 2014, 16:05   #14
demolition
Unregistered User
 
 
Join Date: Sep 2012
Location: Copenhagen / DK
Age: 43
Posts: 4,190
Quote:
Originally Posted by SpeedGeek View Post
Really? How does your 7MHz 68000 access Fast RAM any faster than the Kickstart ROM?
Fast RAM is much faster than the ROM. I use skick on my A500+ with 7 MHz CPU to map the kickstart into fast RAM, and there is a noticeable difference in speed after doing it.
Old 29 April 2014, 16:28   #15
SpeedGeek
Moderator
 
 
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
Quote:
Originally Posted by demolition View Post
Fast RAM is much faster than the ROM. I use skick on my A500+ with 7 MHz CPU to map the kickstart into fast RAM, and there is a noticeable difference in speed after doing it.
Here's my A2000 7MHz 68000 Bustest results:

BusSpeedTest 0.19 (mlelstv)   Buffer: 262144 Bytes, Alignment: 32768
========================================================================
memtype  addr       op      cycle      calib   bandwidth
rom      $00F80000  readw   1176.8 ns  normal  1.7 * 10^6 byte/s
rom      $00F80000  readl   1757.7 ns  normal  2.3 * 10^6 byte/s
rom      $00F80000  readm   1395.6 ns  normal  2.9 * 10^6 byte/s

BusSpeedTest 0.19 (mlelstv)   Buffer: 262144 Bytes, Alignment: 32768
========================================================================
memtype  addr       op      cycle      calib   bandwidth
fast     $00240000  readw   1177.6 ns  normal  1.7 * 10^6 byte/s
fast     $00240000  readl   1760.5 ns  normal  2.3 * 10^6 byte/s
fast     $00240000  readm   1390.0 ns  normal  2.9 * 10^6 byte/s
fast     $00240000  writew  1178.0 ns  normal  1.7 * 10^6 byte/s
fast     $00240000  writel  1760.6 ns  normal  2.3 * 10^6 byte/s
fast     $00240000  writem  1319.3 ns  normal  3.0 * 10^6 byte/s

Note: NTSC = 7.16 MHz, PAL = 7.09 MHz

Last edited by SpeedGeek; 30 April 2014 at 13:26.
Old 29 April 2014, 16:50   #16
demolition
Unregistered User
 
 
Join Date: Sep 2012
Location: Copenhagen / DK
Age: 43
Posts: 4,190
They do look quite identical. Not sure then why I can feel a difference in responsiveness when using skick.
Old 02 May 2014, 20:58   #17
Yulquen74
Registered User
 
Join Date: May 2013
Location: Kleppe / Norway
Posts: 253
Quote:
Originally Posted by SpeedGeek View Post
Really? How does your 7MHz 68000 access Fast RAM any faster than the Kickstart ROM?
You are right of course, it is not faster at 7MHz.

But the important point is that it is much faster at higher CPU clock frequencies, for which this is intended (I'm toying with a simple homemade internal CPU/RAM board with a 68HC000 processor, a CPLD, 16MB of SRAM, and bidirectional bus drivers for data and address lines).

I have been using bustest, like yourself, to confirm better "rom" speeds at higher cpu frequencies.
Old 10 May 2014, 19:10   #18
Photon
Moderator
 
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
Toni's loop is the fastest IF you use more than 8 registers AND repeat many times to get fewer adds. Otherwise not.

move.l (a0)+,(a1)+ takes 20 cycles on 68000, 2x movem.l approaches 16 cycles per longword if you use many registers.

So a repeated move.l (a0)+,(a1)+ will take you close to the max already.
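
As a rough check of those figures, the sketch below annotates one iteration of a movem.l loop with 68000 cycle counts and compares it with a move.l loop of the same size. The timings come from the standard 68000 instruction timing tables and assume no wait states; the 12-register list and the lea used to advance a1 are illustrative choices, not code from this thread.

```asm
; move.l version, unrolled 12 times (48 bytes per iteration):
;   12 * 20 (move.l (a0)+,(a1)+) + 10 (dbf, branch taken) = 250 cycles
;   250 / 12 = about 20.8 cycles per longword
;
; movem.l version with n = 12 registers (48 bytes per iteration):
copy:   movem.l (a0)+,d1-d7/a2-a6       ; 12 + 8n = 108 cycles
        movem.l d1-d7/a2-a6,(a1)        ;  8 + 8n = 104 cycles
        lea     48(a1),a1               ;  8 cycles
        dbf     d0,copy                 ; 10 cycles when the branch is taken
;   108 + 104 + 8 + 10 = 230 cycles, 230 / 12 = about 19.2 cycles per longword
;
; Leaving dbf aside, the movem.l pair plus the pointer update costs
; 28 + 16n cycles, i.e. 16 + 28/n per longword, which only drops
; below 20 once n exceeds 7.
```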

Remember that speed is only important on the slowest platforms you want to support, so code for them. On a 68030 a dead slow copy loop will be fast enough (as perceived by the user) already.