English Amiga Board

01 December 2023, 17:37   #21
Bruce Abbott
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,720
Quote:
Originally Posted by patrik
I would recommend measuring whether there is an actual benefit for, say, the 040 in this use case before adding code and complexity that would otherwise not be needed.
I agree.

I used the following code for reading IDE sectors...
Code:
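readsector:
; a4 = IDE data port, a1 = buffer address
; 16 iterations x 16 word moves = 256 words = one 512-byte sector
 moveq   #16-1,D0        ; dbra exits when the count reaches -1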
.loop:
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 move.w  (A4),(A1)+
 dbra    D0,.loop
...and it was fast enough for me on 68000.

Trying to eke out a little more speed with 'clever' code is silly IMO, especially when the rest of the driver is written in C.

The 12 longword IDE port accesses will be split up into 24 word accesses because the Zorro II bus is 16-bit. With a fast CPU a large number of wait states must be inserted, and because those 24 word accesses are back-to-back the CPU can't do anything else in between. Then it does 12 longword accesses of 32-bit memory (or worse, 16-bit Zorro II or ChipRAM) while the IDE port sits idle. More time might be wasted synchronizing to the Zorro II bus than if a more 'naive' algorithm was used.

Another (minor) factor is that at least some of those 12 registers will probably have to be saved on the stack and restored afterwards, wasting time that a routine using fewer registers wouldn't.

I would start by writing the most naive C code to do the job, then try unrolling the loop to 16 moves in C (like I did in asm), let the compiler do its magic and see what it produces. Measure the actual data transfer speeds when using the device and decide whether further optimization is worth doing.

BTW I found this in the 68040 user manual addendum. Relating to the MMU...

Quote:
16. When accessing I/O peripherals that are sensitive to double writes, the following guidelines must be followed:
1) The peripheral must reside in non-cacheable, serialized memory.
2) If possible, use only instructions that can generate one data page fault per instruction.
3) Do not use the following instructions: bfclr, bfset, bfins, movem, fmove, fmovem, fsave, movep
Might be a problem if a page fault is triggered part way through writing a sector.

Another possible issue is that the movem will take a long time compared to most other instructions, increasing interrupt latency. Could cause overruns at high baud rates on the internal serial port.
01 December 2023, 18:09   #22
Karlos
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,444
@Bruce

How did you determine how far to unroll that loop? It seems a bit excessive.
01 December 2023, 19:36   #23
Bruce Abbott
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,720
Quote:
Originally Posted by Karlos
@Bruce

How did you determine how far to unroll that loop? It seems a bit excessive.
I just kept increasing it until further improvement wasn't enough to justify it - and 16 was a nice round number. On a fast CPU with instruction and data caches I wouldn't unroll it at all.
01 December 2023, 20:38   #24
Photon
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,657
Quote:
Originally Posted by Wepl
Here is the difference: it is not that movem is slower than multiple moves. Rather, a loop of movem read and movem write is slower than single moves.

The reason is the speed of memory accesses, wait states etc.
This.

Cache is not really useful for large linear memory operations. You want to reserve it for recurring individual lookups.

Also, you're writing a copy and then want to forget it immediately. The read will obey source RAM speeds since the data is not in cache yet and the write will obey destination RAM speeds since it's never read the source address before.
02 December 2023, 09:04   #25
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,307
Quote:
Originally Posted by Bruce Abbott
BTW I found this in the 68040 user manual addendum. Relating to the MMU...

Might be a problem if a page fault is triggered part way through writing a sector.

No worries here. The double-write problem only appears on physical page faults (a device or RAM pulls TEA) and not on address translation faults (the source or target MMU page is non-resident). The problem appears if, during a transfer, the first half of the data is written and then a physical page fault happens on the second half. In such a case, the 68040 (unlike the 68030) restarts the entire instruction after the cause of the fault has hopefully been repaired, and when the instruction is restarted, both the read and the write part of the move are repeated, including the already completed first half. The 68030 would save sufficient state information to continue the partially executed instruction, but neither the 68040 nor the 68060 can do that.


For non-resident pages, something different happens: The 68040 and 68060 MMU check upfront whether the entire instruction touches an invalid page descriptor (there could be in total 8 descriptors affected, due to the double-indirection addressing mode - two on the first level of source indirection, two on the second source level indirection, and four again for the destination), and if any of the accesses is invalid, the MMU triggers an access error upfront, before starting the instruction. So no worries for invalid pages, only for physical address errors.