#21 | ||
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,720
|
Quote:
I used the following code for reading IDE sectors... Code:
readsector:                     ; a4 = IDE data port, a1 = buffer address
        moveq   #16-1,D0
.loop:
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        move.w  (A4),(A1)+
        dbra    D0,.loop

Trying to eke out a little more speed with 'clever' code is silly IMO, especially when the rest of the driver is written in C.

The 12 longword IDE port accesses will be split up into 24 word accesses because the Zorro II bus is 16 bit. With a fast CPU a large number of wait states must be inserted, and because those 24 word accesses are back-to-back it can't do anything else in between. Then it does 12 longword accesses of 32 bit memory (or worse, 16 bit Zorro II or ChipRAM) while the IDE port sits idle. More time might be wasted synchronizing to the Zorro II bus than if a more 'naive' algorithm was used.

Another (minor) factor is that at least some of those 12 registers will probably have to be saved on the stack and restored afterwards, wasting time that a routine using fewer registers wouldn't.

I would start by writing the most naive C code to do the job, then try unrolling the loop to 16 moves in C (like I did in asm), let the compiler do its magic and see what it produces. Measure the actual data transfer speeds when using the device and decide whether further optimization is worth doing.

BTW I found this in the 68040 user manual addendum. Relating to the MMU... Quote:
Another possible issue is that the movem will take a long time compared to most other instructions, increasing interrupt latency. That could cause overruns at high baud rates on the internal serial port.
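For anyone wanting to try the "naive first, then unroll in C" approach, here is a minimal sketch of the two variants. The function names, the `SECTOR_WORDS` constant and the way the port pointer is passed in are assumptions made for illustration, not code from any actual driver. What the compiler makes of either version would still need checking in the generated asm and measuring on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_WORDS 256  /* one 512-byte sector = 256 16-bit words */

/* Naive version: one word read from the data port per loop iteration.
   'port' is volatile so the compiler performs every read. */
void read_sector_naive(volatile const uint16_t *port, uint16_t *buf)
{
    assert(port != NULL && buf != NULL);
    for (size_t i = 0; i < SECTOR_WORDS; i++)
        *buf++ = *port;
}

/* Unrolled version: 16 passes of 16 reads, mirroring the asm loop above. */
void read_sector_unrolled(volatile const uint16_t *port, uint16_t *buf)
{
    assert(port != NULL && buf != NULL);
    for (int pass = 0; pass < SECTOR_WORDS / 16; pass++) {
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
        *buf++ = *port;
    }
}
```

Both functions read the same 256 words, so they can be swapped freely while benchmarking.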
#22 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,444
|
@Bruce
How did you determine how far to unroll that loop? It seems a bit excessive. |
#23 |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,720
|
I just kept increasing it until further improvement wasn't enough to justify it - and 16 was a nice round number. On a fast CPU with instruction and data caches I wouldn't unroll it at all.
#24 | |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,657
|
Quote:
Cache is not really useful for large linear memory operations. You want to reserve it for recurring individual lookups. Also, you're writing a copy and then want to forget it immediately. The read will obey source RAM speeds since the data is not in cache yet and the write will obey destination RAM speeds since it's never read the source address before. |
#25 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,307
|
Quote:
No worries here. The double-write problem only appears on physical page faults (a device or RAM pulls TEA) and not on address translation faults (the source or target MMU page is non-resident). The problem appears if, during a transfer, the first half of the data is written and then a physical page fault happens on the second half. In such a case, the 68040 (unlike the 68030) restarts the entire instruction after the cause of the fault has hopefully been repaired, and when the instruction is restarted, the read and the write parts of the move are repeated, including the already worked-on first half. The 68030 would save sufficient state information to continue the partially executed instruction, but neither the 68040 nor the 68060 can do that.

For non-resident pages, something different happens: the 68040 and 68060 MMU check upfront whether the entire instruction touches an invalid page descriptor (up to 8 descriptors could be affected, due to the double-indirection addressing mode: two on the first level of source indirection, two on the second level of source indirection, and four again for the destination). If any of the accesses is invalid, the MMU triggers an access error before starting the instruction. So no worries for invalid pages, only for physical address errors.
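To make the double-write case concrete, here is a toy model in C. It is only a sketch under stated assumptions: `WritePort`, `port_write` and `store_long_with_fault` are hypothetical names, the "device" is just a log of the words it receives, and the fault is hard-wired to hit the second half of a longword store exactly once. It does not model real bus timing or the actual exception stack frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical write-sensitive device: records every word it receives. */
typedef struct {
    uint16_t log[8];
    size_t   count;
} WritePort;

void port_write(WritePort *p, uint16_t word)
{
    assert(p->count < 8);
    p->log[p->count++] = word;
}

/* One longword store, split into two word writes on a 16-bit bus.
   The second word write faults once and the cause is then repaired.
   restart_whole = true  : whole instruction is restarted (the 68040/68060
                           behaviour described above), so the first half
                           reaches the device a second time.
   restart_whole = false : execution continues from saved state (the 68030
                           behaviour described above), so only the second
                           half is retried. */
void store_long_with_fault(WritePort *p, uint32_t value, bool restart_whole)
{
    uint16_t hi = (uint16_t)(value >> 16);
    uint16_t lo = (uint16_t)(value & 0xFFFFu);

    port_write(p, hi);          /* first half reaches the device */
    /* second half faults here and never reaches the device */
    if (restart_whole)
        port_write(p, hi);      /* restart: first half written again */
    port_write(p, lo);          /* second half succeeds this time */
}
```

In this model the restarting case delivers three words to the device for one longword store, while the continuing case delivers two. For plain RAM the extra write is harmless; it only matters when the target has side effects on every access.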