26 December 2021, 16:04 | #1 |
Registered User
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 922
|
CIA access speed differs with Amiga model
Hi,
Have been fiddling with a project which pokes the CIA 8520 parallel port. Did all the initial work on an A3000, but when I later tested it on an A1200, I noticed significantly lower results. That was unexpected, so I did some more thorough testing on more machines. The result was that on an A500 (68000), A2000 (68030) or A1200 (68020, 68030, 68060), I got almost exactly half the speed when accessing CIA registers compared to an A3000 (68030) or A4000 (68060). It was the same for a few repeated accesses as when hammering on it for seconds.
I always assumed that the CIA access speed would be the same on all Amigas, as they are all driven by the same E-clock at ~710kHz, only differing slightly between PAL and NTSC machines.
Figured out what I think is quite a good test to illustrate this difference: repeatedly reading the low-byte register of one of the CIA timers while it is running. When the timer is running, this register counts down one step for each E-clock cycle, so if you do repeated reads of it and compare the values read, you can see how many E-clock cycles each read requires.
Have written a utility which does this, plus an interleaved read/write test so it is possible to see if there is any difference for writes. It does this for all available timers (four in total, but usually only two are available). The reads and writes are repeated move.b to/from registers, done inside Disable()/Enable(), so they should be fast enough on all machines and not possible to disturb.
Executable and source are included in the archive: http://megaburken.net/~patrik/CiaAccessTests.lha
Runs on Kickstart 1.2+.
Results on A3000 (68030):
Code:
10.Ram Disk:CiaAccessTests> CiaAccessTests
ciaa.talo(BFE401) reads:
0: 89 1: 88 2: 87 3: 86
ciab.talo(BFD400) reads:
0: 161 1: 160 2: 159 3: 158
ciaa.talo(BFE401) reads interleaved with ciaa.ddrb(BFE301) writes:
0: 124 1: 122 2: 120 3: 118
ciab.talo(BFD400) reads interleaved with ciaa.ddrb(BFE301) writes:
0: 91 1: 89 2: 87 3: 85
Results on A1200 (68060):
Code:
10.Ram Disk:CiaAccessTests> CiaAccessTests
ciaa.talo(BFE401) reads:
0: 111 1: 109 2: 107 3: 105
ciab.talo(BFD400) reads:
0: 6 1: 4 2: 2 3: 0
ciaa.talo(BFE401) reads interleaved with ciaa.ddrb(BFE301) writes:
0: 233 1: 229 2: 225 3: 221
ciab.talo(BFD400) reads interleaved with ciaa.ddrb(BFE301) writes:
0: 199 1: 195 2: 191 3: 187
This significant difference is perhaps common knowledge, but it was unknown to me. Does anyone know any details about the reason for it? |
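To make the reading of the two listings above explicit, here is a small sketch in Python (not part of the CiaAccessTests utility) that turns consecutive timer-low samples into "E-clock cycles per step". The sample values are copied from the listings; the function and dictionary names are made up for illustration. The modulo 256 is an assumption that fewer than 256 E-clock ticks pass between two reads, so a wrap of the low byte is handled.

```python
# Timer-low samples copied from the A3000 and A1200 listings in the post above.
RESULTS = {
    "A3000 (68030)": {
        "ciaa.talo reads": [89, 88, 87, 86],
        "ciab.talo reads": [161, 160, 159, 158],
        "ciaa.talo reads + ciaa.ddrb writes": [124, 122, 120, 118],
        "ciab.talo reads + ciaa.ddrb writes": [91, 89, 87, 85],
    },
    "A1200 (68060)": {
        "ciaa.talo reads": [111, 109, 107, 105],
        "ciab.talo reads": [6, 4, 2, 0],
        "ciaa.talo reads + ciaa.ddrb writes": [233, 229, 225, 221],
        "ciab.talo reads + ciaa.ddrb writes": [199, 195, 191, 187],
    },
}


def eclocks_per_step(samples):
    """E-clock ticks elapsed between consecutive reads of the timer low byte.

    The timer counts down, so the elapsed ticks are previous - current.
    Taking the result modulo 256 covers the case where the low byte wraps
    (assumption: fewer than 256 ticks pass between two reads).
    """
    return [(prev - cur) % 256 for prev, cur in zip(samples, samples[1:])]


if __name__ == "__main__":
    for machine, tests in RESULTS.items():
        print(machine)
        for name, samples in tests.items():
            print(f"  {name}: {eclocks_per_step(samples)}")
```

For the plain read tests this gives 1 tick per read on the A3000 and 2 on the A1200. For the interleaved tests, each step covers one read plus one write, giving 2 and 4 ticks respectively.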
26 December 2021, 16:39 | #2 |
Zone Friend
Join Date: May 2006
Location: France
Posts: 1,801
|
I think I read in the 1993 DevCon notes that the CIAs differ. They said to use the OS, but also gave technical info about the differences.
|
26 December 2021, 17:28 | #3 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Thanks for making this benchmark tool available!
Unfortunately, I now have my GVP G-Force 030 installed in my A2000 (waiting support for another long-delayed project), so I can't provide any immediate (E clock speedup mod) benchmark results: https://eab.abime.net/showthread.php...92#post1523692 |
26 December 2021, 17:38 | #4 |
Registered User
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
|
I believe the problem is not with the speed of CIA itself, but the speed of the processor that accesses CIA.
You should consider how the 68000 processor accesses the 6800 bus to which the CIA is attached (the E, VPA, VMA signals that are described on page 185 of this document https://www.nxp.com/docs/en/referenc.../MC68000UM.pdf). The E signal is low for six 7 MHz clock cycles, and then high for four 7 MHz clock cycles. The CIA access is done during the four cycles when E is high. If the E signal is high when an access to the CIA starts, then the processor will wait until the E signal is low again, and make the access during the next window when E goes high.
During the six cycles that E is low, the CPU needs to fetch and execute any instructions that come before the next instruction that accesses the CIA, and then fetch that instruction itself. For a normal 68000 processor, fetching one instruction takes at least 4 cycles, so if any instruction comes between the two instructions that access the CIA, then the second CIA access will miss the "E high window" and the processor stalls until the next E high window. The speed of CIA accesses then effectively halves.
When using the 030 processor on the other hand, the processor is much faster and can manage to do one access in every E high window.
The fastest way to communicate over the parallel port that I could come up with is as follows: https://github.com/niklasekstrom/ami...i_low.asm#L170. This results in 2E speed (1 byte communicated on every other E cycle) on an Amiga with an accelerator that allows the processor to get an access in on every opportunity. (For more background on the speed of protocols that communicate over the parallel port, see this page: https://lallafa.de/blog/2015/09/amig...st-can-you-go/.) |
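As a rough back-of-the-envelope check on the numbers in this post, the sketch below derives the E-clock from the 6-low plus 4-high cycle count and converts "1E" and "2E" into accesses per second. The 7.09379 MHz (PAL) and 7.15909 MHz (NTSC) CPU clock figures are the usual published values and are not taken from this thread; the function names are made up for illustration.

```python
# Nominal 68000 clock on PAL and NTSC Amigas (assumed standard values, in Hz).
CPU_CLOCK_HZ = {"PAL": 7_093_790, "NTSC": 7_159_090}

# E is low for six and high for four 7 MHz cycles, so one E period is 10 cycles.
E_LOW_CYCLES = 6
E_HIGH_CYCLES = 4
E_DIVIDER = E_LOW_CYCLES + E_HIGH_CYCLES


def e_clock_hz(system):
    """E-clock frequency in Hz for 'PAL' or 'NTSC'."""
    return CPU_CLOCK_HZ[system] / E_DIVIDER


def accesses_per_second(system, e_cycles_per_access):
    """CIA accesses per second if each access costs the given number of E cycles."""
    return e_clock_hz(system) / e_cycles_per_access


if __name__ == "__main__":
    for system in ("PAL", "NTSC"):
        print(f"{system}: E = {e_clock_hz(system) / 1000:.3f} kHz")
        for n in (1, 2):
            rate = accesses_per_second(system, n)
            print(f"  {n}E: about {rate / 1000:.1f}k accesses/s")
```

Under these assumptions the E-clock comes out at roughly 709 kHz on PAL and 716 kHz on NTSC, which matches the ~710kHz mentioned at the start of the thread, and a 2E protocol tops out at roughly 355 thousand bytes per second on PAL.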
26 December 2021, 17:49 | #5 |
Registered User
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 922
|
@Niklas:
I think you are right in that the answer lies not in the speed of the CIA itself, but in how it is accessed. I can buy that the stock 68000 potentially might be busy fetching and executing the next instruction, missing the next E high window. However, the 68030 A2000 and the 68020, 68030 and 68060 A1200s get the same 2-cycle result as the 68000 A500. Only the 68030 A3000 and 68060 A4000 manage 1-cycle access. |
26 December 2021, 18:20 | #6 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Check this thread https://eab.abime.net/showthread.php?t=107908 from message #8 onward.
There are a lot of similar arguments and test code. |
26 December 2021, 18:58 | #7 | |
Registered User
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
|
Quote:
|
|
27 December 2021, 00:06 | #8 |
Registered User
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
|
More information that may be relevant to this thread... In Amigas whose processor doesn't have the 6800 interface logic, that logic is instead in Gayle. On page 6 of https://www.amigawiki.org/lib/exe/fe...cification.pdf it says: "_AS must be asserted by 3 clocks before E CLK goes high, or you wait until next time around". I'm not sure if the logic in 68000 works the same but I would guess so. Minimig seems to have the same idea: https://github.com/MiSTer-devel/Mini...8k_bridge.v#L3.
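To see what the "_AS must be asserted by 3 clocks before E CLK goes high" rule quoted above would imply, here is a toy model in Python. It is only a sketch under loudly stated assumptions: E rises every 10 clocks, an access is treated as finished 4 clocks after the rising edge it uses (the normal E high time), and the CPU needs a fixed number of clocks (`gap`, a hypothetical parameter) before it can assert _AS for the next access. Real hardware has more going on, and whether the 68000's own logic follows the same rule is not confirmed in this thread.

```python
E_PERIOD = 10      # 7M clocks from one E rising edge to the next
E_HIGH = 4         # assumed: access completes when E falls, 4 clocks after rising
AS_SETUP = 3       # from the quoted rule: _AS at least 3 clocks before E rises


def next_usable_edge(as_time):
    """First E rising edge (a multiple of E_PERIOD) at least AS_SETUP clocks after _AS."""
    earliest = as_time + AS_SETUP
    return -(-earliest // E_PERIOD) * E_PERIOD  # round up to a multiple of E_PERIOD


def e_periods_per_access(gap, accesses=100):
    """Average number of E periods per CIA access in this toy model.

    gap: clocks between the end of one access and _AS for the next
         (hypothetical stand-in for instruction fetch and bus overhead).
    """
    edges = []
    as_time = 0
    for _ in range(accesses):
        edge = next_usable_edge(as_time)
        edges.append(edge)
        as_time = edge + E_HIGH + gap
    return (edges[-1] - edges[0]) / (E_PERIOD * (accesses - 1))


if __name__ == "__main__":
    for gap in (0, 3, 4, 8, 13, 14):
        print(f"gap={gap:2d} clocks -> {e_periods_per_access(gap)} E periods per access")
```

In this model a gap of up to 3 clocks still catches every E rising edge (1E), while a gap of 4 to 13 clocks misses every other one (2E). That is at least consistent with the explanation earlier in the thread that a single extra instruction fetch of 4 cycles is enough to halve the access rate.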
|
27 December 2021, 03:10 | #9 |
Registered User
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 922
|
@Kamelito:
Made an attempt at finding it, but the closest thing I could find was the 88 and 89 8520_Timing documents, and I don't think that is it?
@Speedgeek: Cheers, please do a run on the A2630 when you install it next time.
@ross: Very interesting thread, thank you very much! Saved logs from a couple of machines from your and simion's tests: http://megaburken.net/~patrik/ciatest/cia-speed_b/ http://megaburken.net/~patrik/ciatest/CIA_tests/
@Niklas: In the A3000 and A4000, Fat Gary generates the E-clock and chip-selects for the 8520s. http://megaburken.net/~patrik/A3000/...cification.pdf (downloaded from http://amiga.serveftp.net/datasheets.html to not put unnecessary strain on his connection) mentions some interesting details:
"The ECLK signal is generated in GARY. It is a free running clock whose frequency is 1/10th of the 7M clock. Normally ECLK is low for six 7M clocks, and high for four 7M clocks. However, when the CIAs are accessed, the ECLK high time may be shorter than four 7M clocks. During writes to the CIAs, ECLK is high for only two 7M clocks. During reads ECLK stays high for a minimum of two 7M clocks, and a maximum of four 7M clocks. The frequency of ECLK does not change. If the ECLK high time is shortened during CIA access, the difference is made up by increasing the subsequent ECLK low time. Consequently, it is always ten 7M clocks from one rising edge of ECLK to the next."
However, there is no explanation of why it does this, but I assume it has something to do with why the A3000 and A4000 get consistent 1-cycle 8520 accesses. On that note, the 50MHz A500 accelerator you had seen 1E accesses on perhaps does some similar trick when generating the E-clock.
Last edited by patrik; 27 December 2021 at 03:24. |
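The Fat Gary description quoted above can be sketched as a tiny waveform generator, just to show that shortening the high time while stretching the following low time keeps the rising edges exactly ten 7M clocks apart. This is only an illustration of the quoted text, not a model of the actual chip; the function names and the example sequence of high times are made up.

```python
PERIOD = 10        # 7M clocks between ECLK rising edges, per the quoted spec
NORMAL_HIGH = 4    # normal ECLK high time
MIN_HIGH = 2       # shortest high time during a CIA access, per the quoted spec


def eclk_waveform(high_times):
    """Build ECLK samples (one per 7M clock), starting at a rising edge.

    Each entry in high_times is the high time of one ECLK period; the low
    time that follows is stretched so the period stays at PERIOD clocks.
    """
    wave = []
    for high in high_times:
        if not MIN_HIGH <= high <= NORMAL_HIGH:
            raise ValueError(f"high time {high} outside {MIN_HIGH}..{NORMAL_HIGH}")
        low = PERIOD - high
        wave += [1] * high + [0] * low
    return wave


def rising_edges(wave):
    """Indices where ECLK goes from low to high (index 0 counts as an edge)."""
    return [i for i, level in enumerate(wave)
            if level == 1 and (i == 0 or wave[i - 1] == 0)]


if __name__ == "__main__":
    # Hypothetical sequence: idle period, write (2 high), read (3 high), idle.
    wave = eclk_waveform([4, 2, 3, 4])
    print("".join(str(level) for level in wave))
    print("rising edges at clocks:", rising_edges(wave))
```

With the example sequence, the low times come out as 6, 8, 7 and 6 clocks, and the rising edges stay at clocks 0, 10, 20 and 30.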
27 December 2021, 11:13 | #10 | |||
Registered User
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
|
Quote:
Quote:
Quote:
One thing I'm curious about is that the Fat Gary document says "During reads ECLK stays high for a minimum of two 7M clocks, and a maximum of four 7M clocks.". I wonder under what conditions reads become two or four clocks long. |
|||
27 December 2021, 12:25 | #11 |
Registered User
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 922
|
Could this be a way to adaptively synchronize the local 25MHz 030 bus, which runs asynchronously to the 7M clock (derived from the 28.x MHz "chipset clock")?
|
27 December 2021, 22:11 | #12 |
Registered User
Join Date: Aug 2006
Location: Augusta, Georgia, USA
Posts: 548
|
I always understood the A3000 and A4000 were the only fully 32-bit "path to chipset" machines, so this seems right.
The others are all only 16-bit, or is there another reason? Last edited by r.cade; 28 December 2021 at 00:33. |
28 December 2021, 11:19 | #13 | |
-
Join Date: Jul 2003
Location: Helsinki / Finland
Age: 43
Posts: 9,861
|
Quote:
Talking about CIA access, those have 8-bit-wide registers, and the two CIAs are wired to separate halves of the data bus, so together they take up 16 bits of the data bus in parallel. The bus width of the CPU therefore doesn't make that much difference. |
|
28 December 2021, 11:45 | #14 | |
Registered User
Join Date: Sep 2013
Location: Poland
Posts: 806
|
Quote:
|
|
29 December 2021, 08:33 | #15 |
-
Join Date: Jul 2003
Location: Helsinki / Finland
Age: 43
Posts: 9,861
|
I was specifically thinking about cpu access to chip ram.
|