English Amiga Board


26 December 2021, 16:04   #1
patrik
Registered User
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 922
CIA access speed differs with Amiga model

Hi,

I have been fiddling with a project that pokes the CIA 8520 parallel port. I did all the initial work on an A3000, but when I later tested it on an A1200, I noticed significantly lower speeds.

That was unexpected, so I did more thorough testing on more machines. On an A500 (68000), A2000 (68030) or A1200 (68020, 68030, 68060), I got almost exactly half the CIA register access speed of an A3000 (68030) or A4000 (68060), and this held both for a few repeated accesses and for hammering on the registers for seconds.

I always assumed that the CIA access speed would be the same on all Amigas, as they are all driven by the same E-clock at ~710 kHz, differing only slightly between PAL and NTSC machines.
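
For reference, the ~710 kHz figure follows from E running at one tenth of the 7 MHz clock. A minimal Python sketch of that arithmetic, assuming the commonly quoted CPU clocks of 7.09379 MHz (PAL) and 7.15909 MHz (NTSC), which are not stated in this thread:

```python
# CPU ("7M") clock frequencies in Hz. These two constants are assumptions
# brought in for illustration, not values quoted in this thread.
CPU_CLOCK_HZ = {"PAL": 7_093_790, "NTSC": 7_159_090}


def e_clock_hz(cpu_clock_hz):
    """The E-clock runs at one tenth of the 7 MHz clock."""
    return cpu_clock_hz / 10


for system, hz in CPU_CLOCK_HZ.items():
    print(f"{system}: E-clock = {e_clock_hz(hz) / 1000:.1f} kHz")
```

Both values land within about 1% of each other, which is the slight PAL/NTSC difference mentioned above.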

Figured out what I think is a quite good test to illustrate this difference - repeatedly reading the low byte register of one of the CIA timers in running mode. When the timer is running, this register will count down one step for each E-clock cycle, so if you do repeated reads of it and compare the values read, you can see how many E-clock cycles are required for each read.
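
The comparison step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the utility's actual code: the function name is made up, the sample values are the first four reads from the A3000 and A1200 logs in this post, and it assumes the low byte simply wraps from 0 to 255 (no timer reload from the latch between two reads).

```python
def e_cycles_per_read(samples):
    """Differences between consecutive reads of a down-counting 8-bit register.

    The modulo handles the low byte wrapping from 0 to 255 between two reads.
    """
    return [(prev - cur) % 256 for prev, cur in zip(samples, samples[1:])]


print(e_cycles_per_read([89, 88, 87, 86]))      # A3000, ciaa.talo reads
print(e_cycles_per_read([111, 109, 107, 105]))  # A1200, ciaa.talo reads
print(e_cycles_per_read([2, 0, 254, 252]))      # hypothetical wrap-around case
```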

I have written a utility which does this, plus an interleaved read/write test to show whether writes behave any differently. It does this for all available timers (four in total, but usually only two are available). The reads and writes are repeated move.b instructions to/from the registers, done inside Disable()/Enable(), so they should be fast enough on all machines and not possible to disturb. Executable and source are included in the archive:
http://megaburken.net/~patrik/CiaAccessTests.lha

Runs on Kickstart 1.2+.

Results on A3000 (68030):
Code:
10.Ram Disk:CiaAccessTests> CiaAccessTests 
ciaa.talo(BFE401) reads:
0:  89
1:  88
2:  87
3:  86
ciab.talo(BFD400) reads:
0: 161
1: 160
2: 159
3: 158
ciaa.talo(BFE401) reads interleaved with ciaa.ddrb(BFE301) writes:
0: 124
1: 122
2: 120
3: 118
ciab.talo(BFD400) reads interleaved with ciaa.ddrb(BFE301) writes:
0:  91
1:  89
2:  87
3:  85
That is one E-clock cycle between consecutive reads and two cycles between reads when a write is interleaved, so one cycle each for a read and a write on the A3000.


Results on A1200 (68060):
Code:
10.Ram Disk:CiaAccessTests> CiaAccessTests 
ciaa.talo(BFE401) reads:
0: 111
1: 109
2: 107
3: 105
ciab.talo(BFD400) reads:
0:   6
1:   4
2:   2
3:   0
ciaa.talo(BFE401) reads interleaved with ciaa.ddrb(BFE301) writes:
0: 233
1: 229
2: 225
3: 221
ciab.talo(BFD400) reads interleaved with ciaa.ddrb(BFE301) writes:
0: 199
1: 195
2: 191
3: 187
That is two E-clock cycles between consecutive reads and four cycles between reads when a write is interleaved, so two cycles each for a read and a write on the A1200.


This significant difference is perhaps common knowledge, but it was unknown to me. Does anyone know any details about the reason for it?
26 December 2021, 16:39   #2
kamelito
Zone Friend
Join Date: May 2006
Location: France
Posts: 1,801
I think I read in the 1993 DevCon materials that the CIAs differ; they said to use the OS, but also gave technical info about the differences.
26 December 2021, 17:28   #3
SpeedGeek
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
Thanks for making this benchmark tool available!

Unfortunately, I now have my GVP G-Force 030 installed in my A2000 (awaiting support for another long-delayed project), so I can't provide any immediate (E clock speedup mod) benchmark results:

https://eab.abime.net/showthread.php...92#post1523692
26 December 2021, 17:38   #4
Niklas
Registered User
 
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
I believe the problem is not the speed of the CIA itself, but the speed of the processor that accesses it.

You should consider how the 68000 processor accesses the 6800 bus to which the CIA is attached (the E, VPA and VMA signals described on page 185 of this document: https://www.nxp.com/docs/en/referenc.../MC68000UM.pdf).

The E signal is low for six 7 MHz clock cycles, and then high for four 7 MHz clock cycles.

The CIA access is done during the four cycles when E is high. If E is already high when an access to the CIA starts, the processor waits until E is low again and makes the access during the next window when E goes high.

During the six cycles that E is low, the CPU needs to fetch and execute any instructions that come before the next instruction that accesses the CIA, and then fetch that instruction.

For a normal 68000 processor, fetching one instruction takes at least 4 cycles, so if any instruction comes between the two instructions that access the CIA, the second access will miss the "E high window" and the processor stalls until the next one, effectively halving the speed of CIA accesses.
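
The reasoning above can be turned into a toy model. The Python sketch below is a simplification under stated assumptions, not a cycle-accurate description of the 68000 bus: it ignores the VPA/VMA handshake, treats an access as missed whenever E is already high when the CPU is ready, and the function name and the `gap_clocks` parameter are hypothetical.

```python
E_PERIOD = 10  # 7 MHz clocks per E cycle
E_LOW = 6      # E is low for the first six clocks, high for the last four


def e_cycles_per_access(gap_clocks, accesses=100):
    """Average number of E cycles per CIA access in this toy model.

    gap_clocks: hypothetical number of 7 MHz clocks the CPU needs between
    finishing one CIA access and being ready to start the next one.
    """
    t = 0  # time in 7 MHz clocks; every E period starts with E low
    for _ in range(accesses):
        ready = t + gap_clocks
        period_start = ready - ready % E_PERIOD
        if ready % E_PERIOD >= E_LOW:
            # E is already high: this window is missed, wait for the next one.
            period_start += E_PERIOD
        # The access completes when the high window it uses ends.
        t = period_start + E_PERIOD
    return t / (accesses * E_PERIOD)


for gap in (4, 8):  # one instruction fetch vs. two, at 4 clocks each
    print(gap, e_cycles_per_access(gap))
```

In this model a gap of 4 clocks catches every window, while a gap of 8 clocks only catches every other one, which matches the halving described above.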

When using the 030 processor on the other hand, the processor is much faster and can manage to do one access in every E high window.

The fastest way to communicate over the parallel port that I could come up with is as follows: https://github.com/niklasekstrom/ami...i_low.asm#L170. This results in 2E speed (1 byte communicated on every other E cycle) on an Amiga with an accelerator that allows the processor to get an access in on every opportunity.
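
As a rough feel for what 2E means in bytes per second, here is a small Python sketch. The PAL E-clock value (7.09379 MHz / 10) is an assumption not taken from this thread, and the 4E case is only a hypothetical illustration of the same protocol on a machine where every CIA access takes twice as long.

```python
PAL_E_CLOCK_HZ = 709_379  # assumed PAL E-clock in Hz (7.09379 MHz / 10)


def bytes_per_second(e_cycles_per_byte, e_clock_hz=PAL_E_CLOCK_HZ):
    """Throughput when one byte is transferred every e_cycles_per_byte E cycles."""
    return e_clock_hz / e_cycles_per_byte


for label, cycles in (("2E", 2), ("4E (hypothetical)", 4)):
    print(f"{label}: {bytes_per_second(cycles) / 1000:.0f} kB/s")
```

Under these assumptions, 2E works out to roughly 355 kB/s.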

(For more background on the speed of protocols that communicate over the parallel port, see this page: https://lallafa.de/blog/2015/09/amig...st-can-you-go/.)
26 December 2021, 17:49   #5
patrik
Registered User
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 922
@Niklas:
I think you are right in that the answer lies not in the speed of CIA itself, but in how it is accessed.

I can buy that the stock 68000 might be busy fetching and executing the next instruction and so miss the next E high window. However, the 68030 A2000 and the 68020, 68030 and 68060 A1200s get the same 2-cycle result as the 68000 A500. Only the 68030 A3000 and 68060 A4000 manage 1-cycle access.
26 December 2021, 18:20   #6
ross
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Check this thread https://eab.abime.net/showthread.php?t=107908 from message #8 onward.
There are a lot of similar arguments and test code.
26 December 2021, 18:58   #7
Niklas
Registered User
 
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
Quote:
Originally Posted by patrik
I can buy that the stock 68000 might be busy fetching and executing the next instruction and so miss the next E high window. However, the 68030 A2000 and the 68020, 68030 and 68060 A1200s get the same 2-cycle result as the 68000 A500. Only the 68030 A3000 and 68060 A4000 manage 1-cycle access.
I have observed 1E-cycle CIA accesses using a 50 MHz HC508 accelerator in an A500, so that is definitely possible.
27 December 2021, 00:06   #8
Niklas
Registered User
 
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
More information that may be relevant to this thread... In Amigas whose processor doesn't have the 6800 interface logic, that logic is instead in Gayle. On page 6 of https://www.amigawiki.org/lib/exe/fe...cification.pdf it says: "_AS must be asserted by 3 clocks before E CLK goes high, or you wait until next time around". I'm not sure if the logic in 68000 works the same but I would guess so. Minimig seems to have the same idea: https://github.com/MiSTer-devel/Mini...8k_bridge.v#L3.
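
The quoted Gayle rule can be sketched as a small Python function. This is one possible reading of the sentence, not Gayle's actual logic: it assumes "by 3 clocks before" means at or before the clock three 7 MHz clocks ahead of the rising edge, and the function name, the time origin and the `first_edge` parameter are hypothetical.

```python
E_PERIOD = 10  # 7 MHz clocks between E rising edges
AS_SETUP = 3   # _AS must be asserted at least this many clocks before E rises


def serving_edge(as_time, first_edge=6):
    """Return the clock of the E rising edge that serves the access.

    as_time: clock at which _AS is asserted, counted in 7 MHz clocks.
    first_edge: clock of the first E rising edge (6 if E goes low at clock 0).
    """
    edge = first_edge
    while as_time > edge - AS_SETUP:
        # Deadline missed: wait until the next time around.
        edge += E_PERIOD
    return edge


for t in (3, 4, 13, 14):
    print(t, serving_edge(t))
```

Being one clock late costs a full E period, which is why a small difference in CPU-side timing can halve the access rate.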
27 December 2021, 03:10   #9
patrik
Registered User
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 922
@Kamelito:
I tried to find it, but the closest things I could find were the 88 and 89 8520_Timing documents, and I don't think those are the right ones?

@Speedgeek:
Cheers, please do a run on the A2630 when you install it next time.

@ross:
Very interesting thread, thank you very much! I saved logs from a couple of machines running your and simion's tests:
http://megaburken.net/~patrik/ciatest/cia-speed_b/ http://megaburken.net/~patrik/ciatest/CIA_tests/

@Niklas:
In the A3000 and A4000, Fat Gary generates the E-clock and chip-selects for the 8520s. http://megaburken.net/~patrik/A3000/...cification.pdf (downloaded from http://amiga.serveftp.net/datasheets.html to not put unnecessary strain on his connection) mentions some interesting details:
"The ECLK signal is generated in GARY. It is a free running clock whose frequency is 1/10th of the 7M clock. Normally ECLK is low for six 7M clocks, and high for four 7M clocks. However, when the CIAs are accessed, the ECLK high time may be shorter than four 7M clocks. During writes to the CIAs, ECLK is high for only two 7M clocks. During reads ECLK stays high for a minimum of two 7M clocks, and a maximum of four 7M clocks. The frequency of ECLK does not change. If the ECLK high time is shortened during CIA access, the difference is made up by increasing the subsequent ECLK low time. Consequently, it is always ten 7M clocks from one rising edge of ECLK to the next."

However, the document does not explain why Gary does this, but I assume it has something to do with why the A3000 and A4000 get consistent 1-cycle 8520 accesses. On that note, the 50 MHz A500 accelerator you saw 1E accesses on perhaps does some similar trick when generating the E-clock.
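
The bookkeeping in the Fat Gary quote can be written down as a short Python sketch. It only restates the quoted rule (whatever is cut from the high time is added to the following low time); the function name is made up, and the 3-clock read in the example is a hypothetical value inside the quoted 2 to 4 range.

```python
NOMINAL_HIGH = 4  # 7M clocks ECLK is normally high
NOMINAL_LOW = 6   # 7M clocks ECLK is normally low


def eclk_segments(high_times):
    """Yield (high, following_low) pairs in 7M clocks, one pair per E cycle."""
    for high in high_times:
        if not 2 <= high <= NOMINAL_HIGH:
            raise ValueError("ECLK high time must be 2 to 4 clocks of 7M")
        # Whatever is cut from the high time is added to the next low time.
        low = NOMINAL_LOW + (NOMINAL_HIGH - high)
        yield high, low


# Idle cycle, CIA write (2 clocks high), CIA read with a 3-clock high time.
for high, low in eclk_segments([4, 2, 3]):
    print(high, low, high + low)
```

Every cycle still sums to ten 7M clocks, so the rising edges stay evenly spaced while the falling edge moves.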

Last edited by patrik; 27 December 2021 at 03:24.
27 December 2021, 11:13   #10
Niklas
Registered User
 
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
Quote:
Originally Posted by patrik
In the A3000 and A4000, Fat Gary generates the E-clock and chip-selects for the 8520's. (...) mentions some interesting details: (...)
That is interesting indeed.

Quote:
Originally Posted by patrik
However, the document does not explain why Gary does this, but I assume it has something to do with why the A3000 and A4000 get consistent 1-cycle 8520 accesses.
I guess so too, it seems like they came up with an optimization of sorts.

Quote:
Originally Posted by patrik
On that note, the 50 MHz A500 accelerator you saw 1E accesses on perhaps does some similar trick when generating the E-clock.
Quite possibly. I don't have the accelerator here, but it would be interesting to see if that is the case.

One thing I'm curious about is that the Fat Gary document says "During reads ECLK stays high for a minimum of two 7M clocks, and a maximum of four 7M clocks.". I wonder under what conditions reads become two or four clocks long.
27 December 2021, 12:25   #11
patrik
Registered User
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 922
Quote:
Originally Posted by Niklas
One thing I'm curious about is that the Fat Gary document says "During reads ECLK stays high for a minimum of two 7M clocks, and a maximum of four 7M clocks.". I wonder under what conditions reads become two or four clocks long.
Could this be a way to adaptively synchronize the local 25 MHz 030 bus, which runs asynchronously to the 7M clock (derived from the 28.x MHz "chipset clock")?
27 December 2021, 22:11   #12
r.cade
Registered User
Join Date: Aug 2006
Location: Augusta, Georgia, USA
Posts: 548
I always understood the A3000 and A4000 were the only fully 32-bit "path to chipset" machines, so this seems right.

The others are all only 16-bit, or is there another reason?

Last edited by r.cade; 28 December 2021 at 00:33.
28 December 2021, 11:19   #13
Jope
-
Join Date: Jul 2003
Location: Helsinki / Finland
Age: 43
Posts: 9,861
Quote:
Originally Posted by r.cade
I always understood the A3000 and A4000 were the only fully 32-bit "path to chipset" machines, so this seems right.

The others are all only 16-bit, or is there another reason?
You're probably thinking about chip RAM data bus width? The A3000 is the only ECS machine that has 32 bits there. All AGA machines have 32 bits to chip RAM.

As for CIA access, the CIAs have 8-bit-wide registers, and they are wired to the data bus so that together they take up 16 bits of it in parallel, so the bus width of the CPU doesn't make that much difference.
28 December 2021, 11:45   #14
Promilus
Registered User
 
Join Date: Sep 2013
Location: Poland
Posts: 806
Quote:
You're probably thinking about chip RAM data bus width? The A3000 is the only ECS machine that has 32 bits there
That's not entirely true. Clever buffering allows the CPU to access chip RAM on the A3000 through a full-width 32-bit data interface, and the chip RAM itself is organized as 32 bits wide, but the chipset can only access 16 bits at a time, so latching and proper control signals are used whenever Agnus accesses chip RAM.
29 December 2021, 08:33   #15
Jope
-
Join Date: Jul 2003
Location: Helsinki / Finland
Age: 43
Posts: 9,861
I was specifically thinking about CPU access to chip RAM.