14 August 2021, 15:06 | #21 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
And I suppose Saimo hasn't changed the CACR settings, so icache is on by default and the code is run from there. I think this is an interesting case |
|
14 August 2021, 15:40 | #22 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
https://lallafa.de/blog/2015/09/amig...st-can-you-go/ I also think the emulation of an A500 with a 68k@14MHz and fast RAM is accurate in WinUAE. What I'm not sure about is the A1200. But now I have a doubt: in your tests, what do you mean by "stock A1200"? Is it still the same A1200 with the Blizzard switched off, or another basic A1200 without an accelerator card and fast memory? |
|
14 August 2021, 16:00 | #23 | ||
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
|
Quote:
Skimming quickly, I see that it's mostly oriented to writing rather than reading. At the bottom, there's a little paragraph about reading that says that, basically, the speed should be the same. But I wonder if the context of proper data transfers (instead of just testing PRA repeatedly) affects timings.
By the way, this just prompted me to try my test again with DDRA set entirely to output, entirely to input, and in a mixed way. I don't believe it should make any difference, but anyway...
The bad news is that earlier, when I turned on the machine to run the tests, the monitor refused to come to life: I temporarily hooked the Amiga up to the monitor I use for the C64, but that one (newer and better) doesn't support the 50 Hz refresh rate (silly device... it's also a TV, but when the VGA input is selected it refuses to support any mode which doesn't match the built-in ones). This forces me to run some tests by typing blindly on the keyboard and redirecting the output to a file...
|
||
14 August 2021, 16:25 | #24 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
As I see it, after setting up the read, the correct state will be available to read a while later. We don't know how long that while is, but we think we need much less than 300 usecs (~5 scanlines). The code uses multiple reads to cause a delay and then relies on the last reading. I don't see why that would be necessary. It should be possible to do something useful during the required wait period.
Code:
        tst.b   ;if necessary
        ;..useful code that takes at least n usecs (accelerator caveat)
        tst.b   ;read the state
I agree that the best time to read the inputs is during the VBI. A Copper Interrupt should suffice, as long as no higher priority interrupts are... taking priority.
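To put the 300 usecs bound into perspective, here is a rough conversion into scanlines and E clock cycles. It is only a sketch under assumed PAL timings (colour clock of 3,546,895 Hz, 227 colour clocks per scanline, E clock at one fifth of the colour clock); the function names are made up for illustration.

```python
# Assumed PAL timing constants (not taken from the posts above).
COLOR_CLOCK_HZ = 3_546_895       # PAL colour clock
CCK_PER_SCANLINE = 227           # colour clocks per PAL scanline
E_CLOCK_HZ = COLOR_CLOCK_HZ / 5  # CIA E clock, about 709379 Hz


def usecs_to_scanlines(usecs):
    """Convert a delay in microseconds to PAL scanlines."""
    return usecs * 1e-6 * COLOR_CLOCK_HZ / CCK_PER_SCANLINE


def usecs_to_e_cycles(usecs):
    """Convert a delay in microseconds to CIA E clock cycles."""
    return usecs * 1e-6 * E_CLOCK_HZ


if __name__ == "__main__":
    print(f"{usecs_to_scanlines(300):.2f} scanlines")
    print(f"{usecs_to_e_cycles(300):.1f} E clock cycles")
```

So 300 usecs is a bit under 5 scanlines, or a couple of hundred E clock cycles, which leaves a lot of room for useful code between the two accesses.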
14 August 2021, 16:37 | #25 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
If the results are confirmed, the A1200 emulation could be (slightly ) improved. |
|
14 August 2021, 16:50 | #26 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
No need to bother with DDR. Just use this version, which writes to the unused $B register.
|
14 August 2021, 18:30 | #27 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
|
Quote:
Well, from experience, the board is dead when disabled. Only a machine reset can revive it. I made a number of tests. I'm preparing the results... |
|
14 August 2021, 18:56 | #28 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
|
I should have been doing other stuff, but I got sucked into this...
I decided to write a number of tests to check various read/write combinations. The tests are based on these sequences of instructions (which are used also as labels, without "dbf"):
Code:
* clr dbf...
* st dbf...
* tst dbf...
* clr clr dbf...
* st st dbf...
* tst tst dbf...
* clr tst dbf...
* st tst dbf...
* tst clr dbf...
* tst st dbf...
* clr clr tst tst dbf...
* st st tst tst dbf...
* tst tst clr clr dbf...
* tst tst st st dbf...
* clr tst st tst dbf...
* st tst clr tst dbf...
* tst clr tst st dbf...
* tst st tst clr dbf...
The core loop executes 709379 times and looks like this (the actual instructions in the loop reflect the combinations posted above):
Code:
        lea.l   $bfe001,a0      ;not included in the time measurement
        lea.l   $bfe201,a1      ;not included in the time measurement
        move.l  #709378,d0      ;not included in the time measurement
.l      tst.b   (a0)
        st.b    (a1)
        tst.b   (a0)
        clr.b   (a1)
        dbf     d0,.l
        clr.w   d0
        subq.l  #1,d0
        bpl.l   .l
* the only case where access takes 1 E Clock cycle is clr, and only on the stock A1200 (on the Blizzard it takes 2 cycles, instead);
* all other accesses take 2 cycles;
* in some cases it might seem that combinations of reads and writes give an average access of 1.5 cycles, but I'm pretty sure that's just the effect of st, which is slower than clr and tst, and overlaps partially with the following instruction (including dbf);
* in some cases the 68030 seems to overlap the instructions slightly less efficiently than the 68020 (no, I didn't swap the CPUs around) - see the cases where the loop timing is about 3.1, 6.2 and 7.1 cycles.
Maybe later, if I can, I'll make a test where st is replaced with move.b dx,(ax).
Attached is the whole set of test programs and a script that will execute all them, producing a log. It would be very interesting to see the results of other machines.
EDIT: I have made also the following tests with move.b d0,(a1) now:
Code:
* move dbf...
* move_move dbf...
* move_tst dbf...
* tst_move dbf...
The archive attached here now also contains these new tests. And, by the way, I forgot to mention that they are for 68020+ (I made them on top of a test-bed program I use for everything else, and it's written for 68020 or better CPUs only).
(Not so) funny side note: in the meantime, I opened the monitor I use for the A1200 to see why it wouldn't turn on anymore; I expected to find some leaking capacitors - and indeed several capacitors belonging to the internal power supply block were bulging - but unfortunately another part of the same block suffered much worse damage.
Last edited by saimo; 17 August 2021 at 19:11.
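A side note on the loop counter in the core loop: dbf only decrements the low 16 bits of d0, so the clr.w / subq.l / bpl tail is what extends the count to 32 bits. The following Python sketch simulates that idiom to check that loading 709378 really gives 709379 passes. It models only the register arithmetic, not timing, and the function name is illustrative.

```python
def count_iterations(initial):
    """Count loop-body passes for: dbf d0,.l / clr.w d0 / subq.l #1,d0 / bpl .l"""
    d0 = initial & 0xFFFFFFFF
    iterations = 0
    while True:
        # dbf runs the body (low word + 1) times, then falls through
        # with the low word left at $FFFF (-1); shortcut instead of stepping.
        iterations += (d0 & 0xFFFF) + 1
        d0 = (d0 & 0xFFFF0000) | 0xFFFF
        # clr.w d0: clear the low word
        d0 &= 0xFFFF0000
        # subq.l #1,d0: borrow from the high word
        d0 = (d0 - 1) & 0xFFFFFFFF
        # bpl .l: loop again only while d0 is non-negative
        if d0 & 0x80000000:
            return iterations


if __name__ == "__main__":
    print(count_iterations(709378))  # prints 709379
```

In general the idiom runs the body N+1 times for a starting value N, just like a plain dbf but without the 16-bit limit.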
14 August 2021, 22:50 | #29 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
So I suppose my cia-speed_b.68k returns 1 E-cycle as a result. The results are somewhat similar to what happens with writing and reading in chip RAM, but greatly amplified due to the granularity of the clock. There is a GAYLE document (it also deals with the synchronization of the processor to the CIA accesses, as well as the generation of the E-clock) that could perhaps explain this time difference between reads and writes. If all the timings were confirmed on other real machines, this could be useful for emulation. |
|
14 August 2021, 23:06 | #30 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
|
Quote:
EDIT: sorry, I hadn't noticed the second test program! I'll try it now and report back.
EDIT2: here are the results:
Code:
S) Elapsed: 1409 ms, data: 1000000 bytes, speed: 709,72 KB/s
B) Elapsed: 5614 ms, data: 1000000 bytes, speed: 178,12 KB/s

S) stock PAL A1200
B) same A1200, but with Blizzard 1230 IV (68030 at 50 MHz and 60 ns RAM) on
BTW, by coincidence you posted just when I edited my previous post: if you haven't noticed that already, give it another look to see the results with the move instruction.
Last edited by saimo; 14 August 2021 at 23:33.
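The speed figures in the log can be re-derived from the elapsed time and the byte count. The sketch below assumes that KB means 1000 bytes and that the tool truncates to two decimals with an integer division; both are guesses that happen to reproduce the logged numbers, not something confirmed by the test program's source.

```python
def speed_centi_kb_per_s(data_bytes, elapsed_ms):
    """Speed in hundredths of KB/s (bytes per ms == KB/s), truncated."""
    return data_bytes * 100 // elapsed_ms


def format_speed(data_bytes, elapsed_ms):
    """Format the speed with a decimal comma, as in the log."""
    whole, frac = divmod(speed_centi_kb_per_s(data_bytes, elapsed_ms), 100)
    return f"{whole},{frac:02d} KB/s"


if __name__ == "__main__":
    print(format_speed(1_000_000, 1409))  # prints 709,72 KB/s
    print(format_speed(1_000_000, 5614))  # prints 178,12 KB/s
```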
|
14 August 2021, 23:42 | #31 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
Quote:
EDIT: I play hard too, attached NODMA test (but I don't really believe in it...)
Last edited by ross; 14 August 2021 at 23:55. |
||
15 August 2021, 00:06 | #32 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
|
Quote:
Now, given that the E clock cycle is 5 times slower than a color clock cycle, I'd expect the 68030 to have something between 26*5 = 130 and 27*5 = 135 free CPU cycles after a write to a CIA. This is an example of how much dummy code can be added after a write to DDRA without affecting at all the overall execution time:
Code:
.l      move.b  d0,(a1)
        moveq.l #14,d1          ;2 cycles
        moveq.l #14,d1          ;2 cycles
        moveq.l #14,d1          ;2 cycles
        moveq.l #14,d1          ;2 cycles
.d      add.l   d2,d2           ;2 cycles
        dbf     d1,.d           ;6 cycles (except for the last time)
        dbf     d0,.l
        clr.w   d0
        subq.l  #1,d0
        bpl.b   .l
(The timings are relative to the CPU and cached instructions.)
There is no instruction overlap because none of the instructions has a tail. The moveqs take 4*2 = 8 cycles. The inner loop takes 2+6 = 8 cycles. It executes 15 times, so it takes 8*15 = 120 cycles. The total is thus 8+120 = 128 cycles. Actually, though, the outer dbf also has to be taken into account, as that executes in parallel as well: hence, the total is 134 cycles. Makes perfect sense.
EDIT: I forgot the following (I was too exhausted). The loop executes 709379 times and takes 2 seconds (both with and without the dummy code). This tells us that (cycle = E clock cycle):
* each loop takes 2 cycles (confirming the previous tests);
* the CPU manages to commit the write in 1 cycle (otherwise it wouldn't be able to execute the dummy code for 1 additional cycle);
* two consecutive writes can't happen in less than 2 cycles (as proven by the other tests, even if made with two consecutive instructions - e.g. move move).
From this, one might think:
* that the extra cycle is needed to complete the bus protocol (until that is completed, the CPU bus controller is stalled, so the CPU cannot execute another access to memory) or
* that the CIA needs the extra cycle to be able to accept the second write.
However, the tests on the stock A1200 disprove both hypotheses: the 68020 does manage to execute clr and move (but not scc) entirely in 1 cycle (and also execute something else in between consecutive writes, like dbf)! So, the oddity must lie in the expansion bus. That said, please keep in mind that reads, instead, always take 2 cycles (also on the stock A1200, I mean) - again, on my machine, at least.
Last edited by saimo; 15 August 2021 at 10:01.
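The cycle budget of the dummy code can be tallied mechanically. This Python sketch just redoes the sums from the post, using the per-instruction cycle counts quoted there (2 for moveq and add, 6 for a taken dbf); like the post, it treats all 15 inner passes as 8 cycles and ignores the different timing of the final dbf. The names are illustrative.

```python
# Cycle counts as quoted in the post (CPU cycles, cached instructions).
MOVEQ_CYCLES = 2
ADD_CYCLES = 2
DBF_CYCLES = 6


def dummy_code_cycles(moveq_count=4, inner_passes=15, include_outer_dbf=True):
    """Sum the CPU cycles of the dummy code placed after the CIA write."""
    moveqs = moveq_count * MOVEQ_CYCLES              # 4*2 = 8
    inner = inner_passes * (ADD_CYCLES + DBF_CYCLES)  # 15*8 = 120
    outer = DBF_CYCLES if include_outer_dbf else 0    # outer dbf d0,.l
    return moveqs + inner + outer


if __name__ == "__main__":
    print(dummy_code_cycles(include_outer_dbf=False))  # prints 128
    print(dummy_code_cycles())                         # prints 134
```

Changing `inner_passes` gives a quick way to see how much dummy code would be needed to go past the 130 to 135 cycle window estimated in the post.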
|
15 August 2021, 10:33 | #33 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
|
Quote:
Then, it dawned on me: your program reports that the test lasted about 5.6 seconds, but it actually lasted half as much (or something: I just mentally counted the seconds)! Given that the A1200 monitor is no more, I did the test after a full system boot, which opens a Euro72 screen (so that I get to see it also on a dumb VGA monitor), and I guessed that perhaps you use the screen refresh frequency to measure the time. So, I rebooted without startup-sequence and ran all three tests blindly, redirecting the output to a file; in all cases, the results have been:
Code:
Elapsed: 2819 ms, data: 1000000 bytes, speed: 354,73 KB/s
P.S. I won't be able to make further tests today.
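To relate these elapsed times to E clock cycles per access, one can divide the number of E clock ticks in the measured interval by the number of bytes. A minimal sketch, assuming the PAL E clock of 709379 Hz (the function name is illustrative):

```python
E_CLOCK_HZ = 709_379  # assumed PAL E clock frequency


def e_cycles_per_byte(elapsed_ms, data_bytes):
    """Average number of E clock cycles spent per transferred byte."""
    return elapsed_ms / 1000 * E_CLOCK_HZ / data_bytes


if __name__ == "__main__":
    print(f"{e_cycles_per_byte(2819, 1_000_000):.3f}")
    print(f"{e_cycles_per_byte(1409, 1_000_000):.3f}")
```

With these inputs, 2819 ms works out to about 2 E clock cycles per byte and the earlier 1409 ms stock A1200 figure to about 1 cycle per byte.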
|
15 August 2021, 11:46 | #34 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
To have a fairly accurate time and not touch the standard timers I used the TOD-B, which is locked to the horizontal frequency. Then all I do is take the difference and calculate the time with:
Code:
        divu.w  #15625*256/1000,d0      ; d0=ms (pal_hfreq*scale_down/granularity)
Of course it only works in PAL mode.
So, excellent: the $B register can be used to generate precise minimum delays of 1 E-Clock even on the A1200, without worries, as a sort of CIA counterpart of the custom chips' $1FE register. |
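For anyone wanting to check the conversion, here is the same divide in Python. It assumes the dividend in d0 is the TOD tick difference scaled up by 256 (for example, the 24-bit count sitting in the upper three bytes of the longword); that detail is a guess based on the "scale_down" comment, not something shown in the post. The overflow check mirrors the fact that divu.w only produces a 16-bit quotient.

```python
PAL_HFREQ = 15625   # TOD-B ticks per second in PAL mode
SCALE = 256         # assumed scaling of the tick difference
GRANULARITY = 1000  # milliseconds per second
DIVISOR = PAL_HFREQ * SCALE // GRANULARITY  # the divu.w operand


def tod_ticks_to_ms(ticks):
    """Convert a TOD-B tick difference to milliseconds, like the divu.w above."""
    if not 0 <= ticks < (1 << 24):
        raise ValueError("TOD is a 24-bit counter")
    quotient = (ticks * SCALE) // DIVISOR
    if quotient > 0xFFFF:
        # divu.w cannot represent quotients above 16 bits
        raise OverflowError("quotient does not fit in 16 bits")
    return quotient


if __name__ == "__main__":
    print(DIVISOR)                 # prints 4000
    print(tod_ticks_to_ms(15625))  # prints 1000
```

Under these assumptions the conversion tops out at about 65.5 seconds, which is plenty for tests lasting a few seconds.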
|
15 August 2021, 13:43 | #35 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
|
Quote:
|
|
15 August 2021, 14:15 | #36 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
I only need a minimum |
|
15 August 2021, 14:25 | #37 |
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
|
There seems to be enough margin to miss a pulse or two when using VBLANK interrupt as the CD32 blue button (first in the pulse train) duration is 3 cycles through the shift register (the other buttons are 2).
http://gerdkautzmann.de/cd32gamepad/cd32gamepad.html Out of curiosity, are there any Amiga interrupts capable of triggering on pin 9 in the joystick port(s)? I assume that would be ideal, to get an interrupt when the CD32 button pulses are coming, deal with that, and then return. |
15 August 2021, 15:11 | #38 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
The pulse train is initiated and managed by the Amiga (and could be at any time during VBI). Quote:
|
||
15 August 2021, 15:11 | #39 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
|
|
15 August 2021, 15:22 | #40 |
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
|
Yes, my mistake, starting to remember now, so the Amiga activates pin 5 to get the data from CD32 controller shifted out on pin 9, right?
|