14 March 2022, 16:50 | #1 |
Registered User
Join Date: Jul 2021
Location: New York
Posts: 12
|
Exact functioning of DDFSTRT & STOP?
For the sake of improving an emulator I have written, can anyone provide some exact details on how DDFSTRT and STOP are applied in hardware? I'm probably being a dunce in having to ask, but I thought better to be a dunce that knows the answer.
My current understanding is that bitplane fetching has a fixed correlation to horizontal cycle number, in OCS terms:
Code:
if(high resolution) {
    plane_order[] = {3, 1, 2, 0}
    plane = plane_order[horizontal_position mod 4]
} else {
    plane_order[] = {_, 3, 5, 1, _, 2, 4, 0}
    plane = plane_order[horizontal_position mod 8]
}
Can anyone just quickly explain the way the machine handles this bit of state? Sorry for the question that's not exactly on topic. |
14 March 2022, 17:40 | #2 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
This isn't exactly how the hardware works, but briefly it's more like this: when HPOS==DDFSTRT it starts fetching data to BPLxDAT in the order you call "plane_order" (skipping planes that aren't enabled). After the write to BPL1DAT the data is passed on to temporary holding registers and finally moved to the actual output shift registers when the hscroll values match.
When HPOS==DDFSTOP it starts the final fetch cycle (i.e. DDFSTOP doesn't actually mean stop right now!). It's during this final fetch cycle that BPLxMOD is added. Obviously there are many more details to it, but compared to what you write the main difference is you want something more like:
plane = plane_order[(horizontal_position - DDFSTRT) mod 8]
BTW in case you're not already aware, the author of the vAmiga emulator has a pretty extensive test suite that you may find useful. If that's not what you're asking, maybe you could describe what issue you're seeing? |
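A minimal Python sketch of the slot lookup described above, offered as an illustration rather than as how Agnus is wired. The function name `plane_for_cycle` and the `planes_enabled` parameter are made up for this example; the slot orders are the ones from post #1 (0-based, so 0 is BPL1). It assumes the simplified model only: no DDFSTOP handling, no start-up delay, no scrolling.

```python
# Fetch-slot order within one fetch unit, as given in post #1 (0-based plane
# indices, so 0 means BPL1). None marks a slot with no bitplane fetch.
LORES_ORDER = (None, 3, 5, 1, None, 2, 4, 0)
HIRES_ORDER = (3, 1, 2, 0)


def plane_for_cycle(hpos, ddfstrt, hires, planes_enabled):
    """Return the 0-based plane fetched at hpos, or None for a free slot.

    Simplified model (assumption): ignores DDFSTOP, any start-up delay
    and scrolling.
    """
    if hpos < ddfstrt:
        return None  # fetching has not started on this line yet
    order = HIRES_ORDER if hires else LORES_ORDER
    # The slot index is counted from DDFSTRT, not from the start of the line.
    plane = order[(hpos - ddfstrt) % len(order)]
    if plane is None or plane >= planes_enabled:
        return None  # slot stays free when the plane is not enabled
    return plane


# One lowres fetch unit with six planes, starting at DDFSTRT = $38:
print([plane_for_cycle(h, 0x38, False, 6) for h in range(0x38, 0x40)])
# -> [None, 3, 5, 1, None, 2, 4, 0]
```

The only change from the code in post #1 is that the index into the order table is relative to DDFSTRT, so a DDFSTRT that is not a multiple of 8 shifts the whole pattern.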
14 March 2022, 18:16 | #3 | |||
Registered User
Join Date: Jul 2021
Location: New York
Posts: 12
|
Quote:
Quote:
Quote:
|
|||
14 March 2022, 19:13 | #4 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,510
|
|
14 March 2022, 19:22 | #5 | ||
Registered User
Join Date: Jul 2021
Location: New York
Posts: 12
|
Quote:
Quote:
|
||
14 March 2022, 19:33 | #6 | |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,510
|
Quote:
Check "undocumented hardware stuff" thread. It should be explained there, somewhere. |
|
14 March 2022, 19:36 | #7 | ||
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
First off, sorry there are some (many) details I'm forgetting. I coded up a basic emulator that works for most cases with OCS last year and was going by notes/code
Quote:
I think the test is actually continuous, but this shouldn't matter for simple emulation. DDFSTRT passed means start actually doing bitplane DMA; output happens a bit later (0.5 CCK after BPL1 data has been read). DDFSTOP passed means one more fetch cycle, then add the modulos. 99% of software works fine if you just reverse the calculations in the hardware reference (with a caveat for HIRES) and ignore cycle exact stuff.
Quote:
Is it almost right except for a few cases or always off? My recommendation would be to get lowres screens (without scrolling) down pat, then scrolling for those and finally also hires. The KS2.x boot screen, for example, has HIRES+scroll. |
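As an illustration of "reversing the calculations", here is a small Python sketch that recovers the number of words fetched per bitplane per line from DDFSTRT and DDFSTOP. It inverts the Hardware Reference Manual formulas DDFSTOP = DDFSTRT + 8 * (words - 1) for lowres and DDFSTOP = DDFSTRT + 4 * (words - 2) for hires. The function name is hypothetical, and the sketch assumes well-formed register values (DDFSTOP not below DDFSTRT), so it is only the "non cycle exact" approach mentioned above.

```python
def words_per_line(ddfstrt, ddfstop, hires=False):
    """Words fetched per bitplane per line, by inverting the HRM formulas.

    lowres: DDFSTOP = DDFSTRT + 8 * (words - 1)
    hires:  DDFSTOP = DDFSTRT + 4 * (words - 2)
    Assumption: values are sensible; odd cases are not modelled.
    """
    span = ddfstop - ddfstrt
    if span < 0:
        raise ValueError("this simple model assumes DDFSTOP >= DDFSTRT")
    if hires:
        return span // 4 + 2
    return span // 8 + 1


print(words_per_line(0x38, 0xD0))              # standard lowres -> 20
print(words_per_line(0x3C, 0xD4, hires=True))  # standard hires  -> 40
```

The different constant for hires (+2 instead of +1) is one example of why hires needs separate handling in this kind of shortcut.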
||
14 March 2022, 20:32 | #8 | ||
Registered User
Join Date: Jul 2021
Location: New York
Posts: 12
|
Quote:
Quote:
This is the main one where expedient meant "because I am aware that I don't know what the real test is" rather than to get to a first version more quickly (e.g. as per the current essentially-instant Blitter) or, I'm sure in most cases, because I don't realise that I don't know something. |
||
15 March 2022, 18:42 | #9 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
Quote:
Anyway good luck. Getting to say 95% correct was super exciting, but after that it just gets exponentially harder. Very rewarding when you finally figure it out, but I have maybe 1/10 of the patience and experience that e.g. Toni has, so 96% is fine for me.
P.S. For a fun one, see if you can get Razor 1911 - Voyage to load. |
|
18 March 2022, 18:58 | #10 | ||
Registered User
Join Date: Jul 2021
Location: New York
Posts: 12
|
Quote:
So program flow is all over the place, there are lots of very predictable conditionals, nothing is ever deferred even when it logically could be (e.g. bitplane collection for any period over which nothing modifies chip RAM), etc. I even have a PLL accumulating disk data, just because that's my standard code for that. Essentially: I love causing cache misses and pipeline stalls. These are probably the reasons why I waited until 2021/2 to give an emulator a go; with my level of insight I had to wait for processors to get very, very fast. In theory I should optimise in the future when I think things are correct, but as per your 95% comment I don't really know what lower-than-strictly-correct threshold I'd want to apply to that... Quote:
EDIT: I definitely can't. The Amiga just constantly restarts. I dare imagine I'm still posting the incorrect stack contents for at least one exception. I made determining the proper PC upon an exception too difficult — I should have just latched it upon decode. I instead concluded that since it could always be determined from the next prefetch address and current location in the YACHT-obtained bus pattern, there was a greater risk in redundant storage. I made the wrong choice. Last edited by TommoH; 18 March 2022 at 19:03. |
||
18 March 2022, 19:30 | #11 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
FWIW I've done the same thing: Doing everything in lockstep (not disk access though, which has been a major headache. Some loaders are picky!). But I meant the testcases are useful for correctness. If I wanted speed (and more correct results) I'd use WinUAE anyway
Quote:
The Voyage loader is extra difficult in that, in addition to handling of prefetches (for SMC), it also requires mostly correct interrupt and CIA timing. In case you might find them useful, I've attached the debugging notes I made while figuring it out (note: some details might be wrong, they're just unedited notes after all). |
|
18 March 2022, 21:19 | #12 | ||
Registered User
Join Date: Jul 2021
Location: New York
Posts: 12
|
Quote:
To provide a little personal history: I wrote the 68000 back in 2019 and have already used it in Macintosh and Atari ST emulators, but at the time I wrote it there weren't any test suites whatsoever. So I'm sure there are still issues lurking. Quote:
This is, of course, a common refrain of emulator authors about to be disappointed. |
||
18 March 2022, 22:20 | #13 | |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Quote:
As I wrote on Pouet some years ago, this demo is a very good emulator test case. |
|
20 March 2022, 18:10 | #14 | |
Registered User
Join Date: Jul 2021
Location: New York
Posts: 12
|
Quote:
The CIA was cycle accurate but for the one failed test when tested against Wolfgang Lorenz's test suite, which originally inhabited a C64 so there's only the much-simpler timing of the 6502 on the other side of things. My 68000 is implemented according to the YACHT cycle-counting document, but I notice there's a new version of that out with some corrections and, in any case, I've not yet done the work to verify my implementation. So I'll wager there's something off in my 68000 microprograms, or in my handling of VPA, or in some other Chipset functionality that should have caused different CPU delay than was actually applied. i.e. I think I've quite a task ahead of me. |
|
20 March 2022, 19:27 | #15 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
I saw StingRay's comment while pulling my hair out when implementing it myself, and that's also what inspired me to suggest it. I'm pretty sure you don't need 100% correct CIA emulation though, but you can't be very far off. Your interrupt timing (and priorities) also needs to be cycle accurate. Also remember that accessing CIA registers, unlike the custom registers, involves syncing with the E-clock (a 10th of the 7MHz clock).
I debugged it by running my emulator in lock step with WinUAE and seeing where it diverged. For a "quick" check you can put a breakpoint at $40146 and check the contents of the array at $70000..$70258. It should match what WinUAE has (it's in my debug notes). |
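The comparison step can be automated with something as small as the following Python sketch. It assumes you have dumped the same memory range (for instance $70000..$70258 at the breakpoint) from both emulators into byte strings; the function name and the way the dumps are obtained are hypothetical and not part of WinUAE or any other emulator.

```python
def first_divergence(mine, reference):
    """Return (offset, my_byte, ref_byte) at the first mismatch, else None."""
    for offset, (a, b) in enumerate(zip(mine, reference)):
        if a != b:
            return offset, a, b
    if len(mine) != len(reference):
        # All shared bytes match but one dump is shorter than the other.
        return min(len(mine), len(reference)), None, None
    return None


# Tiny made-up example; real input would be the two memory dumps.
hit = first_divergence(bytes([1, 2, 3, 4]), bytes([1, 2, 9, 4]))
if hit is not None:
    offset, got, want = hit
    print(f"first difference at ${0x70000 + offset:05X}: got {got}, want {want}")
```

Knowing the first differing offset narrows down which part of the loader wrote the wrong value, which is usually easier than comparing full traces.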
04 July 2023, 12:15 | #16 |
German Translator
Join Date: Aug 2018
Location: Drübeck / Germany
Age: 49
Posts: 183
|
Two questions about DDFSTRT and DIWSTRT.
1. Why is the bitplane data fetch in the WinUAE debugger not like the HRM DMA Time Slot diagram? (Datafetch doesn't finish with BPL1 on slot $40)
2. We calculate and define the horizontal DIWSTRT as $81. What does this sentence from the HRM DMA Time Slot Allocation diagram mean: "Five clocks must occur before the data fetched for a particular position can appear on-screen. For example, if data fetch start is $38, data will not be available for display until clock number $45. It is available at $45 because display processing does not begin until all of the bit-planes for a particular pixel have been fetched." For me, this statement doesn't match the horizontal DIWSTRT of $81.
Code:
WinUAE Debugger:
0006a518: 008e 2c81 ; DIWSTRT := 0x2c81
0006a51c: 0090 2cc1 ; DIWSTOP := 0x2cc1
0006a520: 0092 0038 ; DDFSTRT := 0x0038
0006a524: 0094 00d0 ; DDFSTOP := 0x00d0

[38 56] [39 57] [3A 58] [3B 59] [3C 60] [3D 61] [3E 62] [3F 63]
BPL4 116 BPL6 11A BPL2 112
0 0000 0000 FF00
0007554C 00077D4C 00072D4C
BE608A00 BE608C00 BE608E00 BE609000 BE609200 BE609400 BE609600 BE609800

[40 64] [41 65] [42 66] [43 67] [44 68] [45 69] [46 70] [47 71]
BPL3 114 BPL5 118 BPL1 110 BPL4 116 BPL6 11A BPL2 112
0000 0000 FFFF 0000 0000 0000
0006DD4C 0007054C 0006B54C 0007554E 00077D4E 00072D4E
BE609A00 BE609C00 BE609E00 BE60A000 BE60A200 BE60A400 BE60A600 BE60A800

($81/2)-8.5=$38
$38 - DDFSTRT
$39 -
$3A - BPL4
$3B - BPL6
$3C - BPL2
$3D -
$3E - BPL3
$3F - BPL5
$40 - BPL1 - Datafetch completed for cycle $38
$41 - Display or 0.5 CCK later is DIWSTRT:= HH $81 |
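The arithmetic in the last part of the post, ($81/2) - 8.5 = $38, can be checked with a short Python sketch. It uses the Hardware Reference Manual relation DDFSTRT = HSTART/2 - 8.5 for lowres (and HSTART/2 - 4.5 for hires); the function name is made up for this example, and it says nothing about when the fetched data actually becomes visible.

```python
def ddfstrt_from_hstart(hstart, hires=False):
    """DDFSTRT for a given horizontal DIWSTRT value, per the HRM formulas.

    lowres: DDFSTRT = HSTART / 2 - 8.5
    hires:  DDFSTRT = HSTART / 2 - 4.5
    Doubling both sides keeps the calculation in integers:
    (HSTART - 17) / 2 or (HSTART - 9) / 2, rounded down here.
    """
    return (hstart - (9 if hires else 17)) // 2


print(hex(ddfstrt_from_hstart(0x81)))              # -> 0x38
print(hex(ddfstrt_from_hstart(0x81, hires=True)))  # -> 0x3c
```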
04 July 2023, 12:43 | #17 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,510
|
HRM lies. See the infamous "..This timing chart has been 'adjusted' to match those requirements" text next to the DMA time slot allocation diagram.
HRM tries to explain that bitplane DMA starts instantly when DDFSTRT matches by "adjusting" horizontal cycle numbers (for example, the mysterious refresh slot at "-1" is actually slot 3). HRM added a 4 cycle offset to make this diagram simpler. WinUAE also used wrong DMA slot numbers until 4.9, I think. When DDFSTRT matches, it takes 4 cycles before the first bitplane DMA access is visible, due to internal pipelining and the delay between the DDFSTRT match decision and the bitplane sequencer getting its enable signal. |
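For anyone converting between the two numbering schemes, the 4 cycle offset described above amounts to the following Python sketch. The function names are invented for this example, and the single constant is taken directly from the post (HRM slot "-1" is really cycle 3); nothing else about the pipeline is modelled.

```python
HRM_OFFSET = 4  # cycles; from the post above: HRM slot -1 is really cycle 3


def hrm_slot_to_agnus_cycle(slot):
    """Convert a slot number from the HRM diagram to the real Agnus cycle."""
    return slot + HRM_OFFSET


def agnus_cycle_to_hrm_slot(cycle):
    """Convert a real Agnus cycle to the 'adjusted' HRM slot number."""
    return cycle - HRM_OFFSET


print(hrm_slot_to_agnus_cycle(-1))         # refresh slot -> 3
print(hex(hrm_slot_to_agnus_cycle(0x38)))  # -> 0x3c
```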
04 July 2023, 13:51 | #18 |
German Translator
Join Date: Aug 2018
Location: Drübeck / Germany
Age: 49
Posts: 183
|
At some point we need a new complete WinUAE/HRM reference manual with all corrections and descriptions, including undocumented features etc.
Who will start writing? |
04 July 2023, 21:05 | #19 |
German Translator
Join Date: Aug 2018
Location: Drübeck / Germany
Age: 49
Posts: 183
|
If I open the window horizontally with DIWSTRT $81, the first pixel is at position $81. (corresponds to Copper $40+0.5CCK).
With Datafetch at $38 I start fetching the data and finish the fetch with BPL1. How can I display something that can't actually be there yet? (BPL1 at $43) If I set DDFSTRT to $38 and DDFSTOP to $38 I would expect only one datafetch to occur. However, the datafetch continues until the end of the scanline? |
04 July 2023, 21:31 | #20 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
The two counters, in addition to not having the same granularity, are not perfectly synchronized. Denise's counter is initialized to 2, at the beginning of the line, by the strobe that occurs on cycle 3 of Agnus. And this is probably why HRM lies, trying to make people think that the 2 chips are always synchronized at the internal counter level.. (new) WinUAE DMA debugger always displays Agnus cycles. Quote:
So with a DDFSTOP that low (relative to DDFSTRT) you don't get a match, and therefore no stop for the fetches. |
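A rough Python sketch of why a missed DDFSTOP match lets fetching run on. It combines "DDFSTOP starts the final fetch cycle" from post #2 with equality comparisons against the horizontal counter. Several things here are assumptions for illustration only: the function name, the 227-cycle line length, the rule that DDFSTOP is only compared once fetching is already in progress, and the absence of any hardwired limit that real hardware may apply.

```python
def count_fetch_units(ddfstrt, ddfstop, line_cycles=227, unit_cycles=8):
    """Count complete lowres fetch units on one line (simplified model).

    Assumptions: registers are compared for equality with hpos, DDFSTOP is
    only looked at once fetching is already running, and no hardware hard
    stop is modelled.
    """
    fetching = False
    final_unit = False
    pos = 0    # cycle position inside the current fetch unit
    units = 0
    for hpos in range(line_cycles):
        if not fetching:
            if hpos == ddfstrt:
                fetching = True
                pos = 0
        elif hpos == ddfstop:
            final_unit = True  # the unit in progress becomes the last one
        if fetching:
            pos += 1
            if pos == unit_cycles:
                units += 1
                pos = 0
                if final_unit:
                    break
    return units


print(count_fetch_units(0x38, 0xD0))  # normal case -> 20
print(count_fetch_units(0x38, 0x38))  # stop never matches -> runs to line end
```

In this model the DDFSTRT = DDFSTOP = $38 case never sees a stop match after fetching has begun, so the only thing that ends the fetches is the end of the line.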
||