English Amiga Board


14 March 2022, 16:50   #1
TommoH
Registered User
 
Join Date: Jul 2021
Location: New York
Posts: 12
Exact functioning of DDFSTRT & STOP?

For the sake of improving an emulator I have written, can anyone provide some exact details on how DDFSTRT and STOP are applied in hardware? I'm probably being a dunce in having to ask, but I thought it better to be a dunce who knows the answer.

My current understanding is that bitplane fetching has a fixed correlation to horizontal cycle number, in OCS terms:

Code:
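    /* OCS fetch order; planes are 0-based, -1 marks a slot not used for bitplanes */
    static const int hires_order[4] = {3, 1, 2, 0};
    static const int lores_order[8] = {-1, 3, 5, 1, -1, 2, 4, 0};

    int plane_for_cycle(int horizontal_position, int high_resolution) {
        if (high_resolution)
            return hires_order[horizontal_position % 4];
        return lores_order[horizontal_position % 8];
    }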
So I had assumed that DDFSTRT and STOP just set some sort of flag that is observed somewhere in that state machine, but I can't seem to hit upon a formulation of that which produces the correct empirical results. Then again, there's always the suspicion that some other error in the emulator is defeating the correct behaviour here.

Can anyone just quickly explain the way the machine handles this bit of state?

Sorry for the question that's not exactly on topic.
14 March 2022, 17:40   #2
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
This isn't exactly how the hardware works, but briefly, it's more like this: when HPOS==DDFSTRT it starts fetching data into BPLxDAT in the order you call "plane_order" (skipping planes that aren't enabled). After the write to BPL1DAT, the data is passed on to temporary holding registers and finally moved to the actual output shift registers when the hscroll values match.
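A minimal C sketch of the Denise-side path described above, assuming six planes and treating the BPLCON1-derived load point as a given input rather than deriving it. All names (`BitplanePipe`, `bpl_write`, `bpl_tick`) are hypothetical, and only the register transfers are modelled, not the pixel shifting itself.

```c
#include <stdint.h>

#define NUM_PLANES 6

/* Hypothetical model of BPLxDAT -> holding registers -> shift registers. */
typedef struct {
    uint16_t dat[NUM_PLANES];     /* BPL1DAT..BPL6DAT as written by bitplane DMA */
    uint16_t holding[NUM_PLANES]; /* copies latched on the BPL1DAT write */
    uint16_t shifter[NUM_PLANES]; /* output shift registers (shifting not modelled) */
    int holding_valid;            /* 1 while holding[] waits for the scroll match */
} BitplanePipe;

/* Write to BPLxDAT; plane is 0-based, so plane 0 is BPL1DAT. */
void bpl_write(BitplanePipe *p, int plane, uint16_t value) {
    p->dat[plane] = value;
    if (plane == 0) {
        /* BPL1 is fetched last in each group, so its write acts as the trigger. */
        for (int i = 0; i < NUM_PLANES; i++)
            p->holding[i] = p->dat[i];
        p->holding_valid = 1;
    }
}

/* Call once per pixel position; load_point stands in for the position
 * derived from the BPLCON1 scroll value (not derived here). */
void bpl_tick(BitplanePipe *p, int hcounter, int load_point) {
    if (p->holding_valid && hcounter == load_point) {
        for (int i = 0; i < NUM_PLANES; i++)
            p->shifter[i] = p->holding[i];
        p->holding_valid = 0;
    }
}
```

Writing any other BPLxDAT only updates that register; nothing moves towards the shifters until BPL1DAT is written and the scroll position is reached.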

When HPOS==DDFSTOP it starts the final fetch cycle (i.e. DDFSTOP doesn't actually mean stop right now!). It's during this final fetch cycle that BPLxMOD is added.

Obviously many more details to it, but compared to what you write the main difference is you want something more like:
plane = plane_order[(horizontal_position - DDFSTRT) mod 8]
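A minimal C sketch of the start/stop behaviour described above, under simplifying assumptions: OCS lores with all six planes enabled, DDFSTOP a multiple of 8 colour clocks after DDFSTRT, no pipeline delay, and no handling of corner cases such as DDFSTOP at or before DDFSTRT (discussed at the end of the thread). `DdfState` and `ddf_step` are hypothetical names; the fetch table is the lores one from post #1.

```c
#include <stdbool.h>

/* Lores OCS fetch order within one 8-cycle block (0-based planes, -1 = unused slot). */
static const int lores_order[8] = {-1, 3, 5, 1, -1, 2, 4, 0};

typedef struct {
    bool fetching;     /* between the DDFSTRT match and the end of the final block */
    bool final_block;  /* DDFSTOP seen: the current block is the last one */
    int  block_pos;    /* 0..7, counted from the DDFSTRT match rather than from hpos */
} DdfState;

/* Advance one colour clock. Returns the plane fetched this cycle, or -1.
 * *add_modulo is set for fetches in the final block, where BPLxMOD is added. */
int ddf_step(DdfState *s, int hpos, int ddfstrt, int ddfstop, bool *add_modulo) {
    *add_modulo = false;
    if (!s->fetching && hpos == ddfstrt) {
        s->fetching = true;
        s->final_block = false;
        s->block_pos = 0;
    }
    if (!s->fetching)
        return -1;
    if (hpos == ddfstop)
        s->final_block = true;  /* DDFSTOP starts the last block; it doesn't stop now */
    int plane = lores_order[s->block_pos];
    *add_modulo = s->final_block && plane >= 0;
    if (++s->block_pos == 8) {
        s->block_pos = 0;
        if (s->final_block)
            s->fetching = false;
    }
    return plane;
}
```

With DDFSTRT = $38 and DDFSTOP = $40 this fetches two 8-cycle blocks, and only the fetches in the second block report `add_modulo`.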


BTW in case you're not already aware the author of the vAmiga emulator has a pretty extensive test suite that you may find useful.

If that's not what you're asking, maybe you could describe what issue you're seeing?
14 March 2022, 18:16   #3
TommoH
Registered User
 
Join Date: Jul 2021
Location: New York
Posts: 12
Quote:
Originally Posted by paraj
When HPOS==DDFSTOP it starts the final fetch cycle (i.e. DDFSTOP doesn't actually mean stop right now!). It's during this final fetch cycle that BPLxMOD is added.
Okay, so to restate: I guess it's more like this. At the start of each complete set of bitplanes, a test is made against a properly masked DDFSTRT and DDFSTOP; equality with the former starts collection, and equality with the latter flags this up as the final set to be collected for now?

Quote:
Originally Posted by paraj
Obviously many more details to it, but compared to what you write the main difference is you want something more like:
plane = plane_order[(horizontal_position - DDFSTRT) mod 8]
Given that one can only specify DDFSTOP and DDFSTRT down to H3, I don't think that subtraction adds anything. DDFSTRT mod 8 is guaranteed to be zero. Indeed, I'd guessed that that's exactly why DDFSTRT and STOP are constrained in their placement, but that's just another guess.

Quote:
Originally Posted by paraj
BTW in case you're not already aware the author of the vAmiga emulator has a pretty extensive test suite that you may find useful.
Yep, that's likely to come in handy eventually. But because it's all in-machine tests and visually comparing results, I think it's mostly helpful for telling you when things still aren't right. In this case I already know that my implementation isn't right.
14 March 2022, 19:13   #4
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,510
Quote:
Originally Posted by TommoH
DDFSTRT mod 8 is guaranteed to be zero
No, it isn't necessarily true. DDFSTRT and STOP bit 2 exists. ECS Agnus added bit 1.

Bitplane index counter starts from zero when DDFSTRT matches. DDFSTRT "alignment" does not affect bitplane selection.
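A small C sketch of the register masking implied by the statement above, assuming OCS Agnus compares DDFSTRT/DDFSTOP bits 7-2 (mask $FC) and ECS Agnus bits 7-1 (mask $FE). The mask values are derived from this post rather than from a datasheet, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DDF_MASK_OCS 0x00FCu  /* bits 7-2: OCS Agnus (bit 2 works despite the HRM) */
#define DDF_MASK_ECS 0x00FEu  /* bits 7-1: ECS Agnus adds bit 1 */

/* Value of DDFSTRT/DDFSTOP that the comparator would actually see. */
uint16_t ddf_effective(uint16_t reg, bool ecs_agnus) {
    return (uint16_t)(reg & (ecs_agnus ? DDF_MASK_ECS : DDF_MASK_OCS));
}

/* True on the colour clock where the masked DDFSTRT matches hpos. Per the post,
 * the bitplane index counter then starts from zero, so an "unaligned" start
 * does not change which plane is fetched in each slot of the sequence. */
bool ddfstrt_match(uint16_t ddfstrt, unsigned hpos, bool ecs_agnus) {
    return (hpos & 0xFFu) == (unsigned)ddf_effective(ddfstrt, ecs_agnus);
}
```

For example, a DDFSTRT of $3A is seen as $38 by an OCS Agnus but as $3A by an ECS Agnus.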
14 March 2022, 19:22   #5
TommoH
Registered User
 
Join Date: Jul 2021
Location: New York
Posts: 12
Quote:
Originally Posted by Toni Wilen
No, it isn't necessarily true. DDFSTRT and STOP bit 2 exists. ECS Agnus added bit 1.

Bitplane index counter starts from zero when DDFSTRT matches. DDFSTRT "alignment" does not affect bitplane selection.
So then I've been misled by the HRM asserting that there is no bit 2 (see e.g. the quote below, included for the benefit of the conversation, though obviously you're already aware of it). Fair enough.

Quote:
Originally Posted by HRM
DDFSTRT - Data-fetch Start (Beginning position for data fetch)
Bits 15-8 - not used
Bits 7-2 - pixel position H8-H3
Bit H3 only respected in HiRes Mode.
Bits 1-0 - not used
Is there any reason why the documentation might be wrong about this? Some situation in which the bit doesn't work, which made it easier to pretend it doesn't exist? Or just an ordinary omission rather than a motivated one?
14 March 2022, 19:33   #6
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,510
Quote:
Originally Posted by TommoH
Is there any reason why the documentation might be wrong about this? Some situation in which the bit doesn't work, which made it easier to pretend it doesn't exist? Or just an ordinary omission rather than a motivated one?
My guess is that it is not mentioned because an "unaligned" DDFSTRT adds a hidden offset to BPLCON1 (because the Denise shifter logic expects it to be "aligned").

Check the "undocumented hardware stuff" thread. It should be explained there, somewhere.
14 March 2022, 19:36   #7
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
First off, sorry, there are some (many) details I'm forgetting. I coded up a basic emulator last year that works for most cases with OCS, and was going by notes/code.
Quote:
Originally Posted by TommoH
Okay, so to restate: I guess it's more like this. At the start of each complete set of bitplanes, a test is made against a properly masked DDFSTRT and DDFSTOP; equality with the former starts collection, and equality with the latter flags this up as the final set to be collected for now?
If you really want to dig into the HW details, the schematics are available at https://github.com/nonarkitten/amiga...ement_project/
I think the test is actually continuous, but this shouldn't matter for simple emulation. Once DDFSTRT is passed, bitplane DMA actually starts; output happens a bit later (0.5 CCK after the BPL1 data has been read). Once DDFSTOP is passed, there is one more fetch cycle and then the modulos are added. 99% of software works fine if you just reverse the calculations in the hardware reference (with a caveat for HIRES) and ignore cycle-exact details.
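A C sketch of "reversing the calculations in the hardware reference" for a standard display window. It assumes the HRM's formulas (lores: DDFSTRT = HSTART/2 - 8.5 and DDFSTOP = DDFSTRT + 8 * (words - 1); hires: DDFSTRT = HSTART/2 - 4.5 and DDFSTOP = DDFSTRT + 4 * (words - 2)) and ignores all cycle-exact effects. The function names are hypothetical.

```c
/* Words fetched per plane per line for a given DDFSTRT/DDFSTOP (colour clocks),
 * obtained by solving the HRM's DDFSTOP formulas for the word count. */
int ddf_words(int ddfstrt, int ddfstop, int hires) {
    if (hires)
        return (ddfstop - ddfstrt) / 4 + 2;
    return (ddfstop - ddfstrt) / 8 + 1;
}

/* DDFSTRT given by the HRM formulas for a DIWSTRT horizontal start (HSTART).
 * Works in half colour clocks so the .5 terms stay exact; exact for the usual
 * odd HSTART values such as $81, truncated otherwise. */
int ddfstrt_for_hstart(int hstart, int hires) {
    int half_clocks = hstart - (hires ? 9 : 17);  /* 2 * (HSTART/2 - 4.5 or 8.5) */
    return half_clocks / 2;
}
```

For the common $38/$D0 lores setting this gives 20 words (320 pixels) per plane, and for $3C/$D4 in hires 40 words (640 pixels); both start values correspond to HSTART $81.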
Quote:
Originally Posted by TommoH
Given that one can only specify DDFSTOP and DDFSTRT down to H3, I don't think that subtraction adds anything. DDFSTRT mod 8 is guaranteed to be zero. Indeed, I'd guessed that that's exactly why DDFSTRT and STOP are constrained in their placement, but that's just another guess.
Yes, sorry, I was mostly looking at my code and not explaining in terms of what the HW does. As I recall there are some tricky bits to handling HIRES screens and scrolling, but unfortunately I don't recall what the issue was...
Quote:
Originally Posted by TommoH
Yep, that's likely to come in handy eventually. But because it's all in-machine tests and visually comparing results, I think it's mostly helpful for telling you when things still aren't right. In this case I already know that my implementation isn't right.
Is it almost right except for a few cases or always off? My recommendation would be to get lowres screens (without scrolling) down pat, then scrolling for those and finally also hires. The KS2.x boot screen, for example, has HIRES+scroll.
14 March 2022, 20:32   #8
TommoH
Registered User
 
Join Date: Jul 2021
Location: New York
Posts: 12
Quote:
Originally Posted by paraj
First off, sorry, there are some (many) details I'm forgetting. I coded up a basic emulator last year that works for most cases with OCS, and was going by notes/code.
I'm incredibly grateful for anything, thanks! (And to Toni!)

Quote:
Originally Posted by paraj
Is it almost right except for a few cases or always off? My recommendation would be to get lowres screens (without scrolling) down pat, then scrolling for those and finally also hires. The KS2.x boot screen, for example, has HIRES+scroll.
It depends on the test; I don't want to bore anyone with the case-by-case details. The emulator is already complete and runs most things correctly, including proper native games like Robocod and both R-Types. I'm just aware of quite a few places where I've consciously done the expedient thing rather than the correct thing.

This is the main one where expedient meant "because I am aware that I don't know what the real test is" rather than to get to a first version more quickly (e.g. as per the current essentially-instant Blitter) or, I'm sure in most cases, because I don't realise that I don't know something.
15 March 2022, 18:42   #9
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
Quote:
Originally Posted by TommoH
This is the main one where expedient meant "because I am aware that I don't know what the real test is" rather than to get to a first version more quickly (e.g. as per the current essentially-instant Blitter) or, I'm sure in most cases, because I don't realise that I don't know something.
If some (or even most) games work correctly you're already pretty far, and you should give some of the testcases in the test suite I linked a try (under Agnus/DDF). They usually work without everything having to be implemented, and it's much easier to see what's wrong when only one thing is tested and you have the source available. Certainly much easier than trying to figure out why a "real" program is misbehaving (but you've probably also already spent lots of time disassembling and comparing DMA usage in WinUAE).

Anyway, good luck. Getting to, say, 95% correct was super exciting, but after that it just gets exponentially harder. It's very rewarding when you finally figure it out, but I have maybe 1/10 of the patience and experience that, e.g., Toni has, so 96% is fine for me.

P.S. For a fun one, see if you can get Razor 1911 - Voyage to load
18 March 2022, 18:58   #10
TommoH
Registered User
 
Join Date: Jul 2021
Location: New York
Posts: 12
Quote:
Originally Posted by paraj
If some (or even most) games work correctly you're already pretty far, and you should give some of the testcases in the test suite I linked a try (under Agnus/DDF).
Yeah, I should get on top of that. I can't justify doing anything about my emulator's negligible performance — it's a straightforward implementation for now, so upon each 68000 bus interaction the chipset is advanced in lockstep, which literally just steps through the per-line DMA allocation as per the HRM and offers the relevant components their slots.
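A hypothetical C sketch of the lockstep structure described above: before each CPU bus access, the chipset is run up to the CPU's current time one colour clock at a time, and each slot of a per-line allocation table is offered to its owner. The slot owners, table layout and function names are illustrative assumptions; a real implementation would dispatch to per-owner handlers and rebuild the table for each line.

```c
#include <stdint.h>

/* Hypothetical slot owners; a real chipset has more detail (e.g. per-channel audio). */
typedef enum {
    SLOT_FREE, SLOT_REFRESH, SLOT_DISK, SLOT_AUDIO,
    SLOT_SPRITE, SLOT_BITPLANE, SLOT_COPPER, SLOT_BLITTER,
    SLOT_OWNER_COUNT
} SlotOwner;

typedef struct {
    uint64_t  chipset_time;              /* colour clocks emulated so far */
    int       hpos;                      /* colour clock within the current line */
    int       line_length;               /* colour clocks per line (must be > 0) */
    SlotOwner slots[256];                /* owner of each slot on the current line */
    uint64_t  grants[SLOT_OWNER_COUNT];  /* slots handed to each owner (illustration only) */
} Chipset;

static void chipset_cycle(Chipset *c) {
    /* Stand-in for "offer this slot to its owner". */
    c->grants[c->slots[c->hpos]]++;
    if (++c->hpos == c->line_length)
        c->hpos = 0;  /* a real emulator would rebuild slots[] for the next line here */
    c->chipset_time++;
}

/* Called by the CPU core before each bus access; target is the CPU's current
 * time expressed in colour clocks (one colour clock = two 68000 clocks). */
void chipset_run_until(Chipset *c, uint64_t target) {
    while (c->chipset_time < target)
        chipset_cycle(c);
}
```

The simplicity is the point: nothing is deferred or batched, which matches the "straightforward implementation" trade-off above at the cost of per-cycle dispatch overhead.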

So program flow is all over the place, there are lots of very predictable conditionals, nothing is ever deferred even when it logically could be (e.g. bitplane collection for any period over which nothing modifies chip RAM), etc. I even have a PLL accumulating disk data, just because that's my standard code for that.

Essentially: I love causing cache misses and pipeline stalls.

These are probably the reasons why I waited until 2021/2 to give an emulator a go; with my level of insight I had to wait for processors to get very, very fast.

In theory I should optimise in the future when I think things are correct, but as per your 95% comment I don't really know what lower-than-strictly-correct threshold I'd want to apply to that...

Quote:
Originally Posted by paraj
P.S. For a fun one, see if you can get Razor 1911 - Voyage to load
I'll bet I can't. But I'll find out...

EDIT: I definitely can't. The Amiga just constantly restarts. I dare imagine I'm still posting the incorrect stack contents for at least one exception. I made determining the proper PC upon an exception too difficult — I should have just latched it upon decode. I instead concluded that since it could always be determined from the next prefetch address and current location in the YACHT-obtained bus pattern, there was a greater risk in redundant storage. I made the wrong choice.

Last edited by TommoH; 18 March 2022 at 19:03.
18 March 2022, 19:30   #11
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
Quote:
Originally Posted by TommoH
Essentially: I love causing cache misses and pipeline stalls.
FWIW I've done the same thing: doing everything in lockstep (except disk access, which has been a major headache; some loaders are picky!). But I meant that the testcases are useful for correctness. If I wanted speed (and more correct results) I'd use WinUAE anyway.

Quote:
Originally Posted by TommoH
I'll bet I can't. But I'll find out...

EDIT: I definitely can't. The Amiga just constantly restarts. I dare imagine I'm still posting the incorrect stack contents for at least one exception. I made determining the proper PC upon an exception too difficult — I should have just latched it upon decode. I instead concluded that since it could always be determined from the next prefetch address and current location in the YACHT-obtained bus pattern, there was a greater risk in redundant storage. I made the wrong choice.
The WinUAE source tree includes a "cputester" that can be used to get most corner cases correct (running the cycle exact tests requires fast ram support though, and don't enable slow ram at the same time!).


The Voyage loader is extra difficult in that it also requires mostly correct interrupt and CIA timing, in addition to correct handling of prefetches (for SMC). In case you might find them useful, I've attached the debugging notes I made while figuring it out (note: some details might be wrong, they're just unedited notes after all).
Attached file: voyage.txt (19.4 KB)
18 March 2022, 21:19   #12
TommoH
Registered User
 
Join Date: Jul 2021
Location: New York
Posts: 12
Quote:
Originally Posted by paraj
The WinUAE source tree includes a "cputester" that can be used to get most corner cases correct (running the cycle exact tests requires fast ram support though, and don't enable slow ram at the same time!).
I don't actually have any access to Windows, so I haven't referred to UAE-anything. But it looks like the vAmiga test suite you linked to includes the ADFs, so that need no longer be an obstacle.

To provide a little personal history: I wrote the 68000 back in 2019 and have already used it in Macintosh and Atari ST emulators, but at the time I wrote it there weren't any test suites whatsoever. So I'm sure there are still issues lurking.

Quote:
Originally Posted by paraj
The Voyage loader is extra difficult in that it also requires mostly correct interrupt and CIA timing, in addition to correct handling of prefetches (for SMC). In case you might find them useful, I've attached the debugging notes I made while figuring it out (note: some details might be wrong, they're just unedited notes after all).
I tested my CIA against Wolfgang Lorenz's set, but I think there's still one failure. If memory serves, it was something that appears to contradict his own documentation of the CIA state machine, so I declined to pursue it at the time, not being fully aware of the correct answer. Also, his tests are actually for the 6526, so the Amiga's much more sensible binary time-of-day counter doesn't have any test coverage. But most of my CIA should be generally correct.

This is, of course, a common refrain of emulator authors about to be disappointed.
18 March 2022, 22:20   #13
StingRay
move.l #$c0ff33,throat
 
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
Quote:
Originally Posted by TommoH
EDIT: I definitely can't. The Amiga just constantly restarts. I dare imagine I'm still posting the incorrect stack contents for at least one exception. I made determining the proper PC upon an exception too difficult — I should have just latched it upon decode. I instead concluded that since it could always be determined from the next prefetch address and current location in the YACHT-obtained bus pattern, there was a greater risk in redundant storage. I made the wrong choice.
The boot code is protected and changes registers in the CIA interrupt to calculate the decryption keys for the trace vector decoder. The emulator must feature 100% exact CPU and CIA timing as otherwise the decryption won't be done correctly.

As I wrote on Pouet some years ago, this demo is a very good emulator test case.
20 March 2022, 18:10   #14
TommoH
Registered User
 
Join Date: Jul 2021
Location: New York
Posts: 12
Quote:
Originally Posted by StingRay
The boot code is protected and changes registers in the CIA interrupt to calculate the decryption keys for the trace vector decoder. The emulator must feature 100% exact CPU and CIA timing as otherwise the decryption won't be done correctly.

As I wrote on Pouet some years ago, this demo is a very good emulator test case.
That does suggest a somewhat more compelling theory as to what the problem might be.

My CIA was cycle-accurate, bar the one failed test, when checked against Wolfgang Lorenz's test suite; but that suite originally ran on a C64, so there's only the much simpler timing of the 6502 on the other side of things.

My 68000 is implemented according to the YACHT cycle-counting document, but I notice there's a new version of that out with some corrections and, in any case, I've not yet done the work to verify my implementation.

So I'll wager there's something off in my 68000 microprograms, or in my handling of VPA, or in some other Chipset functionality that should have caused different CPU delay than was actually applied.

i.e. I think I've quite a task ahead of me.
20 March 2022, 19:27   #15
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
I saw StingRay's comment while pulling my hair out when implementing it myself, and that's also what inspired me to suggest it. I'm pretty sure you don't need 100% correct CIA emulation, but you can't be very far off. Your interrupt timing (and priorities) also needs to be cycle-accurate. Also remember that accessing CIA registers involves syncing with the E clock (1/10 of the 7 MHz clock), unlike the custom registers.
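A deliberately crude C sketch of the E-clock point above: it only rounds a CIA access up to the next E-clock period boundary, assuming those boundaries fall on CPU cycles that are multiples of 10. The real 68000 VPA/VMA handshake adds further cycles on top of this, so treat it as a first approximation; `e_clock_wait` is a hypothetical name.

```c
#include <stdint.h>

/* E clock = 68000 clock / 10. */
#define E_CLOCK_DIVIDER 10u

/* CPU clocks to wait until the next E-period boundary (crude model only;
 * the extra cycles of the real 6800-style bus cycle are not modelled). */
unsigned e_clock_wait(uint64_t cpu_cycle) {
    return (unsigned)((E_CLOCK_DIVIDER - cpu_cycle % E_CLOCK_DIVIDER) % E_CLOCK_DIVIDER);
}
```

Even this simplification shows why CIA accesses cost a variable number of cycles depending on when they start, which is exactly what timing-sensitive loaders depend on.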

I debugged it by running my emulator in lock step with WinUAE and seeing where it diverged. For a "quick" check you can put a breakpoint at $40146 and check the contents of the array at $70000..$70258. It should match what WinUAE has (it's in my debug notes).
04 July 2023, 12:15   #16
Rock'n Roll
German Translator
 
Join Date: Aug 2018
Location: Drübeck / Germany
Age: 49
Posts: 183
Two questions about DDFSTRT and DIWSTRT.
1. Why does the bitplane data fetch in the WinUAE debugger not match the HRM DMA time slot diagram?
(The data fetch doesn't finish with BPL1 in slot $40.)
2. We calculate and set the horizontal DIWSTRT to $81.
What does this sentence from the HRM DMA time slot allocation diagram mean:
"Five clocks must occur before the data fetched for a particular position can appear on-screen. For example, if data fetch start is $38, data will not be available for display until clock number $45. It is available at $45 because display processing does not begin until all of the bit-planes for a particular pixel have been fetched."
To me, this statement doesn't match a horizontal DIWSTRT of $81.

Code:
WinUAE Debugger:
 0006a518: 008e 2c81            ;  DIWSTRT := 0x2c81	
 0006a51c: 0090 2cc1            ;  DIWSTOP := 0x2cc1
 0006a520: 0092 0038            ;  DDFSTRT := 0x0038
 0006a524: 0094 00d0            ;  DDFSTOP := 0x00d0

 [38  56]  [39  57]  [3A  58]  [3B  59]  [3C  60]  [3D  61]  [3E  62]  [3F  63]
                                                   BPL4 116  BPL6 11A  BPL2 112
 0                                                     0000      0000      FF00
                                                   0007554C  00077D4C  00072D4C
 BE608A00  BE608C00  BE608E00  BE609000  BE609200  BE609400  BE609600  BE609800

 [40  64]  [41  65]  [42  66]  [43  67]  [44  68]  [45  69]  [46  70]  [47  71]
           BPL3 114  BPL5 118  BPL1 110            BPL4 116  BPL6 11A  BPL2 112
               0000      0000      FFFF                0000      0000      0000
           0006DD4C  0007054C  0006B54C            0007554E  00077D4E  00072D4E
 BE609A00  BE609C00  BE609E00  BE60A000  BE60A200  BE60A400  BE60A600  BE60A800

($81/2)-8.5=$38

$38 - DDFSTRT
$39 - 
$3A - BPL4
$3B - BPL6 
$3C - BPL2
$3D - 
$3E - BPL3
$3F - BPL5 
$40 - BPL1 - Datafetch completed for cycle $38 
$41 - Display, or 0.5 CCK later, is DIWSTRT := HH $81
04 July 2023, 12:43   #17
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,510
HRM lies. See the infamous "...This timing chart has been 'adjusted' to match those requirements" text next to the DMA time slot allocation diagram.

The HRM tries to make it look as if bitplane DMA starts instantly when DDFSTRT matches by "adjusting" the horizontal cycle numbers (for example, the mysterious refresh slot at "-1" is actually slot 3). The HRM added a 4-cycle offset to make this diagram simpler. WinUAE also used the wrong DMA slot numbers until 4.9, I think.

When DDFSTRT matches, it takes 4 cycles before the first bitplane DMA access is visible, due to internal pipelining and the delay between the DDFSTRT "DMA started" decision and the bitplane sequencer getting its enable signal.
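A C sketch reconciling the two timelines, based on the 4-cycle figure above: if the lores fetch sequence (whose first slot is unused) starts 4 colour clocks after the DDFSTRT match, the plane slots land where the WinUAE dump in post #16 shows them (BPL4 at $3D through BPL1 at $43), whereas the HRM diagram starts the same sequence at DDFSTRT itself. The function name is hypothetical, planes are 0-based (0 = BPL1), and DDFSTOP is not modelled.

```c
/* Colour clocks between the DDFSTRT match and the start of the fetch sequence. */
#define DDF_PIPELINE_DELAY 4

/* Lores OCS fetch order within one 8-cycle block (0-based planes, -1 = unused slot). */
static const int lores_order[8] = {-1, 3, 5, 1, -1, 2, 4, 0};

/* Plane fetched at colour clock hpos, or -1. nplanes is the BPLCON0 plane count.
 * The sequence simply repeats every 8 clocks; the stop condition is not modelled. */
int lores_slot_plane(int hpos, int ddfstrt, int nplanes) {
    int pos = hpos - (ddfstrt + DDF_PIPELINE_DELAY);
    if (pos < 0)
        return -1;
    int plane = lores_order[pos % 8];
    return (plane >= 0 && plane < nplanes) ? plane : -1;
}
```

With DDFSTRT = $38 and six planes, this places BPL1 (plane 0) at $43 and the next BPL4 at $45, matching the debugger output rather than the HRM's $40.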
04 July 2023, 13:51   #18
Rock'n Roll
German Translator
 
Join Date: Aug 2018
Location: Drübeck / Germany
Age: 49
Posts: 183
At some point we need a new, complete WinUAE/HRM reference manual with all the corrections and with descriptions of undocumented features etc.

Who will start writing?
04 July 2023, 21:05   #19
Rock'n Roll
German Translator
 
Join Date: Aug 2018
Location: Drübeck / Germany
Age: 49
Posts: 183
If I open the window horizontally with DIWSTRT $81, the first pixel is at position $81 (corresponding to Copper position $40 + 0.5 CCK).
With DDFSTRT at $38, I start fetching the data, and the fetch finishes with BPL1.
How can I display something that can't actually be there yet? (BPL1 at $43)

If I set DDFSTRT to $38 and DDFSTOP to $38, I would expect only one data fetch to occur.
However, the data fetch continues until the end of the scanline. Why?
04 July 2023, 21:31   #20
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Rock'n Roll
If I open the window horizontally with DIWSTRT $81, the first pixel is at position $81 (corresponding to Copper position $40 + 0.5 CCK).
With DDFSTRT at $38, I start fetching the data, and the fetch finishes with BPL1.
How can I display something that can't actually be there yet? (BPL1 at $43)
The horizontal part of the DIW registers is in Denise, not in Agnus.
The two counters, in addition to not having the same granularity, are not perfectly synchronized.
Denise's counter is initialized to 2, at the beginning of the line, by the strobe that occurs on cycle 3 of Agnus.
And this is probably why the HRM lies, trying to make people think that the two chips are always synchronized at the internal counter level.

(new) WinUAE DMA debugger always displays Agnus cycles.

Quote:
If I set DDFSTRT to $38 and DDFSTOP to $38, I would expect only one data fetch to occur.
However, the data fetch continues until the end of the scanline. Why?
At least two fetch cycles are performed.
So with a DDFSTOP that low (relative to DDFSTRT) there is no match, and nothing stops the fetches.