English Amiga Board (https://eab.abime.net/index.php)
-   Coders. Asm / Hardware (https://eab.abime.net/forumdisplay.php?f=112)
-   -   Undocumented Amiga hardware stuff (https://eab.abime.net/showthread.php?t=19676)

Toni Wilen 24 September 2008 09:34

OCS-only "scanline" effect. (Discovered in other threads, documenting it here.)

DDFSTRT < 0x18 = every other line is blanked.

My theory:

Bitplane state machine state can only change sequentially. Bitplane state machine states are something like:

1: inactive
2: "passed 0x18" (more like passed 0x14 but it isn't important)
3: "passed ddfstrt (=active)"
4: "passed ddfstop"

Normal horizontal line, DDFSTRT >= 0x18. 1 -> 2 -> 3 -> 4 -> 1

DDFSTRT < 0x18:

first line: 1 -> 2 (1->3 can't happen)
second line: 2 -> 3 -> 4 -> 1 (normal visible line)
third line: 1 -> 2 (another blank line)
and so on..
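
The theory above can be sketched as a tiny simulation. This is only a model of the state sequence described in the post, not of real Agnus logic; the function name, the event ordering and the DDFSTOP value used below are assumptions made for illustration.

```python
# States of the hypothetical bitplane state machine from the theory above.
INACTIVE, PASSED_18, ACTIVE, PASSED_STOP = 1, 2, 3, 4

HARD_START = 0x18  # hardwired horizontal limit named in the post


def simulate_lines(ddfstrt, ddfstop, num_lines):
    """Return one boolean per line: True if bitplane DMA became active."""
    state = INACTIVE
    result = []
    for _ in range(num_lines):
        # Events in the order the horizontal counter reaches them.
        # The second tuple element breaks ties (hard limit first).
        events = sorted([
            (HARD_START, 0, "limit"),
            (ddfstrt, 1, "start"),
            (ddfstop, 2, "stop"),
        ])
        active_this_line = False
        for _pos, _order, name in events:
            # The state may only advance to the next state in sequence.
            if name == "limit" and state == INACTIVE:
                state = PASSED_18
            elif name == "start" and state == PASSED_18:
                state = ACTIVE
                active_this_line = True
            elif name == "stop" and state == ACTIVE:
                state = PASSED_STOP
        # 4 -> 1 happens at the end of the line; any other state carries over.
        if state == PASSED_STOP:
            state = INACTIVE
        result.append(active_this_line)
    return result


print(simulate_lines(0x38, 0xD0, 4))  # normal DDFSTRT
print(simulate_lines(0x10, 0xD0, 4))  # DDFSTRT < 0x18
```

With DDFSTRT = 0x10 the model leaves every other line without bitplane DMA, which is the alternating pattern described above.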

(I guess this was supposed to prevent DDFSTRT < 0x18, but the implementation wasn't 100% correct)

Note that display can get quite messed up (or system can even crash) if DDFSTRT is too small because bitplane dma starts conflicting with refresh slots.

ECS/AGA fixes this "bug".

Photon 14 October 2008 23:42

I think this is a normal behavior of the vdu (IC) timing vs video timing. At least if I remember page 190 of HRM correctly.

Toni Wilen 20 October 2008 11:18

Quote:

Originally Posted by Photon (Post 468149)
I think this is a normal behavior of the vdu (IC) timing vs video timing. At least if I remember page 190 of HRM correctly.

What chapter? Different HRM revisions have different page numbers :)

Perhaps it is documented somewhere, but it has nothing to do with timing. Denise could simply inhibit the sprite parallel-to-serial converters after DDFSTOP, but I am quite sure Denise does not know anything about DDFSTRT or DDFSTOP. It most likely waits for the first BPL1DAT write (by Agnus), which then starts the display processing circuitry until the next hsync.

Toni Wilen 01 December 2008 19:48

OCS/ECS-only "7-plane" mode.

If BPLCON0 is set to 7 planes following interesting things happen:

- Agnus DMA sequencer uses 4 planes mode
- Denise shows all 6 planes (can be EHB,HAM or DPF, all work)

BPL5DAT and BPL6DAT are "static" (16-bit pattern repeats) 5th and 6th plane pattern. (of course copper can also be used to update the pattern)
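
A minimal sketch of the behaviour described above, assuming lores and an OCS/ECS chipset. The function names are made up for illustration; the only hardware facts used are that the plane count (BPU2-BPU0) sits in bits 14-12 of BPLCON0 and that the most significant bit of a bitplane data word is the leftmost pixel.

```python
def decode_planes_ocs(bplcon0):
    """Return (planes fetched by Agnus DMA, planes displayed by Denise)."""
    bpu = (bplcon0 >> 12) & 0x7  # BPU2-BPU0: bits 14-12 of BPLCON0
    if bpu == 7:
        # "7-plane" case from the post: DMA behaves like 4 planes,
        # Denise still combines 6 planes.
        return 4, 6
    return bpu, bpu


def static_plane_bit(bplxdat, x):
    """Bit of a plane that is not fetched by DMA, at horizontal pixel x.

    The 16-bit value written to BPL5DAT/BPL6DAT simply repeats every
    16 pixels until something (CPU or copper) rewrites the register.
    """
    return (bplxdat >> (15 - (x % 16))) & 1


print(decode_planes_ocs(0x7200))  # BPU = 7
print([static_plane_bit(0x8000, x) for x in range(0, 33, 16)])
```

A BPL5DAT value of 0x8000 in this model gives one set pixel every 16 pixels, i.e. vertical lines like the ones mentioned for the intro below.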

This trick was used in the intro First Anniversary by Lazy Bones (perhaps the only program using it..). BPL5DAT was used for vertical lines. Normally you can't have >4 planes without slowing down the copper.

Perhaps it can also be used for some weird EHB/HAM tricks :)

dlfrsilver 01 December 2008 21:16

Great! Toni, can this technique be used for games, or is it only suitable for programs like demos?

StingRay 15 December 2008 12:28

Quote:

Originally Posted by Toni Wilen (Post 482689)
This trick was used in intro First Anniversary by Lazy Bones (perhaps only one program using it..) BPL5DAT was used for vertical lines. Normally you can't have >4 planes without slowing down the copper.

I think this intro by Brainwalker [hi mate, if you should ever happen to read this: Greetings =)] uses the same feature (for the analyzer). Maybe you can check Toni? :)

Toni Wilen 15 December 2008 17:11

Quote:

Originally Posted by StingRay (Post 487947)
I think this intro by Brainwalker [hi mate, if you should ever happen to read this: Greetings =)] uses the same feature (for the analyzer). Maybe you can check Toni? :)

Yes, it does use "7 planes" for "noise" effect. 2 programs so far, maybe there are more..

(download link is broken)

StingRay 15 December 2008 17:18

Quote:

Originally Posted by Toni Wilen (Post 488062)
Yes, it does use "7 planes" for "noise" effect. 2 programs so far, maybe there are more..

(download link is broken)


Thanks for checking! :) And sorry that I didn't check the download link, I've fixed it now, thanks for that info too! :)

Toni Wilen 24 December 2008 21:44

"Not really undocumented but if you want to make A1000-only program very easily"

A1000: 00D80000 - 00DDFFFF is custom chip mirror
Other models: RTC + non mapped space. Never custom chips.

Original/unmodified game Hacker uses (accidentally?) 0xDDF00A to read mouse counters. This only works on A1000 :)
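
A rough sketch of the address decoding implied above. The mirror range is taken from the post; that the register is selected purely by the low address bits inside the mirror (addr & 0x1FE) is an assumption, and the function name is hypothetical.

```python
A1000_MIRROR_START = 0xD80000
A1000_MIRROR_END = 0xDDFFFF  # inclusive, as given in the post


def a1000_mirror_register(addr, is_a1000):
    """Return the custom register offset hit by addr, or None.

    Assumption: inside the mirror only the low 9 address bits select the
    register, the same way they do in the normal 0xDFF000 area.
    """
    if not is_a1000:
        return None  # other models: RTC or unmapped space, never custom chips
    if not (A1000_MIRROR_START <= addr <= A1000_MIRROR_END):
        return None
    return addr & 0x1FE


JOY0DAT = 0x00A  # mouse/joystick counter register at 0xDFF00A

print(a1000_mirror_register(0xDDF00A, is_a1000=True) == JOY0DAT)
print(a1000_mirror_register(0xDDF00A, is_a1000=False))
```

Under that assumption 0xDDF00A lands on JOY0DAT on an A1000 and on nothing useful elsewhere, which matches what Hacker appears to rely on.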

(note that WinUAE 1.4+ properly maps this area which can look like Hacker got broken but in reality it was "broken" previously..)

Toni Wilen 02 August 2009 13:04

More blitter stuff:

All blitter cycles require free bus cycle (=not used by any other Agnus DMA channel), including blitter idle cycles. (which can become free cycle for CPU if CPU needs bus cycles)

Following blitter channel combinations add extra idle cycle if fill mode is enabled: 1, 5, 9 and D (for example normal AD copy cycle diagram becomes AD-)

Switching from "extra cycle fill mode" to non-extra cycle mode (fill or not) when blitter is already active: blitter stops. (there is one demo part that only works due to this feature, it has buggy blitter wait..) EDIT: blitter can also stop when changing bltcon0 while blit is active.

Starting blitter: 2 blitter idle cycles needed before normal blitter cycle diagram starts.

Copper writes to custom registers happen 1 cycle later than CPU writes.
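
For emulator or FPGA authors, the fill-mode rule above can be written down as a small predicate. This is a sketch under the assumption that "channel combination" means the USEA/USEB/USEC/USED nibble in BLTCON0 bits 11-8 and that "fill mode enabled" means either IFE (bit 3) or EFE (bit 4) is set in BLTCON1; the function name is made up.

```python
USE_SHIFT = 8          # USEA..USED occupy BLTCON0 bits 11-8
FILL_MASK = 0x0018     # BLTCON1: EFE (bit 4) | IFE (bit 3)

# Channel combinations listed in the post: D, BD, AD, ABD.
EXTRA_IDLE_COMBINATIONS = (0x1, 0x5, 0x9, 0xD)


def fill_adds_idle_cycle(bltcon0, bltcon1):
    """True if, per the post, fill mode appends one idle cycle per word."""
    channels = (bltcon0 >> USE_SHIFT) & 0xF
    fill_enabled = (bltcon1 & FILL_MASK) != 0
    return fill_enabled and channels in EXTRA_IDLE_COMBINATIONS


print(fill_adds_idle_cycle(0x09F0, 0x0008))  # A->D copy, inclusive fill
print(fill_adds_idle_cycle(0x09F0, 0x0000))  # same copy, no fill
print(fill_adds_idle_cycle(0x0FCA, 0x0010))  # all four channels, exclusive fill
```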

FrenchShark 03 September 2009 06:37

Quote:

Originally Posted by Toni Wilen (Post 579567)
More blitter stuff:

All blitter cycles require free bus cycle (=not used by any other Agnus DMA channel), including blitter idle cycles. (which can become free cycle for CPU if CPU needs bus cycles)

Following blitter channel combinations add extra idle cycle if fill mode is enabled: 1, 5, 9 and D (for example normal AD copy cycle diagram becomes AD-)

Switching from "extra cycle fill mode" to non-extra cycle mode (fill or not) when blitter is already active: blitter stops. (there is one demo part that only works due to this feature, it has buggy blitter wait..) EDIT: blitter can also stop when changing bltcon0 while blit is active.

After implementing the blitter in VHDL and studying the DMA sequences and the remarks from Toni, I think I get most of it :
- Sources A and B datapaths need two cycles to reach the minterm block because of the 32-bit shifter.
- Source C datapath needs just one cycle to reach the minterm block.
- Source A DMA cycle is always present; if USEA = 0, it becomes an idle cycle. During this cycle, the minterm from the previous data is computed, which is why the blitter inserts an idle cycle if source A DMA is off.

These 3 remarks are enough to explain the "Blitter Speed" rules in the HRM.

A and D only : 2 cycles
cycle #1 : read A(0)
cycle #2 : D idle + mask & shift A(0)
cycle #3 : read A(1) + minterm(0)
cycle #4 : write D(0) + mask & shift A(1)
etc...
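
The A and D sequence listed above can be generated mechanically, which makes the one-word pipeline delay easier to see. This sketch only reproduces the listing for the first N words; it does not model the final D write that flushes the pipeline, and the function name is hypothetical.

```python
def ad_timeline(num_words):
    """Return the per-cycle activity for an A->D blit, as listed above."""
    cycles = []
    for k in range(num_words):
        if k == 0:
            # Nothing is in the pipeline yet, so the D slot is idle.
            cycles.append("read A(0)")
            cycles.append("D idle + mask & shift A(0)")
        else:
            # Word k is fetched while word k-1 goes through the minterm,
            # then word k-1 is written while word k is masked and shifted.
            cycles.append(f"read A({k}) + minterm({k - 1})")
            cycles.append(f"write D({k - 1}) + mask & shift A({k})")
    return cycles


for number, activity in enumerate(ad_timeline(2), start=1):
    print(f"cycle #{number} : {activity}")
```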

B always adds one cycle: because of the second cycle needed for its shifter.

C or D is free : because it is during the second cycle needed by A to mask and shift.

I cannot explain the fill mode "bug" yet; the only extra cycle that would make sense is for AD. It must be A, idle, D (if we suppose that the blitter needs an extra "fill mode" cycle after the "minterm" cycle occurring during A)

Quote:

Starting blitter: 2 blitter idle cycles needed before normal blitter cycle diagram starts.

I bet it is 3 idle cycles if USEA=0 and 2 idle cycles if USEA=1.
Blitter cycle sequences #1 to 7 in the HRM should be shifted by one cycle to the right to show the first empty A cycle.

Toni, tell me what you think.

Regards,

Frederic

Toni Wilen 03 September 2009 14:34

Quote:

After implementing the blitter in VHDL and studying the DMA sequences and the remarks from Toni, I think I get most of it :
- Sources A and B datapaths need two cycles to reach the minterm block because of the 32-bit shifter.
- Source C datapath needs just one cycle to reach the minterm block.

I haven't thought about internal logic too much but this seems perfectly logical.

Quote:

I do not explain the fill mode "bug" yet, the only extra cycle that would make sense is for AD. It must be A, idle, D (if we suppose that the blitter needs an extra "fill mode" cycle after the "minterm" cycle occuring during A)

I am quite sure it is AD- (A--AD-...AD--D)
(I guess the fill logic happens in "second cycle" in blitter pipeline)

Quote:

I bet it is 3 idle cycles if USEA=0 and 2 idle cycles if USEA=1.

Yes. I meant 2 idle cycles that are always at start, not counting possible first "missing A" cycle. ("missing A" I counted as a part of "real" blitter cycle because it repeats)

EDIT: do you think 2 idle cycles at start are related to "preloading" A/B shifts or something?

FrenchShark 04 September 2009 03:32

Quote:

Originally Posted by Toni Wilen (Post 591219)
I haven't thought about internal logic too much but this seems perfectly logical.



I am quite sure it is AD- (A--AD-...AD--D)
(I guess the fill logic happens in "second cycle" in blitter pipeline)

That's crazy: that means the "minterm" cycle has to move to the idle cycle before A and the "fill logic" cycle is happening during A. This is the only way the data can be available to the D cycle after A.
The other explanation is that this is a real silicon bug and that minterm + fill logic are done in one cycle anyway.

Quote:

Yes. I meant 2 idle cycles that are always at start, not counting possible first "missing A" cycle. ("missing A" I counted as a part of "real" blitter cycle because it repeats)

Ok, so we are in phase here.

Quote:

EDIT: do you think 2 idle cycles at start are related to "preloading" A/B shifts or something?

That's funny, in my design, the ASH and BSH values are actually "expanded" to a 16-bit multiplying value since my 32-bit shifters are done with HW multipliers. It would be a waste of resources for an ASIC but it saves a bunch of cells on an FPGA.
I also use these 2 cycles to initialize the width and height counters and to load some flags like USEx (because line mode has to force them to 1011, no matter what the BLTCON0 value is).
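
To illustrate the multiplier trick, here is a small sketch of how a shift count could be "expanded" into a one-hot 16-bit factor. The post does not give the actual encoding used in the VHDL design, so the factor `1 << (15 - shift)` and both function names are assumptions; the reference function is the usual blitter-style right shift across the previous and current words.

```python
def shift_reference(prev_word, cur_word, shift):
    """Right shift across two 16-bit words, keeping the low 16 bits."""
    combined = ((prev_word & 0xFFFF) << 16) | (cur_word & 0xFFFF)
    return (combined >> shift) & 0xFFFF


def shift_by_multiply(prev_word, cur_word, shift):
    """Same result, computed with a multiplier instead of a barrel shifter."""
    factor = 1 << (15 - shift)  # shift 0..15 -> one-hot factor 0x8000..0x0001
    combined = ((prev_word & 0xFFFF) << 16) | (cur_word & 0xFFFF)
    # Multiplying by 2**(15 - shift) and dropping 15 bits is a right
    # shift by `shift` bits.
    return ((combined * factor) >> 15) & 0xFFFF


# Compare both versions for every shift value on one sample pair.
print(all(
    shift_reference(0x1234, 0xABCD, s) == shift_by_multiply(0x1234, 0xABCD, s)
    for s in range(16)
))
```

The factor always fits in 16 bits, which is consistent with the "16-bit multiplying value" mentioned above.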

Regards,

Frederic

Toni Wilen 04 September 2009 08:07

Quote:

Originally Posted by FrenchShark (Post 591504)
initialize the width and height counters and to load some flags like USEx (because line mode has to force them to 1011, no matter what the BLTCON0 value is).

I wouldn't do this because there is at least one demo that clears the C-channel BLTCON0 bit, writes to BLTSIZE and expects nothing to be drawn :)

EDIT:

(I talked about this in some beta thread): blitter seems to change to (currently) unexplained cycle sequences (or even stop) if USEx flags or fill mode on/off change and blitter is already running. Testing all combinations could reveal blitter's internal cycle sequence system. Unfortunately this would be extremely boring.. (32 * 32 tests, 16 channel combinations + fill mode on/off)
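
The size of that test matrix can be spelled out with a few lines. This only enumerates the "before" and "after" configurations counted in the post (16 channel combinations times fill on/off, changed to any of the same 32 while the blitter runs); the variable names are made up and no hardware behaviour is modelled.

```python
from itertools import product

# One blitter configuration = (USEA..USED nibble, fill mode on/off).
configurations = list(product(range(16), (False, True)))

# One test = start in one configuration, switch to another mid-blit.
# Same-to-same pairs are included, matching the 32 * 32 count in the post.
test_cases = list(product(configurations, repeat=2))

print(len(configurations), len(test_cases))
```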

FrenchShark 05 September 2009 02:52

Quote:

Originally Posted by Toni Wilen (Post 591526)
I wouldn't do this because there are at least one demo that clears C-channel BLTCON0 bit, writes to BLTSIZE and expects nothing to be drawn :)

EDIT:

(I talked about this in some beta thread): blitter seems to change to (currently) unexplained cycle sequences (or even stop) if USEx flags or fill mode on/off change and blitter is already running. Testing all combinations could reveal blitter's internal cycle sequence system. Unfortunately this would be extremely boring.. (32 * 32 tests, 16 channel combinations + fill mode on/off)

Well, for one demo, I do not really care. I am even wondering if I will implement this extra fill mode cycle. The Minimig does not do it, and it is still nice to be faster than the original. Usually it does not hurt.

For example, the VHDL implementation of Capcom's 1943 that I have has a Z80 running at 24MHz (the original runs at 6MHz). Only the instruction fetches are slowed down to 6MHz. As a result, the animation is smooth all the time even with the maximum number of sprites (128 total).

Regards,

Frederic

redmonlee 05 September 2009 04:41

Thanks for sharing this useful information. It's great.

Photon 07 September 2009 23:24

Quote:

Originally Posted by FrenchShark (Post 591811)
Well, for one demo, I do not really care. I am even wondering if I will implement this extra fill mode cycle. The Minimig does not do it and it still nice to be faster than the original. Usually it does not hurt.

For example, the VHDL implementation of Capcom's 1943 that I have has a Z80 running at 24MHz (the original runs at 6MHz). Only the instructions fetches are slowed down to 6MHz. As a result, the animation is smooth all the time even with the maximum number of sprites (128 total).

Regards,

Frederic

I think sacrificing compatibility is a bad idea. You don't know it's just for one demo. There are plenty of platforms that "run Amiga faster than any Amiga". And even more games/demos/apps that are incompatible with all versions of the real thing. :) But that was usually future-incompatibility with machines that didn't exist yet. Now all original Amigas have been manufactured, so you can look back and choose the model you like.

So if you know the cause and it doesn't require complex patching, why not make it proper? After release missing features might be requested and then you might have to make patches instead of a simpler solution perhaps.

I'm a bit passionate about this subject so sorry :) I'd love to see any Amiga model 100% re-made in off-the-shelf components. Will yours use buffers for I/O, video, audio?

kamelito 07 September 2009 23:38

Maximum compatibility is mandatory

Hi,

I agree with Photon, compatibility is key. Take for instance the Clone-A project: the aim of Individual Computers is to be 100% compatible.
So posting things like that is like shooting yourself in the foot :)
I'll wait for Clone-A then, I'm not in a hurry; the only problem is that Clone-A is just targeting the Amiga.

kml

Quote:

Originally Posted by Photon (Post 592477)
I think sacrificing compatibility is a bad idea. You don't know it's just for one demo. There are plenty of platforms that "run Amiga faster than any Amiga". And even more games/demos/apps that are incompatible with all versions of the real thing. :) But that was usually future-incompatibility with machines that didn't exist yet. Now all original Amigas have been manufactured, so you can look back and choose the model you like.

So if you know the cause and it doesn't require complex patching, why not make it proper? After release missing features might be requested and then you might have to make patches instead of a simpler solution perhaps.

I'm a bit passionate about this subject so sorry :) I'd love to see any Amiga model 100% re-made in off-the-shelf components. Will yours use buffers for I/O, video, audio?


Photon 08 September 2009 00:26

Well, he hadn't decided yet, as I read it. While it's pretty darn hard to get every single cycle timing right, it's very important for the chipset, and especially for Amiga as we know... CPU speed and memory timings/wait-states will affect compatibility in ways that Amiga users are well acquainted with and for which solutions exist. Chipset - nope.

Maybe a solution could be to attempt an exact replica in cycle timings for the A4000 chipset & chipmem timing, and then have a non cycle exact option (or a factor so at least the whole chipset is in "sync")? And CPU and memory could run as fast as they can.

Remaining problems are then "superfastcopper" vs display and CPU vs display cycle timings (or non-DMA audio). The former for things like fullscreen sprite backgrounds and the latter for things like c2p modes, misc "extra-HAM modes" etc.

FrenchShark 09 September 2009 15:50

Quote:

Originally Posted by Photon (Post 592490)
Well, he hadn't decided yet, as I read it. While it's pretty darn hard to get every single cycle timing right, it's very important for the chipset, and especially for Amiga as we know... CPU speed and memory timings/wait-states will affect compatibility in ways that Amiga users are well aquainted with and for which solutions exist. Chipset - nope.

Maybe a solution could be to attempt an exact replica in cycle timings for the A4000 chipset & chipmem timing, and then have a non cycle exact option (or a factor so at least the whole chipset is in "sync")? And CPU and memory could run as fast as they can.

Remaining problems are then "superfastcopper" vs display and CPU vs display cycle timings (or non-DMA audio). The former for things like fullscreen sprite backgrounds and the latter for things like c2p modes, misc "extra-HAM modes" etc.

I plan to be able to set the copper speed to 3.5, 7, 14 or 28 MHz.
Along with the MOVEM instruction, this will give you a way to get high-color chunky mode with the copper (despite the faster Chip RAM, in hi-res doublescan, you will still eat up all the cycles).

Regards,

Frederic

