24 May 2022, 03:33 | #1 |
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
Chunky True Color 4 pixels
Hi Amiga coders,
I was thinking of something and wanted to see if it would be technically doable. Please correct me at any step if I have made any errors or miscalculations.
The Copper can Move to a color register in 8 clock cycles when 4 or fewer bitplanes are enabled. (i.e.: If I am not mistaken, the Amiga CPU clock period matches a lowres pixel duration?) It appears that even with bitplane DMA turned off, the Copper won't go faster than 8 pixels per color change. This seems to indicate that the Copper fetches its two 16-bit instruction words one at a time, with 2 clock cycles of "internal processing" in between, maybe like so:
Code:
read-work-read-Move-...
0 1  2 3  4 5  7 8
With that in mind, I thought that by interleaving the Copper and the 68000, both setting color registers, it should be possible to have a 4-pixel-wide chunky full-color screen running with 0 bitplanes. (However, the only way I have found so far to write one word to color #0 in only 8 clock cycles is move d0,(a0), which implies preloading the CPU registers.) The DMA timing would look like this:
Code:
Copper: read-work-read-Move
CPU:    read-Move-read-work
clock:  0 1  2 3  4 5  7 8
(Note: the CPU doesn't appear to align perfectly on every scanline even with interrupts turned off. Maybe something precise needs to be taken into account, for example one 'nop' every other scanline, perhaps due to alternating horizontal line length; I am not sure at this point.)
Also note that this is all assumed to be running from chip RAM. If the CPU is running from fast RAM, then technically bitplane DMA could still be running with up to 4 bitplanes without problem.
Last edited by remz; 24 May 2022 at 03:42. Reason: Typo in title |
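As a rough sanity check of the interleaving idea in this post, here is a minimal Python sketch. It assumes a 320-pixel lowres line, one Copper Move every 8 lowres pixels, and a CPU write landing exactly 4 pixels after each Copper write. The function name `interleaved_changes` and its parameters are made up for illustration; real hardware alignment is not modelled.

```python
def interleaved_changes(line_pixels=320, period=8, cpu_offset=4):
    """Return (pixel_x, source) pairs for colour-register writes on one scanline.

    Assumption: the Copper writes every `period` lowres pixels and the CPU
    write lands `cpu_offset` pixels after each Copper write.
    """
    events = []
    for x in range(0, line_pixels, period):
        events.append((x, "copper"))
        if x + cpu_offset < line_pixels:
            events.append((x + cpu_offset, "cpu"))
    return events


events = interleaved_changes()
# Distance in pixels between consecutive colour changes.
widths = {b[0] - a[0] for a, b in zip(events, events[1:])}
print(len(events), "changes per line, chunk widths:", widths)
```

Under these assumptions the two writers alternate strictly, so every colour chunk comes out 4 pixels wide.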
24 May 2022, 09:56 | #2 |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
|
The following thread discusses color changes using cpu: http://eab.abime.net/showthread.php?t=110394
|
24 May 2022, 10:05 | #3 | |
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
|
Quote:
The advantage of a chunky display is to reduce the number of RAM accesses. For the CPU to write a pixel on a normal 16-colour Amiga planar display, it requires at least 4 separate RAM writes (and possibly some reads as well), not to mention masking and shifting (though 68k bitwise operations mitigate this somewhat). With a chunky display the pixel can be written with a single RAM write (or perhaps a read and a write in the case of a 4-bit framebuffer, where you might need to mask the half of the byte you aren't writing to). I don't see how changing the colour palette multiple times per scanline using both the CPU and the Copper helps in this situation. |
|
24 May 2022, 10:32 | #4 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
Considering a maximum preload of 15 registers (with a7=$dff180) and using 17 changes with the Copper (the first and last in the line), you would have a 128-pixel-wide chunky 'screen', (15+17)*4, not extended to a full view.
Quote:
That said, as an 'academic' problem it's interesting, but there are other ways to make chunky displays that are more usable. |
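The width estimate above can be written out as a small Python calculation. This is only a restatement of the (15+17)*4 arithmetic; the constant names are illustrative, and the 320-pixel line width used for comparison is an assumption about a standard lowres display.

```python
# d0-d7 and a0-a6 hold preloaded colours; a7 holds the register address $dff180.
CPU_PRELOADED_REGISTERS = 8 + 7
COPPER_CHANGES = 17          # Copper changes, including first and last in the line
PIXELS_PER_CHANGE = 4        # each colour change lasts 4 lowres pixels
LOWRES_LINE_PIXELS = 320     # assumed standard lowres width, for comparison only

chunky_width = (CPU_PRELOADED_REGISTERS + COPPER_CHANGES) * PIXELS_PER_CHANGE
coverage = chunky_width / LOWRES_LINE_PIXELS
print(chunky_width, "pixels, covering", coverage, "of the line")
```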
||
24 May 2022, 11:01 | #5 | |
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
|
Quote:
|
|
24 May 2022, 11:57 | #6 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
He is trying to directly display a 12-bit true color buffer on the screen. The buffer itself is not linear (or it is double), because it contains the even pixels on one side for the Copper, and the odd ones on the other for the CPU. This of course also leads to the problem of rendering to this buffer(s). |
|
24 May 2022, 19:31 | #7 |
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
Yes, I was intrigued by the "technical possibility" more than its real-life usefulness; as you both mentioned, the memory layout would be irksome and the timings complicated.
However with fast ram and 16 colors (4 bitplanes), possibly changing 80 colors per scanline could be intriguing. A sort of "UltraDynamic HighColor" mode? |
24 May 2022, 21:02 | #8 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
With fast ram available it probably doesn't make sense to involve the copper at all (at least for changing colors).
|
24 May 2022, 23:14 | #9 |
Registered User
Join Date: Jun 2020
Location: Brno
Posts: 90
|
I'm afraid that using the CPU to feed colors to Denise (plus a rather uncomfortable color buffer, as some colors are set by the CPU and others by the Copper) is too restrictive.
This is a very nice write-up about good old classic copper chunky (on OCS using the 7bpl bug): https://eab.abime.net/showthread.php?t=107015 |
25 May 2022, 00:04 | #10 |
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
Yes that video of 57 copper chunky trick was very inspiring.
With code running from fast RAM, is it possible to fully saturate the chip RAM bus with just the 680x0 CPU? If I read correctly, maximum chip RAM bandwidth is 7.15MB/sec? This would mean being able to set one word every two pixels (meaning a potential 160-pixel true color mode)? I tried it in WinUAE but I didn't manage to get narrower than 4 pixels wide.
[edit] Thinking about it, setting a color register has nothing to do with chip RAM: it is direct access to Denise, so it doesn't have any DMA bandwidth restriction. Does anyone know if the display hardware fetches color registers at every pixel during a scanline? Maybe there is a limit in there too.
Last edited by remz; 25 May 2022 at 02:11. Reason: Adding precision about chip ram bandwidth |
25 May 2022, 09:38 | #11 |
Registered User
Join Date: Jun 2020
Location: Brno
Posts: 90
|
There are 8 bus cycles per 16 lo-res pixels. The bus arbitration allows the CPU to access every second bus cycle only (*if the cycle is available). Hence 4 pixels per CPU write, at best. You must check the cycle counts for your particular processor (and its operating frequency) to see whether it is able to utilize every available bus cycle. Therefore it is very configuration dependent.
(P.S.: Custom register access (i.e. to the custom chips) happens through the bus, so all chip RAM access restrictions apply.) |
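The "4 pixels by CPU, at best" figure follows directly from the numbers in this post. A minimal Python sketch of that arithmetic, using only the figures stated above (the variable names and the 320-pixel line width are illustrative assumptions):

```python
BUS_CYCLES_PER_BLOCK = 8      # bus cycles per 16 lowres pixels (from the post)
PIXELS_PER_BLOCK = 16
CPU_ACCESS_EVERY_N_CYCLES = 2 # CPU gets at most every second bus cycle

pixels_per_bus_cycle = PIXELS_PER_BLOCK // BUS_CYCLES_PER_BLOCK
pixels_per_cpu_write = pixels_per_bus_cycle * CPU_ACCESS_EVERY_N_CYCLES

# Upper bound on CPU colour writes across an assumed 320-pixel lowres line.
max_cpu_writes_per_line = 320 // pixels_per_cpu_write
print(pixels_per_cpu_write, "pixels per CPU write,",
      max_cpu_writes_per_line, "writes per 320-pixel line at best")
```

This is a best case only: it ignores instruction fetches from chip RAM and any DMA taking the slots the CPU would need.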
25 May 2022, 10:11 | #12 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,505
|
It is not possible for CPU to access chip ram (or chipset bus) every cycle. All chipset variants have same interleaved CPU access timing: first cycle is used to transfer address to Agnus/Alice (this cycle is always free for chipset DMA), second cycle is used to transfer data.
Fast CPUs waste lots of cycles doing nothing when accessing the chip bus.
EDIT: 7M/s is possible if the chip RAM bus is 32-bit (A3000 or AGA)
Last edited by Toni Wilen; 25 May 2022 at 10:24. |
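To put rough numbers on the above, here is a small Python sketch of the "one CPU transfer every second chipset cycle" rule. The colour-clock frequencies are the standard PAL/NTSC figures and are not stated in this thread, so treat them as an assumption; `cpu_chip_bandwidth` is a hypothetical helper name.

```python
# Assumed chipset (colour clock) cycle rates in Hz; not taken from the thread.
COLOR_CLOCK_HZ = {"PAL": 3_546_895, "NTSC": 3_579_545}


def cpu_chip_bandwidth(bus_bytes, color_clock_hz):
    """Best-case CPU bytes/second when it transfers once every 2 chipset cycles."""
    return bus_bytes * color_clock_hz / 2


for system, hz in COLOR_CLOCK_HZ.items():
    mb16 = cpu_chip_bandwidth(2, hz) / 1e6   # 16-bit chip bus
    mb32 = cpu_chip_bandwidth(4, hz) / 1e6   # 32-bit chip bus (A3000 or AGA)
    print(f"{system}: 16-bit {mb16:.2f} MB/s, 32-bit {mb32:.2f} MB/s")
```

With these inputs the 32-bit case lands at roughly 7 MB/s, in line with the figure in the EDIT above, while the 16-bit case is about half that.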
25 May 2022, 23:23 | #13 | |
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
Quote:
You mean even with all DMA off, the CPU cannot use every bus cycle? For example, if I tried a MOVEM of 64 bytes to set the whole 32-color palette as fast as possible, the MOVEM itself, when done on chip RAM, would not take 14+4*32 = 142 clock cycles to set 64 bytes (i.e.: one color per 2 lo-res pixels)?
Toni: What you are saying is interesting for the Amiga 3000 with 32-bit chip RAM. Basically I would be inclined to say the Amiga 3000 could be running at "almost AGA" speed: with 32-bit chip RAM access and fast RAM, would the CPU be able to set sprites and colors potentially 4 times faster than the Copper? This could open the door for massive ECS sprites by recycling them during a scanline with the CPU instead of the Copper. Oh, I am tempted to try it |
|
25 May 2022, 23:46 | #14 | ||
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,409
|
Quote:
This half-internal, half-bus split is also why the 68000 on an OCS/ECS system isn't really slowed down by bitplane DMA until you go to 5 bitplanes in lowres or 3 bitplanes in hires. Note however that this explanation is slightly simplified. For one, the CPU can access any cycle on the Chip Memory bus that isn't in use by DMA; it just can't access two cycles back to back.
Quote:
It might be hard to time correctly though, and I don't actually know whether the A3000 has static Chip RAM access speeds or whether they are CPU dependent. On the A1200 at least, many CPU cards don't get full bandwidth when accessing Chip RAM; this might also be the case on the A3000?
Edit: the above text was replaced; it erroneously referred to speed differences between the 68000/OCS and 32-bit Chip RAM speeds instead of Copper vs. CPU on 32-bit ECS/AGA.
Last edited by roondar; 25 May 2022 at 23:53. Reason: Misunderstood the post I replied to. |
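The "any free cycle, but never two back to back" rule can be illustrated with a toy Python model. The lowres bitplane slot order used here (`LOWRES_SLOTS`) is an assumption based on the usual DMA time slot diagram, and the greedy CPU model is a simplification; other DMA (refresh, sprites, audio, Copper, blitter) is ignored.

```python
# Assumed bitplane fetch order inside one 8-cycle lowres block (None = free slot).
LOWRES_SLOTS = [None, 4, 6, 2, None, 3, 5, 1]


def cpu_accesses_per_block(bitplanes, blocks=100):
    """Average CPU bus accesses per 8-cycle block under a greedy toy model.

    The CPU takes any cycle not used by bitplane DMA, except that it can
    never use two consecutive cycles.
    """
    granted = 0
    previous_was_cpu = False
    for cycle in range(8 * blocks):
        plane = LOWRES_SLOTS[cycle % 8]
        free = plane is None or plane > bitplanes
        if free and not previous_was_cpu:
            granted += 1
            previous_was_cpu = True
        else:
            previous_was_cpu = False
    return granted / blocks


for planes in (0, 4, 5, 6):
    print(planes, "bitplanes:", cpu_accesses_per_block(planes), "CPU accesses per 8 cycles")
```

In this model the CPU gets the same number of accesses with 0 and with 4 lowres bitplanes, and only starts losing slots from the fifth bitplane onwards, which matches the behaviour described in the post.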
||
26 May 2022, 00:58 | #15 |
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
Can you, however, interleave the Copper and the CPU to saturate the chip RAM bandwidth if bitplane DMA is turned off?
The problem that I expect with the Copper is that it runs from chip RAM itself, so any Move operation costs two word fetches. Please correct me if I'm wrong, but when the Copper writes to a custom register, is it "using the bus"? From what I understand so far, it seems not. That would mean the Copper can write to any custom register (even on other chips like Denise and Paula) "for free"? |
26 May 2022, 08:03 | #16 | |
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
|
Quote:
-Edit- I don’t call that “for free”, as all Copper instructions use the same two cycles. Last edited by bloodline; 26 May 2022 at 08:21. |
|
26 May 2022, 10:13 | #17 | |
Registered User
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 171
|
Quote:
Or can they be accessed faster (1 CPU clock or 1 chipset bus cycle)?
Last edited by Cyprian; 26 May 2022 at 10:49. |
|
26 May 2022, 10:14 | #18 | |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,505
|
Quote:
32-bit-wide chip RAM (A3000 or AGA): the CPU can read or write a 32-bit word (if the address is 32-bit aligned) every second chipset cycle.
Custom registers (all chipsets): the CPU can read or write a 16-bit word every second chipset cycle. BPLxDAT and SPRxDAT are also only 16 bits wide from the CPU's point of view. Only DMA can do AGA 32-bit or 2x32-bit transfers.
The CPU can use any free chipset cycle, but a CPU chipset bus access will always take 2 chipset cycles to complete. (Note that this is from the chipset's point of view; a CPU/accelerator board can have write buffer(s) that improve performance noticeably.) |
|
26 May 2022, 10:52 | #19 |
Registered User
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 171
|
thanks for clarification
Last edited by Cyprian; 26 May 2022 at 12:12. |
26 May 2022, 13:21 | #20 | |
Registered User
Join Date: Jun 2020
Location: Brno
Posts: 90
|
Quote:
I thought that the bus controller (Agnus?) strictly allows the CPU to access even-numbered cycles only (if available). The DMA time slot allocation diagram in the HRM suggests that. |
|