26 May 2022, 13:37 | #21 | |
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
Quote:
A CopMove takes 8 clock cycles, which requires 2 DMA slots (interleaved), a bit like:
fetch-wait-fetch-wait
0 1 2 3 4 5 6 7
But somewhere in there, the custom destination register is getting set: would you know exactly when this happens?
The 'free' part I was implying is that, compared to the 68000, which requires one DMA slot to actually write to a custom register, the Copper appears to do this 'for free'; otherwise, wouldn't a CopMove require 3 slots?
[Edit]: Pondering about it, maybe I am thinking of the Copper too much like a general CPU. Maybe the way it works is more like:
- First Word - Decode: instruction is a Move To Custom Register At Address xxxx: prepare destination for copy
- Second Word - Value to Copy: transfer value directly into destination
This operation is apparently special and unique to the Copper because it allows a direct copy of a word from chip ram onto a custom register in just one single DMA cycle. The CPU cannot do that, nor can the Blitter (the Blitter cannot because custom registers are out of range).
Could that make more sense?
Last edited by remz; 26 May 2022 at 14:30. Reason: Additional thoughts on copper |
|
26 May 2022, 14:33 | #22 | ||
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,506
|
Quote:
Quote:
|
||
27 May 2022, 20:51 | #23 | |
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
|
Quote:
If the first instruction word is a valid Chip register address then the copper just loads the second instruction word value directly into that register. That’s your 2 cycles, it really is that simple.
If the first instruction word is an odd value (all Chip registers are even), then the second instruction word is used to establish whether the copper is going to wait or skip at a certain beam position (the first instruction word then holds the beam position, and the second word also carries the compare mask for it).
-Edit- Also remember that the copper only has bus access on odd cycles, so each move operation takes at least 4 cycles!
Last edited by bloodline; 28 May 2022 at 10:11. |
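The two-word decode rule described in this post can be sketched in a few lines of Python. This is only an illustration, not emulator code: the names `CopperInstruction` and `decode` are made up for the example, and the bit layout used (bit 0 of the first word selects MOVE versus WAIT/SKIP, bit 0 of the second word selects WAIT versus SKIP, position in the first word, compare mask in the second) is the layout as the Hardware Reference Manual is understood to give it, so treat it as an assumption to verify.

```python
from dataclasses import dataclass


@dataclass
class CopperInstruction:
    """Hypothetical container for one decoded two-word Copper instruction."""
    kind: str           # "MOVE", "WAIT" or "SKIP"
    register: int = 0   # custom register offset (MOVE only), e.g. 0x180
    value: int = 0      # data word written to the register (MOVE only)
    vpos: int = 0       # vertical beam position to compare (WAIT/SKIP)
    hpos: int = 0       # horizontal beam position to compare (WAIT/SKIP)
    vmask: int = 0      # vertical compare-enable bits (WAIT/SKIP)
    hmask: int = 0      # horizontal compare-enable bits (WAIT/SKIP)
    bfd: bool = False   # "blitter finished disable" bit (WAIT/SKIP)


def decode(ir1: int, ir2: int) -> CopperInstruction:
    """Decode a pair of 16-bit words fetched from the copper list."""
    if ir1 & 1 == 0:
        # Even first word: it is a register offset, second word is the data.
        return CopperInstruction("MOVE", register=ir1 & 0x1FE, value=ir2)
    # Odd first word: beam position compare; bit 0 of word 2 picks SKIP.
    kind = "SKIP" if ir2 & 1 else "WAIT"
    return CopperInstruction(
        kind,
        vpos=(ir1 >> 8) & 0xFF,
        hpos=ir1 & 0xFE,
        vmask=(ir2 >> 8) & 0x7F,
        hmask=ir2 & 0xFE,
        bfd=bool(ir2 & 0x8000),
    )
```

For example, `decode(0x0180, 0x0F00)` yields a MOVE of `0x0F00` into offset `0x180`, while the usual end-of-list pair `decode(0xFFFF, 0xFFFE)` yields a WAIT.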
|
28 May 2022, 03:52 | #24 | |
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
Quote:
As a comparison, from what I understood so far, to move a 16-bit word to a 16-bit custom register, a CPU running off Chip Ram needs to:
- Read Address from Chip Ram (1 dma cycle)
- Read Value from Chip Ram (1 dma cycle)
- Write Value to Custom Register (1 dma cycle)
3 cycles required, whereas the Copper does that in 2 cycles.
Please correct me if I am wrong, but so far I deduced the following facts:
Assuming a stock A500 with 512KB Chip Ram, running in NTSC at 7159090 clocks/sec. At 59.94Hz, this yields 455 clock cycles per horizontal scanline. The DMA bus runs at half that rate, so 227.5 dma cycles per scanline, which is reduced to 226 in practice.
Copper only ever uses Odd cycles, and takes 8 clock cycles to perform a 16-bit move. Maximum throughput is thus: 1.7MB/sec
Blitter can use any cycles (even or odd), and is extremely efficient when copying: it can theoretically saturate the bus and copy 3.5MB/sec (using A/D channels). Blitter is however not as efficient when filling/clearing: still 3.5MB/sec, so it wastes half the DMA bus doing nothing.
CPU, like the blitter, can also use any dma cycles (odd or even), but is not as fast for copying. I think with movem.w, it can reach slightly less than half the speed of the blitter with approx 1.4MB/sec. *(I think that movem.l would not be significantly faster because the chip bus width is 16-bit?).
One thing that I don't understand: if Copper only runs on Odd cycles, does it mean that during a 4-bitplane lo-res screen where Display takes 80 odd cycles, Copper cannot do anything?
Another question for which I didn't find a clear answer in the HRM: if Display runs in 1-bitplane mode, does Display DMA only take 20 odd cycles?
Ah, if only Commodore had decided to add even just 64KB of fast ram on all Amigas, it would have made a world of difference. Or even more versatile: how about making the chip ram/fast ram frontier software-programmable? A game could decide for example to assign 256KB for chip ram, and 256KB for fast ram.
Even the CD32 would have been almost twice as fast with that kind of flexibility. |
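The arithmetic in this post can be checked with a short Python sketch. The clock rate, frame rate, slot counts and the "8 clocks per Copper move" figure are taken from the post; the 262.5 lines per frame is an assumption added here (interlaced NTSC average), and all names are invented for the illustration.

```python
CPU_CLOCK_HZ = 7_159_090     # NTSC A500 clock, from the post
FRAME_RATE_HZ = 59.94        # from the post
LINES_PER_FRAME = 262.5      # assumption: interlaced NTSC average

# Clock cycles and DMA slots available on one scanline.
clocks_per_line = CPU_CLOCK_HZ / FRAME_RATE_HZ / LINES_PER_FRAME
dma_slots_per_line = clocks_per_line / 2     # one DMA slot = 2 clocks

# A Copper MOVE transfers one 16-bit word (2 bytes) every 8 clocks.
COPPER_CLOCKS_PER_MOVE = 8
copper_bytes_per_sec = CPU_CLOCK_HZ / COPPER_CLOCKS_PER_MOVE * 2

# Slots needed to get one word into a custom register, as counted in the post.
CPU_SLOTS_PER_REGISTER_WRITE = 3      # read address, read value, write value
COPPER_SLOTS_PER_REGISTER_WRITE = 2   # fetch first word, fetch second word

if __name__ == "__main__":
    print(f"clocks per line:    {clocks_per_line:.2f}")
    print(f"DMA slots per line: {dma_slots_per_line:.2f}")
    print(f"Copper throughput:  {copper_bytes_per_sec / 1e6:.2f} MB/s (decimal)")
    print(f"                    {copper_bytes_per_sec / 2**20:.2f} MiB/s (binary)")
```

Under these assumptions the Copper figure comes out at roughly 1.79 million bytes per second, which is about 1.7 when expressed in binary megabytes, so the "1.7MB/sec" estimate is consistent with the 8-clock move.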
|
28 May 2022, 10:42 | #25 | ||||
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
|
Quote:
To follow your model, the Custom Chip registers are the Copper's registers. The copper cannot write to RAM; if it needs to do so, it has to go through the blitter, by setting the blitter registers.
Quote:
1bit graphics only uses one of those slots per fetch cycle, which leaves 7 slots free per fetch cycle. 2bit graphics uses two of those slots per fetch cycle, which leaves 6 slots free per fetch cycle... etc. The most slots used in lores mode per fetch cycle is 6 slots (EHB and HAM), leaving 2 free.
In hires, there are 8 DMA slots per 32 pixels. 1bit graphics uses 2 DMA slots per fetch cycle. So 4bit graphics in hires mode does completely saturate the bus for the duration of the bitplane fetch per scanline. Thus very few games use hires mode.
Quote:
Quote:
DRAM controller, and obviously more motherboard space, clearly the cost was prohibitive for Commodore's management.
As an aside, once I realised that the A1200 didn't use Alice for Chipram DRAM refreshes, but instead had a Budgie chip, it was clear to me that Commodore had totally lost the plot... why not use Alice for Chipram, and Budgie for Fastram? Then the A1200 could have had 1 meg Chipram and 1 meg Fastram.
-Edit- I also think the Amiga graphics system should have been feature frozen at ECS, and a new chunky system used for 8bit graphics onwards. But hindsight is always very clear, especially for an "armchair engineer" like me.
Last edited by bloodline; 28 May 2022 at 11:11. |
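The bitplane slot counts given earlier in this post can be turned into a small Python sketch. It assumes a fetch cycle of 8 DMA slots in both modes, with each plane taking 1 slot per fetch cycle in lores and 2 in hires, as the post describes; the function names are hypothetical.

```python
SLOTS_PER_FETCH_CYCLE = 8  # assumption: same fetch cycle length in both modes


def bitplane_slots(planes: int, hires: bool = False) -> int:
    """DMA slots taken by bitplane fetches in one fetch cycle."""
    per_plane = 2 if hires else 1
    used = planes * per_plane
    if planes < 0 or used > SLOTS_PER_FETCH_CYCLE:
        raise ValueError("plane count does not fit in one fetch cycle")
    return used


def free_slots(planes: int, hires: bool = False) -> int:
    """Slots left over for Copper, Blitter and CPU in one fetch cycle."""
    return SLOTS_PER_FETCH_CYCLE - bitplane_slots(planes, hires)


if __name__ == "__main__":
    for planes in range(1, 7):
        print(f"lores, {planes} plane(s): {free_slots(planes)} free")
    for planes in range(1, 5):
        print(f"hires, {planes} plane(s): {free_slots(planes, hires=True)} free")
```

With these rules a 6-plane lores screen (EHB or HAM) leaves 2 slots per fetch cycle, or 40 over the 20 fetch cycles of a 320-pixel line, and a 4-plane hires screen leaves none.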
||||
28 May 2022, 13:35 | #26 | ||
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
Quote:
Quote:
So my question is about using 4 bitplanes in lo-res: bitplane dma uses all odd cycles because planes 4-2-3-1 are interleaved. Does that mean the Copper is completely stopped during the 320 visible pixels portion, since only the Blitter and CPU are able to use even cycles? |
||
28 May 2022, 14:37 | #27 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
The Blitter and the CPU have lower priority, so those cycles can be used by the Copper whenever it requests them. |
|
28 May 2022, 15:43 | #28 | |
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
Quote:
Is the Copper using odd or even cycles? |
|
28 May 2022, 15:57 | #29 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
Odd cycles are for many predefined channels. But it is only a matter of definition; in fact the unreadable cycles during the refresh ones are even HPOS values, so you can easily invert them, the final effect does not change.
EDIT: For example, here is how WinUAE indicates them; as you can see the Copper cycles are even:
Code:
[00 0]    [01 1]    [02 2]    [03 3]    [04 4]    [05 5]    [06 6]    [07 7]
          RFS0 038  COP 1FE   RFS1 1FE  COP 08C   RFS2 1FE
The cycles are realigned at the end of the video line (or at the beginning if you prefer, again it's just a matter of definition), in case the total number per line is odd (like in PAL, or alternating lines in NTSC). And this 'realignment' allows the Copper cycles to always be even.
I hope I have not made you more confused than before.
Last edited by ross; 28 May 2022 at 16:38. |
|
28 May 2022, 18:24 | #30 |
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
|
Definitely go with whatever Ross and Toni say: they have tested real hardware, whereas I just go by the HRM, which is notoriously incorrect in places. Even cycles would make more sense.
|
03 June 2022, 04:42 | #31 |
Registered User
Join Date: May 2022
Location: Canada
Posts: 138
|
You all are extremely helpful
So if I attempt to recap, please tell me whether my statements are correct:
Assuming a stock Amiga 500 with 512KB chip ram, running a lowres screen in 6bpp, NTSC or PAL doesn't matter, with interrupts disabled, and excluding the short DMAs like disk, audio and ram refresh for the sake of simplicity:
- During VBlank, even and odd cycles are free. One possible way to maximize DMA usage could be having the Copper use the Even cycles, while the CPU runs at full speed on the Odd cycles. Another could be using the Blitter, which can run at full speed using all cycles, with an option (blithog) to let the CPU run once every 3 DMA cycles. Such a setting is the 'best pipelining' achievable since the CPU spends half its clock cycles on DMA, and the other half on internal instruction execution: the CPU borrowing 1 DMA cycle out of every 3 will slow down the blitter slightly, but essentially yields more effective 'work per clock'.
- During horizontal blank: the Copper can be used on Even cycles at full speed to set up sprites and stuff, and the CPU and/or blitter can also use the Odd cycles to do a bit of work.
- During display portion: Display DMA takes all Odd cycles (planes 1 to 4), and borrows half the Even cycles for planes 5 and 6, leaving 40 cycles free. The Copper can use all of those, for example to change colors or reposition a few sprites. The CPU would essentially be completely idle during this part.
One possible way to make the CPU do parallel work even while the chip bus is fully saturated could be doing a few mul or div instructions. With some careful timing, the CPU could be doing almost 400 div or 1200 mul per frame essentially "for free" while the DMA is completely used by display & copper.
All at the same time, 4 channel audio could be playing with less than 2% dma performance cost, and maybe even reading of a disk with a 1% dma performance cost (although I have never written a disk reading routine; it is possible that interrupts would take a much larger toll on the CPU to handle copying buffers, etc.).
Also, if I calculate correctly, having sprite DMA active will cost 7% of dma, but just during the visible scanlines, which are about 76%~82% of the total time, so the real cost of sprites is approximately 5.5% per frame. |
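The percentages in this recap can be reproduced with a short Python sketch. The per-line slot counts used here (227 slots per line, 4 for audio, 3 for disk, 16 for sprites) are assumptions based on the usual DMA time-slot allocation rather than figures stated in the post, and the names are made up for the example; the 76% and 82% visible fractions are the ones quoted above.

```python
SLOTS_PER_LINE = 227   # assumption: DMA slots on one scanline
AUDIO_SLOTS = 4        # assumption: one slot per audio channel per line
DISK_SLOTS = 3         # assumption: disk DMA slots per line
SPRITE_SLOTS = 16      # assumption: 8 sprites x 2 words per line


def share(slots: int) -> float:
    """Fraction of one scanline's DMA slots taken by a fixed allocation."""
    return slots / SLOTS_PER_LINE


def sprite_cost_per_frame(visible_fraction: float) -> float:
    """Sprite DMA share averaged over the frame, active on visible lines only."""
    return share(SPRITE_SLOTS) * visible_fraction


if __name__ == "__main__":
    print(f"audio:   {share(AUDIO_SLOTS):.1%} of the bus")
    print(f"disk:    {share(DISK_SLOTS):.1%} of the bus")
    print(f"sprites: {share(SPRITE_SLOTS):.1%} on visible lines")
    for visible in (0.76, 0.82):
        cost = sprite_cost_per_frame(visible)
        print(f"sprites: {cost:.1%} per frame at {visible:.0%} visible lines")
```

Under these assumptions audio stays below 2%, disk is a little over 1%, and the per-frame sprite cost lands between roughly 5.4% and 5.8%, in line with the 5.5% estimate.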
08 June 2022, 13:04 | #32 | ||||||
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
|
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
|
||||||