28 May 2010, 07:46 | #1 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 344
|
Chipram 3x faster?
I've read somewhere that chipram access is 3x faster when the screen is off (bitplanes=off). Is this correct? Does it apply to both reading and writing?
I guess the way to go then is to either 1) turn off the screen while rendering or 2) access the screen during vblank? |
28 May 2010, 08:13 | #2 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,545
|
"It depends". The more DMA activity, the fewer memory access slots are left for the CPU.
Check the DMA allocation diagram in the HRM. |
28 May 2010, 11:52 | #3 |
Registered User
Join Date: Oct 2009
Location: Germany
Posts: 3,307
|
MCP's "PowerSaver" function can switch off screen DMA for faster chip RAM. Maybe you can try it and check whether that is true.
|
29 May 2010, 05:05 | #4 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
It's false advertising.
By default, if you're writing to chipram, and there is no DMA activity, you can write 5.5-7MB/s, depending on MMU setup and accelerator board (assuming AGA machine).
Then, if you enable DMA activity, some slots will get stolen.
A 320-pixels-wide LORES screen steals 25% of the memory access slots during the visible scanline, and then none of them during horizontal blanking.
A 1280-pixels-wide SHRES screen steals 100% of the memory access slots during the visible scanline, and then none of them during horizontal blanking.
So the concept behind this "oh look 3x faster than copyspeed" stuff is to make sure that the CPU is only doing chipmem accesses when bitplane DMA isn't active. But if you want to implement that you'd better first do the maths and decide whether the max theoretical framerate you will be getting is good enough, and whether the extra latency is worth it. |
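The "do the maths" step can be roughed out like this. It is only a back-of-the-envelope sketch, not a measurement: it assumes CPU write bandwidth scales linearly with the share of slots left free, and `visible_fraction` (the share of each scanline during which bitplane fetch is active) is a hypothetical parameter, set to 160/227 purely for illustration.

```python
def effective_bandwidth(base_mb_s, stolen_fraction, visible_fraction):
    """Crude estimate of CPU chipram write bandwidth with bitplane DMA on.

    base_mb_s        -- bandwidth with no DMA activity (5.5-7 MB/s per the post)
    stolen_fraction  -- share of slots taken by bitplane DMA during the visible
                        part of the scanline (0.25 for LORES, 1.0 for SHRES)
    visible_fraction -- ASSUMED share of the scanline with bitplane fetch active
    """
    free_share = 1.0 - stolen_fraction * visible_fraction
    return base_mb_s * free_share


VISIBLE = 160 / 227  # assumption, for illustration only

for name, stolen in (("no bitplanes", 0.0), ("LORES", 0.25), ("SHRES", 1.0)):
    low = effective_bandwidth(5.5, stolen, VISIBLE)
    high = effective_bandwidth(7.0, stolen, VISIBLE)
    print(f"{name:12s} {low:.2f}-{high:.2f} MB/s")
```

Under these assumptions the SHRES case leaves roughly 30% of the no-DMA figure, which is about where a "3x" claim could come from. Real code will not match this exactly, since it ignores blitter, copper and how the CPU's accesses line up with the free slots.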
29 May 2010, 13:18 | #5 |
Ya' like it Retr0?
Join Date: Jul 2005
Location: United Kingdom
Age: 49
Posts: 9,768
|
Wouldn't the DMA activity be affected by other custom chip activity, say Paula's?
Or are we specifically talking about LISA's DMA access to chip? |
29 May 2010, 14:16 | #6 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
Only the even chipbus cycles can be used by the CPU. These are shared by CPU, blitter, copper, and also some of the bitplane DMA. The above reasoning is about CPU vs bitplane DMA.
Other things, such as audio & sprite DMA, run on odd cycles and therefore don't compete with the CPU for bus cycles. |
29 May 2010, 14:50 | #7 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,545
|
That is not exactly right.
A 68000 memory cycle is 4 CPU clocks (= 2 DMA bus cycles). A 68000 bus cycle can be thought of as 2 separate phases, an address phase and a data phase (simplified).
During the "address phase" Agnus latches the address from the CPU _while_ Agnus can do a normal DMA transfer ("odd"). During the data phase the data is finally transferred and the chip bus is reserved for the CPU ("even").
Usually the CPU does use even cycles, because sooner or later there is an odd chip cycle that "synchronizes" the CPU to even cycles (until DMA needs an even cycle or the CPU executes an instruction that has 2-CPU-clock idle cycles).
Nothing in Amiga is as simple as the documentation says. |
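To put a number on "4 CPU clocks per memory cycle", here is a small sketch of the resulting ceiling for a plain 68000. The PAL clock value and the 16-bit bus width are standard figures for a stock machine but are assumptions here, not something stated in the post; accelerators and AGA's wider bus change the result.

```python
PAL_68000_HZ = 7_093_790      # stock PAL 68000 clock (assumed machine)
CLOCKS_PER_MEMORY_CYCLE = 4   # from the post
BYTES_PER_ACCESS = 2          # the 68000 has a 16-bit data bus


def peak_chip_bytes_per_second(cpu_hz=PAL_68000_HZ):
    """Upper bound on chip bus traffic if every memory cycle is granted."""
    return cpu_hz / CLOCKS_PER_MEMORY_CYCLE * BYTES_PER_ACCESS


def dma_cycles_per_access(cpu_clocks_per_dma_cycle=2):
    """DMA bus cycles spanned by one CPU memory cycle (2 on a stock 68000)."""
    return CLOCKS_PER_MEMORY_CYCLE / cpu_clocks_per_dma_cycle


print(f"{peak_chip_bytes_per_second() / 1e6:.2f} MB/s peak")
print(f"{dma_cycles_per_access():.0f} DMA cycles per CPU access")
```

This is a best case with no DMA contention at all; anything that takes the cycle the CPU needs pushes the real figure down.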
30 May 2010, 00:39 | #8 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
Toni,
hmm. Please tell me if the below is correct:
So what if there is a very fast CPU connected? I presume that the transmission still takes two bus cycles under optimal conditions?
Let's say for instance that 100 CPU cycles = 1 DMA cycle. Every CPU instruction takes 1 CPU cycle. And that all DMA is turned off. The following code in a loop would thus take exactly 2 DMA cycles per iteration, right?
Code:
    move.l  d0,(a0)     ; write to chipmem
    REPT    199
    nop
    ENDR

    address phase
    data phase
And the following would take exactly 3 DMA cycles per iteration, right?
Code:
    move.l  d0,(a0)     ; write to chipmem
    REPT    299
    nop
    ENDR

    address phase
    data phase
    idle
Now let's say that bitplane DMA is active, 8 planes SHRES on AGA (so all buscycles are consumed by DMA). The CPU executes a write sometime during this period, and then the pattern on the chipbus would be:
Code:
    CPU-induced activity    DMA-induced activity
    address phase           fetch bitplane data
    waiting                 fetch bitplane data
    waiting                 fetch bitplane data
    waiting                 fetch bitplane data
    ...                     ...
    data phase              nothing
Last edited by Kalms; 30 May 2010 at 01:08. |
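The last chipbus pattern above can be reproduced with a small simulation. It is a sketch only, assuming (as described earlier in the thread) that the address phase may overlap a DMA fetch while the data phase needs a DMA cycle that bitplane DMA leaves free; `write_timeline` and its arguments are hypothetical names made up for this illustration.

```python
def write_timeline(dma_busy, issue_cycle):
    """Return (cycle, cpu_activity, dma_activity) rows for one CPU chip write.

    dma_busy    -- list of booleans, True where bitplane DMA owns the DMA cycle
    issue_cycle -- index of the DMA cycle in which the address phase happens
    """
    rows = [(issue_cycle, "address phase",
             "fetch bitplane data" if dma_busy[issue_cycle] else "nothing")]
    cycle = issue_cycle + 1
    # The data phase is held off for as long as DMA keeps taking cycles.
    while cycle < len(dma_busy) and dma_busy[cycle]:
        rows.append((cycle, "waiting", "fetch bitplane data"))
        cycle += 1
    if cycle >= len(dma_busy):
        raise ValueError("no free DMA cycle left for the data phase")
    rows.append((cycle, "data phase", "nothing"))
    return rows


# Five saturated cycles followed by free ones; the write is issued in cycle 1.
saturated = [True] * 5 + [False] * 3
for cycle, cpu, dma in write_timeline(saturated, issue_cycle=1):
    print(f"{cycle:3d}  {cpu:14s} {dma}")
```

With no DMA at all (`write_timeline([False] * 4, 0)`) the write completes in two rows, which corresponds to the "two bus cycles under optimal conditions" case.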
30 May 2010, 10:13 | #9 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,545
|
Quote:
Quote:
But let's assume word writes, CPU code in superfast RAM, and no other DMA.
Yes, except that because a horizontal line has 227 cycles in PAL, and 4 refresh cycles, there is always one "unaligned" cycle because the number of lines is odd.
Quote:
Quote:
|
30 May 2010, 12:28 | #10 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
Thanks Toni.
Tying back to the original discussion, this means that the maximum sustainable speed by the CPU is 1 write every 2 DMA cycles, but
1) DMA activity can delay it
2) that DMA activity usually comes from blitter/copper/bitplanes
3) in the case discussed above, the main culprit is bitplane DMA
When running in 8bpl LORES, bitplane DMA uses 8 cycles, then idles for 24, then uses the next 8, etc. This continues for 160 cycles; then 67 are not used by bitplane DMA (some of those go to refresh, sprites, audio etc). It's very difficult to adapt one's code to this (especially when taking into account accelerators running at different speeds) and therefore the total effect will be that some CPU writes will get delayed by up to 8 DMA cycles. Code that does a lot of chipwrites during the visible display portion will get delayed by ~15%.
Similarly, when running 8bpl SHRES, *all* DMA cycles are taken by bitplane DMA during the visible display portion. This means 160 DMA cycles are used by bitplane DMA, and then 67 are not (a few of those 67 will go to refresh, sprites, audio etc here, but let's not dwell on that). Code that does a lot of chipwrites during the visible display portion will get delayed by ~70%. |
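The slot patterns described above can be counted with a short script. This is a crude slot count under stated assumptions, not a model of CPU timing: bitplane fetch is assumed to start at cycle 0 of the line, and refresh, sprite, audio, blitter and copper cycles are ignored. The function names are made up for the illustration.

```python
LINE_CYCLES = 227    # DMA cycles per PAL scanline, per the thread
FETCH_CYCLES = 160   # cycles spanned by bitplane fetch, per the post


def bitplane_slots(used_per_group, group_len):
    """Return LINE_CYCLES booleans, True where bitplane DMA takes the cycle."""
    return [cycle < FETCH_CYCLES and (cycle % group_len) < used_per_group
            for cycle in range(LINE_CYCLES)]


def longest_busy_run(slots):
    """Longest stretch of consecutive cycles taken by bitplane DMA."""
    longest = current = 0
    for busy in slots:
        current = current + 1 if busy else 0
        longest = max(longest, current)
    return longest


lores = bitplane_slots(8, 32)  # 8 cycles used, 24 idle, repeating
shres = bitplane_slots(8, 8)   # every cycle used during fetch

for name, slots in (("8bpl LORES", lores), ("8bpl SHRES", shres)):
    taken = sum(slots)
    print(f"{name}: {taken}/{LINE_CYCLES} cycles taken "
          f"({100 * taken / LINE_CYCLES:.1f}% of the line), "
          f"longest run {longest_busy_run(slots)}")
```

The count gives 40 of 227 cycles for LORES and 160 of 227 for SHRES, i.e. about 18% and 70% of the line, in the same ballpark as the ~15% and ~70% delay estimates, and the longest LORES run of 8 matches the "up to 8 DMA cycles" worst-case delay.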
20 July 2010, 02:13 | #11 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,643
|
Kalms, yes. (As you explain, there is only a "fight" over the half of the cycles that are open to both chipset and CPU.) I find the best way to think about it is to have your root point of view "inside the chipmem bus", as in "if I want to ride the bus, when will the roads be traffic-jammed and by whom?" It's simplest on OCS, where the DMA cycles are documented quite well.
The exact memory access scheduling of the AGA chipset has not been mapped or documented (AFAIK, which isn't much). But certainly it can be mapped, most easily with some simple routines on a stock A1200, I guess. |