Chipram 3x faster?

oRBIT · 28 May 2010, 07:46

I've read somewhere that accessing chipram is 3x faster when the screen is off (bitplanes off). Is this correct? Is this valid for both reading and writing?
I guess the way to go then is obviously either 1) turn off the screen while rendering, or 2) access the screen during vblank?
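For option 1, "turning off the screen" means clearing the bitplane DMA enable bit (BPLEN, bit 8) in the DMACON register at $DFF096, where bit 15 (SET/CLR) selects whether the other written bits are set or cleared. The sketch below only computes the words one would write. It runs on a host, not on the Amiga; the constant names mirror the NDK's `hardware/dmabits.h`, and the helper `dmacon_word` is hypothetical.

```python
# DMACON bit values (same as DMAF_* in the NDK's hardware/dmabits.h)
DMAF_SETCLR = 0x8000  # bit 15: 1 = set the listed bits, 0 = clear them
DMAF_RASTER = 0x0100  # bit 8: BPLEN, bitplane DMA enable

DMACON_ADDR = 0xDFF096  # write-only DMA control register


def dmacon_word(bits: int, enable: bool) -> int:
    """Return the 16-bit word to write to DMACON to set or clear `bits`."""
    return (DMAF_SETCLR | bits) if enable else bits


screen_off = dmacon_word(DMAF_RASTER, enable=False)
screen_on = dmacon_word(DMAF_RASTER, enable=True)

# Print the equivalent 68k instructions for reference
print(f"move.w #${screen_off:04X},${DMACON_ADDR:X}  ; bitplane DMA off")
print(f"move.w #${screen_on:04X},${DMACON_ADDR:X}  ; bitplane DMA on")
```

This prints `move.w #$0100,$DFF096` for switching bitplane DMA off and `move.w #$8100,$DFF096` for switching it back on.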

Toni Wilen · 28 May 2010, 08:13

"It depends". The more DMA activity, the less memory access slots left for CPU.

Check the DMA allocation diagram in the HRM.

daxb · 28 May 2010, 11:52

MCP's "PowerSaver" function can switch off screen DMA for faster Chip RAM. Maybe you can try it and check whether that's true.

Kalms · 29 May 2010, 05:05

It's false advertising.

By default, if you're writing to chipram and there is no DMA activity, you can write 5.5-7 MB/s, depending on MMU setup and accelerator board (assuming an AGA machine).
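The 7 MB/s end of that range can be checked against the best-case rate Kalms gives later in the thread: one CPU write every 2 DMA cycles. A back-of-envelope sketch, assuming a PAL machine (3.546895 MHz color clock, one DMA cycle per color clock) and 32-bit writes on the AGA chip bus. The lower end of the range depends on the accelerator and MMU setup and is not modelled here.

```python
PAL_COLOR_CLOCK_HZ = 3_546_895  # one chip-bus DMA cycle per color clock (PAL)
BYTES_PER_WRITE = 4             # 32-bit chip bus on AGA
DMA_CYCLES_PER_WRITE = 2        # best case: one CPU write every 2 DMA cycles


def peak_write_bandwidth(cycles_per_write: int = DMA_CYCLES_PER_WRITE) -> float:
    """Upper bound on the CPU chip-RAM write rate in bytes/s with no competing DMA."""
    writes_per_second = PAL_COLOR_CLOCK_HZ / cycles_per_write
    return writes_per_second * BYTES_PER_WRITE


print(f"{peak_write_bandwidth() / 1e6:.2f} MB/s")
```

This gives about 7.09 MB/s, which matches the upper figure.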

Then, if you enable DMA activity, some slots will get stolen. A 320-pixels-wide LORES screen steals 25% of the memory access slots during the visible scanline, and then none of them during horizontal blanking.

A 1280-pixels-wide SHRES screen steals 100% of the memory access slots during the visible scanline, and then none of them during horizontal blanking.
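The 25% and 100% figures follow from how often each plane has to be fetched. The sketch below assumes the AGA 64-bit fetch mode (FMODE=3), so one fetch delivers 64 pixels of one plane, and that a LORES/HIRES/SHRES pixel lasts 1/2, 1/4 and 1/8 of a DMA cycle. The fetch mode is an assumption on my part, since the post does not name it. The function name is illustrative.

```python
# How many pixels of one plane are displayed per DMA cycle in each resolution
PIXELS_PER_DMA_CYCLE = {"LORES": 2, "HIRES": 4, "SHRES": 8}


def visible_slot_share(mode: str, planes: int = 8, fetch_bits: int = 64) -> float:
    """Fraction of DMA cycles taken by bitplane fetches during the visible part
    of a scanline. Each fetch reads `fetch_bits` bits (= pixels) of one plane,
    so every plane needs one fetch per `fetch_bits / pixels_per_cycle` cycles."""
    cycles_per_fetch_group = fetch_bits // PIXELS_PER_DMA_CYCLE[mode]
    return min(1.0, planes / cycles_per_fetch_group)


for mode in ("LORES", "HIRES", "SHRES"):
    print(mode, f"{visible_slot_share(mode):.0%}")
```

With 8 planes this prints 25% for LORES and 100% for SHRES, as stated above, with HIRES in between at 50%.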

So the concept behind this "oh look 3x faster than copyspeed" stuff is to make sure that the CPU is only doing chipmem accesses when bitplane DMA isn't active.

But if you want to implement that, you'd better first do the maths and decide whether the maximum theoretical frame rate you'll get is good enough, and whether the extra latency is worth it.
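The "do the maths" step can be sketched as a per-frame write budget. Everything here is a simplifying assumption: a PAL non-interlaced frame of 313 lines of 227 DMA cycles, a 256-line display whose bitplane fetches span 160 cycles per line, the best-case rate of one 32-bit write per 2 DMA cycles, and no refresh, sprite, audio, copper or blitter slots. The function and variable names are hypothetical.

```python
LINES_PER_FRAME = 313   # PAL, non-interlaced (assumption)
CYCLES_PER_LINE = 227   # DMA cycles per PAL scanline
VISIBLE_LINES = 256     # assumed display height
VISIBLE_CYCLES = 160    # 320 LORES or 1280 SHRES pixels = 160 DMA cycles
BYTES_PER_WRITE = 4
CYCLES_PER_WRITE = 2


def bytes_per_frame(stolen_per_visible_line: int) -> int:
    """CPU write budget per frame when `stolen_per_visible_line` cycles of each
    visible line are unavailable to the CPU."""
    total = LINES_PER_FRAME * CYCLES_PER_LINE
    free = total - VISIBLE_LINES * stolen_per_visible_line
    return free // CYCLES_PER_WRITE * BYTES_PER_WRITE


screen_bytes = 320 * 256 * 8 // 8              # one 320x256, 8-bitplane screen
no_dma = bytes_per_frame(0)                    # display off
lores_anytime = bytes_per_frame(40)            # 8bpl LORES: 8 of every 32 cycles
outside_bpl = bytes_per_frame(VISIBLE_CYCLES)  # CPU stays off the bus while planes fetch
frames_needed = -(-screen_bytes // outside_bpl)  # ceiling division
print(no_dma, lores_anytime, outside_bpl, frames_needed)
```

Under these assumptions, writing only while bitplane DMA is idle leaves 60180 bytes per frame, so a full 81920-byte screen update needs two frames, i.e. at most 25 updates per second on a 50 Hz display.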

Zetr0 · 29 May 2010, 13:18

Wouldn't the DMA activity be affected by other custom chip activity, like Paula's?

Or are we specifically talking about LISA's DMA access to chip RAM?

Kalms · 29 May 2010, 14:16

Only the even chipbus cycles can be used by the CPU. These are shared by CPU, blitter, copper, and also some of the bitplane DMA. The above reasoning is about CPU vs bitplane DMA.
Other things, such as audio & sprite DMA, run on odd cycles and therefore don't compete with the CPU for bus cycles.

Toni Wilen · 29 May 2010, 14:50

Quote:

Originally Posted by Kalms

Only the even chipbus cycles can be used by the CPU.

That is not exactly right.

A 68000 memory cycle is 4 CPU clocks (= 2 DMA bus cycles).

A 68000 bus cycle can be thought of as 2 separate phases, an address phase and a data phase (simplified). During the "address phase", Agnus latches the address from the CPU _while_ Agnus can do a normal DMA transfer ("odd"). During the data phase, the data is finally transferred and the chip bus is reserved for the CPU ("even").

Usually the CPU does use even cycles, because sooner or later there is an odd chip cycle that "synchronizes" the CPU to even cycles (until DMA needs an even cycle or the CPU executes an instruction that includes 2 idle CPU clocks).

Nothing on the Amiga is as simple as the documentation says.

Kalms · 30 May 2010, 00:39

Toni,

hmm. Please tell me if the below is correct:

So what if there is a very fast CPU connected? I presume that the transmission still takes two bus cycles under optimal conditions?

Let's say for instance that 100 CPU cycles = 1 DMA cycle. Every CPU instruction takes 1 CPU cycle. And that all DMA is turned off.

The following code in a loop would thus take exactly 2 DMA cycles per iteration, right?

Code:

    move.l  d0,(a0)    ; write to chipmem
    REPT    199
    nop
    ENDR

so the pattern on the chipbus would be:
address phase
data phase

And the following would take exactly 3 DMA cycles per iteration, right?

Code:

    move.l  d0,(a0)    ; write to chipmem
    REPT    299
    nop
    ENDR

and the pattern on the chipbus would be:
address phase
data phase
idle
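This idealized timing can be written down directly. The sketch assumes what the example assumes: 100 CPU cycles per DMA cycle, one CPU cycle per instruction, no other DMA, and that the NOPs execute while the write occupies the bus, so an iteration lasts as long as the longer of the CPU work and the 2-cycle bus write. It does not model long writes or the odd PAL line length. The function names are illustrative.

```python
import math

CPU_CYCLES_PER_DMA_CYCLE = 100  # the hypothetical very fast CPU
WRITE_DMA_CYCLES = 2            # address phase + data phase


def dma_cycles_per_iteration(nops: int) -> int:
    """DMA cycles per loop iteration of one chip write followed by `nops` NOPs."""
    cpu_time = math.ceil((1 + nops) / CPU_CYCLES_PER_DMA_CYCLE)
    return max(WRITE_DMA_CYCLES, cpu_time)


def bus_pattern(nops: int) -> list[str]:
    """Chip-bus activity during one iteration under this model."""
    idle = dma_cycles_per_iteration(nops) - WRITE_DMA_CYCLES
    return ["address phase", "data phase"] + ["idle"] * idle


print(dma_cycles_per_iteration(199), bus_pattern(199))
print(dma_cycles_per_iteration(299), bus_pattern(299))
```

With 199 NOPs this gives 2 DMA cycles (address phase, data phase); with 299 NOPs it gives 3, with one idle cycle.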

Now let's say that bitplane DMA is active, 8 planes SHRES on AGA (so all bus cycles are consumed by DMA). The CPU executes a write sometime during this period, and then the pattern on the chipbus would be:

Code:

    CPU-induced activity   DMA-induced activity

    address phase          fetch bitplane data
    waiting                fetch bitplane data
    waiting                fetch bitplane data
    waiting                fetch bitplane data
    ...                    ...
    data phase             nothing

That is, the address phase will execute immediately, but the data phase will get delayed until there is a free chipbus cycle.

Toni Wilen · 30 May 2010, 10:13

Quote:

Originally Posted by Kalms

So what if there is a very fast CPU connected? I presume that the transmission still takes two bus cycles under optimal conditions?

Perhaps Supraturbo can be used for testing

(I still want one!)

Quote:

The following code in a loop would thus take exactly 2 DMA cycles per iteration, right?

Code:

    move.l  d0,(a0)    ; write to chipmem
    REPT    199
    nop
    ENDR

I can't say yes because of the long write (it requires AGA or an A3000; I don't know exactly how those work, and for example CPU memory access cycles are different between the 68000/010 and later models).

But let's assume word writes, CPU code in superfast RAM, and no other DMA:

Yes, except that a PAL horizontal line has 227 cycles (4 of them refresh cycles), so there is always one "unaligned" cycle per line because the number of cycles per line is odd.
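One way to see why an odd line length matters: if the CPU writes every 2 DMA cycles starting at the beginning of one line, that cadence lands on even horizontal positions on that line and on odd ones on the next. The sketch assumes every PAL line is exactly 227 cycles, as stated above, and deliberately ignores the refresh cycles and the resynchronization to even cycles described earlier in the thread; `write_hpos` is a made-up helper.

```python
CYCLES_PER_LINE = 227  # PAL scanline length in DMA cycles (an odd number)


def write_hpos(line: int, period: int = 2) -> list[int]:
    """Horizontal positions (cycle within the line) of CPU writes issued every
    `period` DMA cycles, counting from global cycle 0 at the start of line 0."""
    start = line * CYCLES_PER_LINE
    first = -(-start // period) * period  # first write cycle at or after `start`
    return [c - start for c in range(first, start + CYCLES_PER_LINE, period)]


for line in range(4):
    hpos = write_hpos(line)
    parity = "even" if hpos[0] % 2 == 0 else "odd"
    print(f"line {line}: {len(hpos)} writes, starting at hpos {hpos[0]} ({parity})")
```

The write count alternates between 114 and 113 per line, and the cadence flips between even and odd positions on every line.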

Quote:

And the following would take exactly 3 DMA cycles per iteration, right?

Code:

    move.l  d0,(a0)    ; write to chipmem
    REPT    299
    nop
    ENDR

I think so

Quote:

Now let's say that bitplane DMA is active, 8 planes SHRES on AGA (so all bus cycles are consumed by DMA). The CPU executes a write sometime during this period, and then the pattern on the chipbus would be:

Code:

    CPU-induced activity   DMA-induced activity

    address phase          fetch bitplane data
    waiting                fetch bitplane data
    waiting                fetch bitplane data
    waiting                fetch bitplane data
    ...                    ...
    data phase             nothing

That is, the address phase will execute immediately, but the data phase will get delayed until there is a free chipbus cycle.

Yes, this can happen, at least on an A500. Agnus and Gary trick the CPU into thinking that memory is really slow, and the CPU happily waits until the "memory is ready" signal comes.
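The stall described above can be modelled per cycle: the address phase is issued immediately, and the data phase takes the first cycle after it that no DMA channel has claimed. This is a simplified sketch. In particular it places the bitplane fetch window at cycles 0-159 of the line, whereas on real hardware it sits later in the line, and the names are hypothetical.

```python
PAL_CYCLES_PER_LINE = 227


def data_phase_cycle(busy: list[bool], t: int) -> int:
    """Cycle in which the data phase of a CPU write issued at cycle `t` runs.

    The address phase happens at `t` even if DMA owns that cycle (Agnus latches
    the address meanwhile); the data phase waits for the first cycle after `t`
    that no DMA channel has claimed."""
    for c in range(t + 1, len(busy)):
        if not busy[c]:
            return c
    raise ValueError("no free cycle left in this line")


# Simplified 8bpl SHRES line: bitplane DMA owns cycles 0..159, the rest are free.
shres_line = [c < 160 for c in range(PAL_CYCLES_PER_LINE)]
idle_line = [False] * PAL_CYCLES_PER_LINE

print(data_phase_cycle(shres_line, 10))  # data phase pushed past the fetch window
print(data_phase_cycle(idle_line, 10))   # no DMA: data phase follows immediately
```

A write issued at cycle 10 completes at cycle 160 on the SHRES line, versus cycle 11 with no DMA.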

Kalms · 30 May 2010, 12:28

Thanks Toni.

Tying back to the original discussion, this means that the maximum sustainable rate for the CPU is 1 write every 2 DMA cycles, but
1) DMA activity can delay it
2) that DMA activity usually comes from blitter/copper/bitplanes
3) in the case discussed above, the main culprit is bitplane DMA

When running in 8bpl LORES, bitplane DMA uses 8 cycles, then idles for 24, then uses the next 8, etc. This continues for 160 cycles; then 67 are not used by bitplane DMA (some of those go to refresh, sprites, audio etc.). It's very difficult to adapt one's code to this (especially when taking into account accelerators running at different speeds), and therefore the net effect will be that some CPU writes get delayed by up to 8 DMA cycles. Code that does a lot of chip writes during the visible display portion will get delayed by ~15%.

Similarly, when running 8bpl SHRES, *all* DMA cycles are taken by bitplane DMA during the visible display portion. This means 160 DMA cycles are used by bitplane DMA and 67 are not (a few of those 67 will go to refresh, sprites, audio etc. here, but let's not dwell on that). Code that does a lot of chip writes during the visible display portion will get delayed by ~70%.
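Both cases can be checked by building a per-cycle bitplane DMA map for one line. The sketch assumes, as the 8-used/24-idle pattern implies, that each of the 8 planes is fetched once per 32 cycles in LORES and once per 8 cycles in SHRES, with the fetch window placed at the start of the line; refresh, sprite and audio slots are ignored, as in the post. The function names are illustrative.

```python
CYCLES_PER_LINE = 227
DISPLAY_CYCLES = 160


def bitplane_busy(planes: int, group: int) -> list[bool]:
    """Per-cycle bitplane DMA map for one line: in each `group`-cycle block of
    the display window, the first `planes` cycles fetch bitplane data."""
    return [c < DISPLAY_CYCLES and c % group < planes for c in range(CYCLES_PER_LINE)]


def longest_busy_run(busy: list[bool]) -> int:
    """Longest run of consecutive DMA-owned cycles, i.e. the worst-case wait."""
    run = best = 0
    for b in busy:
        run = run + 1 if b else 0
        best = max(best, run)
    return best


for name, group in (("LORES", 32), ("SHRES", 8)):
    busy = bitplane_busy(planes=8, group=group)
    share = sum(busy) / CYCLES_PER_LINE
    print(f"{name}: {sum(busy)} busy cycles, {share:.1%} of the line, "
          f"longest stall {longest_busy_run(busy)} cycles")
```

This gives 40 busy cycles (17.6% of the line, longest stall 8 cycles) for LORES and 160 busy cycles (70.5%, a single 160-cycle stall) for SHRES, in the same range as the ~15% and ~70% estimates above.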

Photon · 20 July 2010, 02:13

Kalms, yes. (As you explain, there is only a "fight" over the half of the cycles that are open to both the chipset and the CPU.) I find the best way to think about it is to take your point of view from "inside the chipmem bus", as in "if I want to ride the bus, when will the roads be traffic-jammed, and by whom?" It's simplest on OCS, where the memory access (MA) cycles are documented quite well.

But the exact memory access scheduling of the AGA chipset has not been mapped or documented (AFAIK, which isn't much). It can certainly be mapped, though, most easily with some simple routines on a stock A1200, I guess.
