English Amiga Board


Coders > Coders. General
28 May 2010, 07:46   #1
oRBIT
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 344
Chipram 3x faster?

I've read somewhere that accessing chipram is 3x faster when the screen is off (bitplane DMA off). Is this correct? Does it apply to both reads and writes?
If so, I guess the obvious options are to 1) turn off the screen while rendering, or 2) access the screen during vblank?

28 May 2010, 08:13   #2
Toni Wilen
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,545
"It depends". The more DMA activity, the fewer memory access slots are left for the CPU.

Check the DMA allocation diagram in the HRM (Amiga Hardware Reference Manual).

28 May 2010, 11:52   #3
daxb
Registered User
Join Date: Oct 2009
Location: Germany
Posts: 3,307
MCP's "PowerSaver" function can switch off screen DMA for faster chip RAM access. Maybe you can use it to check whether that is true.

29 May 2010, 05:05   #4
Kalms
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
It's false advertising.

By default, if you're writing to chipram and there is no DMA activity, you can write 5.5-7 MB/s, depending on the MMU setup and accelerator board (assuming an AGA machine).

Then, if you enable DMA activity, some slots will get stolen. A 320-pixels-wide LORES screen steals 25% of the memory access slots during the visible scanline, and then none of them during horizontal blanking.

A 1280-pixel-wide SHRES screen (8 bitplanes) steals 100% of the memory access slots during the visible part of the scanline, and none of them during horizontal blanking.

So the concept behind this "oh look, 3x faster than copyspeed" stuff is to make sure that the CPU only does chipmem accesses while bitplane DMA isn't active.

But if you want to implement that, you'd better first do the maths and decide whether the maximum theoretical frame rate you will get is good enough, and whether the extra latency is worth it.
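One rough way to see where a "3x" figure could come from, using figures quoted later in this thread (227 DMA cycles per PAL line, 160 of them in the bitplane-fetch window, with 25% or 100% of those stolen): if CPU chip throughput is assumed to scale with the cycles that bitplane DMA leaves free, then turning the screen off speeds things up by the inverse of the free fraction. This is a sketch under that assumption only; refresh, sprite, audio, copper and blitter cycles are ignored.

```python
# Sketch: share of a PAL scanline's DMA cycles left free by bitplane DMA.
# Figures from this thread: 227 DMA cycles per line, 160 in the visible window.
# Refresh, sprite, audio, copper and blitter cycles are deliberately ignored.
LINE_CYCLES = 227
VISIBLE_CYCLES = 160


def free_fraction(bitplane_share: float) -> float:
    """Fraction of the line not used by bitplane DMA.

    bitplane_share: share of the visible window's cycles stolen by bitplane
    DMA, e.g. 0.25 for 8bpl LORES or 1.0 for 8bpl SHRES (per the posts here).
    """
    stolen = VISIBLE_CYCLES * bitplane_share
    return (LINE_CYCLES - stolen) / LINE_CYCLES


for name, share in [("screen off", 0.0), ("8bpl LORES", 0.25), ("8bpl SHRES", 1.0)]:
    free = free_fraction(share)
    # Speedup from turning the screen off, if throughput tracks free cycles.
    print(f"{name:10s}  free {free:6.1%}  screen-off speedup {1 / free:.2f}x")
```

With these inputs the 8bpl SHRES case comes out at roughly 3.4x, while 8bpl LORES gains only about 1.2x, which suggests that under this model the "3x" figure only applies to the heaviest display modes.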

29 May 2010, 13:18   #5
Zetr0
Ya' like it Retr0?
Join Date: Jul 2005
Location: United Kingdom
Age: 49
Posts: 9,768
Wouldn't DMA activity be affected by other custom chip activity, say Paula?

Or are we specifically talking about LISA's DMA access to chip RAM?

29 May 2010, 14:16   #6
Kalms
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
Only the even chipbus cycles can be used by the CPU. These are shared by CPU, blitter, copper, and also some of the bitplane DMA. The above reasoning is about CPU vs bitplane DMA.
Other things, such as audio and sprite DMA, run on odd cycles and therefore don't compete with the CPU for bus cycles.

29 May 2010, 14:50   #7
Toni Wilen
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,545
Quote:
Originally Posted by Kalms
Only the even chipbus cycles can be used by the CPU.
That is not exactly right.

A 68000 memory cycle is 4 CPU clocks (= 2 DMA bus cycles).

A 68000 bus cycle can be thought of as 2 separate phases: an address phase and a data phase (simplified). During the address phase, Agnus latches the address from the CPU _while_ Agnus can do a normal DMA transfer ("odd" cycle). During the data phase the data is finally transferred, and the chip bus is reserved for the CPU ("even" cycle).

Usually the CPU does use even cycles, because sooner or later an odd chip cycle "synchronizes" the CPU to even cycles (until DMA needs an even cycle or the CPU executes an instruction that has 2 idle CPU cycles).

Nothing on the Amiga is as simple as the documentation says.

30 May 2010, 00:39   #8
Kalms
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
Toni,

Hmm. Please tell me if the following is correct:


So what if there is a very fast CPU connected? I presume that the transmission still takes two bus cycles under optimal conditions?

Let's say, for instance, that 100 CPU cycles = 1 DMA cycle, that every CPU instruction takes 1 CPU cycle, and that all DMA is turned off.

The following code in a loop would thus take exactly 2 DMA cycles per iteration, right?

Code:
    move.l  d0,(a0)    ; write to chipmem
    REPT    199
    nop
    ENDR
So the pattern on the chipbus would be:
address phase
data phase


And the following would take exactly 3 DMA cycles per iteration, right?

Code:
    move.l  d0,(a0)    ; write to chipmem
    REPT    299
    nop
    ENDR
And the pattern on the chipbus would be:
address phase
data phase
idle


Now let's say that bitplane DMA is active, 8 planes SHRES on AGA (so all bus cycles are consumed by DMA). The CPU executes a write sometime during this period, and then the pattern on the chipbus would be:

Code:
    CPU-induced activity   DMA-induced activity

    address phase          fetch bitplane data
    waiting                fetch bitplane data
    waiting                fetch bitplane data
    waiting                fetch bitplane data
    ...                    ...
    data phase             nothing
That is, the address phase will execute immediately, but the data phase will get delayed until there is a free chipbus cycle.

Last edited by Kalms; 30 May 2010 at 01:08.

30 May 2010, 10:13   #9
Toni Wilen
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,545
Quote:
Originally Posted by Kalms
So what if there is a very fast CPU connected? I presume that the transmission still takes two bus cycles under optimal conditions?
Perhaps Supraturbo can be used for testing (I still want one!)

Quote:
The following code in a loop would thus take exactly 2 DMA cycles per iteration, right?

Code:
    move.l  d0,(a0)    ; write to chipmem
    REPT    199
    nop
    ENDR
I can't say yes because of the longword write (that requires AGA or an A3000, and I don't know exactly how those work; for example, CPU memory access cycles differ between the 68000/010 and later models).

But let's assume word writes, CPU code in super-fast RAM, and no other DMA.

Yes, except that a PAL horizontal line has 227 cycles (4 of them refresh cycles), and because the cycle count per line is odd there is always one "unaligned" cycle.
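Toni's point about the odd line length can be checked with a little arithmetic: 227 DMA cycles is 113 two-cycle accesses plus one left-over cycle, so a strict every-other-cycle access pattern flips phase from one line to the next. A minimal sketch, counting cycles from the start of an arbitrary line 0 and ignoring refresh and all other DMA:

```python
LINE_CYCLES = 227  # DMA cycles per PAL scanline, as stated above

# A 2-cycle access pattern fits 113 times per line, with one cycle to spare.
accesses, leftover = divmod(LINE_CYCLES, 2)
print(accesses, leftover)  # 113 1


def line_start_parity(line: int) -> int:
    """0 if `line` starts on an even cycle (counted from line 0), else 1."""
    return (line * LINE_CYCLES) % 2


print([line_start_parity(n) for n in range(4)])  # [0, 1, 0, 1]
```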

Quote:
And the following would take exactly 3 DMA cycles per iteration, right?

Code:
    move.l  d0,(a0)    ; write to chipmem
    REPT    299
    nop
    ENDR
I think so

Quote:
Now let's say that bitplane DMA is active, 8 planes SHRES on AGA (so all bus cycles are consumed by DMA). The CPU executes a write sometime during this period, and then the pattern on the chipbus would be:

Code:
    CPU-induced activity   DMA-induced activity

    address phase          fetch bitplane data
    waiting                fetch bitplane data
    waiting                fetch bitplane data
    waiting                fetch bitplane data
    ...                    ...
    data phase             nothing
That is, the address phase will execute immediately, but the data phase will get delayed until there is a free chipbus cycle.
Yes, this can happen, at least on an A500. Agnus and Gary trick the CPU into thinking that memory is really slow, and the CPU happily waits until the "memory ready" signal arrives.
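The behaviour described here (the address phase goes through at once, the data phase waits for the first chip-bus cycle that DMA does not use) can be written down as a tiny model. This is a simplification assumed for illustration only: `dma_busy` is a hypothetical per-cycle occupancy list, and real timing (especially on AGA with a fast CPU) may well differ, as Toni notes.

```python
from typing import Sequence


def write_completion(dma_busy: Sequence[bool], start: int) -> int:
    """Return the cycle in which a CPU write's data phase takes place.

    Simplified model: the address phase uses cycle `start` even if DMA is
    using it too (Agnus latches the address during its own transfer); the
    data phase needs the first later cycle that DMA leaves free.
    """
    for cycle in range(start + 1, len(dma_busy)):
        if not dma_busy[cycle]:
            return cycle
    raise ValueError("no free chip-bus cycle left in this window")


# No DMA at all: address phase in cycle 0, data phase in cycle 1.
print(write_completion([False] * 8, 0))  # 1

# Hypothetical SHRES-like line: cycles 0-159 all taken by bitplane DMA.
shres_line = [True] * 160 + [False] * 67
print(write_completion(shres_line, 10))  # 160
```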

30 May 2010, 12:28   #10
Kalms
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
Thanks Toni.

Tying back to the original discussion, this means that the maximum sustainable CPU speed is 1 write every 2 DMA cycles, but
1) DMA activity can delay it
2) that DMA activity usually comes from blitter/copper/bitplanes
3) in the case discussed above, the main culprit is bitplane DMA
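The "1 write every 2 DMA cycles" ceiling also roughly accounts for the 5.5-7 MB/s figure quoted earlier. The sketch below assumes (this is not stated in the thread) that one DMA cycle is one colour clock (3,546,895 Hz on PAL, 3,579,545 Hz on NTSC) and that each write moves a 32-bit longword over the AGA chip bus; the word-write figure is shown for comparison.

```python
# Peak CPU chip-RAM write rate with no competing DMA, assuming one write
# completes every 2 DMA cycles and one DMA cycle equals one colour clock.
PAL_COLOUR_CLOCK_HZ = 3_546_895
NTSC_COLOUR_CLOCK_HZ = 3_579_545


def peak_write_rate(colour_clock_hz: float, bytes_per_write: int = 4,
                    cycles_per_write: int = 2) -> float:
    """Return bytes per second for back-to-back chip writes."""
    return colour_clock_hz / cycles_per_write * bytes_per_write


for name, clk in [("PAL", PAL_COLOUR_CLOCK_HZ), ("NTSC", NTSC_COLOUR_CLOCK_HZ)]:
    print(f"{name}: {peak_write_rate(clk) / 1e6:.2f} MB/s (longwords), "
          f"{peak_write_rate(clk, bytes_per_write=2) / 1e6:.2f} MB/s (words)")
```

On PAL this gives about 7.09 MB/s for longwords (3.55 MB/s for words), close to the top of the quoted range; measured rates are presumably lower whenever DMA or instruction timing gets in the way.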

When running in 8bpl LORES, bitplane DMA uses 8 cycles, then idles for 24, then uses the next 8, and so on. This continues for 160 cycles; then 67 are not used by bitplane DMA (some of those go to refresh, sprites, audio etc.). It's very difficult to adapt one's code to this (especially when taking into account accelerators running at different speeds), and therefore the net effect will be that some CPU writes get delayed by up to 8 DMA cycles. Code that does a lot of chipwrites during the visible display portion will get delayed by ~15%.

Similarly, when running 8bpl SHRES, *all* DMA cycles are taken by bitplane DMA during the visible display portion. This means that 160 DMA cycles are used by bitplane DMA and 67 are not (a few of those 67 go to refresh, sprites, audio etc., but let's not dwell on that). Code that does a lot of chipwrites during the visible display portion will get delayed by ~70%.
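The two cases can be compared with a small cycle-level simulation. It is a sketch under stated assumptions: it reuses the simplified model from Toni's reply (the address phase may overlap DMA, the data phase needs the first free cycle after it), assumes an infinitely fast CPU issuing chip writes back to back, and uses the per-line patterns described above (8 busy / 24 free cycles repeated over 160 cycles for 8bpl LORES, all 160 busy for 8bpl SHRES). Refresh, sprite, audio, copper and blitter DMA are ignored, and writes are not carried across line boundaries.

```python
LINE_CYCLES = 227     # DMA cycles per PAL scanline
VISIBLE_CYCLES = 160  # cycles in the bitplane-fetch window


def busy_pattern(mode: str) -> list[bool]:
    """Per-cycle bitplane DMA occupancy for one line (True = cycle taken)."""
    if mode == "off":
        visible = [False] * VISIBLE_CYCLES
    elif mode == "lores8":
        # 8 cycles fetching, 24 idle, repeated across the visible window.
        visible = ([True] * 8 + [False] * 24) * (VISIBLE_CYCLES // 32)
    elif mode == "shres8":
        visible = [True] * VISIBLE_CYCLES
    else:
        raise ValueError(f"unknown mode: {mode}")
    return visible + [False] * (LINE_CYCLES - VISIBLE_CYCLES)


def writes_per_line(busy: list[bool]) -> int:
    """Count back-to-back CPU writes that complete within one line."""
    count = 0
    cycle = 0  # cycle in which the next address phase happens
    while True:
        data = cycle + 1  # data phase needs the first free cycle after that
        while data < len(busy) and busy[data]:
            data += 1
        if data >= len(busy):
            return count
        count += 1
        cycle = data + 1


baseline = writes_per_line(busy_pattern("off"))
for mode in ("off", "lores8", "shres8"):
    n = writes_per_line(busy_pattern(mode))
    print(f"{mode:7s} {n:4d} writes/line  {1 - n / baseline:5.1%} fewer than screen off")
```

Under these assumptions the model gives 113, 94 and 34 writes per line, i.e. roughly 17% and 70% fewer writes for 8bpl LORES and 8bpl SHRES, in the same ballpark as the ~15% and ~70% estimates above.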

20 July 2010, 02:13   #11
Photon
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,643
Kalms, yes. (As you explain, there is only a "fight" over the half of the cycles that are open to both the chipset and the CPU.) I find the best way to think about it is to take your point of view from "inside the chipmem bus", as in "if I want to ride the bus, when will the roads be traffic-jammed, and by whom?" It's simplest on OCS, where the memory access (MA) cycles are documented quite well.

The exact memory access scheduling of the AGA chipset, however, has not been mapped or documented (AFAIK, which isn't much). But it can certainly be mapped, most easily with some simple routines on a stock A1200, I guess.

All times are GMT +2.