English Amiga Board - View Single Post

Kalms · 11 April 2010, 00:51

I'll just take 68060 as an example.

A full c2p conversion (without delta etc.) needs to do 3 things:

1) read 256x240 bytes = 60kB of data from fastmem
2) transform it
3) write 256x240x5 bits = 37.5kB of data to chipmem
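For anyone who wants to check the sizes in steps 1 and 3, here is a small sketch of the arithmetic. The function names are made up for illustration; the only assumptions are one byte per chunky pixel and 1 kB = 1024 bytes.

```python
# Screen geometry from the post: 256x240 pixels, 5 bitplanes.
WIDTH, HEIGHT, BITPLANES = 256, 240, 5

def chunky_bytes(width, height):
    # Chunky buffer: assumed to hold one byte per pixel.
    return width * height

def planar_bytes(width, height, bitplanes):
    # Planar output: one bit per pixel per bitplane, packed 8 bits per byte.
    return width * height * bitplanes // 8

read_kb = chunky_bytes(WIDTH, HEIGHT) / 1024
write_kb = planar_bytes(WIDTH, HEIGHT, BITPLANES) / 1024

print(f"fastmem read:  {read_kb} kB")   # 60.0 kB
print(f"chipmem write: {write_kb} kB")  # 37.5 kB
```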

Steps 1 and 3 cannot be done in parallel, since both go through the CPU's memory bus.
Step 2 can mostly be done in parallel with steps 1 & 3.

Step 1 takes perhaps 20 scanlines.
Step 2 takes perhaps 50 scanlines.
Step 3 takes perhaps 125 scanlines.
[the figures above are rough estimates.]

Thus you can expect the full conversion to take 20+125=145 scanlines, with the CPU's memory bus busy the whole time.
The CPU will be doing actual processing during only about 50 of those 145 scanlines. The rest of the time it is stalled, waiting for the bus interface to finish previous operations.
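The reasoning above can be written down as a very rough cost model. This is only a sketch: it assumes the transform overlaps perfectly with the bus traffic (the post says "mostly"), and it uses the same ballpark scanline figures, not measurements.

```python
# Rough per-step estimates from the post, in scanlines.
READ_LINES = 20        # step 1: fastmem reads
TRANSFORM_LINES = 50   # step 2: the c2p transform itself
WRITE_LINES = 125      # step 3: chipmem writes

def full_conversion_lines(read, transform, write):
    # Reads and writes are serialized on the memory bus, so they add up.
    bus = read + write
    # The transform is assumed to hide completely behind the bus traffic,
    # so the total is whichever of the two takes longer.
    return max(bus, transform)

total = full_conversion_lines(READ_LINES, TRANSFORM_LINES, WRITE_LINES)
cpu_busy = TRANSFORM_LINES / total

print(f"total: {total} scanlines")             # 145
print(f"CPU doing real work: {cpu_busy:.0%}")  # 34%
```

With these numbers the bus is the bottleneck by a wide margin, so speeding up the transform alone would not change the total.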

This means that what would make the most difference to you is reducing the number of chipmem writes. Trading extra computation or extra fastmem accesses (for instance, by doing delta c2p) for fewer chipmem writes might help.
However, a full conversion will then be slightly slower than with the naive approach.
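To get a feel for that trade-off, here is a hypothetical model of delta c2p using the rough estimates above. The assumptions are mine, not measured: comparing against the previous frame costs one extra pass of fastmem reads (another 20 scanlines), chipmem write time scales linearly with the fraction of the screen that changed, and the extra compare work stays hidden behind the bus traffic.

```python
def delta_conversion_lines(changed_fraction, read=20, transform=50,
                           write=125, extra_read=20):
    # extra_read is a hypothetical cost: reading the previous frame from
    # fastmem so that it can be compared with the new one.
    bus = read + extra_read + write * changed_fraction
    # As in the naive case, compute is assumed to overlap with bus traffic.
    return max(bus, transform)

NAIVE_LINES = 20 + 125  # naive full conversion: reads + writes

# Fraction of the screen that must change before delta stops paying off
# (under the assumptions above).
break_even = (NAIVE_LINES - (20 + 20)) / 125

for fraction in (0.25, 0.5, 1.0):
    print(fraction, delta_conversion_lines(fraction))
print("break-even fraction:", break_even)
```

Under these assumptions a fully changed screen costs 165 scanlines instead of 145, which matches the "slightly slower" caveat, while a half-changed screen costs about 102.5. Real numbers depend on how the compare is done and how the changes are distributed.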

Your call.
