04 November 2019, 15:13 | #1 |
Registered User
Join Date: Sep 2015
Location: Germany
Posts: 256
|
Write waitstates on the 68020+
I know that write-accesses to CHIP memory or CUSTOM chip registers incur wait-states. So on the 68020+ it is advised to put other instructions between two write accesses. The CPU can execute while results are being written to memory:
A0=CHIP memory / CUSTOM chip memory
Code:
    move.l  d0,(a0)+    ; store 1st value
    add.l   d2,d0       ; increase 1st value
    move.l  d1,(a0)+    ; store 2nd value
    add.l   d3,d1       ; increase 2nd value
A0=CHIP memory / CUSTOM chip memory
A1=FAST memory
Code:
    move.l  d0,(a0)+    ; store 1st value
    not.b   2(a1)       ; change Flag
    move.l  d1,(a0)+    ; store 2nd value
Is that correct, or am I totally wrong here? |
04 November 2019, 16:31 | #2 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
|
I think you are making a wrong assumption. AFAIK the 020 will NOT continue executing the instruction stream while the last written data is waiting for chipmem to take it. The 060 is the only Motorola 68k that can do that.
The reason to place instructions between two consecutive chipmem writes is that the first chipmem access stalls the CPU until chipmem is ready. You then get some CPU cycles that are synchronised with chipmem in such a way that the instructions complete before the next chipmem slot opens. If you then do a second chipmem access, you waste fewer cycles waiting for chipmem. Of course, if you stuff too many instructions between the two chipmem accesses, you waste both a chipmem cycle and a lot of CPU cycles waiting for the next chipmem slot after the one you've just missed. This may look like what you wrote, but it is really a different mechanism.
The fastmem access between the two chipmem accesses is similar: on the 020/030 it will stall until the chipmem write is completed. Furthermore, being a NOT instruction, it causes a read and a subsequent write. On the 030+ the read can cause a burst read of four consecutive longs from fastmem into the 030's data cache. This may take too long to fit between two chipmem writes. On the 040/060 the write to fastmem should finish very quickly, which is why on the 060 it makes sense to do c2p to fastmem and then copy the planar data from fastmem to chipmem in some unrelated work routine working in fastmem. |
04 November 2019, 16:45 | #3 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Actually, any external memory access (read or write) will incur wait states. But the most wait states typically occur in Chip RAM because that particular RAM was (by design) the slowest RAM in the system.
Motorola/Freescale added features to advanced 68K CPUs to try to improve performance for the condition of external memory access wait states. These features are an instruction cache, data cache, write buffer, copyback, store buffer and non-sequential pipeline execution (Note - These features vary with CPU model).
The general idea here is to prevent or at least reduce the occurrence of an execution pipeline stall. If the execution pipeline is kept busy doing things like instruction decode or an effective address calculation (while external memory access is pending or preferably avoiding the external access completely with a cache hit) then overall performance is improved.
The CPU only sees one bus. The difference in access speed for different address spaces on the bus is determined in hardware. The custom chips also see one bus which just happens to be a small part of the larger CPU bus.
Last edited by SpeedGeek; 04 November 2019 at 17:23. |
04 November 2019, 16:45 | #4 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
However, to activate this push buffer, the caching mode of the chip memory has to be set accordingly, namely to "non-serialized" on the 040 and "imprecise" on the 68060. If the caching mode is "cache inhibited", then writes will also stall the 68040 and 68060 as it then guarantees purely sequential operation. |
|
04 November 2019, 17:01 | #5 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Quote:
But my experience with the 030 shows that it doesn't work this way. I could pad up to 22 clock cycles after a write to chip, but the padding must not contain any memory access or it stops working. Writes to fast also give a few extra cycles. While the 020/030 does not have a true push buffer, the CPU is still able to execute instructions while waiting for a memory write to complete. |
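A minimal sketch of what such padding might look like on an 030, assuming a0 points to chipmem and the code already sits in the instruction cache. The filler instructions and registers are arbitrary placeholders; the only property taken from the post is that they touch registers only. How many of them fit into the roughly 22 cycles depends on the instruction mix and is not worked out here.

```asm
        move.l  d0,(a0)+        ; write to chipmem
        add.l   d2,d0           ; register-only filler: no data memory access
        lsl.l   #2,d3           ; register-only filler
        eor.l   d4,d5           ; register-only filler
        swap    d6              ; register-only filler
        move.l  d1,(a0)+        ; next write to chipmem
```

Replacing any filler line with an instruction that reads or writes memory would, going by the post, defeat the overlap.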
|
04 November 2019, 17:09 | #6 | |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
|
Quote:
|
|
04 November 2019, 17:18 | #7 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
No. The read causes a read burst only if data burst is active - and on my A1230 it usually wasn't.
|
04 November 2019, 17:20 | #8 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,411
|
I can confirm that on my Blizzard 1230MK IV data burst is normally off. I've tried to manually activate it and found no performance benefit whatsoever, so maybe it's off for the reason grond mentioned (i.e. to make chip-fast-chip transfers faster)?
|
04 November 2019, 17:23 | #9 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
|
How does the 030 then fill a 16-byte data cache line? BTW, I also seem to remember finding that activating burst reads didn't seem to make any difference, but I think I concluded that, since this didn't fit the theory, my tests were wrong...
|
04 November 2019, 17:38 | #10 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
Check out 68030um, section 11.2.5.2, 'Write pending buffer'. The 030 can pass a single write operation to this subsystem in the CPU and get on with processing while the bus microcontroller talks to whatever is connected on the other side (fastmem, chipmem, customchip regs).
Another memory access request while the current one is in flight will cause the rest of the 030 core to pause until the in-progress memory access completes.
On 030/50, a write takes 2c (or was it 4c?) to complete in the core, and then it goes to the write pending buffer. Outside the CPU, the mem interface for chipmem can accept (start) a new write every 28 cycles. The bus microcontroller will wait until the next such period begins. Then it spends that period performing the transfer.
This is why chip/fast/chip accesses are costly - even though the fast access takes less than 28c, the CPU will miss out on a full 28c chipbus 'slot'. |
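A rough illustration of the chip/fast/chip point, assuming the 030/50 figure quoted in this post (one chipmem write slot per 28 CPU cycles) and hypothetical register assignments (a0 = chipmem destination, a1 = fastmem destination). The comments are a simplified reading of the model described here, not measured timings.

```asm
; Ordering A: chip / fast / chip
        move.l  d0,(a0)+        ; chip write, handed to the write pending buffer
        move.l  d2,(a1)+        ; fast write: core pauses until the chip write is done,
                                ; then this access uses the bus
        move.l  d1,(a0)+        ; chip write: the next slot may already have started,
                                ; so it may have to wait for the one after

; Ordering B: fast write moved ahead of the chip pair
        move.l  d2,(a1)+        ; fast write first
        move.l  d0,(a0)+        ; chip write, waits for its slot
        move.l  d1,(a0)+        ; chip write, requested while the first is in flight,
                                ; so it can take the following slot
```

Under this model ordering B uses consecutive slots, while ordering A can leave one slot unused.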
04 November 2019, 17:44 | #11 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
|
Interesting! Thanks for the explanation. Do you also happen to know how the 030 will treat filling a dcache line for an uncached fastmem address?
Edited to add: is the 020 equal to the 030 with regards to the "write pending buffer"? Last edited by grond; 04 November 2019 at 17:54. |
04 November 2019, 19:03 | #12 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
The 030 does not need to read full cache lines, longwords inside a cache line are independent (i.e. separately seen as valid or not).
Quote:
In many cases it can be slower due to extra memory accesses. This is why it's not enabled by default. In some cases it can be faster, because burst accesses take (slightly) fewer clocks than normal accesses.
It is possible to optimise for dburst. You have to do serial mem access, and insert register-only instructions in between. It means that:
Code:
    move.l  (a0)+,d0
    and.l   d5,d0
    move.l  (a0)+,d1
    and.l   d5,d1
Code:
    move.l  (a0)+,d0
    move.l  (a0)+,d1
    and.l   d5,d0
    and.l   d5,d1
IIRC, yes. But the A1200's EC020 has far fewer wait states because of its lower clock rate, so not many instructions can fit between writes. |
|
04 November 2019, 19:15 | #13 | |||
Registered User
Join Date: Sep 2015
Location: Germany
Posts: 256
|
Quote:
My main target is not the 68020; it's more the 68040/60. Quote:
Yes, you are right. I noticed the phenomenon on the 68020. If I put too many commands between the two CHIP memory writes, my routine wasted more rastertime. Two commands seem to be the break-even point. With three commands you'll lose. Quote:
Okay, the NOT command is a bad example. A MOVE command to FAST memory would be much clearer. |
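For illustration, the second example from post #1 with the NOT replaced by a plain MOVE to FAST memory might look like the sketch below. That d2 holds the new flag value is an assumption made here; the original post does not say where the value would come from.

```asm
; a0 = CHIP memory / CUSTOM chip memory, a1 = FAST memory
        move.l  d0,(a0)+        ; store 1st value
        move.b  d2,2(a1)        ; write flag to FAST memory (write only, no read-modify-write)
        move.l  d1,(a0)+        ; store 2nd value
```

This removes the read that NOT performs, but according to the replies above the FAST memory write on the 020/030 still has to wait for the pending CHIP memory write.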
|||
04 November 2019, 19:21 | #14 | |
Registered User
Join Date: Sep 2015
Location: Germany
Posts: 256
|
Quote:
Thanks for your detailed explanation, SpeedGeek. Your last paragraph is the most interesting statement for me. |
|
04 November 2019, 19:36 | #15 | |
Registered User
Join Date: Sep 2015
Location: Germany
Posts: 256
|
Quote:
|
|
04 November 2019, 19:39 | #16 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
|
In order to do burst accesses, I guess one would want 16-byte-aligned addresses, similar to the 040's move16 instruction?
|
04 November 2019, 19:41 | #17 | |
Registered User
Join Date: Sep 2015
Location: Germany
Posts: 256
|
Quote:
|
|
04 November 2019, 19:44 | #18 | |
Registered User
Join Date: Sep 2015
Location: Germany
Posts: 256
|
Quote:
Okay, a good explanation, thanks Kalms. |
|
04 November 2019, 19:48 | #19 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Quote:
However, if said memory isn't in cache then it may experience wait states if there is a write currently being done (as it has a single bus). The whole question here is exactly when the push buffer flushes its data to memory. Having no 060, I can't answer. |
|
04 November 2019, 19:54 | #20 | |
Registered User
Join Date: Sep 2015
Location: Germany
Posts: 256
|
Quote:
|
|