14 October 2016, 20:12 | #1 |
Registered User
Join Date: Sep 2003
Location: germany
Age: 45
Posts: 459
|
68k timing
Lately I have been thinking about 68k timing again.
What happens to internal operations during wait states? For example, the ASL Dx,Dy sequence is: prefetch, n*, n (* means repeated per shift count; n is 2 cycles; prefetch is 4 cycles: 2 cycles to put the address on the bus, then 2 cycles that are repeated while the bus stalls). If the prefetch is stalled by wait states, is the internal register shifting stalled too? It could happen concurrently with the bus wait state cycles. Or, like instruction overlap, is that only possible on 68020 CPUs and higher? Last edited by PiCiJi; 14 October 2016 at 20:23. |
15 October 2016, 19:49 | #2 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
Nothing happens during wait states with 68000.
Only 68020+ can do memory access(es) while CPU does internal operations. |
16 October 2016, 13:56 | #3 |
Registered User
Join Date: Sep 2003
Location: germany
Age: 45
Posts: 459
|
That seems true for immediate and register ASL, because the sequence describes the shifting as happening after the prefetch.
ASL (An) sequence: nr (read from An), np (prefetch), nw (write shifted result). Shifting and decoding of the next opcode happen during the prefetch; there are no additional 2 clocks for shifting. Last edited by PiCiJi; 16 October 2016 at 14:01. |
16 October 2016, 14:14 | #4 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
I didn't say other things can't happen during a memory access (microcode can do ALU operations, condition code setting etc. simultaneously), but if a memory access takes longer than the normal 4 cycles (wait states added), nothing happens during those extra wait states.
|
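The 68000 behaviour described above can be sketched as a simple cycle tally. This is a minimal illustration, assuming the micro-sequence notation from the thread (each bus access takes 4 cycles, each internal step "n" takes 2) and a hypothetical constant number of wait states per bus access. The function and constant names are made up for this example and are not taken from any emulator.

```python
BUS_CYCLES = 4       # one 68000 bus access without wait states
INTERNAL_CYCLES = 2  # one internal-only step ("n" in the thread's notation)


def sequence_cycles(steps, wait_states=0):
    """Total cycles for a micro-sequence on a 68000-style model.

    steps: list of "bus" or "n" entries.
    wait_states: extra cycles added to every bus access (assumed constant).
    Wait states only lengthen the bus access; no internal step overlaps them.
    """
    total = 0
    for step in steps:
        if step == "bus":
            total += BUS_CYCLES + wait_states
        elif step == "n":
            total += INTERNAL_CYCLES
        else:
            raise ValueError(f"unknown step: {step!r}")
    return total


def asl_register(shift_count):
    # prefetch, then one "n" per shift count, then a final "n"
    return ["bus"] + ["n"] * shift_count + ["n"]


# ASL (An): read operand, prefetch, write result.
# The shift itself happens during the prefetch, so there is no extra "n".
ASL_MEMORY = ["bus", "bus", "bus"]

print(sequence_cycles(asl_register(3)))                 # no wait states
print(sequence_cycles(asl_register(3), wait_states=2))  # stalled prefetch
print(sequence_cycles(ASL_MEMORY, wait_states=2))
```

With a shift count of 3 the register form comes to 12 cycles, and 2 wait states on the prefetch raise it to 14: the shift steps simply start later instead of running during the stall.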
16 October 2016, 15:14 | #5 |
Registered User
Join Date: Sep 2003
Location: germany
Age: 45
Posts: 459
|
Thanks for the clarification.
I am trying to understand prefetches on the 68020. I read that a memory access costs 3 cycles. It seems a long word access is one bus cycle instead of two as on the 68000. A few questions:
1. Can a long word be read or written within 3 cycles (no wait states)?
2. Does each odd prefetch consume no bus cycles, because a long word is prefetched from the external bus or cache?
3. How many cycles does a cache hit consume?
4. Does a cache miss consume additional cycles on top of the external bus access?
Last edited by PiCiJi; 16 October 2016 at 19:56. |
16 October 2016, 20:30 | #6 | ||
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
2: Yes, prefetch always loads long-aligned long words, and even if it is not cached (caches off), it goes to the 32-bit prefetch buffer and the next word comes from the buffer (while the CPU can already start the next long prefetch read). So it is better to jump to long-aligned addresses to make the best of it.
3: I think a cache hit is free.
4: It depends. If the CPU has something else to do, it may not cause any extra cycles, for example a longer logic operation, or prefetching/decoding a prefetch-buffered word (or instruction cache). This makes accurate emulation practically impossible without more knowledge of CPU internals.
1 and 2 are quite clearly documented. With 3 and 4 it becomes quite fuzzy. |
||
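The advice about jumping to long-aligned addresses can be illustrated by counting how many aligned 32-bit fetches a straight run of instruction words needs. This is only a sketch: it assumes purely sequential code, ignores the instruction cache entirely, and `long_fetches` is a hypothetical helper name.

```python
def long_fetches(start_addr, num_words):
    """Aligned 32-bit fetches needed to stream num_words consecutive
    16-bit instruction words starting at start_addr (cache ignored)."""
    if start_addr % 2 or num_words < 1:
        raise ValueError("need an even address and at least one word")
    first_long = start_addr & ~3                        # long containing the first word
    last_long = (start_addr + 2 * num_words - 1) & ~3   # long containing the last byte
    return (last_long - first_long) // 4 + 1


# The same 4-word routine, entered at an aligned and an unaligned address.
for addr in (0x1000, 0x1002):
    print(hex(addr), long_fetches(addr, 4))
```

The aligned entry point needs 2 long fetches for the 4 words, the unaligned one needs 3, because half of the first and last long words are wasted.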
17 October 2016, 11:19 | #7 |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,643
|
|
17 October 2016, 11:47 | #8 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,859
|
|
17 October 2016, 12:07 | #9 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,365
|
|
17 October 2016, 12:18 | #10 | ||
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
I didn't mention external because it is obvious: just count the data pins. |
||
17 October 2016, 12:53 | #11 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,365
|
Where exactly it comes from is of course another story. By the way, I would like to have more accurate 020/030 timings under WinUAE for suitable code optimizations, because for now the "cycle exact" timing is about as wrong as max speed with JIT active. It doesn't need to be 100% cycle exact, just better than what we have now. |
|
17 October 2016, 13:11 | #12 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
Impossible without more information. The 68020/030 documentation is useless for internal timing purposes, and extremely useless for mul/div timing: no one knows the algorithm.
The only "hidden" info that seems to be true is that each prefetch pipeline state change takes 2 cycles (when it comes from the prefetch buffer = no extra wait states). |
17 October 2016, 14:01 | #13 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,365
|
Why would you need internal timing? Isn't it only the externally observable timing that counts?
|
17 October 2016, 14:31 | #14 | |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
Just like it does on the 68000, only that the 68000 is very simple compared to the 68020: timing is always the same, and the previous or next instruction does not change the timing of the current instruction. 68000 internal timing is emulated practically 100% accurately.
The exact time at which each memory access happens needs to be 100% accurate, and the only way to make it accurate is to emulate all internal cycles. Even a 1-cycle difference can turn into a multiple-cycle difference in the outside world, because CPU memory accesses need to be aligned to Amiga bus cycles (especially when accessing chip RAM, chip registers or CIA). Even a tiny error will become huge.
It gets even worse with variable-cycle instructions like MUL or DIV (the cycle count depends on both parameters, so without knowing the algorithm it is impossible to emulate accurately). The only thing that is simpler on the 68020+ than on the 68000 is shifts.
(I am quite sure I have said something similar about n times already)
|
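How a 1-cycle internal error gets amplified by bus alignment can be shown with a toy model. The numbers here are purely illustrative assumptions, not Amiga chipset timings: a hypothetical bus that only accepts an access on every 4th CPU cycle, and an access that itself lasts 4 cycles. `run_loop` is a made-up name.

```python
def run_loop(internal_cycles, iterations, slot_period=4, access_cycles=4):
    """Total cycles for a loop that does some internal work and then
    one bus access that must start on a slot boundary."""
    t = 0
    for _ in range(iterations):
        t += internal_cycles                     # internal (non-bus) work
        t = -(-t // slot_period) * slot_period   # round up: wait for next slot
        t += access_cycles                       # the bus access itself
    return t


exact = run_loop(internal_cycles=4, iterations=100)
off_by_one = run_loop(internal_cycles=5, iterations=100)
print(exact, off_by_one)
```

In this model a single extra internal cycle per iteration costs 4 cycles per iteration once the access has to wait for the next slot, so 100 iterations take 1200 cycles instead of 800.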
17 October 2016, 14:43 | #15 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,365
|
I explicitly wrote that it didn't need to be 100% accurate...
|
17 October 2016, 15:27 | #16 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
Not being 100% accurate makes it very inaccurate. Those extra or missing internal cycles make a huge difference.
All memory accesses are already accurately emulated. With non-100% internal timing, the result is not accurate at all. The 68000 is almost accurate even without internal timing because most instructions' cycle time is the same as their memory cycles (the main differences being shifts, mul and div and some EA calculations). The 68020 is something totally different. |
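For contrast, the 68000 is a case where the data-dependent multiply timing is documented: MULU.W takes 38 + 2n cycles, where n is the number of 1 bits in the 16-bit source operand (effective address time not included). A minimal sketch of that formula follows; the function name is illustrative, and nothing comparable is known for the 68020 according to the thread.

```python
def mulu_cycles_68000(source):
    """68000 MULU.W timing with a register source: 38 + 2n cycles,
    n = number of 1 bits in the 16-bit source operand."""
    if not 0 <= source <= 0xFFFF:
        raise ValueError("source operand must fit in 16 bits")
    return 38 + 2 * bin(source).count("1")


for value in (0x0000, 0x0101, 0xFFFF):
    print(hex(value), mulu_cycles_68000(value))
```

The result ranges from 38 cycles (source of 0) to 70 cycles (source of 0xFFFF), which is why an emulator has to know the algorithm, not just an average figure, to place the following memory access on the right cycle.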
17 October 2016, 15:41 | #17 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,365
|
I don't get it. Since we (asm programmers) can manually count clocks for a given routine (without knowing the CPU's internals), why wouldn't the emulator be able to do the same?
|
17 October 2016, 18:03 | #18 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
Because it is not that difficult to optimize some loop for the best case, especially if most of the code fits in the cache, or when only fast RAM accesses are done.
It is not going to work in the generic situation; most code is not optimized that way. It must work in all situations. For example, this kind of emulation would not help with the worst-case situation where code and data are in chip RAM (= unexpanded A1200/CD32 demos and games, the most important reason for me). And it still does not help with MUL or DIV. |
17 October 2016, 18:41 | #19 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,365
|
The current 68030 approximate +0% speed setting runs anywhere between 3 times slower and 2 times faster than a 50 MHz 68030, depending on what's done. Such a lack of accuracy makes WinUAE unsuitable for asm cross-development (and it's a pity considering the speeeed at which phxass can assemble stuff there!). You can use worst-case timing (e.g. 28c for mul.w). That should be enough. |
|
17 October 2016, 18:47 | #20 | |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
List a few cache-fitting routines with cycle counts included and I'll check whether something is obviously wrong (no MULs or DIVs!).
|
|