English Amiga Board


Old 27 June 2020, 13:18   #21
AmigaHope
Registered User
 
Join Date: Sep 2006
Location: New Sandusky
Posts: 685
Quote:
Originally Posted by ross View Post
Yes, it is precisely for the reasons you have explained.

Does the second example I listed actually exist? I don't think the superscalar dispatch will parallelize self-modifying code because it will see the dependency caused by the first instruction, unless the dispatch doesn't consider instruction address when looking at data dependencies? It would have to be some situation like where the first instruction caused the copper to modify the second instruction or something in interleaved memory access from CHIP or something. (I'm really stretching my brain to try to find situations where this could happen.)
Old 27 June 2020, 13:49   #22
ross
Per aspera ad astra

 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,619
Quote:
Originally Posted by meynaf View Post
You mean without OS ?

Not sure it's what you want, but here are a few tricks :
- set some 040+ bit in cacr - if it reads back as 0, then 020-030
- try to access caar register - if not illegal, then 020-030
- trace a trap instruction - if you get to the trap vector instead of trace, then 040+
- try some cpsave opcode (e.g. $FF10) in user mode - if privilege violation instead of line-F, then 020-030

Note that it's possible emulators get caught and won't behave properly
Yes sorry, OS, I used the Italian acronym but it's probably the same in French
Good tips, I'll try something

Quote:
Originally Posted by AmigaHope View Post
Does the second example I listed actually exist? I don't think the superscalar dispatch will parallelize self-modifying code because it will see the dependency caused by the first instruction, unless the dispatch doesn't consider instruction address when looking at data dependencies? It would have to be some situation like where the first instruction caused the copper to modify the second instruction or something in interleaved memory access from CHIP or something. (I'm really stretching my brain to try to find situations where this could happen.)
Just to be sure, I disable it; I haven't stretched my brain for real cases
In any case it sure 'slows down' the 060 a bit.
Old 27 June 2020, 13:50   #23
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 47
Posts: 3,861
Quote:
Originally Posted by AmigaHope View Post
Does the second example I listed actually exist? I don't think the superscalar dispatch will parallelize self-modifying code because it will see the dependency caused by the first instruction, unless the dispatch doesn't consider instruction address when looking at data dependencies? It would have to be some situation like where the first instruction caused the copper to modify the second instruction or something in interleaved memory access from CHIP or something. (I'm really stretching my brain to try to find situations where this could happen.)
You don't need to stretch your brain : superscalar or not, the instruction executed will be the one before the alteration.
The 68060 does not in any manner monitor code alterations, nor does any other cpu of the family. This is why self modifying code is seen as a bad practice.
Old 27 June 2020, 13:57   #24
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 47
Posts: 3,861
Quote:
Originally Posted by ross View Post
Yes sorry, OS, I used the Italian acronym but it's probably the same in French
Missed: in French it would be SE.


Quote:
Originally Posted by ross View Post
Good tips, I'll try something
If you want, I also have a few tricks to detect 020+, even some that don't need trapping.
Old 27 June 2020, 14:33   #25
ross
Per aspera ad astra

 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,619
Quote:
Originally Posted by meynaf View Post
Missed: in French it would be SE.
Ah! Système d'Exploitation


Quote:
If you want, I also have a few tricks to detect 020+, even some that don't need trapping.
Yes, publish them, thanks! (It probably involves an addressing mode ignored by the 000, or an instruction executed in a different order. I remember using it in the past, but I can't recall it now.)

Back to my goal: I need the simplest and smallest way to skip the CPUSHA only on 020/030 in the blob in message #3 (yes, I hadn't explained everything, but it's all in the thread).

The cpsave opcode is interesting, but wouldn't it also fail on these 'bad' expansion cards?
The 'trace a trap' could also be interesting, but isn't it a bit too convoluted?
CAAR is not usable because it exists in the whole 020+ family.
The CACR method is simple and effective, but couldn't it create problems on future processors (daydreaming...)?
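
For anyone who wants to reason about the CACR probe on a host machine, here is a minimal C sketch of its logic. It is a model, not 68k code: the `CpuModel` tags and the function names are made up for this illustration, and the only hardware fact it encodes is that bit 31 of CACR is implemented on the 68040/68060 and not on the 68020/68030.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU tags, used only by this sketch. */
typedef enum { MODEL_68020, MODEL_68030, MODEL_68040, MODEL_68060 } CpuModel;

#define CACR_BIT31 UINT32_C(0x80000000) /* DE on the 68040, EDC on the 68060 */

/* What a read of CACR returns for bit 31 after writing `written`.
 * Other bits are ignored here; the probe only cares about bit 31. */
uint32_t cacr_bit31_readback(CpuModel cpu, uint32_t written)
{
    assert(cpu <= MODEL_68060);
    bool implemented = (cpu == MODEL_68040 || cpu == MODEL_68060);
    return implemented ? (written & CACR_BIT31) : 0;
}

/* The probe itself: set bit 31, read back, check whether it stuck. */
bool probe_says_040_plus(CpuModel cpu)
{
    return cacr_bit31_readback(cpu, CACR_BIT31) != 0;
}
```

On real hardware the write would turn the data cache on for a moment on an 040+, so actual code would presumably save CACR first and restore it right after the probe.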
Old 27 June 2020, 15:15   #26
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 47
Posts: 3,861
Quote:
Originally Posted by ross View Post
Yes, publish them, thanks! (It probably involves an addressing mode ignored by the 000, or an instruction executed in a different order. I remember using it in the past, but I can't recall it now.)
Ok, here it goes :
- compute an effective address with a scale factor (*2, *4 or *8); if *1 is used instead, then <020
- push a7 with movem.l a7,-(a7), then pop it into another register and compare with a7; if different, then 020+
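
To make the two tricks concrete, here is a small host-side C model of the CPU behaviour they rely on. It is only a sketch: the function names are invented for this example, and the model encodes just two documented differences, namely that pre-020 CPUs ignore the scale field of an indexed addressing mode, and that `movem.l a7,-(a7)` stores the initial a7 on the 68000/010 but the already-decremented a7 on the 020+.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset added by an indexed mode such as (a0,d0.w*2).
 * scale_bits is the 2-bit scale field (0..3 means *1, *2, *4, *8).
 * Pre-020 CPUs ignore the field and always behave as *1. */
uint32_t indexed_offset(bool is_020_plus, uint32_t index, unsigned scale_bits)
{
    assert(scale_bits <= 3);
    return is_020_plus ? index << scale_bits : index;
}

/* Value written to the stack by "movem.l a7,-(a7)" when a7 == sp_before. */
uint32_t movem_a7_predec_stored(bool is_020_plus, uint32_t sp_before)
{
    return is_020_plus ? sp_before - 4 : sp_before;
}

/* Trick 1: index = 1 with scale *2; an offset of 2 means the scale was honoured. */
bool detect_020_by_scale(bool is_020_plus)
{
    return indexed_offset(is_020_plus, 1, 1) == 2;
}

/* Trick 2: after the pop, a7 is back at sp; a different popped value means 020+. */
bool detect_020_by_movem(bool is_020_plus, uint32_t sp)
{
    uint32_t popped = movem_a7_predec_stored(is_020_plus, sp);
    return popped != sp;
}
```

Neither trick needs an exception handler, which is why they are attractive when the vectors cannot be touched.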


Quote:
Originally Posted by ross View Post
The cpsave opcode is interesting, but wouldn't it also fail on these 'bad' expansion cards?
In theory, no. It fails because it's a privileged instruction, so I don't think it accesses the coprocessor interface (in user mode, that is).
However, it fails under some emulators, like old versions of WinUAE (at least 3.1.0).


Quote:
Originally Posted by ross View Post
The 'trace a trap' could also be interesting, but isn't it a bit too convoluted?
Probably
Also not good on emulators (trace not reliable if jit is active).


Quote:
Originally Posted by ross View Post
CAAR is not usable because it exists in the whole 020+ family.
As far as I know, the cache address register was removed in the 68040 (replaced by cpush). If you really can't believe this, you can also try with MSP.


Quote:
Originally Posted by ross View Post
The CACR method is simple and effective, but couldn't it create problems on future processors (daydreaming...)?
If future processors reuse existing bits for something else, then they're broken
Old 27 June 2020, 15:37   #27
ross
Per aspera ad astra

 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,619
Quote:
Originally Posted by meynaf View Post
Ok, here it goes :
- compute an effective address with a scale factor (*2, *4 or *8); if *1 is used instead, then <020
- push a7 with movem.l a7,-(a7), then pop it into another register and compare with a7; if different, then 020+


Quote:
As far as I know, the cache address register was removed in the 68040 (replaced by cpush). If you really can't believe this, you can also try with MSP.
You are right


Quote:
If future processors reuse existing bits for something else, then they're broken
So 060 is broken
Bit 13 was reused for FIC; on the 030 it was WA.
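
The bit 13 clash can be written down as a tiny C lookup, purely as an illustration. The enum and function names are invented for this sketch; the bit assignments are the ones from the thread (WA on the 68030, FIC on the 68060), with bit 13 treated as unimplemented on the 68020 and 68040.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACR_BIT13 (UINT32_C(1) << 13)

typedef enum { BIT13_UNIMPLEMENTED, BIT13_WA, BIT13_FIC } Bit13Role;

/* cpu is the plain model number: 68020, 68030, 68040 or 68060. */
Bit13Role cacr_bit13_role(int cpu)
{
    switch (cpu) {
    case 68030: return BIT13_WA;   /* write allocate (data cache) */
    case 68060: return BIT13_FIC;  /* half instruction cache mode */
    case 68020:
    case 68040: return BIT13_UNIMPLEMENTED;
    default:    assert(!"unknown CPU model"); return BIT13_UNIMPLEMENTED;
    }
}

/* True when a CACR value with bit 13 set would halve the 060 instruction cache. */
bool halves_060_icache(int cpu, uint32_t cacr)
{
    return cacr_bit13_role(cpu) == BIT13_FIC && (cacr & CACR_BIT13) != 0;
}
```

So one CACR value shared by every CPU cannot give bit 13 the same meaning everywhere; the same write that requests WA on an 030 requests half-cache mode on an 060.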
Old 27 June 2020, 15:49   #28
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 47
Posts: 3,861
Quote:
Originally Posted by ross View Post
So 060 is broken
Bit 13 was reused for FIC; on the 030 it was WA.
Yes indeed
060 did a few "not really smart" moves. Hopefully (AFAIK) this one doesn't have a significant impact.
Old 27 June 2020, 16:00   #29
ross
Per aspera ad astra

 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,619
Quote:
Originally Posted by meynaf View Post
Yes indeed
060 did a few "not really smart" moves. Hopefully (AFAIK) this one doesn't have a significant impact.
Yes, if I need a quick and dirty 'universal' cache-on setup for the whole processor family, I force bit 13 = 1, losing half of the Icache on the 060.
Actually, if the code runs only in supervisor mode, I could set WA=0 even on the 030.
Old 29 June 2020, 00:08   #30
mr.spiv
Registered User
 
Join Date: Aug 2006
Location: Finland
Age: 48
Posts: 120
By the way, a few related questions for my own clarification. How does instruction prefetch behave on the 030 or 040 after disabling all possible caches and so on (using LVOCacheControl)? Is a simple 000-like prefetch still present? Does UAE emulate the 030 or 040 correctly here?
Old 29 June 2020, 11:06   #31
Keir
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 606
I had a question about my revised cache-disable code. So I link to it here: https://github.com/keirf/FF_AutoSwap...sable_caches.S

Apologies that it is GAS rather than native syntax!

It does use another 68040+ instruction, and catching a trap, to detect earlier CPUs. If you're brave, perhaps you could assume that the existence of ITT0 implies CPUSHA is valid, and get rid of the instructions messing with the F-Line vector.

I also have more extensive code to disable MMU (68030+) too.
Old 29 June 2020, 19:47   #32
AmigaHope
Registered User
 
Join Date: Sep 2006
Location: New Sandusky
Posts: 685
Quote:
Originally Posted by meynaf View Post
You don't need to stretch your brain : superscalar or not, the instruction executed will be the one before the alteration.
The 68060 does not in any manner monitor code alterations, nor does any other cpu of the family. This is why self modifying code is seen as a bad practice.

OK. I know for sure the dispatch looks to see if the first instruction touches any of the registers or memory addresses used as data in the second instruction, and will not parallelize them if there's a match. It seemed a no-brainer for it to add the address of the instruction itself as one of the things it checked but I didn't know if it did or not.

This is different from instruction/data cache coherency, which it definitely does not check on any 680x0 (which is what I think you're referring to by "other cpu of the family", as the 68060 is the only CPU in the family that is superscalar).

Actually, does superscalar even work if the instruction cache is off?
Old 29 June 2020, 20:09   #33
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 47
Posts: 3,861
Quote:
Originally Posted by AmigaHope View Post
OK. I know for sure the dispatch looks to see if the first instruction touches any of the registers or memory addresses used as data in the second instruction, and will not parallelize them if there's a match. It seemed a no-brainer for it to add the address of the instruction itself as one of the things it checked but I didn't know if it did or not.
Checking the address of the instruction itself means support for SMC and, again, the 68k's philosophy is against it (it's also the reason why PC-relative modes can't be used as write destinations).
A sure thing is that the two can't touch the same memory address, as the 68060 can not dual execute two instructions accessing memory.


Quote:
Originally Posted by AmigaHope View Post
Actually, does superscalar even work if the instruction cache is off?
Why wouldn't it ? They're completely separate features.
Old 29 June 2020, 21:57   #34
AmigaHope
Registered User
 
Join Date: Sep 2006
Location: New Sandusky
Posts: 685
Quote:
Originally Posted by meynaf View Post
A sure thing is that the two can't touch the same memory address, as the 68060 can not dual execute two instructions accessing memory.
Except both instructions do access memory as they both have to be fetched from memory, this being the issue with self-modifying code. My question is whether the dispatch will execute instruction B in the same cycle if instruction A writes to the address of instruction B.

Quote:
Why wouldn't it ? They're completely separate features.
Because to execute two instructions at once they would have to be available in the same cycle. That would be hard to do if the instructions couldn't be pulled from memory together. I don't know exactly how the 68060 feeds its pipeline but you definitely couldn't pull a memory-accessing instruction and another instruction from a 32-bit bus in the same execution cycle without burst (given the large number of 1-cycle and 2-cycle instructions on the 68060), and if the instruction cache was turned off, instructions would no longer be loaded in burst mode.
Old 30 June 2020, 10:35   #35
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 47
Posts: 3,861
Quote:
Originally Posted by AmigaHope View Post
Except both instructions do access memory as they both have to be fetched from memory, this being the issue with self-modifying code. My question is whether the dispatch will execute instruction B in the same cycle if instruction A writes to the address of instruction B.
Even with icache off, even if they don't execute simultaneously, if instruction 1 overwrites instruction 2 then instruction 2 will still execute unmodified, because the write occurs several clocks later than the fetch (and with the push buffer, it can happen even later).
Perhaps you need to read some docs about how pipelines work.
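
The fetch-before-write ordering can be illustrated with a deliberately tiny C model. This is a toy, not a description of real 68060 timing: the opcodes, the two-slot "program" and the function name are all invented here, and the only point is that a copy of instruction 2 taken before instruction 1's write reaches memory is the copy that executes.

```c
#include <assert.h>
#include <stdbool.h>

enum { OP_PATCH_NEXT = 1, OP_OLD = 2, OP_NEW = 3 };

/* Two-instruction toy program: slot 0 overwrites slot 1.
 * Returns the opcode that executes in slot 1; *mem_after receives
 * the final memory content of slot 1. */
int slot1_executed(bool fetched_before_write, int *mem_after)
{
    int mem[2] = { OP_PATCH_NEXT, OP_OLD };
    int in_pipeline = mem[1];   /* copy taken when slot 1 was fetched early */

    mem[1] = OP_NEW;            /* slot 0's write finally reaches memory */

    if (mem_after)
        *mem_after = mem[1];

    /* A CPU that fetched early runs its stale copy; only a CPU that
     * fetched after the write would see the new opcode. */
    int executed = fetched_before_write ? in_pipeline : mem[1];
    assert(executed == OP_OLD || executed == OP_NEW);
    return executed;
}
```

With `fetched_before_write` set, memory ends up holding the new opcode while the old one is what ran, which is the situation described above for a pipelined CPU regardless of superscalar dispatch.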


Quote:
Originally Posted by AmigaHope View Post
Because to execute two instructions at once they would have to be available in the same cycle. That would be hard to do if the instructions couldn't be pulled from memory together. I don't know exactly how the 68060 feeds its pipeline but you definitely couldn't pull a memory-accessing instruction and another instruction from a 32-bit bus in the same execution cycle without burst (given the large number of 1-cycle and 2-cycle instructions on the 68060), and if the instruction cache was turned off, instructions would no longer be loaded in burst mode.
Burst mode does not allow getting several longs from memory in a single clock, it just reduces access latency...
There is a small queue between the cache (or memory) and the pipeline; it works similarly to the 68000 prefetch queue.
And it's still possible to fit 2 instructions in 32 bits, even if one accesses memory.

Anyway, is it really important ? Turning off caches and/or superscalar does not look like a great idea. Better change the code so that it no longer does stupid things.
Old 30 June 2020, 12:26   #36
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 45
Posts: 23,889
Quote:
Originally Posted by mr.spiv View Post
By the way, a few related questions for my own clarification. How does instruction prefetch behave on the 030 or 040 after disabling all possible caches and so on (using LVOCacheControl)? Is a simple 000-like prefetch still present? Does UAE emulate the 030 or 040 correctly here?
No one knows exactly how 68020+ prefetch works, i.e. when the sequencer decides to do the next prefetch. The 68020+ sequencer can continue executing instructions even while the bus controller is waiting for a bus transfer, whereas the 68000/010 does nothing if a bus transfer takes longer than the normal 4-cycle memory access.

This is not documented.

The pipeline is emulated (but see above) if "more compatible" is enabled, assuming the documentation diagram is correct. Normally it is not worth the performance loss.

Pipeline either loads words from cache or from memory (via 32-bit prefetch buffer). Cache does not affect pipeline operations.
Old 05 July 2020, 12:51   #37
Keir
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 606
I had another thought about how to conditionally execute CPUSHA, based on testing bit 31 of CACR (Data Cache Enable on 68040+). This makes the code shorter, clearer, and possibly more correct.

Shorter: Saves an AND.w of the condition codes in the illegal-instruction handler

Clearer: Tests are inline in main code, rather than relying on side effects of the illegal-instruction handler.

More Correct: Only evicts dirty lines (via CPUSHA) if the data cache is enabled. I suspect this can avoid the possibility of writing back stale dirty lines. Obviously this situation should be unlikely, but seems to me another benefit.

Here is a code snippet. Disabling preemption, entering supervisor mode, and setting up the illegal-instruction handler are all elided:

Code:
        moveq   #0,d0
        moveq   #0,d1
        dc.l    0x4e7b0801        /* movec d0,vbr  */
        dc.l    0x4e7a1002        /* movec cacr,d1 */
        tst.l   d1                /* Bit 31 set? (68040+ Data Cache Enabled) */
        bpl.b   .skip             /* Skip CPUSHA if not */
        dc.w    0xf478            /* cpusha dc     */ /* 68040+ only */
.skip:  dc.l    0x4e7b0002        /* movec d0,cacr */
        dc.l    0x4e7b0808        /* movec d0,pcr  */
        rte
_illegal_insn:
        addq.l  #4,2(sp)          /* Skip the trapping 4-byte MOVEC (stacked PC is at 2(sp)) */
        rte
EDIT: This new code also assumes that CACR bit 31 implies CPUSHA DC will not trap. This seems a safe assumption, similar to how bit 15 guards CPUSHA IC in the example implementation of ClearICache in the Amiga developer manuals.
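
As a sanity check of the control flow, the `tst.l d1` / `bpl.b .skip` decision can be modelled on a host in a few lines of C. This is only a sketch of the branch logic in the snippet above: the function names are invented, and the CACR values in `check_cases` are illustrative rather than read from real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* d1 after "movec cacr,d1". d1 was preloaded with 0, so if the MOVEC
 * trapped and the handler skipped it, d1 is still 0. */
uint32_t d1_after_cacr_read(bool movec_trapped, uint32_t cacr)
{
    return movec_trapped ? 0 : cacr;
}

/* "tst.l d1 / bpl.b .skip": CPUSHA DC runs only when bit 31 is set. */
bool cpusha_runs(uint32_t d1)
{
    return (d1 & UINT32_C(0x80000000)) != 0;
}

/* The cases discussed in the thread. */
void check_cases(void)
{
    /* 68000/010: MOVEC from CACR traps, d1 stays 0 */
    assert(!cpusha_runs(d1_after_cacr_read(true, 0)));
    /* 68020/030: bit 31 is not implemented and reads as 0 */
    assert(!cpusha_runs(d1_after_cacr_read(false, UINT32_C(0x00000101))));
    /* 68040+: data cache enabled */
    assert(cpusha_runs(d1_after_cacr_read(false, UINT32_C(0x80008000))));
    /* 68040+: data cache already disabled, CPUSHA is skipped */
    assert(!cpusha_runs(d1_after_cacr_read(false, UINT32_C(0x00008000))));
}
```

The last case is the one that differs from a pure CPU-type test: an 040+ with its data cache already off skips the push as well.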

Last edited by Keir; 05 July 2020 at 15:00.
Old 05 July 2020, 14:00   #38
ross
Per aspera ad astra

 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,619
Hi Keir, this is a nice solution

Only two notes:
- it's pretty safe to drop the tst.l d1 (even if the trap is called, the addq.l #4,2(sp) cannot set bit 31, because code located above 2GB is unsupported on any Amiga)
- this does not work if the 040+ data cache was already disabled.

I'll try to think of a solution.
Old 05 July 2020, 14:07   #39
Keir
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 606
Quote:
Originally Posted by ross View Post
Hi Keir, this is a nice solution

Only two notes:
- it's pretty safe to exclude the tst.l d1 (even if the trap is called, the addq.l #4,2(sp) cannot set bit31 because >2GB code position is unsupported on any Amiga and on previous instruction the bit N is cleared)
The tst.l is needed because movec doesn't set the condition codes. The exception handler doesn't matter because rte restores SR.

Quote:
- this does not work if the 040+ data cache was already disabled.
That's a benefit, right? If the cache is disabled then there's nothing but stale dregs in the cache, and you don't want to write those back. That's my thinking, anyway...

Quote:
I'll try to think of a solution.
Old 05 July 2020, 14:18   #40
ross
Per aspera ad astra

 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,619
You are right


EDIT: this is the effect of not having a coffee after a hearty Sunday meal

Last edited by ross; 05 July 2020 at 14:23.
 

