03 May 2018, 16:53 | #1 |
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
A question to coders - 3 op instructions [fpga 68k]
I've started thinking about 68k FPGA again. As before, it will probably not result in anything, except a waste of my time of course.
Thinking about a 3 operand extension, where instructions can have two sources and one (independent) destination, leads to some practical problems and a question to anybody who knows 68k assembly (and is interested in this theoretical game of time-wasting). For anybody interested in why this is a relevant question I've tried to describe the technical problem below, but it isn't really needed to answer the question. IOW skip to the question unless you're interested in technical mumbo-jumbo.
--
The main problem with keeping the old semantics for 3 operand instructions appears when going for an out-of-order execution core. Any practical implementation has to have register renaming in order to remove false dependencies. Very simplified, register renaming means referring to a register as register[rename_table[register_number]] instead of register[register_number]. New results allocate a free physical register, leaving the old register content unchanged; this means old instructions can see the old register content and new instructions can see the new register content. This is a great part of what makes instructions able to execute out of order, with older instructions starting after newer ones have finished.
A 2 operand instruction would be converted to a 3 operand one internally:
add.b d0, d1 => _add.b d0_old, d1_old, d1_new ; _add is the internal format of the processor
And there are no great problems keeping the old semantics: the upper 24 bits of d1 are simply copied unchanged into the newly allocated d1_new register.
It isn't so easy when going to 3 operand instructions:
add.b d0, d1, d2 => _add.b d0_old, d1_old, d2_new
Keeping the old semantics here means the upper content of d2 has to be preserved. However, as written, the upper content of d2 isn't known. One way to handle this would be to expand this further, adding a read of the old d2 value:
add.b d0, d1, d2 => _add.b d0_old, d1_old, d2_old, d2_new
and copying the upper bits of d2_old to d2_new as in the 2 operand example.
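The renaming scheme described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class RenamedRegs, the helper merge_byte and the read counter are hypothetical names made up for illustration, physical registers are never freed, and nothing here reflects an actual core design. It just shows that the 2 operand add.b needs two register reads, while a 3 operand add.b with the old semantics needs three.

```python
MASK32 = 0xFFFFFFFF


class RenamedRegs:
    """Toy rename stage (hypothetical): architectural name -> physical register."""

    def __init__(self, initial, n_phys=16):
        self.phys = [0] * n_phys   # physical register file
        self.table = {}            # rename table
        self.reads = 0             # counts register read-port uses
        for i, (name, value) in enumerate(initial.items()):
            self.table[name] = i
            self.phys[i] = value & MASK32
        self.next_free = len(initial)

    def read(self, name):
        self.reads += 1
        return self.phys[self.table[name]]

    def write(self, name, value):
        # Results never overwrite in place: allocate a fresh physical register.
        self.phys[self.next_free] = value & MASK32
        self.table[name] = self.next_free
        self.next_free += 1


def merge_byte(result, dest_old):
    """Old 68k .b semantics: low 8 bits from the result, upper 24 from the old dest."""
    return (dest_old & 0xFFFFFF00) | (result & 0xFF)


regs = RenamedRegs({"d0": 0x00000001, "d1": 0x123456FF, "d2": 0xAABBCCDD})
d1_old_phys = regs.table["d1"]

# add.b d0, d1  =>  _add.b d0_old, d1_old, d1_new  (two register reads)
a, b = regs.read("d0"), regs.read("d1")
regs.write("d1", merge_byte(a + b, b))
reads_two_op = regs.reads

# add.b d0, d1, d2 with old semantics
#   =>  _add.b d0_old, d1_old, d2_old, d2_new  (three register reads)
regs.reads = 0
a, b, d2_old = regs.read("d0"), regs.read("d1"), regs.read("d2")
regs.write("d2", merge_byte(a + b, d2_old))
reads_three_op = regs.reads
```

Note that the old physical register for d1 still holds its previous value after the first add, which is what lets older instructions keep reading it.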
However, adding more sources adds complications: more register read ports, more wires, more multiplexers, a more complex instruction scheduler.
Another way would be inserting extra internal operations when needed. Reading a byte from d2_new would then work without problems (as the upper 24 bits aren't relevant), but reading a longword from it would cause the processor to add an internal operation merging the high bits of d2_old with the low byte of d2_new:
__fuse.b d2_new, d2_old, d2_now_extra_new
That, however, adds complications elsewhere in the design. Some x86 processors have used a similar design for a similar problem, though they had no other reasonable choice (ISA specific). But when creating a new extension there is another alternative: by either zeroing or sign extending the (byte/word) result, the upper bits of the old register aren't needed anymore.
--
The question is relatively simple: if the instruction set is extended to support 3 operands, is it in your opinion okay to have 3 operand instructions always zero or sign extend byte/word operations?
And as a follow-up question: if not, would it be reasonable for 3 op instructions that don't zero/sign extend to be slower? |
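To make the three alternatives concrete, here is a small Python sketch of what ends up in the 32-bit destination for a byte add. The function names are invented for this example, and the merge variant is the same computation the __fuse.b operation would perform later; only the merge needs the old destination value as an input.

```python
def merge_byte(result, dest_old):
    """Old semantics: keep the upper 24 bits of the old destination."""
    return (dest_old & 0xFFFFFF00) | (result & 0xFF)


def zero_extend_byte(result):
    """Proposed alternative 1: upper 24 bits are cleared."""
    return result & 0xFF


def sign_extend_byte(result):
    """Proposed alternative 2: upper 24 bits copy bit 7 of the result."""
    low = result & 0xFF
    return (low | 0xFFFFFF00) if (low & 0x80) else low


d0, d1, d2_old = 0x0000007F, 0x00000001, 0xAABBCCDD
total = d0 + d1                       # low byte is $80

merged = merge_byte(total, d2_old)    # needs d2_old as a third source
zeroed = zero_extend_byte(total)      # needs only d0 and d1
signed = sign_extend_byte(total)      # needs only d0 and d1
```

With these inputs the merge gives $AABBCC80, zero extension gives $00000080 and sign extension gives $FFFFFF80.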
03 May 2018, 18:07 | #2 | ||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,322
|
Quote:
Quote:
IMO the only way to implement 3-op in the 68k is to do it "silently" by decoding instruction pairs as if they were single instructions. You do not need to create new encodings at all. In addition, old programs don't need to be rewritten to benefit from this. Note that 3-operand does not occur as frequently as most people appear to believe and is probably not worth the trouble anyway. |
||
04 May 2018, 18:25 | #3 | ||||
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Quote:
Quote:
Quote:
Code:
MOVE.B D1, D0
ADD.B D2, D0
_ADD.B D1_old, D2_old, D0_new
One shouldn't discount the problem of MOVE updating condition codes either, as they have to be tracked. However, it wouldn't be a huge problem to support MOVE fusion for longword operations for a limited subset of following instructions. Quote:
As I'm not coding much nowadays and even less in 68k assembly I thought that the opinions of skilled assembly coders should influence the (vaporware) development. |
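The MOVE fusion for longword operations mentioned in this post could look roughly like the following decoder peephole, written as a Python sketch. Everything here is an assumption for illustration: the Insn and Fused tuples, the try_fuse name, the restriction to register-to-register MOVE followed by ADD, and the handling of an ADD whose source is the register the MOVE just wrote. Condition codes are not modelled; the sketch relies on ADD overwriting all of them.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    op: str     # "MOVE" or "ADD" in this sketch
    size: str   # "B", "W" or "L"
    src: str    # data register name, e.g. "D1"
    dst: str    # data register name, e.g. "D0"


class Fused(NamedTuple):
    op: str
    size: str
    src_a: str
    src_b: str
    dst: str


def try_fuse(first: Insn, second: Insn) -> Optional[Fused]:
    """Fuse MOVE.L Dx,Dy + ADD.L Dz,Dy into _ADD.L Dx,Dz,Dy, else return None."""
    if first.op != "MOVE" or second.op != "ADD":
        return None
    # Longword only: byte/word would need the old upper bits of the destination.
    if first.size != "L" or second.size != "L":
        return None
    if first.dst != second.dst:
        return None
    # If the ADD reads the register the MOVE just wrote, it sees the moved value.
    src_b = first.src if second.src == first.dst else second.src
    return Fused("_ADD", "L", first.src, src_b, second.dst)


long_pair = try_fuse(Insn("MOVE", "L", "D1", "D0"), Insn("ADD", "L", "D2", "D0"))
byte_pair = try_fuse(Insn("MOVE", "B", "D1", "D0"), Insn("ADD", "B", "D2", "D0"))
```

The byte pair is rejected on purpose, since that is exactly the sub-register case the thread is about.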
||||
04 May 2018, 21:53 | #4 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,322
|
Quote:
If you follow a MOVE by something like ADD, then the MOVE condition codes don't have to be computed at all - because ADD will change them all later. Sub-register updates can be ignored if you decide to support only longword size. Else, well, it's time to be innovative Quote:
I mean, the high part would then be unimportant out of the ALU, meaning you don't need to read D2. But upon writing, you say it's a byte, and the register file masks D2's high part out. Ok, that's just my 2 cents Quote:
Quote:
It might be better to discuss these directly. That said, perhaps it would be better to just implement the full cpu before adding anything new, it's already a daunting task by itself... |
||||
06 May 2018, 13:37 | #5 | ||||
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Quote:
Sure, just supporting longs is one option, but it doesn't feel elegant. However, requiring sign/zero extension isn't elegant either; it changes the feel of the 68k instruction set. :/ Quote:
ADD.B D0, D1
MOVE.L #$FEEDBEEF, D0
The MOVE uses the same register name (D0) as the previous instruction, but it ignores the content of D0. This is a so-called false dependency, commonly called Write After Read (WAR). With register renaming the processor handles this by letting the first instruction have a different D0 than the second. So after decoding, the internal instructions can be:
_ADD.B D0_0, D1_0, D1_1
_MOVE.L #$FEEDBEEF, D0_1
The false dependency is removed and the MOVE can execute in parallel with or before the ADD. Registers are allocated from a larger register file (e.g. x86-64 processors have >160 physical registers but only 16 general registers) and any physical register can be used for any programmer visible register. That means D0_0 is really any physical register and not the same one as D0_1, D0_2 etc.
So:
ADD.B D0, D1
becomes:
_ADD.B D0_0, D1_0, D1_1
Here the upper 24 bits can be taken from the old register D1_0 and copied unchanged to D1_1, as D1_0 is an input.
ADD.B D0, D1, D2
would be translated to:
_ADD.B D0_0, D1_0, D2_1 ; D2_1 is a previously unused physical register
Observe that the "old" D2 isn't available as an input to the instruction, so the upper bits are unknown.
So why not simply see that D2 is overwritten and map D2_1 to the same physical register as D2_0?
OR.B D2, D4
ADD.B D0, D1, D2
_OR.B D2_0, D4_0, D4_1
_ADD.B D0_0, D1_0, D2_0
Because then we'd still have a false dependency between the instructions: the ADD has to be executed after the OR. In other words, that solution is one of the ways 3 op instructions that don't extend the result can be made to run slower: by effectively "disabling" OoO execution. Quote:
Quote:
|
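The false dependency argument in the post above can be checked with a small Python sketch. It is a toy model with invented names (rename, must_wait, the in_place flag), not a description of any real scheduler: each instruction is a tuple of mnemonic, source registers, destination register and a flag saying whether the destination reuses its current physical register instead of getting a fresh one.

```python
from itertools import count


def rename(program, arch_regs):
    """Map architectural registers to physical register numbers."""
    table = {reg: i for i, reg in enumerate(arch_regs)}
    fresh = count(len(arch_regs))      # next unused physical register
    renamed = []
    for name, srcs, dst, in_place in program:
        phys_srcs = [table[s] for s in srcs]
        if not in_place:
            table[dst] = next(fresh)   # normal case: allocate a new register
        renamed.append((name, phys_srcs, table[dst]))
    return renamed


def must_wait(older, younger):
    """True if the younger instruction cannot be reordered before the older one."""
    _, o_srcs, o_dst = older
    _, y_srcs, y_dst = younger
    raw = o_dst in y_srcs    # true dependency
    war = y_dst in o_srcs    # false dependency: write after read
    waw = y_dst == o_dst     # false dependency: write after write
    return raw or war or waw


REGS = ["D0", "D1", "D2", "D4"]

# OR.B D2, D4 followed by ADD.B D0, D1, D2 with a fresh D2_1 (result extended)
fresh_dest = rename([("OR.B", ["D2", "D4"], "D4", False),
                     ("ADD.B", ["D0", "D1"], "D2", False)], REGS)

# Same pair, but the ADD writes into the existing D2_0 to keep the upper bits
reused_dest = rename([("OR.B", ["D2", "D4"], "D4", False),
                      ("ADD.B", ["D0", "D1"], "D2", True)], REGS)

independent = not must_wait(*fresh_dest)
serialized = must_wait(*reused_dest)
```

With a fresh destination the two instructions are independent; with the reused D2_0 the ADD has to wait for the OR, which is the slowdown described in the post.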
||||
06 May 2018, 17:36 | #6 | |
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
|
Quote:
I.e., add.b d0,d1,d2 will leave all 32 bits of d2 set to what d1 would have contained if the third operand hadn't been present? |
|
07 May 2018, 07:36 | #7 | ||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,322
|
Quote:
They also make computing the instruction size one step longer. How do you plan to manage this? Quote:
It seems that if you don't want to change the feel of the 68k instruction set, you will have to read the 3 registers. A few 020+ instructions already need to read more than 2 regs, btw. Or, just drop the idea of 3-op. It wouldn't make the code shorter or easier to write, two conditions that are important for 68k but often overlooked (and that Gunnar never understood!). |
||