06 August 2016, 11:09 | #21 | ||||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,332
|
Quote:
Quote:
Quote:
Quote:
Quote:
Current compilers are able to use the MOVEQ+AND trick, so there is little use for AND.L #i16. And anyway if we really want this, it takes a very small encoding space. Quote:
Is it worth creating an addressing mode that only a handful of instructions will use? If we list all instructions where the immediate mode is available, we won't get a lot of them.
Quote:
Code:
move.l d1,d3
eor.l d2,d1
and.l d0,d1
eor.l d1,d2
eor.l d3,d1

What type of algorithms? Well, aside from the classic c2p/p2c, it's for whenever you need to exchange selected bits, i.e. extract a bit field or separate bits, while keeping the old value somewhere. Many cases would go away if we had a BFEXG, though.
Quote:
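For readers who prefer C, the five-instruction sequence above can be sketched as a masked swap. This is only a model of the idea, assuming d0 holds the mask and d1/d2 hold the two values; the function name `exchange_bits` is made up for illustration, and the copy kept in d3 is left out.

```c
#include <stdint.h>

/* Exchange the bits selected by `mask` between *a and *b.
 * Mirrors: eor.l d2,d1 / and.l d0,d1 / eor.l d1,d2 / eor.l d3,d1
 * (the move.l d1,d3 copy of the old value is not modelled here). */
static void exchange_bits(uint32_t *a, uint32_t *b, uint32_t mask)
{
    uint32_t t = (*a ^ *b) & mask;  /* bits that differ inside the mask */
    *a ^= t;                        /* a receives b's masked bits */
    *b ^= t;                        /* b receives a's masked bits */
}
```

Bits outside the mask are untouched in both operands, which is why the same pattern shows up in c2p-style merges.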
|
||||||||
06 August 2016, 13:34 | #22 | |
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,790
|
Quote:
http://www.ti.com/lit/gpn/tms320c30 |
|
08 August 2016, 04:55 | #23 | ||||||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Any sample code for SELcc would be artificial as the instruction does not exist anywhere. There are 2 variations.
if (cc) var=val1

SELcc EA,d0

Code:
cmp ?
bcc .skip
move.l EA,d0
.skip:

if (cc) var=val1 else var=val2

SELcc EA,d1,d0

Code:
cmp ?
bcc .skip1
move.l EA,d0
bra .skip2
.skip1:
move.l d1,d0
.skip2:

Quote:
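Since SELcc does not exist anywhere, the two variations above can only be modelled. The sketch below is one possible reading in C, with hypothetical function names `selcc2` and `selcc3`; the second one uses a mask instead of a branch to hint at how the selection could be done without a jump.

```c
#include <stdbool.h>
#include <stdint.h>

/* SELcc EA,d0 : if (cc) d0 = EA, otherwise d0 keeps its old value. */
static int32_t selcc2(bool cc, int32_t ea, int32_t d0)
{
    return cc ? ea : d0;
}

/* SELcc EA,d1,d0 : d0 = cc ? EA : d1, written without a branch. */
static int32_t selcc3(bool cc, int32_t ea, int32_t d1)
{
    uint32_t mask = cc ? 0xFFFFFFFFu : 0u;  /* all ones when cc holds */
    return (int32_t)(((uint32_t)ea & mask) | ((uint32_t)d1 & ~mask));
}
```

Both functions return the new value of d0 rather than modifying a register, which is purely a convenience of the model.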
Quote:
Quote:
Quote:
Quote:
Quote:
EB=0 IB=0 BFTST
EB=1 IB=0 BFEXTS/U
EB=0 IB=1 BFINS
EB=1 IB=1 BFEXG

Of course, you may want BFEXG to swap bits between 2 registers with the same offset and width. I believe this kind of BFEXG would be less general purpose and not work as well on bit streams. Any bit offset >31 would be useless without being able to specify 2 bit fields, which is too expensive.

Last edited by matthey; 08 August 2016 at 05:19. |
||||||
08 August 2016, 06:26 | #24 | |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
|
|
08 August 2016, 09:43 | #25 | |||||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,332
|
Quote:
Quote:
Code size is exactly the same (4+ea). Quote:
However, if D1 needs to be a full EA then you're caught. Same for D0. Immediates (which are the most common case) can't be used (well, not for both operands). This severely limits the number of potential cases. I'm afraid that it'll end up with the coder saying "oh no i can't use it" in 90% of cases. This is why i suggest having a look at real-life code to find use cases for it. I considered this kind of instruction long ago, had a look and didn't find any, but you might be luckier. IOW, studying complete routines would bring more info.
Quote:
Quote:
REVL maybe ? Quote:
Quote:
In addition, the short immediate being an addressing mode, the target must be a register ('xcept for move). It means you can't use short immediates for memory. It would be kinda strange to be able to do ADD.L #$1234.W,D0 and not ADD.L #$1234.W,(A0), where we can do ADD.L #$1234,(A0). No good for orthogonality - if you care about that. Quote:
Actually it could have been reduced by half if using the trick to use '1111' as register for operations that don't use it (bitfields can be useful for An but not A7). This means we would only have 4 opcodes : 00 BFEXTU/BFTST, 01 BFEXTS/BFCHG, 10 BFINS/BFCLR, 11 BFFFO/BFSET. Version without the register is selected if that register is A7. Then we could add new bitfield ops : BFREV, BFEXG, BFCMP, maybe even BFASL, BFLSR. Or a simple BFEXT which extracts the field without extending it, keeping the other bits in the target. But the BF are complicated enough the way they are (for HW), so i believe we're ok with what we already have... Quote:
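The '1111' register trick described above could be decoded roughly as follows. This is only a sketch of the idea as stated in the post: the field widths (a 2-bit opcode and a 4-bit register field where 15 means A7) and the function name `bf_decode` are assumptions, not an actual 68k encoding.

```c
/* Hypothetical decode: 2-bit opcode, 4-bit register field.
 * Register 15 (A7, '1111') selects the variant that has no register operand. */
static const char *bf_decode(unsigned op2, unsigned reg4)
{
    static const char *const with_reg[4] = { "BFEXTU", "BFEXTS", "BFINS", "BFFFO" };
    static const char *const no_reg[4]   = { "BFTST",  "BFCHG",  "BFCLR", "BFSET" };

    op2  &= 3u;    /* keep only the 2 opcode bits */
    reg4 &= 15u;   /* keep only the 4 register bits */
    return (reg4 == 15u) ? no_reg[op2] : with_reg[op2];
}
```

With this layout the eight existing bitfield operations share four opcodes, which is where the "reduced by half" figure comes from.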
That would be called hardware autovectorization. Then it would be potentially beneficial to every program, not just ones that make the effort to use those filthy vector extensions. And the next gen could have better performance without rewriting any program. Just dreaming... |
|||||||||
08 August 2016, 21:24 | #26 | |||
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
Quote:
Quote:
Code:
jmp ([d16,An])
jsr ([d16,An])

True but they result in a bigger encoding (not taking any other supporting code into account). In my own code i just jump into a table of branches. I know it is not top for performance but it is great for flexibility, and sometimes i just really want to be able to do that sort of thing.
Quote:
As for the whole branch prediction subject, i did wonder if different condition codes have different branch frequencies. Does a BNE get taken more often than a BEQ? Or a BVS? |
|||
08 August 2016, 22:23 | #27 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,332
|
Quote:
False conditions are probably taken more frequently than true conditions on average, but i doubt the difference is large. |
|
09 August 2016, 00:18 | #28 | ||||||||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
Quote:
Quote:
Quote:
The assembler peephole optimizer under most compilers will only do optimizations where the cc flags are set the same. The vbcc 68k backend suffers because Volker assumed the assembler could do many peephole optimizations which ended up not being possible. The 68k setting the cc all the time is good for code density but makes peephole optimizing and instruction scheduling more challenging. Quote:
Quote:
Quote:
Quote:
|
||||||||
09 August 2016, 01:10 | #29 | ||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
(bd16,PC,Rn.Size*Scale) range is +32767 to -32768 bytes
(bd20,PC,Rn.Size*Scale) range is +524287 to -524288 bytes
(bd32,PC,Rn.Size*Scale) range is +2147483647 to -2147483648 bytes

(bd20,PC,Rn.Size*Scale) would allow about a 1MB all PC relative executable compared to about a 65kB all PC relative executable with (bd16,PC,Rn.Size*Scale) while giving the same size instruction.

True but they result in a bigger encoding (not taking any other supporting code into account).

Code:
move.l (d16,An),An ; 4 bytes
jmp (An) ; 2 bytes

Code:
jmp ([d16,An]) ; 6 bytes

Quote:
Always not taken ~40% correct
Always taken ~60% correct
BTFN ~65% correct
Semi-Static hint bit with profiling ~75% correct |
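Going back to the base displacement ranges listed in this post, they follow directly from the width of a signed two's-complement field. A small sketch, with made-up helper names `disp_min` and `disp_max`, assuming the width is between 1 and 63 bits:

```c
#include <stdint.h>

/* Smallest value of a signed two's-complement field of `bits` bits (1..63). */
static int64_t disp_min(unsigned bits)
{
    return -((int64_t)1 << (bits - 1));
}

/* Largest value of a signed two's-complement field of `bits` bits (1..63). */
static int64_t disp_max(unsigned bits)
{
    return ((int64_t)1 << (bits - 1)) - 1;
}
```

For 20 bits the reachable span is disp_max(20) - disp_min(20) + 1 = 1048576 bytes, which is the "about 1MB" figure; for 16 bits it is 65536 bytes.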
||
09 August 2016, 09:30 | #30 | |||||||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,332
|
Quote:
Quote:
Quote:
Quote:
Not necessarily. Sounds like BTST but doesn't operate on a single bit. Quote:
Quote:
Quote:
But how many times is AND.L of a small constant needed ? Do you have some statistics on this ? Quote:
Quote:
Quote:
Quote:
Current SIMD needs extra-large registers which can't be fed from the DCache (too large) and have problems with memory latency, nullifying a large part of their potential. Furthermore, they're used on fixed-size data, which doesn't match real-life needs where data isn't necessarily a nice multiple of your SIMD size. They rely on either handwritten asm (a dead end as it makes asm writing more complicated), cumbersome vector datatypes (another dead end as the casual programmer won't use them), or autovectorization features of the compiler (which can only do trivial cases, when it can do anything at all). I don't know exactly how some DSPs' hardware loops work, but they look and feel like SIMD without extra instructions.

Does that mean that the hint bit only provides a 10% gain? |
|||||||||||
09 August 2016, 11:05 | #31 | ||||
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
Quote:
If you load an executable into an AMOS memory bank, you only get the first code hunk. It doesn't process the RELOC table. And even if it did, it throws it away, so there is no way to re-RELOC it when you save your program and load it in again. Also, when writing AMOS extensions, the compiler will pull only the extension functions that are actually used out of the executable and concatenate them, with no RELOC data; you use special macros to define branches from one function to another. Actually it is horrible, because it just goes through the file looking for some specific codes, so some of your data might accidentally match! But this is what i've got to work with...
Quote:
Quote:
Quote:
|
||||
09 August 2016, 11:28 | #32 | |||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,332
|
Quote:
Quote:
What about starting by using word size offsets instead ? Quote:
This won't help much though, as BEQ is the most frequently occurring branch type whereas BVS is relatively rare. |
|||
09 August 2016, 11:36 | #33 | ||
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
Quote:
As for extensions, it might be worthwhile to create OS libraries with all the functionality and then just have a thin wrapper as the AMOS extension. Quote:
|
||
09 August 2016, 11:49 | #34 | ||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,332
|
Quote:
Quote:
Trivial yes, but for what gain ? BVS is one branch out of something like 2000. |
||
09 August 2016, 12:21 | #35 | |||
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
Quote:
But yeah. Roll on AMOS 3 which isn't terrible in myriad ways? I dunno. The personal answer must be "stop using it" but it's convenient as a development environment. Quote:
Quote:
But anyway, supposing we had an instruction that did this in 4 bytes:
Code:
add.w d16(An),An
jmp (An)

Last edited by Mrs Beanbag; 09 August 2016 at 12:28. |
|||
09 August 2016, 17:33 | #36 | ||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,332
|
Quote:
Quote:
Did you mean, rather:
Code:
add.w (An,Dn.w*2),An
jmp (An)

Code:
lea table(pc),An
add.w (An,Dn.w*2),An
jmp (An)

Code:
jmpt table(pc),Dn |
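The address arithmetic behind the lea / add.w / jmp sequence above can be modelled in C as below. It assumes a table of signed 16-bit offsets that are relative to the table's own address; `jmpt_target` is a hypothetical name, and the proposed jmpt instruction is presumed to compute the same target in one step.

```c
#include <stddef.h>
#include <stdint.h>

/* Target address for: lea table(pc),An / add.w (An,Dn.w*2),An / jmp (An).
 * Each entry is a signed word; ADDA.W sign-extends it before adding to An. */
static uint32_t jmpt_target(uint32_t table_addr, const int16_t *table, size_t index)
{
    int32_t offset = table[index];           /* sign-extend word to long */
    return table_addr + (uint32_t)offset;    /* 32-bit wrap-around, like An */
}
```

Because the entries are offsets rather than absolute addresses, such a table needs no relocation, at the cost of limiting targets to within 32 KB of the table.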
||
09 August 2016, 19:30 | #37 | ||
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
Quote:
Quote:
Code:
move.l (An),Am
add.w d16(Am),Am
jsr (Am)

Also if inheritance is used, the relevant function might not actually be in the same class or module as the vtable. Where a base class function is not overridden, the vtable for the derived class might point directly to the base class functions. So relative offsets may not be the best choice in this scenario. Although if these kinds of objects are to be loaded and linked dynamically, i suppose more than just the OS reloc tables will be needed.

Also i have problems in my own code, aside from AMOS's foolishness, i compress my executables in my own format, so how to do relocs then? I can process the reloc tables myself into whatever format i can use, i know it's not a hugely complex procedure being as it's just going through a list of offsets and adding on the base address, but so far, i have not bothered to do it since i haven't needed to. |
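The relocation procedure described above ("going through a list of offsets and adding on the base address") might look like this in C. It is a simplified single-hunk model, not the actual AmigaOS hunk format: the flat offset list, the big-endian longword handling and the name `apply_relocs` are all assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Add `base` to the big-endian 32-bit value stored at each listed byte offset.
 * Offsets that would run past the end of the image are skipped. */
static void apply_relocs(uint8_t *image, size_t image_size,
                         const uint32_t *offsets, size_t count, uint32_t base)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t off = offsets[i];
        if (image_size < 4 || off > image_size - 4)
            continue;                         /* out of range, ignore */

        uint8_t *p = image + off;
        uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
        v += base;                            /* relocate the address */
        p[0] = (uint8_t)(v >> 24);
        p[1] = (uint8_t)(v >> 16);
        p[2] = (uint8_t)(v >> 8);
        p[3] = (uint8_t)v;
    }
}
```

A custom compressed format would only need to keep the offset list alongside the code and run a loop like this after decompression.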
||
09 August 2016, 22:56 | #38 | ||||||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
Maybe. Consistent and predictable performance is more important for embedded processors where most modern processors have good peak performance but can spend many cycles delaying without executing much code at all. A branch hint bit may help smooth out performance and save cycles for repetitive and predictable tasks common in the embedded market. Is an enhanced 68k CPU more likely to be used for embedded or desktop purposes? Quote:
I've seen worse.
Quote:
Quote:
My statistics were more like a random sampling than a full statistical study. I recall cases where I saw 5%+ code density improvements with vbcc generated code with a combination of OP.L #data.w,Dn and MVS/MVZ where SAS/C generated code sometimes didn't even have a 1% improvement. I was searching for particular instruction pairs so it is likely that more gains would be possible with a compiler and peephole assembler aware of the new functionality. Quote:
Quote:
There are potentially more gains than the branch prediction success gains. Code can be better organized to fall through the branch, which increases the instructions between branches, makes better use of the ICache and improves code density in some cases. This is what happens when BTFN is the correct prediction, but this is only ~65% correct. |
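BTFN (backward taken, forward not taken) is simple enough to state in a few lines. The sketch below is a toy model for checking the rule against a recorded trace; the function names and the trace layout (one displacement and one outcome per branch) are assumptions, and it says nothing about the percentages quoted in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Static BTFN rule: a branch with a negative displacement (a loop back edge)
 * is predicted taken, a forward branch is predicted not taken. */
static bool btfn_predict_taken(int32_t displacement)
{
    return displacement < 0;
}

/* Count how many branches in a trace the BTFN rule predicts correctly. */
static size_t btfn_correct(const int32_t *disp, const bool *taken, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (btfn_predict_taken(disp[i]) == taken[i])
            hits++;
    }
    return hits;
}
```

A hint bit set from profiling would replace `btfn_predict_taken` with a per-branch lookup, which is where the extra accuracy would have to come from.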
||||||
09 August 2016, 23:20 | #39 |
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
andi.l #$ff,Dn can be used to extend an unsigned byte to longword size. then again a special instruction for that could be better. ori.l #data.w,Dn, on the other hand, would be perfectly pointless.
add/sub.l #data.w,Dn would be very useful, however. I often need to do this. |
10 August 2016, 01:11 | #40 | |||||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
Quote:
Quote:
Quote:
The key here is that the cc is set the same allowing for a "safe" peephole optimization which can be used for compilers. This is true of the addressing mode I proposed which also sets the cc the same way. The cc of AND is commonly used. In fact, it is not unusual for the data to be thrown away. We could have an AND Dn,#data which would preserve the register but set the cc. It would probably be more common than the BTST Dn,#data which meynaf would like to extend to other sizes. Quote:
The new OP.L #data.w,Dn addressing mode would work with the following:
ADD.L, AND.L, CMP.L, DIVx.L, MOVE.L, MULx.L, OR.L, SELcc, SUB.L
It would not work with any of the OPI.L encodings but OPI.L #data,Dn could be converted to OP.L #data.w,Dn. |
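To compare the options discussed in this thread for AND.L with a constant, here is a rough size model in C. The 6-byte ANDI.L and the 2+2 byte MOVEQ+AND figures follow from the standard 68k encodings; the 4-byte size for the proposed AND.L #data.w,Dn is an assumption (one opcode word plus one sign-extended immediate word), and the function names are invented for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* MOVEQ sign-extends an 8-bit immediate to 32 bits. */
static bool fits_moveq(int32_t v) { return v >= -128 && v <= 127; }

/* The proposed #data.w form would sign-extend a 16-bit immediate. */
static bool fits_i16(int32_t v) { return v >= -32768 && v <= 32767; }

/* Estimated bytes needed to AND a 32-bit constant into Dn. */
static unsigned and_l_imm_size(int32_t v, bool have_short_imm)
{
    if (have_short_imm && fits_i16(v))
        return 4;   /* assumed: and.l #v.w,Dn = opcode + 16-bit extension */
    if (fits_moveq(v))
        return 4;   /* moveq #v,Dm (2) + and.l Dm,Dn (2), costs a scratch register */
    return 6;       /* andi.l #v,Dn = opcode + 32-bit immediate */
}
```

Note that #$ff does not fit MOVEQ, since it would sign-extend to $FFFFFFFF, so the andi.l #$ff,Dn case mentioned earlier only shrinks with the short immediate form.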
|||||