Enhanced 68k ISA - Page 3

Mrs Beanbag · 10 August 2016, 11:36

Quote:

Originally Posted by matthey

You would need a special profiler for this. I'm not sure the right tool exists for the 68k. Motorola may have had tools like this but we are unlikely to ever see them.

hmm, it should be easy enough to build these kinds of tools into an emulator though, right?

meynaf · 10 August 2016, 14:01

Quote:

Originally Posted by matthey

Is an enhanced 68k CPU more likely to be used for embedded or desktop purposes?

It's not likely to be used anywhere at all, actually

Quote:

Originally Posted by matthey

Why not always append the .L and have REV.L? I really don't know what kind of data I'm reversing when I see REVL or REV.L though. BREV is better for me. If BREV (Bit Reverse) sounds like one bit to you then shouldn't BITREV be BITSREV and BYTEREV be BYTESREV as well?

Oh, damn it to hell. Let's just keep BITREV/BYTEREV and that's all
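
For reference, a minimal C sketch of what the two names refer to, assuming they behave like the ColdFire instructions of the same names (BYTEREV reverses the four bytes of a 32-bit register, BITREV reverses all 32 bits). The function names `byterev32` and `bitrev32` are made up for this illustration.

```c
#include <stdint.h>

/* BYTEREV: reverse the order of the four bytes in a 32-bit value. */
uint32_t byterev32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* BITREV: reverse the order of all 32 bits (bit 0 <-> bit 31, and so on). */
uint32_t bitrev32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* swap adjacent bits  */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* swap bit pairs      */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* swap nibbles        */
    return byterev32(x);                                      /* then swap the bytes */
}
```

Whatever the mnemonic ends up being, the operand is always the whole longword, which is why a size suffix adds little here.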

Quote:

Originally Posted by matthey

ARM's approach of declaring some encodings as undefined/reserved but not trapping is probably the most efficient. Trapping has its advantages and is more 68k like. It should be enough to be clear that other variations of the addressing mode are undefined/reserved whether they are trapped or not.

ARM's approach is quite chaotic. Features are there, or maybe not there, depending on the model. If on top of this they don't trap, it's gonna be very bad for them. They are lucky there's not much competition in their market segment and that their architecture is relatively young.

Quote:

Originally Posted by matthey

My statistics were more like a random sampling than a full statistical study. I recall cases where I saw 5%+ code density improvements with vbcc generated code with a combination of OP.L #data.w,Dn and MVS/MVZ where SAS/C generated code sometimes didn't even have a 1% improvement. I was searching for particular instruction pairs so it is likely that more gains would be possible with a compiler and peephole assembler aware of the new functionality.

If you counted MVS/MVZ together with the short immediates, it's not very useful. We all know MVS/MVZ would bring extra density by themselves.

Quote:

Originally Posted by matthey

BFFFO is probably the most powerful BF instruction though. It removes a whole loop (BFCNT would have similar complexity and advantages). It would be possible to do a BFEXT first but it would make TLSFMem type memory handling less optimal. BFFFO is possible in 1-2 cycles which allows dynamic memory allocations to be 5x faster on the Apollo-core. This alone would be worth the effort to implement BFFFO if it could be implemented in the OS which is blocked in the case of the AmigaOS. AROS may seize the opportunity as AmigaOS dies. We need 3 incompatible flavors of AmigaOS after all.

Do you have many code examples where BFFFO isn't applied to the full 32 bits? Personally i have none.

Quote:

Originally Posted by matthey

DSPs are highly tuned, specialized and difficult to use also. A general purpose superscalar processor with some DSP like instructions can usually keep up but with more cost in resources.

DSPs aren't that different, are they? At least to me they don't look very special. There's no reason something that works in a small DSP can't work in a fully featured cpu.

Quote:

Originally Posted by matthey

An SIMD is for when you go big into data processing and needs to process a lot of data at once to be worthwhile. It makes sense not to cache big data streams in most cases.

Current cpus have megabytes of cache, so it's better to use that rather than put up with memory latencies.

Quote:

Originally Posted by matthey

There are potentially more gains than the branch prediction success gains. Code can be better organized to fall through the branch, which increases the number of instructions between branches, makes better use of the ICache and improves code density in some cases. This is what happens when BTFN (backward taken, forward not taken) is the correct prediction, but that is only the case ~65% of the time.

The branch hint bit won't help organize the code. The fact that moving code around has drawbacks doesn't make the hint bit more efficient.

Mrs Beanbag · 10 August 2016, 20:55

Quote:

Originally Posted by meynaf

It's not likely to be used anywhere at all, actually

i was thinking the same, but didn't want to say it

pandy71 · 10 August 2016, 23:53

Quote:

Originally Posted by matthey

Hardware repeat/loop registers are good for performance because the register can't be changed in the loop unlike a general purpose (GP) register. However, they are generally less flexible and the register lost may be better used as GP. Loops not of the decrement and branch type are better off with another GP register. The 68k already has a DBcc instruction but it is challenging for performance because the loop register can change inside the loop. There are several possibilities for dealing with this. Maybe the now unused lowest bit in the displacement could be set which would tell the CPU that the loop counter is not changed in the loop and optimizations can be made (perhaps the CPU itself could also set the bit after using loops which don't change the counter). With the 68k we have the cards we were dealt. We also need good compatibility. I would rather focus on improving the performance of what we have before bolting on a bunch of foreign loop instructions and making processors deal with too many loop variations.

My point was that such a feature, together with for example FMADD, can be used to perform common DSP tasks very fast - https://en.wikipedia.org/wiki/Multip...ly.E2.80.93add
In fact this is one of the most important instructions (a common operation in the DSP world) - I hope this can open up more advanced DSP possibilities for developers... Adding circular buffer addressing and reverse order addressing to this may remove the need for an external DSP.
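
To make the DSP case concrete, here is a minimal C sketch of an FIR filter step, the textbook use of multiply-accumulate with a circular buffer. It uses 16-bit integer samples rather than floating point FMADD, and the names `fir_state`, `fir_step` and `FIR_TAPS` are invented for the illustration. On a DSP the loop counter, the index wrap-around and the multiply-add in the inner loop would each be handled by hardware.

```c
#include <stdint.h>

#define FIR_TAPS 4u                 /* power of two, so the index wraps with a mask */

typedef struct {
    int16_t  history[FIR_TAPS];     /* circular buffer of the most recent samples */
    unsigned head;                  /* index of the newest sample */
} fir_state;

/* Push one sample into the circular buffer and return the filter output. */
int32_t fir_step(fir_state *s, const int16_t coeff[FIR_TAPS], int16_t sample)
{
    int32_t  acc = 0;
    unsigned idx;
    unsigned k;

    s->head = (s->head + 1u) & (FIR_TAPS - 1u);       /* circular increment */
    s->history[s->head] = sample;

    idx = s->head;
    for (k = 0; k < FIR_TAPS; k++) {
        acc += (int32_t)coeff[k] * s->history[idx];   /* multiply-accumulate */
        idx = (idx - 1u) & (FIR_TAPS - 1u);           /* step back in time, wrapping */
    }
    return acc;
}
```

With coefficients {1, 2, 3, 4} and an empty history, feeding 10, 20, 30, 40 gives 10, 40, 100, 200.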

Mrs Beanbag · 11 August 2016, 12:53

FMA is useful in a whole range of contexts other than DSP, it can be used in 3D graphics to do all sorts of vector arithmetic: dot products, matrix multiplication..

pandy71 · 11 August 2016, 16:36

Quote:

Originally Posted by Mrs Beanbag

FMA is useful in a whole range of contexts other than DSP, it can be used in 3D graphics to do all sorts of vector arithmetic: dot products, matrix multiplication..

But all this is DSP...

I think that a few such instructions together with repeat instruction(s) may be highly beneficial where hardware functionality is missing, i.e. virtualizing hardware in software but with efficient DMA-like behaviour - a substitute for dedicated hardware.

Mrs Beanbag · 11 August 2016, 19:21

DSP = Digital Signal Processing ?

anyway i kind of feel like this sort of stream processing could be done better off-chip, like some kind of blitter.

matthey · 11 August 2016, 20:22

Quote:

Originally Posted by meynaf

It's not likely to be used anywhere at all, actually

Unfortunately.

Quote:

Originally Posted by meynaf

Oh, damn it to hell. Let's just keep BITREV/BYTEREV and that's all

Names are easy to change so no worries. We should have a larger group to decide names by poll if necessary.

Quote:

Originally Posted by meynaf

ARM's approach is quite chaotic. Features are there, or maybe not there, depending on the model. If on top of this they don't trap, it's gonna be very bad for them. They are lucky there's not much competition in their market segment and that their architecture is relatively young.

ARM's approach is fast and it works even if it is crude. ARM's low end processors are very efficient and difficult to compete with. They made a name for themselves in power efficiency. Performance processors with power efficiency are another matter but they still have the name and sometimes a name is still difficult to compete with. The ARM architecture is not really that young as it started in the 1980s.

Quote:

Originally Posted by meynaf

If you counted MVS/MVZ together with the short immediates, it's not very useful. We all know MVS/MVZ would bring extra density by themselves.

Yes. I should have separated out the results but at the time I wanted to get an idea of how much overall code density improvement there would be by enhancements.

Quote:

Originally Posted by meynaf

Do you have many code examples where BFFFO isn't applied to the full 32 bits? Personally i have none.

TLSFMem and my builtin.lib for .w size of clz().

Quote:

Originally Posted by meynaf

DSPs aren't that different, are they? At least to me they don't look very special. There's no reason something that works in a small DSP can't work in a fully featured cpu.

Some DSPs are more specialized and some are more general purpose. Usually the more specialized ones are more difficult to program but cheaper as they can do more with less. There is not a big difference between the more general purpose DSPs and a general purpose CPU. Some general purpose processors have added cheap DSP extensions like ARM. Later they added SIMD varieties which make good high end DSPs.

Quote:

Originally Posted by meynaf

Current cpus have megabytes of cache, so it's better to use that rather than put up with memory latencies.

Prefetch is good but it is nice not to kick out all the variable DCache when loading large streams.

Quote:

Originally Posted by meynaf

The branch hint bit won't help organize the code. The fact that moving code around has drawbacks doesn't make the hint bit more efficient.

A hint bit can help organize the code because the branch logic can be reversed and the fall through case used if it is known. It can result in more readable code while optimizing branches. I did some branch optimization of code on the 68060 which looked like spaghetti while being larger but tested as the fastest.

The most important aspect of the hint bit is that it is practically free. Also, nobody has to use it unless they want to.

Mrs Beanbag · 11 August 2016, 20:28

Quote:

Originally Posted by matthey

The most important aspect of the hint bit is that it is practically free. Also, nobody has to use it unless they want to.

it uses a bit which could be used instead to encode another whole set of instructions... but never mind that

All the condition codes also have an "opposite" condition code, don't they? GE <> LT, CC <> CS &c... so if you want a branch "reversed" without having to change the actual direction of the branch and the resultant spaghetti, you can just use the opposite condition code instead, right? I don't see the problem here. I do see the problem of compilers not being able to tell at compile time whether a branch is likely to be taken or not... whichever technique is used i suspect it is only of use to ASM coders.
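
A minimal C sketch of that pairing, assuming the standard 68k 4-bit condition field: each condition and its opposite differ only in bit 0, so reversing a test is a single XOR. The names `cc_true` and `cc_opposite` are made up for the example. In Bcc itself the first pair (0000/0001) encodes BRA/BSR rather than T/F, so the trick applies to codes 2 to 15.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as laid out in the 68k condition code register. */
enum { CCR_C = 1, CCR_V = 2, CCR_Z = 4, CCR_N = 8 };

/* Evaluate a 4-bit 68k condition field against the CCR flags. */
bool cc_true(unsigned cc, unsigned ccr)
{
    bool c = ccr & CCR_C, v = ccr & CCR_V, z = ccr & CCR_Z, n = ccr & CCR_N;
    bool r;

    switch ((cc >> 1) & 7u) {            /* even code of each pair */
    case 0:  r = true;             break; /* T  / F  */
    case 1:  r = !c && !z;         break; /* HI / LS */
    case 2:  r = !c;               break; /* CC / CS */
    case 3:  r = !z;               break; /* NE / EQ */
    case 4:  r = !v;               break; /* VC / VS */
    case 5:  r = !n;               break; /* PL / MI */
    case 6:  r = (n == v);         break; /* GE / LT */
    default: r = (n == v) && !z;   break; /* GT / LE */
    }
    return (cc & 1u) ? !r : r;           /* odd codes are the negated test */
}

/* The "opposite" condition is the same code with bit 0 flipped. */
unsigned cc_opposite(unsigned cc)
{
    return cc ^ 1u;
}
```

So GE (1100) becomes LT (1101) and CC (0100) becomes CS (0101) without touching the branch displacement.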

meynaf · 11 August 2016, 21:03

Quote:

Originally Posted by matthey

ARM's approach is fast and it works even if it is crude. ARM's low end processors are very efficient and difficult to compete with. They made a name for themselves in power efficiency. Performance processors with power efficiency are another matter but they still have the name and sometimes a name is still difficult to compete with. The ARM architecture is not really that young as it started in the 1980s.

Let's see how it turns out in the long term. ARM is young in the sense that it doesn't have a huge history behind it. I don't think having a variable instruction set like this is good for compiler support, and that's something it'll pay for later.

Quote:

Originally Posted by matthey

Yes. I should have separated out the results but at the time I wanted to get an idea of how much overall code density improvement there would be by enhancements.

Now that you have the rough idea, it may be time to separate the results

Quote:

Originally Posted by matthey

TLSFMem and my builtin.lib for .w size of clz().

May i see the actual code ?

Quote:

Originally Posted by matthey

Prefetch is good but it is nice not to kick out all the variable DCache when loading large streams.

You have to have the data ready when it's needed. If you don't prefetch in some way it won't be. If you have to transfer that data back to normal registers because some op isn't supported, it'll go through memory again and this had better be cache.
Another problem is that the memory must be able to fulfill the bandwidth requirements. If some work is done at copymem speeds, the number of inner instructions doesn't count for much.

Quote:

Originally Posted by matthey

A hint bit can help organize the code because the branch logic can be reversed and the fall through case used if it is known. It can result in more readable code while optimizing branches. I did some branch optimization of code on the 68060 which looked like spaghetti while being larger but tested as the fastest.

You can reorganize the code, reverse the branch logic, and whatever, without any hint bit at all.

Quote:

Originally Posted by matthey

The most important aspect of the hint bit is that it is practically free. Also, nobody has to use it unless they want to.

I wouldn't use it as an asm programmer and i doubt any compiler will.

matthey · 11 August 2016, 21:43

Quote:

Originally Posted by pandy71

My point was that such a feature, together with for example FMADD, can be used to perform common DSP tasks very fast - https://en.wikipedia.org/wiki/Multip...ly.E2.80.93add
In fact this is one of the most important instructions (a common operation in the DSP world) - I hope this can open up more advanced DSP possibilities for developers... Adding circular buffer addressing and reverse order addressing to this may remove the need for an external DSP.

I'm not so sure repeat instructions would be any faster. Primitive processor designs may be able to go into a simple loop mode but more advanced designs have more room for loop optimizations.

Yes, I had added FMADD/FMSUB evaluation instructions to the ISA worded vaguely. The purpose of FMA is clear enough for me but the usefulness on the 68k FPU is in question. The purpose of FMA is to get better accuracy faster for some algorithms. The 68k FPU calculates in extended precision which means calculations should have more accuracy than double precision FMA gives. It is faster to do the FMA intermediate round though. There is software which uses FMA and expects FMA so it would be nice to give them FMA. Implementing FMA in extended precision would give more accuracy yet for the same algorithms. It may shorten some floating point math support algorithms. There is a cost to FMA though. It needs 3 read ports and 1 write port which is more expensive than most instructions and more encoding space than most instructions. I'm not sure the best way to implement or use FMA on a 68k FPU and the FMSUB may be unnecessary. I updated the ISA on the first post to specify an IEEE 754 fused multiply-add for FMADD/FMSUB but it is still for evaluation.

I have FDIM, FSIGN, FMIN and FMAX documented as C99 function compatible hardware instructions which I believe would be good for DSP, 3D and general purpose uses. They are simple, common enough for general purpose use, easy to use and make compiler support easier.

Quote:

Originally Posted by Mrs Beanbag

FMA is useful in a whole range of contexts other than DSP, it can be used in 3D graphics to do all sorts of vector arithmetic: dot products, matrix multiplication..

Yes. Much more useful than just for DSP. They can also be used to improve the performance a little for Multiply+Add where maximum precision is not needed. These are common instruction pairs found in many places.

Quote:

Originally Posted by Mrs Beanbag

it uses a bit which could be used instead to encode another whole set of instructions... but never mind that

That would be dirty.

Quote:

Originally Posted by Mrs Beanbag

All the condition codes also have an "opposite" condition code, don't they? GE <> LT, CC <> CS &c... so if you want a branch "reversed" without having to change the actual direction of the branch and the resultant spaghetti, you can just use the opposite condition code instead, right? I don't see the problem here. I do see the problem of compilers not being able to tell at compile time whether a branch is likely to be taken or not... whichever technique is used i suspect it is only of use to ASM coders.

Yes, branches can be and are turned around like this all the time (it does not work for floating point branches though). Compiler generated code would not be as pretty with one RTS at the bottom of functions if compilers didn't do this. The problem is, the compiler generated code is not branch optimized or it would often be ugly. Compilers usually don't have the information to branch optimize though. There are intrinsics which can tell the compiler, like GCC's __builtin_expect(), and profiling information can be provided (GCC's -fprofile-use and -fbranch-probabilities). Compiler generated code would generally keep the same flow as now which is optimal for fall through but generate hint bits as needed to improve branch prediction when the information is available.
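
As an illustration of the intrinsic route, a minimal C sketch using `__builtin_expect()`. The `LIKELY`/`UNLIKELY` macros and the function `sum_nonnegative` are invented for the example, and whether a 68k backend would turn the expectation into a hint bit is an assumption; today GCC only uses it for block layout and its own branch probabilities.

```c
#include <stddef.h>

/* GCC and Clang provide __builtin_expect; elsewhere the macros do nothing. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sum the non-negative entries; negative values are assumed to be rare. */
long sum_nonnegative(const int *v, size_t n)
{
    long   total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0))
            continue;            /* rare path: the compiler may move it out of line */
        total += v[i];           /* common path: kept as the fall through */
    }
    return total;
}
```

The source keeps its natural shape and the annotation only tells the compiler which way the branch usually goes.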

P.S. I added 2 new evaluation addressing modes which give (d32,PC) and (d24,PC,Rn.Size*Scale). The encodings are a word shorter than the Full Extension Word encodings and should be significantly faster to decode. I did not add the new addressing modes to any instructions in the documentation nor did I give any info that these should be preferred where possible. I updated the ISA docs in the first post of this thread.

meynaf · 11 August 2016, 22:22

Quote:

Originally Posted by matthey

That would be dirty.

Do you really care about this ?

Mrs Beanbag · 11 August 2016, 22:57

i'll be honest i never really liked the Full Extension Word addressing modes anyway, you need to read the first extension word to know if it is Full or Brief, so you need to look in two or even three different words to know the full length of the instruction. Not very nice. Oh well.

Any (d32,An) and (d24,An,Rn.Size*Scale) ?

matthey · 12 August 2016, 01:28

Quote:

Originally Posted by meynaf

Let's see how it turns out in the long term. ARM is young in the sense that it doesn't have a huge history behind it. I don't think having a variable instruction set like this is good for compiler support, and that's something it'll pay for later.

Yea, ARM has too many variants and so does x86/x86_64. We should have a CPUID instruction or register(s) but I hope it never becomes like them. Standardization is good and allows lower spec hardware to compete by software being closer to the hardware.

Quote:

Originally Posted by meynaf

May i see the actual code ?

TLSFMem you would have to disassemble but it uses many different variations of BFFFO. My builtin.lib use is rather simple.

Code:

clz16:
   bfffo d0{16:16},d0   ; scan the low word of d0 (field offset 16, width 16) for its first set bit

It is just like a FF1.W D0. Sometimes those .w and .b sizes are useful.

Quote:

Originally Posted by meynaf

You have to have the data ready when it's needed. If you don't prefetch in some way it won't be. If you have to transfer that data back to normal registers because some op isn't supported, it'll go through memory again and this had better be cache.
Another problem is that the memory must be able to fulfill the bandwidth requirements. If some work is done at copymem speeds, the number of inner instructions doesn't count for much.

It depends on the workload and type of processing. Prefetching is tricky but can make a huge difference in performance. Knowing what to do requires a low level view of every memory access making it difficult for compilers and programmers to get right.

Quote:

Originally Posted by meynaf

You can reorganize the code, reverse the branch logic, and whatever, without any hint bit at all.

You can't get the fall through path and branch prediction right every time though. Ok, there are a few nasty branches but that is where it is best to look for other options like Scc, BScc, SELcc type instructions and calculations. Profiling may be needed to find these branches.

Quote:

Originally Posted by meynaf

I wouldn't use it as an asm programmer and i doubt any compiler will.

The general support is already in GCC. It isn't even that difficult to use.

1) compile the program with -fprofile-generate to turn on profiling
2) run the program to create a profile
3) compile with -fprofile-use=<profile_path> to use the profile

Step #3 turns on -fbranch-probabilities which is what optimizes branches. This is where the hint bit would be set if supported. A programmer might also want to build with -pg and take a look at the result with gprof to look for those nasty branches.

Quote:

Originally Posted by meynaf

Do you really care about this ?

Yes. Absolutely. Where do I use a bit which changes the size of an instruction or split an instruction encoding into multiple sizes? Some of my encodings may be ugly but I hope none of them are dirty. I did fix DBcc.L to your encoding suggestion but it was only ugly and not dirty before.

Quote:

Originally Posted by Mrs Beanbag

i'll be honest i never really liked the Full Extension Word addressing modes anyway, you need to read the first extension word to know if it is Full or Brief, so you need to look in two or even three different words to know the full length of the instruction. Not very nice. Oh well.

The 68k developers mentioned one of their biggest mistakes as adding too many addressing modes when RISC processors were deleting them. Addressing modes can be very powerful and give more performance/cycle though. The biggest problem was how they encoded the new addressing modes. The Full Extension Word does require 2 words to be read to determine the instruction length but that was not as bad as the complex I/IS field which is ugly and has too many variations. Most of the issues can be worked around though. I have no complaints about the 68060 timings for the more complex addressing modes. Intel processors have their warts which are slower but they make the common case fast. I believe this is possible with the 68k as well.

Quote:

Originally Posted by Mrs Beanbag

Any (d32,An) and (d24,An,Rn.Size*Scale) ?

No. It is not possible the way I encoded them.

(d16,An) uses EA mode/register of 101 reg
(d8,An,Rn.Size*Scale) uses 110 reg
(bd,An,Rn.Size*Scale) uses 110 reg

(d16,PC) uses EA mode/register of 111 010
(d8,PC,Rn.Size*Scale) uses 111 011
(bd,PC,Rn.Size*Scale) uses 111 011
(d32,PC) uses 111 110
(d24,PC,Rn.Size*Scale) uses 111 111

There were 2 free modes but there is no room to encode a register. This is basically how meynaf suggested to encode (d32,PC) but I used a different slot as his used my OP.L #data.w,Dn addressing mode and this is more consistent since I added (d24,PC,Rn.Size*Scale) as well. These encodings seem natural and are very easy to decode. The instruction length can be determined from the instruction word which is better even than (d8,An,Rn.Size*Scale) which requires looking at 2 words.
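
A minimal C sketch of what "the instruction length can be determined from the instruction word" means for a decoder, covering only the modes listed above. The function name and the two sentinel values are invented, and the word counts for the two new modes are an assumption: two extension words each, inferred from the remark that they are a word shorter than a Full Extension Word with a 32-bit displacement.

```c
#include <stdint.h>

#define EA_NEEDS_EXT_WORD (-1)  /* brief vs. full format is only known from the extension word */
#define EA_NOT_COVERED    (-2)  /* mode is not one of those listed above */

/* Number of extension words for a 3-bit EA mode and 3-bit EA register field. */
int ea_extension_words(unsigned mode, unsigned reg)
{
    mode &= 7u;
    reg  &= 7u;

    if (mode == 5u) return 1;                   /* (d16,An) */
    if (mode == 6u) return EA_NEEDS_EXT_WORD;   /* (d8,An,Rn.Size*Scale) or (bd,An,Rn.Size*Scale) */
    if (mode == 7u) {
        switch (reg) {
        case 2: return 1;                       /* (d16,PC) */
        case 3: return EA_NEEDS_EXT_WORD;       /* (d8,PC,Rn.Size*Scale) or (bd,PC,Rn.Size*Scale) */
        case 6: return 2;                       /* (d32,PC): assumed two displacement words */
        case 7: return 2;                       /* (d24,PC,Rn.Size*Scale): assumed index byte + d24 */
        default: break;
        }
    }
    return EA_NOT_COVERED;
}
```

The two new modes answer straight from the opcode word, whereas 110 reg and 111 011 force the decoder to fetch the extension word before it knows where the next instruction starts.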

meynaf · 12 August 2016, 10:30

Quote:

Originally Posted by matthey

It is just like a FF1.W D0. Sometimes those .w and .b sizes are useful.

It could just have been :

Code:

 bfextu d0{16:16},d0   ; zero-extend the low word of d0
 ffo d0                ; then find the first set bit over the full 32 bits

This code doesn't look performance critical. D0 could be extended before (at no cost if it's not in a loop).
You're using bfffo because it's there, but you wouldn't have asked for it if it weren't.
Anyway we have it now, so unless we go the incompatible way this is useless talk.
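
The two sequences can be compared with a small C model. This is a sketch under assumptions: BFFFO and BFEXTU follow the 68020 register bit field rules (offset 0 is the most significant bit, and BFFFO returns the field offset plus the position of the first set bit, or offset plus width if the field is empty), wrap-around fields are left out, and `ffo` is taken to be a plain 32-bit find-first-one returning 32 for zero. All function names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed "ffo": number of leading zero bits, 32 if x is zero. */
unsigned ffo32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* BFEXTU on a data register: 'width' bits starting 'offset' bits below the MSB. */
uint32_t bfextu_reg(uint32_t reg, unsigned offset, unsigned width)
{
    assert(width >= 1 && width <= 32 && offset + width <= 32); /* no wrap-around here */
    return (reg << offset) >> (32 - width);
}

/* BFFFO on a data register: offset of the first set bit, or offset + width if none. */
unsigned bfffo_reg(uint32_t reg, unsigned offset, unsigned width)
{
    uint32_t field = bfextu_reg(reg, offset, width);

    if (field == 0)
        return offset + width;
    return offset + (ffo32(field) - (32 - width));
}

/* bfffo d0{16:16},d0 */
unsigned low_word_bfffo(uint32_t d0)
{
    return bfffo_reg(d0, 16, 16);
}

/* bfextu d0{16:16},d0 followed by ffo d0 */
unsigned low_word_bfextu_ffo(uint32_t d0)
{
    return ffo32(bfextu_reg(d0, 16, 16));
}
```

In this model both sequences give the same answer for every input: 16 plus the number of leading zeros in the low word, and 32 when the low word is zero.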

Quote:

Originally Posted by matthey

You can't get the fall through path and branch prediction right every time though. Ok, there are a few nasty branches but that is where it is best to look for other options like Scc, BScc, SELcc type instructions and calculations. Profiling may be needed to find these branches.

Branches make up 10% of all instructions. You can't avoid them. So it's better to concentrate the resources on their implementation, rather than waste silicon on ways to avoid them.

Quote:

Originally Posted by matthey

The general support is already in GCC. It isn't even that difficult to use.

1) compile the program with -fprofile-generate to turn on profiling
2) run the program to create a profile
3) compile with -fprofile-use=<profile_path> to use the profile

Step #3 turns on -fbranch-probabilities which is what optimizes branches. This is where the hint bit would be set if supported. A programmer might also want to build with -pg and take a look at the result with gprof to look for those nasty branches.

That's not something the average coder will do and if that "general support" doesn't have the hint bit already, you can't count on GCC people to add it.

Quote:

Originally Posted by matthey

Yes. Absolutely. Where do I use a bit which changes the size of an instruction or split an instruction encoding into multiple sizes? Some of my encodings may be ugly but I hope none of them are dirty. I did fix DBcc.L to your encoding suggestion but it was only ugly and not dirty before.

So you don't see the inconsistency between a relative branch where bit #0 is special and d16(pc) as dirty? Personally i do.
In addition i don't differentiate between ugly and dirty. For me if it's ugly, it's dirty.

Anything that adds special cases is to be avoided if possible.

By having SELcc move the condition field to an unusual position you create a special case.
By adding the hint bit you change the way PC-relative displacements are interpreted but not always.
By adding an addressing mode for short displacements you add a mode that's only valid in a few cases (i see the regular immediate addr mode as a bad choice as well).

Mrs Beanbag · 12 August 2016, 11:09

Quote:

Originally Posted by meynaf

Anything that adds special cases is to be avoided if possible.

By having SELcc move the condition field to an unusual position you create a special case.
By adding the hint bit you change the way PC-relative displacements are interpreted but not always.
By adding an addressing mode for short displacements you add a mode that's only valid in a few cases (i see the regular immediate addr mode as a bad choice as well).

Quote:

Originally Posted by meynaf

Quote:

Originally Posted by Mrs Beanbag

Quote:

Originally Posted by meynaf

This means you can now do MOVEA.B to use the same extend trick as MOVEA.W. Note : unsigned extend (more common for bytes).

Maybe more common but then inconsistent with d8(An,Dn) addressing mode, if you care about that.

It's true that i care more about usefulness than consistency.

hmm...

meynaf · 12 August 2016, 13:13

Quote:

Originally Posted by Mrs Beanbag

hmm...

You think you've spotted a contradiction ?
It's not all black and white, you see.
I won't let a small inconsistency reduce the usefulness, hence the unsigned An.B (and if we had d8(An,Dn.B) the d8 would be signed but Dn.B unsigned).
I don't see SELcc as really useful. I don't see the hint bit as useful at all. I am not against short immediates but against the way they are encoded.
So i maintain everything. I care more about usefulness than consistency and anything that adds special cases is to be avoided if possible.

Mrs Beanbag · 12 August 2016, 13:46

Quote:

Originally Posted by meynaf

You think you've spotted a contradiction ?

I don't much care for the word "contradiction", i prefer to call things a puzzle, or paradox.

Quote:

It's not all black and white, you see.
I won't let a small inconsistency reduce the usefulness, hence the unsigned An.B (and if we had d8(An,Dn.B) the d8 would be signed but Dn.B unsigned).
I don't see SELcc as really useful. I don't see the hint bit as useful at all. I am not against short immediates but against the way they are encoded.
So i maintain everything. I care more about usefulness than consistency and anything that adds special cases is to be avoided if possible.

Well i do agree with you on the hint bit in particular, i don't think it is necessary or useful (and if that bit is to be used at all i would suggest it to encode conditional BSR instead, and it need not be such a special case, since PC can just have its low bit wired to zero, which i'd be surprised if it isn't already).

But do tell me what exactly is the use of a byte sized address.

meynaf · 12 August 2016, 17:02

Quote:

Originally Posted by Mrs Beanbag

I don't much care for the word "contradiction", i prefer to call things a puzzle, or paradox.

Is this puzzle/paradox solved now ?

Quote:

Originally Posted by Mrs Beanbag

Well i do agree with you on the hint bit in particular, i don't think it is necessary or useful (and if that bit is to be used at all i would suggest it to encode conditional BSR instead, and it need not be such a special case, since PC can just have its low bit wired to zero, which i'd be surprised if it isn't already).

PC doesn't have its low bit wired to zero. If you do a jump or a branch to an odd address, then you'll get an address error. Same if an RTS pops an odd value. These are the only causes of address errors on 020+, btw.
Bugs causing jumps to bogus addresses, which end up in data such as text with all its 6x codes, are a common cause of 80000003 errors.
I've already warned about the use of this bit, which would break any program using an odd address branch to deliberately trigger an exception.
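
The compatibility problem can be shown with a minimal C sketch of how an emulator might compute a 16-bit branch target with and without the hint bit. The struct and function names are invented, the meaning of the hint value is left open, and `pc_base` is assumed to be the even address the displacement is relative to.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t target;         /* address execution would continue at */
    bool     hint;           /* displacement bit 0, when it is treated as a hint */
    bool     address_error;  /* odd target on a core without the hint bit */
} branch_decode;

branch_decode decode_branch_w(uint32_t pc_base, int16_t disp, bool hint_bit_enabled)
{
    branch_decode d;

    if (hint_bit_enabled) {
        /* Proposed behaviour: bit 0 is stripped off and kept as the hint. */
        d.hint          = (disp & 1) != 0;
        d.target        = pc_base + (uint32_t)(int32_t)(disp & ~1);
        d.address_error = false;
    } else {
        /* Existing behaviour: the full displacement is used, odd targets trap. */
        d.hint          = false;
        d.target        = pc_base + (uint32_t)(int32_t)disp;
        d.address_error = (d.target & 1u) != 0;
    }
    return d;
}
```

A branch at pc_base 0x1002 with displacement 0x0011 raises an address error today, but with the hint bit it would quietly continue at 0x1012, which is exactly the case of a program that branches to an odd address on purpose.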

Quote:

Originally Posted by Mrs Beanbag

But do tell me what exactly is the use of a byte sized address.

For a byte address, absolutely none. For a byte offset, a bit more. For byte data, a lot.
Remember that i am for more data uses for An registers - as i'm quite often out of data regs, but more rarely of address regs. I've even used address regs to represent R,G,B values

If you don't like the data/address register split, you should understand the use for this quite easily.

pandy71 · 12 August 2016, 19:51

Quote:

Originally Posted by Mrs Beanbag

DSP = Digital Signal Processing ?

anyway i kind of feel like this sort of stream processing could be done better off-chip, like some kind of blitter.

Yes, that DSP... I hope you don't consider matrix multiplication a non-DSP task... the same goes for the dot product - in fact a lot of graphics hardware was built around DSPs.
DSP is not only audio processing.

As for doing it off the CPU - yes, but this adds complexity and is usually soon outdated (30 years ago the blitter was a breakthrough - today most CPUs perform those operations faster in software - so I think adding some features that improve CPU performance as pseudo-hardware is the right direction).

