Apollo Team has new Cyclone 5 FPGA accelerator cards - Page 13

Megol · 22 August 2015, 12:37

Quote:

Originally Posted by meynaf

So once MOVEM.L is done, doing MOVEM.W is peanuts ? (As the special format is what's tough here, when you have it, you can reuse it, right ?)

Yes it is easy. The only complication is the sign extension but that is required elsewhere.
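As a rough illustration of that sign extension, here is a minimal C sketch of the memory-to-register form of MOVEM.W. The function name `movem_w_load`, the flat `regs[16]` layout (D0-D7 then A0-A7), and the byte-array memory are assumptions made for the example; this is a software model, not how any actual core implements it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file model: regs[0..7] = D0-D7, regs[8..15] = A0-A7.
 * mask bit i set means register i is loaded (bit 0 = D0 ... bit 15 = A7),
 * which matches the mask order used for memory-to-register MOVEM.
 * mem points at big-endian 68k memory. Returns the number of bytes read. */
size_t movem_w_load(uint32_t regs[16], uint16_t mask, const uint8_t *mem)
{
    size_t off = 0;
    for (int i = 0; i < 16; i++) {
        if (mask & (1u << i)) {
            /* Assemble one big-endian 16-bit word. */
            uint32_t v = ((uint32_t)mem[off] << 8) | mem[off + 1];
            /* Sign-extend bit 15 into the upper half of the register. */
            if (v & 0x8000u)
                v |= 0xFFFF0000u;
            regs[i] = v;
            off += 2;
        }
    }
    return off;
}
```

The only difference from the .L case in this model is the two-byte step and the sign-extension line; the register-list walk is shared.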

Quote:

But many instructions are in this case. They require a special handling that's not reused elsewhere, e.g. DIV is in this case.
Others like LINK need to use several µops.

DIV doesn't touch any critical part of the pipeline so that isn't a problem. In my design it was actually handled a lot like a load missing the cache. This means no extra hardware is needed for the variable-latency operation, and starting the execution would be a "store-data" operation in the integer unit.

I never started implementing LINK.

Quote:

Of course if the problem comes from the total number of subops you have (i.e. the total for all instructions), I can understand it becomes a big problem.

If one has proper microcode support, MOVEP isn't hard to execute without extra hardware, but it would be slow. Unlike x86, the 68k doesn't really need full microcode support, and adding it complicates a critical part: the decoder.

Quote:

MOVEM needs to be reasonably fast, while MOVEP does not. Isn't it easier when it can be slow ?
Basically it's just a bunch of shift + move.b, and these already exist in the cpu. It's not as if we want it to run in 1 clock.

IIRC the byte store/load starts with the MSB, so one either has to do a BSWAP (an x86 instruction that converts between little-endian and big-endian formats) or a rotate to place the data in the right position.
Or one could extend the ld/st unit to support byte operations targeting the MSB of a register. Even further the ld/st unit could be extended to support loading/storing an arbitrary byte of a register.
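To make the "shift + move.b" view concrete, here is a minimal C sketch of MOVEP.L in both directions: the most significant byte goes first, and each byte lands on every second address. The function names and the byte-array memory are assumptions for the example; it models the visible behaviour only, not a µop sequence from any real design.

```c
#include <stdint.h>

/* Sketch of MOVEP.L Dx,(d16,Ay): store the four bytes of d, most
 * significant byte first, to mem[0], mem[2], mem[4], mem[6].
 * The bytes in between (odd offsets here) are left untouched. */
void movep_l_store(uint32_t d, uint8_t *mem)
{
    for (int i = 0; i < 4; i++) {
        /* Shift the wanted byte down to bits 7..0, then do a byte store. */
        mem[2 * i] = (uint8_t)(d >> (24 - 8 * i));
    }
}

/* Sketch of MOVEP.L (d16,Ay),Dx: gather four alternate bytes,
 * most significant byte first, into one 32-bit value. */
uint32_t movep_l_load(const uint8_t *mem)
{
    uint32_t d = 0;
    for (int i = 0; i < 4; i++) {
        d = (d << 8) | mem[2 * i];
    }
    return d;
}
```

Each loop iteration corresponds to one shift (or rotate) plus one byte access, which is why the instruction is simple to express but slow unless the ld/st unit can pick an arbitrary register byte directly.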

In a speed-demon design, changing the ld/st unit could lower the clock frequency, as it touches a time-critical part of the pipeline.
If the same design also lacks proper microcode support, then MOVEP is very hard to execute at all. Not because it is really hard per se, but because it is a very bad fit for the design.

meynaf · 22 August 2015, 20:09

Quote:

Originally Posted by Megol

Yes it is easy. The only complication is the sign extension but that is required elsewhere.

So Gunnar's excuses for rejecting my MOVEM.B idea were invalid

(and the removal of MOVEM.W in the coldfire doesn't look very smart either)

Quote:

Originally Posted by Megol

DIV doesn't touch any critical part of the pipeline so that isn't a problem. In my design it was actually handled a lot like a load missing the cache. This means no extra hardware is needed for the variable-latency operation, and starting the execution would be a "store-data" operation in the integer unit.

If DIV is no big deal, would an integer SQR be a problem ?

Quote:

Originally Posted by Megol

I never started implementing LINK.

Too bad. Why did you stop doing your 68k implementation, btw ?

Quote:

Originally Posted by Megol

If one has proper microcode support, MOVEP isn't hard to execute without extra hardware, but it would be slow. Unlike x86, the 68k doesn't really need full microcode support, and adding it complicates a critical part: the decoder.

Other 68k instructions need microcode as well. Oh, wait. They're the 020+ insns everyone removes too

matthey · 23 August 2015, 03:43

Quote:

Originally Posted by meynaf

So Gunnar's excuses for rejecting my MOVEM.B idea were invalid

(and the removal of MOVEM.W in the coldfire doesn't look very smart either)

MOVEM.B is not particularly difficult to implement but has other potential issues.

1) Is there a logical encoding for it, and is it worth the encoding space it takes?
2) Is it consistent with the 68k? No other instruction allows sign extending a byte to a longword for address register destinations. Only allowing data register destinations for MOVEM.B really limits its value.
3) Would it be used enough to be worth implementing (cost-benefit analysis)? Can compilers make good use of it? Does it save cycles or improve code density in practice?
4) Are there as many resources available for byte to longword extending as for word to longword (fewer resources generally means fewer optimization possibilities)? The EA units allow only word to longword sign extension, and allowing byte to longword extending in the EA may increase the mux size and slow the EA calculation.

Quote:

Originally Posted by meynaf

If DIV is no big deal, would an integer SQR be a problem ?

I doubt it would be a problem for most designs, but wouldn't this introduce fixed-point integers, which aren't used anywhere else in the 68k? The range of fixed-point integers is more limited than fp, where the decimal point can float. These types of instructions generally take a lot of logic also. I can't say I've needed an integer square root very often either.
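For reference, a plain integer square root that returns floor(sqrt(x)) can be done with only shifts, adds, and compares. Below is a minimal C sketch of the classic bit-by-bit method; the name `isqrt32` is made up for the example, and it says nothing about how such an instruction would be encoded or built in hardware.

```c
#include <stdint.h>

/* Bit-by-bit integer square root: returns floor(sqrt(x)).
 * Tries each result bit from the top down, two input bits per step,
 * much like a restoring divider. */
uint32_t isqrt32(uint32_t x)
{
    uint32_t res = 0;
    uint32_t bit = 1u << 30;   /* highest power of four that fits in 32 bits */

    /* Skip powers of four that are larger than the input. */
    while (bit > x)
        bit >>= 2;

    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;            /* trial subtraction succeeded */
            res = (res >> 1) + bit;    /* set this result bit */
        } else {
            res >>= 1;                 /* result bit stays clear */
        }
        bit >>= 2;
    }
    return res;
}
```

The loop runs at most 16 times for a 32-bit input, so as a variable-latency operation it is in the same general class as a simple iterative divide.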

meynaf · 23 August 2015, 09:18

Quote: