68k details - Page 29

idrougge · 23 October 2018, 15:01

Quote:

Originally Posted by litwr

It sounds contradictory to me. Everybody had been happy with 68k and a few moments later they became unhappy... Why?! Again, I can point to the quite popular Apple Macintosh, which could successfully compete with the IBM PC. Atari and Amiga had their respectable ecosystems. There was a world of 68k-based Unix workstations. So your argument seems rather contrived to me.

What is contrived about it? The world chose the 68000 architecture, and when the world moved on, it didn't move on to the x86 architecture.

Quote:

Originally Posted by litwr

IMHO it is quite natural to consider that ARM began the era of fast processors and that Motorola couldn't convert its huge ISA to the new technology fast enough. Motorola wanted to share in the DEC VAX's success in the late 70s, but that meant it had to share in its failure too at the beginning of the 90s.

1. The ARM has never been a fast processor. It has only been fast in relation to the engineering resources and transistors needed. Which is fine in itself.
2. The VAX was not a big factor in the 68000 design — it was one example of a design onto which most CPUs converged. In comparison, the x86 wanted to share the success of the 8080 of the 70s, a design which belonged in the 70s.

Quote:

Originally Posted by litwr

I can repeat: Intel and ARM didn't follow DEC or IBM/370 - they just created better processors. By contrast, Motorola and National Semiconductor tried to create a processor with an ISA similar to VAX.

The ARM is still more similar to the 68000 than to the x86.

The IBM/370 is a different beast altogether, and I begin to question your insight into CPUs when you repeatedly try to bunch it together with the VAX or 68000.

Quote:

Originally Posted by litwr

BTW computers for Unix also needed something better than 68k, so Sun developed its famous SPARC processor for them and became the leader in that market. It didn't use x86 despite the presence of the terrible monster "IBM PC".

Exactly: Sun didn't go for the x86 despite the PC. Or rather, they actually did. The Sun386i was a very short-lived machine that sat between the 68k-based Sun-3 series and the SPARC-based Sun-4 series, but it was not deemed good enough.

Really, from a CPU design perspective, what makes you think that the x86 with its non-orthogonal design, modes-upon-modes of compatibility layers and jungle of instructions is more scalable than the 68000?

Tomislav · 27 October 2018, 13:33

Quote:

Originally Posted by meynaf

Yes 68020 does not have built-in memory management but a 68451 can be added to fix this - using that ability to connect coprocessors that you seem to dislike.

The 68451 is for use with the 68000 and 68010. The 68851 is for the 68020.

touko · 27 October 2018, 15:16

Quote:

1. The ARM has never been a fast processor. It has only been fast in relation to the engineering resources and transistors needed. Which is fine in itself.

What ??

If the ARM is not fast, what about the 68k then ??
The 68k is slow. It has a really good ISA for sure, but it's slow.

An 8 MHz ARM2 can deliver 4 MIPS; I wouldn't call that "not fast".

plasmab · 27 October 2018, 15:53

Quote:

Originally Posted by touko

What ??

If the ARM is not fast, what about the 68k then ??
The 68k is slow. It has a really good ISA for sure, but it's slow.

An 8 MHz ARM2 can deliver 4 MIPS; I wouldn't call that "not fast".

Yup. The 68K is much slower. The ARM has appalling code density but it’s much much faster. No microcode, fast interrupts, 3 stage pipeline, barrel shifter on every ALU instruction for free. Not sure who thinks this is slow?

chb · 27 October 2018, 17:42

Quote:

Originally Posted by plasmab

Yup. The 68K is much slower. The ARM has appalling code density but it’s much much faster. No microcode, fast interrupts, 3 stage pipeline, barrel shifter on every ALU instruction for free. Not sure who thinks this is slow?

If you compare ARM processors to other manufacturers' CPUs from the same time frame, they are not particularly fast. The 1987 ARM2 is not faster than the 1987 68030/80386, which came at higher clock speeds. The same goes for the ARM3 compared to the 486 and 040. Clock-for-clock performance is impressive, but it's true to say that ARMs were never the fastest CPUs of their time, even if they were extremely fast in relation to their transistor count: an ARM has about as many transistors as an 8086 and is about 10x faster at the same clock speed.
Oh, and at least early ARM CPUs do use some microcode, even if it is quite minimal:
http://www.righto.com/2016/02/revers...rocessors.html

plasmab · 27 October 2018, 19:37

Ok. So what are we comparing here? 030 isn’t as nicely pipelined, there isn’t a barrel shifter on every instruction. It’s got more complex addressing modes and horrible interrupt latency compared to ARM. ARMv2 does 1 instruction per clock when optimised properly... 030 doesn’t come close to that.

So what if the ARMv1 was microcoded. It was never released!! ARMv2 was the first production chip.. except for a handful of BBC B evaluation coprocessor boxes.

Sure the 030s could reach 25MHz (vs 8MHz) but that’s mostly due to the manufacturing processes used .. VLSI vs Motorola rather than chip design.

litwr · 27 October 2018, 19:42

@Kalms Thank you very much for your story. I think my view is almost the same.

@Bruce Abbott Thank you very much for your information. Some of its details are new to me. BTW the Amiga 1000 with an HDD was indeed a fantastic machine in the 80s. How stupid of Commodore to miss its chance to spread these computers more widely! I agree CPC Basic was very fast, but Mallard Basic for the PCW was even faster! I am impressed by your Z80 experience. I also have some - http://litwr2.atspace.eu/xlife/retro/xlife8.html - and I hope to find enough spare time to write an Amiga version too. IMHO programming the Z80 is much more difficult than the 8086 or even the 6502.

I do not understand your idea about the Apple ][ to IBM PC transition. IMHO Apple ][ users went for the Macintosh. IBM PC users came more naturally from CP/M or TRS-80 users. IMHO the flexibility and power of the IBM PC architecture have kept it relevant for about 40 years!

@meynaf 236 bytes - it is quite a result! I need time to work with it.

Quote:

Originally Posted by roondar

As far as I understand it, the comparison is mostly litwr's attempt at proving the x86 has better code density than 68k. To do so, he has rewritten his example so that it uses a specific OS that happens to have a feature that allows him to get a much smaller program than otherwise would've been possible.

That is completely untrue. It was meynaf who used high-level OS calls to make the 68k code smaller. So he used calls to OS routines, while I had to include code for those routines, which are not present in DOS. Then I rewrote my code, eliminating some auxiliary service functions, to make the conditions more equal for both platforms.

Quote:

Originally Posted by idrougge

2. The VAX was not a big factor in the 68000 design — it was one example of a design onto which most CPUs converged. In comparison, the x86 wanted to share the success of the 8080 of the 70s, a design which belonged in the 70s.

The VAX ISA is very similar to 68k: orthogonality, addressing modes with auto-increment and decrement, bit field instructions, double indirect addressing, ... - x86 lacked such things completely. Indeed, x86 used the popularity of the 8080 - what is wrong with that? IMHO Intel set a trap for Motorola, National Semiconductor and maybe even DEC itself - it announced the "very powerful" iAPX 432...

Quote:

Originally Posted by idrougge

The ARM is still more similar to the 68000 than to the x86.

I agree they share some details, but that is rather unintentional. The RISC people wanted the fastest CPU, while 68k tried to reach several goals simultaneously. Intel in this sense is close to ARM because they just wanted to win.

Quote:

Originally Posted by idrougge

The IBM/370 is a different beast altogether, and I begin to question your insight into CPUs when you repeatedly try to bunch it together with the VAX or 68000.

IMHO IBM/370 was a good mainframe, true 32-bit but very expensive. I have some experience of working with it from when I was a student. Its descendants are still produced and remain popular. IMHO trying to copy this architecture at an 8-bit or even 16-bit level was not the best way.

Quote:

Originally Posted by idrougge

Really, from a CPU design perspective, what makes you think that the x86 with its non-orthogonal design, modes-upon-modes of compatibility layers and jungle of instructions is more scalable than the 68000?

The x86 design could be successfully converted to 6502/ARM-like fast electronics, but 68k could not. Grond mentioned that Moto made a lot of money in the 70s and 80s, but it couldn't provide a vision of the future for its processors. Apple and the Unix workstation producers chose other processors.

BTW. I wrote a drawline routine (http://eab.abime.net/showpost.php?p=...&postcount=463) for ARM processors. It takes 100 bytes for it (68000 - 72 bytes, 8086 - 84 bytes).

Code:

; r0/r1 - x0/y0, r2 - c, r3/r4 - dx/dy, r5/r6 - x1/y1, r7 - err, r8 - e2, r9/r10 - sx/sy
drawline:
       STMFD r13!,{R0-R10,R14}
       add r5,r0,r3   ;x1 = dx + x0
       add r6,r1,r4   ;y1 = dy + y0
       adds r3,r3,0   ;dx > 0
       movmi r9,-1    ;sx = -1
       movpl r9,1     ;sx = 1
       rsbmi r3,r3,0  ;dx = abs(dx)
       adds r4,r4,0   ;dy > 0
       movmi r10,-1   ;sy = -1
       movpl r10,1    ;sy = 1
       rsbpl r4,r4,0  ;dy = -abs(dy)
       add r7,r3,r4   ;err = dx + dy
loop:  bl putpixel
       cmp r5,r0      ;x0 == x1
       cmpeq r6,r1    ;y0 == y1
       LDMFDEQ r13!,{R0-R10,R15}       ;break
       add r8,r7,r7   ;e2 = 2 * err
       cmp r8,r4      ;e2 >= dy
       addge r7,r7,r4 ;err += dy
       addge r0,r0,r9 ;x0 += sx
       cmp r8,r3      ;e2 <= dx
       addle r7,r7,r3 ;err += dx
       addle r1,r1,r10  ;y0 += sy
       b loop

Yet another part of my article is ready. It is about the Intel 8080, which is still produced and used! https://litwr.livejournal.com/2918.html It was not as good as the 6502 but much better than Moto's 6800.

EDIT. The size of the ARM code is 96 bytes now. Indeed, it is larger than for the 80386 or 68020, but it contains fewer instructions.
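
For readers who don't read ARM assembly, here is a rough high-level rendering of the algorithm the drawline routine above implements. It is only a sketch: the function name `draw_line` and the `putpixel` callback are made up for illustration, and it takes the start point plus dx/dy as inputs because that is how the register comment at the top of the listing describes the calling convention.

```python
def draw_line(x0, y0, dx, dy, putpixel):
    """Integer line drawing in the style of the ARM listing (illustrative only)."""
    x1 = x0 + dx                  # add r5,r0,r3
    y1 = y0 + dy                  # add r6,r1,r4
    sx = -1 if dx < 0 else 1      # step direction in x
    dx = abs(dx)
    sy = -1 if dy < 0 else 1      # step direction in y
    dy = -abs(dy)                 # the routine keeps dy negative
    err = dx + dy
    while True:
        putpixel(x0, y0)
        if x0 == x1 and y0 == y1:
            break                 # end point reached
        e2 = 2 * err
        if e2 >= dy:              # step in x
            err += dy
            x0 += sx
        if e2 <= dx:              # step in y
            err += dx
            y0 += sy


pixels = []
draw_line(0, 0, 3, 0, lambda x, y: pixels.append((x, y)))
```

After the call, `pixels` holds the four points of a horizontal run from (0, 0) to (3, 0).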

litwr · 27 October 2018, 19:48

@meynaf I remember your difficulty with disassembling x86 code. I have just found a helpful tool for you - http://shell-storm.org/online/Online...-Disassembler/

It's sad that it doesn't support 68k.

meynaf · 27 October 2018, 20:14

Quote:

Originally Posted by litwr

@meynaf I remember your difficulty with disassembling x86 code. I have just found a helpful tool for you - http://shell-storm.org/online/Online...-Disassembler/

It's sad that it doesn't support 68k.

Like the others, it does not always seem to be correct. For example, it shows no difference between F2 7x xx and 7x xx (bnd prefix).
This one does, however:
https://onlinedisassembler.com/odaweb/

But regardless of which disassembler I try, there are always some ambiguous opcodes or some unsupported extensions.

chb · 27 October 2018, 21:40

Quote:

Originally Posted by plasmab

Ok. So what are we comparing here?

Well, my point was that in terms of absolute performance (not per clock) ARM CPUs were never among the fastest of their time. I guess that wasn't their goal anyway.

Quote:

Originally Posted by plasmab

030 isn’t as nicely pipelined, there isn’t a barrel shifter on every instruction. It’s got more complex addressing modes and horrible interrupt latency compared to ARM. ARMv2 does 1 instruction per clock when optimised properly... 030 doesn’t come close to that.

All true - and the ARM2 is clock-for-clock faster than the 030, no question. But it is limited by DRAM access speed (more exactly, cycle time) - the original Archimedes 300 used 120 ns RAMs at 8 MHz, so already quite fast. For 16 MHz, 60 ns DRAMs would have been needed, which were not available in 1987 at consumer prices. ARM solved this with the ARM3 in 1989 by including 4 KB of cache. But at that point the 486 and 040 were already available, which are different beasts again; the 040 does most simple instructions in 1 clock, too... and the ARM3 included neither FPU nor MMU, btw. Not to speak of the SPARCs, MIPSes and PA-RISCs of the time.
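
The arithmetic behind the DRAM argument can be sketched roughly as follows. This is a deliberately crude model, not a description of the real Archimedes memory system: it assumes an uncached CPU needs one complete memory cycle per clock, and it treats the quoted DRAM figures as cycle times. The helper names are invented for the example.

```python
def clock_period_ns(mhz):
    """Length of one CPU clock in nanoseconds."""
    return 1000.0 / mhz


def dram_keeps_up(cpu_mhz, dram_cycle_ns):
    """True if the DRAM can finish a cycle within one CPU clock (simplified model)."""
    return dram_cycle_ns <= clock_period_ns(cpu_mhz)


# Figures taken from the post: 120 ns parts at 8 MHz, 60 ns parts for 16 MHz.
for mhz, dram_ns in [(8, 120), (16, 120), (16, 60), (25, 60)]:
    print(mhz, "MHz:", clock_period_ns(mhz), "ns per clock,",
          dram_ns, "ns DRAM keeps up:", dram_keeps_up(mhz, dram_ns))
```

Under that assumption an 8 MHz clock leaves 125 ns per access, 16 MHz leaves 62.5 ns, and 25 MHz leaves only 40 ns, which is the gap a cache has to cover.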

Quote:

Originally Posted by plasmab

So what if the ARMv1 was microcoded. It was never released!! ARMv2 was the first production chip.. except for a handful of BBC B evaluation coprocessor boxes.

Well, the ARM1 was released as an evaluation system (add-on for the BBC micro), but that's nitpicking for sure. More important, ARMv1 and ARMv2 are very close - one of the new things in ARMv2 was a multiplication instruction, which was - surprise - implemented as a looping microcode sequence.

To quote wikichip:

Quote:

Originally Posted by https://en.wikichip.org/wiki/acorn/microarchitectures/arm2

The reason the decode is implemented in a number of separate units is because the ARM2 makes use of microcode ROMs (PLA). Each instruction is decoded into up to four µOP signal-wise. In other words, the ARM instructions are broken down into up to four sets of internal-µOP signals indicating things such as which registers to select or what value to shift by. For some complex operations such as block-transfers, the microsequencer also performs a looping operation for each register.

There's nothing bad, by the way, about using microcode in a CPU IMHO, where it is useful.
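
To picture the "looping operation for each register" from the quote, here is a small Python model of a block load such as the `ldmfd r13!,{...}` used in the drawline listings in this thread. It is only an illustration of the visible behaviour (one memory word per register in the list, lowest register from the lowest address), not a reconstruction of the actual ARM2 sequencer; the function name `ldmia` and the dict-based memory are assumptions made for the example.

```python
def ldmia(regs, memory, base, reglist, writeback=True):
    """Load every register named in the 16-bit reglist mask, one word per step."""
    addr = regs[base]
    for r in range(16):              # one loop iteration per register in the list
        if reglist & (1 << r):
            regs[r] = memory[addr]   # single word transfer
            addr += 4                # words are 4 bytes apart
    if writeback:
        regs[base] = addr            # the "!" in "r13!"
    return regs


regs = [0] * 16
regs[13] = 0x100                     # stack pointer
memory = {0x100: 11, 0x104: 22, 0x108: 33}
ldmia(regs, memory, base=13, reglist=(1 << 0) | (1 << 1) | (1 << 15))
```

A single instruction therefore turns into as many memory transfers as there are bits set in the register list, which is where a small sequencing loop earns its keep.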

Quote:

Originally Posted by plasmab

Sure the 030s could reach 25MHz (vs 8MHz) but that’s mostly due to the manufacturing processes used .. VLSI vs Motorola rather than chip design.

No, it's because DRAM speed could not cope with a 25 MHz ARM2 without cache - or performance per clock would be drastically degraded. It's also a different design philosophy, a bit like Z80 and 6502.

plasmab · 27 October 2018, 22:44

Quote:

Originally Posted by chb

Well, my point was that in terms of absolute performance (not per clock) ARM CPUs were never among the fastest of their time. I guess that wasn't their goal anyway.

Well, the ARM1 was released as an evaluation system (add-on for the BBC micro), but that's nitpicking for sure. More important, ARMv1 and ARMv2 are very close - one of the new things in ARMv2 was a multiplication instruction, which was - surprise - implemented as a looping microcode sequence.

To quote wikichip:

There's nothing bad, by the way, about using microcode in a CPU IMHO, where it is useful.

No, it's because DRAM speed could not cope with a 25 MHz ARM2 without cache - or performance per clock would be drastically degraded. It's also a different design philosophy, a bit like Z80 and 6502.

Having implemented an ARMv2 CPU in Verilog, I can say there isn't anything to microcode really. Each instruction is practically bare inputs into the ALU. Maybe the LDx/STx instructions could do with a tiny bit of microcoding.. but they are basically the exceptional instructions anyway.

The DRAM point is correct. Fixed in ARMv3.

Compare an ARMv3 with an 030.

idrougge · 28 October 2018, 02:53

Quote:

Originally Posted by touko

What ??

If the ARM is not fast, what about the 68k then ??
The 68k is slow. It has a really good ISA for sure, but it's slow.

An 8 MHz ARM2 can deliver 4 MIPS; I wouldn't call that "not fast".

I never mentioned the 68000, did I? Please compare the ARM of any generation to other contemporaneous RISC processors, not to the 68000.

litwr · 28 October 2018, 09:59

Quote:

Originally Posted by meynaf

EDIT: done - spigot version in the zone, 236 bytes - outputs 1000 digits, does not use OS formatting routines.
As you can see i can play your feature reduction game too.

Sorry, but you have used unfair tricks again. My spigot implementation claims to be the fastest, but proving this requires a timer result. You have cut it out!

So your code has proved nothing. The 80386 is still unbeaten.

With the help of ARM experts I have just improved my code for the ARM line drawing routine.

Code:

drawline:
       stmfd r13!,{r0-r10,r14}
       add r5,r0,r3   ;x1 = dx + x0
       add r6,r1,r4   ;y1 = dy + y0
       movs r9,r3,asr 31  ;dx > 0, sx = -1
       movpl r9,1     ;sx = 1
       rsbmi r3,r3,0  ;dx = abs(dx)
       movs r10,r4,asr 31  ;dy > 0, sy = -1
       movpl r10,1    ;sy = 1
       rsbpl r4,r4,0  ;dy = -abs(dy)
       add r7,r3,r4   ;err = dx + dy
loop:  bl putpixel
       cmp r5,r0      ;x0 == x1
       cmpeq r6,r1    ;y0 == y1
       ldmfdeq r13!,{r0-r10,r15}  ;break
       add r8,r7,r7   ;e2 = 2*err
       cmp r8,r4      ;e2 >= dy
       addge r7,r7,r4 ;err += dy
       addge r0,r0,r9 ;x0 += sx
       cmp r8,r3      ;e2 <= dx
       addle r7,r7,r3 ;err += dx
       addle r1,r1,r10  ;y0 += sy
       b loop

It is only 88 bytes now! The 80386 routine takes 84. The ARM code uses only one jump, whereas the 80386 and 68000 versions use 7. So ARM has quite good code density when it is programmed properly. It is also notable that one assembly line almost always corresponds to one line in the C source code. What a beauty! BTW I have tested this code on my Raspberry Pi, so it is 100% correct.
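
The `movs r9,r3,asr 31` line in the listing above is where the barrel shifter saves an instruction: an arithmetic shift right by 31 smears the sign bit across the whole word, so a negative dx yields -1 directly and only the non-negative case needs the following `movpl`. A small Python model of that idea, with helper names invented for the example and 32-bit registers assumed:

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def asr32(x, n):
    """Arithmetic shift right of a 32-bit value (Python's >> keeps the sign)."""
    return to_signed32(x) >> n


def step_sign(d):
    """Model of 'movs rS,rD,asr 31' followed by 'movpl rS,1'."""
    s = asr32(d, 31)      # -1 if d is negative, 0 otherwise
    if s >= 0:            # N flag clear, so the movpl executes
        s = 1
    return s


print(step_sign(-5), step_sign(0), step_sign(7))
```

So the step direction comes out as -1 for negative deltas and 1 for everything else, in two instructions instead of three.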

litwr · 28 October 2018, 10:22

I have checked the 386 code, and without timer support it can be less than 180 bytes. BTW I published the code for the 80286 by mistake; the fair code for the 386 is less than 386 bytes.

plasmab · 28 October 2018, 10:34

Quote:

Originally Posted by idrougge

I never mentioned the 68000, did I? Please compare the ARM of any generation to other contemporaneous RISC processors, not to the 68000.

It's still unfair. ARMv2 is essentially the first revision of a CPU designed by two people, whereas the 030 & contemporary x86 CPUs had much more experience to draw on and many more revisions to iron out mistakes.

On any given day (certainly once we were into the 90s) the Intel chips always kicked everything else's butt because, as @grond pointed out, they had better fabrication labs and could put more transistors on there.

It wasn't until the StrongARM that someone applied good fab techniques to the ARM CPU. And boy did that make a difference.

Again, we're always comparing apples and oranges. Are we comparing chips on a given day? Iterations of an ISA? The ISA's potential? Or the ability of a manufacturer to access the best fab techniques? Pick one. Otherwise it's all rhetoric.

However, Intel suffered from the same thing as the English language... because it's so widely used, it is the worst language, with the longest history, most development and largest technical debt.

That's unavoidable in any industry where you are successful because nobody wants to change a winning formula.. even if it's got piles of crap in there.

litwr · 28 October 2018, 11:15

@plasmab English is very tough. IMHO nobody knows it perfectly, even native speakers!

With help I have made further improvements to the ARM code - it is only 80 bytes now - which is less than the code for the 80386 and close to the size of the 68000 code (72 bytes)!

Code:

drawline:
       stmfd r13!,{r0-r9,r14}
       add r5,r0,r3   ;x1 = dx + x0
       add r6,r1,r4   ;y1 = dy + y0
       movs r9,r3,asr 31  ;dx > 0, sx = -1
       movpl r9,1     ;sx = 1
       rsbmi r3,r3,0  ;dx = abs(dx)
       movs r8,r4,asr 31  ;dy > 0, sy = -1
       movpl r8,1    ;sy = 1
       rsbpl r4,r4,0  ;dy = -abs(dy)
       add r7,r3,r4   ;err = dx + dy
loop:  bl putpixel
       cmp r5,r0      ;x0 == x1
       cmpeq r6,r1    ;y0 == y1
       ldmfdeq r13!,{r0-r9,r15}  ;break
       cmp r4,r7,lsl 1   ;err*2 >= dy
       addlt r7,r7,r4 ;err += dy
       addlt r0,r0,r9 ;x0 += sx
       cmp r3,r7,lsl 1   ;err*2 <= dx
       addgt r7,r7,r3 ;err += dx
       addgt r1,r1,r8  ;y0 += sy
       b loop

plasmab · 28 October 2018, 12:56

Quote:

Originally Posted by litwr

@plasmab English is very tough. IMHO nobody knows it perfectly, even native speakers!

With help I have made further improvements to the ARM code - it is only 80 bytes now - which is less than the code for the 80386 and close to the size of the 68000 code (72 bytes)!

Code:

drawline:
       stmfd r13!,{r0-r9,r14}
       add r5,r0,r3   ;x1 = dx + x0
       add r6,r1,r4   ;y1 = dy + y0
       movs r9,r3,asr 31  ;dx > 0, sx = -1
       movpl r9,1     ;sx = 1
       rsbmi r3,r3,0  ;dx = abs(dx)
       movs r8,r4,asr 31  ;dy > 0, sy = -1
       movpl r8,1    ;sy = 1
       rsbpl r4,r4,0  ;dy = -abs(dy)
       add r7,r3,r4   ;err = dx + dy
loop:  bl putpixel
       cmp r5,r0      ;x0 == x1
       cmpeq r6,r1    ;y0 == y1
       ldmfdeq r13!,{r0-r9,r15}  ;break
       cmp r4,r7,lsl 1   ;err*2 >= dy
       addlt r7,r7,r4 ;err += dy
       addlt r0,r0,r9 ;x0 += sx
       cmp r3,r7,lsl 1   ;err*2 <= dx
       addgt r7,r7,r3 ;err += dx
       addgt r1,r1,r8  ;y0 += sy
       b loop

I make that 84 bytes.. 21 instructions x 32 bits. But it's still only 21 instructions. The 68000 needs more *instructions*.
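
That count is easy to check mechanically, since classic ARM instructions are a fixed 32 bits wide: code size is just the number of instructions times four. A quick sketch, with the listing above pasted in minus its comments; the helper names are invented for the example, and the parser only understands the simple `label:` and `;comment` layout used in this thread.

```python
LISTING = """
drawline:
       stmfd r13!,{r0-r9,r14}
       add r5,r0,r3
       add r6,r1,r4
       movs r9,r3,asr 31
       movpl r9,1
       rsbmi r3,r3,0
       movs r8,r4,asr 31
       movpl r8,1
       rsbpl r4,r4,0
       add r7,r3,r4
loop:  bl putpixel
       cmp r5,r0
       cmpeq r6,r1
       ldmfdeq r13!,{r0-r9,r15}
       cmp r4,r7,lsl 1
       addlt r7,r7,r4
       addlt r0,r0,r9
       cmp r3,r7,lsl 1
       addgt r7,r7,r3
       addgt r1,r1,r8
       b loop
"""


def count_instructions(listing):
    """Count lines that still hold an instruction once labels and comments are removed."""
    count = 0
    for line in listing.splitlines():
        code = line.split(";", 1)[0].strip()      # drop any trailing comment
        if ":" in code:                           # drop a leading label
            code = code.split(":", 1)[1].strip()
        if code:
            count += 1
    return count


def arm_code_size(listing, bytes_per_instruction=4):
    """Size in bytes, assuming fixed-width 32-bit ARM encoding."""
    return count_instructions(listing) * bytes_per_instruction


print(count_instructions(LISTING), arm_code_size(LISTING))
```

The same helper can be pointed at the earlier listings in the thread to compare instruction counts.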

I'd be interested to see that implemented in thumb mode.

litwr · 28 October 2018, 16:15

Quote:

Originally Posted by chb

Well, my point was that in terms of absolute performance (not per clock) ARM CPUs were never among the fastest of their time. I guess that wasn't their goal anyway.

Indeed, systems with a 486@25MHz or 68040@20MHz were faster than ARM-based ones at 12MHz, but their prices in '90 or '91 were above $10000. We can introduce a new measure - performance per dollar. By that measure ARM systems were much better too.

@plasmab Indeed, it is 84, the same as the code for x86.

EDIT. The 80 bytes are real - https://stardot.org.uk/forums/viewto...=15941#p218910

plasmab · 28 October 2018, 16:48

Quote:

Originally Posted by litwr

Indeed, systems with a 486@25MHz or 68040@20MHz were faster than ARM-based ones at 12MHz, but their prices in '90 or '91 were above $10000. We can introduce a new measure - performance per dollar. By that measure ARM systems were much better too.

@plasmab Indeed, it is 84, the same as the code for x86.

EDIT. The 80 bytes are real - https://stardot.org.uk/forums/viewto...=15941#p218910

The Acorn A4000 was a 12 MHz ARM250 machine. It was released in 1992 and cost 999 GBP. I have one of these machines.. Very simple.. IDE interface on board. Max 4 MB RAM.

The A310 was released in 1987, ran at 8 MHz and was also under £1000.

grond · 28 October 2018, 17:53

Quote:

Originally Posted by plasmab

Having implemented an ARMv2 CPU in Verilog, I can say there isn't anything to microcode really.

Well, ldm, stm, mul and divs all do well with microcode. Not much, but not nothing either. Thank God, or rather Sophie Wilson, that they did use microcode and did not insist on the programmer using a bunch of primitive instructions to implement these operations in macrocode...

I think MIPS and also the earlier SPARC required the programmer to use a sequence of shift-add/subtract instructions in order to perform multiplications and divisions.
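
For anyone who hasn't seen it, the shift-add approach to multiplication looks roughly like this. It is the generic textbook technique written out in Python for illustration, not the sequence any particular MIPS or SPARC toolchain emitted; the function name and the 32-bit width are assumptions.

```python
def shift_add_multiply(a, b, bits=32):
    """Multiply two unsigned values using only shifts, adds and bit tests."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    result = 0
    while b:
        if b & 1:                        # lowest multiplier bit set?
            result = (result + a) & mask # add the shifted multiplicand
        a = (a << 1) & mask              # multiplicand * 2
        b >>= 1                          # move on to the next multiplier bit
    return result


print(shift_add_multiply(6, 7))
```

Each loop iteration corresponds to the handful of primitive instructions a programmer would have to spell out by hand, which is exactly the work a looping mul instruction hides.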


