English Amiga Board


Old 19 December 2018, 10:12   #1001
JimDrew
Registered User
 
Join Date: Dec 2013
Location: Lake Havasu City, AZ
Posts: 741
Quote:
...Over time full-software virtualization was available, but a ROM image was still necessary. Example virtualization software include ShapeShifter (not to be confused with the third party preference pane ShapeShifter), later superseded by Basilisk II (both by the same programmer who conceived SheepShaver, Christian Bauer), Fusion and iFusion (the latter ran classic Mac OS by using a PowerPC "coprocessor" accelerator card).
Virtual machines provide equal or faster speed than a Macintosh with the same processor, especially with respect to the m68k series due to real Macs running in MMU trap mode, hampering performance...
Wow... whoever wrote this has no clue what they are talking about... FUSION is a lot faster than the equiv. speed Mac because of me replacing MacOS traps with specialized code, and letting the Amiga handle certain things. The Mac is not some victim held hostage by the MMU. In fact, FUSION is much faster with the MMU enabled because memory used for video storage can be mapped (4K pages) so a dirty bit is set when the Mac changes the video memory. Checking a bit for every 4K page (and updating as necessary) is way faster than blindly updating all of the video memory every refresh cycle! The MMU can remap the lower (CHIP) memory into fast memory as well to increase performance.
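The dirty-page trick described here is easy to sketch in a toy model (a hypothetical Python model of the idea, not FUSION's actual implementation): the MMU sets a dirty bit when a 4K page is written, and the refresh loop copies only the marked pages instead of the whole framebuffer.

```python
PAGE_SIZE = 4096  # MMU page granularity used for the video RAM

class VideoPages:
    """Toy model of MMU dirty-bit tracking for emulated video memory."""
    def __init__(self, size):
        self.mem = bytearray(size)
        self.dirty = [False] * (size // PAGE_SIZE)

    def write(self, addr, value):
        self.mem[addr] = value
        self.dirty[addr // PAGE_SIZE] = True  # the MMU would set this bit in hardware

    def refresh(self, blit):
        """Copy only the pages the guest actually touched since the last refresh."""
        copied = 0
        for page, is_dirty in enumerate(self.dirty):
            if is_dirty:
                start = page * PAGE_SIZE
                blit(start, self.mem[start:start + PAGE_SIZE])
                self.dirty[page] = False
                copied += 1
        return copied  # pages copied instead of the whole framebuffer

v = VideoPages(16 * PAGE_SIZE)
v.write(5000, 0xFF)                       # guest touches one pixel
copied = v.refresh(lambda addr, data: None)
# only 1 of the 16 pages needs to be copied
```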

I happen to know the x86 and 68K extremely well, having written a full 68040 emulator for the x86, and an x86 emulator for the 68K - both of which were written in assembly for their native CPUs. I also wrote iFUSION for the PPC. You can certainly kill an x86 program by having an INT function change a segment register. This is how some of the first virus programs worked. Calling a segment register an "MMU" is like calling an Atari ST an Amiga.

Litwr - It seems that you are a proponent of ARM and DOS, and are a bit blind to the facts that people (like Meynaf) are stating here. This is the wrong crowd to be arguing with on this particular subject - there are those here that have extensive backgrounds in CPU design and implementation. My expertise is in microcode level emulation of CPUs, with a history of working on CPU core projects for Motorola - besides all of the various software based emulations I have done for different computer systems. Maybe you are getting ARM and PPC confused, because that is one CPU where you can do bit manipulations (like invoke a shift) as an extended instruction operand. My 2 cents...

Last edited by JimDrew; 19 December 2018 at 23:31. Reason: Lots of typos... smart device... sheesh!
Old 23 January 2021, 18:46   #1002
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
68k details (round #2)

Two years ago we had a fantastic thread which reached a magic number of posts - 1001.
Sorry, I was very busy almost all this time and could not find time to continue this discussion. However, I have gathered information for its continuation. I have made a lot of corrections in my blog entry about the 68k - I am sure that even people with good knowledge of the 68k can find several new pieces of information there.
I don't reply to some statements which seem wrong to me because I am not sure that the people who made them are still here. However, if they appear at EAB I will write responses. I would be very glad if meynaf could attend this new round again. His stubborn, passionate position has impressed me very much.
I can only dare to touch on two topics now:
1) the role of MMU;
2) the code density and performance.
It is well known that the main function of an MMU is memory relocation. Memory protection is not necessary if we have correct software; it is rather a luxury which has become cheap. The MMU's relocation ability makes it easy to implement the fork() call. The 8086 has such relocation functionality, albeit in very poor form, but the 68000 doesn't have any of it. BTW the MMUs in the Commodore 128 or Apple III only provide relocation functionality. The MMU can also provide virtual memory support, but that is a third function, which was outside our discussion.
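As an illustration of the relocation argument (a toy Python model; all names here are invented for the example): a single per-process base register is enough to run identical images at different physical addresses, which is what makes a naive fork() cheap.

```python
class Process:
    """Toy base-register relocation: virtual address 0 maps to physical `base`."""
    def __init__(self, memory, base):
        self.memory, self.base = memory, base

    def poke(self, vaddr, value):
        self.memory[self.base + vaddr] = value

    def peek(self, vaddr):
        return self.memory[self.base + vaddr]

ram = bytearray(64)
parent = Process(ram, base=0)
parent.poke(3, 42)

# fork(): copy the image and simply point the child's base register elsewhere;
# the code itself needs no fixups, because it only uses virtual addresses.
ram[32:48] = ram[0:16]
child = Process(ram, base=32)
# child sees the same virtual addresses backed by different physical memory
```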
I have just finished my project, which I use as a source of experience with the Amiga and 68k, and as a source of data for performance and code density comparisons. It is on Aminet; there are also videos on YouTube, GIFs, and some statistics.
The results show that for extensive work with tables and bits the 68000 is only slightly faster than the 8086 and for this type of processing the 68020 is much slower than the 80286. The code density of the 68000 is slightly worse than for the 8086. My conclusion is the x86 has slightly better code density in real mode and slightly worse in protected mode than the 68k.
I am just the truth seeker. I hope my materials help us to understand the 68k better. Thank you.
Old 23 January 2021, 18:59   #1003
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,423
Quote:
Originally Posted by litwr View Post
Two years ago we had a fantastic thread
I wouldn't qualify that thread as fantastic. More as a massive dumpster fire which I sadly put far too much effort into. Won't be making that mistake twice.
Quote:

1) the role of MMU;
This is a short one to talk about: the 68000 was designed to be coupled to an external MMU (the Motorola 68451) in case such functionality was desired, so it does not have any MMU instructions of its own. This is common knowledge and, frankly, I don't buy for a second that you didn't already know this.
Quote:
I am just the truth seeker. I hope my materials help us to understand the 68k better. Thank you.
It would be nice if this time round you actually did want the truth, as you certainly didn't want the truth last time.

And I want to make sure everyone in this thread understands that before it even starts: last time you were very, very clearly only interested in proving that the 8086 was better than the 68000 (as well as the 80286 being better than the 68020, etc). Anything that proved the opposite you either ignored or claimed to be false/irrelevant without a shred of evidence supporting you.

Last edited by roondar; 23 January 2021 at 22:13. Reason: Spelling
Old 23 January 2021, 19:59   #1004
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by litwr View Post
It is well known that the main function of an MMU is memory relocation. Memory protection is not necessary if we have correct software; it is rather a luxury which has become cheap. The MMU's relocation ability makes it easy to implement the fork() call. The 8086 has such relocation functionality, albeit in very poor form, but the 68000 doesn't have any of it.
Which operating systems that ran on the 8086 made use of fork()? Which operating systems that ran on the 8086 used relocation?

BTW, it's rather trivial on the 68000 to write relocatable code if you limit yourself to 32k per segment (similar to the 8086) and make all memory accesses relative to the PC or an address register. AFAIK on classic MacOS, programs were completely relocatable using a similar technique (A5 being the segment base register). Oh, and then code is perfectly relocatable to any address, not just multiples of 64k.

Quote:
Originally Posted by litwr View Post
The results show that for extensive work with tables and bits the 68000 is only slightly faster than the 8086 and for this type of processing the 68020 is much slower than the 80286. The code density of the 68000 is slightly worse than for the 8086. My conclusion is the x86 has slightly better code density in real mode and slightly worse in protected mode than the 68k.
That's a deeply flawed comparison, as you're comparing systems, not CPUs, but still pretending to do the latter. You mention in your very own text that the 68020's speed is very much dependent on memory performance. Furthermore, it's of no use to just calculate a generations/MHz figure, because RAM speed does not scale with CPU clock speed. An 80286@16 MHz wouldn't be 2.5x faster than an 80286@6 MHz for that very reason.

Quote:
Originally Posted by litwr View Post
I am just the truth seeker. I hope my materials help us to understand the 68k better.
Honestly, I have my doubts.

Last edited by chb; 23 January 2021 at 20:05.
Old 23 January 2021, 20:53   #1005
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
Quote:
Originally Posted by litwr View Post
I would be very glad if meynaf could attend this new round again.
Why would I ? You will not change your mind if i tell you that you're wrong, will you ?


Quote:
Originally Posted by litwr View Post
His stubborn passionate position has impressed me very much.
I don't think I was the stubborn one there. Nor passionate, actually.
So unless you fancy a multi-CPU code contest, nothing of interest will happen.


Quote:
Originally Posted by litwr View Post
The results show that for extensive work with tables and bits the 68000 is only slightly faster than the 8086 and for this type of processing the 68020 is much slower than the 80286.
Nah, 68020 isn't slower than the 80286, let alone much slower. Actually, even 68000 is faster than similarly clocked 80286.


Quote:
Originally Posted by litwr View Post
The code density of the 68000 is slightly worse than for the 8086. My conclusion is the x86 has slightly better code density in real mode and slightly worse in protected mode than the 68k.
It's not that simple.
For very small programs, yes, 68000 is slightly worse (sometimes). But the bigger the program becomes, the worse the x86's code density is, and it can easily reach 1.5 times the code size of similar 68k program.


Quote:
Originally Posted by litwr View Post
I am just the truth seeker.
To me it looks more like bashing.


Quote:
Originally Posted by litwr View Post
I hope my materials help us to understand the 68k better. Thank you.
We have enough 68k understanding here.
I am also afraid that the goal of your material isn't really to understand...

The situation is quite simple. If you say that the x86 has better implementation than 68k, then it might well be true. But if you're saying it has better instruction set, then sorry but this is just plain wrong.
Old 23 January 2021, 21:06   #1006
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,247
Quote:
Originally Posted by litwr View Post
It is well known that the main function of MMU is the memory relocation.
No, there is much more to it. Memory relocation is one thing it can perform, but it also controls (optional) write access to pages, defines the caching mode of pages so that chip or I/O registers are not cacheable, and is able to detect illegal accesses.
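These extra functions can be pictured as per-page attribute bits. A toy model (illustrative only, not the descriptor layout of any real 68k-era MMU):

```python
# Per-page attribute flags, loosely modelling what an MMU page descriptor holds
WRITABLE, CACHEABLE, PRESENT = 1, 2, 4

page_attrs = {
    0x0000: PRESENT | WRITABLE | CACHEABLE,  # ordinary RAM
    0x1000: PRESENT | CACHEABLE,             # ROM: read-only
    0x2000: PRESENT | WRITABLE,              # chip/I-O registers: never cached
}

def check_access(addr, is_write):
    """Return 'ok' or the fault type the MMU would raise for this access."""
    attrs = page_attrs.get(addr & ~0xFFF, 0)   # look up the 4K page's descriptor
    if not attrs & PRESENT:
        return "bus error"                     # illegal access detected
    if is_write and not attrs & WRITABLE:
        return "write protect fault"           # write to a protected page
    return "ok"
```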


Quote:
Originally Posted by litwr View Post
The memory protection is not necessary if we have correct software, it is rather a luxury which become cheap.
We don't have correct software. Humans make errors. Tools like MuForce do not exist without reason. They help you to improve the correctness of software, so to say, by reporting illegal accesses that would otherwise go into the wild.



Quote:
Originally Posted by litwr View Post

The MMU relocation ability allows to make the fork()-call easily.
On *nix-type operating systems, yes. On the Amiga, we don't have fork(), and on Windows, we don't have fork(). Multi-threading works quite differently.
Old 24 January 2021, 00:57   #1007
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,154
It would be interesting to port the xlife program to an abstract hardware platform which offers just a UART to receive the pattern, a framebuffer to display it and a timer to time it - removing all other system-specific details from the equation.

It's then easy to compile the codebase as a virtual ROM image for various CPUs and compare the resulting sizes, without them being skewed by differing platform-specific system code. (I did something similar in the past - though this, too, is flawed: I only tested with one codebase, and I'm testing the compiler's code generation as much as the actual ISA. Doing science right is hard! http://retroramblings.net/?p=1414 )

To make a meaningful speed comparison it's necessary to time the computation in isolation, to avoid the results once again being skewed by the platform-specific display code - but even then you have to understand that you're testing the CPU and memory bus of a particular computer, and not the CPU itself.
Old 25 January 2021, 15:25   #1008
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by roondar View Post
I wouldn't qualify that thread as fantastic. More as a massive dumpster fire which I sadly put far too much effort into. Won't be making that mistake twice.
But how could the fantastic number 1001 appear then? Such things can't just be accidental.

Quote:
Originally Posted by roondar View Post
This is a short one to talk about, the 68000 was designed to be coupled to an external MMU (the Motorola 68451) in case such functionality was desired so it does not have any MMU instructions of it's own. This is common knowledge and, frankly, I don't buy for a second you didn't already know this.
https://apple.fandom.com/wiki/Motorola_68451 clearly states that the 68451 was produced for use with the Motorola 68010 processor. So it is about 1982 and later. Most popular systems that used an MMU didn't use this chip - I mean the Apple Lisa, Tandy 16, Sun workstations, Apollo Computer, ... So I don't completely understand what you are getting at. I have also just read some details about the 68451 - it was a fairly simple segmented MMU. IMHO Moto made it too late, and it was probably expensive or not available in volume.

Quote:
Originally Posted by roondar View Post
It would be nice if this time round you actually did want the truth, as you certainly didn't want the truth last time.

And I want to make sure everyone in this thread understands that before it even starts: last time you were very, very clearly only interested in proving that the 8086 was better than the 68000 (as well as the 80286 being better than the 68020, etc). Anything that proved the opposite you either ignored or claimed to be false/irrelevant without a shred of evidence supporting you.
I am really sorry that you perceived my position this way. I just tried to show that the 8086 has its strong points too. Maybe my point that the main virtue of any CPU is its speed can irritate some old-school theory followers. Sorry again. But it is just my position; anybody can have another. However, even the 68k enthusiasts rather accepted the transition to the PPC architecture, because its higher performance allowed it to emulate the 68k instructions...

Quote:
Originally Posted by chb View Post
Which operating systems that ran on the 8086 made use of fork()? Which operating systems that ran on the 8086 used relocation?

BTW, it's rather trivial on the 68000 to write relocatable code if you limit yourself to 32k per segment (similar to the 8086) and make all memory accesses relative to pc or an address register. AFAIK on the classic MacOS programs were completely relocatable using a similar technique (a5 being the segment base register). Oh, and then code is perfectly relocatable to any address, not just multiples of 64k.
The 8086 was designed in an epoch when the PDP-11 minicomputers were the most popular computers in the world. The PDP-11 architecture can use only 64 KB for direct addressing. The best PDP-11 models can use two 64 KB segments, one for data and one for code, expanding the directly addressable area to 128 KB. Those models can also use an 18- or 22-bit address bus, but that was rather intended for multitasking, where each process is limited to 64 or 128 KB. It was possible to use code and data larger than 64 KB in one process, but this feature was much less flexible than the 8086's segment register usage. So the 8086 was very similar to the PDP-11 in the way it works with memory. Sorry, I know little about Unix variants for the 8086. Indeed, segment registers could help realize a semi-Unix, but they were not a complete MMU. However, there were several such variants, including a variant of the famous Xenix. But MS-DOS uses relocation very effectively for its COM files. We discussed this in detail in the previous round. Even for the EXE format, segments can make relocation easier. BTW, if you want a cheap and effective MMU, you just need two segment registers: one for code and one for data. This was used very successfully on the most mass-produced Unix computer, the Tandy 16, which was based on the 68000. The Apple Lisa seems to have had the same type of MMU too, but I am not sure. You don't have to think about relocating your code if you have an MMU in your system. Indeed, better position independence can make object files smaller and linker work faster, but these are rather minor advantages. The Macintosh II and later models therefore used an MMU.
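The real-mode relocation mentioned here is plain segment arithmetic: the physical address is segment*16 + offset, so a COM image runs unchanged wherever DOS places its segment. A quick sketch of that address formation:

```python
def physical(segment, offset):
    """8086 real-mode address formation: 20-bit bus, 16-byte paragraphs."""
    return ((segment << 4) + offset) & 0xFFFFF

# A COM program is always entered at offset 0x100 of whatever segment DOS
# picks; the code in the file never changes, only CS/DS do.
load_a = physical(0x1000, 0x0100)   # one run of the same image
load_b = physical(0x2345, 0x0100)   # another run, relocated for free
```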

Quote:
Originally Posted by chb View Post
That's a deeply flawed comparison, as you're comparing systems, not CPUs, but still pretending to do the latter. You mention it in your very own text that the 68020's speed is very much dependent on memory performance. Furthermore, it's of no use to just calculate a generations/MHz figure, because RAM speed does not scale with CPU clock speed. A 80286@16 MHz wouldn't be 2.5x faster than the 80285@6 MHz for that very reason.
Why wouldn't an 80286@16 MHz be 2.5x faster than an 80286@6 MHz if it has fast enough RAM?! IIRC the A1200 doesn't have wait states during memory access. The Xlife-8 blank screen mode, which was used to get the final results about CPU performance, doesn't use any graphics or I/O operations.

Quote:
Originally Posted by chb View Post
Honestly, I have my doubts.
I hope to dispel them.

Quote:
Originally Posted by meynaf View Post
Why would I ? You will not change your mind if i tell you that you're wrong, will you ?
I am really happy that you are here! You are a good coder, and that covers other disadvantages.

Quote:
Originally Posted by meynaf View Post
So unless you're fancy a multi-cpu code contest, nothing of interest will happen.
It would be nice to have an opportunity to have such a contest.

Quote:
Originally Posted by meynaf View Post
Nah, 68020 isn't slower than the 80286, let alone much slower. Actually, even 68000 is faster than similarly clocked 80286.
Maybe for some artificially created tasks it can be true. If we constantly need to process a seven-component 32-bit vector or often access a large array, then the 68000 can be a bit faster. It is like the case where, for several contrived tasks, the Z80 can be faster than the 6502, though it is generally accepted that the 6502 is about 2.5 times faster clock-for-clock. For real tasks like your line drawing algo, my pi-spigot or Xlife-8, 68k speed superiority is a clear fantasy. We proved this in our previous round. Maybe you have other algos to implement? For a contest?

Quote:
Originally Posted by meynaf View Post
It's not that simple. For very small programs, yes, 68000 is slightly worse (sometimes). But the bigger the program becomes, the worse the x86's code density is, and it can easily reach 1.5 times the code size of similar 68k program.
I agree that it is not an easy subject. But I can't understand why larger programs for the x86 GENERALLY ought to have worse code density; to me, cases where x86 code is 1.5x larger are rather examples of poor x86 coding. Maybe somebody, for some kind of simplicity or out of personal laziness, always loads a segment register for every variable access... Let me also give you a quote: "One drawback of linear addressing is that the code density of the actual machine instructions may suffer from the presence of many long addresses (the top byte of which is nearly always 0). This problem is alleviated somewhat by short (16-bit) relative addressing modes. However, UNIX implementations prefer separate instruction and data areas, and, consequently, the relative addressing modes are used infrequently" - it is from the solid Byte magazine - http://marc.retronik.fr/motorola/68K...d_the_MC68000_[BYTE_1986_12p].pdf

Quote:
Originally Posted by meynaf View Post
To me it looks more like bashing.
It is sad to hear this from you. We have had some contests...

Quote:
Originally Posted by meynaf View Post
We have enough 68k understanding here. I am also afraid that the goal of your material isn't really to understand...
Nobody is perfect; I dare to think that can be applied even to you.

Quote:
Originally Posted by meynaf View Post
The situation is quite simple. If you say that the x86 has better implementation than 68k, then it might well be true. But if you're saying it has better instruction set, then sorry but this is just plain wrong.
My claim has always been different. I have stated that the 68k instruction set has several flaws, which I try to explain as clearly as possible in my updated blog entry. So there is no direct connection to the x86 architecture. However, some of those flaws are not actually present in the x86 instruction set. Someone could conclude that the x86 is therefore better, but that is not my claim. We all know very well the 68000's advantages over the 8086: a larger number of registers, faster work with big arrays, faster 32-bit arithmetic, some useful addressing modes, ... However, the 80286's advantages were too strong: a faster memory access cycle, instant EA calculations, fast multiplication, super-fast division, a built-in MMU. The 80286 became about 3 times faster than the 8086. It was a great leap. The 68020 became only about 2 times faster than the 68000. IIRC we have discussed all this.

Now it is time for my reply to your 2-year old statements.
Quote:
Originally Posted by meynaf View Post
But as soon as you want to bypass the 64k limit, you face the horror of it. Code is no longer position-independent like it used to be. Do a CALL FAR : you need a reloc.
In short, x86 segments were a nice idea only on the paper. In reality, they sucked and even intel understood this.
The idea of segmented memory is quite good; I added several paragraphs to my blog entry - https://litwr.livejournal.com/436.html - to explain this in detail. However, it was fully realized only in the 80286. Xenix, which used it, was quite successful among rich customers. However, MS-DOS was much cheaper, and this made it much more popular and made our history. IMHO Intel's people couldn't imagine that 1 MB of memory would be so cheap in the mid-80s. It is also an irony that the 68k was intended primarily for use in expensive mini-computers but finished as a calculator chip in the 90s.

Quote:
Originally Posted by meynaf View Post
Reading SR can't affect user code, that's true.
But supervisor code running in a VM as user code can be badly affected. This has been explained to you but you refuse to understand.
Normal supervisor code just sets modes; it doesn't need to check them. So it is all contrived and far-fetched.

Quote:
Originally Posted by meynaf View Post
Alas for you this doesn't work the way you think it does. Seems this "ARM shift effect" is your invention.

What the above code does in reality isn't what you believe it does, and is in fact even meaningless : you write with "decrement before" but read back with "increment before" and thus the block is meaningless because you don't read what you have written.

Real effect of the above (contents of memory) :
(R1 after write points here)
(long) written R2
(long) written R3
(long) written R4
(initial R1 points here, end R1 too)
(long) read R2
(long) read R3
(long) read R4

So of course there is no such effect as a shift you pretended.
Thanks, this is a case where I can clearly show you that you are in error.
Let's execute the STMDB R1!,{R2,R3,R4} instruction. Let's assume that R1 has the value 60. So at first R1 becomes equal to 56. Then R2 will be placed at location 56, then R1 will become equal to 52, then R3 will be placed at location 52, then R1 will become equal to 48, and finally R4 will be placed at location 48. This instruction is done. Let's check LDMIB R1!,{R2,R3,R4}. At first R1 becomes equal to 52. Then R4 will take the value from location 52 (here lies the value of R3), then R1 will become equal to 56, then R3 will take the value from location 56 (here lies the value of R2), then R1 will become equal to 60, and finally R2 will get the value from location 60. So we have an obvious register shift: R3 gets the value of R2 and R4 gets the value of R3. If you want a perfect cyclic shift, just add one more instruction:

STR R4,[R1]
STMDB R1!,{R2,R3,R4}
LDMIB R1!,{R2,R3,R4}

Though we can do such a trick with the 68k too:

MOVE.L D2,-(SP)
MOVEM.L D2/D3/D4,-(SP)
ADDA.L #4,SP
MOVEM.L (SP)+,D2/D3/D4

So my example doesn't show anything very interesting, and therefore I was rather wrong too. However, it is clear that the ARM has more flexible instructions than MOVEM.
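A small Python model of STMDB/LDMIB (assuming the ordering documented in the ARM architecture manual, where the lowest-numbered register always goes to the lowest address, regardless of the DB/IB mode) lets anyone check the register-shift effect for themselves:

```python
def stmdb(mem, base, values):
    """STMDB base!,{regs}: lowest-numbered register stored at the lowest address."""
    base -= 4 * len(values)          # decrement-before: reserve the block first
    for i, v in enumerate(values):
        mem[base + 4 * i] = v
    return base                      # writeback value of the base register

def ldmib(mem, base, count):
    """LDMIB base!,{regs}: pre-increment loads, lowest register from lowest address."""
    values = [mem[base + 4 * (i + 1)] for i in range(count)]
    return base + 4 * count, values

mem = {60: "old@60"}                          # whatever lay above the block
r1 = stmdb(mem, 60, ["R2", "R3", "R4"])       # r1 -> 48; mem[48,52,56] = R2,R3,R4
r1, (r2, r3, r4) = ldmib(mem, r1, 3)          # reads mem[52], mem[56], mem[60]
# result: r2 gets old R3, r3 gets old R4 - a shift toward the lower registers
```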

I couldn't persuade you that the 68040 was slightly slower than the 80486. Let's analyze the data. Check http://www.lowendmac.com/benchmarks/ - it gives a 68040:68030 performance ratio close to 2.1:1. It is well known that the 80486:80386 performance ratio is close to 2-2.5:1. The 80386 is slightly faster than the 68030 because of its faster memory access cycle, instant EA calculation, and faster division and multiplication. So there is only one conclusion... BTW, the PowerPC 601 was at least 1.4 times faster than the 68040 at the same frequency (2.9x for the math co-pro) according to the lowendmac benchmarks.

You also doubted that the ARM was slightly faster than the 80486. Let's check this - http://www.cpushack.com/CPU/cpu4.html#Sec4Part9 I can also add that the ARM could run DOOM much as the 80486 could.

Quote:
Originally Posted by Thomas Richter View Post
No, there is much more about it. Memory relocation is one thing it can perform, but it also provides (optional) write-access to pages, defines the caching mode of pages such that chip or I/O registers are not cachable and is able to detect illegal accesses.
Thank you for the details concerning caching. Indeed, it is very important too, especially on modern fast systems.

Quote:
Originally Posted by Thomas Richter View Post
We don't have correct software. Humans make errors. Tools like MuForce do not exist without reason. They help you to improve the correctness of software, so to say, by reporting illegal accesses that would go into the wild otherwise.
Well but let's apply this logic to Amigas.

Quote:
Originally Posted by Thomas Richter View Post
On ix-type operating sytems, yes. On the amiga, we don't have fork(), and on windows, we don't have fork(). Multi-threading works quite differently.
Sorry, I know little about the way Microsoft Windows implements multitasking, but historically, for 68k-based systems, the fork() call was a top priority.

Quote:
Originally Posted by robinsonb5 View Post
It would be interesting to port the xlife program to an abstract hardware platform which offers just a UART to receive the pattern, a framebuffer to display it and a timer to time it - removing all other system-specific details from the equation.

It's then easy to compile the codebase as a virtual ROM image for various CPUs and compare the resulting sizes, without them being skewed by differing platform-specific system code. (I did something similar in the past - though this, too, is flawed: I only tested with one codebase, and I'm testing the compiler's code generation as much as the actual ISA. Doing science right is hard! http://retroramblings.net/?p=1414 )

To make a meaningful speed comparison it's necessary to time the computation in isolation, to avoid the results once again being skewed by the platform-specific display code - but even then you have to understand that you're testing the CPU and memory bus of a particular computer, and not the CPU itself.
The sources of Xlife are open, so it is not very difficult. Xlife depends only on stdlibc++ and Xlib. We just need to remove all graphics and I/O. Things might be easier if we use an older, simpler version of Xlife. Xlife-8 can be handled this way too.

Thanks for the link! It is interesting that the result for the 68k is exactly between the x86 and x86-64.

There is a screen blank mode for this case. It is pure CPU calculation, nothing platform-specific - the ER values are calculated only from results obtained in this mode.
Old 25 January 2021, 15:55   #1009
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,423
Quote:
Originally Posted by litwr View Post
But how could a fantastic number 1001 appear then? Such things can't be just accidental.
Mainly because you consistently refused to accept any position or facts you didn't agree with, kept moving the goalposts, and kept saying things that needed to be countered to keep some level of truth in that thread.

A less cynical poster than me might just say that you were rather expertly trolling some of us (myself included).
Quote:
https://apple.fandom.com/wiki/Motorola_68451 clearly states that the 68451 was produced for use with the Motorola 68010 processor.
Meanwhile, other sources don't agree with this. Here's a quote from Wikipedia (which doesn't make the claim that it's just for the 68010):
Quote:
Originally Posted by wikipedia
The MC68451 supported a 16 MB address space and provided a MC68000 or a MC68010 with support for memory management and protection of memory against unauthorized access.
And here's a link to the original data sheet for the MC68451 by Motorola, which doesn't even mention the 68010 (indicating it was probably written prior to its release).

https://www.datasheetarchive.com/pdf...M&term=MC68451

Sadly, Google doesn't appear to be able to pinpoint the exact year in which the 68451 was released. That would've been useful information in this regard, but I can't find it.
Quote:
I am really sorry that you accepted my position this way. I just tried to show that the 8086 has its strong points too.
Which has no relation to "68000 details" at all. This rather neatly proves my point. Thank you for finally admitting your original thread never was about learning interesting facts about the 68000 for you to begin with.
Old 25 January 2021, 16:46   #1010
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,043
Quote:
MOVE.L D2,-(SP)
MOVEM.L D2/D3/D4,-(SP)
ADDA.L #4,SP
MOVEM.L (SP)+,D2/D3/D4
What?

If you want to solve that problem with a shovel:
Code:
  MOVE.L D2,-(SP)
  MOVEM.L D3/D4,-(SP)
  MOVEM.L (SP)+,D2/D3/D4
Or simply, the M68K way:
Code:
  exg d2,d3
  exg d3,d4
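The two EXG instructions perform the rotate as a pair of swaps; modelled in Python (hypothetical register values for illustration):

```python
def exg(a, b):
    """Model of the 68k EXG instruction: swap the contents of two registers."""
    return b, a

d2, d3, d4 = "A", "B", "C"
d2, d3 = exg(d2, d3)   # exg d2,d3 -> d2 = old d3, d3 = old d2
d3, d4 = exg(d3, d4)   # exg d3,d4 -> d3 = old d4, d4 = old d2
# net effect: (d2, d3, d4) = ("B", "C", "A"), a cyclic rotate in two swaps
```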
Old 25 January 2021, 17:06   #1011
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
Quote:
Originally Posted by litwr View Post
I am really happy that you here! You are a good coder and it covers other disadvantages.
Sigh.


Quote:
Originally Posted by litwr View Post
It would be nice to have an opportunity to have such a contest.
IIRC last time you said you didn't have enough time to write the code.


Quote:
Originally Posted by litwr View Post
Maybe for some artificially created tasks it can be true. If we need constantly to process a seven 32-bit component vector or to do often access to a large array then the 68000 can be a bit faster.
68000 is faster for the simple reason it has superior instruction set and addressing modes - and more registers.


Quote:
Originally Posted by litwr View Post
It is like the case when for several contrived tasks the Z80 can be faster than the 6502 but it is generally accepted that the 6502 is about 2.5 times faster.
6502 faster than z80 ? First time i read this. Looks false, z80 has more complete instruction set. A friend's Amstrad CPC also was quite faster than my old Oric.


Quote:
Originally Posted by litwr View Post
For real tasks like your line drawing algo, my pi-spigot or Xlife-8 the 68k speed superiority is a clear fantasy. We have proved this in our previous round.
For your pi-spigot you did outright cheating, my line drawing code was clearly better as written for 68k, and your xlife-8 i haven't had the time to check but you seem to have benchmarked your own code - not well written i assume (and perhaps intentionally).


Quote:
Originally Posted by litwr View Post
Maybe you have other algos to implement? For a contest?
That's very possible. But we need something big enough for this.


Quote:
Originally Posted by litwr View Post
I agree that it is not an easy subject. But I can't understand why larger programs for the x86 GENERALLY ought to have worse code density, for me cases when x86 code is 1.5 larger are rather examples of poor x86 coding. Maybe somebody always for some kind of simplicity or personal laziness loads a segment register for every variable access...
x86 has less registers and not enough flexibility in addressing modes. Consequently it takes more data moving. As simple as that.


Quote:
Originally Posted by litwr View Post
Let me also give you a cite "One drawback of linear addressing is that the code density of the actual machine instructions may suffer from the presence of many long addresses (the top byte of which is nearly always 0). This problem is alleviated somewhat by short (16-bit) relative addressing modes.
This is also why x86 in protected mode has even worse code density.


Quote:
Originally Posted by litwr View Post
However, UNIX implementations prefer separate instruction and data areas, and, consequently, the relative addressing modes are used infrequently" - it from a solid Byte magazine - http://marc.retronik.fr/motorola/68K...d_the_MC68000_[BYTE_1986_12p].pdf
Your link is broken (error 404).


Quote:
Originally Posted by litwr View Post
It is sad to hear this from you. We have had some contests...
What could i say. You wrote so many things in the past that are just wrong. And it seems your blog didn't change much since.


Quote:
Originally Posted by litwr View Post
Nobody is perfect I even dare to think it can be applied even to you.
Listen, pal. I don't think there is a lot of people who know the 68k shortcomings better than i do. It's not for nothing i have written my own instruction set and implemented it in a vm. You can't call me biased toward 68k or something like this. Nor can you charge me of lack of knowledge. Too much experience, you see.


Quote:
Originally Posted by litwr View Post
My claim always is different. I have stated that the 68k instruction set has several flaws which I try to explain as clearly as possible in my updated blog entry. So there is no direct connection to the x86 architecture. However some of those flaws are not actually present in the x86 instruction set. Someone can conclude that the x86 is better therefore, but it is not my claim. It is because we all know very well about the 68000 advantages over the 8086: larger number of registers, faster work with big arrays, faster 32-bit arithmetic, some useful addressing modes, ... However the 80286 advantages were too strong: faster memory access cycle, instant EA calculations, fast multiplications, super-fast division, built-in MMU. The 80286 became about 3 times faster than the 8086. It was a great leap. The 68020 became only about 2 times faster than the 68000. IIRC we have discussed all this matter.
Memory access speed, EA calculation time, speed of mul & div, are all purely speed issues and have absolutely nothing to do with instruction set qualities.
And also remember that 80286 is a buggy cpu - once you enter the protected mode it's impossible to leave it.


Quote:
Originally Posted by litwr View Post
Now it is time for my reply to your 2-year old statements.
I don't even remember where it was written...


Quote:
Originally Posted by litwr View Post
The idea of segment memory is quite good, I added several paragraphs to my blog entry - https://litwr.livejournal.com/436.html - to explain this in detail. However it was fully realized only in the 80286. Xenix which used it was quite successful for rich customers. However MS-DOS was much cheaper, which made it much more popular and made our history. IMHO Intel's ppl couldn't imagine that 1 MB of memory would be so cheap in the mid 80s.
If the idea of segment memory is quite good, would you defend replacing all linear 64-bit addresses in modern programs by segments with 32-bit offsets ?
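For readers who haven't done real-mode x86: a segment register there is just a 16-byte paragraph number, so the linear address is segment*16 + offset, and many different segment:offset pairs alias the same byte. A quick sketch (illustrative only, the function name is mine):

```python
def real_mode_linear(segment, offset):
    # 8086 real mode: segment selects a 16-byte paragraph,
    # result wraps within the 20-bit (1 MB) address space.
    return ((segment << 4) + offset) & 0xFFFFF

# Different segment:offset pairs can hit the same physical byte:
print(hex(real_mode_linear(0x1234, 0x0005)))   # -> 0x12345
print(hex(real_mode_linear(0x1230, 0x0045)))   # -> 0x12345
```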


Quote:
Originally Posted by litwr View Post
It is also an irony that the 68k was intended primarily for use in expensive mini-computers but they finished as calculator chips in the 90s.
A friend of mine told me he found a 68060 in a washing machine ;-)
Anyway, the 68k has been used in just about every possible application that exists. Home computers, game consoles, embedded, big machines, even aircraft and missiles.


Quote:
Originally Posted by litwr View Post
Normal supervisor code normally just set modes, it doesn't need to check it. So it is all contrived and far-fetched.
I won't insist, it's clear you failed to understand the point and this hasn't changed.


Quote:
Originally Posted by litwr View Post
Thanks, it is a case when I can clear show you that you are in error.
Nope. Because what you pretend below is wrong.


Quote:
Originally Posted by litwr View Post
Let's execute the STMDB R1!,{R2,R3,R4} instruction. Let's assume that R1 has value 60. So at first R1 becomes equal to 56. Then R2 will be placed at location 56, then R1 will become equal to 52, then R3 will be placed at location 52, then R1 will become equal to 48, and finally R4 will be placed at location 48. This instruction is done. Let's check LDMIB R1!,{R2,R3,R4}. At first R1 becomes equal to 52. Then R4 will take the value from location 52 (this is where the value of R3 lies), then R1 will become equal to 56, then R3 will take the value from location 56 (this is where the value of R2 lies), then R1 will become equal to 60 and finally R2 will get the value from location 60. So we have an obvious register shift: R3 gets the value of R2 and R4 gets the value of R3. If you want the perfect cyclic shift just add one more instruction:

STR R4,[R1]
STMDB R1!,{R2,R3,R4}
LDMIB R1!,{R2,R3,R4}
Except that this is not how a move register multiple instruction works, even on the arm.
In your example, you get the following instead in memory :
48 R2
52 R3
56 R4
(or something like that, they're written in this order because otherwise pushing/popping would be inconsistent)
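To see why, here is a little Python model of the semantics (my own sketch, hypothetical helper names): ARM's store/load multiple always pairs the lowest-numbered register with the lowest address, and only the base-register arithmetic differs between DB and IB:

```python
# Sketch of ARM STMDB/LDMIB register-list semantics (my own model,
# not authoritative): the lowest-numbered register always ends up
# at the lowest address, regardless of the DB/IB addressing mode.

def stmdb(mem, base, values):
    """STMDB base!,{...}: values listed lowest register first."""
    base -= 4 * len(values)               # decrement before, writeback
    for i, v in enumerate(values):
        mem[base + 4 * i] = v             # lowest register, lowest address
    return base

def ldmib(mem, base, count):
    """LDMIB base!,{...}: returns (new base, values, lowest register first)."""
    values = [mem[base + 4 * (i + 1)] for i in range(count)]
    return base + 4 * count, values

mem = {60: 'old'}                         # whatever sat at address 60
r1 = 60
r1 = stmdb(mem, r1, ['R2', 'R3', 'R4'])   # STMDB R1!,{R2,R3,R4}
# memory is now 48:'R2', 52:'R3', 56:'R4'
r1, (r2, r3, r4) = ldmib(mem, r1, 3)      # LDMIB R1!,{R2,R3,R4}
print(r1, r2, r3, r4)                     # -> 60 R3 R4 old
```

So after the LDMIB it is R2 that holds the old R3, not R4, and R4 picks up whatever happened to sit at address 60.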


Quote:
Originally Posted by litwr View Post
Though we can make such a trick with the 68k too. It is

MOVE.L D2,-(SP)
MOVEM.L D2/D3/D4,-(SP)
ADDA.L #4,SP
MOVE.L (SP)+,D2/D3/D4

So my example doesn't show something very interesting and therefore I was rather wrong too. However it is clear that the ARM has more flexible instructions than MOVEM.
Wrong again, ARM does not have more flexible instructions, far from it.

First, the above could have been :
Code:
MOVE.L D2,-(SP)
MOVEM.L D2/D3/D4,-(SP)
MOVEM.L (SP)+,D1/D2/D3/D4
Just 3 instructions, and 2 bytes shorter.
EDIT: a/b was faster

But if all you want is to have D4=old D3, D3=old D2, D2=old D4 then it's pretty easy with only 2 EXG instructions.

Now try this on ARM :
movem.w (a0,d1.w*4),d1-d4
.
I can't even imagine how many instructions that would take.


Quote:
Originally Posted by litwr View Post
I couldn't persuade you that the 68040 was slightly slower than the 80486. Let's analyze data. Check http://www.lowendmac.com/benchmarks/ - it gives us a 68040:68030 performance ratio close to 2.1:1. It is well known that the 80486:80386 performance ratio is close to 2-2.5:1. The 80386 is slightly faster than the 68030 because of the faster memory access cycle, instant EA calculation, faster division and multiplication. So we have the only conclusion...
But the 80386 isn't faster than the 68030, because the 68030 needs fewer instructions to do the same work.
Actually, the 68030 is much faster than the 80386 even if some timings suggest otherwise.


Quote:
Originally Posted by litwr View Post
BTW the PowerPC 601 was at least 1.4 times faster than the 68040 at the same freq (2.9 for math co-pro) according to lowendmac benchmarks.
NOT of the same freq, no.
68040 has IPC of 1.
PPC needs 4-5 instructions when 68040 needs 1.
It can not be faster clock-by-clock. Simply impossible.


Quote:
Originally Posted by litwr View Post
You also doubted that the ARM was slightly faster than the 80486. Let's check this - http://www.cpushack.com/CPU/cpu4.html#Sec4Part9
That depends on which ARM model we consider, the frequency they use, etc.
But perhaps the 486 is worse than i thought, it's very possible


Quote:
Originally Posted by litwr View Post
I can also add that the ARM could run DOOM like the 80486 could.
Even a 68030 can run DOOM.
meynaf is offline  
Old 25 January 2021, 20:15   #1012
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,247
Quote:
Originally Posted by robinsonb5 View Post
It would be interesting to port the xlife program to an abstract hardware platform which offers just a UART to receive the pattern, a framebuffer to display it and a timer to time it - removing all other system-specific details from the equation.
Side remark: Something like this was done - the LIFE automaton was built in hardware with gates, and the gate lookup tables were programmed in FORTH. Actually, an entire book was written about the results: "Toffoli & Margolus: Cellular Automata Machines".


The VideoEasel program (on Aminet) is my "interpretation" of this book.
Thomas Richter is offline  
Old 25 January 2021, 20:30   #1013
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,247
Quote:
Originally Posted by meynaf View Post
6502 faster than z80 ? First time i read this. Looks false, z80 has more complete instruction set. A friend's Amstrad CPC also was quite faster than my old Oric.
On a per-cycle basis, the 6502 can handle more instructions than the Z80. The 6502 uses on average 3 cycles, the Z80 needs a minimum of 7 cycles per instruction. This is no surprise, the 6502 is hardwired, the Z80 is microcoded. The 68000 is also microcoded.



However, the instruction set of the 6502 is rather primitive, and it is rather ill-suited for higher programming languages. Its stack is too small, and it's rather clumsy if recursion is required - there are no usable primitives for stack handling and argument passing.
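A back-of-envelope calculation with the rough cycle counts above and typical home-computer clock speeds (my numbers, pick your own) shows why both per-cycle and per-second comparisons can point different ways - the 6502 wins per cycle, but a faster-clocked Z80 can still win per second:

```python
# Rough instruction throughput; cycle counts are the figures from
# the post above, clock speeds are typical home-computer values
# (both are assumptions, real workloads vary a lot).
CYCLES_6502 = 3           # average cycles/instruction, hardwired design
CYCLES_Z80 = 7            # rough minimum cycles/instruction, microcoded

ips_6502 = 1_000_000 / CYCLES_6502   # 6502 @ 1 MHz
ips_z80 = 3_500_000 / CYCLES_Z80     # Z80 @ 3.5 MHz

print(round(ips_6502))    # -> 333333
print(round(ips_z80))     # -> 500000
```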


Quote:
Originally Posted by meynaf View Post
And also remember that 80286 is a buggy cpu - once you enter the protected mode it's impossible to leave it.
I wouldn't call this a bug. It was considered unnecessary to go back to a legacy mode which was intended only to boot up the system. Later, people found a way around this limitation with "creative use" of peripheral I/O.


Quote:
Originally Posted by meynaf View Post

NOT of the same freq, no.
68040 has IPC of 1.
PPC needs 4-5 instructions when 68040 needs 1.
It can not be faster clock-by-clock. Simply impossible.
That was the whole idea of the PPC line - primitive instructions, allow a simpler design, allow higher clockrates, let the compiler do the work. Didn't work out too well, but there was some idea behind this.




Quote:
Originally Posted by meynaf View Post
Even a 68030 can run DOOM.

Even a 6502 can run DOOM, but in which speed? (-;
Thomas Richter is offline  
Old 25 January 2021, 20:40   #1014
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,247
Quote:
Originally Posted by roondar View Post
This is a short one to talk about: the 68000 was designed to be coupled to an external MMU (the Motorola 68451) in case such functionality was desired, so it does not have any MMU instructions of its own.
I'm afraid that wouldn't work. The problem is that the 68000 cannot recover safely from bus faults, yet such exceptions are required to load (manually!) a new page descriptor into the 68451. The rather ancient 68451 cannot walk an "MMU tree" itself; it is rather an associative page descriptor array that must be reloaded by the CPU.


This was fixed with the 68010, which was the first device the 68451 would work with (and the last one, too).


Admittedly, there were a couple of very "creative" ideas to work around the 68000's inabilities, including a system with two 68Ks on board, one running a cycle ahead of the other, such that it could stop the second (main) 68K before it received an access error, and the first would then reload the MMU.


Very kludgy, of course (and expensive).


Quote:
Originally Posted by roondar View Post
And I want to make sure everyone in this thread understands that before it even starts: last time you were very, very clearly only interested in proving that the 8086 was better than the 68000 (as well as the 80286 being better than the 68020, etc). Anything that proved the opposite you either ignored or claimed to be false/irrelevant without a shred of evidence supporting you.
Better by which means? It was certainly cheaper, and thus "better" for the accounting department. Which was the reason why IBM picked the intels for the "budget system" that became the PC.


Better as in "engineering design" - well, I would disagree.
Thomas Richter is offline  
Old 25 January 2021, 21:08   #1015
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,423
Quote:
Originally Posted by Thomas Richter View Post
I afraid that wouldn't work. The problem is that the 68000 cannot recover savely from bus faults, though such exceptions are required to load (manually !) a new page descriptor into the 68451. The rather ancient 68451 cannot walk an "MMU tree" itself, it it is rather an associative page descriptor array that must be reloaded by the CPU.
I'm aware of these issues, yes. My actual point here, which got a bit muddled because of how I wrote this text (as well as being muddled further by the discussion that followed), was that the 68000 itself does not have an MMU. Therefore a discussion about the MMU functionality of the 68000 is a bit pointless. Perhaps it would have been better not to name the 68451 to begin with, as what I actually tried to bring across seems to have been lost by now.
Quote:
Better by which means? It was certainly cheaper, and thus "better" for the accounting department. Which was the reason why IBM picked the intels for the "budget system" that became the PC.

Better as in "engineering design" - well, I would disagree.
Well, I'm not the one making these claims (litwr is) and to be 100% clear: I certainly don't agree with most/all of them. But that said, litwr has been claiming in the old 68000 details thread (and has by now repeated most of the same claims in this new thread) that...
  • code density on x86 is better
  • performance on x86 is better across the board
  • the instruction set on x86 is better
  • 68000 and successors have all kind of flaws, x86 doesn't really
  • memory segmentation on x86 is better than flat addressing space
  • PC relative code is worse than x86 segmentation
  • etc, etc
In all cases, these points have been used by him to either claim the 68000 is not really a good CPU (but rather one that holds too much to theory instead of practicality), or that x86 is better in these areas. Anything that shows the reverse (i.e. stuff the 68000 does better) is almost universally ignored or claimed to be pointless by him.

Last edited by roondar; 25 January 2021 at 22:48.
roondar is offline  
Old 25 January 2021, 21:08   #1016
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,043
Quote:
Originally Posted by Thomas Richter View Post
Better by which means? It was certainly cheaper, and thus "better" for the accounting department. Which was the reason why IBM picked the intels for the "budget system" that became the PC.
Better as in "engineering design" - well, I would disagree.
This is not quite true, as far as I know. The reason was IBM wanted working chips and Moto couldn't provide them at the time; the 68000 was still being actively developed (they needed something like an extra half a year). So IBM had to choose the best of the worst: Texas Instruments' wanna-be 16-bit cpu, Zilog's barely-16-bit extension of an 8-bit cpu, and intel outside 8088.
And made the most catastrophic mistake in digital history thus far, creating monsters out of m$ (IBM being too obsessed with their big iron to give them that sweet deal; worked pretty well for m$, buying pc-dos for 50k bucks, renaming a few things and selling it to IBM) and intel in the process.

Last edited by a/b; 25 January 2021 at 21:23. Reason: 8086 -> 8088 or whatever
a/b is offline  
Old 25 January 2021, 21:15   #1017
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,043
Quote:
Originally Posted by roondar View Post
  • code density on x86 is better
  • performance on x86 is better across the board
  • the instruction set on x86 is better
  • 68000 and successor have all kind of flaws, x86 doesn't really
  • memory segmentation on x86 is better than flat addressing space
  • PC relative code is worse than x86 segmentation
  • etc, etc
Ah, he's the guy who was on a crusade about code density like a year or two ago?
Sure, x86 has good density if you stick with 8/16-bit, 64k segments and dos, but once you step out its density goes to the crapper.
a/b is offline  
Old 25 January 2021, 21:26   #1018
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,423
Quote:
Originally Posted by a/b View Post
Ah, he's the guy who was on a crusade about code density like a year or two ago?
Sure, x86 has good density if you stick with 8/16-bit, 64k segments and dos, but once you step out its density goes to the crapper.
Yup, he was. And several other crusades about how Intel x86's always were faster than the equivalent MC68K chip and how the Amiga should've used a 4MHz 6502 instead of a 68000...
roondar is offline  
Old 26 January 2021, 08:27   #1019
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
Quote:
Originally Posted by Thomas Richter View Post
On a per-cycle basis, the 6502 can handle more instructions than the Z80. The 6502 uses on average 3 cycles, the Z80 needs a minimum of 7 cycles per instruction. This is no surprise, the 6502 is hardwired, the Z80 is microcoded. The 68000 is also microcoded.

However, the instruction set of the 6502 is rather prmitive, and it is rather ill-suited for higher programming languages. Its stack is too small, and its rather clumpsy if recursion is required - there are no usable primitives for stack handling and argument passing.
That's the point - better instruction set can make the cpu faster even if individual instructions are slower.
Z80 is also typically clocked faster than 6502.


Quote:
Originally Posted by Thomas Richter View Post
I wouldn't call this a bug. It was considered unnecessary to go back to a legacy mode which was intended only to boot up the system. Later, people found a way around this limitation with "creative use" of peripheral I/O.
This is what i would call "broken by design". But we seem to have different conception of this.


Quote:
Originally Posted by Thomas Richter View Post
Even a 6502 can run DOOM, but in which speed? (-;
Speed on 68030 seems ok. I haven't run DOOM itself, but Breathless and other similar games were good enough.
On a 6502 you would have trouble with the memory size limits long before you needed to worry about the speed.
meynaf is offline  
Old 26 January 2021, 08:32   #1020
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
Quote:
Originally Posted by Thomas Richter View Post
Better by which means? It was certainly cheaper, and thus "better" for the accounting department. Which was the reason why IBM picked the intels for the "budget system" that became the PC.
I've read that the real reason why IBM picked 8088 rather than 68000 is because the 68000 wasn't ready at the time. If a cpu isn't on the market yet, you can't use it.
meynaf is offline  
 

