Old 25 November 2018, 16:52   #881
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by litwr View Post
Thanks to our discussion I have just added the next paragraph to the article.
In some 80286 processors, as in the 8086/8088, interrupt handling was not implemented 100% correctly, which in very rare cases could lead to very unpleasant consequences. For example, the POPF instruction on the 80286 always allowed interrupts during its execution, and when an instruction with two prefixes (for example, REP ES:MOVSB) was interrupted on the 8086/8088/80286, one of the prefixes was lost. The POPF bug was present only in early 286 processors.
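
For context (not part of the quoted article): the classic defence of the era was simply to keep interrupts masked across a two-prefix string instruction, so the lost-prefix case could never trigger. A minimal sketch in 8086 assembly; the hand-encoded bytes are the two prefixes plus the MOVSB opcode:
Code:
	pushf			; remember the caller's interrupt flag
	cli			; interrupts off -> the erratum cannot trigger
	db	0F3h,026h,0A4h	; REP ES: MOVSB, encoded by hand
	popf			; restore the caller's interrupt flag
The price is interrupt latency for the duration of the move, which is why long blocks were often split into chunks.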


Quote:
The article states quite clearly that there are problems in MB not CPU.
That's halfway true, it does state that. However, it also states that this problem exists because Intel's CPU ran at too high a bus speed for motherboards of the era to keep up with. You can argue semantics here, but the implication of this article (as well as sources such as Wikipedia) is clearly that this is a problem with the CPU.

After all, creating a CPU that can't be reliably coupled with the available support hardware of the time (in this case due to running at too high a bus speed to work properly) isn't a sign of doing the job of designing a CPU particularly well. You might disagree, but apparently this is how the market viewed it, as seen here:

Quote:
Originally Posted by 486 Wikipedia
...
One of the few 80486 models specified for a 50 MHz bus (486DX-50) initially had overheating problems and was moved to the 0.8-micrometre fabrication process. However, problems continued when the 486DX-50 was installed in local-bus systems due to the high bus speed, making it rather unpopular with mainstream consumers, as local-bus video was considered a requirement at the time
...
Quote:
My point is that in plain programming this bug is almost impossible to trigger. It can give some problems with some tricky code. However, we have not heard of any PC being affected by it - it is rather theoretical. There were several BIOS producers and I doubt that all of them knew about this bug. In addition, only early 80286s were affected.
My point was that an OS that doesn't have a fix for this can crash and, more crucially, the OS can't detect which type of 286 it's running on, so it has to use suboptimal code for all 286's, even those that are not affected.

I can't find it now, but during my Googling on this bug and its impact I did run into people talking about needing to work around it.

Quote:
Having no FP support at all is not a problem?! You can't run some programs even with the FP emulation software. Of course such bugs helped to finish the 68k.
Yup, it's not a big deal. Most (or even almost all) 68k software in the 1990's that could benefit from an FPU came in 'FPU enabled' and non-FPU versions in the same box (even prior to the 68LC040). You simply almost never actually needed an FPU emulator on 68k systems.

I'll give one example (there are many, many more): look at the popular 3D rendering program Imagine. It always shipped with both the FPU and non-FPU versions included, and you could even check which version you were running. See a screenshot of what I mean here: http://obligement.free.fr/articles/imagine50.php

Now, I am only talking about 68k based software here. I have no clue how x86 software dealt with this. Maybe on x86 FPU emulators were more frequent, but on 68k they were very rare. I used Amiga's for years back then and didn't even know these things existed.
Quote:
I have to use 4-byte data for pointers because in my program it is smaller and faster than code with 2-byte data for the address registers.
If you put all code and data in the lower 64kb of RAM then you in fact do not need to do so. This has been the point all along - the 68000 supports word based addresses as long as all absolutely addressed code & data are in that area.

I'll agree that this isn't a common thing to do, but it is 100% possible.
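
To illustrate (a sketch, not from the original post): the 68000's short absolute addressing mode encodes the address in a single extension word, which is sign-extended, so it reaches the bottom 32 KB plus the top 32 KB of the address space.
Code:
	move.w	$1234.w,d0	; short absolute: 4-byte instruction
	move.w	$1234.l,d0	; long absolute: 6 bytes, one extra fetch
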
Quote:
I meant a 4510 with 20 or 24 address lines, or at least a plain 65816. But the 65816 architecture was designed at Apple, which didn't want to have a competitor for its 68k based computers.
I'm still not convinced, really. As far as I've been able to find, a 65816 just isn't faster than a 68000, especially when running at half the clock speed. In fact, I'm a tiny bit surprised here - your own Pi Spigot benchmark page shows the C64/C128 with a 20MHz SuperCPU and optimised code being less than 2x the speed of an A500, while having 3x the CPU speed. That alone should tell you that a 4MHz 65816 is not going to win.

Quote:
it is a very strange claim. IMHO it was quite popular for x86.
It's not a strange claim, FP calculations on non-FPU systems are much, much, much slower. Seriously, the speed difference can run up to 30x or more.

Quote:
It is interesting. However, I can't understand why you are so sure about it. There were thousands of programs for the Mac and it was quite easy to get one of them which required an FPU (or its emulation).
Well, I am so sure for a few reasons. The first is that I owned an FPU based 68k system in the late 90's and always had to pick which of the two executables to start. The second is that I did some Googling and the only mentions I could find of this being a problem were about a really tiny and specific subset of software (apparently some, but not all, statistical analysis software on the Mac required an FPU and was not available in non-FPU versions) and some posts about Linux FPU emulation in 2013.

Also, note that I'm not saying it never was a problem, just that it only rarely was one - even for software that would benefit from a FPU. And again, for those few affected - Motorola would exchange the CPU for free.
Quote:
POPF inside an interrupt handler disables interrupts?! I have never met such exotic code. Indeed it is possible, but it is rather too exotic.
It is fairly rare for normal user code to be like this, yes. But it's actually quite common for an OS to do things like this.
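
For what it's worth, OS-level code had a well-known substitute for the buggy 286 POPF: build a fake interrupt frame and let IRET restore the flags atomically. A minimal sketch, with illustrative labels; the new FLAGS image is assumed to be on top of the stack, exactly as for POPF:
Code:
	push	cs		; fake interrupt frame: segment half
	call	do_iret		; pushes the return offset
	jmp	continue	; flags now restored; skip the helper
do_iret:
	iret			; pops offset, CS and FLAGS atomically
continue: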

Quote:
Because there was no place on the die for it! Because of the too large and complex ISA. How can you imagine FP without sines, logarithms, powers, ...? I can't.
1) You lack imagination then, as the 68040 proved such things work out just fine - almost no software used said functions through the FPU, as I quoted before
2) Your evidence for the ISA being the problem here is based solely on your opinion. A far more likely reason for Motorola running out of transistors can be found in the speed of what was kept in: as we've discussed, the 68040 is faster than the 486 at the same clock speed, and the primary way to make a CPU do the same things as before, but faster, is to throw more transistors at the problem. This is a very widely known fact and explains the differences we see without relying on flawed opinions.

Quote:
I can repeat: IMHO Intel kept in step with the times, doing the best thing for each period. Moto tried to leap higher than it actually could. And it often tried to use old designs (PDP-11, VAX, ...) instead of inventing new ones.
Intel wasn't doing the best things. They were doing things the cheap way and managed to luck out when IBM chose the worst of their x86 CPU's for their as-cheap-as-we-can-make-it PC project. Meanwhile, for all the 'failure' of the 68k ISA, it gave us a CPU architecture that was widely used for 30+ years. Pretty impressive for a 'failure'.
Quote:
Indeed 8086 needed a major upgrade in 1982 and it got it!
Just a few posts back you claimed that 8086 as is was 'fine' until the late 1980's. Now you admit it was already out of date by 1982. Quite the change of mind there.

Oh and it got said upgrade by implementing a new model that was incompatible with the 8086. This stupidity has resulted in every single x86/x64 CPU ever released still starting up like it's 1978 and then having to switch to something that is actually useful.
Quote:
You can read the MSW of the 80286 or even the Pentium from real mode.
Irrelevant. You claimed the reasons behind Motorola's move were purely theoretical and not useful in the real world. However, in the real world Intel & AMD later both altered their CPU's to be able to comply with these 'useless theories'. This shows Intel & AMD disagree with you.
Quote:
I have already written about unreal mode... We have virtual 8086 mode...
Virtual 8086 mode is a fancy way of saying 'the processor can either rapidly switch modes or emulate part of real mode'. It is not 8086 code running in protected mode. The very fact you need to activate it separately for 8086 code to work at all already shows this.

Note that virtual 8086 mode is a form of hardware emulation that requires the OS to have a special '8086 handler'. Without the handler 8086 code wouldn't work.

The very nature of virtual real mode proves my point: the processor needs to emulate an 8086 in order to be able to execute real mode code. And again: if this 'virtual real mode' is not used, 8086 code will crash the machine.
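
To make the 'special handler' concrete, here is the kind of perfectly ordinary real-mode code that faults in virtual 8086 mode and has to be emulated by the OS's monitor (a sketch; it assumes IOPL below 3, the usual setting):
Code:
	cli			; privileged: raises #GP in V86 mode
	in	al,61h		; port I/O: faults unless the I/O bitmap allows it
	mov	ax,0B800h
	mov	es,ax		; plain real-mode segment load: still fine
The monitor catches each fault, performs (or refuses) the operation on the program's behalf, and resumes it - that is the 'extra work' I keep referring to.
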
Quote:
It is a fact that most DOS programs can be used within protected mode. There were DESQview, Microsoft Windows, ... Thus you claimed a very odd thing. I did benchmarks. The speed of the programs was the same in protected mode and in real mode. The significant difference was only with I/O operations (with disks, for example) - they were about 2 times slower in protected mode.
DOS programs can only be used in protected mode because the OS running the real mode code does extra work to get it to run. No matter how often you claim otherwise, real mode code will not run in protected mode without this extra work. Which means it is not compatible.

This is extremely widely known and I really don't understand why you keep repeating clearly false information. Here then are some quotes from the protected mode wiki to show what I mean:
Quote:
Originally Posted by Wikipedia
...
If an application utilized or relied on any of the techniques below, it wouldn't run:[26]
  • Segment arithmetic
  • Privileged instructions
  • Direct hardware access
  • Writing to a code segment
  • Executing data
  • Overlapping segments
  • Use of BIOS functions, due to the BIOS interrupts being reserved by Intel[27]
In reality, almost all DOS application programs violated these rules.
...
Virtual 8086 mode, however, is not completely backwards compatible with all programs. Programs that require segment manipulation, privileged instructions, direct hardware access, or use self-modifying code will generate an exception that must be served by the operating system
...
Due to these limitations, some programs originally designed to run on the 8086 cannot be run in virtual 8086 mode
Some of the things mentioned above (such as segment manipulation) are fairly common in 8086 code and won't work without the OS intervening whenever such a thing is done. Case in point: Windows NT does not have a full exception handler for such programs and its compatibility with real mode programs is much lower.
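
As an example of the first item on that list, this idiom was everywhere in real-mode code and dies immediately under protection (a sketch):
Code:
	mov	ax,ds
	add	ax,1000h	; real mode: slide the segment window up 64 KB
	mov	ds,ax		; protected mode: loads a junk selector -> #GP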

By the way, benchmarks would be very interesting (do you have these results somewhere?), but note that even as you claim there is no difference, you also claim that IO is half speed - which is a speed difference.
Quote:
I referred to the unreal mode - I even attached my tiny code which allows switching into it. It allows using 4 GB segments. It was quite possible to make a DOS variant for it, but large companies wanted more complex (and safe) software.
The fact you need extra code to do this proves my point. If protected mode and real mode were fully compatible, no extra code would be needed at all - there would not even be two modes.
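
For readers: the 'unreal mode' trick litwr mentions is well documented - briefly enter protected mode, load a data segment register through a descriptor with a 4 GB limit, then drop back to real mode; the cached limit survives the switch. A rough sketch (386+, interrupts assumed masked; 'gdt_desc' is a hypothetical GDT pointer whose selector 08h is a flat data descriptor):
Code:
	lgdt	[gdt_desc]	; point the CPU at our tiny GDT
	mov	eax,cr0
	or	al,1		; set PE: enter protected mode
	mov	cr0,eax
	mov	bx,08h
	mov	ds,bx		; cache a 4 GB limit into DS
	and	al,0FEh		; clear PE: back to real mode
	mov	cr0,eax
	xor	bx,bx
	mov	ds,bx		; base resets, the 4 GB limit stays cached
; DS can now reach 4 GB with 32-bit offsets from real mode
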
Quote:
Thank you! I have never used VLINK before, but I managed to compile Don's latest code with just VASM.
No problem.
Quote:
ARM, MIPS, ... survived having much less support than 68k.
The 68k ISA survived for quite a while after the 1990's and was impacted much more severely by the later ARM CPU's than anything Intel did (see, Motorola didn't just sell CPU's to computer manufacturers - it sold quite a number of 68k CPU's outside of that market). In some minor ways it still does.

Last edited by roondar; 26 November 2018 at 10:55.
Old 26 November 2018, 02:24   #882
Bruce Abbott
Registered User
 
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,546
Quote:
Originally Posted by litwr View Post
Apple had already realized the limitations and risks of its dependency upon a single CPU vendor at a time when Motorola was falling behind on delivering the 68040 CPU. This is a quote from Wikipedia.
https://en.wikipedia.org/wiki/Motoro...econd-sourcing
Quote:
Several other companies were second-source manufacturers of the HMOS 68000. These included Hitachi (HD68000), who shrank the feature size to 2.7 µm for their 12.5 MHz version,[4] Mostek (MK68000), Rockwell (R68000), Signetics (SCN68000), Thomson/SGS-Thomson (originally EF68000 and later TS68000), and Toshiba (TMP68000). Toshiba was also a second-source maker of the CMOS 68HC000 (TMP68HC000).
Meanwhile...

Intel and the x86 Architecture: A Legal Perspective
Quote:
At a time when the microprocessor market was still crowded with a panoply of competing architectures, IBM selected Intel’s 8086 processor as the “brain” of its computer. However, IBM required that Intel find a second-source supplier because production had to be guaranteed and it was too risky to rely on a single company as the sole source of its chips. Intel approached Advanced Micro Devices (AMD), a startup chipmaker that was founded by fellow Fairchild Semiconductor alumni. The companies signed a technology exchange agreement in 1982, which the Ninth Circuit described as, “in effect, a reciprocal second sourcing agreement: if one party wanted to market parts the other party had developed, it could offer parts that it had developed in exchange.”...

Recognizing the phenomenal growth in the PC clone market, Intel saw an opportunity to exclude AMD and keep to itself the impending exponential increase in PC sales...

Intel Sues Everyone for Everything

Intel did not stop at frustrating its agreement with AMD to defend its turf in the x86 processor industry; it began to pursue legal action against all threats to its market share. Intel had been actively purging the microprocessor market of other manufacturers of x86-compatible processors.
Old 26 November 2018, 08:44   #883
frost242
Registered User
 
Join Date: Oct 2007
Location: France, 87
Age: 44
Posts: 96
68k is still alive in the embedded world: https://www.nxp.com/products/process...ldfire:PC68KCF
Old 26 November 2018, 11:12   #884
grond
Registered User
 
Join Date: Jun 2015
Location: Germany
Posts: 1,918
Quote:
Originally Posted by litwr View Post
I again want to clarify my point. I have been curious why 68k failed.
68k did not fail; Motorola decided to replace it with the PowerPC architecture. The PowerPC architecture was quite successful and to this day is a strong competitor to the x86 architecture in the high-performance computing sector in its POWER variant. Hence, there really is no failure here.

Motorola felt that it alone could not compete with Intel because Intel had so many more resources due to the success of the IBM-compatible PC. Hence, they were looking for alliances and teamed up with IBM. They probably thought that teaming up with IBM, who were at the root of Intel's success through choosing the Intel CPU for the "IBM-compatible", was a good idea. And there was no better option. IBM has also been a very popular partner for alliances in the semiconductor fabrication department. There it is just the same: Intel vs IBM et al.

Interestingly at the time the PowerPC was introduced it was said and believed that the PowerPC was a CPU very good at emulating other CPU architectures. It was even believed that due to the PowerPC being RISC it would soon be able to emulate not only 68k (that was assumed to be a given) but also x86 faster than the original chips ran. So you could argue that the PowerPC was designed with a "universal compatibility mode" in mind.


Quote:
They were quite popular in the 80s but quickly vanished in the 90s. I could only find out that the 68k ISA, since the 68020, became too heavy for a transition to a RISC-like design in the early 90s.
That is just wrong. The 68k ISA is much more orthogonal than the x86 ISA with its many modes and special registers. Intel managed to run x86 code on what basically was a RISC CPU core starting with the PentiumPro, and the same would have been possible for the 68k ISA. In fact the x86 ISA with its non-aligned instructions was limited in the decoder stage for a long time, because decoding for superscalar CPUs was exponentially harder than for CPUs with 16-bit or even 32-bit aligned instructions.

You may find it interesting that the Apollo Team's 68080 core used in the Vampire accelerator cards is in its core a RISC-CPU having a three-operand ALU and running 68k code. It is more or less using the implementation ideas of the PentiumPro to Pentium III CPUs with some even more modern additions. If Motorola had decided to stick with the 68k ISA, they could just as well have designed a 68k CPU using the same tricks as Intel. There is nothing "too heavy" about the 68k ISA to be implemented on top of a RISC core.


Quote:
Only the historical 68k uses address registers. If they were so good then other CPUs would use them too.
You really have no clue about processor architectures. In fact, address registers have been used in a very large range of CPU architectures. Almost all CISC architectures use address registers. The NS32x32 series is a very prominent example. It is merely a coincidence that useful data and address sizes have been the same through the 32-bit and 64-bit era. You may find it interesting that the designers of the original 68000 had planned to make the data registers only 16 bits but the address registers 32 bits. They only added the 32-bit data part of the 68k ISA when they found that they still had space left to implement it. This means that at the time the 68000 was designed, having wider address registers than data registers made sense to the 68k designers. You, on the other hand, have repeatedly argued that 32-bit addresses were a waste of space. Obviously, 4 GB of memory wasn't viable at that time, but they sure needed at least 24-bit addresses, and since they wanted to keep 16-bit alignment of their code and data, they chose 32 bits. A very wise decision and certainly not the reason why the "68k architecture failed" (using your words).

Moreover, if you think that dividing the register file into data and address registers doesn't make sense, then please consider that in today's processors peak loads are handled in 128bit and 256bit registers (everything SSE and all those cryptographic execution units have much wider registers than 32 or 64 bits) effectively introducing another arbitrary separation in the register file.


Quote:
You have no limits with the COM format except its size. The matter is completely different for 68k.
No, it is not completely different for 68k, you just make that up! Headerless formats are easy for any single-tasking operating system regardless of the CPU ISA. The OS designers of the early 80s already found single-tasking operating systems too limiting. Only the Microsoft world kept using single-tasking OSs until 1995. That was a severe anachronism, nothing else.


Quote:
Having no FP support at all is not a problem?! You can't run some programs even with the FP emulation software.
It was no problem. I never even heard of that bug before it came up here. Up to the early 90s you bought a computer having an FPU if you wanted to run FPU code. Nobody used FPU emulators. There never even was one for the Amiga until very recently. Since you insist on FPU being a very important feature of a CPU in the 1980s and 1990s, please tell me why until about ten years ago there were hardly any ARM processors having FPU support, yet, as you like to point out, the ARM architecture has been very successful. In the ARM world you had to use the soft_fpu compiler flag to make the compiler generate integer code emulating FPU instructions. There was no FPU emulation available.


Quote:
Of course such bugs helped to finish the 68k.
Nonsense. You really believe that anyone stopped designing 68k-based computers because there was an early mask of the 68040 that could not do FPU emulation, which nobody even needed? If you wanted to run FPU code, you selected a 68040 with an FPU inside.


Quote:
Because there was no place on the die for it! Because of the too large and complex ISA. How can you imagine FP without sines, logarithms, powers, ...? I can't.
Perhaps you want to check modern FPU designs and count how many of them have these instructions in hardware. Start with ARM...

Not executing them in hardware (where they are microcoded anyway) has some advantages, most notably the ability to select the precision of the operation and to run integer code in parallel.


Quote:
I can repeat: IMHO Intel kept in step with the times, doing the best thing for each period. Moto tried to leap higher than it actually could. And it often tried to use old designs (PDP-11, VAX, ...) instead of inventing new ones.
This theory of yours seems to be believed exclusively by you and nobody else. And it doesn't make any sense, because up to the 040 the Motorola processors were always faster than their contemporary x86 counterparts, with Motorola having very little need to update their ISA. Thus, Intel was always only as good as it needed to be, and the competition, having much less resources than Intel, was nonetheless always ahead of Intel. When Intel got so powerful and the advantages of the 68k ISA were getting smaller with the advent of the 386, Motorola gave up on the 68k in favour of an alliance including one of its most important customers. The RISC ISA was believed to have as many advantages over the updated x86 ISA as the 68k ISA had over the older x86 ISAs. That is all there is to the "failure" of the 68k ISA.
Old 26 November 2018, 11:50   #885
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by grond View Post
In fact, address register have been used in a very large range of CPU architectures. Almost all CISC-architectures use address registers.
My gut feeling (note: I am a programmer, not a CPU designer) is that the rise of processor architectures with fully general purpose registers was a result of overall technological improvements and CPU designers looking at what people/compilers actually wanted/needed out of architectures. The progression from accumulator + special purpose registers into data/address registers, which then later progressed into fully general purpose registers, fits that idea fairly well.

It feels like a fairly logical progression to me, anyway.
Old 26 November 2018, 12:05   #886
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Actually the D/A split is a very clever idea: it basically gives you 16 registers at the encoding cost of 8.
Imagine what the encoding would be with a 4-bit register field instead of a 3-bit one. You either have to make instructions larger, killing code density completely, or severely trim the instruction set, or do a combination of both. In either case, it's not nice.
But of course many programmers don't know how instructions are encoded and they see that only as an annoyance. They forget that other families with 16 registers either use large code words (like arm) or ugly prefixes (like x86), while on the 68k a small 16-bit opcode can access them all and still provide a great instruction set.
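
To make the encoding argument concrete, here is how a full two-operand MOVE fits into one 16-bit word (a sketch of the 68000 encoding; the 3-bit mode fields are what select between D and A registers, so the register fields themselves need only 3 bits):
Code:
;  move.w d1,(a2)+   ->  $34C1
;
;  %00 11 010 011 000 001
;     |   |   |   |   \ source register   (D1)
;     |   |   |   \ source mode           (data register direct)
;     |   |   \ destination mode          (postincrement)
;     |   \ destination register          (A2)
;     \ size = word
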
Old 26 November 2018, 13:11   #887
grond
Registered User
 
Join Date: Jun 2015
Location: Germany
Posts: 1,918
Quote:
Originally Posted by roondar View Post
My gut feeling (note: I am a programmer, not a CPU designer) is that the rise of processor architectures with fully general purpose registers was a result of overall technological improvements and CPU designers looking at what people/compilers actually wanted/needed out of architectures. The progression from accumulator + special purpose registers into data/address registers, which then later progressed into fully general purpose registers, fits that idea fairly well.
Addresses and data ended up having the same width on 32-bit and 64-bit processors, which allowed using the same registers for both. There were 16d/32a processors before the 32d/32a and 64d/64a processors common in the last twenty years, and now there are many processors with far wider data registers, while address registers will never exceed the 64-bit width we have today. For some time using General Purpose Registers was more flexible and more orthogonal, and a natural choice because address and data widths were the same. I think that nowadays the disadvantages of the GPR approach are more pronounced. One advantage of having dedicated address registers is that it makes out-of-order memory operations easier.


Quote:
Originally Posted by meynaf View Post
Actually the D/A split is a very clever idea: it basically gives you 16 registers at the encoding cost of 8.
Imagine what the encoding would be with a 4-bit register field instead of a 3-bit one.
The same could be said about 2-operand vs. 3-operand code, and yet everybody uses (at least) three operands today. With 3-operand code, one bit for implying A or D registers and a (very useful) S-flag, you'd be left with just 5 bits for encoding instructions, which would also have to encode CISC address modes with e.g. a source/destination flag and some ea-type flag. At this point it is evident that you either stick with 2-operand code if you really want to keep 16-bit instruction words, or use 32-bit instruction words. But anyway, let's not turn this into a debate about what an ideal ISA should look like, we have had enough of those. Let me stress again that I do agree that separate A and D registers have some important advantages over GPRs. Personally I like GPRs better than a fixed split between A and D type registers because I find the flexibility of e.g. using 12 registers for data and three for addresses more important. I have been in need of more than eight D registers many times and hardly ever needed seven A registers in speed critical code.
Old 01 December 2018, 11:47   #888
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
I have added to the article the next text:

The oddities with the flags do not end there. For some unknown reason, many instructions, including even MOVE, clear the carry (C) and overflow (V) flags. Another oddity is that the instruction to save the state of the arithmetic flags, which was unprivileged on the 68000, was made privileged in all processors starting with the 68010.

Interestingly, in parallel with the development of the PC, IBM was developing the System 9000 computer based on the 68000, which was released less than a year after the PC.


So IBM worked with the 68k, but it was too expensive for a mass-market computer until 1984. The Apple Lisa proved this too.

A man sent several more messages to meynaf - https://litwr.livejournal.com/2509.h...ead=4813#t4813 - maybe it would be worth removing his ban?

Quote:
Originally Posted by meynaf View Post
A PIC with 68k -- you mean a microcontroller ? Apparently this has never annoyed cpu-32 users. There is no instruction you can not use there. Otherwise give examples.
PIC is a common abbreviation meaning position independent code. 68k code is not PIC, but x86 code is, if its size is less than 64 KB.

Quote:
Originally Posted by meynaf View Post
Why would it be easy to cheat and hard to prove ? You can't read 68k code ? Then sorry, but i can't read x86 code either so we're equal.
But ok, you want something small that shows x86 sucks in comparison to 68k ?
Following code is just 4 lines, 8 bytes, and it converts 16 pixels from Atari ST's alternate bitplanes to Amiga's separate bitplanes :
Code:
 move.w (a0)+,(a1)+	; word for plane 1 -> first separate bitplane
 move.w (a0)+,(a2)+	; word for plane 2 -> second separate bitplane
 move.w (a0)+,(a3)+	; word for plane 3 -> third separate bitplane
 move.w (a0)+,(a4)+	; word for plane 4 -> fourth separate bitplane
This is not the fastest way but it is the shortest one. It's easy to see neither x86 nor arm can cope with that well.
So now what will you say about it ? That it is a special case ? All code is a special case. But it appears special cases better on 68k are strangely easy to find...
I repeat: it can be very difficult to compare C sources with the corresponding assembly sources. Please give your complete subroutine and its C source too. Your code requires 5 address registers, but the 8086 and 286 have only 4 index registers, so the code will indeed not be as good. But with the 386 you can use 7 index registers and you can write similar code

Code:
   movsw		; copy a word straight to es:di (first plane)
   lodsw
   mov [ebx],ax		; second plane
   add ebx,2
   lodsw
   mov [ecx],ax		; third plane
   add ecx,2
   lodsw
   mov [edx],ax		; fourth plane
   add edx,2
Quote:
Originally Posted by meynaf View Post
The trick of using memory without having allocated it.
Dear meynaf, nobody allocates memory in DOS programming. It is sometimes useful to shrink the size of the memory used by a program to make it a TSR.

Quote:
Originally Posted by roondar View Post
Yup, it's not a big deal. Most (or even almost all) 68k software in the 1990's that could benefit from an FPU came in 'FPU enabled' and non-FPU versions in the same box (even prior to the 68LC040). You simply almost never actually needed an FPU emulator on 68k systems.
Sorry, I know little about 68k co-pros. Your words show that they were rather expensive and rare. So a bug that would be terrible in the x86 world could be tolerated in the 68k world.

Quote:
Originally Posted by roondar View Post
I'm still not convinced, really. As far as I've been able to find, a 65816 just isn't faster than a 68000, especially when running at half the clock speed. In fact, I'm a tiny bit surprised here - your own Pi Spigot benchmark page shows the C64/C128 with a 20MHz SuperCPU and optimised code being less than 2x the speed of an A500, while having 3x the CPU speed. That alone should tell you that a 4MHz 65816 is not going to win.
It is because the 65816 doesn't have hardware division. With 16-bit arithmetic it is about 50% faster than a 6502 at the same frequency. The 65816 also has fast memory block move instructions and several other very good features. So it is generally faster than the 6502 and sometimes much faster. You agreed that a 6502 @4MHz can be faster than a 68k at 8 MHz for byte processing. IMHO the 65816 can be faster at 16-bit processing too.

Quote:
Originally Posted by roondar View Post
It's not a strange claim, FP calculations on non-FPU systems are much, much, much slower. Seriously, the speed difference can run up to 30x or more.
Indeed, but it is better than nothing. I remember that I had to run this emulation in the early 90s because the software I needed used the co-pro.

Quote:
Originally Posted by roondar View Post
Well, I am so sure for a few reasons. The first is that I owned an FPU based 68k system in the late 90's and always had to pick which of the two executables to start. The second is that I did some Googling and the only mentions I could find of this being a problem were about a really tiny and specific subset of software (apparently some, but not all, statistical analysis software on the Mac required an FPU and was not available in non-FPU versions) and some posts about Linux FPU emulation in 2013.
So there was software for the FPU only.

Quote:
Originally Posted by roondar View Post
1) You lack imagination then, as the 68040 proved such things work out just fine - almost no software used said functions through the FPU, as I quoted before
I repeat: it sounds very odd. Why not use the faster sines or exponents? It sounds like somebody saying that he doesn't need multiplication because he can always get the same results with addition only.

Quote:
Originally Posted by roondar View Post
2) Your evidence for the ISA being the problem here is based solely on your opinion. A far more likely reason for Motorola running out of transistors can be found in the speed of what was kept in: as we've discussed, the 68040 is faster than the 486 at the same clock speed, and the primary way to make a CPU do the same things as before, but faster, is to throw more transistors at the problem. This is a very widely known fact and explains the differences we see without relying on flawed opinions.
It sounds odd again. Moto had to cut a useful part of its chip. There is nothing good about that.

Quote:
Originally Posted by roondar View Post
Just a few posts back you claimed that 8086 as is was 'fine' until the late 1980's. Now you admit it was already out of date by 1982. Quite the change of mind there.
There are always high-end and low-end systems. 8088-based systems were quite good as a middle class of computer until the end of the 80s. What is wrong with that? The Amstrad PCW with a Z80 at an effective 3.2 MHz could be quite successful into the start of the 90s. Indeed, for top systems there were the 80286 and 80386. The 80486 was available in PCs from 1989; Moto had nothing comparable in 1989. The 68040 with its slow FPU could be generally better for top systems only until 1991, when the 486DX/2 beat it forever.

Quote:
Originally Posted by roondar View Post
Oh and it got said upgrade by implementing a new model that was incompatible with the 8086. This stupidity has resulted in every single x86/x64 CPU ever released still starting up like it's 1978 and then having to switch to something that is actually useful.
You claimed a very odd thing. I can run old DOS software on my modern multi-core PC. I even have a bootable DOS partition on my HDD. Indeed some programs can be incompatible with virtual 8086 mode without proper OS support. However, DOS software for work usually ran well under Windows or DESQview. Problems were mostly with some games, though even almost all games can run fine in modern virtual machines. You have pointed out some real mode instructions that require emulation when using virtual 8086 mode. Agreed, but there are only a few of them, they are just emulated, and they are quite rare, so performance is almost unaffected. Only I/O became notably slower. If you need benchmarks, just run a DOS benchmark program in real mode and in a virtual machine on modern Linux or Microsoft Windows. I did this with my emulator of the Commodore plus/4 computer: it ran at almost identical speed in both environments.

Quote:
Originally Posted by roondar View Post
The 68k ISA survived for quite a while after the 1990's and was impacted much more severely by the later ARM CPU's than anything Intel did (see, Motorola didn't just sell CPU's to computer manufacturers - it sold quite a number of 68k CPU's outside of that market). In some minor ways it still does.
It doesn't change anything. The 68k's PDP-11-like ISA couldn't compete with more modern technologies. Indeed there is some charm in the retro style of old things, but they are rather for museums now.

Quote:
Originally Posted by Bruce Abbott View Post
Meanwhile...
It is interesting: when did Motorola start to have second sources? Was it 1979 or much later?

Quote:
Originally Posted by frost242 View Post
68k is still alive in the embedded world: https://www.nxp.com/products/process...ldfire:PC68KCF
In that world almost anything can survive, beginning with the 8080, which is still produced!

Quote:
Originally Posted by grond View Post
68k did not fail; Motorola decided to replace it with the PowerPC architecture. The PowerPC architecture was quite successful and to this day is a strong competitor to the x86 architecture in the high-performance computing sector in its POWER variant. Hence, there really is no failure here.
Indeed, Motorola failed to produce a successful CPU with the 68k ISA, and it was replaced by the PowerPC. It was quite painful because the 68k was quite popular and had a lot of software. Even ARM could compete with Intel, but the giant Motorola couldn't - it is illogical. There were a lot of attempts to outperform x86 by emulation; the latest one known to me is the still somewhat hidden history of Transmeta. However, Intel could always eventually produce a faster CPU.

Quote:
Originally Posted by grond View Post
That is just wrong. The 68k ISA is much more orthogonal than the x86 ISA with its many modes and special registers. Intel managed to run x86 code on what basically was a RISC CPU core starting with the PentiumPro, and the same would have been possible for the 68k ISA. In fact the x86 ISA with its non-aligned instructions was limited in the decoder stage for a long time, because decoding for superscalar CPUs was exponentially harder than for CPUs with 16-bit or even 32-bit aligned instructions.
Orthogonality is rather a weakness in the world of modern specialized technology. Look at http://www.vcfed.org/forum/showthrea...et-DEC-T11-Cpu - a man accidentally found out that the ancient PDP-11 and 68k ISAs are almost identical. However, the 68k is not as orthogonal as the PDP-11. Only the 68k's MOVE instruction is close to full orthogonality, and this gave the 68k some advantage over the PDP-11, designed in 1969.

Quote:
Originally Posted by grond View Post
There is nothing "too heavy" about the 68k ISA to be implemented on top of a RISC core.
Moto was incapable of doing this with the heavy 68k ISA in time. BTW, Intel was not completely stuck in the PC either: the 80186 was incompatible with the IBM PC architecture.

Quote:
Originally Posted by grond View Post
You really have no clue about processor architectures. In fact, address registers have been used in a very large range of CPU architectures. Almost all CISC architectures use address registers. The NS32x32 series is a very prominent example.
I had some experience with the NS32016 - all its 8 main registers are GPRs, although there are several registers for bases. These base registers are used in a VAX way, for complicated work with subroutine frames, for stacks, and for things so exotic that they were ignored by known OSes. And the NS32x32 is an example of a rather unsuccessful architecture: too VAX-like, too big, ...

Quote:
Originally Posted by grond View Post
Perhaps you want to check modern FPU designs and count how many of them have these instructions in hardware. Start with ARM...
It was the late 80s; the technology was not as good as today's. For that time the 68k ISA was too big. If it had not been so, then Moto would have made a CPU which could compete with x86.

Quote:
Originally Posted by grond View Post
This theory of yours seems to be believed exclusively by you and nobody else. And it doesn't make any sense, because up to the 040 the Motorola processors were always faster than their contemporary x86 counterparts, with Motorola having very little need to update their ISA. Thus, Intel was always only as good as it needed to be, and the competition, having much less resources than Intel, was nonetheless always ahead of Intel. When Intel got so powerful and the advantages of the 68k ISA were getting smaller with the advent of the 386, Motorola gave up on the 68k in favour of an alliance including one of its most important customers. The RISC ISA was believed to have as many advantages over the updated x86 ISA as the 68k ISA had over the older x86 ISAs. That is all there is to the "failure" of the 68k ISA.
It is not true. Excuse me, but I have to repeat some words specially for you. The 8088+8087 was much faster at FP than the 68000. The 8088 could outperform the 68008. The 80286 could be faster than the 68020 at 8- and 16-bit data processing. The 80386 could be faster than the 68030, even at 32-bit data processing. The 80486 was much faster than the 68030. The 80486 was much faster than the 68040 at some common FP calculations. The 486DX2 just killed the 68040 in 1991 - Moto had to give up.
Old 01 December 2018, 13:53   #889
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by litwr View Post
The oddities with the flags do not end there. For some unknown reason, many instructions, including even MOVE, clear the carry (C) and overflow (V) flags.
Moving data on the 68k works like TST'ing it, which is like CMP'ing it with zero. A side effect of this is that overflow and carry always get cleared (because a CMP with 0 can't set them).
Check what MOVE+BLE (or BGT) combination could do if MOVE didn't clear V.
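
A minimal sketch of that point (the label is illustrative):
Code:
	move.w	d0,d1		; sets N and Z from the data, clears V and C
	ble.s	le_zero		; BLE tests Z | (N xor V); with V known to
				; be 0 this reliably means "d1 <= 0"
le_zero: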


Quote:
Originally Posted by litwr View Post
Another oddity is that the instruction to save the state of the arithmetic flags, which was unprivileged on the 68000, was made privileged in all processors starting with the 68010.
The MOVE from SR instruction isn't just for saving the state of the arithmetic flags - it reads the whole SR.
On the 68010, MOVE from CCR was created for the arithmetic flags.

This was to fix a design bug!
The issue is this: letting user code read the whole SR makes virtualization more tricky, if not impossible. A sandbox needs the sandboxed program to "believe" it is actually the supervisor, when in reality it is not. So it will run in user mode and the sandbox will "emulate" all its supervisor stuff.
Now let's consider what happens if the hosted program saves SR for whatever reason, then restores it.
It will save SR with S=0 (user mode) because the sandbox can't catch it, then restore it with S=0 (user mode again). The sandbox will then consider it wants to return to user mode... which wasn't the case. Result : crash'n'burn.
On the other hand, if MOVE SR is privileged, the sandbox will emulate the call and return a correct value with S=1.
I don't know if my explanation is clear. Do you see the problem ?
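
A sketch of the mechanism (68010+; the behaviour in the comments is what a hypothetical sandbox would do):
Code:
; guest code, actually running in user mode inside the sandbox:
	move	sr,d0		; privileged on 68010+ -> privilege violation
; the host's privilege-violation handler can now hand back a
; virtualized SR value with S=1, so the guest keeps believing it
; runs in supervisor mode. On a 68000 the same instruction would
; silently return the real SR with S=0 and break the illusion.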

So, ok, it's true the average Amiga user does not need that. But workstations might.


Quote:
Originally Posted by litwr View Post
Interestingly, in parallel with the development of the PC, IBM was developing the System 9000 computer based on the 68000, which was released less than a year after the PC.

So IBM worked with the 68k, but it was too expensive for a mass-market computer until 1984. The Apple Lisa proved this too.
Interesting indeed, but honestly the price of the 68000 in the early 80's isn't among my preoccupations...


Quote:
Originally Posted by litwr View Post
A man sent several more messages to meynaf - https://litwr.livejournal.com/2509.h...ead=4813#t4813 - maybe it would be worth removing his ban?
He really got haywire before he got banned, but if i had been the moderator it wouldn't have been a permanent ban.
Thanks for pointing me to his replies, btw.
I hope we can discuss in a more friendly manner in the future, even if we clearly don't agree.


Quote:
Originally Posted by litwr View Post
PIC is a common abbreviation meaning position independent code. 68k code is not PIC, but x86 code is, if its size is less than 64 KB.
68k code can be PIC if you write it to be so, just as x86 code can fail to be PIC if you don't write it that way. I don't see why you insist on seeing an advantage where there is none.
You say some instructions can't be used there, but i don't see any. So, examples please.
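
For illustration, the PC-relative addressing modes are what make position-independent 68k code straightforward (a minimal sketch):
Code:
	lea	table(pc),a0	; address computed relative to the PC:
	move.w	(a0),d0		; works wherever the code is loaded
	bra.s	skip		; branches are PC-relative anyway
table:	dc.w	1234
skip: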


Quote:
Originally Posted by litwr View Post
I repeat: it can be very difficult to compare C sources with the corresponding assembly sources. Please give your complete subroutine and its C source too.
In many cases i just don't have any C source.


Quote:
Originally Posted by litwr View Post
Your code requires 5 address registers, but the 8086 and 286 have only 4 index registers, so the code will indeed not be as good.
And it was just a simple example. Imagine what happens for more complex code.


Quote:
Originally Posted by litwr View Post
But with the 386 you can use 7 index registers and you can write similar code

Code:
   movsw		; copy a word straight to es:di (first plane)
   lodsw
   mov [ebx],ax		; second plane
   add ebx,2
   lodsw
   mov [ecx],ax		; third plane
   add ecx,2
   lodsw
   mov [edx],ax		; fourth plane
   add edx,2
Wow. 10 instructions instead of 4
(And many more than 8 bytes).
Now consider a c2p : 1 source, 8 dest, 8 data, 1 loop counter...


Quote:
Originally Posted by litwr View Post
Dear meynaf, nobody allocates memory in DOS programming. It is sometimes useful to shrink the size of the memory used by a program to make it a TSR.
It does not make it less dirty. Some Atari ST programs do exactly the same, and they're a pain to get ported.
Old 01 December 2018, 18:32   #890
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by litwr View Post
Sorry, I know little about 68k co-pros. Your words show that they were rather expensive and rare. So a bug that would be terrible in the x86 world could be tolerated in the 68k world.
The Motorola coprocessors were roughly the same price as their x86 counterparts.

The real reason they're rare is that, on average, most software didn't actually need an FPU back in those days. This goes for both x86 systems and 68k systems. I knew quite a few people with a PC and no one had an 8087/80287. One or two did have an 80387 later on, but only because it came with the system - they didn't actually need one either.

You're simply vastly overstating the usefulness of an FPU in the 1980's and early 1990's to the average user. An 8087 coprocessor box from Intel from 1987 (or later) pointed to all of 160 programs that could benefit. Wow, what a large number out of the thousands and thousands of programs out there at the time. It sure sounds like a must-have accessory.

https://www.worthpoint.com/worthoped...sor-1856611681

Quote:
It is because the 65816 doesn't have hardware division. With 16-bit arithmetic it is about 50% faster than a 6502 at the same frequency. The 65816 also has fast memory block move instructions and several other very good features. So it is generally faster than the 6502 and sometimes much faster. You agreed that a 6502 @4MHz can be faster than a 68k at 8 MHz for byte processing. IMHO the 65816 can be faster at 16-bit processing too.
I agreed that some 8-bit things would be faster, not all things. There is a lot of stuff at which a 4MHz 6502 will still be slower than an 8MHz 68000. The same will turn out to be true for 16 bit operations on a 65816@4MHz vs a 68000@8MHz.

Your arithmetic example even proves my point. If you check the cycle times of the 16 bit add/sub commands for both, you will quickly find that the 65816 isn't always faster even clock for clock (i.e. the 65816 actually is slower clock for clock than the 68000 on some of these commands). And when it actually is faster clock for clock, it still is never over 2x as fast, so at best it'll match a 68000@8MHz - even on instructions you yourself present here as being faster.

And that leaves out the main problem that the 6502/65816 faces, which is that the Motorola simply tends to need fewer instructions to do the same things as a 6502/65816, by virtue of not being an accumulator based design - even if we pretend that multiply and divide don't exist. This difference is somewhat smaller for 68000 vs 65816, as the 65816 adds a bunch of instructions, but it still very clearly exists. Which means that looking at instruction cycle counts of simple stuff only tells a small part of the story. And a wrong one at that.
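
A hedged illustration of the instruction-count point (x, y and z are hypothetical 16-bit memory words; the 65816 is assumed to already be in 16-bit accumulator mode):
Code:
; 68000
	move.w	x,d0		; load
	add.w	y,d0		; add
	move.w	d0,z		; store

; 65816 (accumulator based)
	clc			; carry must be cleared first
	lda	x		; load
	adc	y		; add with carry
	sta	z		; store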

The blockmove is nice though, I'll grant you that.

Now, I've not been able to find other 65816 benchmarks, other than a few references to them on the 6502.org website (where apparently the 65816 lost to the 68000 on the sieve benchmark by a rather significant margin, though they did not provide a link). Interestingly, people on that forum concluded that the 65816 would indeed be faster than the 68000 - if they both ran at the same clock. And I'd agree with that.

Of course, it would still lose at half the clock speed - given the few data points we do have and the fact that the very fans of the CPU didn't claim it themselves (if I was a 6502 fan I would certainly claim such a thing if it were true!). Could there be benchmarks in which the 65816 wins? Well, show them and I'll admit defeat. But claiming something when the only evidence says you are wrong is clearly not going to work.

Quote:
Indeed, but it is better than nothing. I remember that I had to run this emulation in the early 90s because the software I needed used the co-pro.
This is seemingly a purely x86 problem. I never had this problem on my Amiga back then - 99.9% of software simply included two binaries.

Quote:
So there was software for the FPU only.
This reply of yours is highly disingenuous and you know it. Firstly, I show that FPU & non-FPU software is normally part of the same package. Then I go on to say that with a lot of searching I could find one example of this 68LC040 thing being a problem back in the day. That simply does not translate into the problem being a big deal or worthy of being discussed. My results in fact show there is no real problem. Unless you can actually show me why this problem you think is a big deal actually is such a big deal (you know, with fact based evidence - like a massive list of affected programs), I'll consider this matter closed.

Quote:
I repeat: it sounds very odd. Why not use the faster sines or exponents? It sounds like somebody saying that he doesn't need multiplication because he can always get the same results with addition only.
You thinking it's odd doesn't make it so. I can keep repeating this it seems: actual, real world results don't agree with your opinion. In the real world almost no one used the FPU for these and as such, removing it from the FPU didn't change much.

The point here then, isn't that these functions couldn't be useful, but rather that they (for whatever reason) were used only rarely. That you personally don't understand why this was so is honestly completely irrelevant - it won't change the facts.
Quote:
It sounds odd again. Moto had to cut a useful part of its chip. There is nothing good about that.
What is odd is how you refuse to understand why this might have been the right choice. Seriously, just because you don't understand it doesn't mean it was the wrong thing to do. Case in point, the 68040 was quicker than the 486 clock for clock on all the things that it did implement. Clearly then, their choice had merit - they sacrificed a part of the FPU that almost no one used and in return got a CPU that was 25% faster than the competition. This is how engineering is done - you try to pick the best possible compromise that fits the limits of technology, budget, etc.

Would it have been nicer to have these functions? Sure. But not by as much as you think and the cost to keep them was apparently too great.

Quote:
There are always high-end and low-end systems. 8088-based systems were quite good as a middle class of computer until the end of the 80s. What is wrong with that? The Amstrad PCW with a Z80 at an effective 3.2 MHz could be quite successful into the start of the 90s. Indeed, for top systems there were the 80286 and 80386. The 80486 was available in PCs from 1989; Moto had nothing comparable in 1989. The 68040 with its slow FPU could be generally better for top systems only until 1991, when the 486DX/2 beat it forever.
You must be joking

The 8088 was not a 'middle class' computer by the end of the 1980's; it may have delivered mid-range performance in 1979, but it was completely obsolete by 1985. The Apples, Ataris, Amigas and PC-ATs out by then absolutely trounced it performance-wise.

As for 80286/80386/etc - that is completely irrelevant. This part of our discussion was purely about the 8086 as designed. You've repeatedly claimed the 8086 itself (not the follow-ups, the actual 8086 CPU) was fine until the late 1980's. Yet in 1982 the design was already so far out of date that Intel redesigned it really rather thoroughly in an effort to get it up to date.

But don't just take my word for it - there's this litwr guy who only a post ago said something rather similar in different words "Indeed 8086 needed a major upgrade in 1982 and it got it!". But now this other guy (I believe he's called litwr), disagrees with litwr

Seriously, one post back you agree with me that the changes to the 8086 were needed and now you're straight back to claiming there was nothing wrong with it

Quote:
You claimed a very odd thing. I can run old DOS software on my modern multi-core PC. I even have a bootable DOS partition on my HDD. Indeed some programs can be incompatible with virtual 8086 mode without proper OS support. However, DOS software for work usually ran well under Windows or DESQview. Problems were mostly with some games, though even almost all games can run fine in modern virtual machines. You have pointed out some real mode instructions that require emulation when using virtual 8086 mode. Agreed, but there are only a few of them, they are just emulated, and they are quite rare, so performance is almost unaffected. Only I/O became notably slower. If you need benchmarks, just run a DOS benchmark program in real mode and in a virtual machine on modern Linux or Microsoft Windows. I did this with my emulator of the Commodore plus/4 computer: it ran at almost identical speed in both environments.
I have not claimed an odd thing, I have claimed a factual thing that you seemingly just don't want to hear or something.

So, just to (hopefully) break through this continued nonsense about real mode being compatible with protected mode one last time, here are the facts. Not opinions, not oddities, just facts.

1) Installing and running DOS on a modern PC proves nothing about real mode code running in protected mode, because DOS, even on the very latest x64 processors, does not run in protected mode.
2) Real mode code will not run in protected mode as-is. It will crash.
3) The workaround to this, namely Virtual 8086 mode, needs the OS to handle things or the real mode program will again crash.
4) The above shows that real mode code is not compatible with protected mode - if it were compatible, no workaround would ever be needed.

And again, I'm not saying you can't run the CPU in real mode - you can. I'm also not saying the compatibility hacks required for Virtual 8086 mode don't work - they do.

However, neither is relevant - an Intel CPU in protected mode cannot run 8086 code without workarounds in software. This fact proves the two modes are not compatible - no matter how often you make claims about running DOS or using an OS that has these workarounds built-in.

After all, if the OS using workarounds is an acceptable solution, then your original point (remember where we started this) about some 68k code not being compatible with the 68010+ is null and void - you can simply have the OS reset the VBR and run any offending program in supervisor mode. This is much less work than virtual 8086 mode requires and works just fine.
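
That 68010+ fix is a one-liner, by the way (a sketch; MOVEC is privileged, so this runs in supervisor mode):
Code:
	moveq	#0,d0
	movec	d0,vbr		; put the vector table back at address 0,
				; where the 68000 had it hardwired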

Quote:
It doesn't change anything. The 68k's PDP-11-like ISA couldn't compete with more modern technologies. Indeed there is some charm in the retro style of old things, but they are rather for museums now.
I never, ever claimed 68k is still relevant today. I've claimed that 68k was relevant for far longer than you claimed it to be. And despite what you posted above, that is a completely accurate statement. Case in point: you could buy a really rather popular device containing a 68k based CPU for years after the 486 had vanished from the PC market.

Oh and I found a lovely gem on the Wikipedia page on the PDP-11:
Quote:
Originally Posted by Wikipedia PDP-11
The design of the PDP-11 inspired the design of late-1970s microprocessors including the Intel x86[1] and the Motorola 68000.

Last edited by roondar; 01 December 2018 at 19:07.
Old 01 December 2018, 20:00   #891
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by meynaf View Post
And it was just a simple example. Imagine what happens for more complex code.

Wow. 10 instructions instead of 4
(And many more than 8 bytes).
Now consider a c2p : 1 source, 8 dest, 8 data, 1 loop counter...

It does not make it less dirty. Some Atari ST programs do exactly the same, and they're a pain to get ported.
The code can be made much more elegant, even for the 8088.
Code:
   lodsw		; ax = next word from ds:si, si += 2
   mov [off1+di],ax	; plane at offset off1
   lodsw
   mov [off2+di],ax	; plane at offset off2
   lodsw
   mov [off3+di],ax	; plane at offset off3
   movsw		; fourth plane via es:di, si += 2, di += 2
   add di,6		; with movsw's +2, di advances 8 in total
It is much better and uses only 2 registers. I was thinking in the 68k way and missed this better code. IMHO it can be even faster on an 8088 than on a 68000 at the same frequency. The Atari ST doesn't run DOS; it has an OS that is only a bit similar to DOS.
Old 01 December 2018, 20:41   #892
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by litwr View Post
The code can be made much more elegant, even for the 8088.
Code:
   lodsw
   mov [off1+di],ax
   lodsw
   mov [off2+di],ax
   lodsw
   mov [off3+di],ax
   movsw
   add di,6
It is much better and uses only 2 registers. I was thinking the 68k way and missed that better code.
Still twice as many instructions and much larger code than the 68k...
And you don't think the 68k way yet, else you wouldn't write the same things


Quote:
Originally Posted by litwr View Post
IMHO it can be even faster on the 8088 than on the 68000 at the same frequency.
Certainly not. With its 8-bit bus, the 8088 needs two memory accesses for every 16-bit word where the 68000 needs one.


Quote:
Originally Posted by litwr View Post
The Atari ST doesn't run DOS; it has an OS that is only a bit similar to DOS.
Yes but it also "allows" using memory without a prior allocation.
meynaf is offline  
Old 01 December 2018, 20:57   #893
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by meynaf View Post
Moving data on the 68k works like TST'ing it, which is like CMP'ing it with zero. A side-effect of this, is that overflow and carry always get cleared (because CMP with 0 can't set them).
Check what MOVE+BLE (or BGT) combination could do if MOVE didn't clear V.

The MOVE SR instruction isn't for just saving the state of arithmetic flags - it touches the whole SR.
On the 68010, MOVE from CCR was added for the arithmetic flags.

68k code can be PIC if you write it to be so, just as x86 code can be non-PIC if you write it that way. I don't see why you insist on seeing an advantage where there is none.
You say some instructions can't be used there, but i don't see any. So, examples please.

In many cases i just don't have any C source.
Thank you. Your explanation gives some sense to the MOVE behaviors with flags. However, they are unique and look really strange. IMHO it prevents a compiler from mixing instructions to get code executing faster on several ALUs simultaneously.

You have written: "Now let's consider what happens if the hosted program saves SR for whatever reason, then restores it." We are speaking only about reading! Indeed, the restoration of SR should be privileged, and it was privileged.

PIC. Sorry, I should repeat the same thing again. When you write a COM program you don't need to be aware of anything but the size of the segments - you get PIC automatically. BTW, I have some experience with experimental high-performance hardware based on a Xilinx MicroBlaze variant without an MMU. People wanted to run Minix on it, so they just added two segment registers (for code and data) to the hardware, and using them they got PIC for every Minix program. Minix works on this system thanks to this cheap and easy trick. It was even possible to run Linux this way, but Linux uses the MMU so often that this trick would slow performance down too much. For the 68k you have to write your code in a special way.

Without C-sources it will be almost impossible to organize a contest.
litwr is offline  
Old 01 December 2018, 21:49   #894
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by roondar View Post
You're simply vastly overstating the usefulness of an FPU in the 1980's and early 1990's to the average user. An 8087 coprocessor box from Intel from 1987 (or later) pointed to all of 160 programs that could benefit. Wow, what a large number out of the thousands and thousands of programs out there at the time. It sure sounds like a must have accessory
FPUs were a very profitable business for Intel...

Quote:
Originally Posted by roondar View Post
You thinking it's odd doesn't make it so. I can keep repeating this it seems: actual, real world results don't agree with your opinion. In the real world almost no one used the FPU for these and as such, removing them from the FPU didn't change much.
I have read that people were very disappointed when they found out that the same software runs much (!) slower on a 68040 than on a 68030+68882.

Quote:
Originally Posted by roondar View Post
What is odd is how you refuse to understand why this might have been the right choice. Seriously, just because you don't understand it doesn't mean it was the wrong thing to do. Case in point, the 68040 was quicker than the 486 clock for clock on all the things that it did implement. Clearly then, their choice had merit - they sacrificed a part of the FPU that almost no one used and in return got a CPU that was 25% faster than the competition. This is how engineering is done - you try to pick the best possible compromise that fits the limits of technology, budget, etc.
They mutilated the CPU and got only a slight advantage for about a year. It looks rather pathetic.

Quote:
Originally Posted by roondar View Post
The 8088 was not a 'middle class' computer by the end of the 1980's. It may have delivered mid-range performance in 1979, but it was completely obsolete by 1985. The Apples, Ataris, Amigas and PC-ATs out by then absolutely trounced it performance-wise.
I can't agree with you. A system based on an 8088/86 or V20/V30 @10MHz with 1-2 MB RAM (EMS), a 20-40 MB HDD and an EGA card looked quite good for 1989. Indeed, it was not a top computer, but for offices it was better than an Amiga 500 or Atari ST. The V20/V30 has about the same performance per megahertz as the 68000.

Quote:
Originally Posted by roondar View Post
I have not claimed an odd thing; I have claimed a factual thing that you seemingly just don't want to hear.
You continue to repeat rather odd things. Indeed, hardware emulation of other complex hardware requires some help from software. Modern virtual machines can run DOS at almost the same speed as real iron - I have given you an example. Hardware that is complex, duplicated, or not much needed for modern tasks is sometimes not supported by virtual machines. For example, I found out that there is still no virtual machine with a complete emulation of the Sound Blaster. So you gave me some theoretical points, but virtual machines just run DOS programs at the speed of modern iron - that would be impossible without hardware emulation, which indirectly gives full compatibility with real mode from protected mode. Complete software emulation is a bit more accurate but about 100 times slower - check DOSBox.


Quote:
Originally Posted by roondar View Post
After all, if the OS using workarounds is an acceptable solution then your original point (remember where we started this) about some 68k code not being compatible with the 68010+ is null and void - you can simply have the OS reset the VBR and run any offending program in supervisor mode. This is much less work than Virtual 8086 mode requires and works just fine.
I agree that it is possible to use a piece of system software to fix the 68010/20/30 incompatibility with the 68000, but as far as I know there was no such software in common use. Some programs (mostly games) just don't work on a 68010/20/30.

Quote:
Originally Posted by roondar View Post
Case in point: you could buy a really rather popular device containing a 68k based CPU for years after the 486 had vanished from the PC market.
Indeed it is better to buy a Pentium or ARM based device.

Quote:
Originally Posted by roondar View Post
Oh and I found a lovely gem on the Wikipedia page on the PDP-11:
LOL. Of course, Intel stole the idea of LE byte order from DEC! It is a real Wikipedia gem. I would estimate that the 68k design is about 60-70% from DEC, but Intel's is less than 10%.
litwr is offline  
Old 01 December 2018, 23:57   #895
frank_b
Registered User
 
Join Date: Jun 2008
Location: Boston USA
Posts: 466
Quote:
Originally Posted by litwr View Post
I have read that people were very disappointed when they found out that the same software runs much (!) slower on a 68040 than on a 68030+68882.

All the FP primitives are much, much faster - something like 4x on the 040 at the same clock. I have an 040 based Amiga btw, a 40 MHz 040 Blizzard. They weren't rare.
frank_b is offline  
Old 02 December 2018, 00:06   #896
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by litwr View Post
FPUs were a very profitable business for Intel...
There was virtually no software available that made use of an FPU, as Intel's own marketing shows*. Intel selling a bunch of them anyway is more a sign of customer stupidity and/or clever marketing than of the usefulness of the product.

*) Not just Intel's marketing by the way. It's common knowledge that computers in the 1980's and 1990's tended to shy away from floating point math for all but the most esoteric of cases (such as 3D rendering). Even the 'Amiga killer' Doom just ran plain old integer code.
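
To give an idea of what games used instead, here is a minimal sketch of 8.8 fixed-point multiplication on a plain 68000 (the values are hypothetical, purely to show the idea):
Code:
 move.w #$0180,d0   ; 1.5 in 8.8 fixed point (value * 256)
 move.w #$0200,d1   ; 2.0 in 8.8
 muls   d1,d0       ; 16x16 -> 32-bit product, now in 16.16 format
 asr.l  #8,d0       ; renormalise: low word of d0 is $0300 = 3.0 in 8.8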
Quote:
I have read that people were very disappointed when they found out that the same software runs much (!) slower on a 68040 than on a 68030+68882.
Where? When? Who? See, I've never heard of this before this thread and I was an active 68k user a whole lot longer than you were. Furthermore, I couldn't find anything about this on Google when I tried to find it just now. All I've been able to find on the 68040 FPU and its performance was that it was really rather good. Up to 3x the speed of the 486 for common FP benchmarks.

http://www.skepticfiles.org/cowtext/...1/486vs040.htm on the linpack FP benchmark: "Here, the MC68040 outperforms the 80486 by a factor of 3. This performance ratio is well supported by the discussion given for the data in Table 1.1."

Now this is only one data point and it may well be too positive, but that one data point is still much, much more than you've offered as evidence for your claims, which is zero.
Quote:
They mutilated the CPU and got only a slight advantage for about a year. It looks rather pathetic.
What a ridiculous exaggeration. You do love fake melodramatics, don't you?

It was a compromise. It worked and didn't 'mutilate' anything other than a few fractal rendering programs. Just like the lowering of the front-side bus speed by Intel later on was a necessary compromise that worked 'for about a year'.

Besides, removing rarely used instructions or features to make room for more instructions or better performance is a much better idea than keeping everything in at all costs - just ask Intel how its push for mobile dominance is going and then ask the same of ARM to see why (as ARM actually does, from time to time, remove instructions from its instruction set. Amusingly, they use similar kinds of logic to that used by Motorola during the 68040 design - instructions that don't offer enough 'bang for the silicon' get removed)

https://stackoverflow.com/questions/...rm-instruction
Quote:
I can't agree with you. A system based on an 8088/86 or V20/V30 @10MHz with 1-2 MB RAM (EMS), a 20-40 MB HDD and an EGA card looked quite good for 1989. Indeed, it was not a top computer, but for offices it was better than an Amiga 500 or Atari ST. The V20/V30 has about the same performance per megahertz as the 68000.
I disagree. For 1989 such a machine would've been very, very dated compared to the 286's and 386's out there. The only 'office work' such a machine would be useful for would be word processing and light spreadsheet usage and very little else - it lacked the CPU power for serious tasks, just like the A500 & Atari ST did.

FYI: the only reason the Atari & Amiga 500 did well in the market is precisely because they were low end and thus rather cheap (unlike your example, which would've been shockingly expensive for the pitiful CPU power it gave you), but were still very good for one key workload: gaming. Which your example machine would be very poor at indeed as EGA is no good without at least a fast 286.

And seriously, office work? In those days that screamed 'any old underpowered, obsolete hardware will do'. No one in their right mind sees office PC's in the 1980's as anything other than crap
Quote:
You continue to repeat rather odd things. Indeed, hardware emulation of other complex hardware requires some help from software. Modern virtual machines can run DOS at almost the same speed as real iron - I have given you an example. Hardware that is complex, duplicated, or not much needed for modern tasks is sometimes not supported by virtual machines. For example, I found out that there is still no virtual machine with a complete emulation of the Sound Blaster. So you gave me some theoretical points, but virtual machines just run DOS programs at the speed of modern iron - that would be impossible without hardware emulation, which indirectly gives full compatibility with real mode from protected mode. Complete software emulation is a bit more accurate but about 100 times slower - check DOSBox.
The 'odd kinds of stuff' I claim are all things I've found in Intel's CPU manuals (and other sources). But hey, what does Intel know - they only made the thing. Also, what I said has nothing, nada, zero to do with virtual machines.

Quote:
I agree that it is possible to use a piece of system software to fix the 68010/20/30 incompatibility with the 68000, but as far as I know there was no such software in common use. Some programs (mostly games) just don't work on a 68010/20/30.
You've clearly not checked. There are multiple options available for the Amiga that allow you to do exactly what I pointed out.

As just one of many examples, there is T.U.D.E. (The Ultimate Degrader and Enhancer), which allows you to run programs with a number of options for compatibility. Amongst other things, T.U.D.E. lets you run software that would fail on 68010+ due to privilege errors (fixing the MOVE from SR problem), allows you to kill the cache (fixing self-modifying code), reset the MMU (if any), reset the VBR, etc.

Most games work when you use this. The few that remain fail either due to requiring Kickstart 1.3 (which is also an option to use in T.U.D.E.), due to timing problems (i.e. not properly waiting on the Blitter), or due to the coder reading the Motorola manual and thinking "screw that, I'm going to put data in the high byte of my address registers anyway".

For more proof that the problem (on Amiga) is almost never the processor, note that WHDLoad slaves almost always have to fix Blitter coding and less frequently kill caches or enable supervisor flags for incompatible programs. They almost never need to touch the actual program code (apart from the Blitter coding as mentioned, or if the program was buggy on the 68000 also).
Quote:
Indeed it is better to buy a Pentium or ARM based device.
Pity then that these devices were not available with an ARM until much later and were never made based on the x86 architecture at all.

A Pentium/x86/x64 based version of this type of device still won't work even today, because Intel just can't seem to manage making an x86 CPU that is even semi-worthwhile if it can't use all the power in the world
Quote:
LOL. Of course, Intel stole the idea of LE byte order from DEC! It is a real Wikipedia gem. I would estimate that the 68k design is about 60-70% from DEC, but Intel's is less than 10%.
Which, by your apparent views on the PDP-11, makes Intel's x86 CPUs horribly flawed monstrosities

Last edited by roondar; 02 December 2018 at 12:58.
roondar is offline  
Old 02 December 2018, 11:14   #897
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by litwr View Post
Thank you. Your explanation gives some sense to the MOVE behaviors with flags. However, they are unique and look really strange.
Of course we all know that everything that's different to your beloved x86 is at least "strange"
But MOVE setting the flags is very, very useful.
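
A minimal sketch of why (label hypothetical):
Code:
 move.w d0,(a0)+   ; stores the data, sets N/Z from it and clears V/C
 ble.s  done       ; taken when the value was <= 0; BLE tests Z or (N xor V),
                   ; so it would be unreliable here if MOVE didn't clear V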


Quote:
Originally Posted by litwr View Post
IMHO it prevents a compiler from mixing instructions to get code executing faster on several ALUs simultaneously.
Sorry, there's no reason it could have that kind of impact.


Quote:
Originally Posted by litwr View Post
You have written: "Now let's consider what happens if the hosted program saves SR for whatever reason, then restores it." We are speaking only about reading! Indeed, the restoration of SR should be privileged, and it was privileged.
You did not understand my explanation.
If the read gives a wrong result, then restoring also will. Hence both must be privileged.
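
A minimal sketch of the save/restore sequence in question, just to illustrate:
Code:
 move.w sr,d0   ; the read: unprivileged on the 68000, traps on 68010+
 ; ... code that changes the flags ...
 move.w d0,sr   ; the restore: privileged on every 68k, writes back what was read
If the read were allowed to return a sanitised value to a user program, the restore would put that wrong value back - which is exactly why both directions have to trap, so the OS can step in.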


Quote:
Originally Posted by litwr View Post
PIC. Sorry, I should repeat the same thing again. When you write a COM program you don't need to be aware of anything but the size of the segments - you get PIC automatically.
It will not become more true if you repeat it.
You can get PIC on 68k if you set up the assembler to produce it (set up the small data model). So you can get it automatically as well, and for larger programs than just 64k.
On the other hand, having to take care of the size of the data segments is a lot more intellectual load than just using relative addressing modes.
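
A minimal sketch of what those relative addressing modes look like in practice (labels hypothetical):
Code:
 lea    table(pc),a0   ; address computed from the PC at run time, no relocation needed
 move.w (a0,d0.w),d1   ; indexed access through the base register
 bsr.s  calc           ; subroutine calls are PC-relative anyway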


Quote:
Originally Posted by litwr View Post
BTW, I have some experience with experimental high-performance hardware based on a Xilinx MicroBlaze variant without an MMU. People wanted to run Minix on it, so they just added two segment registers (for code and data) to the hardware, and using them they got PIC for every Minix program. Minix works on this system thanks to this cheap and easy trick. It was even possible to run Linux this way, but Linux uses the MMU so often that this trick would slow performance down too much. For the 68k you have to write your code in a special way.
Writing code for segmentation is more "in a special way" than writing PIC for 68k...


Quote:
Originally Posted by litwr View Post
Without C-sources it will be almost impossible to organize a contest.
I don't see why it would be impossible. A detailed algorithm should largely be enough.
Anyway, if it's necessary for you, why don't you find one then?
meynaf is offline  
Old 07 December 2018, 21:21   #898
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
@meynaf
Code:
   lodsw
   mov [off1+di],ax
   lodsw
   mov [off2+di],ax
   lodsw
   mov [off3+di],ax
   movsw
has 7 instructions.
Code:
 move.w (a0)+,(a1)+
 move.w (a0)+,(a2)+
 move.w (a0)+,(a3)+
 move.w (a0)+,(a4)+
has 5... But the 68k code needs to load 5 32-bit registers beforehand and the x86 code only needs to load 2 16-bit registers. 10 > 9 and thus 68k has more instructions for the case.

@roondar
The 8-bit data bus of the 65816 slows this CPU down very much. My point was that a 65816 variant with a 16-bit data bus @4MHz should be faster than a 68000 @8MHz.

Last edited by litwr; 08 December 2018 at 10:36.
litwr is offline  
Old 08 December 2018, 03:56   #899
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
Quote:
Originally Posted by litwr View Post
@meynaf
Code:
   lodsw
   mov [off1+di],ax
   lodsw
   mov [off2+di],ax
   lodsw
   mov [off3+di],ax
   movsw
has 7 instructions.
Code:
 move.w (a0)+,(a1)+
 move.w (a0)+,(a2)+
 move.w (a0)+,(a3)+
 move.w (a0)+,(a4)+
has 5... But the 68k code needs to load 5 32-bit registers beforehand and the x86 code only needs to load 2 16-bit registers. 10 > 9 and thus 68k has more instructions for the case.
Two things. First, if you're going to count the loading of addresses, you have to count the loading of the segment registers. That's going to cost instructions.

But that's silly anyway. The instructions are meant to be executed over and over again in a loop, so it's the 7 instructions vs 5 in the loop body that make the difference.
mc6809e is offline  
Old 08 December 2018, 10:43   #900
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by litwr View Post
@meynaf
Code:
   lodsw
   mov [off1+di],ax
   lodsw
   mov [off2+di],ax
   lodsw
   mov [off3+di],ax
   movsw
has 7 instructions.
Right. And without any instruction to loop.


Quote:
Originally Posted by litwr View Post
Code:
 move.w (a0)+,(a1)+
 move.w (a0)+,(a2)+
 move.w (a0)+,(a3)+
 move.w (a0)+,(a4)+
has 5...
5?
Learn how to count. This is 4 instructions, not 5
So you added one ghostly instruction to get to 10 > 9. Was it intentional?


Quote:
Originally Posted by litwr View Post
But the 68k code needs to load 5 32-bit registers beforehand and the x86 code only needs to load 2 16-bit registers. 10 > 9 and thus 68k has more instructions for the case.
And wrong again. Poor litwr
Not only would it be 9, not 10, because 4 isn't equal to 5, but in addition you forget that on 68k we have something very powerful that's just missing in x86. So to load registers we can simply do:
Code:
 movem.l planes,a1-a4
That's only +1 instruction. Count again +1 to load a0, +1 for the dbf loop, and you end up with 7.
9 > 7 and thus x86 has more instructions for the case.
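
Spelled out, that 7-instruction version could look like this (labels hypothetical; as on the x86 side, preloading the loop count in d0 isn't counted):
Code:
 lea     src(pc),a0        ; +1: the source pointer
 movem.l planes(pc),a1-a4  ; +1: all four destination pointers in one go
copy:
 move.w  (a0)+,(a1)+
 move.w  (a0)+,(a2)+
 move.w  (a0)+,(a3)+
 move.w  (a0)+,(a4)+
 dbf     d0,copy           ; +1: decrement d0 and branch back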

Note that even your point about the necessity to load 4 registers is wrong, as on 68k we could do:
Code:
 move.w (a0)+,(a1)+
 move.w (a0)+,off1-2(a1)
 move.w (a0)+,off2-2(a1)
 move.w (a0)+,off3-2(a1)
Don't forget that the 68k has a very powerful move instruction
meynaf is offline  
 

