English Amiga Board > Coders > Coders. Asm / Hardware
Old 16 January 2017, 10:01   #41
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by meynaf View Post
8086 traces back to 1977 and was inspired by 8080 which is... well, more ancient.
It's all modern technology (not as in contemporary), see 1b: https://www.merriam-webster.com/dictionary/modern
Thorham is offline  
Old 16 January 2017, 10:08   #42
Samurai_Crow
Total Chaos forever!
 
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
Quote:
Originally Posted by meynaf
"speed" isn't the only issue here. And anyway, this prevents fusing MVZ with another instruction, and can slow things down if ICache is at its limits.
Note : Gunnar has support for MVS fusing only unless i'm mistaken, and the case for MVZ is more common...
No, the real reason is that he needed the encoding space for his 64-bit stuff (which can be made by fusing as well, but this is another story).

Yeah, as if the compiler wasn't already the most complex part of the puzzle!
8-way fusions are not available but 3-way fusions are easy for a future release.

As for compilers, they are so complex that they are not a profitable business, so reusable code is readily available in the form of open source.
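(To illustrate the MVZ/MVS point above: a minimal sketch of what these ColdFire instructions replace, using standard 68k and ColdFire ISA_B mnemonics.)

Code:
   ; plain 68k: zero-extend a byte load, two instructions
   moveq   #0,d0        ; clear the destination first
   move.b  (a0),d0      ; then move the byte
   ; ColdFire ISA_B: the same in one instruction - the fusing candidate
   mvz.b   (a0),d0      ; move with zero-extend
   mvs.b   (a0),d0      ; sign-extending counterpart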
Samurai_Crow is offline  
Old 16 January 2017, 10:14   #43
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
It's all modern technology (not as in contemporary), see 1b: https://www.merriam-webster.com/dictionary/modern
It can be both "modern" in the sense you give to that word, and quite old/outdated/superseded.


Quote:
Originally Posted by Samurai_Crow View Post
8-way fusions are not available but 3-way fusions are easy for a future release.

As for compilers, they are so complex that they are not a profitable business, so reusable code is readily available in the form of open source.
This doesn't change the value instructions can have.
meynaf is offline  
Old 29 January 2017, 16:22   #44
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
I have tested pi-spigot on IBM PC 386DX hardware and found out that my estimates were slightly wrong. The 14 MHz 68020 in the Amiga 1200 is still a bit faster than a 12.5 MHz 80386. However, it was mentioned that FS-UAE is not too accurate with the 68020. I am curious to find out its level of accuracy and have started a new thread - http://eab.abime.net/showthread.php?t=85750.
The project tables have been updated - http://litwr2.atspace.eu/pi/pi-spigot-benchmark.html
litwr is offline  
Old 29 January 2017, 16:34   #45
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
What would be interesting (at least for me) is trying, for every machine, to get the smallest possible executable performing that computation.
On your page you give the executable sizes, but they are quite a lot larger than the minimum that can be achieved.
In this "code density" challenge, old machines would stand a chance against modern ones.
meynaf is offline  
Old 29 January 2017, 17:16   #46
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
My priorities are:
1) speed;
2) number of digits;
3) program size.
The UI code is bigger than the calculation code on CPUs with hardware division.
litwr is offline  
Old 01 February 2017, 16:12   #47
Megol
Registered User
 
 
Join Date: May 2014
Location: inside the emulator
Posts: 377
Quote:
Originally Posted by matthey View Post
<snip>
It appears the 68k has better timings except for the 68020-68030 years (as you suspected). The 68k MUL was more flexible with more general purpose registers. The x86 processors tended to have higher clock speeds. Of course, a modern 68k ASIC could have a 1 cycle MUL for all types.
Only by limiting general performance. There are reasons a high-performance processor tends to have ~3 cycles of latency for integer multiplication: one is that optimizing multiplication is generally wasted effort (for several reasons); another is that multiplication is much more complicated than what is commonly the limit on integer performance - addition.

Even now with little change in operating frequencies an ASIC isn't likely to limit peak performance in order to optimize something like multiplication.
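(One illustration of why optimizing the multiplier is often wasted effort: compilers strength-reduce constant multiplies into shift/add sequences anyway. A sketch in 68k asm.)

Code:
   ; d0 := d0 * 10 without MUL: 10x = 8x + 2x
   move.l  d0,d1
   asl.l   #3,d1        ; d1 = d0 * 8
   add.l   d0,d0        ; d0 = d0 * 2
   add.l   d1,d0        ; d0 = original d0 * 10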
Megol is offline  
Old 01 February 2017, 16:34   #48
Megol
Registered User
 
 
Join Date: May 2014
Location: inside the emulator
Posts: 377
Quote:
Originally Posted by meynaf View Post
Yeah, he suddenly discovered that his useless extensions need heaps of encoding space so he stole this one (and don't expect the FPU to come in next versions, as there isn't enough logic available anymore for that).
He doesn't care about code density, he doesn't care about programmability, and he even resurrected a bug that was fixed in the 68010 (which wasn't needed at all, even for 68000 compatibility).
His CPU can't work without ROM patches.
Not my definition of a 68k god.
What he did appears great, but who do we have to compare him with ?
Stole? It was never in the 68k instruction set.

What 68010 bug made a comeback BTW?
Megol is offline  
Old 01 February 2017, 19:56   #49
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by meynaf View Post
What would be interesting (at least for me) is trying, for every machine, to get the smallest possible executable performing that computation.
But why? It can only lead to some implementations being better than others.
Thorham is offline  
Old 01 February 2017, 20:08   #50
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Megol View Post
Stole? It was never in the 68k instruction set.
Right, if you don't consider the ColdFire a 68k. But this encoding space was for MVS/MVZ, two useful additions I would have liked to have in a future CPU.
Note that it's not exactly "never" either. Early 68000 masks had the DCNT instruction (ancestor of the current DBcc) here.


Quote:
Originally Posted by Megol View Post
What 68010 bug made a comeback BTW?
It's a 68000 bug, not a 68010 one.

It's the 68000 MOVE from SR bug that was fixed in the 68010.
This bug is bad for virtualization.

Pretty much unimportant for current uses, but not nice if you plan for some distant future.
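(In concrete terms - standard 68k semantics, a sketch rather than Apollo-specific code:)

Code:
   ; executed in user mode:
   move.w  sr,d0        ; 68000: runs, leaks the real SR to user code
                        ; 68010+: privilege violation -> the OS traps it
                        ;         and can supply a virtualized SR
   move.w  ccr,d0       ; the 68010+ replacement for user-mode code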


Quote:
Originally Posted by Thorham View Post
But why? It can only lead to some implementations being better than others.
Wouldn't it be a nice way to compare different CPU families?
I'd like to see that, especially from the code density point of view.
Perhaps I'm alone... wouldn't be the first time
I've searched the net, but so far I've never found code density comparisons of many CPUs with real source code...
meynaf is offline  
Old 01 February 2017, 20:43   #51
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Megol View Post
Only by limiting general performance. There are reasons a high-performance processor tends to have ~3 cycles of latency for integer multiplication: one is that optimizing multiplication is generally wasted effort (for several reasons); another is that multiplication is much more complicated than what is commonly the limit on integer performance - addition.
Why do processors have SIMD units with multiply, MAC units and fused multiply-add instructions to improve multiply performance if addition is the bottleneck anyway? The 68060 can already do at least 6 additions per cycle.

Code:
   lea (4,A0,D0),A1 ; 3 additions
   lea (4,A2,D1),A3 ; 3 additions
A 3 integer unit 68k CPU (the Apollo Core had 3 OEPs for a while, after I suggested that the 68k had good enough code density to commonly fetch 3 instructions per cycle) could have done at least 9 additions per cycle (up to 12 additions per cycle with my simple Effective Address Direct addressing mode from my 68kF ISA).

Code:
   add.l {4,A0,D0},A1 ; 4 additions
   add.l {4,A2,D1},A3 ; 4 additions
   add.l {4,A4,D2},A5 ; 4 additions
How many additions are there per multiplication on a "high performance processor"? How many additions are used for address calculations (which the 68k eats for lunch)?

I would expect that making 32x32=32 single cycle may limit the maximum clock rate. That is not necessarily a problem if the target is embedded and maximum performance per MHz per core is wanted. It is also simpler (saves logic) to have single-cycle instructions.
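(The instruction forms in question, in standard 68k mnemonics:)

Code:
   mulu.w  d1,d0        ; 16x16 -> 32-bit result (68000 and up)
   mulu.l  d1,d0        ; 32x32 -> 32-bit result (68020 and up)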

Quote:
Originally Posted by Megol View Post
Even now with little change in operating frequencies an ASIC isn't likely to limit peak performance in order to optimize something like multiplication.
Multiplication is likely to need some rework when moving from FPGA to ASIC anyway. The 68060 is already close to a single-cycle multiply with a huge die size, so it may be possible, but there are many variables. The 68060 could likely have done a 16x16=32 in 1 cycle, but I expect the optimization would have cost a little more logic, and gates were not cheap then. Multiplication gets exponentially slower as it gets wider, though. A single-cycle 32x32=32 is unlikely to be a priority. It is just so close and would be simpler, but limiting the clock speed would be a major downside too.

Quote:
Originally Posted by meynaf View Post
Wouldn't it be a nice way to compare different CPU families?
I'd like to see that, especially from the code density point of view.
Perhaps I'm alone... wouldn't be the first time
I've searched the net, but so far I've never found code density comparisons of many CPUs with real source code...
I found a few attempts to compare code density between CPU families over the years. The following article is very old, but it was done back when compilers were still good for the 68k (a common flaw is to use compiler output for comparison, which ends up comparing compiler-produced code density). The 68k (not clear if 68000 or 68020 ISA) and ARM with Thumb 2 were virtually tied, with CF (ISA_C) close behind (no 8 bit CPUs).

SPARC16: A new compression approach for the SPARC architecture
https://www.researchgate.net/profile...chitecture.pdf

There was a more recent "attempt" which included 8 bit CPUs but only used a tiny hand-optimized assembler program with mostly byte-sized data (text). I downloaded the assembler program for the 68k and wrote the author to tell him he would be better off using a compiler for the 68k, even though they generate horrible code too. I told him his whole study was majorly flawed and should be taken down as misinformation (it looks like an official study by a college student but it is complete rubbish). He went all defensive, so I don't think he changed anything either. His background was as a DOS x86 programmer, as I recall, and the 8086 did finish with the best code density (it would be good for tiny text programs). This attempt is really laughable, as it compared his cross-platform assembler programming skills, which are poor.

Code Density Concerns for New Architectures
http://web.eece.maine.edu/~vweaver/p...09_density.pdf

It would be nice to see a good code density comparison of many processors, but it is inherently prone to flaws. This litwr pi program uses OS code, which reduces the executable size. It would be better to write the characters to a buffer in memory and exclude the printing support code from the code density calculations. The time should also only measure the time necessary to write to the buffer and not to print. The result could be null-terminated and printed afterward, though. There are still several potential flaws, like the over-importance of division instruction performance (as litwr noted) and the tiny program size being non-representative of general CPU performance. Still, it would be interesting, knowing these caveats.
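A sketch of that measurement setup, with hypothetical start_timer/stop_timer/print_string routines standing in for whatever each platform provides (the digit loop below is only a stand-in for the real spigot computation):

Code:
   bsr     start_timer    ; assumed timing routine - timed region begins
   lea     buffer,a3      ; digits go to memory, not the console
   moveq   #9,d1          ; stand-in "computation": emit 10 digits
.digit:
   move.b  #'0',d0
   add.b   d1,d0          ; build an ASCII digit
   move.b  d0,(a3)+       ; store it - no OS call inside the timed loop
   dbf     d1,.digit
   clr.b   (a3)           ; null-terminate the result
   bsr     stop_timer     ; timed region ends - printing is excluded
   lea     buffer,a0
   bsr     print_string   ; assumed output routine, runs untimed
   rts
buffer:
   ds.b    16             ; result buffer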

Last edited by matthey; 01 February 2017 at 22:33.
matthey is offline  
Old 02 February 2017, 10:36   #52
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by meynaf View Post
Wouldn't it be a nice way to compare different cpu families ?
Of course, but wouldn't speed comparisons be more useful?
Thorham is offline  
Old 02 February 2017, 12:36   #53
Megol
Registered User
 
 
Join Date: May 2014
Location: inside the emulator
Posts: 377
Quote:
Originally Posted by matthey View Post
There was a more recent "attempt" which included 8 bit CPUs but only used a tiny hand-optimized assembler program with mostly byte-sized data (text). I downloaded the assembler program for the 68k and wrote the author to tell him he would be better off using a compiler for the 68k, even though they generate horrible code too. I told him his whole study was majorly flawed and should be taken down as misinformation (it looks like an official study by a college student but it is complete rubbish). He went all defensive, so I don't think he changed anything either. His background was as a DOS x86 programmer, as I recall, and the 8086 did finish with the best code density (it would be good for tiny text programs). This attempt is really laughable, as it compared his cross-platform assembler programming skills, which are poor.
Instead of flaming and insulting him you could have just helped optimize the program. Others have.

BTW: Official study? Are you really that clueless about how academic studies are done?!? There are no official studies - studies are done and published by individuals and/or groups. If you (or anybody else) don't like it then you publish something yourself - it is that simple.

One does not go around badmouthing a study, whining in public about how badly it was done (especially with no proof), and calling the author's skills into question.

Really, I thought better of you. This behavior is fitting of an immature crank!

P.S. The 8086 has very good code density for some tasks; the fact that instructions can be pretty complex while being one byte in size makes a huge difference. Don't like it? Grow up.

P.S.^2 Yes, I'm pissed off.
Megol is offline  
Old 02 February 2017, 14:54   #54
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by matthey View Post
It would be nice to see a good code density comparison of many processors, but it is inherently prone to flaws. This litwr pi program uses OS code, which reduces the executable size. It would be better to write the characters to a buffer in memory and exclude the printing support code from the code density calculations. The time should also only measure the time necessary to write to the buffer and not to print. The result could be null-terminated and printed afterward, though. There are still several potential flaws, like the over-importance of division instruction performance (as litwr noted) and the tiny program size being non-representative of general CPU performance. Still, it would be interesting, knowing these caveats.
The thing is that all the available attempts seem to give no asm code at all, i.e. they are meaningless.
The best thing would be to recruit asm programmers for every architecture, so that they would write the shortest code for each.


Quote:
Originally Posted by Thorham View Post
Of course, but wouldn't speed comparisons be more useful?
Speed comparison of 6502/z80 vs 68000 wouldn't be very meaningful...
However, in terms of short code they are competitive.
meynaf is offline  
Old 02 February 2017, 15:57   #55
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by meynaf View Post
Speed comparison of 6502/z80 vs 68000 wouldn't be very meaningful...
Actually, it would be. They're all the same kind of device, so performance comparisons are completely relevant.

Quote:
Originally Posted by meynaf View Post
However, in terms of short code they are competitive.
Yes, but why does that matter? Speed is far more useful than code size.
Thorham is offline  
Old 02 February 2017, 16:07   #56
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Actually, it would be. They're all the same kind of device, so performance comparisons are completely relevant.
Everything beats the crap out of old 8-bit cpus.


Quote:
Originally Posted by Thorham View Post
Yes, but why does that matter? Speed is far more useful than code size.
Depends what you're doing. In embedded, size matters.
Anyway, I just like short code
meynaf is offline  
Old 02 February 2017, 16:11   #57
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Megol View Post
Instead of flaming and insulting him you could have just helped optimize the program. Others have.
I made suggestions but he had nothing to start with. I told him to start over with compiled code and then use a good optimizing assembler on it (not GAS!) and he would have a reasonable starting point, depending on which version of GCC he used. He didn't know the basics of assembler programming or code density on the 68k, like the existence of the quick instructions (MOVEQ, ADDQ, SUBQ) or the CCR being set by most instructions, as I recall. I respect someone trying to learn assembler programming on different architectures, but they shouldn't be posting the results of their baby steps like it is some study.
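(For anyone unfamiliar with them, the size difference the quick instructions make, from the standard 68000 encodings:)

Code:
   move.l  #0,d0        ; 6 bytes
   moveq   #0,d0        ; 2 bytes, same result, sets the CCR the same way
   add.l   #1,d0        ; 6 bytes
   addq.l  #1,d0        ; 2 bytes (immediate range 1-8)
   sub.l   #8,d0        ; 6 bytes
   subq.l  #8,d0        ; 2 bytes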

Quote:
Originally Posted by Megol View Post
BTW: Official study? Are you really that clueless about how academic studies are done?!? There are no official studies - studies are done and published by individuals and/or groups. If you (or anybody else) don't like it then you publish something yourself - it is that simple.
Official as in done by a professor at the university and/or sanctioned/funded by the university. Also, perhaps having some semblance of scientific method and statistical analysis. I was researching code density and thought I had found something interesting and potentially useful, based on the nice graphs and the diversity of CPU architectures, until I dug a little deeper and found out I had wasted my time because this amateur wannabe's "study" came up in searches instead of something useful. There are other flaws too, like only analyzing one tiny useless program (litwr's pi program is tiny but useful at least).

Quote:
Originally Posted by Megol View Post
One does not go around badmouthing a study, whining in public about how badly it was done (especially with no proof), and calling the author's skills into question.
It would be nice if there were a rating system based on science and statistics instead of popularity, so trash like this doesn't come up in searches. The guy has or had a web site with the code he used, if you want to find it. At least he had that. What if he didn't? Would people believe this trash? Most people less versed in science probably would anyway. Ouch!

Quote:
Originally Posted by Megol View Post
P.S. The 8086 has very good code density for some tasks; the fact that instructions can be pretty complex while being one byte in size makes a huge difference. Don't like it? Grow up.
I am very much aware that the 8086 and its successors have very good code density for tiny programs using byte-sized data (better than the 68k here). I have now mentioned this three times in this thread. My point was that his 8086 code is probably reasonably optimal and that he is likely biased (intentionally or not) toward this CPU family.

Quote:
Originally Posted by Megol View Post
Really, I thought better of you. This behavior is fitting of an immature crank!

P.S.^2 Yes, I'm pissed off.
Sorry. I am frustrated by the lack of science and statistics from wannabes who give deceptive and false information while keeping people from finding good information and the truth. I am fairly tolerant of people who are ignorant, but I informed this guy of his "substantial" flaws and he ignored me. Then his ignorance became willful ignorance, which irritates me.

I watched a Bill Nye show on climate change a few months back, and they brought up a graph for a few seconds with the PPM of CO2 over time. I said out loud, "wait a second, that graph didn't start at zero". My brother used the DVR to rewind and freeze the frame with the graph. I then looked up a graph which started at zero PPM, and it changed the whole picture, as I expected. I then found a chart which went back thousands more years, and all of a sudden what looked like a hysteria-causing chart became unremarkable. This is a good example of misinformation distorting the truth. It made me lose respect for everyone on and involved with that show. I would have demanded they zero that chart if I had anything to do with that show. Many people do not understand scientific method, statistics, critical thinking and propaganda, and are more easily swayed. I do have a problem with the propaganda propagators, brainwashers and deceivers of the world. Sadly, they have caused great evil and loss even in this modern world where we benefit so much from science.
matthey is offline  
Old 02 February 2017, 16:17   #58
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by meynaf View Post
Everything beats the crap out of old 8-bit cpus.
Except 4-bit CPUs (Intel 4004). It would be interesting to see how much faster later CPUs are.

Quote:
Originally Posted by meynaf View Post
Depends what you're doing. In embedded, size matters.
True.

Quote:
Originally Posted by meynaf View Post
Anyway, I just like short code
I like fast code
Thorham is offline  
Old 02 February 2017, 18:04   #59
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by meynaf View Post
Depends what you're doing. In embedded, size matters.
Anyway, I just like short code
Many women say size doesn't matter too.

Code density does matter, even to modern processors. If you had read the (IMO good) SPARC16 research, it talks about (instruction) cache miss rates. There is a substantial difference in cache miss rates between a low code density RISC CPU and a high code density CISC or compressed RISC (Thumb, Thumb 2, MIPS16(e), MicroMIPS, PPC CodePack, SPARC16, etc.) CPU. Reducing cache misses results in substantially better performance and reduced power consumption. Most of the research has been done on RISC processors with compression. Some examples of research study results:

1) Thumb 2 showing 55%-70% compression with 30% speed gain with 16 bit bus and 10% loss with 32 bit bus
2) MicroMIPS provided a 65% compression ratio giving a 2% speedup vs MIPS32
3) A general RISC compression showed a 35% reduction in code size gave a 10% reduction in power consumption and a 20% performance improvement
4) Compiler heuristics that gave an 85% compression ratio improved performance by 17%
5) Link time ARM optimizations reducing code size from 16%-18% provided 8%-17.4% performance gain and 7.9%-16.2% reduction in power consumption

References can be found in the "Design and evaluation of compact ISA extensions" paper, which is the 2nd article after the "SPARC16: A new compression approach for the SPARC architecture" link.

There is a wide range of performance results for RISC compression, which I suspect is due to the decreased functionality and increased instruction counts of many of these RISC compression schemes. The compiler and heuristics performance improvements were larger, possibly because they did not decrease functionality (introduce limitations) and likely decreased the number of instructions. Adding instructions to an enhanced 68k ISA could increase functionality and decrease the number of instructions as well as improve code density, so I would expect to see performance and power consumption improvements at the high end of these study results. Perhaps 68k ISA enhancements could help compilers generate better code, which would also improve code density and have synergies with the other code density performance improvements. It is sad that an enhanced 68k ISA can naturally have better code density than probably all of these RISC compression schemes, with minimal implementation issues, yet gets practically no attention or research.
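(A hand-made flavor of where the CISC density gain comes from; the 68k line is a real encoding, the RISC side is a generic fixed-width pseudo-sequence:)

Code:
   add.l   d0,(a0)      ; 68k read-modify-write memory: 2 bytes
   ; generic fixed-width RISC equivalent (pseudo-mnemonics):
   ; load   r1,(r2)     ; 4 bytes
   ; add    r1,r1,r3    ; 4 bytes
   ; store  r1,(r2)     ; 4 bytes -> 12 bytes vs 2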

The old SPARC16: A new compression approach for the SPARC architecture paper lists the following architectures as having the least code density (starting with worst).

1) Alpha (dead)
2) MIPS (on life support)
3) PowerPC (on life support)
4) SPARC (on life support)
5) ARM original (abandoned/replaced by higher code density Thumb 2)

Is it just a coincidence that these low code density architectures are antiquated? PA-RISC, another very low code density RISC architecture, which the Amiga was going to use, isn't even mentioned. Does size matter for my girl? I think she would "prefer" short code and my long ... post.

Last edited by matthey; 02 February 2017 at 18:29.
matthey is offline  
Old 03 February 2017, 14:40   #60
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
I just recently read a paper on RISC-V (https://people.eecs.berkeley.edu/~kr...ECS-2016-1.pdf), which has been designed with compression in mind. They discussed (among others) x86 and x64 code size and found it was rather larger (IMO) than conventional wisdom claims. The 68k was not mentioned, but I'm guessing it would begin to look good in that regard.

(My only complaints about RISC-V would be that it has no big<->little endian instructions or loads/stores, and no load/store multiple.)
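(What load/store multiple buys, shown in 68k terms as a small sketch:)

Code:
   movem.l d2-d7/a2-a4,-(sp)    ; save 9 registers in one 4-byte instruction
   ; ... routine body ...
   movem.l (sp)+,d2-d7/a2-a4    ; restore them with another 4 bytes
   ; without MOVEM this would be nine 2-byte pushes and nine 2-byte pops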

Last edited by NorthWay; 03 February 2017 at 14:54. Reason: link
NorthWay is online now  
 

