68k details - Page 25

grond · 11 September 2018, 17:36

If you really want data on binary sizes, have a look at old Linux distributions that were available for m68k and i386. Just pick a good sample of packages that include mostly executable code and have a look at the file sizes. I'm pretty sure that m68k will have better code density than i386. Of course, the result will be skewed by the unknown effects of gcc/i386 vs. gcc/m68k. Presumably the latter didn't get as much love.

Unfortunately this sort of information is difficult to find on the internet today. I looked around in the Debian online resources and didn't find anything that still had m68k packages and eventually lost interest. Perhaps one of you guys wants to pick it up...

roondar · 11 September 2018, 17:40

Quote:

Originally Posted by litwr

A very impressive technique! But it is not exactly like a C-switch statement which needs to check a value and if it is not matched to go to the next case position.

Of course, hence I said 'often' and not 'always'

Not all switches are compatible with this style, but you'd be surprised by how often it's possible to convert them into this type of indirect jump.

meynaf · 11 September 2018, 17:47

Quote:

Originally Posted by litwr

Indeed but it will require much more than 14 cycles.

Not a problem, we've got more MHz

Quote:

Originally Posted by litwr

My point: x86 segment registers are not registers in the full sense but they were just temporary means, 68000 address registers don't have all functionality of GPR but they are forever.

Segment registers were not temporary. They are still supported today, even if it's just in compatibility modes.
During that time more features could be added to 68000 address registers - which they unfortunately didn't do.

Now I must add that x86's registers aren't GPR either.
Are SI, DI, BP true GPR ? Nope. You can't byte access them, for example. In some way, they are... address registers

Quote:

Originally Posted by litwr

It is you again who tries to suggest something. I can even begin to think that you feel something wrong against 68000. I can repeat my point - a processor (and cheetah) must be fast.

A processor must be fast, true, but it depends more on the actual implementation. And implementation isn't my point here.

Quote:

Originally Posted by litwr

IMHO it is much better to have one bigger stack than two separate stack areas.

In a multitasking environment it's not one bigger stack, but many. You have 10 processes => 10 bigger stacks.

Quote:

Originally Posted by litwr

Those instructions are separated from others and can be easily replaced.

That's only for 64-bit mode. In 16 or 32 bit modes, they're still here and can't be replaced. You can't just remove them and still be perfectly compatible.

Quote:

Originally Posted by litwr

Maybe for those ppl who like to run old OS like MS-DOS but modern OS don't use them. It is even impossible to use them with x86-64.

Modern OS still boot from BIOS which starts with CS=FFFF and IP=0000.
I'm not sure boot block isn't still in 16-bit mode.

Quote:

Originally Posted by litwr

It is just another indirect proof. You can't compare 1 MB data byte-by-byte. It is quite possible that some 68000 routines are a bit smaller but it is possible that some routines are smaller for x86. However you almost convinced me that 68000 code density is very close to 8086 and may be even a bit better.

It's not 68000 code, but 68020. Neither is it 8086 code, it wouldn't run on 8086 anyway - too slow, not enough memory.
And it's true 68020 code is (slightly) smaller than 68000 code (while 80386 code is bigger than 8086). But the difference can't account for the *1.5 I regularly observed.

Comparing byte-by-byte isn't necessary, just check where individual routines are located.
And I know the entry point of many routines. Every time i checked, 68k code was smaller.

Some little extract for you, with 6 different game versions (Mac 68k 1.0 en, PC 1.0 en, PC 1.0 fr, PC 2.0 fr, PC 2.1 en, PC 2.1 pl) :

Code:

Mac 68k 1.0 en	PC 1.0 en	PC 1.0 fr	PC 2.0 fr	PC 2.1 en	PC 2.1 pl	Routine
464e4	439190	49546b	43b7d4	423760	456ae0	GetRandomNumTroops__4gameFi
46b52	439865	495b40	43bea9	423e50	457190	NextPlayer__4gameFv
46fd0	439e0e	4960ea	43c440	4243f0	4576c0	ComputeDailyGold__4gameFi
4744e	43a244	496520	43c872	424830	457a50	PerDay__4gameFv
47a90	43abc4	496ea0	43d21c	4251e0	458460	PerWeek__4gameFv
48ad8	43c45c	498739	43ed56	426d20	45a0c0	PerMonth__4gameFv

Note : the PC columns are the Windows versions. DOS exes are bigger due to DOS/4G.

Do you observe at which speed the offsets (load address for windows code) grow ?

It is clear that, while extra code has been added in later versions (not much), 68k code is indeed smaller.
Actually this was a great surprise to me, because it's largely suboptimal, and, worse, linker names (the names you can see above) were present at the end of nearly every routine !
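
As a rough cross-check of the offsets listed above, the distance from the first listed routine (GetRandomNumTroops) to the last (PerMonth) can be computed per version. This is only a sketch: it assumes the routines sit in the same order in every binary, and the span also counts whatever lies between them (other routines, padding, the linker names on the 68k side). The names `OFFSETS`, `spans` and `ratios` are made up for this illustration.

```python
# Entry points copied from the table above, in table order:
# GetRandomNumTroops, NextPlayer, ComputeDailyGold, PerDay, PerWeek, PerMonth.
OFFSETS = {
    "Mac 68k 1.0 en": [0x464E4, 0x46B52, 0x46FD0, 0x4744E, 0x47A90, 0x48AD8],
    "PC 1.0 en": [0x439190, 0x439865, 0x439E0E, 0x43A244, 0x43ABC4, 0x43C45C],
    "PC 1.0 fr": [0x49546B, 0x495B40, 0x4960EA, 0x496520, 0x496EA0, 0x498739],
    "PC 2.0 fr": [0x43B7D4, 0x43BEA9, 0x43C440, 0x43C872, 0x43D21C, 0x43ED56],
    "PC 2.1 en": [0x423760, 0x423E50, 0x4243F0, 0x424830, 0x4251E0, 0x426D20],
    "PC 2.1 pl": [0x456AE0, 0x457190, 0x4576C0, 0x457A50, 0x458460, 0x45A0C0],
}


def spans():
    """Bytes between the first and the last listed entry point, per version."""
    return {name: offs[-1] - offs[0] for name, offs in OFFSETS.items()}


def ratios(baseline="Mac 68k 1.0 en"):
    """Each version's span divided by the baseline version's span."""
    s = spans()
    return {name: value / s[baseline] for name, value in s.items()}


if __name__ == "__main__":
    for name, value in spans().items():
        print(f"{name:15s} {value:6d} bytes  x{ratios()[name]:.2f}")
```

Under those assumptions the 68k span is 9716 bytes and the PC spans run from 13004 to 13792 bytes, i.e. roughly 1.34 to 1.42 times larger for this particular stretch of code.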

I doubt CodeWarrior for Mac in Debug mode did better than MSVC6 in Release mode, yet the code is smaller.

If you don't believe me you can just get the exes (i probably even still have them) and disassemble the code yourself

Quote:

Originally Posted by litwr

8086 iret takes 32 cycles but at 80286 it takes only about 25 cycles. So 68000 is quite fast with interrupts. It is much better than I expected. Thanks for the information. However it doesn't make 68000 so superior to 80286 as you mentioned.

Well, it seems equal now : 68000 is better than you expected and 80286 is better than I expected

Quote:

Originally Posted by litwr

I have changed nothing. Just be more careful.

Hmm... All of this is none too clear...

Quote:

Originally Posted by litwr

It is not easy subject. Let's examine z80, it has 4-bit ALU, 8-bit registers which can be combined into 16-bit registers, 8- and 16-bit operations, ... Some z80 16-bit operations are very fast. The first ARMs have 26-bit address bus...

Add to that no current 64-bit cpu really has 64-bit address bus
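
To put numbers on the address widths mentioned here, a small sketch that converts an address-bus width into the size of a byte-addressed space. The 48-bit entry is only an illustrative "narrower than 64" width, not a claim about any specific CPU; the helper names are made up.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def address_space_bytes(bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << bits


def human(n_bytes):
    """Scale a power-of-two byte count to the largest fitting binary unit."""
    index = 0
    while n_bytes >= 1024 and index < len(UNITS) - 1:
        n_bytes //= 1024
        index += 1
    return f"{n_bytes} {UNITS[index]}"


if __name__ == "__main__":
    # 20 = 8086, 24 = 68000, 26 = first ARMs, 48 = illustrative, 64 = full width
    for bits in (20, 24, 26, 48, 64):
        print(bits, human(address_space_bytes(bits)))
```

A full 64-bit space comes out at 16 EiB, which is why nobody bothers to wire up all 64 address lines.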

Quote:

Originally Posted by litwr

Sorry for my English.

I can sometimes use words with a bit shifted sense. It would probably be better to write that 8086 is a typical example of 16-bit processor. Thank you for the clarifications.

English isn't native for me either so misunderstandings can unfortunately happen just too often...
Another reason for us to prefer code ?

Don_Adan · 11 September 2018, 18:57

Quote:

Originally Posted by meynaf

Have you tried :

Code:

 subq.w #1,d0
 bcs case_0
 beq case_1
 subq.w #2,d0
 bcs case_2
 beq case_3

If you can use short branches, timing becomes something like 14/22/34/42 (28 on average).

Note : more fun on 68020.

Code:

jmp ([switch_table,pc,d0.w*4])

Sorry Phil, but for only 4 cases your code is too long and slow.

Code:

  subq.w #1,d0
  bcs case_0
  beq case_1
  subq.w #2,d0
  bcs case_2
case_3
...
case_2
...
case_1
...
case_0
...

meynaf · 11 September 2018, 19:16

Quote:

Originally Posted by Don_Adan

Sorry Phil, but for only 4 cases your code is too long and slow.

The extra branch is for when data is out of range. If there is really only 4 possible cases then yes the last branch can be removed.
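
The chain discussed here can be modelled in a few lines to see which label each input reaches and where the 14/22/34/42 figures come from. This is a sketch, not a cycle-exact emulator: it assumes 68000 timings of 4 cycles for SUBQ.W on a data register, 10 for a taken short branch and 8 for a short branch that is not taken, and the function name `dispatch` is invented for the illustration.

```python
# Assumed 68000 timings (cycles): SUBQ.W #n,Dn / Bcc.S taken / Bcc.S not taken
SUBQ, TAKEN, NOT_TAKEN = 4, 10, 8


def dispatch(d0):
    """Return (label, cycles) for the subq/bcs/beq chain on a 16-bit d0."""
    d0 &= 0xFFFF
    cycles = 0

    carry = d0 < 1                 # subq.w #1,d0 borrows only when d0 was 0
    d0 = (d0 - 1) & 0xFFFF
    cycles += SUBQ
    if carry:                      # bcs case_0
        return "case_0", cycles + TAKEN
    cycles += NOT_TAKEN
    if d0 == 0:                    # beq case_1
        return "case_1", cycles + TAKEN
    cycles += NOT_TAKEN

    carry = d0 < 2                 # subq.w #2,d0
    d0 = (d0 - 2) & 0xFFFF
    cycles += SUBQ
    if carry:                      # bcs case_2
        return "case_2", cycles + TAKEN
    cycles += NOT_TAKEN
    if d0 == 0:                    # beq case_3 (the "extra" branch)
        return "case_3", cycles + TAKEN
    cycles += NOT_TAKEN
    return "out_of_range", cycles  # falls through for d0 >= 4


if __name__ == "__main__":
    for value in range(6):
        print(value, dispatch(value))
```

With the last `beq` kept, inputs of 4 and above fall through instead of landing in case_3, which is exactly the out-of-range check.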

Don_Adan · 11 September 2018, 19:50

Quote:

Originally Posted by meynaf

The extra branch is for when data is out of range. If there is really only 4 possible cases then yes the last branch can be removed.

Right.
In most cases the following code is used to read the input:

moveq #0,d0
move.b (a0)+,d0

but the following code can be used too

moveq #3,d0
and.b (a0)+,d0

Of course this is only an example of how to remove the extra range check. It can't be used in all cases.
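
The difference between the two input sequences can be sketched as follows. The function names are invented here; each one models only the final value of d0.

```python
def load_plain(byte):
    """moveq #0,d0 / move.b (a0)+,d0 : d0 ends up anywhere in 0..255."""
    return byte & 0xFF


def load_masked(byte):
    """moveq #3,d0 / and.b (a0)+,d0 : d0 is forced into 0..3."""
    return 3 & (byte & 0xFF)


if __name__ == "__main__":
    for value in (0, 3, 4, 7, 255):
        print(value, load_plain(value), load_masked(value))
```

The masked form never produces an out-of-range selector, but it does so by folding bad inputs onto valid cases (7 becomes 3, for instance), so it only fits when that aliasing is acceptable.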

NorthWay · 12 September 2018, 12:53

Quote:

Originally Posted by NorthWay

I think Mike at FPGAArcade has the info on the decapped 68000.

Talking to myself, but some groundwork has been done over at http://www.visual6502.org/images/pag...ola_68000.html

plasmab · 13 September 2018, 09:00

Quote:

Originally Posted by NorthWay

Talking to myself, but some groundwork has been done over at http://www.visual6502.org/images/pag...ola_68000.html

I'm interested. I'd be very interested in understanding how you get the MC68000 into test mode for the firmware. I have all the gear set up to capture the output

litwr · 13 September 2018, 09:52

Quote:

Originally Posted by grond

If you really want data on binary sizes, have a look at old Linux distributions that were available for m68k and i386. Just pick a good sample of packages that include mostly executable code and have a look at the file sizes. I'm pretty sure that m68k will have better code density than i386. Of course, the result will be skewed by the unknown effects of gcc/i386 vs. gcc/m68k. Presumably the latter didn't get as much love.

It is an interesting idea. However our main topic was about 8086 and 68000 code density. Linux is unavailable for them. To compare 68020/30 and 80386/486 code density is also quite interesting - I think somebody needs at least a day for this approach.

Quote:

Originally Posted by meynaf

And it's true 68020 code is (slightly) smaller than 68000 code (while 80386 code is bigger than 8086). But the difference can't account for the *1.5 I regularly observed.

I can't agree that 80386's code is larger than 8086's. For pi-spigot we have 629 bytes for 386 and 660 bytes for 86.

Quote:

Originally Posted by meynaf

Add to that no current 64-bit cpu really has 64-bit address bus

I'm really curious when will computers have about 4 exabytes of memory? I can think about 50 years forward - too much to me to see that.

roondar · 13 September 2018, 11:31

Assuming the 1nm transistor suggested by Berkeley is a thing that can work in the future, all you'd need for 4 exabytes of DRAM memory is a 283 square cm die. Back to room sized computers it is

meynaf · 13 September 2018, 13:01

Quote:

Originally Posted by litwr

I can't agree that 80386's code is larger than 8086's. For pi-spigot we have 629 bytes for 386 and 660 bytes for 86.

Isn't that due to big mul / div routines that are single instruction on 386 ?
Anyhow, of course 8086 isn't good at 32-bit stuff

Quote:

Originally Posted by litwr

I'm really curious when will computers have about 4 exabytes of memory? I can think about 50 years forward - too much to me to see that.

I doubt we ever will.
There are limits such as speed of electricity, ultimately speed of light, that could stop the exponential growth of computational power.

meynaf · 13 September 2018, 13:18

Quote:

Originally Posted by litwr

I can't agree that 80386's code is larger than 8086's. For pi-spigot we have 629 bytes for 386 and 660 bytes for 86.

Hey, wait. Now that i think about it, >600 bytes seems large for just outputting 800 pi digits.
I had the algorithm noted somewhere so i tried recoding it for 68020 and got a 408 bytes executable (which can probably still be slightly reduced).
Is there something the program must do that i forgot ?

litwr · 13 September 2018, 16:21

Quote:

Originally Posted by meynaf

Hey, wait. Now that i think about it, >600 bytes seems large for just outputting 800 pi digits.
I had the algorithm noted somewhere so i tried recoding it for 68020 and got a 408 bytes executable (which can probably still be slightly reduced).
Is there something the program must do that i forgot ?

It is the fastest and allows to input number of digits interactively.

meynaf · 13 September 2018, 16:32

Quote:

Originally Posted by litwr

It is the fastest and allows to input number of digits interactively.

Ok. Let's see if i can still do a shorter version.
Using different speed tricks for different cpus will lead to code size that's not relevant for code density comparison.

meynaf · 13 September 2018, 20:20

Done it !
Executable rewritten for A1200, contains all features of the original (even speed ; could be shorter if i remove the mulu optim).
Size = 596 bytes. x86 beaten

But the added bulk is mostly AmigaOS calls. For this reason i don't see this program as significant in matter of code density as it depends as much on the host operating system as on the cpu.
Nor is it relevant for cpu benchmark, as most of the time is spent displaying digits rather than calculating them...

litwr · 15 September 2018, 12:51

Quote:

Originally Posted by meynaf

Are SI, DI, BP true GPR ? Nope. You can't byte access them, for example. In some way, they are... address registers

It is 100% wrong because these registers can be used at any place where AX, BX, ... are allowed. AX, BX, CX, DX - may be used as 8 8-bit registers and this gives us 12 registers for x86.

If we have to work with bytes x86 gives us the same number of data registers as 68000 plus 4 registers which can be used for address or data.

Quote:

Originally Posted by meynaf

Done it !
Executable rewritten for A1200, contains all features of the original (even speed ; could be shorter if i remove the mulu optim).
Size = 596 bytes. x86 beaten

But the added bulk is mostly AmigaOS calls. For this reason i don't see this program as significant in matter of code density as it depends as much on the host operating system as on the cpu.
Nor is it relevant for cpu benchmark, as most of the time is spent displaying digits rather than calculating them...

Why have you hidden this program?

Let's check it.

BTW the part about the first 32-bit CPU is added to the article - https://litwr.livejournal.com/1970.html

meynaf · 15 September 2018, 13:56

Quote:

Originally Posted by litwr

It is 100% wrong because these registers can be used at any place where AX, BX, ... are allowed. AX, BX, CX, DX - may be used as 8 8-bit registers and this gives us 12 registers for x86.

Nope it's not wrong.
Only SI can read with auto-increment.
Only DI can write with auto-increment.
Only BP can be used as frame pointer.
Only DX can be used as loop counter with the loop instruction.
Only CL can be used for variable shifting.
There are numerous examples.
x86 registers are definitely not GPR.

Quote:

Originally Posted by litwr

If we have to work with bytes x86 gives us the same number of data registers as 68000 plus 4 registers which can be used for address or data.

Not really.
If we have to work with words the 68k can use the SWAP instruction and we can get up to 16 data. With extra instructions to access register parts, we could have access to up to 32 bytes.
And it's not "plus 4 registers". Only 3, as SP can't be used for data.

It seems there is a lot of misunderstanding about the D/A split of 68k.
Many people see it as a shortcoming. But it's not. Not at all. The price to pay is very small for what we get.

On cpus with 16-bit opcode we have 3-bit register encoding. This is necessary if you don't want to see the instruction set severely trimmed.
On 68k you have the base encoding which is 4-bit opcode, 3-bit register, 1-bit mode, 2-bit size, 6-bit ea. If we want to use 4-bit register, that's +2 bits here (ea becomes 7-bit). It simply doesn't fit.
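
The bit budget described above can be checked by splitting a real opcode word into those fields. A minimal sketch: the field names follow the post (4-bit opcode, 3-bit register, 1-bit mode, 2-bit size, 6-bit ea), `decode` and `total_bits` are made-up helpers, and $D041 is used as the sample word for ADD.W D1,D0.

```python
# Field layout from the post, most significant field first.
FIELDS = [("opcode", 4), ("register", 3), ("mode", 1), ("size", 2), ("ea", 6)]
# Hypothetical layout with 4-bit register numbers in both places.
WIDE_FIELDS = [("opcode", 4), ("register", 4), ("mode", 1), ("size", 2), ("ea", 7)]


def total_bits(fields):
    """Sum of the field widths, i.e. the opcode word size needed."""
    return sum(width for _, width in fields)


def decode(word, fields=FIELDS):
    """Split an opcode word into named fields, left to right."""
    result = {}
    shift = total_bits(fields)
    for name, width in fields:
        shift -= width
        result[name] = (word >> shift) & ((1 << width) - 1)
    return result


if __name__ == "__main__":
    print(total_bits(FIELDS), total_bits(WIDE_FIELDS))
    print(decode(0xD041))  # ADD.W D1,D0
```

The real layout adds up to exactly 16 bits, while the widened one needs 18, which is the "+2 bits" that does not fit.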

On ARM (with Thumb) 16-bit opcodes can only touch R0-R7.
On x86-64 you can not use R8-R15 without a REX prefix.
But on 68k you can use all 16 registers without a size penalty.

Try to multiply, divide, shift a pointer type with a C compiler. It will be rejected, and with good reason. Do people grumble that these operations are not allowed ? It's nonsense.

As addresses are 32-bit, any 16-bit use is automatically extended.
In addition, the resulting value is supposed to be pointer, not data, and therefore the CCR remains untouched.

It's true sometimes we're out of data regs and start using address regs for data. But then we have more features, not less : automatic sign-extend and operations that don't alter the flags.
Consider simple ADDA.W (A0,D0.W),A0. You can't do that on x86. You have to read the value, extend it, then add it. Of course x86's pointer arithmetic kills the flags and that's really bad.
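
What that single ADDA.W does can be spelled out step by step. This is a simplified model, assuming big-endian memory held in a bytearray and ignoring address-error and bus details; the names `sign_extend16` and `adda_w_indexed` are invented. There is no flags variable at all, which mirrors the fact that ADDA leaves the CCR untouched.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def adda_w_indexed(memory, a0, d0):
    """Model ADDA.W (A0,D0.W),A0 and return the new A0."""
    # Effective address: A0 plus the sign-extended low word of D0.
    ea = (a0 + sign_extend16(d0)) & 0xFFFFFFFF
    # Fetch a big-endian 16-bit word from memory.
    word = (memory[ea] << 8) | memory[ea + 1]
    # The word is sign-extended to 32 bits before being added to A0.
    return (a0 + sign_extend16(word)) & 0xFFFFFFFF


if __name__ == "__main__":
    ram = bytearray(0x2000)
    ram[0x1004:0x1006] = (0xFF, 0xFE)   # the word $FFFE, i.e. -2
    print(hex(adda_w_indexed(ram, 0x1000, 4)))
```

A typical use is a table of 16-bit offsets relative to the table base, where one instruction turns an index into a pointer.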

The only problem is that the use of An is too limited in comparison to what the encoding would allow - for the cases we indeed lack some Dn. However this could have been fixed.

But in any case, don't judge without knowing.

Quote:

Originally Posted by litwr

Why have you hidden this program?

Let's check it.

I don't like to keep attachments in my posts, but I can put it in the zone here, if you have access to it.

litwr · 15 September 2018, 14:57

Quote:

Originally Posted by meynaf

Only SI can read with auto-increment.
Only DI can write with auto-increment.
Only DX can be used as loop counter with the loop instruction.

Are you ok? You have written something incorrect... Only DX can be used as an address to port address area. It is just about of absence of not-important orthogonality. Look at GPR definition on Wikipedia - "General-purpose registers (GPRs) can store both data and addresses" - it's all.

Quote:

Originally Posted by meynaf

If we have to work with words the 68k can use the SWAP instruction and we can get up to 16 data. With extra instructions to access register parts, we could have access to up to 32 bytes.

You can also exchange a register content with a value at a memory address.

It is just a SWAP variant.

Quote:

Originally Posted by meynaf

I don't like to keep attachments in my posts, but I can put it in the zone here, if you have access to it.

I don't know about zones - please direct me there. Otherwise I can say that you are bluffing.

EDIT. I agree that the 68k's address registers can be considered as limited data registers too.

meynaf · 15 September 2018, 15:24

Quote:

Originally Posted by litwr

Are you ok?

Yes, and you ?

Quote:

Originally Posted by litwr

You have written something incorrect...

Wasn't that you, rather ?
What I wrote is correct. The uses of SI, DI, DX i have written are correct and the fact i didn't mention DX as port number doesn't change that.

Quote:

Originally Posted by litwr

Only DX can be used as an address to port address area. It is just about of absence of not-important orthogonality. Look at GPR definition on Wikipedia - "General-purpose registers (GPRs) can store both data and addresses" - it's all.

So if it's written on Wikipedia then it's necessarily true and complete ?
Anyway, according to your definition, 68k's registers are GPRs because they really can store both data and addresses (and the rest is "just about of absence of not-important orthogonality").

But strange case for DX indeed, even though it's not exactly a memory address it needs to store (it's a port number - and remains 16-bit even in 32-bit mode).

Also forgot to mention AX is the only register usable for a number of things.

Quote:

Originally Posted by litwr

You can also exchange a register content with a value at a memory address.

It is just a SWAP variant.

Memory accesses are usually slower. Especially an exchange which needs two memory accesses.
Of course it makes the code bigger too

Quote:

Originally Posted by litwr

I don't know about zones - please direct me there. Otherwise I can say that you are bluffing.

Look at the eab page. Between "Donate" and "Log Out" there is a link called "The Zone".
To access it :
http://eab.abime.net/faq.php?faq=vb_...ezone_faq_item

Tell me when you're ready. I'll upload it then.

meynaf · 20 September 2018, 15:27

Not eager to get it, huh ? I won't wait any longer and have made the upload anyway. Other ppl might be interested.
