English Amiga Board


Old 11 September 2018, 17:36   #481
grond
Registered User

 
Join Date: Jun 2015
Location: Germany
Posts: 450
If you really want data on binary sizes, have a look at old Linux distributions that were available for both m68k and i386. Just pick a good sample of packages that consist mostly of executable code and compare the file sizes. I'm pretty sure that m68k will have better code density than i386. Of course, the result will be skewed by the unknown effects of gcc/i386 vs. gcc/m68k. Presumably the latter didn't get as much love.

Unfortunately, this sort of information is difficult to find on the internet today. I looked around in Debian's online resources, didn't find anything that still had m68k packages, and eventually lost interest. Perhaps one of you guys wants to pick it up...
Old 11 September 2018, 17:40   #482
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 474
Quote:
Originally Posted by litwr View Post
A very impressive technique! But it is not exactly like a C switch statement, which needs to check a value and, if it does not match, go on to the next case.
Of course, hence I said 'often' and not 'always'.


Not all switches are compatible with this style, but you'd be surprised by how often it's possible to convert them into this type of indirect jump.
Old 11 September 2018, 17:47   #483
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,137
Quote:
Originally Posted by litwr View Post
Indeed but it will require much more than 14 cycles.
Not a problem, we've got more MHz.


Quote:
Originally Posted by litwr View Post
My point: x86 segment registers are not registers in the full sense; they were just a temporary measure. 68000 address registers don't have all the functionality of GPRs, but they are forever.
Segment registers were not temporary. They are still supported today, even if it's just in compatibility modes.
In the meantime, more features could have been added to the 68000 address registers - which unfortunately wasn't done.

Now I must add that x86's registers aren't GPR either.
Are SI, DI, BP true GPR ? Nope. You can't byte access them, for example. In some way, they are... address registers


Quote:
Originally Posted by litwr View Post
It is you again who tries to suggest something. I can even begin to think that you feel something wrong against 68000. I can repeat my point - a processor (and cheetah) must be fast.
A processor must be fast, true, but it depends more on the actual implementation. And implementation isn't my point here.


Quote:
Originally Posted by litwr View Post
IMHO it is much better to have one bigger stack than two separate stack areas.
In a multitasking environment it's not one bigger stack, but many. You have 10 processes => 10 bigger stacks.


Quote:
Originally Posted by litwr View Post
Those instructions are separated from others and can be easily replaced.
That's only for 64-bit mode. In 16 or 32 bit modes, they're still here and can't be replaced. You can't just remove them and still be perfectly compatible.


Quote:
Originally Posted by litwr View Post
Maybe for those ppl who like to run old OS like MS-DOS but modern OS don't use them. It is even impossible to use them with x86-64.
Modern OS still boot from BIOS which starts with CS=FFFF and IP=0000.
I'm not sure the boot block isn't still run in 16-bit mode.


Quote:
Originally Posted by litwr View Post
It is just another indirect proof. You can't compare 1 MB data byte-by-byte. It is quite possible that some 68000 routines are a bit smaller but it is possible that some routines are smaller for x86. However you almost convinced me that 68000 code density is very close to 8086 and may be even a bit better.
It's not 68000 code, but 68020. Neither is it 8086 code, it wouldn't run on 8086 anyway - too slow, not enough memory.
And it's true 68020 code is (slightly) smaller than 68000 code (while 80386 code is bigger than 8086). But the difference can't account for the *1.5 I regularly observed.

Comparing byte-by-byte isn't necessary, just check where individual routines are located.
And I know the entry point of many routines. Every time i checked, 68k code was smaller.

Some little extract for you, with 6 different game versions (Mac 68k 1.0 en, PC 1.0 en, PC 1.0 fr, PC 2.0 fr, PC 2.1 en, PC 2.1 pl) :
Code:
464e4	439190	49546b	43b7d4	423760	456ae0	GetRandomNumTroops__4gameFi
46b52	439865	495B40	43BEA9	423E50	457190	NextPlayer__4gameFv
46fd0	439E0E	4960EA	43C440	4243F0	4576C0	ComputeDailyGold__4gameFi
4744e	43A244	496520	43C872	424830	457A50	PerDay__4gameFv
47a90	43ABC4	496EA0	43D21C	4251E0	458460	PerWeek__4gameFv
48ad8	43C45C	498739	43ED56	426D20	45A0C0	PerMonth__4gameFv
Note: these are the Windows versions. DOS exes are bigger due to DOS/4G.

Do you see how fast the offsets (load addresses for the Windows code) grow?

It is clear that, while extra code has been added in later versions (not much), 68k code is indeed smaller.
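To put rough numbers on this, here is a small Python sketch that measures the distance between the first and last entry point listed above for each build. The addresses are copied from the listing; the build labels and the `span` helper are illustrative names, and the distance is only a proxy for code size because it includes whatever else sits between those routines.

```python
# First (GetRandomNumTroops) and last (PerMonth) entry point per build,
# copied from the listing above. Labels are illustrative column names.
ENTRY_POINTS = {
    "Mac 68k 1.0 en": (0x464E4, 0x48AD8),
    "PC 1.0 en": (0x439190, 0x43C45C),
    "PC 1.0 fr": (0x49546B, 0x498739),
    "PC 2.0 fr": (0x43B7D4, 0x43ED56),
    "PC 2.1 en": (0x423760, 0x426D20),
    "PC 2.1 pl": (0x456AE0, 0x45A0C0),
}


def span(first, last):
    """Bytes between two entry points of the same executable."""
    return last - first


# Span per build, and the 68k span used as the reference size.
SPANS = {build: span(*addrs) for build, addrs in ENTRY_POINTS.items()}
BASE = SPANS["Mac 68k 1.0 en"]

if __name__ == "__main__":
    for build, size in SPANS.items():
        print(f"{build:15s} {size:6d} bytes  x{size / BASE:.2f}")
```

On these six columns the 68k span is 9716 bytes and the x86 spans run from 13004 to 13792 bytes, that is, about 1.34 to 1.42 times larger over this particular stretch of code.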
Actually this was a great surprise to me, because the 68k code is largely suboptimal, and, worse, linker names (the names you can see above) were present at the end of nearly every routine!

I doubt CodeWarrior for Mac in Debug mode did better than MSVC6 in Release mode, yet the code is smaller.

If you don't believe me you can just get the exes (i probably even still have them) and disassemble the code yourself


Quote:
Originally Posted by litwr View Post
8086 iret takes 32 cycles but at 80286 it takes only about 25 cycles. So 68000 is quite fast with interrupts. It is much better than I expected. Thanks for the information. However it doesn't make 68000 so superior to 80286 as you mentioned.
Well, it seems equal now : 68000 is better than you expected and 80286 is better than I expected


Quote:
Originally Posted by litwr View Post
I have changed nothing. Just be more careful.
Hmm... All of this is none too clear...


Quote:
Originally Posted by litwr View Post
It is not easy subject. Let's examine z80, it has 4-bit ALU, 8-bit registers which can be combined into 16-bit registers, 8- and 16-bit operations, ... Some z80 16-bit operations are very fast. The first ARMs have 26-bit address bus...
Add to that no current 64-bit cpu really has 64-bit address bus


Quote:
Originally Posted by litwr View Post
Sorry for my English. I can sometimes use words with a bit shifted sense. It would probably be better to write that 8086 is a typical example of 16-bit processor. Thank you for the clarifications.
English isn't native for me either so misunderstandings can unfortunately happen just too often...
Another reason for us to prefer code ?
Old 11 September 2018, 18:57   #484
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 50
Posts: 972
Quote:
Originally Posted by meynaf View Post
Have you tried :
Code:
 subq.w #1,d0
 bcs case_0
 beq case_1
 subq.w #2,d0
 bcs case_2
 beq case_3
If you can use short branches, timing becomes something like 14/22/34/42 (28 on average).


Note : more fun on 68020.
Code:
 jmp ([switch_table,pc,d0.w*4])
Sorry Phil, but for only 4 cases your code is too long and slow.

Code:
  subq.w #1,d0
  bcs case_0
  beq case_1
  subq.w #2,d0
  bcs case_2
case_3
...
case_2
...
case_1
...
case_0
...
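As a sanity check of which value lands in which case, the branch chain above can be modelled in a few lines of Python. This is only a sketch of the flag behaviour (SUBQ sets carry on borrow and zero on a zero result); the function names are made up for illustration.

```python
def subq_w(d0, imm):
    """Model SUBQ.W #imm,d0: return (new 16-bit value, carry, zero)."""
    d0 &= 0xFFFF
    result = (d0 - imm) & 0xFFFF
    carry = d0 < imm      # borrow out of the 16-bit subtraction
    zero = result == 0
    return result, carry, zero


def dispatch(d0):
    """Follow the shortened chain: two SUBQs, three branches, one fall-through."""
    d0, carry, zero = subq_w(d0, 1)
    if carry:             # bcs case_0
        return "case_0"
    if zero:              # beq case_1
        return "case_1"
    d0, carry, zero = subq_w(d0, 2)
    if carry:             # bcs case_2
        return "case_2"
    return "case_3"       # no branch taken: fall through into case_3


if __name__ == "__main__":
    for value in range(5):
        print(value, dispatch(value))
```

Values 0 to 3 reach their own case. Anything larger also falls into case_3, which is exactly what the extra `beq case_3` in the longer version guards against.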
Old 11 September 2018, 19:16   #485
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,137
Quote:
Originally Posted by Don_Adan View Post
Sorry Phil, but for only 4 cases your code is too long and slow.
The extra branch is for when the data is out of range. If there are really only 4 possible cases, then yes, the last branch can be removed.
Old 11 September 2018, 19:50   #486
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 50
Posts: 972
Quote:
Originally Posted by meynaf View Post
The extra branch is for when the data is out of range. If there are really only 4 possible cases, then yes, the last branch can be removed.
Right.
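In most cases the following code is used to fetch the input:

moveq #0,d0        ; clear d0
move.b (a0)+,d0    ; d0 = next byte, 0..255

but the following code can be used too:

moveq #3,d0        ; preload the mask
and.b (a0)+,d0     ; d0 = next byte AND 3, always 0..3

Of course, this is only an example of how to remove the extra range check. It cannot be used in all cases.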
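The effect of the two fetch sequences can be sketched in Python. The handler names and the byte stream are placeholders; the point is only that the masked fetch can never index past a four-entry table, so no range check is needed before dispatching.

```python
HANDLERS = ("case_0", "case_1", "case_2", "case_3")  # placeholder table


def fetch_unmasked(stream, pos):
    """moveq #0,d0 / move.b (a0)+,d0: zero-extended byte, range 0..255."""
    return stream[pos], pos + 1


def fetch_masked(stream, pos):
    """moveq #3,d0 / and.b (a0)+,d0: only the low two bits survive."""
    return 3 & stream[pos], pos + 1


def select(stream, pos):
    """Dispatch on the masked value; the index is always valid."""
    index, pos = fetch_masked(stream, pos)
    return HANDLERS[index], pos


if __name__ == "__main__":
    data = bytes([0x07, 0xFE])
    print(fetch_unmasked(data, 0))  # raw byte would overrun the table
    print(select(data, 0))
```

The price is that out-of-range input is silently folded onto a valid case instead of being rejected, which is why this only fits when the upper bits are known to be irrelevant.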
Old 12 September 2018, 12:53   #487
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 456
Quote:
Originally Posted by NorthWay View Post
I think Mike at FPGAArcade has the info on the decapped 68000.
Talking to myself, but some groundwork has been done over at http://www.visual6502.org/images/pag...ola_68000.html
Old 13 September 2018, 09:00   #488
plasmab
Registered User

 
Join Date: Sep 2016
Location: Glasgow
Posts: 2,499
Quote:
Originally Posted by NorthWay View Post
Talking to myself, but some groundwork has been done over at http://www.visual6502.org/images/pag...ola_68000.html
I'm interested. I'd be very interested in understanding how you get the MC68000 into test mode for the firmware. I have all the gear set up to capture the output.
Old 13 September 2018, 09:52   #489
litwr
Registered User

 
Join Date: Mar 2016
Location: Ozherele
Posts: 74
Quote:
Originally Posted by grond View Post
If you really want data on binary sizes, have a look at old Linux distributions that were available for both m68k and i386. Just pick a good sample of packages that consist mostly of executable code and compare the file sizes. I'm pretty sure that m68k will have better code density than i386. Of course, the result will be skewed by the unknown effects of gcc/i386 vs. gcc/m68k. Presumably the latter didn't get as much love.
It is an interesting idea. However, our main topic was 8086 and 68000 code density, and Linux is unavailable for them. Comparing 68020/30 and 80386/486 code density is also quite interesting - I think somebody would need at least a day for this approach.

Quote:
Originally Posted by meynaf View Post
And it's true 68020 code is (slightly) smaller than 68000 code (while 80386 code is bigger than 8086). But the difference can't account for the *1.5 I regularly observed.
I can't agree that 80386's code is larger than 8086's. For pi-spigot we have 629 bytes for 386 and 660 bytes for 86.

Quote:
Originally Posted by meynaf View Post
Add to that no current 64-bit cpu really has 64-bit address bus
I'm really curious when computers will have about 4 exabytes of memory. I would guess about 50 years from now - too long for me to see it.
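For scale, the relation between address width and memory size is easy to check. The sketch below assumes byte addressing and binary units (1 EiB = 2^60 bytes); the helper name is made up for illustration.

```python
EIB = 2 ** 60  # one exbibyte in bytes


def address_bits_for(size_in_bytes):
    """Smallest address width that can reach every byte of the given size."""
    return (size_in_bytes - 1).bit_length()


if __name__ == "__main__":
    print(address_bits_for(4 * EIB))   # width needed for 4 EiB
    print(2 ** 64 // EIB)              # EiB reachable with a full 64-bit address
```

Under these assumptions 4 EiB already fits in a 62-bit address, while a full 64-bit address reaches 16 EiB.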
Old 13 September 2018, 11:31   #490
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 474
Assuming the 1 nm transistor suggested by Berkeley is a thing that can work in the future, all you'd need for 4 exabytes of DRAM is a 283 square cm die. Back to room-sized computers it is.
Old 13 September 2018, 13:01   #491
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,137
Quote:
Originally Posted by litwr View Post
I can't agree that 80386's code is larger than 8086's. For pi-spigot we have 629 bytes for 386 and 660 bytes for 86.
Isn't that due to the big mul / div routines, which are single instructions on the 386?
Anyhow, of course 8086 isn't good at 32-bit stuff


Quote:
Originally Posted by litwr View Post
I'm really curious when computers will have about 4 exabytes of memory. I would guess about 50 years from now - too long for me to see it.
I doubt we ever will.
There are limits such as speed of electricity, ultimately speed of light, that could stop the exponential growth of computational power.
Old 13 September 2018, 13:18   #492
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,137
Quote:
Originally Posted by litwr View Post
I can't agree that 80386's code is larger than 8086's. For pi-spigot we have 629 bytes for 386 and 660 bytes for 86.
Hey, wait. Now that I think about it, >600 bytes seems large for just outputting 800 pi digits.
I had the algorithm noted somewhere, so I tried recoding it for 68020 and got a 408-byte executable (which can probably still be reduced slightly).
Is there something the program must do that I forgot?
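For reference, the compact spigot usually meant when people talk about "800 digits of pi" emits four digits per pass over an array of 2800 terms. Whether the programs compared here use exactly this variant is an assumption; the Python sketch below only shows the shape of that algorithm, not either executable.

```python
def pi_digits(n_digits=800):
    """Classic four-digits-per-pass pi spigot (assumed variant)."""
    a = 10000                      # each pass yields one base-10000 "digit"
    c = n_digits // 4 * 14         # 14 terms per group of four digits
    f = [a // 5] * c + [0]         # f[0..c-1] = 2000, f[c] = 0
    e = 0                          # carry from the previous pass
    out = []
    while c > 0:
        d = 0
        g = 2 * c
        b = c
        while True:
            d += f[b] * a
            g -= 1                 # g is now 2*b - 1
            f[b] = d % g
            d //= g
            g -= 1
            b -= 1
            if b == 0:
                break
            d *= b
        out.append("%04d" % (e + d // a))
        e = d % a
        c -= 14
    return "".join(out)


if __name__ == "__main__":
    print(pi_digits(800))
```

The inner loop is a 32-bit multiply followed by a divide with remainder, which is why the size and speed of mul / div on each CPU matter so much for this program.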
Old 13 September 2018, 16:21   #493
litwr
Registered User

 
Join Date: Mar 2016
Location: Ozherele
Posts: 74
Quote:
Originally Posted by meynaf View Post
Hey, wait. Now that I think about it, >600 bytes seems large for just outputting 800 pi digits.
I had the algorithm noted somewhere, so I tried recoding it for 68020 and got a 408-byte executable (which can probably still be reduced slightly).
Is there something the program must do that I forgot?
It is the fastest version, and it lets you enter the number of digits interactively.
Old 13 September 2018, 16:32   #494
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,137
Quote:
Originally Posted by litwr View Post
It is the fastest version, and it lets you enter the number of digits interactively.
Ok. Let's see if i can still do a shorter version.
Using different speed tricks for different cpus will lead to code size that's not relevant for code density comparison.
Old 13 September 2018, 20:20   #495
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,137
Done it !
Executable rewritten for A1200; it contains all the features of the original (even the speed; it could be shorter if I removed the mulu optimization).
Size = 596 bytes. x86 beaten.

But the added bulk is mostly AmigaOS calls. For this reason I don't see this program as significant in terms of code density, as the size depends as much on the host operating system as on the CPU.
Nor is it relevant as a CPU benchmark, as most of the time is spent displaying digits rather than calculating them...
Old 15 September 2018, 12:51   #496
litwr
Registered User

 
Join Date: Mar 2016
Location: Ozherele
Posts: 74
Quote:
Originally Posted by meynaf View Post
Are SI, DI, BP true GPR ? Nope. You can't byte access them, for example. In some way, they are... address registers
It is 100% wrong because these registers can be used at any place where AX, BX, ... are allowed. AX, BX, CX, DX - may be used as 8 8-bit registers and this gives us 12 registers for x86. If we have to work with bytes x86 gives us the same number of data registers as 68000 plus 4 registers which can be used for address or data.

Quote:
Originally Posted by meynaf View Post
Done it !
Executable rewritten for A1200; it contains all the features of the original (even the speed; it could be shorter if I removed the mulu optimization).
Size = 596 bytes. x86 beaten.

But the added bulk is mostly AmigaOS calls. For this reason I don't see this program as significant in terms of code density, as the size depends as much on the host operating system as on the CPU.
Nor is it relevant as a CPU benchmark, as most of the time is spent displaying digits rather than calculating them...
Why have you hidden this program? Let's check it.

BTW the part about the first 32-bit CPU has been added to the article - https://litwr.livejournal.com/1970.html
Old 15 September 2018, 13:56   #497
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,137
Quote:
Originally Posted by litwr View Post
It is 100% wrong because these registers can be used at any place where AX, BX, ... are allowed. AX, BX, CX, DX - may be used as 8 8-bit registers and this gives us 12 registers for x86.
Nope it's not wrong.
Only SI can read with auto-increment.
Only DI can write with auto-increment.
Only BP can be used as frame pointer.
Only DX can be used as loop counter with the loop instruction.
Only CL can be used for variable shifting.
There are numerous examples.
x86 registers are definitely not GPR.


Quote:
Originally Posted by litwr View Post
If we have to work with bytes x86 gives us the same number of data registers as 68000 plus 4 registers which can be used for address or data.
Not really.
If we have to work with words, the 68k can use the SWAP instruction and we can get up to 16 data values. With extra instructions to access register parts, we could have access to up to 32 bytes.
And it's not "plus 4 registers". Only 3, as SP can't be used for data.


It seems there is a lot of misunderstanding about the D/A split of 68k.
Many people see it as a shortcoming. But it's not. Not at all. The price to pay is very small for what we get.

On cpus with 16-bit opcode we have 3-bit register encoding. This is necessary if you don't want to see the instruction set severely trimmed.
On 68k you have the base encoding which is 4-bit opcode, 3-bit register, 1-bit mode, 2-bit size, 6-bit ea. If we want to use 4-bit register, that's +2 bits here (ea becomes 7-bit). It simply doesn't fit.
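The bit budget is quick to verify. In the sketch below the field widths are the ones given above; the only added assumption is the usual split of the ea field into a 3-bit mode and a 3-bit register number, which is why widening the register number also widens the ea.

```python
# Field widths of the base 68k encoding, as listed above.
BASE_FIELDS = {"opcode": 4, "register": 3, "mode": 1, "size": 2, "ea": 6}

# Hypothetical variant with 4-bit register numbers: the register field grows
# by one bit, and so does the register half of the ea field.
WIDE_FIELDS = dict(BASE_FIELDS, register=4, ea=7)


def total_bits(fields):
    """Sum of all field widths in one instruction word."""
    return sum(fields.values())


if __name__ == "__main__":
    print(total_bits(BASE_FIELDS))  # fits a 16-bit opcode word exactly
    print(total_bits(WIDE_FIELDS))  # two bits too many
```

The base layout uses all 16 bits, so the two extra bits would have to come out of the opcode or size fields.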

On ARM (with Thumb) 16-bit opcodes can only touch R0-R7.
On x86-64 you can not use R8-R15 without a REX prefix.
But on 68k you can use all 16 registers without a size penalty.

Try to multiply, divide, shift a pointer type with a C compiler. It will be rejected, and with good reason. Do people grumble that these operations are not allowed ? It's nonsense.

As addresses are 32-bit, any 16-bit use is automatically extended.
In addition, the resulting value is supposed to be pointer, not data, and therefore the CCR remains untouched.

It's true sometimes we're out of data regs and start using address regs for data. But then we have more features, not less : automatic sign-extend and operations that don't alter the flags.
Consider simple ADDA.W (A0,D0.W),A0. You can't do that on x86. You have to read the value, extend it, then add it. Of course x86's pointer arithmetic kills the flags and that's really bad.
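What that single instruction does can be spelled out step by step. The Python sketch below is a simplified model with made-up names: memory is a dict of byte values, only the low word of d0 is used as index, and no flags are produced because ADDA leaves the CCR alone.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def adda_w_indexed(memory, a0, d0):
    """Model ADDA.W (A0,D0.W),A0 and return the new A0."""
    # Effective address: A0 plus the sign-extended low word of D0.
    ea = (a0 + sign_extend16(d0)) & 0xFFFFFFFF
    # Read a big-endian 16-bit word from memory.
    word = (memory[ea] << 8) | memory[ea + 1]
    # The word is sign-extended to 32 bits before the add.
    return (a0 + sign_extend16(word)) & 0xFFFFFFFF


if __name__ == "__main__":
    mem = {0x1004: 0xFF, 0x1005: 0xFE}          # word 0xFFFE, i.e. -2
    print(hex(adda_w_indexed(mem, 0x1000, 4)))  # A0 moves back by 2
```

The fetch, the sign extension and the 32-bit add are the three separate steps the post refers to when it says x86 has to read, extend and then add.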

The only problem is that the use of An is too limited in comparison to what the encoding would allow - for the cases we indeed lack some Dn. However this could have been fixed.

But in any case, don't judge without knowing.


Quote:
Originally Posted by litwr View Post
Why have you hidden this program? Let's check it.
I don't like to keep attachments in my posts, but I can put it in the zone here, if you have access to it.
Old 15 September 2018, 14:57   #498
litwr
Registered User

 
Join Date: Mar 2016
Location: Ozherele
Posts: 74
Quote:
Originally Posted by meynaf View Post
Only SI can read with auto-increment.
Only DI can write with auto-increment.
Only DX can be used as loop counter with the loop instruction.
Are you ok? You have written something incorrect... Only DX can be used as an address to port address area. It is just about of absence of not-important orthogonality. Look at GPR definition on Wikipedia - "General-purpose registers (GPRs) can store both data and addresses" - it's all.


Quote:
Originally Posted by meynaf View Post
If we have to work with words, the 68k can use the SWAP instruction and we can get up to 16 data values. With extra instructions to access register parts, we could have access to up to 32 bytes.
You can also exchange a register content with a value at a memory address. It is just a SWAP variant.

Quote:
Originally Posted by meynaf View Post
I don't like to keep attachments in my posts, but I can put it in the zone here, if you have access to it.
I don't know about zones - please direct me there. Otherwise I can say that you are bluffing.

EDIT. I agree that the 68k's address registers can be considered as limited data registers too.
Old 15 September 2018, 15:24   #499
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,137
Quote:
Originally Posted by litwr View Post
Are you ok?
Yes, and you ?


Quote:
Originally Posted by litwr View Post
You have written something incorrect...
Wasn't that you, rather ?
What I wrote is correct. The uses of SI, DI, DX i have written are correct and the fact i didn't mention DX as port number doesn't change that.


Quote:
Originally Posted by litwr View Post
Only DX can be used as an address to port address area. It is just about of absence of not-important orthogonality. Look at GPR definition on Wikipedia - "General-purpose registers (GPRs) can store both data and addresses" - it's all.
So if it's written on Wikipedia then it's necessarily true and complete ?
Anyway, according to your definition, 68k's registers are GPRs because they really can store both data and addresses (and the rest is "just about of absence of not-important orthogonality" ).

But strange case for DX indeed, even though it's not exactly a memory address it needs to store (it's a port number - and remains 16-bit even in 32-bit mode).

Also forgot to mention AX is the only register usable for a number of things.


Quote:
Originally Posted by litwr View Post
You can also exchange a register content with a value at a memory address. It is just a SWAP variant.
Memory accesses are usually slower. Especially an exchange which needs two memory accesses.
Of course it makes the code bigger too


Quote:
Originally Posted by litwr View Post
I don't know about zones - please direct me there. Otherwise I can say that you are bluffing.
Look at the eab page. Between "Donate" and "Log Out" there is a link called "The Zone".
To access it :
http://eab.abime.net/faq.php?faq=vb_...ezone_faq_item

Tell me when you're ready. I'll upload it then.
Old 20 September 2018, 15:27   #500
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,137
Not eager to get it, huh ? I won't wait any longer and have made the upload anyway. Other ppl might be interested.