English Amiga Board


Old 20 August 2017, 06:09   #141
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Photon View Post
Intel will win because it still supports special-purpose 8-bit CPU instructions that do more than other 8-bit CPUs. 16-bit is fluffier with the exception of mul/div and 32-bit Risc are the worst. Even with Thumb they can't quite get there. There are 64-bit etc CPUs too, ofc ;-)
Intel will lose because most of those "special purpose" instructions are not common enough to justify an 8-bit encoding (see Vince Weaver's 8086 code). The 8086 has good code density only in fairly specific cases: tiny executables, code where byte-size instructions and common instructions with inferred operands are used often, and code that needs few registers. The 8086 code is for DOS, which has minimal executable overhead; that disqualifies it as the smallest Linux executable, leaving the 68k with the smallest Linux executable of 31 architectures. I have some changes pending which should drop the 68k total executable size by another 10-20 bytes.

Quote:
Originally Posted by NorthWay View Post
Back to that peculiar obsession with the size of the LZSS decompression loop. I sat down and tinkered with how I would do it natively if size was all I cared about and I could arrange data as I like. 34 bytes:
The only problem is that we failed to get your LZSS code working. I could not test on 68k Linux, but I submitted your previous code and Vince Weaver could not get it working either. Maybe you could include the initialization code needed? Perhaps you could download the 68k code, insert your routine and submit the changes?

Last edited by matthey; 20 August 2017 at 06:27.
Old 20 August 2017, 11:18   #142
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by NorthWay View Post
I know, but I said it was about the loop itself. That was the only thing that was counted for some reason, and so I cut out all init etc.
If I cared about a more realistic total code size then I would arrange it differently. Chances are I would care about speed too.

And if you are willing to drop in-buffer overwrite de-compression then you can separate literals and control/length+distance bits and read out the control bits("get_bits") 16 at a time.
Sorry NorthWay, I hadn't read the earlier posts; it wasn't really meant as a criticism.
So I need the rules to take an attempt:
- pure 68k or 020+ allowed?
- only loop or the consts used need to be defined?
- the bit stream like the original LZSS or something better for 68k (maintaining the exact ratio and rules)?
- decompression in-place required?

Cheers,
ross
Old 20 August 2017, 16:18   #143
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by ross View Post
So I need the rules to take an attempt:
- pure 68k or 020+ allowed?
020 instructions and addressing modes are allowed.

Quote:
Originally Posted by ross View Post
- only loop or the consts used need to be defined?
Vince Weaver needs to know the initialization code/consts so they should be included but separating may be helpful as the initialization code does not count for the LZSS code size.

Quote:
Originally Posted by ross View Post
- the bit stream like the original LZSS or something better for 68k (maintaining the exact ratio and rules)?

- decompression in-place required?
The decompression code needs to remain LZSS and use the existing static data. The exact rules beyond that would be a good question for Vince to answer. His e-mail and the 68k assembly code are at the following links.

http://deater.net/weave/vmwprod/asm/ll/ll.html
http://deater.net/weave/vmwprod/asm/ll/ll.m68k.s

It sometimes takes a while for him to answer e-mails but he has been responsive to my e-mails so far. As already mentioned, I recently submitted more clean up suggestions for the 68k total executable size but there were no changes for the LZSS code.
Old 20 August 2017, 19:07   #144
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by matthey View Post
The only problem is that we failed to get your LZSS code working. I could not test on 68k Linux, but I submitted your previous code and Vince Weaver could not get it working either. Maybe you could include the initialization code needed? Perhaps you could download the 68k code, insert your routine and submit the changes?
Hi matthey, NorthWay's code cannot work with the Okumura LZSS bitstream.
It relies on some 68k-specific tricks (bit flags reversed for the roxr trick, directly negated offset, ...).
The SAME decoding algorithm could be made much smaller if only you were allowed to shuffle the bits (the current layout is more x86 friendly...).
And the fact that the 68k code is the smallest one anyway says a lot about the quality of the ISA.

Regards,
ross
Old 20 August 2017, 19:38   #145
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
mmh, walking through the sources..

Code:
         WARNING: order of match_position and match_lenght changed!
         see lines 178 to 182
         Mofication by <stephan.walter@gmx.ch>
         Also modified to have N,F,etc, etc to be parameters, not
         hard-coded  -- vmw
Code:
#define N 1024
#define F 64
#define THRESHOLD 2
#define P_BITS 10
#define POSITION_MASK 3
It's no longer the [LZSS] Okumura bitstream (N=2^12, F=2^4[+2]).
So what's the point? To accommodate a personal test and a personal result for a preferred architecture?
It does not seem very scientific..
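
To make the difference concrete, here is a minimal C sketch of what a 16-bit match token can express for a given number of position bits. It assumes the token is simply split into p_bits of position and 16 - p_bits of length, with THRESHOLD + 1 added to the stored length; the helper names are made up for illustration.

```c
#include <stdint.h>

#define THRESHOLD 2

/* Hypothetical helper: size of the window addressable with p_bits
   position bits. */
uint32_t window_size(unsigned p_bits)
{
    return (uint32_t)1 << p_bits;
}

/* Hypothetical helper: longest match expressible when the remaining
   16 - p_bits bits of the token hold (length - THRESHOLD - 1). */
uint32_t max_match_len(unsigned p_bits)
{
    uint32_t max_field = ((uint32_t)1 << (16 - p_bits)) - 1;
    return max_field + THRESHOLD + 1;
}
```

With 12 position bits this gives a 4096-byte window and matches of up to 18 bytes (the classic N and F); with P_BITS = 10 it gives a 1024-byte window and matches of up to 66 bytes.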

Last edited by ross; 20 August 2017 at 20:21. Reason: []
Old 20 August 2017, 20:01   #146
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
Oh, my last one was absolutely not compatible with the original rules, it was just to point out that the original rules were ...sub-optimal and not how you'd do it if you had 68K in mind.

I'll take a round and see if I can get my version of the original working.
Old 20 August 2017, 20:20   #147
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by NorthWay View Post
Oh, my last one was absolutely not compatible with the original rules, it was just to point out that the original rules were ...sub-optimal and not how you'd do it if you had 68K in mind.
With the 68k in mind you can make some very tight code.
See mine from a few months ago (32 bytes with no init constants):
http://eab.abime.net/showpost.php?p=...&postcount=480
It's more effective than LZSS.

Cheers!
ross
Old 20 August 2017, 22:01   #148
Photon
Moderator
 
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
Question:

Quote:
Originally Posted by meynaf View Post
...

It's about comparison of various cpu families, but, this time, not at all about performance benchmarks - rather, code size (both number of bytes and number of instructions).

This is to make comparisons merely for academic purposes

...

So it's all about writing/showing real life code samples of significant size (i'd say 20-40 instructions should be enough), and write that for as many cpus as possible.
Answer:

Quote:
Originally Posted by Photon View Post
Intel will win because it still supports special-purpose 8-bit CPU instructions that do more than other 8-bit CPUs. 16-bit is fluffier with the exception of mul/div and 32-bit Risc are the worst. Even with Thumb they can't quite get there. There are 64-bit etc CPUs too, ofc ;-)
Counter:

Quote:
Originally Posted by matthey View Post
Intel will lose because most of those "special purpose" instructions are not common enough to justify an 8-bit encoding (see Vince Weaver's 8086 code). The 8086 has good code density only in fairly specific cases: tiny executables, code where byte-size instructions and common instructions with inferred operands are used often, and code that needs few registers.

...
Maybe for examples as short as this. But addressing the academic purpose of the question, examples should be as general as possible. Math functions, parsing/conversion, sorts, decompression algorithms are good examples. For those, 8-bit instructions are used a lot.

I can't answer why a specific coder's contest result is shorter (yet?), but common things like filling/copying n bytes in a single instruction, LUT, loop, simple ALU, RET vs RTS are all shorter on Intel CPUs from the 8086 onward.

Obviously, the OP is looking for an equivalent excerpt (let's say, a single function); I don't know what the (BI)OS, frameworks etc. have to do with it. Surely it must be standalone to make any comparison?
Old 20 August 2017, 23:12   #149
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
The code density situation for 68k vs x86 is quite simple in fact. x86 is good only on very small code samples; the bigger the program, the worse it becomes. 68k is more or less constant.
Programs that are just 100 bytes or less in size aren't very relevant. Why not something like 1MB? It would be 1MB on 68k but 1.5MB on x86, even though x86 has better compilers. I have one example of this.
Old 21 August 2017, 00:55   #150
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by ross View Post
It's no longer the [LZSS] Okumura bitstream (N=2^12, F=2^4[+2]).
So what's the point? To accommodate a personal test and a personal result for a preferred architecture?
It does not seem very scientific..
The code was originally used as a contest to optimize for the x86. Yes, it was modified to run better on the x86. Yes, the big endian code has to do an endian swap for little endian static data. Yes, I complained about these things to Vince Weaver. It is difficult to change them now even though the results are less than scientific and perhaps not even a good study. I originally asked Vince to take down his web site as a source of misinformation. My next best option was to at least get the 68k close to where it should be instead of middle of the pack for code density. Maybe the results are halfway meaningful, at least for the architectures which are well optimized. I recently asked him to include more data like the following.

number of instructions
code size
average instruction length
number of memory/cache accesses
number of branches

I did a comparison of the SuperH SH-3 to the 68k using Vince's code and found the SH-3 has about 50% more instructions, 40% more cache accesses, 40% more branches but only about 15% worse code density. CPU designs can overcome many obstacles but it is difficult to imagine a fast SH-3 core if these stats were normal. Maybe Vince's code could be good enough to be a starting point for ISA comparison.
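
As a rough sanity check on those ratios, the sketch below works out the average 68k instruction length they would imply. It assumes every SH-3 instruction is 2 bytes and that code size is made up purely of instruction bytes (literal pools and data are ignored), so treat the result as a ballpark only; the function name is made up.

```c
/* instr_ratio: other ISA's instruction count / baseline instruction count
   size_ratio:  other ISA's code size / baseline code size
   other_len:   other ISA's fixed instruction length in bytes
   Returns the baseline's implied average instruction length in bytes. */
double implied_avg_len(double instr_ratio, double size_ratio, double other_len)
{
    return instr_ratio * other_len / size_ratio;
}
```

implied_avg_len(1.5, 1.15, 2.0) comes out at roughly 2.6 bytes per 68k instruction under those assumptions.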


Quote:
Originally Posted by Photon View Post
Maybe for examples as short as this. But addressing the academic purpose of the question, examples should be as general as possible. Math functions, parsing/conversion, sorts, decompression algorithms are good examples. For those, 8-bit instructions are used a lot.
Many programmers fall for the trap of thinking an 8 bit encoding will give superior code density. There is very limited space for even a register or immediate specification and there are only so many useful instructions with practically no data. The x86 ISA is good at utilizing instructions with inferred ops but this is bad for orthogonality and ends up being less general purpose. Several of the x86 instructions were replaced when moving to x86_64 for this reason (6/256 8 bit encodings were used for BCD).
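
A small illustration of how little room one byte leaves: in 16/32-bit x86 the opcodes 0x40-0x5F spend three bits on the register, and the remaining bits only distinguish four operations (INC, DEC, PUSH, POP). The decoder below is a sketch covering just those 32 opcodes; the function and its interface are hypothetical.

```c
#include <stddef.h>

/* Decode a one-byte x86 opcode in the range 0x40-0x5F (16/32-bit modes).
   Returns the mnemonic and stores the register number (0-7) in *reg,
   or returns NULL for any other byte. */
const char *decode_short_op(unsigned char opcode, unsigned *reg)
{
    static const char *const names[4] = { "inc", "dec", "push", "pop" };

    if (opcode < 0x40 || opcode > 0x5F)
        return NULL;
    *reg = opcode & 7;                   /* low 3 bits select the register */
    return names[(opcode - 0x40) >> 3];  /* next 2 bits select the operation */
}
```

In 64-bit mode the 0x40-0x4F half of this range was reassigned to REX prefixes, which is the kind of reuse described above.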

Quote:
Originally Posted by meynaf View Post
The code density situation for 68k vs x86 is quite simple in fact. x86 is good only on very small code samples; the bigger the program, the worse it becomes. 68k is more or less constant. Programs that are just 100 bytes or less in size aren't very relevant.
I agree. The 68020 ISA code density degrades some at roughly 100kB executable size but it is not as bad as x86. Many RISC processors are bad about code density degrading with larger executable sizes too.

Last edited by matthey; 21 August 2017 at 02:15.
Old 21 August 2017, 02:46   #151
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
And I just realized that you can keep that peculiar data format of the original compression example, but ditch the whole code structure and style it to be similar to my idealized version and make the loop 38(?) bytes (init not included). You don't need that extra 1K buffer.
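
A minimal C sketch of that idea, not the actual 68k routine: the match is copied straight from the output already written instead of from a separate ring buffer. It assumes the ring write index would have started at N - F and advanced by one per output byte, and that no match refers to ring contents from before the start of the output; the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

#define N 1024
#define F 64

/* Copy a match of len bytes whose ring position is pos, reading from the
   output itself. out is the number of bytes written so far. Returns the
   new output length, or (size_t)-1 if the match cannot be served. */
size_t copy_match(uint8_t *dst, size_t out, size_t dst_cap,
                  unsigned pos, unsigned len)
{
    /* Where the ring write index would be at this point (assumption). */
    unsigned r = (unsigned)((N - F + out) & (N - 1));
    unsigned distance = (r - pos) & (N - 1);

    if (distance == 0)
        distance = N;                 /* slot about to be overwritten */
    if (distance > out || dst_cap - out < len)
        return (size_t)-1;
    while (len--) {                   /* byte by byte, so overlaps work */
        dst[out] = dst[out - distance];
        out++;
    }
    return out;
}
```

For example, with "abc" already written, pos = 960 and len = 3 extend the output to "abcabc".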
NorthWay is offline  
Old 21 August 2017, 03:00   #152
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by NorthWay View Post
And I just realized that you can keep that peculiar data format of the original compression example, but ditch the whole code structure and style it to be similar to my idealized version and make the loop 38(?) bytes (init not included). You don't need that extra 1K buffer.
I e-mailed Vince Weaver the link to your post #138 on Saturday. It makes sense to have a more practical algorithm but it would be a lot of work to change it now as all architectures would need changing. It is not my decision either.
Old 21 August 2017, 13:44   #153
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
There are two [potential] bugs in the decompression loop:
[EDIT: potential because all the buffers are enlarged enough]

Code:
decompression_loop:
	moveq	#7,%d7		| load a counter
	move.b	%a3@+,%d5	| load a byte, increment pointer

test_flags:
	cmp.l	%a4,%a3		| have we reached the end?
	bge.b	done_logo  	| if so, exit
The end check needs to be done *before* the flags byte is read; otherwise there is certainly a fetch beyond the end of the buffer.


Code:
	lea	%pc@(logo),%a3		| a3 points to logo data
	lea	%pc@(logo_end),%a4	| a4 points to logo end
The check uses the compressed stream bounds, which is only right when the last flags byte has all 8 bits in use.
You should use the *decompression buffer* bound, or an end token in the compressed stream, or give up a little compression but produce a properly terminated compressed stream..
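
Both points can be sketched in C. This is only an illustration, not Vince Weaver's or anybody's actual routine, and the stream details are assumptions: flag bits are consumed least significant first with 1 meaning a literal, match tokens are 16-bit little-endian with the length in the top bits, and the ring buffer starts zero-filled with its write index at N - F.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 1024
#define F 64
#define THRESHOLD 2
#define P_BITS 10

/* Returns the number of bytes written, or (size_t)-1 on a truncated
   stream or a full output buffer. */
size_t lzss_decode(const uint8_t *src, size_t src_len,
                   uint8_t *dst, size_t dst_cap)
{
    uint8_t ring[N];
    size_t r = N - F, in = 0, out = 0;

    memset(ring, 0, sizeof ring);
    while (in < src_len) {                /* end check BEFORE the flags read */
        unsigned flags = src[in++];
        /* Stop as soon as the input runs out, so a partly used last
           flags byte is handled too. */
        for (int bit = 0; bit < 8 && in < src_len; bit++, flags >>= 1) {
            if (flags & 1) {              /* literal byte */
                if (out >= dst_cap)
                    return (size_t)-1;
                uint8_t c = src[in++];
                dst[out++] = c;
                ring[r] = c;
                r = (r + 1) & (N - 1);
            } else {                      /* match token */
                if (src_len - in < 2)
                    return (size_t)-1;
                unsigned tok = src[in] | (src[in + 1] << 8);
                in += 2;
                unsigned len = (tok >> P_BITS) + THRESHOLD + 1;
                unsigned pos = tok & (N - 1);
                if (dst_cap - out < len)  /* output bound check */
                    return (size_t)-1;
                while (len--) {
                    uint8_t c = ring[pos];
                    pos = (pos + 1) & (N - 1);
                    dst[out++] = c;
                    ring[r] = c;
                    r = (r + 1) & (N - 1);
                }
            }
        }
    }
    return out;
}
```

The same checks cost a few bytes in assembly, which is presumably why a size-optimized contest entry leaves them out.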


[ And an amusing typo:
Code:
|  There is an alternate morotolla syntax that gas can also handle
]

Cheers,
ross

Last edited by ross; 21 August 2017 at 17:17. Reason: []
Old 21 August 2017, 14:25   #154
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
What's with the odd syntax in the above post?
Old 21 August 2017, 15:09   #155
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Thorham View Post
What's with the odd syntax in the above post
The GAS syntax, a totally unreadable one.
In fact it takes me a lot of effort to read the code..

Two more potential bugs:
Code:
|	clr.l   %d4		| (unnecessary?)
	move.w	%a3@+,%d4	| load 16-bits, increment pointer
	ror.w	#8,%d4		| unfair big-endian penalty

	move.l	%d4,%d6		| copy d4 to d6
				| no need to mask d6, as we do it
				| by default in output_loop

	lsr.l	%d0,%d4		| unsigned shift right by P_BITS
	addq.l	#(THRESHOLD+1),%d4

	add.w	%d4,%d1
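With lsr.l the code works only by chance; it needs to be .w, because move.w leaves the upper 16 bits of d4 untouched and lsr.l shifts them down into the low word (note the commented-out clr.l).
[EDIT: the second is not a bug, only a contortion in the code that explains the +1 added to d4 ]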
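
The effect can be modelled in a few lines of C. This is a sketch of the register behaviour only, with made-up function names; the token is assumed to be already byte-swapped, and d4 stands for whatever the register held before the move.w.

```c
#include <stdint.h>

#define P_BITS 10
#define THRESHOLD 2

/* move.w <ea>,Dn: replaces only the low 16 bits of the register. */
uint32_t move_w(uint32_t reg, uint16_t value)
{
    return (reg & 0xFFFF0000u) | value;
}

/* Length as computed with lsr.l: stale upper bits slide into the low word. */
uint16_t length_with_lsr_l(uint32_t d4, uint16_t token)
{
    d4 = move_w(d4, token);
    d4 >>= P_BITS;
    d4 += THRESHOLD + 1;
    return (uint16_t)d4;        /* add.w only looks at the low word */
}

/* Length as computed with lsr.w: only the low word is shifted. */
uint16_t length_with_lsr_w(uint32_t d4, uint16_t token)
{
    d4 = move_w(d4, token);
    d4 = (d4 & 0xFFFF0000u) | ((d4 & 0xFFFFu) >> P_BITS);
    d4 += THRESHOLD + 1;
    return (uint16_t)d4;
}
```

With a clean register both variants agree, but if bit 16 of d4 happens to be set, a token encoding a length of 5 comes out as 69 with lsr.l and still as 5 with lsr.w.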

Cheers,
ross

Last edited by ross; 21 August 2017 at 22:23. Reason: more polite :)
Old 21 August 2017, 22:00   #156
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Hi.

Attached is a 54-byte version (with the potential bugs corrected).
As NorthWay said, the real turning point would be to avoid using the 1k buffer.
The stream format is really unfriendly.

Regards,
ross
Attached Files
File Type: s ll_ross.s (3.2 KB, 106 views)
Old 23 August 2017, 00:37   #157
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by ross View Post
Attached is a 54-byte version (with the potential bugs corrected).
As NorthWay said, the real turning point would be to avoid using the 1k buffer.
The stream format is really unfriendly.
Thanks. I hope Vince will be able to use your suggestions and code. He wouldn't have people trying to rewrite the inefficient decompression code if it had been better to begin with (some 6502 guys also tried to rewrite the decompression code). I haven't received a response from Vince as of yet about the latest changes. He is busy sometimes.
Old 24 August 2017, 10:06   #158
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by ross View Post
The GAS syntax, a totally unreadable one.
Very strange. Why did they think that that was a good idea? Should be pretty easy to clean up with search/replace at least (perhaps with regex).
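
As a rough idea of what such a cleanup involves, here is a small C sketch that rewrites a single operand from the MIT-style syntax seen above into Motorola syntax. It only knows the simple forms (%reg, %reg@, %reg@+, %reg@-, %reg@(disp)); a real converter would also need indexed modes, register lists and the '|' comment character. The function name is made up.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Convert one operand, e.g. "%a3@+" -> "(a3)+". Anything that is not
   recognised is copied through with a leading '%' removed. */
void mit_to_motorola(const char *in, char *out, size_t cap)
{
    char reg[8];
    size_t n = 0;

    if (*in == '%')
        in++;
    while (isalnum((unsigned char)in[n]) && n < sizeof reg - 1) {
        reg[n] = in[n];
        n++;
    }
    reg[n] = '\0';

    const char *rest = in + n;          /* what follows the register name */
    if (strcmp(rest, "@") == 0)
        snprintf(out, cap, "(%s)", reg);
    else if (strcmp(rest, "@+") == 0)
        snprintf(out, cap, "(%s)+", reg);
    else if (strcmp(rest, "@-") == 0)
        snprintf(out, cap, "-(%s)", reg);
    else if (rest[0] == '@' && rest[1] == '(' && rest[strlen(rest) - 1] == ')')
        snprintf(out, cap, "%.*s(%s)", (int)strlen(rest) - 3, rest + 2, reg);
    else
        snprintf(out, cap, "%s%s", reg, rest);
}
```

So "%a3@+" becomes "(a3)+" and "%pc@(logo)" becomes "logo(pc)", while immediates such as "#7" pass through unchanged.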
Old 24 August 2017, 11:08   #159
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
While quite programmer unfriendly, the AT&T syntax (used by gcc and co) isn't worse than Intel's asm syntax
Old 24 August 2017, 19:45   #160
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by meynaf View Post
While quite programmer unfriendly, the AT&T syntax (used by gcc and co) isn't worse than Intel's asm syntax
Thorham is offline  
 

