20 August 2017, 06:09 | #141 | |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
The only problem is that we failed to get your LZSS code working. I could not test on 68k Linux but submitted your previous code. Vince Weaver could not get it working though. Maybe you could include the initialization code needed? Perhaps you could download the 68k code, insert your routine and submit the changes? Last edited by matthey; 20 August 2017 at 06:27. |
|
20 August 2017, 11:18 | #142 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
So I need the rules to make an attempt:
- pure 68k or 020+ allowed?
- only the loop, or do the consts used need to be defined?
- the bit stream like the original LZSS, or something better for 68k (maintaining the exact ratio and rules)?
- decompression in-place required?
Cheers, ross |
|
20 August 2017, 16:18 | #143 | ||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
Vince Weaver needs to know the initialization code/consts so they should be included but separating may be helpful as the initialization code does not count for the LZSS code size. Quote:
http://deater.net/weave/vmwprod/asm/ll/ll.html http://deater.net/weave/vmwprod/asm/ll/ll.m68k.s It sometimes takes a while for him to answer e-mails but he has been responsive to my e-mails so far. As already mentioned, I recently submitted more clean up suggestions for the 68k total executable size but there were no changes for the LZSS code. |
||
20 August 2017, 19:07 | #144 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
It's based on some 68k specifics (flag bits reversed for the roxr trick, directly negated offset, ...). The SAME decode algorithm can be made much smaller if only you could shuffle the bits (that is more x86 friendly...). And the fact that in any case the code is the smallest one says a lot about the quality of the ISA. Regards, ross |
|
20 August 2017, 19:38 | #145 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Hmm, walking through the sources..
Code:
WARNING: order of match_position and match_lenght changed!
see lines 178 to 182
Mofication by <stephan.walter@gmx.ch>
Also modified to have N,F,etc, etc to be parameters, not hard-coded -- vmw
Code:
#define N 1024
#define F 64
#define THRESHOLD 2
#define P_BITS 10
#define POSITION_MASK 3
So what's the point? To accommodate a personal test and a personal result for a preferred architecture? It does not seem very scientific..
Last edited by ross; 20 August 2017 at 20:21. Reason: [] |
20 August 2017, 20:01 | #146 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
Oh, my last one was absolutely not compatible with the original rules, it was just to point out that the original rules were ...sub-optimal and not how you'd do it if you had 68K in mind.
I'll take a round and see if I can get my version of the original working. |
20 August 2017, 20:20 | #147 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
See mine from a few months ago (32b with no init consts): http://eab.abime.net/showpost.php?p=...&postcount=480 It's more effective than LZSS. Cheers! ross |
|
20 August 2017, 22:01 | #148 | |||
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
Question:
Quote:
Quote:
Quote:
I can't answer why a specific coder's contest result is shorter (yet?), but common things like filling/copying n bytes in a single instruction, LUT, loop, simple ALU, RET vs RTS are all shorter on Intel CPUs from the 8086 onwards. Obviously, the OP is looking for an equivalent excerpt (let's say, a single function); I don't know what the (BI)OS, frameworks etc. have to do with it. Surely it must be standalone to make any comparison? |
|||
20 August 2017, 23:12 | #149 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
The code density situation for 68k vs x86 is quite simple in fact. x86 is good only on very small code samples; the bigger the code, the worse it becomes. 68k is more or less constant.
Programs that are just 100 bytes or less in size aren't very relevant. Why not something like 1MB? It would be 1MB on 68k but 1.5MB on x86, even though x86 has better compilers. I have one example of this. |
21 August 2017, 00:55 | #150 | ||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
- number of instructions
- code size
- average instruction length
- number of memory/cache accesses
- number of branches
I did a comparison of the SuperH SH-3 to the 68k using Vince's code and found the SH-3 has about 50% more instructions, 40% more cache accesses, 40% more branches but only about 15% worse code density. CPU designs can overcome many obstacles but it is difficult to imagine a fast SH-3 core if these stats were normal. Maybe Vince's code could be good enough to be a starting point for ISA comparison.
Quote:
I agree. The 68020 ISA code density degrades some at roughly 100kB executable size but it is not as bad as x86. Many RISC processors are bad about code density degrading with larger executable sizes too. Last edited by matthey; 21 August 2017 at 02:15. |
||
21 August 2017, 02:46 | #151 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
And I just realized that you can keep that peculiar data format of the original compression example, but ditch the whole code structure and style it to be similar to my idealized version and make the loop 38(?) bytes (init not included). You don't need that extra 1K buffer.
|
21 August 2017, 03:00 | #152 | |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
|
|
21 August 2017, 13:44 | #153 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
There are two [potential] bugs in the decompression loop:
[EDIT: potential because all buffers are enlarged enough]
Code:
decompression_loop:
        move.q  #7,%d7          | load a counter
        move.b  %a3@+,%d5       | load a byte, increment pointer
test_flags:
        cmp.l   %a4,%a3         | have we reached the end?
        bge.b   done_logo       | if so, exit
Code:
        lea     %pc@(logo),%a3          | a3 points to logo data
        lea     %pc@(logo_end),%a4      | a4 points to logo end
You should use the *decompression buffer* bound, or an end token in the compressed stream, or give up a little compression and produce a well-formed compressed stream..
And an amusing typo:
Code:
| There is an alternate morotolla syntax that gas can also handle
Cheers, ross
Last edited by ross; 21 August 2017 at 17:17. Reason: [] |
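For illustration only, here is a minimal C sketch of the first option mentioned above: letting the size of the decompression buffer, rather than the input pointer alone, bound the loop. It is not Vince's code and not ross's 68k routine. The function name `lzss_decode`, the flag polarity (1 = literal, low bit first), the zero-filled window and the `N - F` start position are all assumptions made for the sketch; only the `#define` values and the little-endian 16-bit token with a `P_BITS` split come from the material quoted in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define N          1024   /* window size, from the quoted defines */
#define F          64     /* from the quoted defines */
#define THRESHOLD  2
#define P_BITS     10

/* Hypothetical decoder: returns the number of bytes written to out.
 * It never writes past out_cap and never reads past in_len. */
size_t lzss_decode(const uint8_t *in, size_t in_len,
                   uint8_t *out, size_t out_cap)
{
    uint8_t window[N] = {0};   /* assumption: window starts zero-filled */
    unsigned r = N - F;        /* assumption: classic LZSS start position */
    size_t ip = 0, op = 0;

    while (op < out_cap) {
        if (ip >= in_len)      /* check BEFORE fetching the flag byte */
            break;
        unsigned flags = in[ip++];

        for (int bit = 0; bit < 8 && op < out_cap; bit++, flags >>= 1) {
            if (flags & 1) {   /* assumption: set bit means literal */
                if (ip >= in_len)
                    return op;
                uint8_t c = in[ip++];
                out[op++] = c;
                window[r] = c;
                r = (r + 1) & (N - 1);
            } else {
                if (in_len - ip < 2)   /* a match token needs two bytes */
                    return op;
                unsigned token = (unsigned)in[ip] | ((unsigned)in[ip + 1] << 8);
                ip += 2;
                unsigned pos = token & (N - 1);
                unsigned len = (token >> P_BITS) + THRESHOLD + 1;
                while (len-- && op < out_cap) {   /* output bound checked per byte */
                    uint8_t c = window[pos];
                    pos = (pos + 1) & (N - 1);
                    out[op++] = c;
                    window[r] = c;
                    r = (r + 1) & (N - 1);
                }
            }
        }
    }
    return op;
}
```

Under these assumptions, a caller passes the known size of the decompressed data as `out_cap`, so a stream whose last flag byte has unused bits cannot make the loop read or write beyond its buffers.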
21 August 2017, 14:25 | #154 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
|
What's with the odd syntax in the above post?
|
21 August 2017, 15:09 | #155 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
The GAS syntax, a totally unreadable one.
In fact it takes me a lot of effort to read the code..
Two more potential bugs:
Code:
|       clr.l   %d4             | (unnecessary?)
        move.w  %a3@+,%d4       | load 16-bits, increment pointer
        ror.w   #8,%d4          | unfair big-endian penalty
        move.l  %d4,%d6         | copy d4 to d6
                                | no need to mask d6, as we do it
                                | by default in output_loop
        lsr.l   %d0,%d4         | unsigned shift right by P_BITS
        addq.l  #(THRESHOLD+1),%d4
        add.w   %d4,%d1
[EDIT: the second is not a bug, only a contortion in the code that explains the +1 added to d4]
Cheers, ross
Last edited by ross; 21 August 2017 at 22:23. Reason: more polite :) |
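As a reading aid, the token handling in the 68k fragment above can be sketched in C. This is only an interpretation of the quoted instructions: the names `lz_match` and `lz_split_token` are hypothetical, and the position mask of `N - 1` is an assumption based on the comment that d6 is masked later in output_loop.

```c
#include <stdint.h>

#define N          1024
#define THRESHOLD  2
#define P_BITS     10

/* Hypothetical helper type: one decoded match token. */
typedef struct {
    unsigned position;   /* index into the N-byte window */
    unsigned length;     /* number of bytes to copy */
} lz_match;

/* Split the 16-bit token stored at p. */
lz_match lz_split_token(const uint8_t *p)
{
    /* The stream stores the token low byte first, which is why the
     * 68k code needs ror.w #8 after its big-endian move.w. */
    unsigned token = (unsigned)p[0] | ((unsigned)p[1] << 8);

    lz_match m;
    m.length   = (token >> P_BITS) + THRESHOLD + 1;  /* lsr.l + addq.l */
    m.position = token & (N - 1);                    /* assumed mask, low P_BITS bits */
    return m;
}
```

With `P_BITS` at 10, the upper 6 bits carry the length field and the lower 10 bits the window position, so the byte swap is the only extra work a big-endian CPU has to do here.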
21 August 2017, 22:00 | #156 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Hi.
Attached is a 54-byte version (with the potential bugs corrected). Like NorthWay said, the real turning point would be to avoid using the 1k buffer. The stream format is really unfriendly. Regards, ross |
23 August 2017, 00:37 | #157 |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Thanks. I hope Vince will be able to use your suggestions and code. He wouldn't have people trying to rewrite the inefficient decompression code if it had been better to begin with (some 6502 guys also tried to rewrite the decompression code). I haven't received a response from Vince as of yet about the latest changes. He is busy sometimes.
|
24 August 2017, 10:06 | #158 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
|
|
24 August 2017, 11:08 | #159 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
While quite programmer-unfriendly, the AT&T syntax (used by gcc and co) isn't worse than Intel's asm syntax.
|
24 August 2017, 19:45 | #160 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
|
|