10 February 2017, 10:26 | #81 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,926
|
|
10 February 2017, 10:47 | #82 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
|
10 February 2017, 10:58 | #83 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 854
|
I have looked a little at LZ code a few times, and I'd have to say that this is indeed a bit unconventional.
The bit ordering seems reversed from what I find natural, and having to treat the data stream as LE is close to a bug IMO. A minimalistic and more conventional approach would be to have a subroutine that returns 1 bit from the bitstream. The subroutine detects by itself (I can't remember exactly how) when the register is empty and fetches the next byte (or word/long, I guess). You then have a subroutine (or inlined code) that calls this N times when you need to fetch N bits. Or, if the number of bits for non-literals is exactly 16, you can group all control bits in separate bytes, as this seems to do. And do you need to keep that A5 array? Wouldn't you reference A4 with negative offsets? (If I read the code right.) I think I made something LZ-like in around 50 bytes on a 6510. |
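One common way to make such a get-bit routine "self-detecting" is a sentinel bit: a 1 is kept above the unread bits in the register, so the register only becomes zero when the sentinel itself has been shifted out. The Python sketch below models that idea. The class name `BitReader`, the LSB-first bit order, and the LSB-first packing in `get_bits` are assumptions made for illustration, not something the post specifies.

```python
class BitReader:
    """LSB-first bit reader that uses a sentinel bit to detect an empty register."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0   # index of the next byte to fetch
        self.reg = 1   # only the sentinel is present, so the first call refills

    def get_bit(self) -> int:
        bit = self.reg & 1
        self.reg >>= 1
        if self.reg == 0:
            # The bit just shifted out was the sentinel, not data: refill.
            byte = self.data[self.pos]
            self.pos += 1
            bit = byte & 1
            # Keep the 7 remaining data bits and put a new sentinel above them.
            self.reg = (byte >> 1) | 0x80
        return bit

    def get_bits(self, n: int) -> int:
        """Fetch n bits by calling get_bit n times (first bit = least significant)."""
        value = 0
        for i in range(n):
            value |= self.get_bit() << i
        return value
```

No separate bit counter is needed: the zero test on the register is the only bookkeeping.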
10 February 2017, 11:39 | #84 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,849
|
That doesn't look like a very good implementation.
|
10 February 2017, 13:46 | #85 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Regardless of whether this guy did a good job or not, what he did is the kind of comparison I wanted to build here.
|
10 February 2017, 15:27 | #86 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 854
|
This is more like what you would do on 68K IMO. 52 bytes inner loop.
That LE idiocy would obviously also be amended so it would be shorter still.
Code:
_start:
    movea.l #(lab_1810),a6
    movea.l a6,a1
    move.l  #$3c0,d2        ; ring buffer write index
    movea.l #(lab_db),a3
    movea.l #(lab_1f6),a4
    movea.l #(lab_3d0),a5
    clr.l   d4              ; necessary?
    move.w  #$3ff,d3        ; ring buffer index mask
    clr.l   d1
    moveq.l #10,d7
    moveq.l #$1,d5          ; sentinel only: first get_bit fetches a flag byte
    bra     _entry
lzss_begin:
decompression_loop:
string_copy:
    move.w  (a3)+,d6
    move.w  d6,d1
    ror.w   #8,d6
    lsr.w   d7,d1
    addq.w  #3-1,d1         ; minimum length 3, minus 1 for dbra
output_loop:
    and.w   d3,d6
    move.b  (a5,d6.l),d4
    addq.w  #1,d6
store_byte:
    move.b  d4,(a1)+
    move.b  d4,(a5,d2.l)
    addq.w  #1,d2
    and.w   d3,d2
    dbra    d1,output_loop
_entry:
    cmpa.l  a4,a3
    bge     done_logo
get_bit
    lsr.b   #1,d5           ; next flag into carry, zero means sentinel left
    bne     test_flags
get_bits:
    move.b  (a3)+,d5
    roxr.b  #1,d5           ; first flag into carry, X becomes sentinel in bit 7
test_flags:
    bcc     string_copy
discrete_char:
    move.b  (a3)+,d4
    clr.l   d1
    bra     store_byte
lzss_end:
done_logo:
Last edited by NorthWay; 11 February 2017 at 07:04. Reason: 1 opcode less, 2 bugs added and removed |
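For readers who do not follow 68k assembly, here is a rough Python model of the kind of LZSS loop the listing implements. It is a sketch under stated assumptions, not a verified translation: the thread does not spell out the stream format, so the little-endian 16-bit token (low 10 bits = ring buffer position, high 6 bits = length minus 3), the zero-filled ring buffer, and the function name `lzss_decode` are all assumptions. The flag handling (LSB first, 1 = literal, 0 = match) and the constants 1024 and 0x3C0 follow the listing.

```python
RING_SIZE = 1024          # matches the $3ff mask in the listing
RING_START = 0x3C0        # initial write index, as in the listing


def lzss_decode(src: bytes) -> bytes:
    """Decode an LZSS stream (format details are assumptions, see lead-in)."""
    mask = RING_SIZE - 1
    ring = bytearray(RING_SIZE)   # assumed to start zero-filled
    write = RING_START
    out = bytearray()
    pos = 0
    flags = 1                     # sentinel only: forces a flag-byte fetch

    while pos < len(src):
        # Fetch one flag bit, refilling when only the sentinel is left.
        flag = flags & 1
        flags >>= 1
        if flags == 0:
            byte = src[pos]
            pos += 1
            flag = byte & 1
            flags = (byte >> 1) | 0x80

        if flag:
            # Literal: copy one byte to the output and the ring buffer.
            value = src[pos]
            pos += 1
            out.append(value)
            ring[write] = value
            write = (write + 1) & mask
        else:
            # Match: assumed little-endian token, position in the low 10 bits.
            token = src[pos] | (src[pos + 1] << 8)
            pos += 2
            read = token & mask
            length = (token >> 10) + 3
            for _ in range(length):
                value = ring[read]
                read = (read + 1) & mask
                out.append(value)
                ring[write] = value
                write = (write + 1) & mask

    return bytes(out)
```

As a usage example, `lzss_decode(bytes([0x07, 0x61, 0x62, 0x63, 0xC0, 0x0F]))` decodes three literals followed by a six-byte match that overlaps its own output.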
10 February 2017, 17:22 | #87 |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Thanks guys. Yeah, not much to start with. Bad research methodology and reporting. Poor programming skills. This is sad to see from a PhD in Electrical and Computer Engineering. I sent this Vince Weaver a (2nd) e-mail about this thread. In the first one, several years ago, I asked him to take down his misinformation/disinformation. Maybe he will accept some ideas here and make the 68k look good instead of bad.
|
10 February 2017, 18:05 | #88 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,849
|
|
10 February 2017, 18:23 | #89 |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
|
10 February 2017, 18:33 | #90 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,849
|
That sounds like a bad idea. Especially considering that compilers don't generate code for neatness.
There is if he doesn't have much assembly language experience in general. I certainly remember the code I wrote in the beginning. The amount I've improved is ridiculous. Comes from hanging around here with you guys |
10 February 2017, 18:35 | #91 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 854
|
If you could change the way the code is built it should be possible to make it smaller, but the next optimization that I believe (without having seen the C source) you can do while staying within the spirit is to increase the buffer sizes to 64K and drop the two AND opcodes in the loop.
If you limit the compressed data size to (worst case) 2G you can convert from DBcc to subq.l/bpl.b and save one opcode in "discrete_char" by not touching d1. Ideally "discrete_char" would just do
Code:
    move.b  (a3)+,(a1)+
    bra     _entry
Code:
    move.w  (a3)+,d6
    moveq.l #$3f,d1         ; #$3f (I think. split in 6+10 bits?)
    and.w   d6,d1
    asr.l   #6,d6           ; init d6 to #$ffffffff
    addq.w  #3-1,d1
    lea     (a1,d6.w),a2
copy move.b (a2)+,(a1)+
    dbra    d1,copy
Last edited by NorthWay; 02 March 2017 at 10:24. Reason: multipost rule + bugfix + better scheduled |
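The second snippet replaces the ring buffer with a copy straight out of the already-written output, addressed by a negative offset. A small Python model of that match copy, assuming the 6+10 split the comment guesses at (low 6 bits = length minus 3, high 10 bits = offset, with the all-ones upper word turning it into -1024..-1), might look like this. The function name `copy_match` is hypothetical.

```python
def copy_match(out: bytearray, token: int) -> None:
    """Append a match to `out`, reading from earlier output at a negative offset."""
    length = (token & 0x3F) + 3          # low 6 bits, minimum match length 3
    offset = ((token >> 6) & 0x3FF) - 1024   # high 10 bits, range -1024..-1
    src = len(out) + offset
    if src < 0:
        raise ValueError("match reaches before the start of the output")
    # Byte-by-byte on purpose: the source may overlap bytes written by this copy.
    for i in range(length):
        out.append(out[src + i])
```

The byte-by-byte loop matters: with an offset of -1 the copy keeps re-reading the byte it has just written, which gives run-length behaviour for free.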
11 February 2017, 12:03 | #92 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
Perhaps now it is time for a new exercise. It's either choice of 1 in n, or bit shuffling, depending on the approach you choose. You have a register containing any possible value and want to set it to 1,2,3,4,5,6,7 depending on the value's position in the list 0, -1, 1, -2, 2, -3, 3. Input value is a longword and must be equal to one item in the list, if not, we branch to some error label (and then the return value is not important). Output value can be just a byte if it makes things easier. The value can be in the same register at the end, or in another (in that case, original value can be modified). Don't ask me what this code could be for, it's just an example. 020+ code is allowed - in fact, any existing cpu's code is allowed. Speed of code is irrelevant, only size matters. Data counts the same as code. Phew. I hope i didn't forget some detail this time |
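To pin down the task, here is a Python reference for both approaches mentioned: a plain lookup ("choice of 1 in n") and a bit-shuffling version. The function names are made up for illustration, and the bit version is just one possible formula (shift left by one, XOR with the sign mask, add one); it says nothing about which approach is smallest on any given CPU.

```python
VALID = [0, -1, 1, -2, 2, -3, 3]
M32 = 0xFFFFFFFF


def position_by_lookup(x: int) -> int:
    """Return 1..7 for a listed value; raises ValueError otherwise."""
    return VALID.index(x) + 1


def position_by_bits(x: int) -> int:
    """Same mapping computed on a 32-bit value without a table."""
    x &= M32
    sign = M32 if x & 0x80000000 else 0   # all ones for negative inputs
    z = ((x << 1) & M32) ^ sign           # 0,-1,1,-2,... becomes 0,1,2,3,...
    result = (z + 1) & M32
    if not 1 <= result <= 7:              # also rejects a wrapped-around 0
        raise ValueError("input is not in the list")
    return result
```

Note that the range check has to exclude 0 as well as values above 7, because the input $80000000 wraps around to 0 in the bit version.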
|
11 February 2017, 15:30 | #93 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,926
|
On ARM32 you could do that in 8 bytes.
|
11 February 2017, 18:16 | #94 |
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,157
|
OK, great! Code for any existing CPU is allowed, so...?
Just for the hell of it, here's what it would look like on ZPU:
Code:
    loadsp 0            ; Assuming the operand is at the top of stack
    addsp 0
    loadsp 4
    im 2
    ashiftright
    xor
    im 1
    add
    loadsp 0
    im 0xfffffff8
    and
    im .ok
    eqbranch
    ; error code here...
.ok:
    ; success code here...
Last edited by robinsonb5; 11 February 2017 at 19:19. |
11 February 2017, 19:49 | #95 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,926
|
OK, I messed up. It's 8 bytes for the inverse but 12 bytes for the correct conversion:
LSLS R0,R0,#1
CCADD R0,R0,#1
CSRSB R0,R0,#0 |
11 February 2017, 19:56 | #96 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Remember, the code has to detect invalid values and branch to some error label in that case. I don't know these instructions but obviously they don't do that.
|
12 February 2017, 13:22 | #97 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,926
|
Then add a "CMP R0,#7" at the end which makes it 16 bytes. ARM32 has predication and thus doesn't have to branch.
|
12 February 2017, 17:11 | #98 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Zero is an invalid value at the end (range is 1-7) and this cmp won't catch it.
Even ARM32 sometimes has to branch, e.g. when error code is very different to normal code. And anyway it was explicitly required from start. |
12 February 2017, 18:52 | #99 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,926
|
The result can't be zero, thus, the CMP is perfectly enough. And now you are interpreting rules to favor 68k. ARM32 can have error-exit and normal exit in the same place, that's what predication is for. My code has an entire if-then-else without any branches. Ironically, ARM32 has some pretty good code-density here due to predication when history showed that predication wasn't really worth spending four bits in each instruction.
|
12 February 2017, 19:34 | #100 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
You're wrong, result can be zero. Try with input=$80000000 (LSLS gives R0=0 and a carry, CCADD isn't executed, CSRSB subs zero from zero). Yeah, i checked what these instructions do.
I'm not interpreting rules to favor 68k. I was very clear at start, so it's clearly you who is trying not to respect them. I wrote about incorrect values being rejected, you don't do it (at least not properly). I wrote about branching somewhere, you don't do it and write quibbles instead. And now you dare to accuse me of interpreting the rules ??? Quote:
I wrote that the code must branch, so it must branch, ok ? Ironically, i could do it in 12 bytes on 68k. ARM32 requires twice that amount (your 3 instructions, two cmp to test out of range, one branch). So much for code density. |
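The $80000000 case is easy to check with a small Python model of the three posted ARM instructions. This assumes the thread's CCADD and CSRSB mean "add if carry clear" and "reverse-subtract if carry set", and that LSLS puts the bit shifted out into the carry flag; the function name `arm_sequence` is made up for the sketch.

```python
M32 = 0xFFFFFFFF


def arm_sequence(r0: int) -> int:
    """Model LSLS R0,R0,#1 / add-if-carry-clear #1 / rsb-if-carry-set #0."""
    r0 &= M32
    carry = r0 >> 31            # LSLS: old bit 31 goes into the carry flag
    r0 = (r0 << 1) & M32
    if not carry:
        r0 = (r0 + 1) & M32     # non-negative input: 2x + 1
    else:
        r0 = (0 - r0) & M32     # negative input: 0 - 2x
    return r0
```

The seven valid inputs map to 1..7 as intended, while `arm_sequence(0x80000000)` returns 0, which a single compare against 7 does not reject.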
|