06 February 2010, 19:49 | #1 |
Registered User
Join Date: Sep 2004
Location: Brasil
Age: 49
Posts: 181
aPLib decruncher for Amiga
Hi!
I am trying to refresh my rusty 68000, so I have converted the aPLib decruncher to our lovely Motorola. Here is the source; of course, all suggestions to optimize it are welcome.
Code:
; -------------------------------------------------------------------------------------------------
; Aplib decruncher for MC68000
; by MML 2010
; -------------------------------------------------------------------------------------------------
; vasmm68k aplib.s68 -o aplib.obj -Fhunk -m68000
; -------------------------------------------------------------------------------------------------

; -------------------------------------------------------------------------------------------------
; MACROS
; -------------------------------------------------------------------------------------------------

; GET_BIT: Get bits from the crunched data (D3) and insert the most significant bit in the carry flag.
GET_BIT MACRO
        subq.b  #1,d5                   ; D5 = bit counter
        beq.s   .need_more_bits\@
        add.b   d3,d3                   ; D3.b << 1 (lsl.b #1,d3)
        bra.s   .still_bits_left\@
.need_more_bits\@
        moveq   #8,d5
        move.b  (a0)+,d3                ; Read next crunched byte
        add.b   d3,d3                   ; D3.b << 1 (lsl.b #1,d3 or roxl.b #1,d3)
.still_bits_left\@
        ENDM

; DECODE_GAMMA: Decode values from the crunched data using gamma code
DECODE_GAMMA MACRO
        moveq   #1,\1
.get_more_gamma\@
        GET_BIT
        roxl.l  #1,\1
        GET_BIT
        bcs.s   .get_more_gamma\@
        ENDM

; DO_COPY: Copy D2+1 bytes from destination (A1) - offset (D0) to destination
; (DBF loops D2+1 times, so callers pass length-1)
DO_COPY MACRO
        move.l  a1,a2
        suba.l  d0,a2
.loop_do_copy\@
        move.b  (a2)+,(a1)+
        dbf     d2,.loop_do_copy\@
        ENDM

; -------------------------------------------------------------------------------------------------
; aplib_decrunch: A0 = Source / A1 = Destination
; -------------------------------------------------------------------------------------------------
aplib_decrunch
;       movem.l a0-a2/d0-d5,-(a7)
        moveq   #1,d5                   ; Initialize bits counter
.copy_byte
        move.b  (a0)+,(a1)+
        moveq   #0,d1                   ; Initialize LWM
.next_sequence
        GET_BIT
        bcc.s   .copy_byte              ; if bit sequence is %0..., then copy next byte
        GET_BIT
        bcc.s   .code_pair              ; if bit sequence is %10..., then it's a code pair
        moveq   #0,d0                   ; offset = 0 (eor.l d0,d0)
        GET_BIT
        bcc     .short_match            ; if bit sequence is %110..., then it's a short match

; The sequence is %111..., the next 4 bits are the offset (0-15)
        REPT 4
        GET_BIT
        roxl.l  #1,d0
        ENDR
        tst.l   d0
        beq.s   .write_byte             ; if offset == 0, then write 0x00
; If offset != 0, then write the byte on destination - offset
        move.l  a1,a2
        suba.l  d0,a2
        move.b  (a2),d0
.write_byte
        move.b  d0,(a1)+
        moveq   #0,d1                   ; Initialize LWM
        bra     .next_sequence          ; Process next sequence

; Code pair %10...
.code_pair
        DECODE_GAMMA d0                 ; Get offset
        cmpi.l  #2,d0                   ; offset == 2?
        bne.s   .normal_code_pair
        tst.l   d1                      ; LWM == 0?
        bne.s   .normal_code_pair
        move.l  d4,d0                   ; offset = old_offset
        DECODE_GAMMA d2                 ; Get length
        subq.l  #1,d2                   ; length--
        DO_COPY
        moveq   #1,d1                   ; LWM = 1
        bra     .next_sequence          ; Process next sequence
.normal_code_pair
        tst.l   d1                      ; LWM == 0?
        bne.s   .lmw_no_0
        subq.l  #3,d0                   ; offset -= 3
        bra.s   .continue_normal_code_pair
.lmw_no_0
        subq.l  #2,d0                   ; offset -= 2
.continue_normal_code_pair
        lsl.l   #8,d0                   ; offset << 8
        move.b  (a0)+,d0                ; get the least significant byte of the offset (16 bits)
        DECODE_GAMMA d2                 ; Get length
        subq.l  #1,d2                   ; length--
.compare_32000
        cmpi.l  #$7D00,d0               ; offset >= 32000?
        blt.s   .compare_1280
        addq.l  #2,d2                   ; length += 2
        bra.s   .end_compares
.compare_1280
        cmpi.l  #$0500,d0               ; offset >= 1280?
        blt.s   .compare_128
        addq.l  #1,d2                   ; length++
        bra.s   .end_compares
.compare_128
        cmpi.l  #$0080,d0               ; offset < 128?
        bge.s   .end_compares
        addq.l  #2,d2                   ; length += 2
.end_compares
        DO_COPY
        move.l  d0,d4                   ; old_offset = offset
        moveq   #1,d1                   ; LWM = 1
        bra     .next_sequence          ; Process next sequence

; Short match %110...
.short_match
        move.b  (a0)+,d0                ; Get offset (7-bit offset + 1 bit to mark a 2- or 3-byte copy)
        lsr.b   #1,d0
        beq.s   .end_decrunch           ; if offset == 0, end of decrunching
        bcs.s   .with_carry
        moveq   #1,d2                   ; length = 2 (DBF count 1)
        bra.s   .continue_short_match
.with_carry
        moveq   #2,d2                   ; length = 3 (DBF count 2)
.continue_short_match
        DO_COPY
        move.l  d0,d4                   ; old_offset = offset
        moveq   #1,d1                   ; LWM = 1
        bra     .next_sequence          ; Process next sequence
.end_decrunch
;       movem.l (a7)+,a0-a2/d0-d5
        rts

Just remember: if you use appack.exe, you need to delete the 24-byte header it adds at the beginning of every crunched file.
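If you want to check the routine above against a known stream on a PC first, here is a minimal Python model of the same decruncher. It is a sketch, not the author's code. The names `BitReader`, `aplib_decrunch` and `strip_appack_header` are hypothetical, and it assumes well-formed input. It mirrors the 68000 registers (D3 = tag byte, D5 = bit counter, D4 = old offset, D1 = LWM) rather than aiming for speed. `strip_appack_header` simply drops the 24-byte appack.exe header mentioned above and does not validate it.

```python
class BitReader:
    """Models GET_BIT: pos = A0, d3 = current tag byte, d5 = bit counter."""

    def __init__(self, src: bytes):
        self.src = src
        self.pos = 0
        self.d3 = 0
        self.d5 = 1  # moveq #1,d5: the first GET_BIT loads a tag byte

    def read_byte(self) -> int:
        """move.b (a0)+,dn"""
        value = self.src[self.pos]
        self.pos += 1
        return value

    def get_bit(self) -> int:
        self.d5 -= 1
        if self.d5 == 0:              # counter ran out: load 8 new bits
            self.d5 = 8
            self.d3 = self.read_byte()
        self.d3 <<= 1                 # add.b d3,d3
        bit = self.d3 >> 8            # bit shifted out of the byte -> C/X
        self.d3 &= 0xFF
        return bit

    def gamma(self) -> int:
        """DECODE_GAMMA: start at 1, then (data bit, 'more?' bit) pairs."""
        value = 1
        while True:
            value = (value << 1) | self.get_bit()
            if not self.get_bit():
                return value


def strip_appack_header(data: bytes, header_size: int = 24) -> bytes:
    """Drop the header appack.exe adds in front of the crunched data (not validated)."""
    return data[header_size:]


def aplib_decrunch(src: bytes) -> bytes:
    r = BitReader(src)
    dst = bytearray()
    old_offset = 0   # D4
    lwm = 0          # D1

    def copy(offset: int, length: int) -> None:
        # DO_COPY: byte by byte, so overlapping matches repeat recent output
        for _ in range(length):
            dst.append(dst[-offset])

    dst.append(r.read_byte())                    # the first byte is always a literal
    while True:
        if not r.get_bit():                      # %0: literal byte
            dst.append(r.read_byte())
            lwm = 0
        elif not r.get_bit():                    # %10: code pair
            offset = r.gamma()
            if offset == 2 and lwm == 0:         # reuse old_offset
                copy(old_offset, r.gamma())
            else:
                offset = ((offset - 3 + lwm) << 8) | r.read_byte()
                length = r.gamma()
                if offset >= 32000:
                    length += 2
                elif offset >= 1280:
                    length += 1
                elif offset < 128:
                    length += 2
                copy(offset, length)
                old_offset = offset
            lwm = 1
        elif not r.get_bit():                    # %110: short match
            value = r.read_byte()
            offset = value >> 1
            if offset == 0:                      # end of decrunching
                return bytes(dst)
            copy(offset, 2 + (value & 1))
            old_offset = offset
            lwm = 1
        else:                                    # %111: 4-bit offset, one byte
            offset = 0
            for _ in range(4):
                offset = (offset << 1) | r.get_bit()
            dst.append(dst[-offset] if offset else 0)
            lwm = 0
```

A hand-built stream such as `bytes([0x61, 0x64, 0x62, 0x05, 0x63, 0x5C, 0x39, 0xD1, 0x0A, 0x80, 0x00])` goes through every branch once: literal, short match, repeated offset, 4-bit match with and without offset, normal code pair, and end marker. It decrunches to `b"ababacaca\x00cbaba"`.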
06 February 2010, 20:41 | #2 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
That's one short decruncher! The only optimization I can see is to preload the 32-bit constants into unused registers.
06 February 2010, 22:23 | #3 |
Registered User
Join Date: Jun 2008
Location: somewhere else
Posts: 511
Code:
roxl.l #1,dx  -->  addx.l dx,dx
Code:
GET_BIT MACRO
        subq.b  #1,d5                   ; D5 = bit counter
        bne.s   .still_bits_left\@
        moveq   #8,d5
        move.b  (a0)+,d3                ; Read next crunched byte
.still_bits_left\@
        add.b   d3,d3                   ; D3.b << 1 (lsl.b #1,d3 or roxl.b #1,d3)
        ENDM
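Both suggestions are easy to sanity-check in a quick Python model. This is a sketch: the function names are hypothetical and only the registers that matter are modeled. ADDX.L Dx,Dx and ROXL.L #1,Dx both compute Dx*2 + X. Moving the shared `add.b d3,d3` after the label delivers exactly the same bits as the original two-branch GET_BIT.

```python
MASK32 = 0xFFFFFFFF


def roxl_1(d: int, x: int) -> int:
    """ROXL.L #1,Dn: shift left, X goes into bit 0."""
    return ((d << 1) | x) & MASK32


def addx_self(d: int, x: int) -> int:
    """ADDX.L Dn,Dn: Dn + Dn + X."""
    return (d + d + x) & MASK32


def bits_two_branches(data: bytes, n: int) -> list:
    """Original GET_BIT: subq/beq, with add.b d3,d3 duplicated in both paths."""
    d3, d5, pos, out = 0, 1, 0, []
    for _ in range(n):
        d5 = (d5 - 1) & 0xFF
        if d5 == 0:                  # beq.s .need_more_bits
            d5 = 8
            d3 = data[pos]
            pos += 1
            d3 <<= 1                 # add.b d3,d3
        else:
            d3 <<= 1                 # add.b d3,d3, then bra.s
        out.append(d3 >> 8)          # carry = bit shifted out of the byte
        d3 &= 0xFF
    return out


def bits_shared_tail(data: bytes, n: int) -> list:
    """Reworked GET_BIT: subq/bne, one add.b d3,d3 after the label."""
    d3, d5, pos, out = 0, 1, 0, []
    for _ in range(n):
        d5 = (d5 - 1) & 0xFF
        if d5 == 0:                  # bne.s not taken: reload
            d5 = 8
            d3 = data[pos]
            pos += 1
        d3 <<= 1                     # .still_bits_left: add.b d3,d3
        out.append(d3 >> 8)
        d3 &= 0xFF
    return out
```

Bits come out most significant first. For example, the byte $A5 yields 1,0,1,0,0,1,0,1 in both versions.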
06 February 2010, 23:16 | #4 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
You can also remove the TST.L D0 after the REPT-ENDR sequence (ROXL already sets the Z flag from its result). The two TST.L D1 can be reduced in size, and I think the operations on D2 can be reduced to word size, since its only other use is with a DBF.
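D2 can be word-sized because DBF only ever looks at the low word of its counter. The loop body also runs counter+1 times, which is why every length is decremented before DO_COPY. Here is a small Python model of `move.b (a2)+,(a1)+ / dbf d2,loop` showing both properties. It is a sketch, and `dbf_iterations` is a hypothetical name.

```python
def dbf_iterations(d2: int) -> int:
    """Count how often the body runs in:  loop: <body> ; dbf d2,loop
    DBF decrements only the low word of d2 and falls through when it reaches -1."""
    count = 0
    while True:
        count += 1                                  # <body>
        low = (d2 - 1) & 0xFFFF                     # decrement Dn.W
        d2 = (d2 & 0xFFFF0000) | low                # upper word is untouched
        if low == 0xFFFF:                           # -1: loop terminates
            return count
```

So a 3-byte match needs D2 = 2, and whatever sits in the upper word of D2 never matters.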
06 February 2010, 23:32 | #5 |
Registered User
Join Date: Jun 2008
Location: somewhere else
Posts: 511
Also:
Code:
.normal_code_pair
        add.l   d1,d0                   ; d1 (LWM) is either 1 or 0
        subq.l  #3,d0                   ; offset -= 3 (net -2 when LWM = 1)
.continue_normal_code_pair

Last edited by hitchhikr; 06 February 2010 at 23:39.
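This trick works because D1 (LWM) is only ever 0 or 1. Subtracting 3 and adding LWM therefore gives -3 or -2 without a branch. Here is a quick exhaustive check in Python (a sketch; the function names are hypothetical):

```python
def adjust_branchy(offset: int, lwm: int) -> int:
    """Original: tst.l d1 / bne.s / subq.l #3 or subq.l #2."""
    return offset - 2 if lwm else offset - 3


def adjust_branch_free(offset: int, lwm: int) -> int:
    """add.l d1,d0 ; subq.l #3,d0 (valid only while d1 is 0 or 1)."""
    return offset + lwm - 3
```

If LWM were ever stored as some other non-zero value, the two versions would no longer agree.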
07 February 2010, 01:25 | #6 |
Registered User
Join Date: Sep 2004
Location: Brasil
Age: 49
Posts: 181
I have applied all your optimizations, hitchhikr (the GET_BIT and .normal_code_pair tricks just kill me), and ADDX instead of ROXL is great!!!
But Leffmann, I thought that TST takes the same time for .L, .W and .B; I'm using the timings that appear in "MC680x0 Reference 1.1, ©April,May 1995 by Flint/DARKNESS".
And the version with all these optimizations is behind door number 1:
Code:
; -------------------------------------------------------------------------------------------------
; Aplib decruncher for MC68000
; by MML 2010
; Optimized by hitchhikr and Leffmann
; -------------------------------------------------------------------------------------------------
; vasmm68k aplib.s68 -o aplib.obj -Fbin -m68000
; -------------------------------------------------------------------------------------------------

; -------------------------------------------------------------------------------------------------
; MACROS
; -------------------------------------------------------------------------------------------------

; GET_BIT: Get bits from the crunched data (D3) and insert the most significant bit in the carry flag.
GET_BIT MACRO
        subq.b  #1,d5                   ; D5 = bit counter
        bne.s   .still_bits_left\@
        moveq   #8,d5
        move.b  (a0)+,d3                ; Read next crunched byte
.still_bits_left\@
        add.b   d3,d3                   ; D3.b << 1
        ENDM

; DECODE_GAMMA: Decode values from the crunched data using gamma code
; Long Version
DECODE_GAMMAL MACRO
        moveq   #1,\1
.get_more_gamma\@
        GET_BIT
        addx.l  \1,\1
        GET_BIT
        bcs.s   .get_more_gamma\@
        ENDM

; Word Version
DECODE_GAMMAW MACRO
        moveq   #1,\1
.get_more_gamma\@
        GET_BIT
        addx.w  \1,\1
        GET_BIT
        bcs.s   .get_more_gamma\@
        ENDM

; DO_COPY: Copy D2+1 bytes from destination (A1) - offset (D0) to destination
; (DBF loops D2+1 times, so callers pass length-1)
DO_COPY MACRO
        move.l  a1,a2
        suba.l  d0,a2
.loop_do_copy\@
        move.b  (a2)+,(a1)+
        dbf     d2,.loop_do_copy\@
        ENDM

; -------------------------------------------------------------------------------------------------
; aplib_decrunch: A0 = Source / A1 = Destination
; -------------------------------------------------------------------------------------------------
aplib_decrunch
;       movem.l a0-a2/d0-d5,-(a7)
        moveq   #1,d5                   ; Initialize bits counter
.copy_byte
        move.b  (a0)+,(a1)+
        moveq   #0,d1                   ; Initialize LWM
.next_sequence
        GET_BIT
        bcc.s   .copy_byte              ; if bit sequence is %0..., then copy next byte
        GET_BIT
        bcc.s   .code_pair              ; if bit sequence is %10..., then it's a code pair
        moveq   #0,d0                   ; offset = 0 (eor.l d0,d0)
        GET_BIT
        bcc     .short_match            ; if bit sequence is %110..., then it's a short match

; The sequence is %111..., the next 4 bits are the offset (0-15)
        REPT 4
        GET_BIT
        addx.l  d0,d0
        ENDR
        tst.l   d0
        beq.s   .write_byte             ; if offset == 0, then write 0x00
; If offset != 0, then write the byte on destination - offset
        move.l  a1,a2
        suba.l  d0,a2
        move.b  (a2),d0
.write_byte
        move.b  d0,(a1)+
        moveq   #0,d1                   ; Initialize LWM
        bra     .next_sequence          ; Process next sequence

; Code pair %10...
.code_pair
        DECODE_GAMMAL d0                ; Get offset
        cmpi.l  #2,d0                   ; offset == 2?
        bne.s   .normal_code_pair
        tst.l   d1                      ; LWM == 0?
        bne.s   .normal_code_pair
        move.l  d4,d0                   ; offset = old_offset
        DECODE_GAMMAW d2                ; Get length
        subq.w  #1,d2                   ; length--
        DO_COPY
        moveq   #1,d1                   ; LWM = 1
        bra     .next_sequence          ; Process next sequence
.normal_code_pair
        add.l   d1,d0                   ; d1 (LWM) is either 1 or 0
        subq.l  #3,d0                   ; offset -= 3 (net -2 when LWM = 1)
.continue_normal_code_pair
        lsl.l   #8,d0                   ; offset << 8
        move.b  (a0)+,d0                ; get the least significant byte of the offset (16 bits)
        DECODE_GAMMAW d2                ; Get length
        subq.w  #1,d2                   ; length--
.compare_32000
        cmpi.l  #$7D00,d0               ; offset >= 32000?
        blt.s   .compare_1280
        addq.w  #2,d2                   ; length += 2
        bra.s   .end_compares
.compare_1280
        cmpi.l  #$0500,d0               ; offset >= 1280?
        blt.s   .compare_128
        addq.w  #1,d2                   ; length++
        bra.s   .end_compares
.compare_128
        cmpi.l  #$0080,d0               ; offset < 128?
        bge.s   .end_compares
        addq.w  #2,d2                   ; length += 2
.end_compares
        DO_COPY
        move.l  d0,d4                   ; old_offset = offset
        moveq   #1,d1                   ; LWM = 1
        bra     .next_sequence          ; Process next sequence

; Short match %110...
.short_match
        move.b  (a0)+,d0                ; Get offset (7-bit offset + 1 bit to mark a 2- or 3-byte copy)
        lsr.b   #1,d0
        beq.s   .end_decrunch           ; if offset == 0, end of decrunching
        bcs.s   .with_carry
        moveq   #1,d2                   ; length = 2 (DBF count 1)
        bra.s   .continue_short_match
.with_carry
        moveq   #2,d2                   ; length = 3 (DBF count 2)
.continue_short_match
        DO_COPY
        move.l  d0,d4                   ; old_offset = offset
        moveq   #1,d1                   ; LWM = 1
        bra     .next_sequence          ; Process next sequence
.end_decrunch
;       movem.l (a7)+,a0-a2/d0-d5
        rts
07 February 2010, 01:27 | #7 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
Good catch on the ADDX, I had no idea it worked this way. Only MOVEQ #0 and MOVEQ #1 ever change D1, so TST.B or TST.W is fine.
You can reduce it by a further 4 bytes and make it run faster if you preload some constants:
Code:
        moveq   #2,d6
        lea     $7d00.w,a3
        lea     $0500.w,a4
        lea     $0080.w,a5

cmp.l #2,d0      -->  cmp.l d6,d0
cmp.l #$7d00,d0  -->  cmp.l a3,d0
cmp.l #$0500,d0  -->  cmp.l a4,d0
cmp.l #$0080,d0  -->  cmp.l a5,d0

        REPT 4
        GET_BIT
        roxl.l  #1,d0
        ENDR
        tst.l   d0

-->

        REPT 3
        GET_BIT
        addx.l  d0,d0
        ENDR
        GET_BIT
        roxl.l  #1,d0

Last edited by Leffmann; 07 February 2010 at 01:50.
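The last shift has to be ROXL because ADDX only clears Z when the result is non-zero and leaves it unchanged otherwise. If D0 stays zero, Z after four ADDX.L is still whatever GET_BIT's `add.b d3,d3` left behind. ROXL sets Z from its own result. The Python model below is a sketch of just those flag rules, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def addx_l(d: int, x: int, z: bool):
    """ADDX.L Dn,Dn -> (result, Z). Z is cleared if non-zero, else unchanged."""
    result = (d + d + x) & MASK32
    return result, (z if result == 0 else False)


def roxl_l1(d: int, x: int):
    """ROXL.L #1,Dn -> (result, Z). Z is set if the result is zero."""
    result = ((d << 1) | x) & MASK32
    return result, result == 0


def read_4bit_offset(bits, z_from_get_bit, roxl_last: bool):
    """Shift four bits into d0 as in the REPT block.
    z_from_get_bit: Z as left by add.b d3,d3 before each shift (unrelated to d0).
    Returns (d0, Z as seen by beq.s .write_byte)."""
    d0, z = 0, False
    for i, bit in enumerate(bits):
        z = z_from_get_bit[i]
        if roxl_last and i == len(bits) - 1:
            d0, z = roxl_l1(d0, bit)
        else:
            d0, z = addx_l(d0, bit, z)
    return d0, z
```

With four ADDX and a zero offset, the BEQ can be missed. With ROXL last, Z always matches D0, so the TST.L can go.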
07 February 2010, 01:59 | #8 |
Registered User
Join Date: Sep 2004
Location: Brasil
Age: 49
Posts: 181
Sorry, I forgot to preload the constants as you suggested in your first post.
I have put an example (showgfx) in The Zone that uses the decruncher to show a few screens I made with my gfx converter (multiplatform, in Python+Qt), which I hope to publish very soon. The code quality is horrible; I'm "relearning" how to code the Amiga.
07 February 2010, 02:07 | #9 |
Registered User
Join Date: Jun 2008
Location: somewhere else
Posts: 511
Here's my 164-byte version:
Code:
; -------------------------------------------------------------------------------------------------
; Aplib decruncher for MC68000
; by MML 2010
; Size optimized (164 bytes) by Franck "hitchhikr" Charlet.
; -------------------------------------------------------------------------------------------------
; vasmm68k aplib.s68 -o aplib.obj -Fhunk -m68000

DEST    equ     $500000

; -------------------------------------------------------------------------------------------------
start:
        lea     data(pc),a0
        lea     DEST,a1

; -------------------------------------------------------------------------------------------------
; aplib_decrunch: A0 = Source / A1 = Destination
; -------------------------------------------------------------------------------------------------
aplib_decrunch:
;       movem.l a0-a5/d0-d6,-(a7)
        lea     32000.w,a3
        lea     1280.w,a4
        lea     128.w,a5
        moveq   #1,d5                   ; Initialize bits counter
.copy_byte:
        move.b  (a0)+,(a1)+
.next_sequence_init:
        moveq   #0,d1                   ; Initialize LWM
.next_sequence:
        bsr.b   .get_bit
        bcc.b   .copy_byte              ; if bit sequence is %0..., then copy next byte
        bsr.b   .get_bit
        bcc.b   .code_pair              ; if bit sequence is %10..., then it's a code pair
        moveq   #0,d0                   ; offset = 0 (eor.l d0,d0)
        bsr.b   .get_bit
        bcc.b   .short_match            ; if bit sequence is %110..., then it's a short match

; The sequence is %111..., the next 4 bits are the offset (0-15)
        moveq   #4-1,d6
.get_3_bits:
        bsr.b   .get_bit
        roxl.l  #1,d0
        dbf     d6,.get_3_bits          ; (dbcc doesn't modify flags)
        beq.b   .write_byte             ; if offset == 0, then write 0x00
; If offset != 0, then write the byte on destination - offset
        move.l  a1,a2
        suba.l  d0,a2
        move.b  (a2),d0
.write_byte:
        move.b  d0,(a1)+
        bra.b   .next_sequence_init

; Code pair %10...
.code_pair:
        bsr.b   .decode_gamma
        move.l  d2,d0                   ; get the new offset
        subq.l  #2,d0                   ; offset == 2?
        bne.b   .normal_code_pair
        tst.w   d1                      ; LWM == 0?
        bne.b   .normal_code_pair
        move.l  d4,d0                   ; offset = old_offset
        bsr.b   .decode_gamma
        bra.b   .copy_code_pair
.normal_code_pair:
        add.l   d1,d0                   ; (d1 is either 1 or 0)
        subq.l  #1,d0                   ; offset -= 1 (or 0)
        lsl.l   #8,d0                   ; offset << 8
        move.b  (a0)+,d0                ; get the least significant byte of the offset (16 bits)
        bsr.b   .decode_gamma
        cmp.l   a3,d0                   ; >=32000
        blt.b   .compare_1280
        addq.l  #2,d2                   ; length += 2
        bra.b   .continue_short_match
.compare_1280:
        cmp.l   a4,d0                   ; >=1280 <32000
        blt.b   .compare_128
        addq.l  #1,d2                   ; length++
        bra.b   .continue_short_match
.compare_128:
        cmp.l   a5,d0                   ; >=128 <1280
        bge.b   .continue_short_match
        addq.l  #2,d2                   ; length += 2
        bra.b   .continue_short_match

; get_bit: Get bits from the crunched data (D3) and insert the most significant bit in the carry flag.
.get_bit:
        subq.b  #1,d5                   ; D5 = bit counter
        bne.b   .still_bits_left
        moveq   #8,d5
        move.b  (a0)+,d3                ; Read next crunched byte
.still_bits_left:
        add.b   d3,d3                   ; D3.b << 1 (lsl.b #1,d3 or roxl.b #1,d3)
        rts

; decode_gamma: Decode values from the crunched data using gamma code
.decode_gamma:
        moveq   #1,d2
.get_more_gamma:
        bsr.b   .get_bit
        addx.l  d2,d2
        bsr.b   .get_bit
        bcs.b   .get_more_gamma
        rts

; Short match %110...
.short_match:
        moveq   #3,d2                   ; length = 3
        move.b  (a0)+,d0                ; Get offset (7-bit offset + 1 bit to mark a 2- or 3-byte copy)
        lsr.b   #1,d0
        beq.b   .end_decrunch           ; if offset == 0, end of decrunching
        bcs.b   .continue_short_match
        moveq   #2,d2                   ; length = 2
.continue_short_match:
        move.l  d0,d4                   ; old_offset = offset
.copy_code_pair:
        subq.l  #1,d2                   ; length--
        move.l  a1,a2
        suba.l  d0,a2
.loop_do_copy:
        move.b  (a2)+,(a1)+
        dbf     d2,.loop_do_copy
        moveq   #1,d1                   ; LWM = 1
        bra.w   .next_sequence          ; Process next sequence
.end_decrunch:
;       movem.l (a7)+,a0-a5/d0-d6
        rts

data:
        incbin  "hd6:testdat"
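Keep one detail in mind when optimizing the shared copy tail (`.copy_code_pair` / `.loop_do_copy`). The source pointer A2 = A1 - offset can be fewer than `length` bytes behind A1, so the byte copy deliberately re-reads bytes it has just written. That is how a short offset produces a run. Here is a Python sketch of the same loop; `copy_match` is a hypothetical name.

```python
def copy_match(dst: bytearray, offset: int, length: int) -> None:
    """Model of: move.l a1,a2 / suba.l d0,a2 / loop: move.b (a2)+,(a1)+ / dbf d2,loop
    with d2 = length - 1. Copies strictly one byte at a time, front to back."""
    src = len(dst) - offset          # a2 = a1 - d0
    for _ in range(length):
        dst.append(dst[src])         # may read a byte appended earlier in this loop
        src += 1
```

A block copy of the source bytes taken before the loop starts would get overlapping matches like these wrong. Any faster copy routine has to special-case them.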
07 February 2010, 02:17 | #10 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
Turned the macros into subroutines: nice and small. Nothing to do but admit defeat.
07 February 2010, 18:15 | #11 |
Registered User
Join Date: Sep 2004
Location: Brasil
Age: 49
Posts: 181
Well, I was about to sit down and put into practice all the advice from Leffmann and hitchhikr, and when I came here, what did I find???
Woooooooooo, Monsieur Charlet, what a great lesson in size optimization! It's only a byte bigger than the optimized Z80 version, and on 8-bit CPUs the opcodes are usually ONE byte. But the best part is that the code is still easy to understand, and all the changes are perfectly logical. The only suggestion I can think of right now is more logical than useful (same size and cycles): changing the tst.w to tst.b. Thanks for the help; I'm encouraged to improve my 68000.
10 March 2010, 22:59 | #12 |
Registered User
Join Date: Sep 2004
Location: Brasil
Age: 49
Posts: 181
A minor optimization to the decruncher, change:
Code:
        moveq   #1,d5                   ; Initialize bits counter
        . . .
.get_bit:
        subq.b  #1,d5                   ; D5 = bit counter
        bne.b   .still_bits_left
        moveq   #8,d5
to:
Code:
        moveq   #0,d5                   ; Initialize bits counter
        . . .
.get_bit:
        dbra    d5,.still_bits_left
        moveq   #7,d5                   ; D5 = bit counter
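The DBRA version is a drop-in replacement. With D5 starting at 0, the first call falls through (D5 reaches -1) and loads a byte. The next seven calls branch, so a new byte is still fetched every eighth bit. Here is a Python sketch comparing both counters. The function names are hypothetical, and only D3, D5 and A0 are modeled.

```python
def bits_subq(data: bytes, n: int):
    """moveq #1,d5 ... subq.b #1,d5 / bne.b .still_bits_left / moveq #8,d5"""
    d3, d5, pos, out = 0, 1, 0, []
    for _ in range(n):
        d5 = (d5 - 1) & 0xFF
        if d5 == 0:                   # bne.b not taken
            d5 = 8
            d3 = data[pos]
            pos += 1
        d3 <<= 1                      # add.b d3,d3
        out.append(d3 >> 8)
        d3 &= 0xFF
    return out, pos


def bits_dbra(data: bytes, n: int):
    """moveq #0,d5 ... dbra d5,.still_bits_left / moveq #7,d5"""
    d3, d5, pos, out = 0, 0, 0, []
    for _ in range(n):
        d5 = (d5 - 1) & 0xFFFF        # dbra decrements the low word
        if d5 == 0xFFFF:              # reached -1: no branch, reload
            d5 = 7
            d3 = data[pos]
            pos += 1
        d3 <<= 1                      # add.b d3,d3
        out.append(d3 >> 8)
        d3 &= 0xFF
    return out, pos
```

DBRA leaves the condition codes alone, and `add.b d3,d3` still sets C/X afterwards, so the callers' BCC/BCS/ADDX keep working unchanged.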
11 November 2017, 13:43 | #13 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
New stream
A bump for this old thread because there is some news.
aPLib compression has the potential to be the best real-time unpacker for 68k machines, so I tinkered with how to make it better suited to the architecture. I've refactored the bitstream, made some subtle, non-trivial modifications, redesigned the unpacker and put in every code trick I know (you can find some samples somewhere on the board). Another brilliant 68k coder, from another board, is involved in making the compression even better. Preliminary tests are encouraging: we can beat almost anything in compression (even simple LZH-based packers, if the files are relatively small), with real-time unpacking speed (much faster than floppy speed on a standard A500). [The goal is the Pareto frontier for this packer class (pure LZ, quasi-byte-based, the only class attainable in real time on a 68k).] Lately there have been some threads about compression on EAB, so I'll ask other coders for help in trialling the packer against the best available. I'll publish something soon, stay tuned.
Last edited by ross; 14 November 2017 at 22:56.
08 February 2019, 09:17 | #14 |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
08 February 2019, 12:29 | #15 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
08 February 2019, 13:48 | #16 |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682