aPLib decruncher for Amiga

SyX · 06 February 2010, 19:49

Hi!

I am trying to refresh my rusty 68000, so I have converted the aPLib decruncher to our lovely Motorola.

Here is the source, and of course all suggestions to optimize it are welcome.

Code:

; -------------------------------------------------------------------------------------------------
; Aplib decruncher for MC68000
; by MML 2010
; -------------------------------------------------------------------------------------------------

; vasmm68k aplib.s68 -o aplib.obj -Fhunk -m68000

; -------------------------------------------------------------------------------------------------
; MACROS
; -------------------------------------------------------------------------------------------------
; GET_BIT: Get bits from the crunched data (D3) and insert the most significant bit in the carry flag.
GET_BIT     MACRO
    subq.b  #1,d5               ; D5 = bit counter
    beq.s   .need_more_bits\@
    add.b   d3,d3               ; D3.b << 1 (lsl.b #1,d3)
    bra.s   .still_bits_left\@
.need_more_bits\@
    moveq   #8,d5
    move.b  (a0)+,d3            ; Read next crunched byte
    add.b   d3,d3               ; D3.b << 1 (lsl.b #1,d3 or roxl.b #1,d3)
.still_bits_left\@
    ENDM

; DECODE_GAMMA: Decode values from the crunched data using gamma code
DECODE_GAMMA   MACRO
    moveq   #1,\1
.get_more_gamma\@
    GET_BIT
    roxl.l  #1,\1
    GET_BIT
    bcs.s   .get_more_gamma\@
    ENDM                    

; DO_COPY: Copy D2+1 bytes (D2 is a DBF counter) from destination (A1) - offset (D0) to destination
DO_COPY     MACRO
    move.l  a1,a2
    suba.l  d0,a2
.loop_do_copy\@
    move.b  (a2)+,(a1)+
    dbf     d2,.loop_do_copy\@
    ENDM

; -------------------------------------------------------------------------------------------------
; aplib_decrunch: A0 = Source / A1 = Destination
; -------------------------------------------------------------------------------------------------
aplib_decrunch
;    movem.l a0-a2/d0-d5,-(a7)

    moveq   #1,d5           ; Initialize bits counter

.copy_byte
    move.b  (a0)+,(a1)+

    moveq   #0,d1           ; Initialize LWM

.next_sequence
    GET_BIT
    bcc.s   .copy_byte      ; if bit sequence is %0..., then copy next byte

    GET_BIT
    bcc.s   .code_pair      ; if bit sequence is %10..., then is a code pair

    moveq   #0,d0           ; offset = 0 (eor.l d0,d0)
    GET_BIT                 
    bcc     .short_match    ; if bit sequence is %110..., then is a short match

    ; The sequence is %111..., the next 4 bits are the offset (0-15)
    REPT 4
        GET_BIT
        roxl.l  #1,d0
    ENDR

    tst.l   d0
    beq.s   .write_byte     ; if offset == 0, then write 0x00

    ; If offset != 0, then write the byte on destination - offset
    move.l  a1,a2
    suba.l  d0,a2
    move.b  (a2),d0

.write_byte
    move.b  d0,(a1)+
    moveq   #0,d1           ; Initialize LWM
    bra     .next_sequence  ; Process next sequence

; Code pair %10...
.code_pair
    DECODE_GAMMA d0         ; Get offset
    cmpi.l  #2,d0           ; offset == 2?
    bne.s   .normal_code_pair
    tst.l   d1              ; LWM == 0?
    bne.s   .normal_code_pair

    move.l  d4,d0           ; offset = old_offset
    DECODE_GAMMA d2         ; Get length
    subq.l  #1,d2           ; length--
    DO_COPY
    moveq   #1,d1           ; LWM = 1
    bra     .next_sequence  ; Process next sequence

.normal_code_pair
    tst.l   d1              ; LWM == 0?
    bne.s   .lmw_no_0
    subq.l  #3,d0           ; offset -= 3
    bra.s   .continue_normal_code_pair
.lmw_no_0
    subq.l  #2,d0           ; offset -= 2

.continue_normal_code_pair
    lsl.l   #8,d0           ; offset << 8
    move.b  (a0)+,d0        ; get the least significant byte of the offset (16 bits)
    DECODE_GAMMA d2         ; Get length
    subq.l  #1,d2           ; length--

.compare_32000
    cmpi.l  #$7D00,d0       ; offset >= 32000?
    blt.s   .compare_1280
    addq.l  #2,d2           ; length += 2
    bra.s   .end_compares

.compare_1280
    cmpi.l  #$0500,d0       ; offset >= 1280?
    blt.s   .compare_128
    addq.l    #1,d2         ; length++
    bra.s   .end_compares

.compare_128
    cmpi.l  #$0080,d0
    bge.s   .end_compares
    addq.l  #2,d2          ; length += 2

.end_compares
    DO_COPY
    move.l  d0,d4           ; old_offset = offset
    moveq   #1,d1           ; LWM = 1
    bra     .next_sequence  ; Process next sequence

; Short match %110...
.short_match
    move.b  (a0)+,d0        ; Get offset (offset is 7 bits + 1 bit to mark if copy 2 or 3 bytes)

    lsr.b   #1,d0           
    beq.s   .end_decrunch   ; if offset == 0, end of decrunching

    bcs.s   .with_carry
    moveq   #1,d2           ; length = 2 (D2 = length - 1 for DBF)
    bra.s   .continue_short_match
.with_carry
    moveq   #2,d2           ; length = 3 (D2 = length - 1 for DBF)

.continue_short_match
    DO_COPY
    move.l  d0,d4           ; old_offset = offset
    moveq   #1,d1           ; LWM = 1
    bra     .next_sequence  ; Process next sequence

.end_decrunch
;    movem.l (a7)+,a0-a2/d0-d5
    rts
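To check the decruncher against known data on a PC, a high-level model of the same loop can help. Below is a sketch in Python that mirrors the logic of the code above (lazy tag-byte reads like GET_BIT, gamma codes, the LWM flag in D1 and the old offset in D4). It assumes a raw stream with the appack header already removed; the names `BitReader`, `copy_match` and `aplib_depack` are made up for this example and are not part of aPLib.

```python
class BitReader:
    """Reads tag bits MSB first, fetching a new tag byte lazily like GET_BIT."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0      # A0: source pointer
        self.tag = 0      # D3: current tag byte
        self.left = 0     # bits left in the tag byte

    def byte(self) -> int:
        value = self.data[self.pos]
        self.pos += 1
        return value

    def bit(self) -> int:
        if self.left == 0:
            self.tag = self.byte()
            self.left = 8
        self.left -= 1
        bit = self.tag >> 7                 # the bit add.b d3,d3 moves into carry
        self.tag = (self.tag << 1) & 0xFF
        return bit

    def gamma(self) -> int:
        """DECODE_GAMMA: start at 1, shift in data bits while the continue bit is 1."""
        value = 1
        while True:
            value = (value << 1) | self.bit()
            if not self.bit():
                return value


def copy_match(out: bytearray, offset: int, length: int) -> None:
    # Byte by byte, so overlapping copies (offset < length) repeat data like DO_COPY.
    for _ in range(length):
        out.append(out[-offset])


def aplib_depack(src: bytes) -> bytes:
    br = BitReader(src)
    out = bytearray([br.byte()])    # the first byte is always a literal
    lwm = 0                         # D1
    old_offset = 0                  # D4
    while True:
        if br.bit() == 0:           # %0: literal byte
            out.append(br.byte())
            lwm = 0
        elif br.bit() == 0:         # %10: code pair
            offset = br.gamma()
            if offset == 2 and lwm == 0:
                offset = old_offset  # reuse the previous offset
                length = br.gamma()
            else:
                offset = ((offset + lwm - 3) << 8) | br.byte()
                length = br.gamma()
                if offset >= 32000:
                    length += 1
                if offset >= 1280:
                    length += 1
                if offset < 128:
                    length += 2
                old_offset = offset
            copy_match(out, offset, length)
            lwm = 1
        elif br.bit() == 0:         # %110: short match
            value = br.byte()
            offset, length = value >> 1, 2 + (value & 1)
            if offset == 0:         # offset 0 marks the end of the stream
                return bytes(out)
            copy_match(out, offset, length)
            old_offset = offset
            lwm = 1
        else:                       # %111: 4-bit offset, single byte
            offset = 0
            for _ in range(4):
                offset = (offset << 1) | br.bit()
            out.append(out[-offset] if offset else 0)
            lwm = 0
```

For example, `aplib_depack(bytes([0x61, 0x6C, 0x62, 0x05, 0x00]))` decodes a hand-built stream (literal "a", literal "b", a 3-byte short match at offset 2, then the end marker) to `b"ababa"`.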

To crunch files, you can use appack.exe from the "examples/c" folder of http://www.ibsensoftware.com/files/aPLib-1.01.zip (you can also compile it for Linux or other systems, or even make your own custom compressor).

Just remember that if you use appack.exe, you will need to delete the 24-byte header it adds to the beginning of every crunched file.
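If you want to script that step, a minimal sketch in Python follows. It simply drops the first 24 bytes and does not parse or validate the header fields; `strip_appack_header` is a hypothetical helper name.

```python
APPACK_HEADER_SIZE = 24  # size of the header appack.exe prepends, per the note above


def strip_appack_header(packed: bytes) -> bytes:
    """Return the raw aPLib stream that aplib_decrunch expects (header removed)."""
    if len(packed) <= APPACK_HEADER_SIZE:
        raise ValueError("file is too short to contain a 24-byte header plus data")
    return packed[APPACK_HEADER_SIZE:]
```

For example, read the appack output with `open(path, "rb").read()`, pass it through `strip_appack_header`, and write the result to the file you `incbin` for the decruncher test.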

Leffmann · 06 February 2010, 20:41

That's one short decruncher

The only optimization I can see is to preload the 32-bit constants into unused registers.

hitchhikr · 06 February 2010, 22:23

Code:

roxl.l  #1,dx  -->  addx.l dx,dx

Code:

GET_BIT     MACRO
    subq.b  #1,d5               ; D5 = bit counter
    bne.s   .still_bits_left\@
    moveq   #8,d5
    move.b  (a0)+,d3            ; Read next crunched byte
.still_bits_left\@
    add.b   d3,d3               ; D3.b << 1 (lsl.b #1,d3 or roxl.b #1,d3)
    ENDM

Leffmann · 06 February 2010, 23:16

You can also remove the TST.L D0 after the REPT-ENDR sequence, the two TST.L D1 can be reduced in size, and I think the operations on D2 can be reduced to word size, since its only other use is with a DBF.

hitchhikr · 06 February 2010, 23:32

Quote:

You can also remove the TST.L D0 after the REPT-ENDR sequence

Only if addx isn't used, as it doesn't always set the Z flag (addx only clears Z when the result is non-zero; otherwise Z is left unchanged).
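A flag-level model of the %111 path shows why the TST.L has to stay once ROXL is replaced by ADDX. This Python sketch models only the X and Z flags involved, and the function names are invented for the example. With a tag byte like $0F, the four bits read are all zero, so D0 ends up 0, but each add.b d3,d3 in GET_BIT leaves Z clear and ADDX never sets it, so a following beq would not branch.

```python
def add_b(d3: int):
    """add.b d3,d3 as used in GET_BIT: returns (result, carry/X, Z)."""
    result = (d3 << 1) & 0xFF
    carry = (d3 >> 7) & 1
    return result, carry, result == 0


def addx_l(d0: int, x: int, z: bool):
    """addx.l d0,d0: Z is only cleared on a non-zero result, never set."""
    result = (2 * d0 + x) & 0xFFFFFFFF
    return result, z and result == 0


def roxl_l(d0: int, x: int):
    """roxl.l #1,d0: Z is set or cleared from the result."""
    result = ((d0 << 1) | x) & 0xFFFFFFFF
    return result, result == 0


def read_4_bits(tag: int, use_addx: bool):
    """Shift 4 tag bits into d0; return (d0, Z flag seen by a following beq)."""
    d0, z = 0, False
    for _ in range(4):
        tag, x, z = add_b(tag)          # GET_BIT leaves Z from the shifted tag byte
        if use_addx:
            d0, z = addx_l(d0, x, z)
        else:
            d0, z = roxl_l(d0, x)
    return d0, z
```

With `read_4_bits(0x0F, use_addx=False)` the result is `(0, True)`, while the ADDX version returns `(0, False)`: same value in D0, but the Z flag no longer reflects it.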

Quote:

and the two TST.L D1 can be reduced in size,

Probably not, as the d1 tests depend on the d0 state (maybe it could be completely rearranged).

Also:

Code:

.normal_code_pair
    add.l  d1,d0            ; d1 is either 1 or 0
    subq.l #3,d0            ; offset -= 3
.continue_normal_code_pair
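This only works because D1 is always exactly 0 or 1. A quick exhaustive check in Python (the function names are illustrative) confirms that the branch-free form gives the same result as the two original branches for every gamma value the decoder can produce, which is always at least 2.

```python
def adjusted_high_offset(gamma: int, lwm: int) -> int:
    """Branch-free form: offs + LWM - 3 (LWM is 0 or 1)."""
    return gamma + lwm - 3


def adjusted_high_offset_branchy(gamma: int, lwm: int) -> int:
    """Original branches: offs -= 3 when LWM == 0, else offs -= 2."""
    return gamma - 3 if lwm == 0 else gamma - 2


# The two forms agree for every gamma value the decoder can see (gamma >= 2).
for gamma in range(2, 1000):
    for lwm in (0, 1):
        assert adjusted_high_offset(gamma, lwm) == adjusted_high_offset_branchy(gamma, lwm)
```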

SyX · 07 February 2010, 01:25

I have applied all your optimizations, hitchhikr (GET_BIT and .normal_code_pair just blew me away), and ADDX instead of ROXL is great!!!

Quote:

You can also remove the TST.L D0 after the REPT-ENDR sequence

Only if addx isn't used as it doesn't always set the z flag.

Yes, after switching to ADDX it's better to keep the TST.

Quote:

and the two TST.L D1 can be reduced in size,

Probably not, as the d1 tests depend on the d0 state (maybe it could be completely rearranged).

Well, D1 (LWM in the original C decruncher source) is really a flag; the only values it can have are 0 and 1. I agree that it would be better to rearrange it, for example by keeping it in the upper word of another register and using SWAP.

But Leffmann, I thought TST takes the same time for .L, .W and .B; I'm using the timings listed in "MC680x0 Reference 1.1, ©April,May 1995 by Flint/DARKNESS".

Quote:

and I think operations on D2 can be reduced to word size since its only other use is with a DBF.

Yes, of course!!! I have made two DECODE_GAMMA macros: one for the "length" (D2) that uses word size, and another for the "offset", which can need a long.

And the version with all these optimizations is behind door number 1:

Code:

; -------------------------------------------------------------------------------------------------
; Aplib decruncher for MC68000
; by MML 2010
; Optimized by hitchhikr and Leffmann
; -------------------------------------------------------------------------------------------------

; vasmm68k aplib.s68 -o aplib.obj -Fbin -m68000

; -------------------------------------------------------------------------------------------------
; MACROS
; -------------------------------------------------------------------------------------------------
; GET_BIT: Get bits from the crunched data (D3) and insert the most significant bit in the carry flag.
GET_BIT     MACRO
    subq.b  #1,d5               ; D5 = bit counter
    bne.s   .still_bits_left\@
    moveq   #8,d5
    move.b  (a0)+,d3            ; Read next crunched byte
.still_bits_left\@
    add.b   d3,d3               ; D3.b << 1
    ENDM

; DECODE_GAMMA: Decode values from the crunched data using gamma code
; Long Version
DECODE_GAMMAL MACRO
    moveq   #1,\1
.get_more_gamma\@
    GET_BIT
    addx.l  \1,\1
    GET_BIT
    bcs.s   .get_more_gamma\@
    ENDM                    

; Word Version
DECODE_GAMMAW   MACRO
    moveq   #1,\1
.get_more_gamma\@
    GET_BIT
    addx.w  \1,\1
    GET_BIT
    bcs.s   .get_more_gamma\@
    ENDM                    

; DO_COPY: Copy D2+1 bytes (D2 is a DBF counter) from destination (A1) - offset (D0) to destination
DO_COPY     MACRO
    move.l  a1,a2
    suba.l  d0,a2
.loop_do_copy\@
    move.b  (a2)+,(a1)+
    dbf     d2,.loop_do_copy\@
    ENDM

; -------------------------------------------------------------------------------------------------
; aplib_decrunch: A0 = Source / A1 = Destination
; -------------------------------------------------------------------------------------------------
aplib_decrunch
;    movem.l a0-a2/d0-d5,-(a7)

    moveq   #1,d5           ; Initialize bits counter

.copy_byte
    move.b  (a0)+,(a1)+

    moveq   #0,d1           ; Initialize LWM

.next_sequence
    GET_BIT
    bcc.s   .copy_byte      ; if bit sequence is %0..., then copy next byte

    GET_BIT
    bcc.s   .code_pair      ; if bit sequence is %10..., then is a code pair

    moveq   #0,d0           ; offset = 0 (eor.l d0,d0)
    GET_BIT                 
    bcc     .short_match    ; if bit sequence is %110..., then is a short match

    ; The sequence is %111..., the next 4 bits are the offset (0-15)
    REPT 4
        GET_BIT
        addx.l d0,d0
    ENDR

    tst.l   d0
    beq.s   .write_byte     ; if offset == 0, then write 0x00

    ; If offset != 0, then write the byte on destination - offset
    move.l  a1,a2
    suba.l  d0,a2
    move.b  (a2),d0

.write_byte
    move.b  d0,(a1)+
    moveq   #0,d1           ; Initialize LWM
    bra     .next_sequence  ; Process next sequence

; Code pair %10...
.code_pair
    DECODE_GAMMAL d0        ; Get offset
    cmpi.l  #2,d0           ; offset == 2?
    bne.s   .normal_code_pair
    tst.l   d1              ; LWM == 0?
    bne.s   .normal_code_pair

    move.l  d4,d0           ; offset = old_offset
    DECODE_GAMMAW d2        ; Get length
    subq.w  #1,d2           ; length--
    DO_COPY
    moveq   #1,d1           ; LWM = 1
    bra     .next_sequence  ; Process next sequence

.normal_code_pair
    add.l  d1,d0            ; d1 is either 1 or 0
    subq.l #3,d0            ; offset -= 3

.continue_normal_code_pair
    lsl.l   #8,d0           ; offset << 8
    move.b  (a0)+,d0        ; get the least significant byte of the offset (16 bits)
    DECODE_GAMMAW d2        ; Get length
    subq.w  #1,d2           ; length--

.compare_32000
    cmpi.l  #$7D00,d0       ; offset >= 32000?
    blt.s   .compare_1280
    addq.w  #2,d2           ; length += 2
    bra.s   .end_compares

.compare_1280
    cmpi.l  #$0500,d0       ; offset >= 1280?
    blt.s   .compare_128
    addq.w  #1,d2           ; length++
    bra.s   .end_compares

.compare_128
    cmpi.l  #$0080,d0
    bge.s   .end_compares
    addq.w  #2,d2          ; length += 2

.end_compares
    DO_COPY
    move.l  d0,d4           ; old_offset = offset
    moveq   #1,d1           ; LWM = 1
    bra     .next_sequence  ; Process next sequence

; Short match %110...
.short_match
    move.b  (a0)+,d0        ; Get offset (offset is 7 bits + 1 bit to mark if copy 2 or 3 bytes)

    lsr.b   #1,d0           
    beq.s   .end_decrunch   ; if offset == 0, end of decrunching

    bcs.s   .with_carry
    moveq   #1,d2           ; length = 2 (D2 = length - 1 for DBF)
    bra.s   .continue_short_match
.with_carry
    moveq   #2,d2           ; length = 3 (D2 = length - 1 for DBF)

.continue_short_match
    DO_COPY
    move.l  d0,d4           ; old_offset = offset
    moveq   #1,d1           ; LWM = 1
    bra     .next_sequence  ; Process next sequence

.end_decrunch
;    movem.l (a7)+,a0-a2/d0-d5
    rts

With all these modifications we have saved 60 bytes (it's only 316 bytes now). GREAT!!! Thanks to you both!!!

Leffmann · 07 February 2010, 01:27

Good catch on the ADDX, had no idea it worked this way. There are only MOVEQ #0 and #1 changing D1, so TST.B or .W is fine.

You can reduce it by a further 4 bytes and make it run faster if you preload some constants:

Code:

moveq  #2, d6
lea    $7d00.w, a3
lea    $0500.w, a4
lea    $0080.w, a5 

cmp.l  #2, d0      -->  cmp.l  d6, d0
cmp.l  #$7d00, d0  -->  cmp.l  a3, d0
cmp.l  #$0500, d0  -->  cmp.l  a4, d0
cmp.l  #$0080, d0  -->  cmp.l  a5, d0



REPT 4
GET_BIT
roxl.l  #1, d0
ENDR
tst.l   d0

-->

REPT      3
GET_BIT
addx.l  d0, d0
ENDR
GET_BIT
roxl.l  #1, d0

Last edited by Leffmann; 07 February 2010 at 01:50.
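A rough byte count shows where Leffmann's 4 bytes come from. This assumes the standard 68000 encodings noted in the comments and that the assembler does not already shorten any of the original instructions; the constant names are made up for the sketch.

```python
# Encoded sizes in bytes on the 68000 (opcode word plus extension words).
CMPI_L_IMM_DN = 6   # cmpi.l #imm32,Dn: opcode + 32-bit immediate
CMP_L_REG_DN = 2    # cmp.l Dn,Dn or cmp.l An,Dn: opcode only
LEA_ABS_W_AN = 4    # lea $xxxx.w,An: opcode + 16-bit absolute address
MOVEQ = 2           # moveq #imm8,Dn
TST_L_DN = 2        # tst.l Dn

compares = 4                                     # #2, #$7d00, #$0500, #$0080
saved_by_registers = compares * (CMPI_L_IMM_DN - CMP_L_REG_DN)
cost_of_preload = MOVEQ + 3 * LEA_ABS_W_AN       # moveq #2,d6 + three lea
saved_by_dropping_tst = TST_L_DN                 # addx.l and roxl.l #1 are both 2 bytes

net_saving = saved_by_registers - cost_of_preload + saved_by_dropping_tst
print(net_saving)  # prints 4
```

The constants in registers also avoid fetching a 32-bit immediate on every compare, which is where the speed gain comes from.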

SyX · 07 February 2010, 01:59

Sorry, I forgot to preload the constants you mentioned in your first post.

I have put an example (showgfx) in The Zone that uses the decruncher to show a few screens I made with my gfx converter (multiplatform, in Python+Qt), which I hope to publish very soon. The code quality is horrible; I'm "relearning" how to code the Amiga.

hitchhikr · 07 February 2010, 02:07

Here's my 164-byte version:

Code:

; -------------------------------------------------------------------------------------------------
; Aplib decruncher for MC68000
; by MML 2010
; Size optimized (164 bytes) by Franck "hitchhikr" Charlet.
; -------------------------------------------------------------------------------------------------

; vasmm68k aplib.s68 -o aplib.obj -Fhunk -m68000

DEST                    equ     $500000

; -------------------------------------------------------------------------------------------------
start:                  lea     data(pc),a0
                        lea     DEST,a1

; -------------------------------------------------------------------------------------------------
; aplib_decrunch: A0 = Source / A1 = Destination
; -------------------------------------------------------------------------------------------------
aplib_decrunch:;        movem.l a0-a5/d0-d6,-(a7)
                        lea     32000.w,a3
                        lea     1280.w,a4
                        lea     128.w,a5
                        moveq   #1,d5           ; Initialize bits counter
.copy_byte:             move.b  (a0)+,(a1)+
.next_sequence_init:    moveq   #0,d1           ; Initialize LWM
.next_sequence:         bsr.b   .get_bit
                        bcc.b   .copy_byte      ; if bit sequence is %0..., then copy next byte
                        bsr.b   .get_bit
                        bcc.b   .code_pair      ; if bit sequence is %10..., then is a code pair
                        moveq   #0,d0           ; offset = 0 (eor.l d0,d0)
                        bsr.b   .get_bit
                        bcc.b   .short_match    ; if bit sequence is %110..., then is a short match

                        ; The sequence is %111..., the next 4 bits are the offset (0-15)
                        moveq   #4-1,d6
.get_4_bits:            bsr.b   .get_bit
                        roxl.l  #1,d0
                        dbf     d6,.get_4_bits  ; (dbcc doesn't modify flags)
                        beq.b   .write_byte     ; if offset == 0, then write 0x00

                        ; If offset != 0, then write the byte on destination - offset
                        move.l  a1,a2
                        suba.l  d0,a2
                        move.b  (a2),d0
.write_byte:            move.b  d0,(a1)+
                        bra.b   .next_sequence_init
; Code pair %10...
.code_pair:             bsr.b   .decode_gamma
                        move.l  d2,d0           ; get the new offset
                        subq.l  #2,d0           ; offset == 2?
                        bne.b   .normal_code_pair
                        tst.w   d1              ; LWM == 0?
                        bne.b   .normal_code_pair
                        move.l  d4,d0           ; offset = old_offset
                        bsr.b   .decode_gamma
                        bra.b   .copy_code_pair
.normal_code_pair:      add.l   d1,d0           ; (d1 is either 1 or 0)
                        subq.l  #1,d0           ; offset -= 1 (or 0)
                        lsl.l   #8,d0           ; offset << 8
                        move.b  (a0)+,d0        ; get the least significant byte of the offset (16 bits)
                        bsr.b   .decode_gamma
                        cmp.l   a3,d0           ; >=32000
                        blt.b   .compare_1280
                        addq.l  #2,d2           ; length += 2
                        bra.b   .continue_short_match
.compare_1280:          cmp.l   a4,d0           ; >=1280 <32000
                        blt.b   .compare_128
                        addq.l  #1,d2           ; length++
                        bra.b   .continue_short_match
.compare_128:           cmp.l   a5,d0           ; >=128 <1280
                        bge.b   .continue_short_match
                        addq.l  #2,d2          ; length += 2
                        bra.b   .continue_short_match

; get_bit: Get bits from the crunched data (D3) and insert the most significant bit in the carry flag.
.get_bit:               subq.b  #1,d5           ; D5 = bit counter
                        bne.b   .still_bits_left
                        moveq   #8,d5
                        move.b  (a0)+,d3        ; Read next crunched byte
.still_bits_left:       add.b   d3,d3           ; D3.b << 1 (lsl.b #1,d3 or roxl.b #1,d3)
                        rts

; decode_gamma: Decode values from the crunched data using gamma code
.decode_gamma:          moveq   #1,d2
.get_more_gamma:        bsr.b   .get_bit
                        addx.l  d2,d2
                        bsr.b   .get_bit
                        bcs.b   .get_more_gamma
                        rts

; Short match %110...
.short_match:           moveq   #3,d2           ; length = 3
                        move.b  (a0)+,d0        ; Get offset (offset is 7 bits + 1 bit to mark if copy 2 or 3 bytes)
                        lsr.b   #1,d0           
                        beq.b   .end_decrunch   ; if offset == 0, end of decrunching
                        bcs.b   .continue_short_match
                        moveq   #2,d2           ; length = 2
.continue_short_match:  move.l  d0,d4           ; old_offset = offset
.copy_code_pair:        subq.l  #1,d2           ; length--
                        move.l  a1,a2
                        suba.l  d0,a2
.loop_do_copy:          move.b  (a2)+,(a1)+
                        dbf     d2,.loop_do_copy
                        moveq   #1,d1           ; LWM = 1
                        bra.w   .next_sequence  ; Process next sequence

.end_decrunch:;         movem.l (a7)+,a0-a5/d0-d6
                        rts

data:                   incbin  "hd6:testdat"

Leffmann · 07 February 2010, 02:17

Dropped the macros into subroutines.

Nice and small; nothing to do but admit defeat.

SyX · 07 February 2010, 18:15

Well, I was about to sit down and put all of Leffmann's and hitchhikr's advice into practice, and when I came here, what did I find???

Woooooooooo

Monsieur Charlet, what a great lesson in size optimization! It's only one byte bigger than the optimized Z80 version, and on 8-bit CPUs the opcodes are usually ONE byte.

But the best part is that the code is still easy to understand, and all the changes are perfectly logical.

The only suggestion I can think of right now, more logical than useful (same size and cycles), would be changing the tst.w to tst.b.

Thanks for the help; I'm encouraged to improve my 68000.

SyX · 10 March 2010, 22:59

A minor optimization to the decruncher; change:

Code:

                        moveq   #1,d5           ; Initialize bits counter
.
.
.
.get_bit:               subq.b  #1,d5           ; D5 = bit counter
                        bne.b   .still_bits_left
                        moveq   #8,d5

To:

Code:

                        moveq   #0,d5           ; Initialize bits counter
.
.
.
.get_bit:               dbra    d5,.still_bits_left
                        moveq   #7,d5           ; D5 = bit counter
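Both versions fetch a new tag byte on the first call and then on every eighth call. The Python model below (function names are hypothetical; it models only the D5 counter, not the tag byte itself) records on which GET_BIT calls each variant reloads D3, so the two can be compared. DBRA only decrements the low word of D5 and falls through when it reaches -1, which is why the counter starts at 0 and is reloaded with 7 instead of 8.

```python
def subq_bne_reloads(calls: int) -> list:
    """Old GET_BIT: d5 starts at 1; subq.b #1,d5 / bne, reload with moveq #8."""
    d5, reloads = 1, []
    for i in range(calls):
        d5 = (d5 - 1) & 0xFF
        if d5 == 0:
            d5 = 8
            reloads.append(i)        # a new tag byte is read on this call
    return reloads


def dbra_reloads(calls: int) -> list:
    """New GET_BIT: d5 starts at 0; dbra falls through at -1, reload with moveq #7."""
    d5, reloads = 0, []
    for i in range(calls):
        d5 = (d5 - 1) & 0xFFFF       # dbra decrements the low word only
        if d5 == 0xFFFF:             # counter expired: no branch taken
            d5 = 7
            reloads.append(i)        # a new tag byte is read on this call
    return reloads
```

Both return `[0, 8, 16, ...]`, so the bit stream is consumed identically; the change just replaces the SUBQ/BNE pair with a single DBRA.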

ross · 11 November 2017, 13:43

A bump for this old thread because there is some news.

aPLib compression has the potential to be the best real-time unpacker for 68k machines.
So I tinkered with how to make it better suited to the architecture.

I've refactored the bitstream, made some subtle, non-trivial modifications, redesigned the unpacker and put in all the code tricks I know (you can find some samples of them elsewhere on the board).

Another brilliant 68k coder, from another board, is involved in making the compression even better.

Preliminary tests are encouraging: we can almost always beat anything in compression (even simple LZH-based packers, if the files are relatively small), with real-time unpacking speed (much faster than floppy speed on a standard A500).
[The goal is the Pareto frontier for this packer class (pure LZ, quasi-byte based, the only class attainable in real time on a 68k).]

There have been some threads about compression on EAB lately, so I'll ask other coders for help testing the packer against the best available.

Soon I'll publish something; stay tuned.

Last edited by ross; 14 November 2017 at 22:56.

Keir · 08 February 2019, 09:17

Quote:

Originally Posted by ross

A bump for this old thread because there is some news.

Was there news yet? I would be excited to try this in place of inflate in my exe packer.

ross · 08 February 2019, 12:29

Quote:

Originally Posted by kaffer

Was there news yet? I would be excited to try this in place of inflate in my exe packer.

Like other projects of mine: "paused"

Keir · 08 February 2019, 13:48

Quote:

Originally Posted by ross

Like other projects of mine: "paused"

I'll keep my eyes peeled; it sounds very interesting.
