![]() |
![]() |
![]() |
#501 |
Defendit numerus
![]() Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 3,130
|
|
![]() |
![]() |
#502 |
Computer Nerd
![]() Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 44
Posts: 3,205
|
I'd much rather optimize code for speed.
|
![]() |
![]() |
#503 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 48
Posts: 4,146
|
Quote:
![]() Speed depends on the cpu you're running the code. Code size does not. 90% of a program doesn't need optimising for speed, only a handful routines do -- but code size matters everywhere. This is why I like short code more than fast code - and for fast code one could just get a faster cpu... |
|
![]() |
![]() |
#504 |
Computer Nerd
![]() Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 44
Posts: 3,205
|
Code that doesn't require speed should be written for clarity (unless it bloats the code a lot). A good example would be using multiplies instead of shifts and adds.
Now you're talking like Microsoft ![]() ![]() When software is written properly, code size doesn't matter one bit, because you're not bloating it with code you don't need. |
![]() |
![]() |
#505 | |||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 48
Posts: 4,146
|
Quote:
Quote:
For example, you might be happy with some self-modifying code on 68000 and then you discover it no longer works on another cpu. Or you replaced muls by shifts and adds and it becomes slower on machines with a fast multiply. You speak about my picture viewer but if you examine its code you won't see huge unrolled loops or bad quality trades for speed. Quote:
That said, i would indeed not waste readability for code that's just slightly shorter (or slightly faster, as said above). |
|||
![]() |
![]() |
#506 | |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
Efficiently created processor designs (with efficient ISAs) should try to make the smallest code the fastest code too. The most efficient places to employ extra hardware in a CPU are places which improve code density. These resources often at least partially pay for themselves. A good example is the innate code compression in the 68k where the advantages of smaller code easily outweigh the small latency and extra transistors used in decoding. Extra hardware could be added to reduce the advantages of loop unrolling, function inlining and need for wasted alignment padding and this would be partially paid for if it reduces code size. In contrast, trying to make big code fast is like sailing into the wind which destroyed many a sail of RISC ships and has sunk several. The least code dense Alpha and PA-RISC are sunk and the next least MIPS, SPARC and PPC are dead in the water with their crews bailing water. |
|
![]() |
![]() |
#507 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 48
Posts: 4,146
|
Anyway here's the 32 byte version I told about :
Code:
move.l d3,d0 moveq #16,d1 asl.l d1,d0 bvs.s .ns addq.b #8,d1 asl.l #8,d0 bvs.s .ns addq.b #2,d1 .ns move.l d3,d0 lsr.l #8,d0 bne.s .nu0 addq.b #1,d1 .nu0 lsr.l #8,d0 bne.s .nu1 addq.b #4,d1 .nu1 rts ![]() Quote:
But the 68k isn't the best thing we could have. By reencoding it and adding a few key features we could beat it by at least 10% (got 18% for the above example). |
|
![]() |
![]() |
#508 |
Defendit numerus
![]() Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 3,130
|
30 byte (28 if flag not in low bit *):
Code:
; d0.l input value ; d1 output flag moveq #1,d2 swap d2 .l move.l d0,d3 sub.l d2,d3 addx d1,d1 add.l d2,d3 move.l d2,d4 lsr.l #1,d4 add.l d4,d3 sub.l d2,d3 addx d1,d1 lsr.l #8,d2 bne.s .l lsr.b #2,d1 * rts ![]() |
![]() |
![]() |
#509 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 48
Posts: 4,146
|
Too bad it's much slower, but still great
![]() However, how could you miss this : Code:
moveq #1,d2 swap d2 .l move.l d0,d3 sub.l d2,d3 addx.l d1,d1 move.l d2,d3 lsr.l #1,d3 add.l d0,d3 sub.l d2,d3 addx.l d1,d1 lsr.l #8,d2 bne.s .l rts ![]() |
![]() |
![]() |
#510 | |
Defendit numerus
![]() Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 3,130
|
Quote:
![]() ![]() What i make with d4 is absurd ![]() This is really funny, i create a very alternative 'think mode' and losing the simple trick (same as CRC Boot..). Bye, ![]() |
|
![]() |
![]() |
#511 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 48
Posts: 4,146
|
|
![]() |
![]() |
#512 |
Defendit numerus
![]() Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 3,130
|
[OT]
scusate ma non resisto (two is megl' che uan): [ Show youtube player ] il mio inglese maccheronico e quello che ha scritto meynaf me l'ha fatto venire in mente ![]() [/OF] sorry, some italian humor about 'two is better than one' joint with my spaghetti english ![]() |
![]() |
![]() |
#513 |
Defendit numerus
![]() Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 3,130
|
|
![]() |
![]() |
#514 | |||
Computer Nerd
![]() Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 44
Posts: 3,205
|
Yes, but probably not always (can't think of an example, though
![]() Quote:
Quote:
Quote:
Anyway, the way I interpreted what you wrote made me fear the worst ![]() |
|||
![]() |
![]() |
#515 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 48
Posts: 4,146
|
Quote:
Quote:
Look at all these great demos of the past. How many of them still run fine on accelerated machines ? Quote:
Quote:
![]() Perhaps I shouldn't have added this last remark, but if a cpu is underpowered for a task then there is little we can do about it aside of getting a faster one... |
||||
![]() |
![]() |
#516 | |
Registered User
![]() Join Date: Feb 2017
Location: Denmark
Posts: 78
|
Quote:
Here's my efforts (without tweaking the encoder / hurting performance too much). Byte-oriented like your approach with the mode (control) bits kept in a separate byte. Format of encoded data: 8 control bits ($00 = special) followed by 8 times either a literal (if control bit is 0) or a 16-bit match (11 bits of offset, 5 bits for length). $00 control byte followed by $00 = 8 literals, otherwise the next byte is -8-symbols_in_this_block (i.e. -1 -> 7 lz symbols follow), followed by the control bits for the last symbols. This leaves 1..$f7 unused for now. I'm not too satisfied about the special cases and there's lots of instructions from my current 58 to something worth bragging about ![]() Main decoding routine: Code:
LENBITS=5 WINBITS=11 ; a0 = output ; a1 = input ; a2 = input end unpack_getcontrolbits: cmp.l a1, a2 ; are we done? bne.s unpack ; no go to main routine rts ; done - return unpack: ; MAIN ENTRY POINT moveq #8, d3 ; d3 = remaining control bits move.b (a1)+, d0 ; d0 = control bits bne.s .loop ; not special case move.b (a1)+, d0 ; grab next byte in stream beq.s .loop ; the control bits just happened to be zero add.b d0, d3 ; d3 = 8+byte read (=num control bits) move.b (a1)+, d0 ; load control bits .loop: subq.b #1, d3 ; we consumed a control bit bmi.s unpack_getcontrolbits ; need to refill control byte? add.b d0, d0 ; grab control bit bcs.s .handlematch ; copy from window? move.b (a1)+, (a0)+ ; copy literal bra.s .loop ; and continue with unpacking .handlematch: move.b (a1)+, -(a7) ; grab 8 MSB of offset move.w (a7)+, d1 ; to upper byte of d1.w move.b (a1)+, d1 ; lower bits moveq #(1<<LENBITS)-1, d2 ; length mask and.b d1, d2 ; d2 = length addq.b #2, d2 ; d2 = length + min_match_length(3) - 1 lsr.w #LENBITS, d1 ; shift offset into place not.w d1 ; d1 = -1-offset (-1..-(1<<WINBITS)) .copy: move.b (a0,d1.w), (a0)+ ; copy match dbf d2, .copy bra.s .loop |
|
![]() |
![]() |
#517 | |
Defendit numerus
![]() Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 3,130
|
![]() Quote:
The format, like Thorham suggested, is made for a totally byte oriented approach. But with a big problem (for pure 68k): offset in 16bit. So You need some trick or simply two separate stream for byte/word token (blazing fast, but no more in place unpacking). This is the code (not rechecked, maybe later): Code:
ls: moveq #8,d2 move.b (a0)+,d1 la: subq.w #1,d2 bmi.s ls move.b (a0)+,d0 beq.s exit movea.l a0,a2 add.b d1,d1 lb: bcc.b ll lz: move.b (a0)+,-(sp) move.w (sp)+,d3 move.b (a0)+,d3 suba.w d3,a2 ll: move.b (a2)+,(a1)+ subq.b #1,d0 bne.b ll bra.b la exit: rts Bye! ![]() EDIT: two stream version, even more compact (28 byte) ![]() Code:
ls: moveq #8,d2 move.b (a0)+,d1 la: subq.w #1,d2 bmi.s ls move.b (a0)+,d0 beq.s exit movea.l a0,a2 add.b d1,d1 lb: bcc.b ll lz: suba.w (a3)+,a2 ll: move.b (a2)+,(a1)+ subq.b #1,d0 bne.b ll bra.b la exit: Last edited by ross; 24 March 2017 at 23:16. Reason: two stream version, uncorrect code, see later post |
|
![]() |
![]() |
#518 | |
Registered User
![]() Join Date: Feb 2017
Location: Denmark
Posts: 78
|
Quote:
![]() Here's the code with my comments. Did I understand the code right? If yes, I think I'll have to make some adaptions to get it working in my scenario: I want to copy from the uncompressed area instead of from the compressed stream, but overall lots of new information for me to look at! Code:
ls: moveq #8,d2 ; d2 = number of control bits move.b (a0)+,d1 ; d1 = control byte la: subq.w #1,d2 ; we'll consume one control bit bmi.s ls ; need to refill? move.b (a0)+,d0 ; d0 = match length/literal count beq.s exit ; zero conveniently means exit! movea.l a0,a2 ; a2 = offset of next src byte add.b d1,d1 ; get control bit lb bcc.b ll ; literal? lz: move.b (a0)+,-(sp) ; use stack trick to get... move.w (sp)+,d3 ; upper byte to d3.w move.b (a0)+,d3 ; lower byte to d3.w suba.w d3,a2 ; offset by -d3 (note: copy from compressed stream) ll: move.b (a2)+,(a1)+ ; copy match/literal subq.b #1,d0 ; one more byte done bne.b ll ; continue with copy bra.b la ; process next control bit exit: rts I'll have to study this for more nice tricks! Once again thanks a lot ![]() |
|
![]() |
![]() |
#519 | ||
Defendit numerus
![]() Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 3,130
|
Quote:
![]() Right code: Code:
ls: moveq #8,d2 ; d2 = number of control bits move.b (a0)+,d1 ; d1 = control byte la: subq.w #1,d2 ; we'll consume one control bit bmi.s ls ; need to refill? move.b (a0)+,d0 ; d0 = match length/literal count beq.s exit ; zero conveniently means exit! movea.l a0,a2 ; a2 = prepare for literal copy add.b d1,d1 ; get control bit lb bcc.b ll ; literal? lz: movea.l a1,a2 ; a2 = prepare for match copy move.b (a0)+,-(sp) ; use stack trick to get... move.w (sp)+,d3 ; upper byte to d3.w move.b (a0)+,d3 ; lower byte to d3.w suba.w d3,a2 ; offset by -d3 (note: copy from uncompressed stream) ll: move.b (a2)+,(a1)+ ; copy match/literal subq.b #1,d0 ; one more byte done bne.b ll ; continue with copy bra.b la ; process next control bit exit: rts Quote:
A separate stream in a3 give the offset. So: a0=first stream (byte token) a3=second stream (word token, 15bit negative offset) Code:
ls: moveq #8,d2 move.b (a0)+,d1 la: subq.w #1,d2 bmi.s ls move.b (a0)+,d0 beq.s exit movea.l a0,a2 add.b d1,d1 lb: bcc.b ll lz: movea.l a1,a2 [same mistake :p] suba.w (a3)+,a2 ll: move.b (a2)+,(a1)+ subq.b #1,d0 bne.b ll bra.b la exit: ![]() Last edited by ross; 24 March 2017 at 23:34. Reason: spelling |
||
![]() |
![]() |
#520 |
Registered User
![]() Join Date: Feb 2017
Location: Denmark
Posts: 78
|
After spending way too much time on this silly project, I think I'm finally done creating odd LZ-variants
![]() Playing around with various parameters, I've settled on the following word oriented format: 16-bits of control/mode bits (MSB->LSB, 1=literal 0=match). Literals are 16-bit and matches are coded with 11-bits for the offset, 5 bits for length-1. A zero match word signals end of data). 42-byte inner loop can't quite match the best of Ross', but the extra instructions gave improvements on my test data. Of course adding more instructions gave better results, but this is what I ended up with. Code:
; a0 = output ; a1 = input (compressed data) ; trashes d0-d3/a0-a2 unpack: .refill: moveq #16, d1 ; d1 = number of control bits move.w (a1)+, d0 ; d0 = control bits .mainloop: subq.w #1, d1 ; consume a control bit bmi.s .refill ; need to refill? add.w d0, d0 ; extract next control bit bcc.s .copymatch ; match? move.w (a1)+, (a0)+ ; copy literal bra.s .mainloop ; one more loop .copymatch: move.w (a1)+, d2 ; get match length and offset beq.s .quit ; done? moveq #(1<<lenbits)-1, d3 ; d3 = length mask and.w d2, d3 ; d3 = length eor.w d3, d2 ; d2 = offset<<lenbits lsr.w #lenbits-1, d2 ; d2 = offset<<1 move.l a0, a2 ; a2 = &output[pos] sub.w d2, a2 ; a2 = &output[pos-offset] .copy: move.w (a2)+, (a0)+ ; copy match dbf d3, .copy ; done? bra.s .mainloop ; continue with main loop .quit: rts Last edited by paraj; 17 April 2017 at 01:51. Reason: Forgot to credit Estrayk for use of music in test_data - sorry! |
![]() |
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Starting ASM coding on A1200. Which Assembler? | Nosferax | Coders. Asm / Hardware | 68 | 27 November 2015 17:14 |
4th tutorial on ASM- and HW-coding | Vikke | Coders. Asm / Hardware | 11 | 10 April 2013 21:32 |
3rd tutorial on ASM- and HW-coding | Vikke | Coders. Asm / Hardware | 6 | 26 March 2013 16:57 |
First tutorial on ASM- and HW-coding | Vikke | Coders. Asm / Hardware | 46 | 18 March 2013 13:33 |
2nd tutorial on ASM- and HW-coding | Vikke | Coders. Asm / Hardware | 10 | 17 March 2013 12:49 |
|
|