For people who like optimizing 680x0 code.

Thorham · 26 May 2008, 20:24

Hi coders,

While cleaning up my miggys hd, I came across a small program that some of you might like to try and optimize. Currently the code only tokenizes a piece of data (best tried with normal text). It 'probably' doesn't contain any bugs, and can be assembled with AsmOne. The file 'kanjidic' that's incbinned in the source can be obtained here: http://ftp.monash.edu.au/pub/nihongo/kanjidic.gz. It's a direct link to the (one megabyte or so) text.

Have fun

Code:

;-----------------------------------------------------------------------------------
;Pack data.
;-----------------------------------------------------------------------------------
Start	movem.l	d0-a6,-(sp)

	move.l	#In,a0

	move.l	#InText,00(a0)
	move.l	#InTextSize,04(a0)
	move.l	#" ",08(a0)
	move.l	#HashTable,12(a0)
	move.l	#InTextTrail,16(a0)
	move.l	#BucketLinks,20(a0)
	bsr	Tokenize

	movem.l	(sp)+,d0-a6
	rts
;-----------------------------------------------------------------------------------
Tokenize ;Creates a list of unique strings based on an end of string marker.
;-----------------------------------------------------------------------------------
;	00=data
;	04=data length
;	08=end of string marker
;	12=hash table
;	16=trailing space
;	20=bucket link table
;-----------------------------------------------------------------------------------
	movem.l	d0-a6,-(sp)

	move.l	#In,a0
	move.l	00(a0),a1	;Input data.
	move.l	04(a0),d0	;Input size.
	move.l	08(a0),d1	;End of string marker.
	move.l	16(a0),a2	;Input data trailing space.
	move.l	20(a0),a4	;Bucket links.

	move.l	12(a0),a0	;Hash table.

	move.b	d1,(a2)+	;Write trailing end of string chars.
	move.b	d1,(a2)+
	move.b	d1,(a2)+
	move.b	d1,(a2)+

	move.l	a1,a2
	add.l	d0,a2		;Point a2 to end of data.

	move.l	#0,d2		;String char.
	move.l	#0,d3		;String hash.
	move.l	#0,d4		;String length.
	move.l	#0,d5		;Zero compare reg.

.eos_nh
	cmp.l	a1,a2		;Test if end of data reached.
	blt	.eof

	move.l	a1,a3		;Set string start.
	moveq	#0,d3		;Clear string hash.
	moveq	#0,d4		;Clear string length.
.lp
	move.b	(a1)+,d2	;Get char from data.
	cmp.b	d1,d2		;Test for end of string.
	beq	.eos

	addq.l	#1,d4		;Increase string length.

	rol.w	#4,d3		;Simple hash function.
	add.w	d2,d3

	bra	.lp		;Get next char from data.
.eos
	cmp.l	d5,d4		;If string length is zero,
	beq	.eos_nh		;then get next string.

	move.l	(a0,d3.l*4),a5	;Get bucket.

	cmp.l	d5,a5		;Check empty bucket.
	beq	.new_bucket

	move.l	a3,d6		;Save current string.
.lp3
	move.l	(a5)+,a6	;Get string from bucket.
.lp2
	move.b	(a3)+,d2	;Compare strings.
	cmp.b	(a6)+,d2
	beq	.cmpstr

	move.l	d6,a3		;If strings don't match
	move.l	a5,a6		;get next string in the bucket.
	move.l	(a5),a5
	cmp.l	d5,a5
	bne	.lp3

	move.l	a4,(a6)		;If no more strings in the bucket
	move.l	a3,(a4)		;then add current string to the bucket.
	addq.l	#8,a4
	bra	.eos_nh
.cmpstr
	cmp.b	d1,d2		;2nd part of compare string. Tests if end of
	bne	.lp2		;both strings reached. If so, strings are the
	bra	.eos_nh		;same and next string in data can be searched.
				;Otherwise compare next chars in both string.
.new_bucket
	move.l	a4,(a0,d3.l*4)	;Write bucket link to hash table.
	move.l	a3,(a4)		;Write string adress to bucket link.
	addq.l	#8,a4		;Point to new bucket link.
	bra	.eos_nh
.eof
	movem.l	(sp)+,d0-a6
	rts
;-----------------------------------------------------------------------------------
In	dcb.l	32
Out	dcb.l	32
;-----------------------------------------------------------------------------------
		section	data,data_f
InText		incbin	"kanjidic"
InTextTrail	dc.l	0
InTextEnd
InTextSize	=InTextEnd-InText
HashTable	dcb.l	65536
BucketLinks	dcb.l	2048*1024
;-----------------------------------------------------------------------------------

RedskullDC · 27 May 2008, 10:46

Hi Thorham,

Just applied some obvious optimisations:

Quote:

Originally Posted by Thorham

Hi coders,

......

Code:

;-----------------------------------------------------------------------------------
;Pack data.
;-----------------------------------------------------------------------------------
Start    movem.l    a0,-(sp)
 
    move.l    #In,a0
    move.l    #InText,00(a0)
    move.l    #InTextSize,04(a0)
    move.l    #"    ",08(a0)
    move.l    #HashTable,12(a0)
    move.l    #InTextTrail,16(a0)
    move.l    #BucketLinks,20(a0)
    bsr    Tokenize
 
    movem.l    (sp)+,a0
    rts
;-----------------------------------------------------------------------------------
Tokenize ;Creates a list of unique strings based on an end of string marker.
;-----------------------------------------------------------------------------------
;    00=data
;    04=data length
;    08=end of string marker
;    12=hash table
;    16=trailing space
;    20=bucket link table
;-----------------------------------------------------------------------------------
    movem.l    d0-d6/a0-a6,-(sp)
    move.l    00(a0),a1    ;Input data.
    move.l    04(a0),d0    ;Input size.
    move.l    08(a0),d1    ;End of string marker.
    move.l    16(a0),a2    ;Input data trailing space.
    move.l    20(a0),a4    ;Bucket links.
    move.l    12(a0),a0    ;Hash table.
    move.l    d1,(a2)+    ;Write trailing end of string chars.
    move.l    a1,a2
    add.l    d0,a2        ;Point a2 to end of data.
 
    move.l    #0,d2        ;String char.
    move.l    #0,d5        ;Zero compare reg.
 
.eos_nh
    cmp.l    a1,a2        ;Test if end of data reached.
    blt    .eof
 
    move.l    a1,a3        ;Set string start.
    moveq    #0,d3        ;Clear string hash.
    moveq    #0,d4        ;Clear string length.
.lp
    move.b    (a1)+,d2    ;Get char from data.
    cmp.b    d1,d2        ;Test for end of string.
    beq    .eos
 
    addq.l    #1,d4        ;Increase string length.
 
    rol.w    #4,d3        ;Simple hash function.
    add.w    d2,d3
 
    bra    .lp        ;Get next char from data.
.eos
    cmp.l    d5,d4        ;If string length is zero,
    beq    .eos_nh        ;then get next string.
 
    move.l    (a0,d3.l*4),a5    ;Get bucket.
 
    cmp.l    d5,a5        ;Check empty bucket.
    beq    .new_bucket
 
    move.l    a3,d6        ;Save current string.
.lp3
    move.l    (a5)+,a6    ;Get string from bucket.
.lp2
    move.b    (a3)+,d2    ;Compare strings.
    cmp.b    (a6)+,d2
    beq    .cmpstr
 
    move.l    d6,a3        ;If strings don't match
    move.l    a5,a6        ;get next string in the bucket.
    move.l    (a5),a5
    cmp.l    d5,a5
    bne    .lp3
 
    move.l    a4,(a6)        ;If no more strings in the bucket
    move.l    a3,(a4)        ;then add current string to the bucket.
    addq.l    #8,a4
    bra    .eos_nh
.cmpstr
    cmp.b    d1,d2        ;2nd part of compare string. Tests if end of
    bne    .lp2        ;both strings reached. If so, strings are the
    bra    .eos_nh        ;same and next string in data can be searched.
                ;Otherwise compare next chars in both string.
.new_bucket
    move.l    a4,(a0,d3.l*4)    ;Write bucket link to hash table.
    move.l    a3,(a4)        ;Write string adress to bucket link.
    addq.l    #8,a4        ;Point to new bucket link.
    bra    .eos_nh
.eof
    movem.l    (sp)+,d0-d6/a0-a6
    rts
;-----------------------------------------------------------------------------------
In    dcb.l    32
Out    dcb.l    32
;-----------------------------------------------------------------------------------
        section    data,data_f
InText        incbin    "kanjidic"
InTextTrail    dc.l    0
InTextEnd
InTextSize    =InTextEnd-InText
HashTable    dcb.l    65536
BucketLinks    dcb.l    2048*1024
;-----------------------------------------------------------------------------------

Cheers,
Red

Thorham · 27 May 2008, 13:19

Quote:

Originally Posted by RedskullDC

Hi Thorham,

Just applied some obvious optimisations:

Cheers,
Red

Hey Redskull,

Erm, perhaps I should have mentioned that the optimizations have to be made within the cpu intensive loop, my bad

The point here is that on my '30 50mhz the tight loop takes about 1.1 secs with the linked text file, and I'm wondering if this time can be improved

RedskullDC · 27 May 2008, 16:55

Hi Thorham,

You can save 4 cycles for every byte in the source block by dispensing with the d4 string length counter, and calculating the length when you reach ".eos". The longer the strings are in the source block, the more efficient it will become.
Smaller strings (3chars or less including separator byte) will incur a penalty on the original routine.

I don't actually think the code works correctly by the way.

The bucketlink table (when built) has a longword of 0 between each pointer.
It only works in this instance because the source text doesn't actually have any strings which are *DIFFERENT* but resolve to the same has code. (I tried it

)

If you had a source file with *lots* of strings which resolved to the same hashcode, you could save lots of time on the bucket comparison by saving the string length along with the pointer in the bucket table and comparing that before embarking on the lenghty byte comparison.

Hope this helps,
Red

Amended code snippets:

Quote:

Originally Posted by Thorham

Code:

 
.lp
    move.b    (a1)+,d2    ;Get char from data.
    cmp.b    d1,d2        ;Test for end of string.
    beq    .eos
 
;    addq.l    #1,d4        ;Increase string length. 
 
    rol.w    #4,d3        ;Simple hash function.
    add.w    d2,d3
 
    bra    .lp        ;Get next char from data.
.eos
    move.l a1,d4  ;calc string length
    subq #1,d4
    sub.l a3,d4

    cmp.l    d5,d4        ;If string length is zero,
    beq    .eos_nh        ;then get next string.
 
    move.l    (a0,d3.l*4),a5    ;Get bucket.
 
    cmp.l    d5,a5        ;Check empty bucket.
    beq    .new_bucket
 
    move.l    a3,d6        ;Save current string.
.lp3
    move.l    (a5)+,a6    ;Get string from bucket.
; ******* should addq #4 to a5 here to work correctly ?!?
.lp2
    move.b    (a3)+,d2    ;Compare strings.
    cmp.b    (a6)+,d2
    beq    .cmpstr
 
    move.l    d6,a3        ;If strings don't match
    move.l    a5,a6        ;get next string in the bucket.
    move.l    (a5),a5
    cmp.l    d5,a5
    bne    .lp3
 
    move.l    a4,(a6)        ;If no more strings in the bucket
    move.l    a3,(a4)        ;then add current string to the bucket.
    addq.l    #8,a4
    bra    .eos_nh
.cmpstr
    cmp.b    d1,d2        ;2nd part of compare string. Tests if end of
    bne    .lp2        ;both strings reached. If so, strings are the
    bra    .eos_nh        ;same and next string in data can be searched.
                ;Otherwise compare next chars in both string.
.new_bucket
    move.l    a4,(a0,d3.l*4)    ;Write bucket link to hash table.
    move.l    a3,(a4)        ;Write string adress to bucket link.
    addq.l    #8,a4        ;Point to new bucket link.
    bra    .eos_nh
.eof
    movem.l    (sp)+,d0-a6
    rts

Thorham · 27 May 2008, 20:52

Quote:

Originally Posted by RedskullDC

Hi Thorham,

You can save 4 cycles for every byte in the source block by dispensing with the d4 string length counter, and calculating the length when you reach ".eos". The longer the strings are in the source block, the more efficient it will become.

Smaller strings (3chars or less including separator byte) will incur a penalty on the original routine

Thanks for pointing that one out, mate

Your optimization is a good idea, and can actually be optimized further like so:

Code:

.eos
	move.l	a1,d4		;calc string length
	sub.l	a3,d4
	subq.l	#1,d4
	beq	.eos_nh

And that penalty isn't really that large

Quote:

Originally Posted by RedskullDC

I don't actually think the code works correctly by the way.

The bucketlink table (when built) has a longword of 0 between each pointer.
It only works in this instance because the source text doesn't actually have any strings which are *DIFFERENT* but resolve to the same has code. (I tried it

)

Oh no, if you're right then I

and

Anyway, are you sure? Have you tried it with the linked text file? I'm just asking, because I can't replicate the bug, and get a whole bunch of links in the bucket link table, not just string pointers, witch lead me to believe the code is correct. I must admit that I haven't tested the code thoroughly, though. Time for some checking!

Edited: I think know what you mean now, I missed it in your snippet:

Code:

.lp3
	move.l	(a5)+,a6	;Get string from bucket.

; ******* should addq #4 to a5 here to work correctly ?!?

.lp2
	move.b	(a3)+,d2	;Compare strings.
	cmp.b	(a6)+,d2
	beq	.cmpstr

	move.l	d6,a3		;If strings don't match
	move.l	a5,a6		;get next string in the bucket.
	move.l	(a5),a5
	cmp.l	d5,a5
	bne	.lp3

The auto incremented a5 is used, because each bucket link table entry is two longwords: string ptr, link ptr. By autoincing (that's not a word

) a5 will point to the link ptr, which you can see being used a little bit further down. The remark that says 'get next string in the bucket' is just very misplaced

and a5 is actually just saved in a6 for use if bne .lp3 is not taken.

Quote:

Originally Posted by RedskullDC

If you had a source file with *lots* of strings which resolved to the same hashcode, you could save lots of time on the bucket comparison by saving the string length along with the pointer in the bucket table and comparing that before embarking on the lenghty byte comparison.

Yep, that's a problem, alright. Of course it's vital to use a good hash algorithm, and the one I'm using is just something that I've 'invented' (read: messed around to find it) purely for speed, and I don't have a clue about how good it is at evenly distributing the strings, so that's certainly a good suggestion. Since there's more then enough cache space left, it's reather easy to implement properly, too

Quote:

Originally Posted by RedskullDC

Hope this helps,
Red

Any suggestions are appreciated RedskullDC

RedskullDC · 28 May 2008, 11:48

Well that was more fun than a daily Sudoku!

Anyone got any more to look at???

Red

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
emulators on 680x0 amiga	fishyfish	Retrogaming General Discussion	7	18 January 2012 14:37
Optimizing question: instruction order	TheDarkCoder	Coders. Asm / Hardware	9	29 October 2011 17:07
680x0 to 68000	Counia	Hardware mods	1	01 March 2011 10:18
Free : 680x0 CPU's	alexh	MarketPlace	23	28 November 2009 12:29
Benching and optimizing CF-IDE speed	Photon	support.Hardware	12	15 July 2009 01:48

28 May 2008, 11:48	#6
RedskullDC Digital Corruption Join Date: Jan 2007 Location: Dorrigo/Australia Age: 60 Posts: 355	Well that was more fun than a daily Sudoku! Anyone got any more to look at??? Red

Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)