English Amiga Board


Go Back   English Amiga Board > Coders > Coders. Asm / Hardware

 
 
Thread Tools
Old 18 October 2022, 23:48   #1
Jobbo
Registered User
 
Jobbo's Avatar
 
Join Date: Jun 2020
Location: Druidia
Posts: 389
Fast sort for small array of words

Does anyone have suggestions for a fast routine to sort a small array of words?

I've implemented an insertion sort and it does the job but it's loopy and it feels like there should be a better way.

My specific requirements are for between 1 and 8 values, all 16bit words?

And specifically for a standard A500.
Jobbo is offline  
Old 19 October 2022, 00:07   #2
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Did you consider using several routines, one each for lengths e.g. 1-4 (fully branched out) and a 'generic' one, whichever it would up end being, for 5+?
Value range is too big for some kind of counting, up to 8 values is too small for radix. Insertion feels quite natural ><...
Any possibility of incrementally updating the "old list", from the previous frame, if that's how it works?
a/b is offline  
Old 19 October 2022, 00:21   #3
Jobbo
Registered User
 
Jobbo's Avatar
 
Join Date: Jun 2020
Location: Druidia
Posts: 389
It doesn't have the incremental possibility but I've used that in the past.

I had started with some hardcoded routines for each size by learning about sorting-networks and using that approach.

However, I think once the size was above 4 (or maybe 5 as you say) it seemed like it was going to end up worse.

I decided it wasn't worth the trouble but now I'm trying to claw back some cycles wherever I can.

Anyway I think I basically followed the same thought process as you before arriving at what I have now.
Jobbo is offline  
Old 19 October 2022, 00:51   #4
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Hmm... one other thing I would consider is building a binary tree (in a static array with: 1 word data, 1 byte left idx*4, 1 byte right idx*4) instead of a list if there are spare registers. It's maybe, maybe not... don't have to copy stuff around but there is extra logic so who knows.
Other than that, try to optimize the existing code. Unrolls, more unrolls, test branches for taken/not taken frequency, ...
a/b is offline  
Old 19 October 2022, 05:40   #5
Jobbo
Registered User
 
Jobbo's Avatar
 
Join Date: Jun 2020
Location: Druidia
Posts: 389
For what it's worth I'll post my insertion sort code and see if anyone feels like improving it:

Code:
|	a0 = out array
|	a1 = in array
|	d0 = count

SortValues:
		move.w	(a1)+,(a0)

		subq.w	#1,d0
		beq.b	end

		add.w	d0,d0

		moveq	#2,d1
loop1:
		move.w	(a1)+,d3

		move.w	d1,d2

		move.w	(-2,a0,d2.w),d4

loop2:
		cmp.w	d4,d3
		bcc.b	break

		move.w	d4,(a0,d2.w)
		subq.w	#2,d2
		ble.b	break

		move.w	(-2,a0,d2.w),d4

		bra.b	loop2

break:
		move.w	d3,(a0,d2.w)

		addq.w	#2,d1
		cmp.w	d1,d0
		bge.b	loop1

end:
		rts
Jobbo is offline  
Old 19 October 2022, 06:54   #6
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Without going too much into the code for now (bed time)... You are sorting high to low, right?
If you add a 0 guard at the end of the output array (when count-1 > 0), you can have only 1 cmp/break in the inner loop (d3 vs. d4, and any number is >= 0 guard), and then you can unroll it 7 times because you know there is 8 numbers at the most. Just a general idea, something like:
Code:
loop2:	REPT	7
		cmp.w	d4,d3
		bls.b	break
		move.w	d4,(a0,d2.w)
		subq.w	#2,d2
		move.w	(-2,a0,d2.w),d4
	ENDR
break:
Another step would be to try to get rid of the index mode (+6 cycles per) and use predecrements. Tomorrow...
a/b is offline  
Old 19 October 2022, 10:07   #7
Asman
68k
 
Asman's Avatar
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
@Jobbo
I have some questions.
Can you say more about this words values ? Mean they are random for every call of sort routine. Or perhaps there is specific order of this values in array.
Are you pretty sure about sorting this values ? There is no other way, like insert value in right order to the array.
Asman is offline  
Old 19 October 2022, 10:38   #8
pink^abyss
Registered User
 
Join Date: Aug 2018
Location: Untergrund/Germany
Posts: 408
Quote:
Originally Posted by Jobbo View Post
Does anyone have suggestions for a fast routine to sort a small array of words?

I've implemented an insertion sort and it does the job but it's loopy and it feels like there should be a better way.

My specific requirements are for between 1 and 8 values, all 16bit words?

And specifically for a standard A500.

If you need it for gfx objects you could use Ocean sort. For A500 (and C64) a good choice.
pink^abyss is offline  
Old 19 October 2022, 14:30   #9
Jobbo
Registered User
 
Jobbo's Avatar
 
Join Date: Jun 2020
Location: Druidia
Posts: 389
Quote:
Originally Posted by a/b View Post
Without going too much into the code for now (bed time)... You are sorting high to low, right?
If you add a 0 guard at the end of the output array ...

I'm sorting low to high.

I'll see where I can get unrolling that innerloop and improving the compares.

Should be a nice easy change.
Jobbo is offline  
Old 19 October 2022, 14:36   #10
Jobbo
Registered User
 
Jobbo's Avatar
 
Join Date: Jun 2020
Location: Druidia
Posts: 389
Quote:
Originally Posted by Asman View Post
@Jobbo
I have some questions.
Can you say more about this words values ? Mean they are random for every call of sort routine. Or perhaps there is specific order of this values in array.
Are you pretty sure about sorting this values ? There is no other way, like insert value in right order to the array.
The values are current in the range 0 to 287.

What I'm actually doing is taking the next n values from a Van der Corput sequence table and then sorting those n values.
Jobbo is offline  
Old 19 October 2022, 14:41   #11
Jobbo
Registered User
 
Jobbo's Avatar
 
Join Date: Jun 2020
Location: Druidia
Posts: 389
Quote:
Originally Posted by pink^abyss View Post
If you need it for gfx objects you could use Ocean sort. For A500 (and C64) a good choice.
As I understand it the Ocean sort is just an insertion sort which is what I'm doing. But I don't have the advantage that my values carry from frame to frame and are mostly sorted already.

This is a nice resource that I found useful: http://selmiak.bplaced.net/games/c64...e-Multiplexing
Jobbo is offline  
Old 19 October 2022, 14:43   #12
Jobbo
Registered User
 
Jobbo's Avatar
 
Join Date: Jun 2020
Location: Druidia
Posts: 389
Quote:
Originally Posted by Jobbo View Post
The values are current in the range 0 to 287.

What I'm actually doing is taking the next n values from a Van der Corput sequence table and then sorting those n values.
It occurs to me now that I might as well pre-sort all the possible sequences since it won't take up very much ram.

I just need 8 tables of n*m entries where n=288 and m=1 to 8.
Jobbo is offline  
Old 19 October 2022, 14:48   #13
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Put a guard with min or max value just before the output table, and try this:

Code:
	lea	(In,pc),a1
	lea	(Out,pc),a0
	moveq	#8,d0

Sort	move.w	(a1)+,(a0)+
	subq.w	#1+1,d0			; dbf adjust
	blt.b	.Done

.Outer	move.w	(a1)+,d1
	move.l	a0,a2
	REPT	7
		move.w	-(a2),d2
		cmp.w	d2,d1
;		bls.b	.Insert			; descending
		bhs.b	.Insert			; ascending
		move.w	d2,(2,a2)
	ENDR

.InsertTop
	move.w	d1,(a2)
	rts

.Insert	move.w	d1,(2,a2)
	addq.l	#2,a0

	dbf	d0,.Outer
.Done	rts

;	DC.W	$ffff			; guard (descending)
	DC.W	0			; guard (ascending)
Out	DS.W	8

In	DC.W	2,7,10,5,25,11,19,33
EDIT: OK, just now noticed that you said low to high. That bcc (I mistakenly interpreted as bls, not bhs) and me being too tired I guess, and I just kept going with bls ><. Fortunately, it only takes a couple of small changes to make it sort in ascending order. Sorry!

Last edited by a/b; 19 October 2022 at 17:26. Reason: 8th iteration can be further optimized, ascending/descending
a/b is offline  
Old 19 October 2022, 15:01   #14
paraj
Registered User
 
paraj's Avatar
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,105
1-8 values sounds like a good fit for the data registers, so how about this beauty?
Code:
insertimpl macro
r1 set \1
r2 set r1-1
        rept r1
        cmp.w   d\<r1>,d\<r2>
        bls.b   .l\<r>
        exg     d\<r1>,d\<r2>
r1 set r1-1
r2 set r2-1
        endr
.l\<r>:
        endm

loadimpl macro
r1 set \1-1
        if (\1>4)
        movem.w (a0)+,d0-d\<r1>
        else
r1 set 0
        rept \1
        move.w  (a0)+,d\<r1>
r1 set r1+1
        endr
        endc
        endm

storeimpl macro
r1 set \1-1
        if (\1>4)
        movem.w d0-d\<r1>,(a1)
        else
r1 set 0
        rept \1
        move.w  d\<r1>,(a1)+
r1 set r1+1
        endr
        endc
        endm

sortimpl macro
        loadimpl \1
r set 1
        rept \1-1
        insertimpl r
r set r+1
        endr
        storeimpl \1
        rts
        endm

sort:
        add.w   d0,d0
        move.w  .jumptab-2(pc,d0.w),d0
        jmp     .jumptab(pc,d0.w)
.jumptab:
        dc.w sort1-.jumptab
        dc.w sort2-.jumptab
        dc.w sort3-.jumptab
        dc.w sort4-.jumptab
        dc.w sort5-.jumptab
        dc.w sort6-.jumptab
        dc.w sort7-.jumptab
        dc.w sort8-.jumptab
sort1:  sortimpl 1
sort2:  sortimpl 2
sort3:  sortimpl 3
sort4:  sortimpl 4
sort5:  sortimpl 5
sort6:  sortimpl 6
sort7:  sortimpl 7
sort8:  sortimpl 8
paraj is offline  
Old 19 October 2022, 15:13   #15
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Haha, someone actually did it :P. Yeah, I mentioned that earlier, but he said that after 4-5 it's seemed to get worse. For 1-4 it seems like a good idea. I mean, with regs it will be better but at 7-8 entries.. I guess test and see ;P.
a/b is offline  
Old 19 October 2022, 15:29   #16
paraj
Registered User
 
paraj's Avatar
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,105
Quote:
Originally Posted by a/b View Post
Haha, someone actually did it :P. Yeah, I mentioned that earlier, but he said that after 4-5 it's seemed to get worse. For 1-4 it seems like a good idea. I mean, with regs it will be better but at 7-8 entries.. I guess test and see ;P.

Though that was talking about doing it optimally or closer to it. E.g. for 4 elements only 5 compares are necessary while insertion sort could use 6, but it gets quite involved quickly. But anyway, wasn't a too serious suggestion, but now it can be compared against the others
paraj is offline  
Old 20 October 2022, 13:54   #17
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Did a speed test of those three routines, two runs each:
-full, generate/sort/verify
-partial, generate/verify (output was all 0 so verify went through w/o errors)
The reason being two routines required me to stack some registers (this explains different partial times you see below), so I subtracted the partial time to make it more fair.

50,000 iterations, random amount 1-8, each value 16-bit random, the same seed for all runs, stock A500. Here're the numbers:
- original routine: 7.8240 - 3.6057 = 4.2183 sec
- improved version: 6.8224 - 3.4938 = 3.3286 sec
- macro version: 5.8578 - 4.0856 = 1.7722 sec

Pretty good macro ;P. If you can spare extra ~600 bytes, it's definitely a way to go.
a/b is offline  
Old 20 October 2022, 13:58   #18
Jobbo
Registered User
 
Jobbo's Avatar
 
Join Date: Jun 2020
Location: Druidia
Posts: 389
Thanks for running the tests! Nice to see a clear winner.

For my particular routine I can pre sort but this is good info for the future.

And always interesting to get a good exchange going on here!
Jobbo is offline  
Old 20 October 2022, 18:33   #19
paraj
Registered User
 
paraj's Avatar
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,105
Yeah, thanks for testing a/b, and good luck with the project Jobbo.

I know it's not needed, but I couldn't resist trying to improve a bit. Think this should be faster and smaller code (but haven't checked). Sort 1-4 can be done with simple compare and swap, but 5-8 need merging. Only did it for 8 here (and not really tested) but the other 3 cases should be simple enough and can reuse the merge code. E.g. 7:
Code:
sort7: ; assume values are loaded into d0-d3,d5-d7
        sort4 0,1,2,3
        sort3 5,6,7
        bra merge43

Code:
order   macro
r1 set \1
r2 set \2
        cmp.w   d\<r2>,d\<r1>
        bls.b   .\@
        exg     d\<r1>,d\<r2>
.\@:
        endm

sort2 macro
        order \1,\2
        endm

sort3 macro
        order \1,\2
        order \2,\3
        order \1,\2
        endm

sort4 macro
        order \1,\2
        order \3,\4
        order \2,\4
        order \1,\3
        order \2,\3
        endm

merge macro
x set \1
y set \2
x1 set \1-1
y1 set \2-1
rx set 4-\1
ry set 8-\2
merge\<x>\<y>:
        cmp.w   d\<ry>,d\<rx>
        bls.b   .x\@
        move.w  d\<ry>,(a1)+
        bra     merge\<x>\<y1>
.x\@:
        move.w  d\<rx>,(a1)+
        bra     merge\<x1>\<y>
        endm

sort8:
        movem.w (a0)+,d0-d7
        sort4 0,1,2,3
        sort4 4,5,6,7
        ; start with merge44
ycnt set 4
        rept 4
xcnt set 4
        rept 4
        merge xcnt,ycnt
xcnt set xcnt-1
        endr
ycnt set ycnt-1
        endr
merge40: move.w d0,(a1)+
merge30: move.w d1,(a1)+
merge20: move.w d2,(a1)+
merge10: move.w d3,(a1)+
        rts
merge04: move.w d4,(a1)+
merge03: move.w d5,(a1)+
merge02: move.w d6,(a1)+
merge01: move.w d7,(a1)+
        rts
paraj is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Saving an array to File LeCaravage Coders. Blitz Basic 4 14 February 2021 16:37
68040 const array init phx Coders. Asm / Hardware 6 14 February 2020 09:27
Creating an array gives me a Guru Shatterhand Coders. Blitz Basic 14 13 August 2019 20:54
Fast way of converting screen coordinates to HW Sprite control words DanScott Coders. Asm / Hardware 44 06 June 2018 14:54
Blitz2: Pointer to array idrougge Coders. Blitz Basic 3 26 March 2015 21:44

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 19:50.

Top

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Page generated in 0.10947 seconds with 15 queries