18 October 2022, 23:48 | #1 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 389
|
Fast sort for small array of words
Does anyone have suggestions for a fast routine to sort a small array of words?
I've implemented an insertion sort and it does the job but it's loopy and it feels like there should be a better way. My specific requirements are for between 1 and 8 values, all 16bit words? And specifically for a standard A500. |
19 October 2022, 00:07 | #2 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Did you consider using several routines, one each for lengths e.g. 1-4 (fully branched out) and a 'generic' one, whichever it would up end being, for 5+?
Value range is too big for some kind of counting, up to 8 values is too small for radix. Insertion feels quite natural ><... Any possibility of incrementally updating the "old list", from the previous frame, if that's how it works? |
19 October 2022, 00:21 | #3 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 389
|
It doesn't have the incremental possibility but I've used that in the past.
I had started with some hardcoded routines for each size by learning about sorting-networks and using that approach. However, I think once the size was above 4 (or maybe 5 as you say) it seemed like it was going to end up worse. I decided it wasn't worth the trouble but now I'm trying to claw back some cycles wherever I can. Anyway I think I basically followed the same thought process as you before arriving at what I have now. |
19 October 2022, 00:51 | #4 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Hmm... one other thing I would consider is building a binary tree (in a static array with: 1 word data, 1 byte left idx*4, 1 byte right idx*4) instead of a list if there are spare registers. It's maybe, maybe not... don't have to copy stuff around but there is extra logic so who knows.
Other than that, try to optimize the existing code. Unrolls, more unrolls, test branches for taken/not taken frequency, ... |
19 October 2022, 05:40 | #5 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 389
|
For what it's worth I'll post my insertion sort code and see if anyone feels like improving it:
Code:
| a0 = out array | a1 = in array | d0 = count SortValues: move.w (a1)+,(a0) subq.w #1,d0 beq.b end add.w d0,d0 moveq #2,d1 loop1: move.w (a1)+,d3 move.w d1,d2 move.w (-2,a0,d2.w),d4 loop2: cmp.w d4,d3 bcc.b break move.w d4,(a0,d2.w) subq.w #2,d2 ble.b break move.w (-2,a0,d2.w),d4 bra.b loop2 break: move.w d3,(a0,d2.w) addq.w #2,d1 cmp.w d1,d0 bge.b loop1 end: rts |
19 October 2022, 06:54 | #6 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Without going too much into the code for now (bed time)... You are sorting high to low, right?
If you add a 0 guard at the end of the output array (when count-1 > 0), you can have only 1 cmp/break in the inner loop (d3 vs. d4, and any number is >= 0 guard), and then you can unroll it 7 times because you know there is 8 numbers at the most. Just a general idea, something like: Code:
loop2: REPT 7 cmp.w d4,d3 bls.b break move.w d4,(a0,d2.w) subq.w #2,d2 move.w (-2,a0,d2.w),d4 ENDR break: |
19 October 2022, 10:07 | #7 |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
@Jobbo
I have some questions. Can you say more about this words values ? Mean they are random for every call of sort routine. Or perhaps there is specific order of this values in array. Are you pretty sure about sorting this values ? There is no other way, like insert value in right order to the array. |
19 October 2022, 10:38 | #8 | |
Registered User
Join Date: Aug 2018
Location: Untergrund/Germany
Posts: 408
|
Quote:
If you need it for gfx objects you could use Ocean sort. For A500 (and C64) a good choice. |
|
19 October 2022, 14:30 | #9 | |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 389
|
Quote:
I'm sorting low to high. I'll see where I can get unrolling that innerloop and improving the compares. Should be a nice easy change. |
|
19 October 2022, 14:36 | #10 | |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 389
|
Quote:
What I'm actually doing is taking the next n values from a Van der Corput sequence table and then sorting those n values. |
|
19 October 2022, 14:41 | #11 | |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 389
|
Quote:
This is a nice resource that I found useful: http://selmiak.bplaced.net/games/c64...e-Multiplexing |
|
19 October 2022, 14:43 | #12 | |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 389
|
Quote:
I just need 8 tables of n*m entries where n=288 and m=1 to 8. |
|
19 October 2022, 14:48 | #13 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Put a guard with min or max value just before the output table, and try this:
Code:
lea (In,pc),a1 lea (Out,pc),a0 moveq #8,d0 Sort move.w (a1)+,(a0)+ subq.w #1+1,d0 ; dbf adjust blt.b .Done .Outer move.w (a1)+,d1 move.l a0,a2 REPT 7 move.w -(a2),d2 cmp.w d2,d1 ; bls.b .Insert ; descending bhs.b .Insert ; ascending move.w d2,(2,a2) ENDR .InsertTop move.w d1,(a2) rts .Insert move.w d1,(2,a2) addq.l #2,a0 dbf d0,.Outer .Done rts ; DC.W $ffff ; guard (descending) DC.W 0 ; guard (ascending) Out DS.W 8 In DC.W 2,7,10,5,25,11,19,33 Last edited by a/b; 19 October 2022 at 17:26. Reason: 8th iteration can be further optimized, ascending/descending |
19 October 2022, 15:01 | #14 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,105
|
1-8 values sounds like a good fit for the data registers, so how about this beauty?
Code:
insertimpl macro r1 set \1 r2 set r1-1 rept r1 cmp.w d\<r1>,d\<r2> bls.b .l\<r> exg d\<r1>,d\<r2> r1 set r1-1 r2 set r2-1 endr .l\<r>: endm loadimpl macro r1 set \1-1 if (\1>4) movem.w (a0)+,d0-d\<r1> else r1 set 0 rept \1 move.w (a0)+,d\<r1> r1 set r1+1 endr endc endm storeimpl macro r1 set \1-1 if (\1>4) movem.w d0-d\<r1>,(a1) else r1 set 0 rept \1 move.w d\<r1>,(a1)+ r1 set r1+1 endr endc endm sortimpl macro loadimpl \1 r set 1 rept \1-1 insertimpl r r set r+1 endr storeimpl \1 rts endm sort: add.w d0,d0 move.w .jumptab-2(pc,d0.w),d0 jmp .jumptab(pc,d0.w) .jumptab: dc.w sort1-.jumptab dc.w sort2-.jumptab dc.w sort3-.jumptab dc.w sort4-.jumptab dc.w sort5-.jumptab dc.w sort6-.jumptab dc.w sort7-.jumptab dc.w sort8-.jumptab sort1: sortimpl 1 sort2: sortimpl 2 sort3: sortimpl 3 sort4: sortimpl 4 sort5: sortimpl 5 sort6: sortimpl 6 sort7: sortimpl 7 sort8: sortimpl 8 |
19 October 2022, 15:13 | #15 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Haha, someone actually did it :P. Yeah, I mentioned that earlier, but he said that after 4-5 it's seemed to get worse. For 1-4 it seems like a good idea. I mean, with regs it will be better but at 7-8 entries.. I guess test and see ;P.
|
19 October 2022, 15:29 | #16 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,105
|
Quote:
Though that was talking about doing it optimally or closer to it. E.g. for 4 elements only 5 compares are necessary while insertion sort could use 6, but it gets quite involved quickly. But anyway, wasn't a too serious suggestion, but now it can be compared against the others |
|
20 October 2022, 13:54 | #17 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Did a speed test of those three routines, two runs each:
-full, generate/sort/verify -partial, generate/verify (output was all 0 so verify went through w/o errors) The reason being two routines required me to stack some registers (this explains different partial times you see below), so I subtracted the partial time to make it more fair. 50,000 iterations, random amount 1-8, each value 16-bit random, the same seed for all runs, stock A500. Here're the numbers: - original routine: 7.8240 - 3.6057 = 4.2183 sec - improved version: 6.8224 - 3.4938 = 3.3286 sec - macro version: 5.8578 - 4.0856 = 1.7722 sec Pretty good macro ;P. If you can spare extra ~600 bytes, it's definitely a way to go. |
20 October 2022, 13:58 | #18 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 389
|
Thanks for running the tests! Nice to see a clear winner.
For my particular routine I can pre sort but this is good info for the future. And always interesting to get a good exchange going on here! |
20 October 2022, 18:33 | #19 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,105
|
Yeah, thanks for testing a/b, and good luck with the project Jobbo.
I know it's not needed, but I couldn't resist trying to improve a bit. Think this should be faster and smaller code (but haven't checked). Sort 1-4 can be done with simple compare and swap, but 5-8 need merging. Only did it for 8 here (and not really tested) but the other 3 cases should be simple enough and can reuse the merge code. E.g. 7: Code:
sort7: ; assume values are loaded into d0-d3,d5-d7 sort4 0,1,2,3 sort3 5,6,7 bra merge43 Code:
order macro r1 set \1 r2 set \2 cmp.w d\<r2>,d\<r1> bls.b .\@ exg d\<r1>,d\<r2> .\@: endm sort2 macro order \1,\2 endm sort3 macro order \1,\2 order \2,\3 order \1,\2 endm sort4 macro order \1,\2 order \3,\4 order \2,\4 order \1,\3 order \2,\3 endm merge macro x set \1 y set \2 x1 set \1-1 y1 set \2-1 rx set 4-\1 ry set 8-\2 merge\<x>\<y>: cmp.w d\<ry>,d\<rx> bls.b .x\@ move.w d\<ry>,(a1)+ bra merge\<x>\<y1> .x\@: move.w d\<rx>,(a1)+ bra merge\<x1>\<y> endm sort8: movem.w (a0)+,d0-d7 sort4 0,1,2,3 sort4 4,5,6,7 ; start with merge44 ycnt set 4 rept 4 xcnt set 4 rept 4 merge xcnt,ycnt xcnt set xcnt-1 endr ycnt set ycnt-1 endr merge40: move.w d0,(a1)+ merge30: move.w d1,(a1)+ merge20: move.w d2,(a1)+ merge10: move.w d3,(a1)+ rts merge04: move.w d4,(a1)+ merge03: move.w d5,(a1)+ merge02: move.w d6,(a1)+ merge01: move.w d7,(a1)+ rts |
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
Thread Tools | |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Saving an array to File | LeCaravage | Coders. Blitz Basic | 4 | 14 February 2021 16:37 |
68040 const array init | phx | Coders. Asm / Hardware | 6 | 14 February 2020 09:27 |
Creating an array gives me a Guru | Shatterhand | Coders. Blitz Basic | 14 | 13 August 2019 20:54 |
Fast way of converting screen coordinates to HW Sprite control words | DanScott | Coders. Asm / Hardware | 44 | 06 June 2018 14:54 |
Blitz2: Pointer to array | idrougge | Coders. Blitz Basic | 3 | 26 March 2015 21:44 |
|
|