15 March 2016, 14:54 | #21 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
Sorry for not having studied the code as such to know if it can be traversed backwards, but I've always been a big fan of cocktailsort instead of bubblesort. It is practically the same, but gets the outliers in place much faster.
Also, you can change bubblesort from being in-place to always writing values to a new second representation of the list which has a more constant time (does a sort like that have its own name?). Doing it with cocktailsort you can ping-pong from old to new to old again as if nothing had changed. |
15 March 2016, 15:08 | #22 |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
@DonScott - right. A0 needs to be +4. Sorry for confusing.
|
15 March 2016, 17:41 | #23 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,960
|
Quote:
If you exchange A1 and A2 registers, you can use movem.l a1/a2,-4(a0), but speed will be same for 68000, if i remember 68000 command timings. |
|
16 March 2016, 10:16 | #24 |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
Another approach . I hope that is ok.
Code:
move.w Var_NumberOfSprites(a5),d6 ; Get number of sprites in list subq.w #2,d6 ; Subtract 2 for dbf value blt.s .NoSort ; No sorting required if number of sprites is 1 or less lea Var_SpritePointerList(a5),a3 ; Get pointer to start of our sprite ptr list moveq #0,d1 ; d1 = flag (will be set to -128 if a swap occured) .SortLoopOuter move.l a3,a0 ; a0 = start of sprite ptr list move.w d6,d7 ; d7 = loop counter (number of sprites-2) move.l (a0)+,a1 ; Get first address of sprite data .SortLoopInner move.l (a0)+,a2 ; Get next address of sprite data move.w EB_YPos(a1),d0 ; Get Y position of sprite data 1 in d0 cmp.w EB_YPos(a2),d0 ; Compare Y position of sprite data 2 with Y position of sprite data 1 (in d0) ble.s .NoSwap ; Important!! Less than OR equal = No Swap.. otherwise swaps forever equal values.... move.l a1,-4(a0) ; swap move.l a2,-8(a0) move.l a1,a2 moveq #-128,d1 ; d1 = -128.. There has been a swap .NoSwap move.l a2,a1 dbf d7,.SortLoopInner ; Loop through all sprite pointers add.b d1,d1 ; Check the d1 flag bcs.s .SortLoopOuter ; If flag is not 0, there was a swap in the last pass, so need to pass through again .NoSort |
16 March 2016, 10:19 | #25 |
Registered User
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
|
Interesting thread, hope you don´t mind if I follow. I´m currently working on a shoot´em-up called RESHOOT. See here:
https://www.patreon.com/posts/reshoot-on-a1200-4837606 Some of its gamedesign is based on putting as many objects on screen as possible, therefore I can totally immersive myself into your thoughts. As the game is targeting AGA-machines, I decided to go for different solutions than what you discuss here. Reason: The A1200 is sometimes faster in calculating stuff internally than reading values from a table. The base of my y-sorting is a large chunk of memory - lets call it coordbuffer. Each frame, the object animation code takes the y-coord of a sprite and uses it as a pointer onto an address within this coordbuffer, then puts the sprite coords there. Simplified pseudocode would be like this: move.l coordbuffer,a0 move howManySprites,d6 .loop move.l spritecoords,d7 move.l d7,(a0,d7*8) dbra d6,.loop The actual spritedrawing code then loops through coordbuffer, reads (coords), clears (coords) and builds up a dma-list using movem.l commands. I tested several approaches of y-sorting. In the case of RESHOOT, as the sprite engine is used in a rather uncontrollable environment of a game, this solution in most cases was a considerably faster than sorting a coord-list. |
16 March 2016, 10:45 | #26 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
Ah, like some kind of bucket-sort almost...
I did think about this, but for me (especially on the A500) ,the overhead of crawling through the entire buffer, could add a lot of cycles to the code. Currently, my sort takes between 4 to 8 raster lines on an A500 for 32 sprites, depending on how unsorted the list becomes from frame to frame (generally not too unsorted as things don't move THAT fast) My building of sprite DMA lists takes approx 12 raster lines (this includes movement update, as I can update the positions, write them back, and then have them in registers still ready for insert) for 32 sprites, and my culling (sprites out of screen bounds check, and removal from list) is another 4 or so lines... |
16 March 2016, 10:48 | #27 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
-----------------------
|
16 March 2016, 10:49 | #28 |
Registered User
Join Date: Oct 2015
Location: France
Posts: 82
|
@buzzybee : nice idea, how do you manage several objects with the same y coord for a frame ?
|
16 March 2016, 10:52 | #29 | |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
Quote:
|
|
16 March 2016, 11:36 | #30 |
Registered User
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
|
@majikeyric: I scale the pointer by a factor of 4, so 4 sprites can be on the same line. Plus, the code checks if these four "slots" in spritebuffer are already occupied before writing coords. In yes-case, the code does a litte search for the best available slot; in most cases by adding y+1 and skipping the uppermost line of the sprite in the drawing code.
Also, the code reminds itself of the uppermost and lowermost sprite coords, and searches the buffer only within these y-coordinates / y-slots. Therefore not the whole buffer needs to be searched through every time; only the area between the lowest and highest sprite. This saves a lot of memory access. |
16 March 2016, 14:38 | #31 |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
Another one approach.
Code:
move.w Var_NumberOfSprites(a5),d6 ; Get number of sprites in list subq.w #2,d6 ; Subtract 2 for dbf value blt.s .NoSort ; No sorting required if number of sprites is 1 or less lea Var_SpritePointerList(a5),a3 ; Get pointer to start of our sprite ptr list moveq #0,d1 ; d1 = flag (will be set to -128 if a swap occured) .SortLoopOuter move.l a3,a0 ; a0 = start of sprite ptr list move.w d6,d7 ; d7 = loop counter (number of sprites-2) move.l (a0)+,a2 ; Get first address of sprite data .SortLoopInner move.l a2,a1 move.l (a0)+,a2 ; Get next address of sprite data move.w EB_YPos(a1),d0 ; Get Y position of sprite data 1 in d0 cmp.w EB_YPos(a2),d0 ; Compare Y position of sprite data 2 with Y position of sprite data 1 (in d0) dbge.b d6,.SortLoopInner ; Important!! Less than OR equal = No Swap.. otherwise swaps forever equal values.... bmi.b .checkFlag move.l a1,-4(a0) ; swap move.l a2,-8(a0) move.l a1,a2 moveq #-128,d1 ; d1 = -128.. There has been a swap subq.w #1,d6 bpl.b .SortLoopInner .checkFlag add.b d1,d1 ; Check the d1 flag bcs.s .SortLoopOuter ; If flag is not 0, there was a swap in the last pass, so need to pass through again .NoSort Do you have any statistics how many swaps (min - max) usually are done ? And What is amount of OuterLoop (min - max). Then perhaps another sort method would be better. |
16 March 2016, 16:22 | #32 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
No statistics yet, I am finding the winUAE debugger a little clunky to use at the moment.
I need to code up an on-screen text overlay for debugging. Right now there is an error in my plexing code somewhere, with some sprites "flickering" at the top of the screen, so I am going to try and track down that bug before I look to implement any more optimisation that could make the code harder to follow However, your latest should speed things up a bit more |
16 March 2016, 17:37 | #33 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
Ok, bug fixed.. was not an actual bug in the code, but was a problem with setting a long-word constant in the assembler
back to optimising... |
16 March 2016, 17:39 | #34 | |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
Quote:
I think really, there would normally be a maximum of 1 or 2 per frame at most... only when a sprite passes another sprite. Due to the nature of the way I will use this, most will be travelling in a downward direction on the screen... unless fired toward the player from the bottom.. but even so, I imagine a minimum number of swaps per update.. however, always good to have a system that can cope with any situation when sprites are "triggered" and inserted, they are inserted at an already sorted position |
|
13 April 2016, 20:39 | #35 |
Registered User
Join Date: Jan 2015
Location: London/UK
Posts: 227
|
Glad to see you coding again DanScott, i am sure you will do a very nice game.
As Chuck Rock 2, I hope you add balls tentacles and complete ingame soundtrack (MUSIC+SFX), I am sure the musicians will help you. |
04 May 2016, 14:33 | #36 | |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
Quote:
Code:
add.w #$80,d0 lsr.w #1,d0 swap d0 clr.w d0 addx.w d0,d0 There is a possibility to save 4 cycles more if we assume that high word of D0 contains zero, then you can remove clr.w d0. |
|
04 May 2016, 20:16 | #37 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
if I can save a couple of cycles per sprite, by having the $80 (or whatever x offset add) in a data reg, then that is cool... especially when there are 40 or 50 HW sprites to update each frame... every cycle counts and freeing 1.5k or so, of table is good too
|
05 May 2016, 23:57 | #38 |
Registered User
Join Date: Dec 2007
Location: Aarhus / Denmark
Posts: 40
|
If you change the or.l into add.l (the bits don't overlap anyway), then you can bake the $80 into the table (by adding $400000) and save the add completely.
|
06 May 2016, 08:46 | #39 |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
Really nice optimization. Now If my calculation are correct, routine without beginning lea will take 48 cycles on 68000.
Another idea to save next 8 cycles is to remove 2* add.w d1,d1. Mean instead of storing just y then store y*4. Code:
lea HW_SpriteYTable,a1 lsr.w #1,d0 swap d0 clr.w d0 addx.w d0,d0 add.l (a1,d1.w),d0 |
06 May 2016, 20:14 | #40 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
wow this is great some nice thinking there Blueberry & Asman
I am (coincidently) already storing my positions << 2, because I am using a 2 bit fractional value. I will need to AND out those 2 bits I will look at working this magic into what I already have |
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
Thread Tools | |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Translating sprite coordinates to playfield coordinates | nandius_c | Coders. Asm / Hardware | 15 | 31 July 2014 23:17 |
32 and 64 bit sprite control words question | FrenchShark | Coders. General | 8 | 10 January 2008 02:32 |
Cpu problems...in other words, BRAIN | oldpx | support.Hardware | 2 | 24 March 2003 22:01 |
Swear words | Kodoichi | project.EAB | 19 | 14 December 2001 00:53 |
|
|