English Amiga Board


Go Back   English Amiga Board > Coders > Coders. Asm / Hardware

 
 
Thread Tools
Old 15 March 2016, 14:54   #21
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
Sorry for not having studied the code as such to know if it can be traversed backwards, but I've always been a big fan of cocktailsort instead of bubblesort. It is practically the same, but gets the outliers in place much faster.

Also, you can change bubblesort from being in-place to always writing values to a new second representation of the list which has a more constant time (does a sort like that have its own name?). Doing it with cocktailsort you can ping-pong from old to new to old again as if nothing had changed.
NorthWay is offline  
Old 15 March 2016, 15:08   #22
Asman
68k
 
Asman's Avatar
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
@DonScott - right. A0 needs to be +4. Sorry for confusing.
Asman is offline  
Old 15 March 2016, 17:41   #23
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,960
Quote:
Originally Posted by DanScott View Post
No sure if that change in #18 will work. the sub from d0, will only test for the last sub in the d7 dbf loop

Won't the (A0)+,(A0)+ then -(A0),-(A0) bring A0 back to it's original value.. A0 needs to be +4 for the next iteration

Yes, if there is only one swap, it will pass through again. This is why perhaps an insertion sort would be better.. it can do it in one pass.. but there are extra overheads in the insert upwards routine
Right, I seen this a few minutes after.
If you exchange A1 and A2 registers, you can use movem.l a1/a2,-4(a0), but speed will be same for 68000, if i remember 68000 command timings.
Don_Adan is offline  
Old 16 March 2016, 10:16   #24
Asman
68k
 
Asman's Avatar
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
Another approach . I hope that is ok.
Code:
		move.w	Var_NumberOfSprites(a5),d6	; Get number of sprites in list
		subq.w	#2,d6				; Subtract 2 for dbf value
		blt.s	.NoSort				; No sorting required if number of sprites is 1 or less 

		lea	Var_SpritePointerList(a5),a3	; Get pointer to start of our sprite ptr list
		moveq	#0,d1				; d1 = flag (will be set to -128 if a swap occured)
.SortLoopOuter
		move.l	a3,a0				; a0 = start of sprite ptr list
		move.w	d6,d7				; d7 = loop counter (number of sprites-2)
		move.l	(a0)+,a1			; Get first address of sprite data
.SortLoopInner
		move.l	(a0)+,a2			; Get next address of sprite data
		move.w	EB_YPos(a1),d0			; Get Y position of sprite data 1 in d0
		cmp.w	EB_YPos(a2),d0			; Compare Y position of sprite data 2 with Y position of sprite data 1 (in d0)
		ble.s	.NoSwap				; Important!! Less than OR equal = No Swap.. otherwise swaps forever equal values....
		move.l	a1,-4(a0)			; swap
		move.l	a2,-8(a0)
		move.l	a1,a2
		moveq	#-128,d1			; d1 = -128.. There has been a swap
.NoSwap
		move.l	a2,a1
		dbf	d7,.SortLoopInner		; Loop through all sprite pointers
		add.b 	d1,d1				; Check the d1 flag
		bcs.s	.SortLoopOuter			; If flag is not 0, there was a swap in the last pass, so need to pass through again
.NoSort
Less read access but swap takes more cycles. If we assume that I didn't make a mistake.
Asman is offline  
Old 16 March 2016, 10:19   #25
buzzybee
Registered User
 
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
Interesting thread, hope you don´t mind if I follow. I´m currently working on a shoot´em-up called RESHOOT. See here:

https://www.patreon.com/posts/reshoot-on-a1200-4837606

Some of its gamedesign is based on putting as many objects on screen as possible, therefore I can totally immersive myself into your thoughts. As the game is targeting AGA-machines, I decided to go for different solutions than what you discuss here. Reason: The A1200 is sometimes faster in calculating stuff internally than reading values from a table.

The base of my y-sorting is a large chunk of memory - lets call it coordbuffer. Each frame, the object animation code takes the y-coord of a sprite and uses it as a pointer onto an address within this coordbuffer, then puts the sprite coords there. Simplified pseudocode would be like this:

move.l coordbuffer,a0
move howManySprites,d6
.loop
move.l spritecoords,d7
move.l d7,(a0,d7*8)
dbra d6,.loop

The actual spritedrawing code then loops through coordbuffer, reads (coords), clears (coords) and builds up a dma-list using movem.l commands.

I tested several approaches of y-sorting. In the case of RESHOOT, as the sprite engine is used in a rather uncontrollable environment of a game, this solution in most cases was a considerably faster than sorting a coord-list.
buzzybee is offline  
Old 16 March 2016, 10:45   #26
DanScott
Lemon. / Core Design
 
DanScott's Avatar
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
Ah, like some kind of bucket-sort almost...

I did think about this, but for me (especially on the A500) ,the overhead of crawling through the entire buffer, could add a lot of cycles to the code.

Currently, my sort takes between 4 to 8 raster lines on an A500 for 32 sprites, depending on how unsorted the list becomes from frame to frame (generally not too unsorted as things don't move THAT fast)

My building of sprite DMA lists takes approx 12 raster lines (this includes movement update, as I can update the positions, write them back, and then have them in registers still ready for insert) for 32 sprites, and my culling (sprites out of screen bounds check, and removal from list) is another 4 or so lines...
DanScott is offline  
Old 16 March 2016, 10:48   #27
DanScott
Lemon. / Core Design
 
DanScott's Avatar
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
-----------------------
DanScott is offline  
Old 16 March 2016, 10:49   #28
majikeyric
Registered User
 
majikeyric's Avatar
 
Join Date: Oct 2015
Location: France
Posts: 82
@buzzybee : nice idea, how do you manage several objects with the same y coord for a frame ?
majikeyric is offline  
Old 16 March 2016, 10:52   #29
DanScott
Lemon. / Core Design
 
DanScott's Avatar
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
Quote:
Originally Posted by Asman View Post
Another approach . I hope that is ok.
Code:
		move.w	Var_NumberOfSprites(a5),d6	; Get number of sprites in list
		subq.w	#2,d6				; Subtract 2 for dbf value
		blt.s	.NoSort				; No sorting required if number of sprites is 1 or less 

		lea	Var_SpritePointerList(a5),a3	; Get pointer to start of our sprite ptr list
		moveq	#0,d1				; d1 = flag (will be set to -128 if a swap occured)
.SortLoopOuter
		move.l	a3,a0				; a0 = start of sprite ptr list
		move.w	d6,d7				; d7 = loop counter (number of sprites-2)
		move.l	(a0)+,a1			; Get first address of sprite data
.SortLoopInner
		move.l	(a0)+,a2			; Get next address of sprite data
		move.w	EB_YPos(a1),d0			; Get Y position of sprite data 1 in d0
		cmp.w	EB_YPos(a2),d0			; Compare Y position of sprite data 2 with Y position of sprite data 1 (in d0)
		ble.s	.NoSwap				; Important!! Less than OR equal = No Swap.. otherwise swaps forever equal values....
		move.l	a1,-4(a0)			; swap
		move.l	a2,-8(a0)
		move.l	a1,a2
		moveq	#-128,d1			; d1 = -128.. There has been a swap
.NoSwap
		move.l	a2,a1
		dbf	d7,.SortLoopInner		; Loop through all sprite pointers
		add.b 	d1,d1				; Check the d1 flag
		bcs.s	.SortLoopOuter			; If flag is not 0, there was a swap in the last pass, so need to pass through again
.NoSort
Less read access but swap takes more cycles. If we assume that I didn't make a mistake.
Will certainy try this, as it is less likely to swap due to the nature of the sprite movement
DanScott is offline  
Old 16 March 2016, 11:36   #30
buzzybee
Registered User
 
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
@majikeyric: I scale the pointer by a factor of 4, so 4 sprites can be on the same line. Plus, the code checks if these four "slots" in spritebuffer are already occupied before writing coords. In yes-case, the code does a litte search for the best available slot; in most cases by adding y+1 and skipping the uppermost line of the sprite in the drawing code.

Also, the code reminds itself of the uppermost and lowermost sprite coords, and searches the buffer only within these y-coordinates / y-slots. Therefore not the whole buffer needs to be searched through every time; only the area between the lowest and highest sprite. This saves a lot of memory access.
buzzybee is offline  
Old 16 March 2016, 14:38   #31
Asman
68k
 
Asman's Avatar
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
Another one approach.
Code:
		move.w	Var_NumberOfSprites(a5),d6	; Get number of sprites in list
		subq.w	#2,d6				; Subtract 2 for dbf value
		blt.s	.NoSort				; No sorting required if number of sprites is 1 or less 

		lea	Var_SpritePointerList(a5),a3	; Get pointer to start of our sprite ptr list
		moveq	#0,d1				; d1 = flag (will be set to -128 if a swap occured)
.SortLoopOuter
		move.l	a3,a0				; a0 = start of sprite ptr list
		move.w	d6,d7				; d7 = loop counter (number of sprites-2)
		move.l	(a0)+,a2			; Get first address of sprite data

.SortLoopInner
		move.l	a2,a1
		move.l	(a0)+,a2			; Get next address of sprite data
		move.w	EB_YPos(a1),d0			; Get Y position of sprite data 1 in d0
		cmp.w	EB_YPos(a2),d0			; Compare Y position of sprite data 2 with Y position of sprite data 1 (in d0)
		dbge.b	d6,.SortLoopInner		; Important!! Less than OR equal = No Swap.. otherwise swaps forever equal values....
		bmi.b	.checkFlag
		move.l	a1,-4(a0)			; swap
		move.l	a2,-8(a0)
		move.l	a1,a2
		moveq	#-128,d1			; d1 = -128.. There has been a swap
		subq.w	#1,d6
		bpl.b	.SortLoopInner

.checkFlag
		add.b 	d1,d1				; Check the d1 flag
		bcs.s	.SortLoopOuter			; If flag is not 0, there was a swap in the last pass, so need to pass through again
.NoSort
But in this case I'm not sure if above approach is correct. If there is no mistake then for nearly sorted list should be faster then previous one.

Do you have any statistics how many swaps (min - max) usually are done ?
And What is amount of OuterLoop (min - max). Then perhaps another sort method would be better.
Asman is offline  
Old 16 March 2016, 16:22   #32
DanScott
Lemon. / Core Design
 
DanScott's Avatar
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
No statistics yet, I am finding the winUAE debugger a little clunky to use at the moment.

I need to code up an on-screen text overlay for debugging.

Right now there is an error in my plexing code somewhere, with some sprites "flickering" at the top of the screen, so I am going to try and track down that bug before I look to implement any more optimisation that could make the code harder to follow

However, your latest should speed things up a bit more
DanScott is offline  
Old 16 March 2016, 17:37   #33
DanScott
Lemon. / Core Design
 
DanScott's Avatar
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
Ok, bug fixed.. was not an actual bug in the code, but was a problem with setting a long-word constant in the assembler

back to optimising...
DanScott is offline  
Old 16 March 2016, 17:39   #34
DanScott
Lemon. / Core Design
 
DanScott's Avatar
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
Quote:
Originally Posted by Asman View Post
Do you have any statistics how many swaps (min - max) usually are done ?
And What is amount of OuterLoop (min - max). Then perhaps another sort method would be better.

I think really, there would normally be a maximum of 1 or 2 per frame at most... only when a sprite passes another sprite.

Due to the nature of the way I will use this, most will be travelling in a downward direction on the screen... unless fired toward the player from the bottom.. but even so, I imagine a minimum number of swaps per update.. however, always good to have a system that can cope with any situation

when sprites are "triggered" and inserted, they are inserted at an already sorted position
DanScott is offline  
Old 13 April 2016, 20:39   #35
Toni Galvez
Registered User
 
Join Date: Jan 2015
Location: London/UK
Posts: 227
Glad to see you coding again DanScott, i am sure you will do a very nice game.

As Chuck Rock 2, I hope you add balls tentacles and complete ingame soundtrack (MUSIC+SFX), I am sure the musicians will help you.
Toni Galvez is offline  
Old 04 May 2016, 14:33   #36
Asman
68k
 
Asman's Avatar
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
Quote:
Originally Posted by DanScott View Post
Code:
		;	d0 = Screen X , d1 = Screen Y

		lea	HW_SpriteXTable,a0
		lea	HW_SpriteYTable,a1

		add.w	d0,d0
		add.w	d0,d0
		move.l	(a0,d0.w),d0
		add.w	d1,d1
		add.w	d1,d1
		or.l	(a1,d1.w),d0
You can replace XTable stuff (mean 2*add.w and move.l - it takes 26 cycles on 68000 and 2 read), with code
Code:
    add.w   #$80,d0
    lsr.w   #1,d0
    swap    d0
    clr.w   d0
    addx.w  d0,d0
It takes 28 cycles on 68000 and no read access. But if you store $80 in other data register you can save another 4 cycles and above block of code takes 24 cycles. And you can remove whole XTable
There is a possibility to save 4 cycles more if we assume that high word of D0 contains zero, then you can remove clr.w d0.
Asman is offline  
Old 04 May 2016, 20:16   #37
DanScott
Lemon. / Core Design
 
DanScott's Avatar
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
if I can save a couple of cycles per sprite, by having the $80 (or whatever x offset add) in a data reg, then that is cool... especially when there are 40 or 50 HW sprites to update each frame... every cycle counts and freeing 1.5k or so, of table is good too
DanScott is offline  
Old 05 May 2016, 23:57   #38
Blueberry
Registered User
 
Join Date: Dec 2007
Location: Aarhus / Denmark
Posts: 40
If you change the or.l into add.l (the bits don't overlap anyway), then you can bake the $80 into the table (by adding $400000) and save the add completely.
Blueberry is offline  
Old 06 May 2016, 08:46   #39
Asman
68k
 
Asman's Avatar
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
Really nice optimization. Now If my calculation are correct, routine without beginning lea will take 48 cycles on 68000.
Another idea to save next 8 cycles is to remove 2* add.w d1,d1. Mean instead of storing just y then store y*4.
Code:
	lea	HW_SpriteYTable,a1

	lsr.w	#1,d0
	swap	d0
	clr.w	d0
	addx.w	d0,d0
	add.l	(a1,d1.w),d0
Of course you can also store x value with same way (routine will take 36 cycles instead of 40c but requires more memory reads).
Asman is offline  
Old 06 May 2016, 20:14   #40
DanScott
Lemon. / Core Design
 
DanScott's Avatar
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
wow this is great some nice thinking there Blueberry & Asman

I am (coincidently) already storing my positions << 2, because I am using a 2 bit fractional value. I will need to AND out those 2 bits

I will look at working this magic into what I already have
DanScott is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Translating sprite coordinates to playfield coordinates nandius_c Coders. Asm / Hardware 15 31 July 2014 23:17
32 and 64 bit sprite control words question FrenchShark Coders. General 8 10 January 2008 02:32
Cpu problems...in other words, BRAIN oldpx support.Hardware 2 24 March 2003 22:01
Swear words Kodoichi project.EAB 19 14 December 2001 00:53

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 17:50.

Top

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Page generated in 0.27906 seconds with 13 queries