03 March 2018, 22:28 | #1 |
Registered User
Join Date: Jan 2014
Location: Cambs / UK
Posts: 356
|
Code optimization.
Hi All,
I have put together another simple game. It runs nicely on an A1200, but it is awfully slow on an A500. I'm wondering if anyone with better skills than me could take a look and see what optimisations can be made. I'm not sure whether I am hitting some sort of Blitz limit for moving stuff on screen, but that seems unlikely; it is much more likely to be my poor skills.
EDIT: I have put a self-booting ADF in the zone. The source code is on the disk.
Cheers
Gary
Last edited by gazj82; 03 March 2018 at 22:40. |
03 March 2018, 22:40 | #2 |
Registered User
Join Date: Jan 2014
Location: Cambs / UK
Posts: 356
|
I have put a self booting adf in the zone. The source code is on the disk.
|
04 March 2018, 01:55 | #3 |
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
|
I've just had a quick look, and there's nothing really graphically intensive going on there so you shouldn't be hitting any limits like that. Without spending more time with it on a slow machine, my guess would be that you're just doing too many CPU-bound operations per frame. You've got lots and lots of Zone stuff going on, that all eats up CPU time with every check (and you're checking a lot with a busy zone table). A better approach might be to use some sort of simple map system, since the maze is essentially a grid anyway. Also, lots of QWrap commands within the main loop aren't a good thing - they're designed to work on quick type variables, and it might be faster to come up with your own word-based operation instead. You could use it as a macro to save repeating the code in your listing.
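A word-based wrap wrapped up as a macro might look something like the sketch below. The macro name `wrapw` and the variable `counter` are made up for illustration, and it assumes the value never overshoots the range by more than one range length, so it is not a drop-in copy of QWrap's behaviour.

```blitzbasic
; Hypothetical macro: wrap a word variable into the range low..high-1
; `1 = variable, `2 = low limit, `3 = high limit
Macro wrapw
  If `1 >= `3 Then `1 - (`3 - `2)
  If `1 < `2 Then `1 + (`3 - `2)
End Macro

counter.w = 99
counter + 1              ; now 100, one past the top of the range
!wrapw{counter,0,100}    ; wraps back to the low limit
NPrint counter
```

Because a macro is expanded in place, this avoids the cost of a subroutine call, at the price of a slightly bigger executable each time it is used.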
Also, a couple of tips: Indent all your loops and constructs. It will make the code a lot easier to read! There are block indent functions in the editor menus so you don't have to do each line by hand... Second, check out the Blitz manual, there's a piece about testing performance by poking the background colour so that you can see if a certain section is taking too long. It might come in useful, e.g. poking the background red before you test for collisions, then poking it black again. If you get a red background, then you know the collision detection is taking longer than the rest of the frame to execute. |
04 March 2018, 02:04 | #4 |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
|
As Daedalus managed to write while I was still reading the sources, profile your code by poking a different colour to $dff180 at the beginning of each subroutine.
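As a rough sketch of the colour-poking idea, the loop below changes the background colour before each section of work. The subroutine names are placeholders rather than names from the game's source, and it assumes the display's copper list resets colour 0 at the top of every frame.

```blitzbasic
; Sketch of a main loop with colour-coded timing bars
Repeat
  VWait                   ; wait for the start of the frame
  Poke.w $dff180,$f00     ; background red: collision checks running
  Gosub checkcollisions
  Poke.w $dff180,$0f0     ; background green: enemy movement running
  Gosub moveenemies
  Poke.w $dff180,$000     ; back to black: this frame's work is done
Until Joyb(1)<>0          ; leave when fire is pressed in joystick port 1
End

.checkcollisions
  ; collision code goes here
Return

.moveenemies
  ; movement code goes here
Return
```

The height of each coloured band on screen then shows roughly how much of the frame that section uses.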
You can possibly save a few cycles by making some extremely similar if…endif blocks into a Select…case clause. In other spots, you first check if this > that, then immediately afterwards check if this <= that. Replace the second check with an else instead. Unrelated to optimisation, you can shorten your code by not calling Getashape tens of times at the beginning of the program. Move that code to another program and save the grabbed shapes to a file which you load or include in your main program. |
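To illustrate the Else and Select suggestions, here is a minimal sketch. The variable names (`x`, `limit`, `dx`, `state`) are invented for the example and are not from the game's source.

```blitzbasic
x.w = 5 : limit.w = 10 : dx.w = 0 : state.w = 2

; Before: both comparisons are evaluated every time
If x > limit
  dx = -1
EndIf
If x <= limit
  dx = 1
EndIf

; After: one comparison gives the same result
If x > limit
  dx = -1
Else
  dx = 1
EndIf

; A run of near-identical If blocks on one variable can become a Select
Select state
  Case 1
    NPrint "state one"
  Case 2
    NPrint "state two"
  Default
    NPrint "something else"
End Select
```

The If/Else rewrite is only safe when the first block does not change the variable being tested, as is the case here.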
04 March 2018, 12:42 | #5 |
Registered User
Join Date: Jan 2014
Location: Cambs / UK
Posts: 356
|
I knew if anyone was going to take the time to help and painstakingly go through my disorganised code it was going to be you two. So thank you very much I do appreciate it.
Daedalus :-
I learnt quite a lot using the zone stuff. I think I did ask before about the best way to implement this and a few ideas were given. This was the one I found the easiest to understand at the time. Could you explain a map system a bit more, if you don't mind? I don't quite know what that means or entails.
There are a lot of QWrap commands, mostly in the counters section. Again my self-taught hobby coding lets me down, and I really don't know what you mean when you mention your substitute.
Yep, my coding structure is god-awful. I really do try with my indentation at the start of the project, but I soon find it becomes very messy and hard to manage. I will look through the options in the editor as I have never looked at this; hopefully it will help.
The change in background is a great tip for trying to track these slowdowns. Thanks for that.
idrougge :-
Thanks idrougge, that background colour tip is great. I knew there could probably be some improvements made here, I just didn't know what was best, i.e. I didn't know Case was quicker than lots of If/EndIf statements. You're definitely right about better use of Else. I forget this as most of my previous experience is with C64 BASIC, where Else does not exist.
I have seen about saving and loading shapes in the Blitz manual and often thought I should be implementing it. I'm guessing the loading times may also be reduced compared with loading a whole fullscreen image too. Some of my failure to split this out is again related to my very monolithic C64 style, where everything I did was under one listing. I must try to get away from this. |
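The shape-saving idea discussed above could be split into a small one-off grabber program along these lines. The file names, picture size and shape layout are assumptions made for the sketch, not details from the actual game.

```blitzbasic
; Grabber program: run once while developing, not part of the game
BitMap 0,320,256,4              ; assumed 16-colour lo-res picture
LoadBitMap 0,"graphics.iff"     ; placeholder file name
Use BitMap 0
For i=0 To 9
  GetaShape i,i*32,0,32,32      ; assumed: ten 32x32 shapes in one row
Next
SaveShapes 0,9,"game.shapes"    ; write shapes 0 to 9 to disk
End
```

The game itself would then only need a single `LoadShapes 0,"game.shapes"` in place of the picture load and the GetaShape calls.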
04 March 2018, 12:47 | #6 |
Registered User
Join Date: Jan 2014
Location: Cambs / UK
Posts: 356
|
Daedalus :-
After another read I think I might know what you mean about a grid approach. Every position is 32 pixels away from the next one. I don't know exactly what the numbers are at the moment, but it is probably something like 10 horizontal positions and 8 vertical positions. I think you mean I should store a 10x8 grid, with each cell holding the directions that are available from that position. I guess this could be stored as a 4-bit value per cell, one bit for each direction. I can then look up this table every time an enemy / player needs to make a decision about where to go next. Hopefully this would be quicker than testing a huge zone table. |
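A minimal sketch of that grid lookup is shown here. The bit assignments, the array name `maze` and the cell data are all invented for illustration; the 10x8 size and 32-pixel spacing are the guesses from the post above.

```blitzbasic
; Assumed encoding: one bit per open exit from a cell
#UP=1
#DOWN=2
#LEFT=4
#RIGHT=8

Dim maze.b(9,7)                 ; 10 columns x 8 rows

; Made-up data for two cells; a real maze would fill all 80
maze(0,0) = #RIGHT + #DOWN
maze(1,0) = #LEFT + #RIGHT

px.w = 40 : py.w = 10           ; example pixel position
cx.w = px LSR 5                 ; divide by 32 to get the column
cy.w = py LSR 5                 ; and the row

If (maze(cx,cy) & #LEFT) <> 0
  NPrint "Can move left"
EndIf
If (maze(cx,cy) & #UP) <> 0
  NPrint "Can move up"
EndIf
```

Each movement decision then costs two shifts and one array read, regardless of how many walls the maze has.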
04 March 2018, 13:33 | #7 |
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 524
|
Another good speed trick is to minimize the number of operations on NEWTYPE variables like " enemy(id)\x ", because they are a lot slower than normal variables.
What I usually do in my own programs is keep lots of temporary "alias variables" like x, y, speed, direction, energy and so on, and use these in my game object loops. So for example, if I have a collision check loop that needs to check 10 collisions, which makes comparisons with the Newtype variables "enemy(id)\x" and "enemy(id)\y" 10 times, then before the checks begin I put these two into temporary variables "x" and "y", and do all the collision checks with them.
---
So instead of having 10 checks like this:
if Zone (enemy(id)\x, enemy(id)\y) > -1
...I would instead use:
x = enemy(id)\x
y = enemy(id)\y
And then make the 10 checks with those:
if Zone (x, y) > -1
...and if x and y can be changed during the loop, then at the end of the loop the original newtype variables of course need to be updated:
enemy(id)\x = x
enemy(id)\y = y
---
This method speeds up all operations a lot, and it should be used for all newtype variables that are used more than once in the loop. So even if you just have two lines like this:
enemy(id)\x + 1
enemy(id)\x + 1
Then this method is about 25 % faster:
x = enemy(id)\x
x + 1
x + 1
enemy(id)\x = x
And the more operations you make with the same newtype variable, the greater the speed benefit will be when you convert it to a normal variable first.
I believe that the slowness of the Newtype variables is caused by them being "references" to variables, rather than actual variables. So every time the program sees something like "enemy(id)\x + 1", it first needs to do some processing to get the actual variable behind that newtype reference, then it does the + 1 operation, and after this it stores the result back to the location that "enemy(id)\x" points at, which I think causes a second dereferencing operation. So most of the CPU time goes into resolving those references. |
04 March 2018, 14:31 | #8 | ||||||
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
|
Quote:
Quote:
Quote:
The code layout stuff is more for your own benefit really; it's something you'll curse when you go back to your code later and try to figure out how it works. I recently recovered some of my old Atari 8-bit code from cassette, and it's horrendous! I have serious trouble understanding most of it. But your techniques are fine; some people struggle to display a sprite on screen, so you're already doing well in that regard.
Quote:
Code:
If var.w > 100
  var.w - 100
Else
  If var.w < 0
    var.w + 100
  End If
End If
Quote:
Quote:
Slowdown isn't something I'd noticed before, so I ran a quick test. It's in WinUAE with an A1200/030 setup, running in "cycle exact" mode, so I appreciate things could be a little screwy, but anyway... This was my test code:
Code:
NPrint "Press Enter"
x$=Edit$(1)
NEWTYPE .test
  dummy.l
  sec.l
End NEWTYPE
DEFTYPE .test test
framecount.l = 0
SetInt 5
  framecount + 1
End SetInt
test\dummy = 10
dmy.l = 10
sec.l = 0
framecount.l = 0
For i.l = 1 To 1000000
  test\sec = test\sec + test\dummy
Next i
NPrint framecount
End
Code:
For i.l = 1 To 1000000
  sec = sec + dmy
Next i
Hmmmm, more investigation needed... Anyway, if it does help speed things up in certain circumstances then great. I haven't noticed any issues myself using Newtypes in arcade-style code (my issues come from a lack of game design creativity).
Last edited by Daedalus; 04 March 2018 at 14:41. |
||||||
04 March 2018, 17:54 | #9 | |
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 524
|
Quote:
But in my post I mainly meant newtypes that are located in a List, so that the variables look like:
test(list position)\variable
In these cases the speed difference is even more noticeable when the method that I described is used. |
|
04 March 2018, 18:31 | #10 |
Registered User
Join Date: Feb 2018
Location: London / UK
Posts: 112
|
I can't access the zone since I'm new, so I can't check the source; sorry if this isn't relevant. One thing you should remember on the 68000 is that multiply and divide are really slow. In any realtime code you should use a lookup table for them whenever you can't change the code to use shifts and the values aren't massive (up to about 8 bits really, or separate tables for a few fixed values). So that means a straight table lookup for multiply, and a table followed by a shift down (maybe 16 bits or less, depending on the accuracy needed) for division.
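As a small illustration of the table approach, the sketch below precomputes row offsets for a grid that is assumed to be 10 cells wide. The array and variable names are invented for the example.

```blitzbasic
; Build the table once, before the main loop
Dim rowstart.w(7)
For y.w = 0 To 7
  rowstart(y) = y * 10          ; the only place the multiply happens
Next

; In the main loop: a lookup and an add instead of cy * 10 + cx
cx.w = 3 : cy.w = 2
cell.w = rowstart(cy) + cx
NPrint cell
```

Whether the lookup actually beats the multiply depends on how Blitz compiles the array access, so it is worth timing both versions on an A500 setup.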
Last edited by dodke; 04 March 2018 at 18:40. |
04 March 2018, 20:02 | #11 |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
|
Yes, I did notice some arithmetic along the lines of new_value = old_value + constant_value + (other_value * 2).
Any normal compiler will optimise x * 2 as x + x (addition is much faster than multiplication), but I'm not sure Blitz's braindead compiler does. |
04 March 2018, 20:14 | #12 | |
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
|
Quote:
Yep, it's AmiBlitz I was using too, so you're right - there could be some sort of optimisation (or bug fix) it does that gives it an edge over the Blitz Basic compiler. As for multiplication, that could indeed be another source of slowdown, and as idrougge says, I'm not sure Blitz is clever enough to optimise such things as multiply by two. A handy, fast alternative that works for multiplying and dividing by powers of 2 is bit-shifting. Provided you're happy to ignore any overflow or decimal places, shifting a binary number left multiplies it by 2 for each digit shifted, and shifting it right divides it by 2. Blitz has these operators, so for example, to divide by 8:
Code:
answer.l = src LSR 3
And to multiply by 2:
Code:
answer.l = src LSL 1
|
|
04 March 2018, 21:39 | #13 |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
|
src LSL 1 is still slower than src + src. Shifts on the plain 68000 just aren't very fast.
|
04 March 2018, 22:14 | #14 |
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
|
Hmmm, good point... Bad example for just one shift, it's probably better suited to multiplying or dividing by 32 (the size of the OP's map blocks) for example.
|
05 March 2018, 10:57 | #15 | ||
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 524
|
Quote:
Quote:
So this code...
Code:
a = 0
x = 0
repeat
  x + 1
  a + 1
until a = 10000
Code:
a = 0
x = 0
*variable1 = x
repeat
  *variable1 + 1
  a + 1
until a = 10000
Code:
x = enemy(id)\x
y = enemy(id)\y
if Zone (x, y) > -1
Code:
*x_pointer = enemy(id)\x
*y_pointer = enemy(id)\y
if Zone (*x_pointer, *y_pointer) > -1
---
Here are some other code optimization ideas:
1. One easy way to measure speed in this program is to add "score = VPos" to the "hud" drawing routine, and then add "Gosub hud" just before the main loop's VWait command. The "score" text will then show the position of the display beam after all game logic has been done and only VWait remains. The lower the number, the faster the frame has finished. This method can be somewhat inaccurate if the main loop takes several frames to finish, though; it only tells the beam position, not how many frames have passed. If you also want to know the number of frames passed, then you have to use a VBlank interrupt and put a frame counter there. This too is quite simple to do, and the code that Daedalus posted actually shows how to do it, so you just put lines like this in your code before the main loop starts:
Code:
framecount.w = 0
SetInt 5
  framecount + 1
End SetInt
The vertical blank interrupt can also have other code in it, but don't put anything slow like Blits or lots of calculations there, because this will cause problems. Things like sprite display commands and joystick reads are OK to put there, and this guarantees sprite movement at 50 FPS no matter how slow the game is. I don't think this game needs to go that far, though; a simple pacman game should always run at 50 FPS without relying on tricks like this.
2. At the routine "pillupdate", I found this structure:
Code:
If pill(pid)\state=3
  pid+1
  Pop If : Goto pillupdate
End if
Code:
If pill(pid)\state=3 then pid+1
Also I tested this change and it improved speed by several scan lines. So it also seems that the Pop instruction is very slow.
---
Also in the "spinnerupdate" routine there are more Pop If's used in this way, in code that seems to check if anything is inside the spinning center piece, and if so it tries to do a "Pop If:Pop For:Return" to end the routine before the spinner update happens. But the routine doesn't seem to work, and the player can get stuck inside the spinner walls. I tried commenting out the entire check code, and nothing changed. So it seems that the routine does nothing, because everything after the Pop If's doesn't get executed, because the "Pop If" causes an instant jump out of the current "If...End if" structure.
I actually quickly tried to fix that routine, using a simpler method to end the loop:
- I added "a=0" at the start of "spinnerupdate".
- I deleted both "Pop If" lines in the two checks that follow, and replaced them with "a=1".
- I added a line "If a=1 then Return" right before the spinner draw routine starts.
Theoretically this should have caused some change, but it didn't seem to have any effect; the spinner still moved, and the player could get stuck inside it.
3. Also, about entering BLITZ mode: the Blitz manual recommends that a VWAIT 250 should be used before entering BLITZ mode. This is a safety wait so that the disk drive has time to stop before BLITZ "shuts down" the operating system. Right now, after loading the files, the program goes to BLITZ mode right away without any waits, and this is why the disk drive motor keeps on running during the game. Although it seems that almost no one cares about this BLITZ mode safety wait, including the official Blitz example programs.
|
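The safety wait from point 3 only takes one extra line. In this sketch the loading commands are left as a comment, since the actual file names are on the disk and not shown in the thread.

```blitzbasic
; ... all disk loading (LoadBitMap, LoadShapes, sounds and so on) goes above ...
VWait 250     ; 250 frames, roughly five seconds on a 50 Hz PAL machine
BLITZ         ; the drive has had time to stop before the OS is shut out
```

If the pause feels too long, a title screen or a "press fire" prompt shown during the wait would hide it.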
||
05 March 2018, 17:48 | #16 |
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 524
|
I did a little bit more testing on the speed of pointers vs normal variables, this time comparing results on A500 and A1200.
Here are the test results for time consumed in 10000 increment operations:
---
A500
WORDS (.w)
Variables: 10 frames, VPos at 129
Pointers: 11 frames, VPos at 94
LONGS (.l)
Variables: 11 frames, VPos at 95
Pointers: 11 frames, VPos at 95
A1200
WORDS (.w)
Variables: 4 frames, VPos at 151
Pointers: 4 frames, VPos at 113
LONGS (.l)
Variables: 4 frames, VPos at 113
Pointers: 4 frames, VPos at 205
---
So when we use words, on A500 pointers are about 10 % slower than variables, but with longs the speeds are exactly the same. And on A1200, when we use words pointers are about 5 % faster than variables, but when we switch to longs then the variables are suddenly faster.
These results are from the classic Blitz 2.1, with cycle exact ON. The test configurations both had 2 MB of chip ram, with no Fast Ram. I tested with fast ram too, but it just made all cases faster, and otherwise the results were the same.
So I guess that it comes down to the differences of the processors...I think someone just mentioned that the 68020 is better suited for longs, and the 68000 works better with words. That would explain these somewhat strange results. |
05 March 2018, 18:23 | #17 |
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
|
Yeah, the 68000 needs to transfer in 16-bit chunks which will make pointers slower (since they're 32-bit) than the 16-bit words, and also explains why pointers are the same speed as longs. I think the results are very tight when you're talking fractions of a frame, but interesting results all the same. I really must put some tests together on real hardware...
As for the FPU angle, I don't think that's it since I've used Newtypes successfully in AmiBlitz code that runs fine without an FPU. |
05 March 2018, 22:36 | #18 | ||||
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
|
Quote:
Quote:
Quote:
Quote:
Code:
a=1
If a>0
  Pop If
  NPrint "Hello"
EndIf
|
||||
06 March 2018, 10:55 | #19 | |
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 524
|
Quote:
So then this code that I pointed out earlier... Code:
If pill(pid)\state=3
  pid+1
  Pop If : Goto pillupdate
End if
---
But I've got more advice for you. I tested the speed of the "Zone()" instruction, and I was shocked at how slow it is. The "SetZone" is fast, but the actual "Zone()" check is mega slow. I tested on both A500 and A1200, with the following results:
A500: 1 zone() check takes 38 scan lines.
A1200: 1 zone() check takes 9 scan lines.
This means that just 10 zone() checks will take an entire frame on an A500. And even an A1200 could handle just around 30 checks in a frame, if it had nothing else to do.
The size of the Zone in my test was 100*100, and I had pre-defined 5 Zones, but checked only one. Making the Zone area smaller made it slightly faster, but not much. Lowering the number of zones also had a small effect, but not much. It's the Zone() check itself that is super slow. To put things into perspective, 38 scan lines wasted on A500 is equivalent to making 60 division operations.
So definitely get rid of the Zones; they are the number one reason for the slowness of the game. And if you don't want to make a new tile map based collision system, then probably the easiest way would be to replace the Zones with RectsHit. So have something like this:
collision=0
if RectsHit(player_x, player_y, 1, 1, zone_x, zone_y, zone_width, zone_height) then collision=1
Or make a Gosub where you put a big Select Case structure which has all the zones in 31 Cases (I think the game had 31 zones). So have something like:
Code:
X_to_check = player_x ; variables for zone check
Y_to_check = player_y
HitZone_to_check = 24 ; zone number
Gosub checkzones
if collision=1 then do stuff
Code:
.checkzones
collision=0
select HitZone_to_check
  case 0
    if RectsHit(X_to_check, Y_to_check, 1, 1, ZONE_0_START_X, ZONE_0_START_Y, ZONE_0_WIDTH, ZONE_0_HEIGHT ) then collision=1
  case 1
    if RectsHit(X_to_check, Y_to_check, 1, 1, ZONE_1_START_X, ZONE_1_START_Y, ZONE_1_WIDTH, ZONE_1_HEIGHT ) then collision=1
  case 24
    if RectsHit(X_to_check, Y_to_check, 1, 1, ZONE_24_START_X, ZONE_24_START_Y, ZONE_24_WIDTH, ZONE_24_HEIGHT ) then collision=1
end select
Return
And all those ZONE variables (ZONE_24_START_X, ZONE_24_START_Y, ZONE_24_WIDTH, ZONE_24_HEIGHT) you just replace with the corresponding SetZone values. This way you should be able to use RectsHit as a direct replacement for the Zone() commands. You'll have more lines of code, but it'll be considerably faster, because 1 RectsHit was about 50 times faster than Zone() when I tested it.
Last edited by Master484; 06 March 2018 at 13:37. Reason: fixed a code mistake |
|
06 March 2018, 13:34 | #20 |
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
|
Wow... I knew the Zones were slow, but that's very slow! I can only guess that it's doing some sort of full mathematical analysis of the zone, instead of taking easy shortcuts. RectsHit is a good alternative, and an even simpler version just for a single pixel might avoid some overheads within RectsHit for dealing with two rectangles:
Code:
If x >= zonex1
  If x <= zonex2
    If y >= zoney1
      If y <= zoney2
        collision = True
      End If
    End If
  End If
End If
|
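One way to combine the single-pixel test with the zone numbering, without writing 31 Cases, is to keep the zone corners in arrays. This is only a sketch: the array names and the coordinates for zone 24 are made up, and the real values would come from the game's SetZone calls.

```blitzbasic
; Corner coordinates for zones 0 to 30 (31 zones assumed)
Dim zx1.w(30)
Dim zy1.w(30)
Dim zx2.w(30)
Dim zy2.w(30)

; Example data for one zone only; the numbers are invented
zx1(24)=64 : zy1(24)=32 : zx2(24)=95 : zy2(24)=63

x.w=70 : y.w=40 : z.w=24        ; point to test and zone number
collision.w=0
If x >= zx1(z)
  If x <= zx2(z)
    If y >= zy1(z)
      If y <= zy2(z) Then collision=1
    EndIf
  EndIf
EndIf
NPrint collision
```

The nested Ifs stop at the first failed comparison, so a point far from the zone usually costs only one array read and one compare.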