14 January 2021, 15:59 | #1 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Informed optimisation
I've started trying to optimise my 3D engine a bit.
Step 1: take some measurements: Step 2: ??? Step 3: Profit! Anyway, what do I do now? It's not the calculations that take up the majority of the time, and it's not painting the polygons either, it's "everything else". I feel stuck because I don't know how to identify that "everything else" or what its problems could be. I know not to expect this code to be super fast, because it is doing a lot: 2 aircraft, a tank, a control tower and runway, plus ground, sky, sun (out of sight) and shadows. And it's nearly all in C. I'm not going to change that yet, but I wonder if it's hiding some inefficiency, such as not being able to use registers effectively, that a restructure or reordering of my code would magically fix. My models are overly complex and there are no levels of detail, but I figure that only helps to identify the performance problems. Anyway, that's all. It's good to share. Last edited by Ernst Blofeld; 14 January 2021 at 16:30. |
14 January 2021, 16:59 | #2 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 386
|
In my experience with GCC you're not going to see revolutionary improvements by turning your existing code into assembler. So, you are wise to avoid that until you've taken the C code as far as you can.
Without code to look at it's hard to say how it could be improved. Using 32bit ints easily adds up because it's double the amount of memory you're shuffling around and 32bit instructions are slower. So it pays off to constrain your data to 16bits everywhere. Shifts are terribly slow so it helps to keep them to a minimum or always 16bit shifts so the compiler will use a swap instruction. Maybe you can share one part of the code so you can get feedback that might lead to ideas you can propagate through the rest of the code. The only way to make progress is to chip away at each part of the code making it slightly faster until all that work starts to add up. But at the moment you probably don't know what to do with any one piece of code, so pick one to share and maybe people will have good suggestions. |
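To make the 16bit point a bit more concrete, here is a minimal sketch of a multiply that keeps both operands in 16 bits and scales the result back with a shift of exactly 16. The name `mul_hi16` and the choice of scale are assumptions for illustration, not something taken from the engine being discussed, and whether a given compiler really emits a multiply followed by a swap depends on the compiler and its options.

```c
#include <stdint.h>

/* Hypothetical helper: multiply two signed 16-bit values and keep the
   high word of the 32-bit product, i.e. (a * b) / 65536.
   The operands stay 16-bit, so the 68000 can use a single 16x16->32
   multiply, and a shift by exactly 16 is a candidate for a swap.
   Note: right-shifting a negative value is implementation-defined in C;
   GCC uses an arithmetic shift, which rounds towards minus infinity. */
int16_t mul_hi16(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 16);
}
```

For example, with `b` holding 16384 (0.25 at this scale), `mul_hi16(16384, 16384)` gives 4096.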
14 January 2021, 17:10 | #3 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
Can you suggest a function to start with? Something that won't turn the crowds away? |
|
14 January 2021, 17:13 | #4 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 386
|
Another generally slow approach that is common in C code is using pointers in data structures. Again, slow because they are 32bits.
It's more efficient to use indices that will be 16bits. Ideally you want to avoid indirection as much as possible simply because it adds up to more data to read. Getting into the weeds a little if you do need indirection it's also pretty slow to use indices because the compiler will need to scale them up to whatever you are indexing. In 68000 the addressing modes only allow for byte indexing. So, where possible you want to store indices that are pre-multiplied with the size that they are indexing. The compiler is quite good at dealing with code like this: Code:
int x = table[preMulIndex / sizeof(int)]; |
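A slightly fuller sketch of the pre-multiplied index idea, assuming a table of 16bit values. The names `premul` and `read_premul` are made up for the example; the point is only that the scaling by the element size happens once, when the index is stored, rather than on every lookup.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: turn an ordinary element index into a byte
   offset once, at the time it is stored in a data structure. */
uint16_t premul(uint16_t index)
{
    return (uint16_t)(index * sizeof(int16_t));
}

/* Look up an entry using the stored byte offset. No scaling is needed
   here, which matches the byte-granular indexing of the 68000.
   Assumes byteOffset is a multiple of sizeof(int16_t). */
int16_t read_premul(const int16_t *table, uint16_t byteOffset)
{
    return *(const int16_t *)((const char *)table + byteOffset);
}
```

Writing the lookup as `table[byteOffset / sizeof(table[0])]`, as in the line above, expresses the same thing and leaves it to the compiler to cancel the divide against the scaling.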
14 January 2021, 17:15 | #5 | |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 386
|
Quote:
Maybe the 3D math stuff, since lots of people will already know how that works in theory. |
|
14 January 2021, 17:20 | #6 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
The biggest chunk of the calculations is _MobileEntity3D_calculate, at 21% of the total. I can see what I can do to package that up into something readable? |
|
14 January 2021, 17:27 | #7 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
Code:
void _MobileEntity3D_calculate(Entity * this_Entity)
{
    BEGIN_FUNCTION

    Entity3D * this_Entity3D = (Entity3D *) this_Entity;
    MobileEntity3D * this_MobileEntity3D = (MobileEntity3D *) this_Entity;

    extern Camera * currentCamera;
    extern const WORD viewingDistance;

    Transformation3D transformation;
    CombineTransformations(&transformation, &currentCamera->transformation3D, &this_Entity->transformation3D);

    Transformation3D shadowTransformation;
    CombineTransformations(&shadowTransformation, &currentCamera->transformation3D, &this_MobileEntity3D->shadowTransformation3D);

    const Model3D * model3D = this_Entity3D->model3D;
    Point3D * vertex = model3D->vertices;
    Calculation3D * calculation = this_Entity->calculations3D;
    Calculation3D * shadowCalculation = this_MobileEntity3D->shadowCalculations3D;

    for (UWORD i = 0; i < model3D->numVertices; i++, vertex++, calculation++, shadowCalculation++)
    {
        LongPoint3D * transformed = &calculation->transformed;
        LongPoint3D * projected = &calculation->projected;

        ApplyTransformation(transformed, &transformation, vertex);

        if (transformed->x >= viewingDistance)
            *projected = (LongPoint3D) {
                transformed->x,
                transformed->y * viewingDistance / transformed->x,
                transformed->z * viewingDistance / transformed->x
            };
        else
            *projected = *transformed;

        LongPoint3D * transformedShadow = &shadowCalculation->transformed;
        LongPoint3D * projectedShadow = &shadowCalculation->projected;

        ApplyTransformation(transformedShadow, &shadowTransformation, vertex);

        if (transformedShadow->x >= viewingDistance)
        {
            *projectedShadow = (LongPoint3D) {
                transformedShadow->x,
                transformedShadow->y * viewingDistance / transformedShadow->x,
                transformedShadow->z * viewingDistance / transformedShadow->x
            };
        }
        else
            *projectedShadow = *transformedShadow;
    }

    END_FUNCTION
}
Possibly, this may be calculating shadows for one object that doesn't need them. That may be 3% there. |
|
14 January 2021, 17:29 | #8 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
|
The first thing that I find interesting is that the shadow layer takes a considerable amount of time, nearly as much time as the rest (24 vs 32%). Perhaps you can 'cheat' the shadows somehow by integrating the preparations into the main render code?
The second thing I find noteworthy is that a lot of time is spent in various render (but not painting) routines. It seems to me that these are the ones where optimization might be most useful. In general, there's a bunch of different things that can be optimized in C/C++ that aren't always clear. In particular, compilers sometimes push local arrays onto the stack instead of using a block of allocated memory and simple pointer references. The type of variables you create can also have a significant impact on performance, volatile vs non-volatile etc can be quite the difference. And indeed, on 68000 it pays to try and keep everything that can fit in 16 bits in 16 bit fields. |
14 January 2021, 17:44 | #9 | |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 386
|
Quote:
Firstly you are using divides for the projection, this is going to be slow. It's fairly easy to improve this with a 1/x table and replace the divides with multiplies. The other thing is that for each vertex you're branching based on the view distance. I presume this is to avoid projection divides for distant vertices. It would be better to make that decision for an entire object once and run it through two different code paths, rather than branching for each vertex. Also you are saving the full 3D vertex for the projected form. Maybe you only need the 2D point after it's projected. There is also quite a bit of redundant duplicate code, the optimizer can't help as much as you'd like with this stuff. So make sure you write the code out so things like "viewingDistance / transformed->x" are only done once, not twice. There's a bunch of indirection that might also be causing things to be slower than necessary. You have things in Calculation3D structures when it might be better to pull them out into separate arrays. Struct of arrays rather than array of structs. Last edited by Jobbo; 14 January 2021 at 17:51. |
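The point about doing the divide only once could look something like the sketch below. One catch: `viewingDistance / transformed->x` can't simply be hoisted as an integer expression, because with x >= viewingDistance it would round to 0 or 1, so this sketch assumes a 14 bit fixed-point scale instead. `Vec3`, `project_point` and `FRAC_BITS` are hypothetical names standing in for the real types, and the value ranges are assumptions noted in the comments.

```c
#include <stdint.h>

#define FRAC_BITS 14  /* assumed fixed-point precision of the scale factor */

typedef struct { int32_t x, y, z; } Vec3;  /* stand-in for LongPoint3D */

/* Project one point with a single divide instead of two.
   Assumptions: in->x >= viewDist > 0, and |y| and |z| are small enough
   (below about 2^17) that the product with the scale fits in 32 bits.
   The right shift rounds negative results towards minus infinity on
   GCC, which differs slightly from the truncation of a C divide. */
void project_point(Vec3 *out, const Vec3 *in, int16_t viewDist)
{
    int32_t scale = ((int32_t)viewDist << FRAC_BITS) / in->x;  /* the only divide */

    out->x = in->x;
    out->y = (in->y * scale) >> FRAC_BITS;
    out->z = (in->z * scale) >> FRAC_BITS;
}
```

With a viewing distance of 256 and a point at x = 512, the scale comes out as 8192 (0.5), so y = 100 projects to 50, the same as 100 * 256 / 512.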
|
14 January 2021, 17:54 | #10 | |||
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
There are different transformations for the object vs its shadow, each of which is the combination of the model transform and camera transform, with the shadow one having an extra bit of flattening to make it a shadow (setting a bunch of stuff to zero). Perhaps the conventional wisdom of combining all the matrices together doesn't apply here? Quote:
Quote:
|
|||
14 January 2021, 17:58 | #11 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
|
A weird idea, but it can help to try and make your compiler use registers more efficiently: try to use fewer variables in functions. The 68000 has 8 data registers and 8 (though due to stack usually really only 7) address registers. In practice this does not translate to 15 variables being used as quite a few operations require temporary registers for use or require both address and data registers.
The upshot is that if you can keep the number of variables in use at any time low, the compiler won't have to swap around registers to stack as much to keep it all working. |
14 January 2021, 18:07 | #12 | ||||
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
Quote:
Quote:
Quote:
|
||||
14 January 2021, 18:15 | #13 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
|
|
14 January 2021, 18:19 | #14 | |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
|
Quote:
Code:
move.w (a2,d2.w),d2 ;16384 / z
muls d2,d0 ;x
muls d2,d1 ;y
add.l d0,d0
add.l d0,d0
swap d0 ;/16384, back to normal scale |
|
14 January 2021, 18:38 | #15 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Yes.
Problem is I'm dealing with some big things as well as some smaller things. My runway is 1.5km long, my jet is 16m. The model I use for my jet is in millimetres. The model I use for the runway is in metres. (where there are 1024 millimetres in a metre and a pixel happens to be exactly one millimetre square) At the moment everything ends up in millimetres in a LONG because after doing all the 2:14 fixed point rotation multiplications I shift right only 4 places for the runway. It all starts to get complicated. But, unless I change things, Z is a LONG, so I can't build a table for it. But, are the Z divides the thing to fixate on, when the whole calculation section is only a quarter of the total? |
14 January 2021, 19:40 | #16 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 386
|
It sounds like you are being constrained by the fact that you need the same code to handle large objects such as the runway and relatively small objects such as the plane.
To handle these two cases most efficiently you probably want to split the code into two separate paths. For example near plane clipping is no doubt important for the runway. But for the plane you might as well just clip the whole plane if it gets too near the camera. Similarly and related to the requirements for optimizing the projection, you might want two different sets of code dealing with the two different scales. Ultimately you want to do what is fastest for the data you have, not something generic. It's a certainty that most games back in the day would treat the objects quite differently than they do the environment. |
14 January 2021, 19:44 | #17 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 386
|
For the reciprocal z table you want to limit the table by limiting how far away you are willing to draw objects. And you want to limit the precision by some factor. So you might change the code from this:
a = b / z;
to this:
a = b * recip[z >> 4];
You're trading away some precision but it's quite manageable if you limit the far plane. |
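Here is one way the reciprocal table could be set up, as a rough sketch only. The table size, the 16 bit scale and the names `init_recip` and `approx_div` are all assumptions; the post doesn't specify them. The scale is chosen so that the final shift is by 16.

```c
#include <stdint.h>

#define RECIP_SHIFT   4     /* z is quantised to steps of 16, as in z >> 4 */
#define RECIP_ENTRIES 1024  /* assumed far plane: z < 1024 * 16 = 16384 */

/* recip[i] holds 65536 / (i * 16), i.e. 1/z scaled up by 2^16. */
static uint16_t recip[RECIP_ENTRIES];

void init_recip(void)
{
    recip[0] = 0;  /* z < 16 is assumed to be clipped before lookup */
    for (int i = 1; i < RECIP_ENTRIES; i++)
        recip[i] = (uint16_t)(65536L / ((long)i << RECIP_SHIFT));
}

/* Approximate b / z with a table lookup, a multiply and a 16-bit shift.
   Caller must ensure 16 <= z < 16384. Every z in the same group of 16
   shares one table entry, which is where the precision goes. */
int16_t approx_div(int16_t b, uint16_t z)
{
    int32_t product = (int32_t)b * (int32_t)recip[z >> RECIP_SHIFT];
    return (int16_t)(product >> 16);
}
```

After `init_recip()`, `approx_div(1000, 100)` gives 10, but so does `approx_div(1000, 111)`, where an exact divide would give 9. That is the precision being traded away.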
14 January 2021, 20:19 | #18 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
OK, let me report back tomorrow. I already have separate calculation functions for Entity2Ds (the runway), FixedEntity3Ds (the tower) and MobileEntity3Ds (tanks and planes), so calculating them in different ways should be achievable.
|
14 January 2021, 23:14 | #19 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,211
|
How are you actually rendering your polygons?
|
15 January 2021, 03:36 | #20 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
The ground is done with the blitter, just one bitplane. The rest are done with the CPU, the asphalt of the runway and the shadows are also just one bitplane, as is the sun although that's a precalculated circle.
To draw the polygons I go round the edges using Bresenham to populate an array of start and end points (anti clockwise so down is a start point and up is an end). I then go through this array, having remembered the y min and max to start and stop at, drawing horizontal lines one long word at a time. That last bit came from elsewhere, but I changed it to write longs rather than words. Last edited by Ernst Blofeld; 15 January 2021 at 06:01. |
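For anyone following along, the edge-array approach described above can be sketched roughly like this. It is a simplified stand-in, not the actual engine code: it records the minimum and maximum x per scanline instead of using the winding direction to tell start points from end points, it assumes a convex polygon that is entirely on screen, and the "screen" is a single bitplane only one long word (32 pixels) wide. All names are made up.

```c
#include <stdint.h>
#include <stdlib.h>

#define SCREEN_W 32  /* one long word per row, bit 31 = leftmost pixel */
#define SCREEN_H 16

typedef struct { int16_t x, y; } Pt;

uint32_t rows[SCREEN_H];                  /* the bitplane */
static int16_t spanMin[SCREEN_H], spanMax[SCREEN_H];

/* Walk one edge with Bresenham, recording the x extent on each row. */
static void trace_edge(int x0, int y0, int x1, int y1)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;

    for (;;) {
        if (x0 < spanMin[y0]) spanMin[y0] = (int16_t)x0;
        if (x0 > spanMax[y0]) spanMax[y0] = (int16_t)x0;
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}

/* Fill a convex polygon: trace every edge, then draw one horizontal
   span per row between the remembered y min and y max. */
void fill_convex(const Pt *pts, int n)
{
    int yMin = SCREEN_H - 1, yMax = 0;

    for (int y = 0; y < SCREEN_H; y++) {
        spanMin[y] = SCREEN_W;
        spanMax[y] = -1;
    }
    for (int i = 0; i < n; i++) {
        const Pt *a = &pts[i], *b = &pts[(i + 1) % n];
        trace_edge(a->x, a->y, b->x, b->y);
        if (a->y < yMin) yMin = a->y;
        if (a->y > yMax) yMax = a->y;
    }
    for (int y = yMin; y <= yMax; y++) {
        uint32_t left  = 0xFFFFFFFFu >> spanMin[y];         /* clear bits left of the span */
        uint32_t right = 0xFFFFFFFFu << (31 - spanMax[y]);  /* clear bits right of the span */
        rows[y] |= left & right;                            /* whole span in one long write */
    }
}
```

On a real screen each span would cover several long words, with the masks applied only to the first and last one and the words in between written as all ones.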