English Amiga Board

English Amiga Board (https://eab.abime.net/index.php)
-   Coders. General (https://eab.abime.net/forumdisplay.php?f=37)
-   -   How did games like Starglider 2 get such a high frame rate? (https://eab.abime.net/showthread.php?t=93740)

deimos 09 August 2018 15:42

How did games like Starglider 2 get such a high frame rate?
 
I'm playing around with some 3D graphics code at the moment, thinking about maybe doing something more with it, if only I can make it fast enough.

I'm animating a familiar looking chequered ball made up of 128 polygons. it's just moving around the screen while rotating around its centre. About half of the polygons get back face culled and so aren't rendered, leaving me with with around 64 to draw to the screen. I'm not currently doing any hidden surface removal apart from the backface culling, not even depth sorting. And it looks ok, but it's just way too slow.

I'm getting less than 2 fps. I want more like 8.

I'm only doing rotate and translate of the object at the moment, the camera is fixed. I'm doing a 3x3 matrix translation for the rotation, 2.14 fixed point maths (in assembly language for the multiplies), then translating it and doing the division for the perspective view. I'm not doing 4x4 matrices at the moment because they're not needed with a fixed camera.

From the timings I've done, I know I spend 1/3 of my time doing the moving the ball and doing the 3D calculations, and about 2/3 of the time drawing the polygons. Drawing the polygons roughly splits into half of the time tracking the edges and half the time drawing horizontal lines from one side of the polygon to the other, making the whole split approx 1/3 3D calculations, 1/3 2D calculations 1/3 writing to screen memory.

So, even if I continue optimising my code and get it running twice as fast, I still won't be anywhere near my goal.

Does anyone know what tricks games like Starglider 2 did back in the day to get such a high framerate?

AChristian 09 August 2018 17:04

how are you doing the calculations that involve sin cos and pi?
using lookup tables?
i was chatting with the programmer of carrier command on youtube years ago, he mentioned something about doing very imprecise but fast calculations involving pi.
i told him he should write a book or something :) but dont think he did.

deimos 09 August 2018 17:07

Quote:

Originally Posted by AChristian (Post 1259887)
how are you doing the calculations that involve sin cos and pi?
using lookup tables?

Yes, lookup tables:

Code:

WORD SinTable_14[2048] = {
    0x0000, 0x0032, 0x0064, 0x0096, 0x00c9, 0x00fb, 0x012d, 0x015f,
    0x0192, 0x01c4, 0x01f6, 0x0228, 0x025b, 0x028d, 0x02bf, 0x02f1,
    0x0323, 0x0356, 0x0388, 0x03ba, 0x03ec, 0x041e, 0x0451, 0x0483,
    0x04b5, 0x04e7, 0x0519, 0x054b, 0x057d, 0x05af, 0x05e1, 0x0613,
    0x0645, 0x0677, 0x06a9, 0x06db, 0x070d, 0x073f, 0x0771, 0x07a3,
    0x07d5, 0x0807, 0x0839, 0x086b, 0x089c, 0x08ce, 0x0900, 0x0932, ...


AChristian 09 August 2018 17:40

ok can't think of anything else

Did you create your 3d engine yourself or from a book?

There is a book for the zx spectrum on how to create a 3d engine, it was written in the 80's but is freely available online. I could direct you to that if you would like.

(because it could contain speed advice, even though its for a different processor)

I assume are you drawing the polygons using the amiga's ability to draw horizontal lines until it detects a black pixel?

deimos 09 August 2018 18:32

Quote:

Originally Posted by AChristian (Post 1259898)
are you drawing the polygons using the amiga's ability to draw horizontal

I have tested with the Blitter's area fill and settled on the non-Blitter polygon fill that I'm currently using. The Blitter did make things slightly faster (2.2fps vs 1.8fps), but adds complexity, and you need to clip, which you don't with a scanline fill.

My numbers (1/3 1/3 1/3) tell me that even if I could make drawing to the screen free, I'd still only get 5fps. That's why I think the Starglider guys might have had some tricks.

And besides, Starglider 2 was available for the ST, so the Blitter can't be the secret.

AChristian 09 August 2018 19:02

interesting, thanks I wish I was in contact with this carrier command bloke still. I guess he could actually help you out :)

roondar 09 August 2018 19:09

I'm not an expert in 3D gaphics, but there are a few things these games did to speed things up I do know:

1) almost all of them ran in low colour modes & resolutions.
2) almost all of them didn't run full screen.
3) a number of them used the blitter to clear the screen while doing the calculations for the next frame to draw.
4) the ones that did use the blitter (as I understand it) made a 1BPL version of any filled polygon in a seperate buffer first (using blitter line/blitter fill) and then copied it over to whatever bitplanes where needed.

And 5) they tended to draw simple shapes only (no nice looking spheres for instance), meaning less math was needed for drawing the polygons and each of the polygons was a simple shape.

Edit: I'm also pretty sure that any game showing circles just used 2D circles instead of 3D representations.

However, there is probably more going on than just this :)

grond 09 August 2018 19:31

MULs and DIVs can be replaced by ADDs and SUBs if you do log() and exp() on them via LUTs. With the relatively low precision required for low screen resolutions and low polygon objects, the tables will be reasonably small.

Amigajay 09 August 2018 19:33

Yes only half the screen is used for the playing area for starters, then less than half of that is the actual chequered grid, plus if you look when you fly up high the draw distance is very close.
Coupled with the 3D objects pop up very close so they arent drawn in from the horizon like some fly simulators.

I guess more tricks are used here than flight sims to fool the eye the 3D world is larger than it actually is.

deimos 09 August 2018 19:44

Thanks for your answer. I think these are all valid points, but I think I've already considered all of them.

1) SG2 looks like it's in 16 colour mode (I've tried counting colours on screen). I've tried dialling down the depth I'm using and I can see that the last 1/3 of my "pipeline" is directly proportional to the number of planes I have to update, which is as you would expect, but there's not enough savings here.
2) True, SG2 and others use half the screen for the view outside and half for the cockpit / dials, etc. But the object I'm drawing fits in that half, so right now there's no savings by cutting half the screen off.
3) Yes, I'm doing that - blitting the screen memory full of zeros takes less time than my calculations stage so is effectively free.
4) Yes, and if I go back to blitter filling I'll double check all this, but I'm not seeing huge benefits from the blitter at the moment. Bigger polygons might make the difference more significant. A blitter queue would be like magic, but like magic, they don't seem to exist in reality.
5) Yes, the ball is not realistic for a game object, but I've tried counting the polygons in a busy SG2 scene, and I think my ball has about the same, so I think it's probably a good test.

Quote:

Originally Posted by roondar (Post 1259909)
I'm not an expert in 3D gaphics, but there are a few things these games did to speed things up I do know:

1) almost all of them ran in low colour modes & resolutions.
2) almost all of them didn't run full screen.
3) a number of them used the blitter to clear the screen while doing the calculations for the next frame to draw.
4) the ones that did use the blitter (as I understand it) made a 1BPL version of any filled polygon in a seperate buffer first (using blitter line/blitter fill) and then copied it over to whatever bitplanes where needed.

And 5) they tended to draw simple shapes only (no nice looking spheres for instance), meaning less math was needed for drawing the polygons and each of the polygons was a simple shape.

Edit: I'm also pretty sure that any game showing circles just used 2D circles instead of 3D representations.

However, there is probably more going on than just this :)


deimos 09 August 2018 19:50

I will look into this, though I've been able to see differences in the end result when I've changed the number of fractional bits used, so I'm wary of losing accuracy.

Quote:

Originally Posted by grond (Post 1259918)
MULs and DIVs can be replaced by ADDs and SUBs if you do log() and exp() on them via LUTs. With the relatively low precision required for low screen resolutions and low polygon objects, the tables will be reasonably small.


deimos 09 August 2018 19:53

I know SG2's draw distance is small, and less than the (fake?) chequered grid, but I think I'm trying to process around the same the number of polygons as you'd get if you count all the enemies you see on a busy screen.

Quote:

Originally Posted by Amigajay (Post 1259919)
Yes only half the screen is used for the playing area for starters, then less than half of that is the actual chequered grid, plus if you look when you fly up high the draw distance is very close.
Coupled with the 3D objects pop up very close so they arent drawn in from the horizon like some fly simulators.

I guess more tricks are used here than flight sims to fool the eye the 3D world is larger than it actually is.


jotd 09 August 2018 21:46

one trick is the fast inverse square root function. Used in DOOM or Quake to compute some 3D stuff, probably not in Starglider. Precomputed tables/masks are probably the key.

coder76 09 August 2018 22:22

It isnt clear from your writings how you fill your polygons, but you should fill only 1 bitplane per surface, so if you have a 4 bitplane screen, then you have only 4 colors available (0001, 0010, 0100, 1000). Fast filled fullscreen polygons that run in 50 FPS on an A500 usually have only 3-4 colors and are done with blitter.

chb 09 August 2018 22:28

Quote:

Originally Posted by jotd (Post 1259956)
one trick is the fast inverse square root function. Used in DOOM or Quake to compute some 3D stuff, probably not in Starglider. Precomputed tables/masks are probably the key.

No, that one relies on some properties of the binary representation of floating point numbers and makes only sense when there's a reasonably fast fpu present. On an 68000 it would be much slower than directly computing it with fixed point arithmetic, let alone using a table-based approach.
The Wikipedia article on this topic is actually quite good:
https://en.wikipedia.org/wiki/Fast_inverse_square_root

A possible way to speed up coordinate calculations is to exploit object symmetries, e.g. your sphere is highly symmetric, so that there's a lot less independent coordinate values than for a set of random points. This allows you to precompute some of the products in the matrix multiplication: If your coordinate vectors (x,y,z) are e.g. (a,a,a),(a,-a,a),(-a,-a,a),(-a,a,a),(a,a,-a),(a,-a,-a),(-a,-a,-a),(-a,a,-a) for a simple cube, you only have to precompute the product of every rotation matrix element with 'a' and carry out the additions and subtractions. So in that case only nine multiplications instead of 216 72 for random coordinates. For a generic 3D-engine you'd probably need a bit of clever coordinate transformation ping pong to make it work.

deimos 09 August 2018 23:11

I'm not sure I understand how to exploit symmetry when arbitrary rotations of the object are allowed. I guess the mirror YZ plane rotates through the same raw pitch roll as the points, but then what's the maths to reflect across this now arbitrary plane?

Quote:

Originally Posted by chb (Post 1259966)
No, that one relies on some properties of the binary representation of floating point numbers and makes only sense when there's a reasonably fast fpu present. On an 68000 it would be much slower than directly computing it with fixed point arithmetic, let alone using a table-based approach.
The Wikipedia article on this topic is actually quite good:
https://en.wikipedia.org/wiki/Fast_inverse_square_root

A possible way to speed up coordinate calculations is to exploit object symmetries, e.g. your sphere is highly symmetric, so that there's a lot less independent coordinate values than for a set of random points. This allows you to precompute some of the products in the matrix multiplication: If your coordinate vectors (x,y,z) are e.g. (a,a,a),(a,-a,a),(-a,-a,a),(-a,a,a),(a,a,-a),(a,-a,-a),(-a,-a,-a),(-a,a,-a) for a simple cube, you only have to precompute the product of every rotation matrix element with 'a' and carry out the additions and subtractions. So in that case only nine multiplications instead of 216 for random coordinates. For a generic 3D-engine you'd probably need a bit of clever coordinate transformation ping pong to make it work.


deimos 09 August 2018 23:13

Sure, but I'm after something more general than that. A game, not a demo.

Quote:

Originally Posted by coder76 (Post 1259965)
It isnt clear from your writings how you fill your polygons, but you should fill only 1 bitplane per surface, so if you have a 4 bitplane screen, then you have only 4 colors available (0001, 0010, 0100, 1000). Fast filled fullscreen polygons that run in 50 FPS on an A500 usually have only 3-4 colors and are done with blitter.


chb 09 August 2018 23:57

Quote:

Originally Posted by deimos (Post 1259972)
I'm not sure I understand how to exploit symmetry when arbitrary rotations of the object are allowed. I guess the mirror YZ plane rotates through the same raw pitch roll as the points, but then what's the maths to reflect across this now arbitrary plane?

My idea was mostly about reducing the number of values the coordinates can take. I was assuming you are working always with the original coordinates, or do you rotate the rotated coordinates again, instead of computing only new rotation matrices? In that case this trick does not work, of course. But I do not see a need to do so, as you can always multiply your old rotation matrix with a new one to get the same effect?

Matrix multiplications are distributive, so R (p1 +p2) = R p1 + R p2 and vice versa (R rotation matrix). So have your object points in a coordinate space where it has maximal symmetry, and which can be transformed to your world system by a translation (addition of the vector to the origin): p_world = p_object + p_origin. To rotate your object in the world system, p'_world = R p_world = R (p_object + p_origin) = R p_object + R p_origin. The last term is constant for all points of your object, so needs to be calculated only once for every rotation matrix.

Having your objects in separate coordinate spaces has also other benefits, like quick translation (change only the origin vector), and you might get higher precision/faster math: if you have a large world where 2.14 is not sufficient, do e.g. the rotation in the object system in 2.14 precision or whatever you like and only the rotation of the translation vectors and the translation in 18.14.

grond 10 August 2018 08:12

If you really want to do vector multiplications, you'll run out of registers. Hence, use MULs with constants for the camera vectors (like MULS.w #cam,Dn) and replace the dummy value with your updated camera values for each frame (self-modifying code). In this way you can do point projection without having to juggle with registers.

deimos 10 August 2018 09:21

I always start with original coordinates in object space. I fill in my rotation matrix and apply this to my coordinates and add the translation for the object's position to get coordinates in world space. The camera is fixed at 0,0,0 for now, otherwise I'd move to 4x4 matrices and combine the move to camera coordinates into one matrix.

I can see advantages in modelling my objects by taking advantage of their symmetry, they'll be easier to design by hand that way, but unless I'm missing something, I don't think I can use it to reduce the computations I need to do.

Quote:

Originally Posted by chb (Post 1259982)
My idea was mostly about reducing the number of values the coordinates can take. I was assuming you are working always with the original coordinates, or do you rotate the rotated coordinates again, instead of computing only new rotation matrices? In that case this trick does not work, of course. But I do not see a need to do so, as you can always multiply your old rotation matrix with a new one to get the same effect?

Matrix multiplications are distributive, so R (p1 +p2) = R p1 + R p2 and vice versa (R rotation matrix). So have your object points in a coordinate space where it has maximal symmetry, and which can be transformed to your world system by a translation (addition of the vector to the origin): p_world = p_object + p_origin. To rotate your object in the world system, p'_world = R p_world = R (p_object + p_origin) = R p_object + R p_origin. The last term is constant for all points of your object, so needs to be calculated only once for every rotation matrix.

Having your objects in separate coordinate spaces has also other benefits, like quick translation (change only the origin vector), and you might get higher precision/faster math: if you have a large world where 2.14 is not sufficient, do e.g. the rotation in the object system in 2.14 precision or whatever you like and only the rotation of the translation vectors and the translation in 18.14.



All times are GMT +2. The time now is 11:45.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.

Page generated in 0.05253 seconds with 11 queries