English Amiga Board


Go Back   English Amiga Board > Main > Amiga scene

 
 
Thread Tools
Old 25 February 2015, 11:57   #21
Mrs Beanbag
Glastonbridge Software
Mrs Beanbag's Avatar
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by mc6809e View Post
But you can't usually render a poly directly to a frame buffer unless it's the only poly to be rendered, unless I'm missing something.
One can render an entire 3-colour convex polyhedron and fill it with a single blit, and then copy it to the screen with another single blit. All the ships on elite, for instance, are convex polyhedra!

Quote:
This whole discussion of speed ignores one important point, however, and that's the ability of the Amiga to perform operations concurrently with the CPU. For 3d games where many multiplies and divides are performed, many cycles are left free for the blitter. Huge gains are possible.

But again there's a problem. To gain maximum concurrency, poly fill/render must happen with data that has already been computed since poly render order can only be known after all calculation have been performed. This means that calculations for the next frame must occur while the previous frame is being built.
I don't understand why it makes a difference. Whether you fill a polygon with the CPU or the Blitter, you need to have calculated it first.

It is easy enough to sort all the objects by Z-distance first, then one can multiply the points of the next object by the rotation matrix and calculate screen co-ordinates, while the previous object is still being drawn.
Mrs Beanbag is offline  
AdSense AdSense  
Old 25 February 2015, 20:36   #22
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 281
Quote:
Originally Posted by Mrs Beanbag View Post
One can render an entire 3-colour convex polyhedron and fill it with a single blit, and then copy it to the screen with another single blit. All the ships on elite, for instance, are convex polyhedra!
That seems like it would often take more time. There's the larger off screen buffer to clear. And twice as much memory to cover with the fill. I can see how it might be competitive for small polys, but I'm not sure there's a big win with that method over simply filling one plane and then blitting to multiple planes, especially if you need more than three colors.

Quote:
Originally Posted by Mrs Beanbag View Post
I don't understand why it makes a difference. Whether you fill a polygon with the CPU or the Blitter, you need to have calculated it first.

It is easy enough to sort all the objects by Z-distance first, then one can multiply the points of the next object by the rotation matrix and calculate screen co-ordinates, while the previous object is still being drawn.
Unless your rotation matrix calculation time exactly matches blitter render time, either the CPU or the blitter will be left waiting at some point. For a small poly, the blitter will wait for the CPU to finish. For a large poly, the CPU will have to wait for the blitter to finish.

And there typically are plenty of other calculations that must be done in a program like Elite. It isn't all rotation matrix calculations. It would be desirable to overlap rendering with all those other calculations, too.

Here's what I would try if I had the skills:

The Amiga hardware permits the copper to run a list that programs the blitter. I'd use the copper for most of the rendering. With some cleverness, it's possible to get the copper to cross vblanks without misprogramming the blitter allowing for CPU/blitter concurrency. Of course the CPU has to reprogram bitplane pointers at some point before the top of the display using an interrupt (VBlank or even timer based on HSync), but getting CPU/blitter concurrency is probably more than worth it.

There's even an opportunity to use the blitter together with CPU for occlusion culling by using BZero together with an occlusion mask.

Going all out, one could even use a tile-based (16x16 pixels) system together with occlusion culling to minimize the amount of rendering done by the blitter of scenes that have large polys obscuring most of the background polys.

The render pipeline (running concurrent with calculations for the next frame) would look something like:

Clear poly buffer (will hold multiple single bitplane filled polys)
Clear occlusion buffer to all 1s (zeroed areas will indicate an area covered by a poly)
Clear render buffer
Draw polys/fill to multiple poly buffer (holds multiple single plane filled polys)
Sort polys front to back
Occlusion cull by using AB blit with D=A*B (test to see if new poly would add to scene) and draw black (zeroed) single bitplane polys to occlusion buffer if BZero is nonzero (to reduce unnecessary blits to frame buffer)
Sort non-occluded polys back to front
Render tiles of non-occluded polys using LF to proper plane for correct color

The key to making this fast is to allow the CPU to calculate a frame ahead while much of the blitting is done concurrently. The CPU would have to be diverted to help with occlusion culling and sorting from time to time, but most of the other blits can be done with copper lists. It the CPU is allowed to begin calculating one frame ahead, a great deal of concurrency can be taken advantage of.

For instance, once a copper list is built to clear buffers and draw lines/fill polys in the poly buffer, the CPU can immediately move to calculating the coordinates of next set of polys for the next frame and building the next copper list while the previous copper list is rendering.

It's all a great deal of work, and while the additional rendering steps would tend to make for slower rendering of scenes with very few polys, as the number of polys grows, the scheme should quickly overtake simpler methods I think. The idea is to make fast cases that have many polys. Occlusion culling might even make 640x400 8-colors usable.
mc6809e is offline  
Old 25 February 2015, 22:10   #23
Mrs Beanbag
Glastonbridge Software
Mrs Beanbag's Avatar
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by mc6809e View Post
That seems like it would often take more time. There's the larger off screen buffer to clear. And twice as much memory to cover with the fill. I can see how it might be competitive for small polys, but I'm not sure there's a big win with that method over simply filling one plane and then blitting to multiple planes, especially if you need more than three colors.
The off-screen buffer is not "larger" - why would it be larger? It's exactly as bis as the object you want to draw. And you don't need to clear it, you can draw the lines in one buffer and fill to a different destination. Then you only need to XOR over the lines again.

If you don't see how filling several polygons at one is quicker than filling them all separately and then separately copying them into the destination then i don't know what to say. On a single object, polygons will overlap significantly along the edges, and for smaller objects this is ever more the case. You can also "cancel out" shared edges before you even draw them. You can of course have as many planes as you like in the destination buffer. Just using minterms, you can have a 3 colour object copy to a 32 colour screen with colours remapped anywhere into the destination.

Also you can draw the first object to be drawn directly onto the screen, and dynamically re-order the palette. If this object is a concave interior, you can have as many colours as you like and fill the whole thing in one pass.

Last edited by Mrs Beanbag; 25 February 2015 at 22:23.
Mrs Beanbag is offline  
Old 26 February 2015, 07:08   #24
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 281
Quote:
Originally Posted by Mrs Beanbag View Post
The off-screen buffer is not "larger" - why would it be larger? It's exactly as bis as the object you want to draw. And you don't need to clear it, you can draw the lines in one buffer and fill to a different destination. Then you only need to XOR over the lines again.

If you don't see how filling several polygons at one is quicker than filling them all separately and then separately copying them into the destination then i don't know what to say.

<...>

Also you can draw the first object to be drawn directly onto the screen...
Ah, that final statement suggests to me that I maybe I haven't been clear enough.

What I'm talking about is first drawing one filled single bitplane polygon of a larger object in a buffer. Then by using that single bitplane polygon, write zeros or ones as appropriate directly onto the bitplanes of next frame to be displayed (as in a double buffered scheme) by setting the correct LF for each plane to set the color.

There's no extra destination or separate area where an object is built and then blitted into the scene. Each polygon of the object is written directly onto the next image that will be displayed. After all the polygons of all the objects have been drawn, a simple change of bitplane pointers is enough to display the new frame.

If the polygons are simply drawn back to front, then there's no need for hidden surface removal and there's no need to restrict one's self to convex polyhedra or to objects that have no holes.

Now maybe I misunderstand you, but as far as I can tell, you advocate for drawing the boundaries of several polygons of an object, taking care not to draw the lines of hidden facets, filling them all at once to draw a complete object, then blitting that whole object into the next frame to be displayed along with other objects.

Is that the case?

I'm curious, then, how do you quickly determine which facets are hidden on a non-regular polyhedron? The method I'm familiar with involves computing surface normal vectors.
mc6809e is offline  
Old 26 February 2015, 12:48   #25
Mrs Beanbag
Glastonbridge Software
Mrs Beanbag's Avatar
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by mc6809e View Post
Ah, that final statement suggests to me that I maybe I haven't been clear enough.
maybe the both of us.

Quote:
What I'm talking about is first drawing one filled single bitplane polygon of a larger object in a buffer. Then by using that single bitplane polygon, write zeros or ones as appropriate directly onto the bitplanes of next frame to be displayed (as in a double buffered scheme) by setting the correct LF for each plane to set the color.
What i'm talking about is first drawing multiple filled 2-bitplane polygons of a large object in a buffer. Then by using that 2 bitplane sub-object (it need not be entirely closed), write zeros or ones as appropriate directly onto the bitplanes of next frame to be displayed by setting the correct LF for each plane to set the color (use all three blitter sources to do the copy, so we can set minterms to map source values 1, 2, 3 into any colour combination on the destination).

So you can render three or more polygons with only two fills and one copy per bitplane of destination screen. This buffer need not be cleared afterwards, because we can use a separate buffer to render the lines and just clear that by going over the lines a second time. Each line will only need drawing exactly twice, with shared edges eliminated.

Consider a simple cube. When seen from some angle where any 3 sides are visible, the bounding boxes of these sides will significantly overlap. Doing 3 slightly smaller fills/copies won't be as fast as doing 2 fills and 1 copy.

Quote:
If the polygons are simply drawn back to front, then there's no need for hidden surface removal and there's no need to restrict one's self to convex polyhedra or to objects that have no holes.
I don't know why you wouldn't do hidden surface removal in any case. The calculation is very simple (either by normal vectors or the clockwise/anticlockwise test) and is certainly faster than drawing the polygon. Plus it leaves you with the problem of sorting each individual polygon, which is not a trivial thing to do (simple Z sorting is not reliable enough and sorting can't be done in linear time in general). I have actually used a sort of manually-constructed BSP to build up more complex shapes.

Objects with holes don't cause a problem, neither do completely concave objects. For instance one could render a cube with a tunnel through by rendering the tunnel and then rendering the cube with holes in.

One can also set any of the sub-object colours to transparent and effectively use one of the source bitplanes as a mask, so you can render concave features onto things AFTER drawing them. You can also use non-convex/concave objects to do weird effects since overlapping polygons will XOR with each other, as well as 2-colour surface decals.

Here is an illustration showing the potential overlap of polygon bounding boxes on a rotated cube:

The overlap could be even more when we consider we have to snap to word boundaries.

Last edited by Mrs Beanbag; 26 February 2015 at 13:11. Reason: added illustration
Mrs Beanbag is offline  
Old 01 March 2015, 22:04   #26
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 281
Thanks for your thoughts, Beany. Appreciated. But by my math, your cube example actually argues for doing three single-bitplane fills.

Doing three separate one-bitplane fills I get:

14 words x 316 lines = 4424 words filled (red area)
15 words x 241 lines = 3615 words filled (green area)
15 words x 243 lines = 4460 words filled (blue area)

Total words to be filled = 12499
Total DMA cycles for fill = 12499*3 = 37497 DMA cycles

For a five plane display, cutting each polygon into the new frame with ACD blit = 3 x 5 x 12499 = 187485 DMA cycles.

Total DMA cycles = 224982

Filling two planes with all three cube faces to be filled all at once:

25 words x 400 lines x 2 planes = 20000 words to be filled.
Total DMA cycles for fill = 60000 DMA cycles

ABCD blit into a five plane frame = 4 x 5 x 10000 = 200000 DMA cycles.

Total DMA cycles = 260000

Perhaps the better method is simply very dependent on the pattern of the polygons that make up the object. For filling, the cube provides too much empty area that must unnecessarily be touched. Some other shape might be faster (a polyhedron with tessellated faces and few colors perhaps?)

The ABCD blit is a nice trick, but it too requires that the blitter touch much empty space in the case of the cube. Again, which is better must be object dependent.

With respect to calculating away hidden surfaces, using the debugger to examine several 3d poly games running on an emulated A500 and looking at the ratio of time used for calculation versus polygon draw, I can say, except for demos with a very small number of objects, that the ratio of calculation time to polygon fill/render time is usually greater than 1:1. In larger worlds, many calculations are done that result in nothing being drawn. Many polygons are clipped away or are located behind the viewer. This suggests that it might be better to use the blitter to cover more distant polys than to introduce additional calculation for hidden surface removal, provided the CPU and blitter can be made to run concurrently.
mc6809e is offline  
Old 01 March 2015, 22:32   #27
Mrs Beanbag
Glastonbridge Software
Mrs Beanbag's Avatar
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
i feel like you have cheated me a little here since the area to fill is only 24 words x 380 lines x 2 planes = 18240, or 54720 fill cycles + 182400 copy cycles = 237120 in total. So i get your point but it is a closer call than you thought.

Also one need not necessarily fill the entirety of the bounding box for both planes (for instance one plane only contains faces A and B while the other only contains B and C) and it saves having to draw some lines twice. But as you say it depends on the object, a cube is a simple example but imagine an object that has many triangles in the same colour, like a 2 or 3 colour boing ball.

You always have to calculate which polygons are behind the viewer, you can't draw those in any case. There are also many calculations that you have to do per-polygon whether they are visible in the final scene or not. But if you are not doing hidden surface removal, how do you order the polygons? Simple Z-sorting is not enough for something like an L-shaped object, for example. Whereas with hidden surface removal one can often use a fixed drawing order.

When polygons are very distant it might be quicker to fill them than to do a couple of multiplies required for testing visibility, but we have to do some multiplies anyway to calculate the screen co-ordinates of all the points.

EDIT: also consider the fill operation includes an idle cycle, which can be used by CPU.

Last edited by Mrs Beanbag; 01 March 2015 at 23:04.
Mrs Beanbag is offline  
Old 02 March 2015, 06:57   #28
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 281
Quote:
Originally Posted by Mrs Beanbag View Post
i feel like you have cheated me a little here since the area to fill is only 24 words x 380 lines x 2 planes = 18240, or 54720 fill cycles + 182400 copy cycles = 237120 in total. So i get your point but it is a closer call than you thought.
Sorry

Quote:
Originally Posted by Mrs Beanbag View Post
You always have to calculate which polygons are behind the viewer, you can't draw those in any case. There are also many calculations that you have to do per-polygon whether they are visible in the final scene or not.
Oh, I just meant that since the CPU is usually busy with other calculations of all sorts while the blitter is less taxed, adding a cross product calculation for a surface normal plus dot product to see its relation to a ray from the camera just adds more calculation to an already burdened CPU. It might be worth it just to give the blitter more work even if the blitter would take more time, assuming it runs concurrently with the CPU. It would depend on how much internal operation the CPU was performing (plenty for multiplies and divides).

Quote:
Originally Posted by Mrs Beanbag View Post
But if you are not doing hidden surface removal, how do you order the polygons? Simple Z-sorting is not enough for something like an L-shaped object, for example. Whereas with hidden surface removal one can often use a fixed drawing order.
My intuition tells me something else, but that's not proof of course. I want to say that any object for which surface normals work should also work using Z-sorting. I have no proof though so I'll have to think about it more.

Quote:
Originally Posted by Mrs Beanbag View Post
When polygons are very distant it might be quicker to fill them than to do a couple of multiplies required for testing visibility, but we have to do some multiplies anyway to calculate the screen co-ordinates of all the points.

EDIT: also consider the fill operation includes an idle cycle, which can be used by CPU.
That's a good point. The extra plane means 9120 of those extra fill cycles are idle cycles, so it makes things even closer.
mc6809e is offline  
Old 02 March 2015, 12:58   #29
Mrs Beanbag
Glastonbridge Software
Mrs Beanbag's Avatar
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by mc6809e View Post
Oh, I just meant that since the CPU is usually busy with other calculations of all sorts while the blitter is less taxed, adding a cross product calculation for a surface normal plus dot product to see its relation to a ray from the camera just adds more calculation to an already burdened CPU. It might be worth it just to give the blitter more work even if the blitter would take more time, assuming it runs concurrently with the CPU. It would depend on how much internal operation the CPU was performing (plenty for multiplies and divides).
well first off, it occurred to me overnight, if you are not doing hidden surface removal, you can take your number of DMA cycles and double it! Well my intuition tells me a few multiplies would still be worth it.

We need to calculate a predicate based on a vector triple product of the triangle vertices (assuming we are working with triangles, but any face visibility can be calculated from any co-planar triangle).

P = (Ay.Bz - Az.By).Cx + (Az.Bx - Ax.Bz).Cy + (Ax.By - Ay.Bx).Cz [1]

= Ay.Bz.Cx - Az.By.Cx + Az.Bx.Cy - Ax.Bz.Cy + (Ax.By - Ay.Bx).Cz [2]

= (Bx.Cy - By.Cx).Az + (Ay.Cx - Ax.Cy).Bz + (Ax.By - Ay.Bx).Cz [3]

= {(Bx.Cy - By.Cx)/(Bz.Cz) + (Ay.Cx - Ax.Cy)/(Az.Cz) + (Ax.By - Ay.Bx)/(Az.Bz)}.(Az.Bz.Cz) [4]

but we already did Ax/Az, Ay/Az &c because these are just the screen co-ordinates of the points! so let ax = Ax/Az... and we don't care about multiplying by (Az.Bz.Cz) at all because we only want the sign of the result, not the magnitude! So assuming all of these are positive (but we can just use XOR otherwise)...

p = bx.cy - by.cx + ay.cx - ax.cy + ax.by - ay.bx [5]

= (ax-cx)(by-cy) - (ay-cy)(bx-cx) [6]

Sign(P) = Sign(p) xor Sign(Az) xor Sign(Bz) xor Sign(Cz) [7]

This is only two multiplications and it is good as long as none of the Z co-ordinates are zero (in which case we can't really compute the screen co-ordinates - but it simplifies [3] for us).

Quote:
My intuition tells me something else, but that's not proof of course. I want to say that any object for which surface normals work should also work using Z-sorting. I have no proof though so I'll have to think about it more.
How do you Z-sort a planar object anyway? Sorting points is easy, but which point do you use? The centre of mass?

Consider the following shape:

The sides of the polygon are "walls" from the perspective of the viewer at the bottom. Sorted by their centre point, we can see that faces 4 and 5 get incorrectly drawn over the top of face 3.

Last edited by Mrs Beanbag; 02 March 2015 at 13:25.
Mrs Beanbag is offline  
Old 02 March 2015, 22:59   #30
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 281
Excellent, Bean! Thanks!
mc6809e is offline  
Old 02 March 2015, 23:45   #31
Mrs Beanbag
Glastonbridge Software
Mrs Beanbag's Avatar
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
of course the other way to do it is to pre-calculate all the normals in the object's co-ordinate system, and instead of rotating the normals, transform the camera position by the inverse object transform, and then use that for dot products. That's only 3 multiplies. You can use that also to do light source shading. Rotate the light source instead of the normals, which is one rotation per object and a dot product per face.

As for Z-sorting for the draw order of the objects, it might be better to sort by distance (or distance squared) and then do some smooth sort such as insertion sort, which has to do less work the fewer objects change order from one frame to the next (best case O(n)). That way, just spinning the camera round won't lead to any change in the ordering, and as long as object and camera motion is fairly sensible, the draw order won't radically change from one frame to the next so the sorting should stay near-linear.
Mrs Beanbag is offline  
Old 21 August 2015, 05:38   #32
ReadOnlyCat
Code Kitten

 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 46
Posts: 1,012
Quote:
Originally Posted by Mrs Beanbag View Post
of course the other way to do it is to pre-calculate all the normals in the object's co-ordinate system, and instead of rotating the normals, transform the camera position by the inverse object transform, and then use that for dot products. That's only 3 multiplies. You can use that also to do light source shading. Rotate the light source instead of the normals, which is one rotation per object and a dot product per face.
In my opinion that is still too much work to do in real time. I would preprocess the objects to produce visibility lists (one bit per face) depending on the angle. This way you only have to compute three dot products per object to determine which list to select and you immediately know which faces you can draw. The cost in memory is minimal (nb-faces bits * nb-view-angles / 8) and the drawing order can probably inferred similarly (4bits per face if restricted to 16 faces).
The visibility list for a 16 faces object would be fetched in a single access.

If an object has a complex geometry, just split it up until the parts can be reasoned about in the same way.

Remember the golden rule: "the fastest computation is the one you don't perform".

Quote:
Originally Posted by Mrs Beanbag View Post
As for Z-sorting for the draw order of the objects, it might be better to sort by distance (or distance squared) and then do some smooth sort such as insertion sort, which has to do less work the fewer objects change order from one frame to the next (best case O(n)). That way, just spinning the camera round won't lead to any change in the ordering, and as long as object and camera motion is fairly sensible, the draw order won't radically change from one frame to the next so the sorting should stay near-linear.
The same principle can be used here: encode the scene semantically and use this higher level knowledge to avoid computations.
Clustering objects by proximity allows to reduce the range on which to apply sorting: sort the groups by Z squared, then inside each group sort the objects.
Best case stays at O(n), but worst and average cases become much better.
You can also avoid sorting if you have higher level knowledge about object trajectories: if two objects move at constant speed the moment they reach equal Z can be computed in advance. It's a very specific and unlikely example but you get the idea: take one step back to see the bigger picture and you can remove a lot of work.

A bit more on-topic: lately I have been thinking about reducing blitter bandwidth when blitting polygons and Mrs Beanbag's suggestion to reorder the palette struck a chord.
It seems possible to reduce the number of bitplanes to draw by subdividing the screen in horizontal sections, each section with a roughly constant number of colors drawn. Areas with only a few colors drawn can have their palette reordered by the copper to blit on the minimum number of bitplanes possible.
This is not Amiga specific per se but the copper makes this extremely low overhead.

I haven't gotten into detailing the needed computations but I would say that this is at least worth considering.
ReadOnlyCat is offline  
AdSense AdSense  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Blitter poly-line draw & filling - the 100th Herpes Coders. Asm / Hardware 12 14 May 2014 02:34
CPU Filling vs. Blitter Filling Routine victim Coders. General 18 26 January 2014 03:15
Blitter filling routine used in games Codetapper Coders. General 2 26 January 2012 11:20
Filling with the blitter... Lonewolf10 Coders. Tutorials 7 13 September 2011 15:30
Would you like some polygons? Dalai Retrogaming General Discussion 7 05 October 2004 01:20

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 05:57.


Powered by vBulletin® Version 3.8.8 Beta 1
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Page generated in 0.35370 seconds with 11 queries