05 January 2018, 19:19   #1
britelite
Optimizing Wolf3D-style rendering

As promised in the other Wolfenstein 3D thread, I've started a new thread for a more technical discussion on texture-mapped Wolf3D-style rendering. Let's keep the discussion on topic and leave daydreaming (and preferably discussion of non-texture-mapped approaches) to another thread.

I built another preview of my routine (http://dekadence64.org/wolf3d_v2.lha), which should run at least a bit smoother than the previous one. Still room for more optimizing though.

I also included a binary blob of raycast data (wolf.3d), in case anyone wants to try out the wall rendering without having to write a raycaster. The format is pretty simple: 1024 frames of 320 bytes each. Every frame consists of 2 bytes per slice (160 slices in total), the first byte being the height of the wall (0-127) and the second byte being the texture u-coordinate (0-63 for texture one, 64-127 for texture two; both textures are 64x64 in size).
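For anyone who wants to poke at the blob from C first, here's a minimal sketch of decoding one slice. The struct and function names are made up for this example, and it assumes the frames are stored back to back with the slices in screen order, which the description above suggests but doesn't spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SLICES_PER_FRAME = 160, BYTES_PER_FRAME = 320, FRAME_COUNT = 1024 };

/* Hypothetical decoded form of one 2-byte slice record. */
typedef struct {
    uint8_t height;   /* wall height, 0-127 */
    uint8_t texture;  /* 0 = texture one, 1 = texture two */
    uint8_t u;        /* column inside the 64x64 texture, 0-63 */
} Slice;

/* Decode slice `slice` of frame `frame` from the raw wolf.3d bytes.
   Assumes frames are contiguous and slices are stored in order. */
static Slice decode_slice(const uint8_t *data, size_t frame, size_t slice)
{
    assert(frame < FRAME_COUNT && slice < SLICES_PER_FRAME);
    const uint8_t *p = data + frame * BYTES_PER_FRAME + slice * 2;
    Slice s;
    s.height  = p[0];
    s.texture = (uint8_t)(p[1] >> 6);   /* 0-63 -> 0, 64-127 -> 1 */
    s.u       = (uint8_t)(p[1] & 63);   /* column within that texture */
    return s;
}
```

With the whole file loaded into memory, decode_slice(data, 0, 0) would give the leftmost slice of the first frame under those assumptions.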

So, about the wall rendering. The simple approach would be to have unrolled loops of code for rendering the wall from top to bottom, basically amounting to a lot of:
Code:
...
move.b 0(a1),0(a0) ; source steps 64 bytes (one texture row), destination steps 160 bytes (one buffer row)
move.b 64(a1),160(a0)
move.b 128(a1),320(a0)
move.b 192(a1),480(a0)
...
For slightly better performance I'm not rendering top-down, but instead from the middle towards top and bottom (which makes generating the code on the fly slightly easier). Also, I've rotated the textures 90 degrees, so that I can in some cases read from the texture without offsets, making use of post increments.
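As a plain reference for what the generated code ends up doing, here's a rough C sketch of drawing one slice from the middle outwards. It is not the actual routine: the buffer height of 128 rows, the function name and the rounding of the texel index are all assumptions for illustration. It does assume the rotated texture layout, so the 64 texels of one slice sit next to each other in memory.

```c
#include <assert.h>
#include <stdint.h>

/* BUF_W follows from the 160(a0) offsets; BUF_H is an assumed value. */
enum { BUF_W = 160, BUF_H = 128, TEX_SIZE = 64 };

/* Draw one wall slice, working outwards from the horizontal centre line.
   tex_col points at 64 consecutive texels (texture stored rotated).
   half is the number of rows drawn above and below the centre. */
static void draw_slice(uint8_t *buf, int x, const uint8_t *tex_col, int half)
{
    assert(x >= 0 && x < BUF_W && half >= 0);
    int visible = half < BUF_H / 2 ? half : BUF_H / 2;   /* clip to the buffer */
    for (int i = 0; i < visible; i++) {
        int t = i * (TEX_SIZE / 2) / half;               /* 0..31 */
        /* lower half walks texels 32..63 downwards */
        buf[(BUF_H / 2 + i) * BUF_W + x]     = tex_col[TEX_SIZE / 2 + t];
        /* upper half walks texels 31..0 upwards */
        buf[(BUF_H / 2 - 1 - i) * BUF_W + x] = tex_col[TEX_SIZE / 2 - 1 - t];
    }
}
```

Both halves use the same texel step, so only one set of offsets has to be worked out per wall height, which is roughly why the middle-out order is convenient for code generation.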

So, for cases where the wall size is smaller than the height of the texture, I still use offsets. But when we start stretching the texture (wall size 64 and above), I drop the offsets and start doing post increments instead. So for example for doubled size I'd do:
Code:
...
move.b (a1),0(a0) ; zoom factor 2.0, each texel drawn two times
move.b (a1)+,160(a0)
move.b (a1),320(a0)
move.b (a1)+,480(a0)
...
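Since this code gets generated anyway, here's a small C sketch of how the stretched case could be emitted as source text. The function name and the texel mapping (i * texels / rows) are assumptions for the example; the real generator may round differently and would emit opcodes rather than text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { BUF_W = 160 };   /* bytes per chunky buffer row, from the 160(a0) offsets */

/* Write unrolled 68k source for `rows` screen rows fed by `texels` texels
   (rows >= texels, i.e. zoom factor >= 1.0). Returns characters written. */
static size_t emit_stretch(char *out, size_t cap, int rows, int texels)
{
    assert(texels > 0 && rows >= texels);
    size_t len = 0;
    for (int i = 0; i < rows; i++) {
        int t      = i * texels / rows;         /* texel used by this row */
        int t_next = (i + 1) * texels / rows;   /* texel used by the next row */
        /* Advance a1 only on the last row that uses the current texel. */
        const char *src = (t_next != t) ? "(a1)+" : "(a1)";
        int n = snprintf(out + len, cap - len, "move.b %s,%d(a0)\n", src, i * BUF_W);
        assert(n > 0 && (size_t)n < cap - len);   /* output must fit */
        len += (size_t)n;
    }
    return len;
}
```

Calling emit_stretch(buf, sizeof buf, 4, 2) reproduces the four zoom factor 2.0 lines shown above.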
A further improvement could be to optimize zoom factors >2.0 like this:
Code:
...
move.b (a1)+,d2 ; zoom factor of 3.0, each texel drawn three times
move.b d2,0(a0)
move.b d2,160(a0)
move.b d2,320(a0)
move.b (a1)+,d2
move.b d2,480(a0)
...
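To see why this only pays off above 2.0, here's the arithmetic as a tiny C sketch. The cycle counts are the standard 68000 figures without wait states; the thread doesn't say which CPU is targeted, so treat that as an assumption (the numbers look different on a 68020 and up, or with chip RAM contention).

```c
#include <assert.h>

/* Assumed 68000 timings in CPU cycles, no wait states. */
enum {
    CYC_MEM_TO_DISP = 16,  /* move.b (a1),d16(a0) or move.b (a1)+,d16(a0) */
    CYC_MEM_TO_REG  = 8,   /* move.b (a1)+,d2 */
    CYC_REG_TO_DISP = 12   /* move.b d2,d16(a0) */
};

/* Cost of drawing one texel `repeats` times straight from the texture. */
static int cycles_direct(int repeats)
{
    assert(repeats > 0);
    return repeats * CYC_MEM_TO_DISP;
}

/* Cost of loading the texel into d2 once, then storing it `repeats` times. */
static int cycles_cached(int repeats)
{
    assert(repeats > 0);
    return CYC_MEM_TO_REG + repeats * CYC_REG_TO_DISP;
}
```

Under those assumptions both variants cost 32 cycles per texel at two repeats, and the register version is ahead by 4 cycles for every repeat beyond that (44 against 48 at three repeats).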
It could also be a good idea to have a look at the cases for zoom factor <1.0, to see if a combination of post increments and addq.l #value,a1 could speed up the rendering, like:
Code:
...
move.b (a1)+,0(a0)
move.b (a1)+,160(a0)
addq.l #1,a1 ; skipping an additional byte
move.b (a1)+,320(a0)
move.b (a1)+,480(a0)
...
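Whether this wins depends on how often a skip is needed. Here's a rough C sketch that counts cycles for both variants, again assuming plain 68000 timings without wait states, a texel mapping of i * texels / rows, and no skip emitted after the last row. The function names are made up for the example.

```c
#include <assert.h>

/* Assumed 68000 timings in CPU cycles, no wait states. */
enum {
    CYC_DISP_TO_DISP = 20,  /* move.b d16(a1),d16(a0) */
    CYC_POSTINC_DISP = 16,  /* move.b (a1)+,d16(a0) */
    CYC_ADDQ_AREG    = 8    /* addq.l #n,a1 with n in 1..8 */
};

/* Cycles for `rows` screen rows when every row uses source offsets. */
static int cycles_offsets(int rows)
{
    assert(rows > 0);
    return rows * CYC_DISP_TO_DISP;
}

/* Same rows using post-increments, plus one addq wherever texels are skipped.
   texels >= rows means zoom factor <= 1.0; the upper bound keeps every
   skip within the 1..8 range of a single addq. */
static int cycles_postinc(int rows, int texels)
{
    assert(rows > 0 && texels >= rows && texels <= 9 * rows);
    int total = 0;
    for (int i = 0; i < rows; i++) {
        int t      = i * texels / rows;         /* texel used by this row */
        int t_next = (i + 1) * texels / rows;   /* texel used by the next row */
        total += CYC_POSTINC_DISP;
        if (i + 1 < rows && t_next - t > 1)     /* texels skipped before next row */
            total += CYC_ADDQ_AREG;
    }
    return total;
}
```

For the pattern above (4 rows from 6 texels) this gives 72 cycles against 80 with offsets. Each row saves 4 cycles and each skip costs 8, so under these assumptions the post-increment version stays ahead as long as fewer than half of the rows need a skip, which works out to zoom factors above roughly 2/3.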
One idea I had would be to rotate the whole chunky buffer 90 degrees, which would let me write to the buffer through (a0)+ instead of offset(a0), saving a lot of cycles. That would of course require rewriting the c2p routine.
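A minimal C sketch of the two layouts, just to make the idea concrete. The buffer height of 128 and all names are assumptions for the example. On a plain 68000 without wait states, move.b (a1)+,(a0)+ takes 12 cycles against 16 for move.b (a1)+,d16(a0), which is where the saving would come from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BUF_W = 160, BUF_H = 128 };   /* BUF_H is an assumed value for this sketch */

/* Current layout: stepping down one row means a 160-byte stride. */
static size_t index_row_major(int x, int y)
{
    return (size_t)y * BUF_W + (size_t)x;
}

/* Rotated layout: each slice is BUF_H consecutive bytes. */
static size_t index_rotated(int x, int y)
{
    return (size_t)x * BUF_H + (size_t)y;
}

/* Fill the top `rows` pixels of slice x in the rotated layout with a moving
   pointer, the C counterpart of writing through (a0)+. */
static void fill_slice_rotated(uint8_t *buf, int x, const uint8_t *src, int rows)
{
    assert(x >= 0 && x < BUF_W && rows >= 0 && rows <= BUF_H);
    uint8_t *dst = buf + index_rotated(x, 0);
    for (int i = 0; i < rows; i++)
        *dst++ = src[i];
}
```

The catch is that a slice can then only be written in one direction per pointer, so the middle-out order would need two passes or a pre-decrement for the upper half, and the c2p routine has to read columns instead of rows.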

I hope my explanation made some sense. Let's see if anyone has other (hopefully better) approaches for this.