This was very impressive. The main techniques are: organizing the pixels in a blitter-friendly interleaved format (2-pass blitter C2P, i.e. chunky-to-planar conversion), very tight pre-compiled CPU scalers that use the stack register, and running the blitter completely in parallel with the CPU-based rendering.
There were also some game-specific tricks (how the visibility is pre-computed, allowing only 3 ceiling heights and pre-computing each case, etc.).
I also liked his Wolf3D style demo with the blitter scaler. It looked kind of janky in motion, but using the blitter to do scaling was an interesting idea.
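For anyone who hasn't run into C2P before, here is a rough sketch of what the conversion has to compute. To be clear, this is a naive CPU loop I'm writing purely for illustration, not the 2-pass blitter method from the demo; the function name `c2p_naive` and its parameters are made up for this example, and it assumes one byte per chunky pixel with the leftmost pixel stored in the most significant bit of each plane byte.

```c
#include <stdint.h>
#include <stddef.h>

/* Naive chunky-to-planar conversion (illustrative only, hypothetical helper).
 * chunky: `count` pixels, one byte each; count must be a multiple of 8.
 * planes: `depth` output buffers, each holding count / 8 bytes.
 * Bit p of every pixel value ends up in bitplane p. */
void c2p_naive(const uint8_t *chunky, size_t count,
               uint8_t *planes[], int depth)
{
    for (size_t i = 0; i < count; i += 8) {
        for (int p = 0; p < depth; p++) {
            uint8_t out = 0;
            /* Gather bit p from 8 consecutive pixels, leftmost pixel first,
             * so it lands in the most significant bit of the plane byte. */
            for (int x = 0; x < 8; x++) {
                out = (uint8_t)((out << 1) | ((chunky[i + x] >> p) & 1));
            }
            planes[p][i / 8] = out;
        }
    }
}
```

Doing this bit shuffling per pixel on the CPU is slow, which is why pushing it onto the blitter while the CPU keeps rendering matters so much.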