I found the attached text file recently while poking around in an old source archive, and I thought it might be interesting. It explains how the "hardware chunky" mode that VK uses works, and includes some asm source code to show how it's done in practice.
Make of it what you will.