Nice. I tried it with a few modules and it seems to sound fine (but I'm not a musician). I had a brief look at the source code, and I think you can speed up the mixing code with a relatively simple change. The mixing macros all look more or less like this:
Code:
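movem.w (a3,d2.l*X),d4{/d5}   ; load sample word(s), sign-extended to 32 bits; d2 is the index
;
add.w a4,d7                   ; add to the low word in d7, sets X on carry
addx.l d1,d2                  ; d2 = d2 + d1 + X, i.e. d2 is written last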
When unrolled, you get a 2-3 cycle (depending on X) "change/use" register stall on d2 (MC68060UM §10.3), because addx.l writes d2 right before the next movem.w uses it as an index register. If you move the add/addx sequence a few instructions up (which I think should be possible), the stall can be avoided.
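Roughly what I mean, as an untested sketch. The scale X and the elided body are the same placeholders as above, and I'm assuming the body only works on d4/d5 and never reads d2 or changes the X flag between the add and the addx:
Code:
movem.w (a3,d2.l*X),d4{/d5}   ; d2 is read here as the index
add.w a4,d7                   ; moved up: sets X on carry
addx.l d1,d2                  ; moved up: d2 is updated early
;                             ; rest of the macro body (assumed not to touch d2)
movem.w (a3,d2.l*X),d4{/d5}   ; next unrolled iteration: d2 has had time to settle
How far up the pair needs to go depends on how many cycles the body takes, so it is worth counting against the 2-3 cycles mentioned above.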
Also, move.w (...),d4 \ ext.l d4 is probably faster than movem.w (...),d4, since ext can execute in either pipeline (pOEP|sOEP), but that would require more changes.
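For the single-register case the replacement would look something like the following (untested, X is the same placeholder scale as above). The two-register form would need a second load, which I'm assuming would sit at offset 2 since movem.w reads consecutive words:
Code:
move.w (a3,d2.l*X),d4         ; load one sample word into the low half of d4
ext.l d4                      ; sign-extend to 32 bits, same value movem.w leaves in d4
; hypothetical second load for the d4/d5 variant:
move.w 2(a3,d2.l*X),d5        ; word following the first one
ext.l d5
One thing to watch: unlike movem, both move and ext change N/Z/V/C (X is left alone), so any code that relies on the condition codes surviving the load would have to be adjusted. That is part of the "more changes".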