With all channels enabled in area mode the blitter will use every available cycle, so there are no "free" (unused) ones.
However, if you don't have blitter nasty enabled, you will find that you can actually fit in more productive work "for free" since the memory cycles "stolen" by the CPU would otherwise go to waste waiting for the blitter to finish.
The code before "WAITBLIT" needs 29 memory cycles and will complete after roughly 150 CCKs (the pattern is: internal cycles, wait 3 CCKs, memory cycle), while your blit needs 16*5*4 = 320 cycles.
Inside the display area, with 5 bitplanes each doing 22 fetches per line, there will be 223 - 22*5 (display DMA) - 227/5 (CPU stealing every fifth cycle) = 68 or so cycles per line left for the blitter, so the blit will take ~4.7 scanlines to finish (~1.8 outside the display area).
So if my estimates are right, the CPU can get in around 51 memory accesses outside the display area, and 183 inside it (minus 8 for the blitter wait loop), without slowing things down.
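
If you want to play with the numbers, here is a rough Python sketch of the arithmetic above. The constants are the ones quoted in this post. The way the spare accesses are derived (blit duration in CCKs, minus the ~150 CCKs the existing code takes, at about one memory access per 5 CCKs) is my reconstruction of the estimate and should be treated as an assumption, as should the function and variable names. Because it does not round the scanline counts first, it lands within a couple of accesses of the figures quoted rather than exactly on them.

```python
# Rough estimate of how much CPU work fits under a blit (blitter nasty off).
# All constants come from the post; the spare-access formula is an assumption.

CCKS_PER_LINE = 227            # colour clocks per scanline
SLOTS_PER_LINE = 223           # usable slots per line, as used in the post
DISPLAY_FETCHES = 22 * 5       # 5 bitplanes, 22 fetches each per line
CPU_SHARE = CCKS_PER_LINE / 5  # CPU steals every fifth cycle
BLIT_CYCLES = 16 * 5 * 4       # 320 cycles for the blit
CODE_CCKS = 150                # time taken by the code before WAITBLIT
CCKS_PER_CPU_ACCESS = 5        # assumed: ~150 CCKs for 29 accesses


def estimate(in_display):
    """Return (blitter slots per line, scanlines for the blit, spare CPU accesses)."""
    display = DISPLAY_FETCHES if in_display else 0
    blit_slots = SLOTS_PER_LINE - display - CPU_SHARE
    lines = BLIT_CYCLES / blit_slots
    blit_ccks = lines * CCKS_PER_LINE
    # Assumed model: once the existing code is done, the CPU manages
    # one memory access per CCKS_PER_CPU_ACCESS until the blit finishes.
    spare = (blit_ccks - CODE_CCKS) / CCKS_PER_CPU_ACCESS
    return blit_slots, lines, spare


if __name__ == "__main__":
    for label, in_display in (("inside display", True), ("outside display", False)):
        slots, lines, spare = estimate(in_display)
        print(f"{label}: {slots:.1f} slots/line, {lines:.1f} lines, "
              f"~{spare:.0f} spare accesses (before the -8 for the wait loop)")
```

Changing the number of bitplanes or the blit size in the constants gives a quick feel for how the budget shifts, but the real test is still measuring on hardware.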
Try putting 40 NOPs before WAITBLIT and see whether it affects performance noticeably.