Would probably still eat up a lot of DMA time, wouldn't it?
Well here's the thing.

Currently I have to do 3 blits for each background plane; each one is 19 words wide by up to 184 pixels deep (3,496 words of data).

So each frame it does 3 blits of 3,496 words (10,488 words in total) to render the entire background with the blitter.

[Incidentally, if you look closely at round 5 of Rygar, I just about have enough cycles at the start of the frame for it not to need a double buffer.]

But with the hardware sprite technique described I would need to copy in 5 words by 184 depth (3,680 words of data) for each sprite; multiply that by the 4 sprites and it's 14,720 words.

So the way I look at things: 10,488 words are processed with my current solution versus 14,720 with my proposed solution. However, and this is the kicker, if I double buffer the hardware sprites and run the parallax at 25 frames per second (which is pretty normal for a background layer), then I can halve the words processed per frame, cutting it down to 7,360.
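For anyone who wants to check the sums, here is a rough sketch of the arithmetic in Python. It only models the word counts quoted above, not real blitter or DMA timing. The function and constant names are made up for illustration, and the 3,680 words per sprite is taken as quoted rather than recomputed from the 5-word by 184-line dimensions, which on their own multiply out to 920.

```python
# Word counts per frame, using the figures quoted in the post.
# All names here are illustrative, not taken from any real codebase.

BG_WIDTH_WORDS = 19       # width of one background blit, in words
BG_DEPTH_LINES = 184      # maximum depth of one background blit, in lines
BG_BLITS = 3              # blits needed for the background each frame

WORDS_PER_SPRITE = 3680   # per-sprite copy size, taken as stated in the post
SPRITES = 4               # hardware sprites used for the parallax layer


def blitter_words_per_frame():
    """Words processed per frame by the current blitter approach."""
    return BG_BLITS * BG_WIDTH_WORDS * BG_DEPTH_LINES


def sprite_words_per_frame(frames_per_update=1):
    """Words copied per frame when a full refresh of all sprites is
    spread over `frames_per_update` frames (2 = every other frame)."""
    return WORDS_PER_SPRITE * SPRITES // frames_per_update


if __name__ == "__main__":
    print("blitter, every frame:       ", blitter_words_per_frame())
    print("sprites, every frame:       ", sprite_words_per_frame(1))
    print("sprites, every other frame: ", sprite_words_per_frame(2))
```

With the quoted figures this gives 10,488 for the blitter, 14,720 for a full sprite refresh every frame, and 7,360 when the refresh is split over two frames.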

So on frame 1, update buffer 1 with sprites 0 and 1; on frame 2, update buffer 2 with sprites 2 and 3; then loop.
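The alternating schedule could be sketched like this in Python. It is just a model of which buffer and which pair of sprites get touched on a given frame, assuming a 1-based frame counter as in the description above; the function name is hypothetical and nothing here talks to the hardware.

```python
def sprites_for_frame(frame):
    """Return (buffer number, sprites to refresh) for a 1-based frame counter.

    Odd frames refresh buffer 1 with sprites 0 and 1,
    even frames refresh buffer 2 with sprites 2 and 3.
    """
    phase = (frame - 1) % 2            # 0 on odd frames, 1 on even frames
    buffer_number = phase + 1          # buffer 1 or buffer 2
    sprites = (2 * phase, 2 * phase + 1)
    return buffer_number, sprites


if __name__ == "__main__":
    for frame in range(1, 5):
        buffer_number, sprites = sprites_for_frame(frame)
        print("frame", frame, "-> buffer", buffer_number, "sprites", sprites)
```

Each frame only copies two of the four sprites, which is where the halving of the per-frame word count comes from.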

I have a hangover today, so all this might be just a load of shit that I'm thinking...but it feels do-able.
