The problem with that chip is that you have to make the conversion manually.
I don't remember most of the details, but it was lame indeed.
It is basically a serial-to-parallel converter: you write the source chunky pixels in a specific order that triggers the hw (super cheap logic, since the reverse logic is already there on the video output), then read back the bitplane values.
It is considerably faster than anything you could write in sw on a stock A1200, but you still have to make tons of memory accesses to read and write the values; only the processing in the middle is handled for you. Since we are talking about a stock setup, all of that memory access is subject to bus arbitration on chip memory, which makes the whole experience painfully slow and wasteful.
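For anyone who has not seen what the conversion actually does, here is a rough sketch in plain C of the naive software version for one group of 8 pixels at 8 bits per pixel. This is only an illustration of the bit shuffling, not the chip's register interface and not an optimized routine; the function name `c2p_8pixels` and the leftmost-pixel-in-the-top-bit layout are my own assumptions for the example.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Converts 8 chunky pixels (one byte per pixel) into 8 bitplane bytes.
   Bit (7 - x) of planes[p] receives bit p of pixel x, so the leftmost
   pixel ends up in the most significant bit of each plane byte. */
static void c2p_8pixels(const uint8_t chunky[8], uint8_t planes[8])
{
    for (int p = 0; p < 8; p++) {          /* one output byte per bitplane */
        uint8_t out = 0;
        for (int x = 0; x < 8; x++) {      /* gather bit p from each pixel */
            out |= (uint8_t)(((chunky[x] >> p) & 1u) << (7 - x));
        }
        planes[p] = out;
    }
}
```

That is 64 shift-and-mask steps for 8 pixels, which is the part the hw takes over. The reads of the chunky source and the writes of the planar result are still yours to do.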
A dedicated framebuffer would have helped a lot more.
One thing to note: c2p was only needed to compensate for the p2c (actually s2p vs p2s) conversion already present in the hw...
So even better for the miggy architecture would have been a trigger to turn the already existing p2c hw on or off, since that hw is the reason c2p conversion was needed in the first place.