In my processor design, MOVEM was easier to support than MOVEP. That project is now dead, was never a complete 68k processor, and was optimized for high clock speeds (for an FPGA, that is), so it may not be representative. Still:
MOVEM is only complicated in that it uses a special format (a register mask) to indicate which registers to store or load. Otherwise it consists of straight stores/loads using increment or decrement mode. This is handled by a sequencer running in parallel with the decoder.
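To illustrate how little is special about MOVEM, here is a minimal Python sketch of what such a sequencer has to produce. It models only the architectural behaviour (the 16-bit register mask and the address stepping), not my actual hardware; the names `REGS` and `movem_accesses` are made up for this example.

```python
# Hypothetical model of the MOVEM register-mask expansion (not real hardware).
REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]

def movem_accesses(mask, address, size=4, predecrement=False):
    """Return ([(register, address), ...], final_address) for a MOVEM.

    The mask is always scanned from bit 0 upward. In predecrement mode
    the bit order is reversed (bit 0 = A7 ... bit 15 = D0) and the
    address is decremented before each transfer; otherwise bit 0 = D0
    ... bit 15 = A7 and the address is incremented after each transfer.
    """
    accesses = []
    for bit in range(16):
        if not mask & (1 << bit):
            continue
        if predecrement:
            address -= size
            accesses.append((REGS[15 - bit], address))
        else:
            accesses.append((REGS[bit], address))
            address += size
    return accesses, address
```

Each entry in the returned list is an ordinary full-width store or load, which is why the normal memory path can be reused unchanged; only the mask decoding is new.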
MOVEP, in comparison, requires splitting/concatenating register data, something not used elsewhere in the design. This means one either has to complicate the cache access path or use µops plus a temporary register to handle it. It also stores/loads bytes while increasing the address by two, so this requires special handling.
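For comparison, a rough Python sketch of what MOVEP does architecturally. Again this is only a behavioural model under my own assumptions: memory is a plain dict of byte values, and `movep_store` / `movep_load` are hypothetical names, not anything from the actual design.

```python
# Hypothetical behavioural model of MOVEP (memory is a dict: address -> byte).

def movep_store(memory, value, address, size=4):
    """Split a register into bytes, high-order byte first, and store
    them at every other address (address, address+2, ...)."""
    for i in range(size):
        shift = 8 * (size - 1 - i)
        memory[address + 2 * i] = (value >> shift) & 0xFF

def movep_load(memory, address, size=4, old_value=0):
    """Concatenate bytes from every other address into a register value.
    A word-sized load only replaces the low 16 bits of the register."""
    result = 0
    for i in range(size):
        result = (result << 8) | memory[address + 2 * i]
    if size == 2:
        return (old_value & 0xFFFF0000) | result
    return result
```

Note that every iteration is a byte access with a shift or merge on the register side and a step of two on the address side. None of that is needed by ordinary loads and stores, which is exactly the extra work in the data path.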
In short, MOVEP touches more critical spots than MOVEM. For a speed-demon design this can be a huge problem.