I don't think just grafting in the loader would be that difficult in theory. Apart from RAM, the only real hassle I had in Uridium32 was with INT2 (the music player clashed with the loader).
But in practice, as Codetapper said, the bigger challenge would be to make the memory map less fragmented. This would need to be done first, to provide space for the loader and its buffers so they wouldn't be overwritten. It would probably be best achieved by using WHDLoad as some kind of test harness and then removing the resload dependencies later.
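To make the fragmentation point a bit more concrete, here is a rough sketch in Python. It is not taken from the game: every address and size below is made up for illustration, and `free_gaps` and `first_fit` are hypothetical helper names. It lists the contiguous free gaps in a memory map and checks whether a loader plus buffers of a given size would fit in any single one of them.

```python
def free_gaps(used, mem_start, mem_end):
    """Return the contiguous free (start, end) ranges, end exclusive."""
    gaps = []
    cursor = mem_start
    for start, end in sorted(used):
        if start > cursor:
            gaps.append((cursor, start))
        # Overlapping or adjacent regions must not move the cursor backwards.
        cursor = max(cursor, end)
    if cursor < mem_end:
        gaps.append((cursor, mem_end))
    return gaps


def first_fit(gaps, size):
    """Return the start of the first gap that holds `size` bytes, else None."""
    for start, end in gaps:
        if end - start >= size:
            return start
    return None


# Hypothetical 512 KB map: these regions are invented, not the real layout.
USED = [(0x00000, 0x0C000), (0x10000, 0x3A000), (0x3C000, 0x7E000)]

gaps = free_gaps(USED, 0x00000, 0x80000)
total_free = sum(end - start for start, end in gaps)

# Assumed size for loader + buffers, again purely illustrative.
LOADER_AND_BUFFERS = 0x6000

for start, end in gaps:
    print(f"free ${start:05X}-${end:05X} ({end - start} bytes)")
print("total free:", total_free, "fits:", first_fit(gaps, LOADER_AND_BUFFERS))
```

With this made-up map there is more free memory in total than the loader needs, but no single gap is big enough, which is the sort of situation that defragmenting the map would fix.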
If I can help you guys out with anything, give me a nudge.