Datatypes is an interesting idea; the only problem with it is that Woopsi isn't an OS, it's a GUI library. The larger I make the library, the smaller the applications that use it will have to be (DS homebrew ROMs have a size limit of 4MB). I'm looking into Amiga bitmap font support, though. It might be that I write loads of functionality, but make it easy to strip out the bits people don't need, or just release several packages - light, normal, full-fat.
Regarding horizontal window movement, I didn't implement solid window moving. Well, I did, but only in the slow full-window redraw way, which is commented out in the code in favour of the XOR rectangle.
Solid window moving is a fairly easy problem to solve, though, so I might add it back in at some point. If a window is moved vertically by any number of pixels, the standard DMA copy routine will work. If we move up, we start at the first row of pixels in the window and copy them to the new location, then work down - any pixels overwritten have already been copied. If we move the window down, we start at the bottom row of pixels and copy them to the new location. Same idea - any overwritten pixels are already copied.
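A rough sketch of the row ordering described above, in C++. This is not Woopsi's actual code: `copyPixels` is a hypothetical stand-in for the DMA copy routine (it uses `std::memcpy` here), and the flat 16-bit framebuffer layout and the name `moveRectVertically` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the DMA copy routine. Source and destination
// must not overlap, which holds here because they are on different rows.
inline void copyPixels(const std::uint16_t* src, std::uint16_t* dst, std::size_t count) {
    std::memcpy(dst, src, count * sizeof(std::uint16_t));
}

// Moves the w x h rectangle at (x, y) by dy rows within a framebuffer that is
// fbWidth pixels wide. Negative dy moves up, positive dy moves down.
// The area the rectangle leaves behind is not cleared.
void moveRectVertically(std::vector<std::uint16_t>& fb, int fbWidth,
                        int x, int y, int w, int h, int dy) {
    assert(dy != 0);
    assert(x >= 0 && x + w <= fbWidth && y >= 0 && y + dy >= 0);
    assert(static_cast<std::size_t>(std::max(y, y + dy) + h) * fbWidth <= fb.size());

    if (dy < 0) {
        // Moving up: start at the top row and work down, so any row we
        // overwrite has already been copied.
        for (int row = 0; row < h; ++row) {
            copyPixels(&fb[(y + row) * fbWidth + x],
                       &fb[(y + row + dy) * fbWidth + x], w);
        }
    } else {
        // Moving down: start at the bottom row and work up.
        for (int row = h - 1; row >= 0; --row) {
            copyPixels(&fb[(y + row) * fbWidth + x],
                       &fb[(y + row + dy) * fbWidth + x], w);
        }
    }
}
```

Getting the direction wrong is easy to spot: moving down while iterating from the top would smear the first row across the whole window.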
When moving just horizontally, we work out how far the window is moving. If it's moving at least the width of the window, we can just do a straight DMA copy, since the source and destination don't overlap. If we move 10 pixels to the right (for example) and the window is 100 pixels wide, we know that we can copy the right-most 10 pixels to the new location without overwriting anything, so we do that. The next 10 pixels to the left can then be copied into the space the first 10 came from, since those have already been copied, so we do that. We continue to loop until all of the pixels have been copied - in this case, we need 10 iterations per row. Moving left is the same, except we start with the left-most 10 pixels.
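Here is a minimal C++ sketch of that chunked copy for a single row of pixels. Again, `copyPixels` is a hypothetical stand-in for the DMA routine (plain `std::memcpy` here) and `moveRowHorizontally` is an illustrative name, not part of Woopsi. It returns the number of copies issued so the cost is easy to see.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the DMA copy routine; regions must not overlap.
inline void copyPixels(const std::uint16_t* src, std::uint16_t* dst, std::size_t count) {
    std::memcpy(dst, src, count * sizeof(std::uint16_t));
}

// Moves w pixels starting at row[x] by dx pixels (negative = left).
// The caller must make sure the destination lies inside the row.
// Returns how many copy operations were needed.
int moveRowHorizontally(std::uint16_t* row, int x, int w, int dx) {
    assert(dx != 0 && w > 0 && x >= 0 && x + dx >= 0);
    const int step = std::abs(dx);
    int copies = 0;

    if (step >= w) {
        // Source and destination don't overlap: one straight copy.
        copyPixels(row + x, row + x + dx, w);
        return 1;
    }

    if (dx > 0) {
        // Moving right: start with the right-most chunk and work left.
        for (int end = w; end > 0; end -= step) {
            const int start = std::max(0, end - step);
            copyPixels(row + x + start, row + x + start + dx, end - start);
            ++copies;
        }
    } else {
        // Moving left: start with the left-most chunk and work right.
        for (int start = 0; start < w; start += step) {
            const int len = std::min(step, w - start);
            copyPixels(row + x + start, row + x + start + dx, len);
            ++copies;
        }
    }
    return copies;
}
```

Each chunk is never longer than the distance moved, so a chunk's source and destination never overlap and a plain non-overlapping copy is safe. When the width isn't an exact multiple of the distance, the final chunk is simply shorter.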
You can see that larger movements are more efficient than smaller movements - moving the window in the above example by 1 pixel results in 100 iterations of the copy routine per row of pixels. However, since we're just copying memory and not having to completely recalculate the window, it should be many times faster than the code I've currently disabled.
An even better approach would be to buffer every window in RAM, then use a simple one-copy-per-row method of blitting the window to the screen. The major problem with this is that each window would then require much more RAM than it does at present (current RAM usage of each window is fairly negligible), and as RAM is also limited, the trade-off for eye candy isn't worth it.
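To make the trade-off concrete, here is a hedged C++ sketch of what a buffered window might look like. The `BufferedWindow` struct, `blitWindow` and `bufferBytes` are hypothetical names, not Woopsi classes, and 16-bit pixels are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical buffered window: every pixel of the window lives in RAM.
struct BufferedWindow {
    int x, y, width, height;
    std::vector<std::uint16_t> pixels;  // width * height entries

    BufferedWindow(int x_, int y_, int w, int h)
        : x(x_), y(y_), width(w), height(h),
          pixels(static_cast<std::size_t>(w) * h) {}
};

// RAM needed just for the pixel buffer, assuming 16-bit pixels.
inline std::size_t bufferBytes(const BufferedWindow& win) {
    return win.pixels.size() * sizeof(std::uint16_t);
}

// Blits the window to the framebuffer with one copy per row. The window's
// buffer and the framebuffer are separate memory, so they can never overlap
// and the direction of the move doesn't matter.
void blitWindow(const BufferedWindow& win, std::vector<std::uint16_t>& fb, int fbWidth) {
    assert(win.x >= 0 && win.y >= 0 && win.x + win.width <= fbWidth);
    assert(static_cast<std::size_t>(win.y + win.height) * fbWidth <= fb.size());
    for (int row = 0; row < win.height; ++row) {
        std::memcpy(&fb[(win.y + row) * fbWidth + win.x],
                    &win.pixels[static_cast<std::size_t>(row) * win.width],
                    win.width * sizeof(std::uint16_t));
    }
}
```

Under these assumptions a window covering a 256x192 area needs 256 * 192 * 2 = 98,304 bytes for its buffer alone, and that cost is paid for every open window.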