Unfortunately, Voodoo 3 also supports configurable byteswapping, and it is used in some modes.
Modes that currently show correct colors can keep direct VRAM access and should be fast enough. Modes that currently show wrong colors will need indirect VRAM access (every access goes through a byteswapping handler) and will become slower.
btw, at least Picasso96 3.x allows two 32-bit modes, ARGB and BGRA. One has correct colors, the other does not.
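To illustrate the difference between the two paths, here is a rough sketch in C. It is not the actual emulator code: swap32() and vram_write32() are made-up names, and the "byteswap" flag simply stands in for whatever the Voodoo 3 byteswap configuration selects. It assumes a 32-bit ARGB pixel is held as 0xAARRGGBB, so reversing the byte order gives the BGRA layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit pixel.
 * With the pixel held as 0xAARRGGBB this yields 0xBBGGRRAA,
 * i.e. ARGB <-> BGRA. */
static inline uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}

/* Hypothetical indirect write handler: every 32-bit write goes through
 * this function, so the swap can be applied per access. This is the
 * slow path. Without byteswapping the guest could be given a direct
 * pointer into vram and no handler would be called at all. */
static void vram_write32(uint8_t *vram, size_t offset, uint32_t value,
                         int byteswap)
{
    if (byteswap)
        value = swap32(value);

    /* Store most significant byte first (big-endian guest order). */
    vram[offset + 0] = (uint8_t)(value >> 24);
    vram[offset + 1] = (uint8_t)(value >> 16);
    vram[offset + 2] = (uint8_t)(value >> 8);
    vram[offset + 3] = (uint8_t)value;
}
```

The per-access function call and conditional swap are what make the indirect path slower than a plain memory mapping.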