Basically it's just mixing the samples: 2 samples -> 1 sample (simplest approach: add both sample bytes and divide by 2). That way you can have 8 virtual channels even though only 4 real channels are available, since each real channel plays the mix of two virtual ones. Check the StarTrekker replayer for an example.
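As a rough sketch of the add-and-halve idea (not taken from the StarTrekker replayer; the function name `mix_channels` and its parameters are made up for illustration), assuming signed 8-bit samples as the Amiga hardware plays them:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mix two signed 8-bit sample buffers into one
 * by averaging each pair of samples. */
void mix_channels(const int8_t *a, const int8_t *b, int8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Widen to int first so the sum cannot overflow 8 bits. */
        int sum = (int)a[i] + (int)b[i];
        /* Halving brings the result back into the -128..127 range. */
        out[i] = (int8_t)(sum / 2);
    }
}
```

You would call this once per pair of virtual channels and feed each `out` buffer to one real channel. Note that halving also halves the volume of each voice, which is the price of this simplest approach.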