Thanks Alex. I found it via reverse engineering. The transpose values are stored per voice, where a voice is a single division in a pattern.
The note transpose is just added to the note index, so it simply changes the note. I think the reason is that voices can reuse notes, which may save some bytes.
The sound transpose is just added to the note's instrument index, so you can also change the instrument while reusing a note.
Moreover, each note has some flags that control whether it is transposable: one bit for note transpose and one for sound transpose. By default a note is transposable in both ways.
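
To make this concrete, here is a rough Python sketch of how I understand the lookup. The names, the bit positions, and the idea that a set bit disables a transpose (so flags of 0 means transposable both ways) are my own assumptions for illustration, not the actual layout of the format.

```python
from dataclasses import dataclass

# Hypothetical bit positions; the real flag layout is not described here.
# Assumption: a set bit DISABLES that transpose, so flags == 0 is the default.
NO_NOTE_TRANSPOSE = 0x01
NO_SOUND_TRANSPOSE = 0x02


@dataclass
class Note:
    note_index: int
    instrument_index: int
    flags: int = 0  # default: transposable in both ways


@dataclass
class Voice:
    # Transpose values are stored per voice (one division in a pattern).
    note_transpose: int = 0
    sound_transpose: int = 0


def resolve(note: Note, voice: Voice) -> tuple[int, int]:
    """Return (note index, instrument index) after applying the voice transposes."""
    note_index = note.note_index
    instrument_index = note.instrument_index
    if not note.flags & NO_NOTE_TRANSPOSE:
        note_index += voice.note_transpose
    if not note.flags & NO_SOUND_TRANSPOSE:
        instrument_index += voice.sound_transpose
    return note_index, instrument_index
```

With this, the same stored note can be reused by two voices with different transposes and come out as a different note or instrument in each, unless its flags lock it.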