More than 4 sound channels - how it works? - Page 6

musojon74 · 16 January 2016, 16:44

Slightly OT but related. I remember getting excited about ST Pipemania having sampled music. Then I ripped the samples and found that all the notes were stored as separate samples, as above.

pandy71 · 16 January 2016, 20:33

Quote:

Originally Posted by Mrs Beanbag

resample all the samples before playback, to all the actually used notes. Will use a lot of memory...

Partially - you can perform upsampling/downsampling on the fly, but CPU usage will still be high and the quality suboptimal.

ReadOnlyCat · 17 January 2016, 16:15

Quote:

Originally Posted by Megol

The Covox or the 8088? :P

A resistor ladder DAC coupled to an amplifier sounds surprisingly good for 8 bit playback, it's when one starts with 12..16 bits the resistor values begin to be critical IIRC.

If you want to hear something worse: [YouTube video]

Jump to 6:59 for a 4 channel playback routine on the PC speaker :P

Completely off topic but this 8088 demo is insanely impressive.
Just incredible.

Mrs Beanbag · 17 January 2016, 17:21

Quote:

Originally Posted by pandy71

Partially - you can perform upsampling/downsampling on the fly, but CPU usage will still be high and the quality suboptimal.

yes that's why i said do it before playback, not on the fly.

meynaf · 18 January 2016, 11:06

Quote:

Originally Posted by Megol

In itself it doesn't do too much, helps keeping cache misses down + allows some shortcuts like simplifying changing pitch. It's in combination with other tricks it starts paying off big.

This kind of trick is useful for 68000 where we don't have much processing power. If you have problems with "cache misses" then you don't need these tricks at all.

Quote:

Originally Posted by Megol

Nope. Think table lookup with a small data cache -> 64*256 = 16 KiB, 16*256 = 4 KiB.

You can't do that with a table lookup (or you have to say what exactly is in the table !).
And again, if you have a data cache, you don't really have the problem (your cpu is fast enough to do better).

Quote:

Originally Posted by Megol

It's been done before so... If one has short samples + memory not used by anything else then it can be useful. Sure, not a tool for arbitrary mixing but for playing a custom module.

I still don't see the point.

Quote:

Originally Posted by Megol

Maybe 68k but not for x86.

Not necessarily ! Prefetch can make that fail as well.

Quote:

Originally Posted by Megol

But the kind of SMC I'm talking about here uses large enough code blocks that flushing the cache isn't a problem. Perhaps I should call it dynamic code generation?

The time you'll take to generate the code will nullify the benefit i'm afraid...

Quote:

Originally Posted by Megol

Eh... Nope. Personal experience talking. More advantages the more channels are mixed of course.

Personal experience ? You've coded it ?

Quote:

Originally Posted by Megol

How do you handle per channel volumes if not by multiply? You may do the multiply via lookup tables but it's still there. In hardware one can "dither" samples but that is essentially multiplication via PWM+smoothing filter.

If using a lookup table you don't need a multiply. Well, of course you can consider that it's still a multiply in some way - it's just a precomputed one.
But the LUT has another advantage. A multiply would give you a 14bit entity, where a LUT gives you another 8bit value - it's mul & rescale in one instruction. Might also give you signed to unsigned conversion for free, which is better for adding.
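To make the table idea concrete, here is a minimal C sketch of a 64 x 256 volume table. It is only a model of the technique, not code from any real player: the names `vol_table`, `build_vol_table` and `scale_sample` are made up for the illustration, and it assumes signed 8-bit samples and volume levels 0..63.

```c
#include <assert.h>
#include <stdint.h>

#define VOLUMES 64   /* assumption: volume levels 0..63 */

/* One 256-byte row per volume level: 64 * 256 = 16 KiB in total. */
static int8_t vol_table[VOLUMES][256];

/* Precompute sample * volume / 64 for every signed 8-bit sample,
   so multiply and rescale both happen once, at startup. */
void build_vol_table(void)
{
    for (int v = 0; v < VOLUMES; v++)
        for (int s = -128; s <= 127; s++)
            vol_table[v][(uint8_t)s] = (int8_t)(s * v / 64);
}

/* At mix time, scaling is a single indexed load that already
   yields an 8-bit result. */
int8_t scale_sample(int8_t sample, int volume)
{
    assert(volume >= 0 && volume < VOLUMES);
    return vol_table[volume][(uint8_t)sample];
}
```

Storing unsigned values in the table instead would fold the signed to unsigned conversion into the same lookup.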

Quote:

Originally Posted by Megol

And I thought we were talking about optimization not if a processor may mix channels?

If you have a fast enough cpu, all these quality loss tricks become useless. So the first question you might ask yourself is : on which cpu is the code gonna run.

By the way, what will all these tricks buy ? Fast mixing ? But why the heck was mixing needed in the first place ? To get more channels ? But what did one need more channels for ? For better quality music ? But many tricks listed in this thread (such as fewer notes, fewer volume levels) will in fact lower the quality !
I'm afraid that for a game the usual 3ch music + 1ch sfx is still the best bet and if you need more channels then anything badly lowering the quality is out of the question...

Quote:

Originally Posted by Mrs Beanbag

resample all the samples before playback, to all the actually used notes. Will use a lot of memory...

And will take a lot of time at startup.

If you have heaps of memory then perhaps you also have heaps of disk space and pre-mixing the whole music would give better results...

pandy71 · 18 January 2016, 11:40

Quote:

Originally Posted by Mrs Beanbag

yes that's why i said do it before playback, not on the fly.

Sample skipping/sample repeating can be performed on the fly - not sure how fast plain MC68k can perform such operations on bytes - lowpass filtering for sure is beyond real time MC68k capabilities.

Perhaps blitter can be used in a clever way to perform such operations (on samples without filtering)?

meynaf · 18 January 2016, 11:55

Quote:

Originally Posted by pandy71

lowpass filtering for sure is beyond real time MC68k capabilities.

For 68000 yes. But 030+ can do that and more.
(Anyway you don't need it, it's the replay hardware which does that stuff.)

pandy71 · 18 January 2016, 12:53

Quote:

Originally Posted by meynaf

For 68000 yes. But 030+ can do that and more.
(Anyway you don't need it, it's the replay hardware which does that stuff.)

Only if the replay HW is equipped with a decent SRC, and sadly that is not the case on the Amiga. Luckily this can be handled at the sample creation stage, at the cost of some quality: a low-pass filter applied to the sample lets you mix with sample rate conversion without any further low-pass filtering. I assume that, to simplify further, only even sampling factors are applicable...

daxb · 18 January 2016, 15:12

Offtopic but...
I would guess the best solution for music + sfx in games is http://aminet.net/mus/play/ptplayer.lha by phx or similar stuff. It exists (incl. source code), can be used and you don't need to reduce quality or the artist's freedom. Excerpt from ptplayer.readme:

Quote:

This player is quite optimized and has some useful features for game
developers:

- Insert external sound effects into the replayed module.

- A fast master volume for the replayed music.

- No busy waiting. DMA and repeat pointers are set with timer interrupts.

- E8 command can be used as a trigger for your main program.

- Lots of tables for best performance. No multiplications or divisions.

The sound fx system will always block the channel which would be free for
the longest period. This has the effect that the replayed song is often not
disturbed at all. Up to four sound fx can be played at the same time (which
would block all four music channels for this period).

The master volume is always applied to the music, but does not affect
external sound fx.

For multichannel (mostly 8 - 16 ch) 030+ is needed. Of course it depends on the kind of game. If the game needs all or most of cpu time multichannel is a no go.

Megol · 19 January 2016, 14:50

Quote:

Originally Posted by meynaf

This kind of trick is useful for 68000 where we don't have much processing power. If you have problems with "cache misses" then you don't need these tricks at all.

So your idea of optimizing only applies to uncached processors?

Personally I think even a relatively modern processor with SIMD instructions should use optimized code. That includes sound mixing.

Quote:

You can't do that with a table lookup (or you have to say what exactly is in the table !).
And again, if you have a data cache, you don't really have the problem (your cpu is fast enough to do better).

The whole idea of optimization is to have fast execution.

I gave cache misses as an example of the advantage, it can help even on a non-cached processor. Think of it as allowing a routine to process several samples where the volume and pitch is constant.

Quote:

I still don't see the point.

It is a way to make multi-channel playback possible. I thought that was what we were discussing but given your response it seems I was mistaken.

Quote:

Not necessarily ! Prefetch can make that fail as well.

No. I've written self modifying code on processors where it is a problem and the solution is easy: a branch. It flushes the prefetch queue unless the processor is buggy.
[Not that it is a problem on x86, since the 80286 the processor detects writes into the prefetched queue and flushes it.]

Quote:

The time you'll take to generate the code will nullify the benefit i'm afraid...

Nope. Unless we are talking about a processor that can't do anything but mix sounds - as my original post stated one has to choose the mixing approach to fit the target. Generated and self-modifying code is still a useful tool for modern machines, though the kind of SMC that one uses for e.g. 6502 optimization isn't too useful given the overheads.

Quote:

Personal experience ? You've coded it ?

Yes. Several routines.

Quote:

If using a lookup table you don't need a multiply. Well, of course you can consider that it's still a multiply in some way - it's just a precomputed one.
But the LUT has another advantage. A multiply would give you a 14bit entity, where a LUT gives you another 8bit value - it's mul & rescale in one instruction. Might also give you signed to unsigned conversion for free, which is better for adding.

Personally I don't like that kind of truncating mixing unless one really has to. One can often hear the quality drop.

Quote:

If you have a fast enough cpu, all these quality loss tricks become useless. So the first question you might ask yourself is : on which cpu is the code gonna run.

Yes, as I wrote earlier one has to choose how to do it with a specific target in mind. For a really slow processor (8086, 6502, 68k (with time to do a game in parallel)) the only usable tricks are to reduce quality and make a tight mixing routine.

Quote:

By the way, what will all these tricks buy ? Fast mixing ? But why the heck was mixing needed in the first place ? To get more channels ? But what did one need more channels for ? For better quality music ? But many tricks listed in this thread (such as fewer notes, fewer volume levels) will in fact lower the quality !
I'm afraid that for a game the usual 3ch music + 1ch sfx is still the best bet and if you need more channels then anything badly lowering the quality is out of the question...

For some games the quality isn't critical but the number of sounds playing may be. Fixed playback frequency (=no music or at least very limited music playback) and lower precision samples with a fixed per sound volume isn't a problem then.
But (again from experience) if one wants to mix 32+ 16bit channels with sample interpolation (at least LERP), 256 volume levels and perhaps even a dithered 8bit playback option then one has to optimize even on a relatively fast machine with caches.
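For reference, linear interpolation (LERP) between two neighbouring 16-bit samples can be sketched in C as below. This is a plain, unoptimized model under stated assumptions: the function name `lerp_at` and the 16.16 fixed-point position format are choices made for the illustration, not taken from any existing mixer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a sample at a 16.16 fixed-point position, blending the two
   neighbouring samples according to the fractional part. */
int16_t lerp_at(const int16_t *src, size_t len, uint32_t pos)
{
    size_t i = pos >> 16;              /* integer part: sample index */
    int32_t frac = pos & 0xFFFF;       /* fractional part: 0..65535  */
    assert(i + 1 < len);               /* needs a right-hand neighbour */
    int32_t diff = (int32_t)src[i + 1] - src[i];
    /* 64-bit product: diff * frac can exceed the 32-bit range */
    return (int16_t)(src[i] + (int32_t)((int64_t)diff * frac / 65536));
}
```

That is one multiply per channel per output sample on top of the volume scaling, which gives an idea of why this still needs care with 32 or more channels.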

Saying that one doesn't _need_ to optimize for a relatively modern machine may be correct if one likes the sloppy shit that counts as software nowadays. I don't.

meynaf · 20 January 2016, 10:43

Quote:

Originally Posted by Megol

So your idea of optimizing only applies to uncached processors?

Personally I think even a relatively modern processor with SIMD instructions should use optimized code. That includes sound mixing.

What is suggested here isn't just optimizing. It's damaging the sound or greatly reducing the artist's freedom, to get a few extra clock cycles. If you have a fast cpu, whether it takes 1% or 0.1% load doesn't matter. So you're better off just making your mixing code reasonably fast, without using any dirty trick.

Quote:

Originally Posted by Megol

The whole idea of optimization is to have fast execution.

I gave cache misses as an example of the advantage, it can help even on a non-cached processor. Think of it as allowing a routine to process several samples where the volume and pitch is constant.

This doesn't say how a table can be useful here.

Quote:

Originally Posted by Megol

It is a way to make multi-channel playback possible. I thought that was what we were discussing but given your response it seems I was mistaken.

Multi-channel playback is already possible. It does not necessarily need tricks that damage the sound.

Quote:

Originally Posted by Megol

No. I've written self modifying code on processors where it is a problem and the solution is easy: a branch. It flushes the prefetch queue unless the processor is buggy.
[Not that it is a problem on x86, since the 80286 the processor detects writes into the prefetched queue and flushes it.]

And now you get a costly flush that voids the speed benefit.
Anyway we're supposed to be on Amiga here, so whether x86 support SMC or not, is irrelevant.

Quote:

Originally Posted by Megol

Nope. Unless we are talking about a processor that can't do anything but mix sounds - as my original post stated one has to choose the mixing approach to fit the target. Generated and self-modifying code is still a useful tool for modern machines, though the kind of SMC that one uses for e.g. 6502 optimization isn't too useful given the overheads.

I still don't think it's useful for mixing audio, especially on 68000.
There IS overhead for writing that code.
There IS overhead in branching to it.
So unless the gain is big - which i don't think it's gonna be - that method isn't worth it.
But feel free to prove me wrong by posting a mixing routine here

Quote:

Originally Posted by Megol

Yes. Several routines.

Then why not post a few here ?

Quote:

Originally Posted by Megol

Personally I don't like that kind of truncating mixing unless one really has to. One can often hear the quality drop.

There isn't much drop in using a 8bit-to-8bit 256 vals x 64 vols table.
The mere fact you have to fit the end result to 8bit counts a lot more.

Quote:

Originally Posted by Megol

Yes, as I wrote earlier one has to choose how to do it with a specific target in mind. For a really slow processor (8086, 6502, 68k (with time to do a game in parallel)) the only usable tricks are to reduce quality and make a tight mixing routine.

You're comparing apples and oranges.
The 6502 is far too slow to even consider mixing audio in real time. Barely playing one channel at low frequency is already taking most of its time, forbidding a game in parallel.
The 68k is a lot more powerful. A plain 68000 can mix 4ch at 25 kHz. A 68020 can do 16. A 68030 can do 32ch with interpolation.
The 8086 is intermediate between 6502 and 68000 - but again, only 68k is relevant here.
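To put rough numbers on this, the cycle budget per channel per output sample can be estimated with the small C helper below. It is a back-of-the-envelope sketch only: the function name is made up, the clock value used in the example (about 7.09 MHz, a PAL A500) is an assumption about the target machine, and DMA contention and whatever else the program does are ignored.

```c
#include <assert.h>

/* CPU cycles available per channel per output sample if the whole
   CPU were spent on mixing. Real budgets are lower. */
unsigned long cycles_per_channel_sample(unsigned long cpu_hz,
                                        unsigned long rate_hz,
                                        unsigned channels)
{
    assert(rate_hz > 0 && channels > 0);
    return cpu_hz / (rate_hz * channels);
}
```

For example, `cycles_per_channel_sample(7093790UL, 25000UL, 4)` gives 70, so each channel has about 70 cycles per output sample for loading, scaling, adding and storing, with nothing left over for a game.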

Quote:

Originally Posted by Megol

For some games the quality isn't critical but the number of sounds playing may be. Fixed playback frequency (=no music or at least very limited music playback) and lower precision samples with a fixed per sound volume isn't a problem then.

For some games ? Which ones ? I can't see any example that currently exists.
If you have many possible simultaneous SFX then you don't need that many channels ; a simple priority system is enough.
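A priority system of that kind could look roughly like the C sketch below. Everything in it is hypothetical (the `Channel` struct, the field names, and the rule that a new effect only replaces a strictly lower priority one); it is just one simple way to do it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-channel state for sound effects. */
typedef struct {
    int active;     /* non-zero while an effect is playing */
    int priority;   /* higher value = more important       */
} Channel;

/* Return the channel to use for a new effect, or -1 to drop it. */
int pick_channel(const Channel *ch, size_t n, int new_priority)
{
    int best = -1;
    assert(ch != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!ch[i].active)
            return (int)i;                 /* a free channel always wins */
        if (ch[i].priority < new_priority &&
            (best < 0 || ch[i].priority < ch[best].priority))
            best = (int)i;                 /* least important victim so far */
    }
    return best;
}
```

With four hardware channels and a handful of priority levels, this runs once per triggered effect and costs nothing per sample.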

Quote:

Originally Posted by Megol

But (again from experience) if one wants to mix 32+ 16bit channels with sample interpolation (at least LERP), 256 volume levels and perhaps even a dithered 8bit playback option then one has to optimize even on a relatively fast machine with caches.

This isn't the same as throwing away all the artist's freedom by using fixed pitch+volume, damaging the quality by using pre-scaled samples, or writing potentially incompatible SMC code.

Quote:

Originally Posted by Megol

Saying that one doesn't _need_ to optimize for a relatively modern machine may be correct if one likes the sloppy shit that counts as software nowadays. I don't.

Again what's suggested here isn't just optimizing.
And high quality audio mixing is no challenge for current cpus.

drhex · 26 January 2016, 13:26

Newcomer to the forum here!

When using the Bresenham algorithm for drawing lines, one could have almost any ratio between the width and height of the line and so the code must be general enough to handle an arbitrary ratio.
But when mixing samples, there ought to be a more manageable number of possible ratios between the two samplerates (assuming two virtual channels per hardware channel), meaning that it should be possible to write a handtuned version for each ratio which could then omit the compares and branches required in a general Bresenham.

(Maybe that is what Megol was referring to with "Reduce the number of sample frequencies + use generated code for changing pitch."?)

meynaf · 26 January 2016, 14:22

Quote:

Originally Posted by drhex

Newcomer to the forum here!

Then welcome

Quote:

Originally Posted by drhex

When using the Bresenham algorithm for drawing lines, one could have almost any ratio between the width and height of the line and so the code must be general enough to handle an arbitrary ratio.
But when mixing samples, there ought to be a more manageable number of possible ratios between the two samplerates (assuming two virtual channels per hardware channel), meaning that it should be possible to write a handtuned version for each ratio which could then omit the compares and branches required in a general Bresenham.

(Maybe that is what Megol was referring to with "Reduce the number of sample frequencies + use generated code for changing pitch."?)

Having a branch in a mixing routine for frequency shift would be too costly.
So frequency shift is done with fixed-point. Something like this :

Code:

 add.w d0,d1     ; add fractional parts, overflow goes to the X flag
 addx.w d2,d3    ; add integer parts plus X

Here d2:d0 is the sampling ratio (integer part in d2, fraction in d0), d1 is the fractional counter, and d3 is the offset inside the sample.
No branch. Nothing that can be optimised for a specific ratio.
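For readers who don't speak 68k, the same stepping can be modelled in C as below. This is only a sketch of the add/addx pair, not a mixing routine: the function name `resample` and its parameters are invented for the illustration, and the only loop test is the buffer bounds check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nearest-neighbour resampling with a 16.16 fixed-point step.
   step_int:step_frac plays the role of d2:d0, frac is d1 and
   pos is d3. Returns the number of output samples written. */
size_t resample(const int8_t *src, size_t src_len,
                int8_t *dst, size_t dst_len,
                uint16_t step_int, uint16_t step_frac)
{
    uint16_t frac = 0;
    size_t pos = 0, n = 0;
    assert(src != NULL && dst != NULL);
    while (n < dst_len && pos < src_len) {
        dst[n++] = src[pos];
        uint32_t sum = (uint32_t)frac + step_frac;  /* add.w  d0,d1 */
        frac = (uint16_t)sum;
        pos += step_int + (sum >> 16);              /* addx.w d2,d3 */
    }
    return n;
}
```

A step of 0:0x8000 (0.5) plays each source byte twice, and a step of 2:0 skips every other byte; the carry out of the 16-bit fraction is what the X flag carries in the assembly version.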

Mrs Beanbag · 26 January 2016, 19:39

this is probably better than the

Code:

swap D1
add.l D0,D1
swap D1

method i've seen elsewhere

drhex · 27 January 2016, 09:01

Ok, with addx there is no need for branches, nice.

Quote:

Originally Posted by meynaf

Nothing that can be optimised for a specific ratio.

I think there is. Let's first look at the inner loop in the general case that can handle any ratio:

a0 points to prescaled input samples to be played back at the rate of the hardware output
a1 points to prescaled input samples to be played back at 0.8 times the hardware rate
a2 output of mixed samples
d0 fractional rate, preloaded with 0.8 * 65536
d1 integer index of a1
d3 current fractional position of a1
d4 preloaded with zero (integer of 0.8)

Code:

add.w   d0,d3           ; advance fractional position
addx.w  d4,d1           ; move forward 0 or 1 step depending on carry
move.b  (a1,d1.w),d2    ; load sample
add.b   (a0)+,d2        ; mix samples
move.b  d2,(a2)+        ; write output

The handtuned version I was referring to would look like this (for handling 5 output bytes at ratio 0.8)

Code:

  REPT 4
  move.b  (a0)+,d2      ; load sample
  add.b   (a1)+,d2      ; mix samples
  move.b  d2,(a2)+      ; write output
  ENDR
  move.b  (a0)+,d2      ; load sample
  add.b   (a1),d2       ; mix samples without advancing position
  move.b  d2,(a2)+      ; write output
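The pattern of "advance" and "don't advance" steps for a given ratio does not have to be worked out by hand. A minimal C sketch of how a generator could compute it, using an integer Bresenham-style error term, is shown below; `step_pattern` is a made-up name and the ratio is assumed to be num/den with num <= den.

```c
#include <assert.h>

/* Fill advance[0..den-1] with 1 where the source pointer should step
   and 0 where the previous sample is repeated, for ratio num/den. */
void step_pattern(int num, int den, unsigned char *advance)
{
    int err = 0;
    assert(num >= 0 && den > 0 && num <= den);
    for (int i = 0; i < den; i++) {
        err += num;
        if (err >= den) {       /* accumulated a whole source step */
            err -= den;
            advance[i] = 1;
        } else {
            advance[i] = 0;
        }
    }
}
```

For 4/5 this yields 0,1,1,1,1: four advances in every five outputs, the same ratio as the unrolled listing, only with the repeated sample at a different point in the cycle. A code generator would emit `(a1)+` for each 1 and `(a1)` for each 0.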

meynaf · 27 January 2016, 09:32

Quote:

Originally Posted by drhex

d4 preloaded with zero

D4 being the integer part of the fixpoint, it can be 0 or it can be 1 and sometimes more.
It can happen that the sample is of a higher (or equal) frequency than the replay freq (a likely story if we want to grab every cpu cycle we can !).

Quote:

Originally Posted by drhex

The handtuned version I was referring to would look like this (for handling 5 output bytes at ratio 0.8)

This doesn't work. The two channels you're mixing here MUST have the exact same pitch - making frequency change useless as they could just be played at the original freq. And of course you won't play music this way.

drhex · 27 January 2016, 10:21

Quote:

Originally Posted by meynaf

This doesn't work. The two channels you're mixing here MUST have the exact same pitch

To be fair, I have not actually tested the code so clearly there could be something wrong with it. I realize it will not sound optimal due to the lack of interpolation, but it has been said before that interpolation would be too much for the cpu to handle.
The code in my previous post would advance 4 bytes into one sample for every 5 bytes into the other, thus maintaining a ratio of 0.8.

Would you please explain in more detail why you think my solution would not work (and is it the optimized one that doesn't work or my attempt at a general solution as well?)

meynaf · 27 January 2016, 11:26

Quote:

Originally Posted by drhex

To be fair, I have not actually tested the code so clearly there could be something wrong with it. I realize it will not sound optimal due to the lack of interpolation, but it has been said before that interpolation would be too much for the cpu to handle.
The code in my previous post would advance 4 bytes into one sample for every 5 bytes into the other, thus maintaining a ratio of 0.8.

Would you please explain in more detail why you think my solution would not work (and is it the optimized one that doesn't work or my attempt at a general solution as well?)

Your code is for a ratio of 0.8, but this ratio must be the same for both channels and so your code can't mix two channels with a different pitch.
The code may work as expected, but it doesn't do anything useful.

drhex · 27 January 2016, 12:06

I am assuming here that it would require too much memory to have a sampling of every instrument for every key on the keyboard, and that one therefore has only one sampling per octave with the same sampling frequencies used for all instruments.
In order to playback the other notes, the playback frequency will have to be adjusted accordingly.
Now, if one wants a single hardware voice to playback two simultaneous notes, those two notes may not have the same frequency. Their frequencies may perhaps relate to each other as 1 to 0.8. Then one could use the code above to have them played back at this ratio with respect to each other.
One handtuned version for each ratio that could occur for two simultaneous notes would have to be written, of course.
That would make it possible to play two virtual voices per hardware voice with one of them having its playback frequency controlled by hardware and the other having its frequency determined by which handtuned code is activated.
Thus: a doubling of the number of available voices where each virtual voice can have its own frequency, and no need for "add + addx" in the inner loop.
Sounds useful to me.

meynaf · 27 January 2016, 12:41

Quote:

Originally Posted by drhex

I am assuming here that it would require too much memory to have a sampling of every instrument for every key on the keyboard, and that one therefore has only one sampling per octave with the same sampling frequencies used for all instruments.
In order to playback the other notes, the playback frequency will have to be adjusted accordingly.
Now, if one wants a single hardware voice to playback two simultaneous notes, those two notes may not have the same frequency. Their frequencies may perhaps relate to each other as 1 to 0.8. Then one could use the code above to have them played back at this ratio with respect to each other.
One handtuned version for each ratio that could occur for two simultaneous notes would have to be written, of course.
That would make it possible to play two virtual voices per hardware voice with one of them having its playback frequency controlled by hardware and the other having its frequency determined by which handtuned code is activated.
Thus: a doubling of the number of available voices where each virtual voice can have its own frequency, and no need for "add + addx" in the inner loop.
Sounds useful to me.

Seems i misinterpreted it a little. It's not two channels with the same freq, but two channels with a ratio between them.
But it doesn't change a thing. In fact it's even worse...

You won't play music this way : there are too many different notes to be played. And the ratios are rarely, if ever, something as simple as 0.8. Even if you have only 10 possible freqs, think about the combinations - that's not just 10 routines to write. 10 possible freqs for two channels means 100 combinations. You can swap the min and max to reduce that to about half. Some are multiples of each other and this removes a few combinations as well. But that's still many, many routines to write, and that for a very limited set of possible frequencies.
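The counting argument can be checked with a few lines of C. The sketch below counts how many distinct reduced ratios occur over all pairs from a list of frequencies, treating a pair and its swap as one case. The function names and the integer frequency list are assumptions for the illustration; real note frequencies are not integer multiples of each other, so far fewer pairs collapse.

```c
#include <assert.h>
#include <stddef.h>

static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b) { unsigned t = a % b; a = b; b = t; }
    return a;
}

/* Count distinct reduced ratios lo/hi over all pairs of frequencies,
   i.e. how many hand-tuned routines would be needed. */
size_t count_ratios(const unsigned *freq, size_t n)
{
    enum { MAX_PAIRS = 1024 };
    unsigned seen[MAX_PAIRS][2];
    size_t count = 0;
    assert(n * (n + 1) / 2 <= MAX_PAIRS);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++) {
            assert(freq[i] > 0 && freq[j] > 0);
            unsigned lo = freq[i] < freq[j] ? freq[i] : freq[j];
            unsigned hi = freq[i] < freq[j] ? freq[j] : freq[i];
            unsigned g = gcd_u(lo, hi);
            lo /= g;
            hi /= g;
            size_t k = 0;                   /* linear search is fine here */
            while (k < count && (seen[k][0] != lo || seen[k][1] != hi))
                k++;
            if (k == count) {
                seen[count][0] = lo;
                seen[count][1] = hi;
                count++;
            }
        }
    return count;
}
```

With the toy list {1, 2, 3, 4} there are 10 unordered pairs but only 6 distinct ratios, because 2/4 reduces to 1/2 and all equal pairs reduce to 1/1.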

You're also taking the risk of having sound distortion. If you change the replay freq, you have to do it exactly at the same time you start the new buffer. So you're dependent on cpu speed and dma speed.

There are many other possible problems, such as when your output buffer size isn't an integer multiple of what your inner loop produces...
