8chn audio mixing: 5 to 1 (+3) vs. 4x2 to 4x1

TCH · 05 July 2022, 19:33

Out of curiosity: which is supposed to be the faster approach: mixing 5 channels into one (and using the 3 remaining real channels alongside the mixed one), or mixing 4x2 channels to 4x1?

Don_Adan · 05 July 2022, 23:06

Quote:

Originally Posted by TCH

Out of curiosity: which is supposed to be the faster approach: mixing 5 channels into one (and using the 3 remaining real channels alongside the mixed one), or mixing 4x2 channels to 4x1?

I don't think 5 to 1 mixing exists; I only know of 4 to 1 mixing plus 3 real channels. For very easy mixing (same period, same volume), 2 to 1 is fastest, which is useful for SFX. For music, 4 to 1 is fastest. Anyway, this also depends on the coder's knowledge and the code quality.

TCH · 06 July 2022, 08:57

4 channels mixed to 1, plus 3 real channels, are only 7 channels. There are 8-channel tracker programs, like OctaMED. If 4 to 1 is faster for music, then they must use 5 to 1, right? Why is 4 to 1 faster, if volume or period differences can occur?

Hannibal · 06 July 2022, 08:58

I think 4-to-1 mixing + 3 real channels is faster than 3x 2-to-1 + 1 real channel, because the CPU only does 4 reads and 1 write per output sample, as opposed to 6 reads and 3 writes.

I never tried the 3x2-to-1 mixer, but I made a 4-channel sfx mixer (limited to same period and full or half volume per sample) for fun a while back. I got some really fast inner loops by using codegen for all permutations of how many active samples were at half or full volume level. I used the 3 regular channels for music, and the other 4 could then either be sfx drums or sfx. Since they are software mixed, those samples can stay in fastmem, and I only needed just over 1k chip mem for a double buffer that was updated every frame.

It feels fast enough to use in games even on A500. If I had thought about that back in the day, I could probably have used it on Banshee to have music+sfx together
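
The read/write argument above can be put into numbers. The following is a minimal sketch in Python (not Amiga code) that only counts memory accesses per output sample for the layouts discussed in this thread. The function names and the `groups` representation are illustrative assumptions, and the count ignores volume scaling, resampling and loop overhead, so it says nothing definitive about real cycle counts.

```python
HARDWARE_CHANNELS = 4  # Paula provides four DMA audio channels


def accesses_per_output_sample(groups):
    """Return (reads, writes) needed to produce one sample for every
    software-mixed hardware channel.

    groups lists how many voices are mixed into each software-mixed channel,
    e.g. [4] for 4-to-1 or [2, 2, 2] for three 2-to-1 pairs.
    """
    reads = sum(groups)    # one source read per mixed voice
    writes = len(groups)   # one buffer write per mixed hardware channel
    return reads, writes


def total_voices(groups):
    """Mixed voices plus the hardware channels left playing a single voice."""
    return sum(groups) + (HARDWARE_CHANNELS - len(groups))


for name, groups in [("4-to-1 + 3 real", [4]),
                     ("3 x 2-to-1 + 1 real", [2, 2, 2]),
                     ("5-to-1 + 3 real", [5]),
                     ("4 x 2-to-1", [2, 2, 2, 2])]:
    reads, writes = accesses_per_output_sample(groups)
    print(f"{name}: {total_voices(groups)} voices, {reads} reads, {writes} writes")
```

Under this simplified model, the two 8-voice layouts from the opening question come out at 5 reads and 1 write versus 8 reads and 4 writes per output sample.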

TCH · 06 July 2022, 09:15

That makes sense. And 8 reads and 4 writes would take even more time. I should have thought of that.

I am trying to make something similar, although only for music right now.

You wrote Banshee? That game was awesome.

pink^abyss · 06 July 2022, 11:58

Quote:

Originally Posted by Hannibal

It feels fast enough to use in games even on A500. If I had thought about that back in the day, I could probably have used it on Banshee to have music+sfx together

Music+Sfx in Amiga games: Twice the fun, double the trouble

roondar · 06 July 2022, 12:45

Quote:

Originally Posted by Hannibal

I think 4-to-1 mixing + 3 real channels is faster than 3x 2-to-1 + 1 real channel, because the CPU only does 4 reads and 1 write per output sample, as opposed to 6 reads and 3 writes.

I experimented with software mixing a while back, and I concur, you can certainly mix 3x2 to 1 (and it will most probably sound better than 4 to 1), but there's more overhead involved.

Plus, a 4-to-1 mixer can abuse the fact that reading/adding/writing long values is quite a bit quicker than reading/adding/writing bytes or words, while not leading to a noticeable difference in sound quality.

Quote:

It feels fast enough to use in games even on A500. If I had thought about that back in the day, I could probably have used it on Banshee to have music+sfx together

That was exactly the point of my experiments, yes. Software mixing can be reasonably fast and probably fast enough for SFX in most games. I wrote an article about it which includes a video explanation and source code/demo program for such a mixer (it mixes 4 channels into one at 11 kHz using 3.2% CPU time on an A500), so if you (or any other thread readers) are interested, it can be found here: http://powerprograms.nl/amiga/audio-mixing.html

no9 · 06 July 2022, 13:39

Quote:

Originally Posted by TCH

I am trying to make something similar, although only for music right now.

Looking forward to hearing that!

I once found the following information about the Face The Music Amiga tracker on an Atari forum. If it's correct, I wonder whether this might bring some gains and how significantly it would affect the quality. FTM wasn't that bad quality-wise.

Quote:

There were a few other Amiga music players with 8 channels; FaceTheMusic had a particularly clever implementation which ran the hardware channel at the same rate as the highest sampling rate of the pair. Therefore only one channel needed to be resampled and only that channel needed sample end checking inside the tight mixing loop.
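
The idea in the quote above can be sketched as follows. This is a hypothetical Python model, not FaceTheMusic's actual replay code: the function name, the 16.16 fixed-point step and the clamping of the sum are all assumptions made for illustration. The faster voice of the pair is copied 1:1 because the hardware channel runs at its rate, so only the slower voice is stepped through with a fractional increment and checked for its end inside the loop.

```python
def mix_pair(fast, slow, step_fp, count):
    """Mix two voices into one output buffer played at the fast voice's rate.

    fast, slow: lists of signed 8-bit samples (-128..127).
    step_fp:    slow_rate / fast_rate as 16.16 fixed point (<= 0x10000).
    count:      samples to produce; the caller guarantees count <= len(fast),
                so the fast voice needs no end check inside the loop.
    """
    out = []
    slow_pos_fp = 0
    for i in range(count):
        sample = fast[i]                  # 1:1 copy, no resampling
        index = slow_pos_fp >> 16         # nearest-lower source sample
        if index < len(slow):             # the only end check in the loop
            sample += slow[index]
            slow_pos_fp += step_fp
        out.append(max(-128, min(127, sample)))  # clamping is an assumption
    return out


# The slow voice plays at half the rate of the fast one (step = 0.5).
print(mix_pair([10, 20, 30, 40], [100, -100], 0x8000, 4))
```

With a step of 0.5 each slow sample is used twice, which is the nearest-neighbour resampling that the quoted description implies for one voice of each pair.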

Don_Adan · 06 July 2022, 14:21

Quote:

Originally Posted by no9

Looking forward to hearing that!

I once found the following information about the Face The Music Amiga tracker on an Atari forum. If it's correct, I wonder whether this might bring some gains and how significantly it would affect the quality. FTM wasn't that bad quality-wise.

Face The Music's mixing idea/replay was used in Delitracker 2 (for the 8 Voices NotePlayer). It is good enough, but it can often freeze a 68000 (7 MHz). You can use it to test the quality of this mixer with any 4-8 channel sound format that has a replay in a "_note" version.

no9 · 06 July 2022, 15:55

@Don_Adan thanks. I did actually listen to this in Face The Music itself. It was OK, with mild occasional distortions. But for some reason I don't trust that the implementation of this algorithm in FTM was optimized to the last bit. Maybe it could be implemented better, or there are other limitations in this approach that I'm not aware of.

Hannibal · 06 July 2022, 21:09

In case anyone is interested, here are my inner loops from the mixer for fixed rate half/full volume mixing. You need samples eor'ed with $80808080 as that was faster for the mixing code. It is loop unrolled to do 8 samples per loop, so samples must be 8-byte aligned.

Quality loss: there isn't any quality loss for a single full-volume 8-bit sample, and only a very slight quality loss for multiple samples if they are loud, as the 8-bit values are summed and then clamped to the range -128 to +127. Half-volume samples lose 1 bit of precision. The main reason I stayed away from variable pitch is that you lose a lot of quality, even if you invest in linear interpolation and use a 28 kHz playback buffer. Plus, it's slow as hell to mix :-)
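
To make the quality trade-off concrete, here is a small per-sample model of what the description above implies. It is a sketch in Python rather than the posted assembly, and the function name and argument layout are assumptions: half-volume voices are summed and then shifted right by one bit (dropping the lowest bit), full-volume voices are added unchanged, and the total is clamped to the signed 8-bit range.

```python
def mix_fixed_rate(full, half):
    """Mix signed 8-bit samples that all play at the same rate.

    full: samples of voices at full volume.
    half: samples of voices at half volume.
    """
    total = (sum(half) >> 1) + sum(full)   # >> floors, so 1 bit is lost
    return max(-128, min(127, total))      # clamp to signed 8-bit


print(mix_fixed_rate([77], []))        # single full-volume voice: unchanged
print(mix_fixed_rate([100, 100], []))  # loud voices: clamped
print(mix_fixed_rate([], [51]))        # half volume: lowest bit dropped
```

Only the second case distorts the signal, which matches the remark that the loss is slight unless several loud samples coincide.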

Other uses: Having a looping sample buffer like this also makes it easy (or at least, easier) to stream samples from disk. I didn't try, but also think you can use the buffers for basic echo/reverb, which combined with the low pass filter could add a little more realism and variety to different regions of a game.

Playback Buffers: I didn't include the code that manages the double buffering, which was pretty tricky (and that code isn't clean enough to share). For me it runs in vblank interrupt and balances back and forth between the 2 nearest buffer sizes. You could also do this mixing in an audio interrupt, but then you can't control where during the vblank the mixing happens - or you could triple-buffer and on some frames just skip the third buffer.
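
The buffer size follows from the three constants at the top of the listing further down. The sketch below redoes that arithmetic in Python; the constant values are copied from the listing, while the function names and the assumption that the double buffer is exactly two frame buffers are mine.

```python
COLOR_CLOCKS_PER_SECOND_PAL = 3546895      # cColorClocksPerSecondPAL
MIN_COLOR_CLOCKS_PER_SAMPLE_PAL = 124      # cMinColorClocksPerSamplePAL
SEGMENT_SIZE = 1 << 3                      # cMixingBufferSegmentSize (8 bytes)
FRAMES_PER_SECOND = 50                     # PAL vertical blank rate


def sample_rate_hz(period=MIN_COLOR_CLOCKS_PER_SAMPLE_PAL):
    """Playback rate for a given period (colour clocks per sample)."""
    return COLOR_CLOCKS_PER_SECOND_PAL / period


def frame_buffer_size(period=MIN_COLOR_CLOCKS_PER_SAMPLE_PAL):
    """Samples needed per frame, padded up to a whole number of segments.

    Mirrors cMixingFrameBufferSize: add one segment, then mask off the low
    bits so the result is a multiple of the segment size.
    """
    per_frame = COLOR_CLOCKS_PER_SECOND_PAL // (period * FRAMES_PER_SECOND)
    return (per_frame + SEGMENT_SIZE) & -SEGMENT_SIZE


print(int(sample_rate_hz()))      # 28603
print(frame_buffer_size())        # 576
print(2 * frame_buffer_size())    # 1152 bytes if two such buffers are used
```

Two buffers of 576 bytes come to 1152 bytes, which is consistent with the "just over 1k chip mem" figure mentioned earlier in the thread.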

Code Format: Sorry about all the weird alignment - My autoformatter assumes tab=4 spaces.

Code:

cColorClocksPerSecondPAL:		equ	3546895
cMinColorClocksPerSamplePAL:	equ	124	; approx 28603hz samples - 121-124 for channel 1 to 4


cMixingBufferSegmentSizeBits:	equ	3
cMixingBufferSegmentSize:		equ	(1<<cMixingBufferSegmentSizeBits)
cMixingFrameBufferSize:			equ	((cColorClocksPerSecondPAL/(cMinColorClocksPerSamplePAL*50))+cMixingBufferSegmentSize)&(-cMixingBufferSegmentSize)



.mixerInnerLoopTable:
		dc.w	.innerLoop_0_0-.mixerInnerLoopTable
		dc.w	.innerLoop_0_1-.mixerInnerLoopTable
		dc.w	.innerLoop_0_2-.mixerInnerLoopTable
		dc.w	.innerLoop_0_3-.mixerInnerLoopTable
		dc.w	.innerLoop_0_4-.mixerInnerLoopTable
		dcb.w	3, 0

		dc.w	.innerLoop_1_0-.mixerInnerLoopTable
		dc.w	.innerLoop_1_1-.mixerInnerLoopTable
		dc.w	.innerLoop_1_2-.mixerInnerLoopTable
		dc.w	.innerLoop_1_3-.mixerInnerLoopTable
		dcb.w	4, 0

		dc.w	.innerLoop_2_0-.mixerInnerLoopTable
		dc.w	.innerLoop_2_1-.mixerInnerLoopTable
		dc.w	.innerLoop_2_2-.mixerInnerLoopTable
		dcb.w	5, 0

		dc.w	.innerLoop_3_0-.mixerInnerLoopTable
		dc.w	.innerLoop_3_1-.mixerInnerLoopTable
		dcb.w	6, 0

		dc.w	.innerLoop_4_0-.mixerInnerLoopTable

		; registers used when entering mixer inner loop
		; a5=destination (frame B)
		; d0=samples needed this inner loop-1



		;mixer inner loop registers
			; a1-a4=samples (UNSigned 8-bit: 0-255)
			; a0=clamptable
			; d1=accumulator for sample 1 and 3
			; d2=accumulator for sample 0 and 2
			; d3/d4=work registers
			; d5=mask for sample 1 and 3 ($00ff00ff)
			; d6=mask for sample 0 and 2 ($ff00ff00)
			; d7=0
ReadFirstVoice: macro								; Private    : used for inner loop
				move.l	(a1)+, d1
				move.l	d1, d2
				and.l	d5, d1
				and.l	d6, d2
			endm
AddVoice:	macro									; Private    : used for inner loop
				move.l	(\1)+, d3
				move.l	d3, d4
				and.l	d5, d3
				and.l	d6, d4
				add.l	d3, d1
				add.l	d4, d2
				addx.l	d7, d2
			endm
ShiftHalfVolume: macro								; Private    : used for inner loop
				lsr.l	#1, d1
				ror.l	#1, d2
				and.l	#$0fff0fff, d1
				and.l	#$ff0fff0f, d2
			endm
StoreVoice:	macro									; Private    : used for inner loop
				ror.l	#8, d2
				move.b	0(a0, d2.w), 2(a5)
				move.b	0(a0, d1.w), 3(a5)
				swap	d2
				swap	d1
				move.b	0(a0, d2.w), 0(a5)
				move.b	0(a0, d1.w), 1(a5)
				add.w	#4, a5
			endm
StoreVoiceHalfsOnly: macro							; Private    : used for inner loop
			ShiftHalfVolume
			StoreVoice
		endm

DefineInnerLoop: macro								; Private    : Generates optimized mixer inner loop
													; Inputs     : Number of half volume samples, number of full volume samples
.innerLoop_\1_\2:
			; read active voice data pointers into a1-a4 into registers
			if		(\1+\2)=0
				moveq	#0, d5
			endif
			if		(\1+\2)=1
				move.l	MixerData_SortedVoicePtrs(a4), a1
				move.l	MixerVoice_Ptr(a1), a1
			endif
			if		(\1+\2)=2
				movem.l	MixerData_SortedVoicePtrs(a4), a1-a2
				move.l	MixerVoice_Ptr(a1), a1
				move.l	MixerVoice_Ptr(a2), a2
			endif
			if		(\1+\2)=3
				movem.l	MixerData_SortedVoicePtrs(a4), a1-a3
				move.l	MixerVoice_Ptr(a1), a1
				move.l	MixerVoice_Ptr(a2), a2
				move.l	MixerVoice_Ptr(a3), a3
			endif
			if		(\1+\2)=4
				movem.l	MixerData_SortedVoicePtrs(a4), a1-a4
				move.l	MixerVoice_Ptr(a1), a1
				move.l	MixerVoice_Ptr(a2), a2
				move.l	MixerVoice_Ptr(a3), a3
				move.l	MixerVoice_Ptr(a4), a4
			endif
			if		(\1+\2)>1
				lea		ClampTable-(\1*$40+\2*$80)(pc), a0
				move.l	#$00ff00ff, d5
				if		MixerVersion=1
					move.l	#$ff00ff00, d6
				endif
				if		MixerVersion=2
					move.w	#$0fff, d6
				endif

				moveq	#0, d7
			else
				if		(\1=1) & (\2=0)
					move.l	#$80808080, d4
					move.l	#$fefefefe, d5
				endif
				if		(\1=0) & (\2=1)
					move.l	#$80808080, d4
				endif

			endif
.loop_\1_\2:
			; mixes one segment from a1-a4 into a5
			if		(\1+\2)=0
				; no channels
				rept	cMixingBufferSegmentSize/4
					move.l	d5, (a5)+
				endr
			endif
			if		(\1=1) & (\2=0)
				; single channel half volume
				rept	cMixingBufferSegmentSize/4
					move.l	(a1)+, d2
					if		1
						; correct
						eor.l	d4, d2
						and.l	d5, d2
						move.l	d2, d3
						ror.l	#1, d2
						and.l	d4, d3
						add.l	d3, d2
					else
						; fast
						and.l	#$fefefefe, d2
						ror.l	#1, d2
						sub.l	#$40404040, d2
					endif
					move.l	d2, (a5)+
				endr
			endif
			if		(\1=0) & (\2=1)
				; single channel full volume
				rept	cMixingBufferSegmentSize/4
					move.l	(a1)+, d2
					eor.l	d4, d2
					move.l	d2, (a5)+
				endr
			endif
			if		(\1+\2)>1
				rept	cMixingBufferSegmentSize/4
					; available registers:
					; d2-d7
					; d2-d6=work registers
					if		\1>=1
						ReadFirstVoice
					endif
					if		\1>=2
						AddVoice a2
					endif
					if		\1>=3
						AddVoice a3
					endif
					if		\1>=4
						AddVoice a4
					endif
					if		\2=0
						StoreVoiceHalfsOnly
					else
						if		((\1+\2)>=1) & (\1<1)
							; there are only full volume voices
							ReadFirstVoice
						else
							; there is a mix between half and full voices, so shift the accumulated values before adding full-volume voices
							ShiftHalfVolume
						endif
						; if there are at least 2 voices in total, but less than 2 are half volume, add voice 2 at full volume
						if		((\1+\2)>=2) & (\1<2)
							AddVoice a2
						endif
						if		((\1+\2)>=3) & (\1<3)
							AddVoice a3
						endif
						if		((\1+\2)>=4) & (\1<4)
							AddVoice a4
						endif
						StoreVoice
					endif
				endr
			endif
			dbra	d0, .loop_\1_\2
			if		(\1+\2)>1
				lea		MixerData(pc), a4
			endif

			bra		.innerLoopComplete
		endm
		DefineInnerLoop 0, 0
		DefineInnerLoop 0, 1
		DefineInnerLoop 0, 2
		DefineInnerLoop 0, 3
		DefineInnerLoop 0, 4
		DefineInnerLoop 1, 0
		DefineInnerLoop 1, 1
		DefineInnerLoop 1, 2
		DefineInnerLoop 1, 3
		DefineInnerLoop 2, 0
		DefineInnerLoop 2, 1
		DefineInnerLoop 2, 2
		DefineInnerLoop 3, 0
		DefineInnerLoop 3, 1
		DefineInnerLoop 4, 0



	dcb.b	cMixerVoiceCount*$80-$80, $80
.temp:									set		$80
	rept	128
		dc.b	.temp
.temp:									set		.temp+1
	endr
ClampTable:
.temp:									set		$0
	rept	128
		dc.b	.temp
.temp:									set		.temp+1
	endr
	dcb.b	cMixerVoiceCount*$80-$80, $7f
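
For readers who do not read 68000 assembly, the ReadFirstVoice, AddVoice and StoreVoice macros above can be modelled like this. It is an illustrative Python sketch of my reading of the `MixerVersion=1` masks, not a translation verified on hardware, and the function names are made up. Four unsigned 8-bit samples are read as one big-endian longword and split into two accumulators so that each sum has room to grow. The carry out of the top byte is folded back into bit 0, which is what `addx.l d7, d2` with `d7=0` does.

```python
MASK_ODD = 0x00FF00FF    # d5: bytes 1 and 3 of the longword
MASK_EVEN = 0xFF00FF00   # d6: bytes 0 and 2 of the longword
LONG = 0xFFFFFFFF


def read_first_voice(word):
    """Split the first voice's four samples into the two accumulators."""
    return word & MASK_ODD, word & MASK_EVEN


def add_voice(acc_odd, acc_even, word):
    """Add another voice's four samples to both accumulators."""
    acc_odd = (acc_odd + (word & MASK_ODD)) & LONG   # 16-bit lanes, no overflow
    total = acc_even + (word & MASK_EVEN)
    carry = total >> 32                              # X flag after add.l
    acc_even = ((total & LONG) + carry) & LONG       # addx.l with d7 = 0
    return acc_odd, acc_even


def unpack_sums(acc_odd, acc_even):
    """Return the four 16-bit sums in byte order 0..3."""
    rotated = ((acc_even >> 8) | (acc_even << 24)) & LONG   # ror.l #8, d2
    return [rotated >> 16, acc_odd >> 16, rotated & 0xFFFF, acc_odd & 0xFFFF]


def mix_words(words):
    """Sum up to four voices, each given as one longword of four samples."""
    acc_odd, acc_even = read_first_voice(words[0])
    for word in words[1:]:
        acc_odd, acc_even = add_voice(acc_odd, acc_even, word)
    return unpack_sums(acc_odd, acc_even)


print(mix_words([0x01020304, 0xFF80FE7F, 0x10203040]))
```

Each of the four results is the plain sum of the corresponding unsigned bytes, which is then used as an index into the clamp table in the real code.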

DanScott · 07 July 2022, 00:18

there might be a quicker way of doing the clamping, to avoid those indexed memory reads from the clamp table

TCH · 07 July 2022, 22:50

Quote:

Originally Posted by no9

Looking forward to hearing that!

I'm only experimenting so far, so it will not be released soon.

Quote:

Originally Posted by Hannibal

In case anyone is interested, here are my inner loops from the mixer for fixed rate half/full volume mixing.

Thank you for sharing it, I'm sure I can learn tricks from it.

Hannibal · 08 July 2022, 02:53

Dan, yeah I am not too happy about having to do clamping like that, but I couldn’t think of another way without losing sample precision and volume (storing as 6-bit only)
What are you thinking?
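
For context, the table-based clamping in the listing above can be modelled as below. This is a Python sketch under stated assumptions: `cMixerVoiceCount` is not defined in the posted code and is assumed to be 4 here, and the helper names are invented. The table holds the clamped signed result for every possible sum, and the base pointer is moved back by $40 per half-volume voice and $80 per full-volume voice so that the biased unsigned sum can be used directly as the index.

```python
MIXER_VOICE_COUNT = 4   # assumed value of cMixerVoiceCount


def build_clamp_table(voices=MIXER_VOICE_COUNT):
    """Return (table, origin); origin is the index of the ClampTable label."""
    pad = voices * 0x80 - 0x80
    low_pad = [0x80] * pad              # sums below -128 clamp to -128
    negatives = list(range(0x80, 0x100))  # -128..-1 as two's complement bytes
    positives = list(range(0x00, 0x80))   # 0..127
    high_pad = [0x7F] * pad             # sums above 127 clamp to 127
    table = low_pad + negatives + positives + high_pad
    return table, len(low_pad) + len(negatives)


def clamp_lookup(table, origin, unsigned_sum, halves, fulls):
    """Look up the clamped byte for a biased sum of unsigned samples."""
    base = origin - (halves * 0x40 + fulls * 0x80)  # lea ClampTable-(...), a0
    return table[base + unsigned_sum]


table, origin = build_clamp_table()
# Two full-volume voices with signed values 10 and -20, stored with bit 7 flipped.
print(hex(clamp_lookup(table, origin, (10 ^ 0x80) + ((-20 & 0xFF) ^ 0x80), 0, 2)))
```

The example prints the two's complement byte for -10, and every in-range index for four voices falls inside the 1024-byte table.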

DanScott · 08 July 2022, 19:10

Quote:

Originally Posted by Hannibal

Dan, yeah I am not too happy about having to do clamping like that, but I couldn’t think of another way without losing sample precision and volume (storing as 6-bit only)
What are you thinking?

There's a way of clamping unsigned byte adds to 255, so I was thinking there might be a similar way for the signed -128 to +127 range too.

The way for unsigned after adding to d0:

subx.b d1,d1
or.b d1,d0

EDIT: I actually started a thread about this some time ago:

https://eab.abime.net/showthread.php?t=106727

So maybe lookup is the best way for signed....
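
The two-instruction trick above works because `subx.b d1,d1` computes d1 - d1 - X, which is 0 when the preceding add did not overflow and $FF when it did. The following Python sketch models that behaviour for a single unsigned byte; it assumes the sequence runs directly after an `add.b` into d0, and the function name is illustrative.

```python
def saturating_add_u8(a, b):
    """Model of add.b / subx.b d1,d1 / or.b d1,d0 for unsigned bytes."""
    total = a + b
    result = total & 0xFF        # d0 after add.b
    carry = total >> 8           # X flag: 1 if the add overflowed
    mask = (-carry) & 0xFF       # d1 after subx.b d1,d1: 0x00 or 0xFF
    return result | mask         # or.b d1,d0 forces 255 on overflow


print(saturating_add_u8(100, 100))  # 200
print(saturating_add_u8(200, 100))  # 255
```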

ross · 08 July 2022, 19:58

Quote:

Originally Posted by DanScott

The way for unsigned after adding to d0:

subx.b d1,d1
or.b d1,d0

This is nice, but the biggest problem is that it cannot be made scalable (it only works when mixing two samples).

saimon69 · 08 July 2022, 20:10

Quote:

Originally Posted by ross

This is nice, but the biggest problem is that it cannot be made scalable (it only works when mixing two samples).

I think that is what we used on Powder to add some sound effects, and I remember the coder telling me to keep one of the channels (0? not sure now) as empty as possible.
