English Amiga Board


05 July 2022, 19:33   #1
TCH
Newbie Amiga programmer
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
8chn audio mixing: 5 to 1 (+3) vs. 4x2 to 4x1

Out of curiosity, which approach should be faster: mixing 5 channels into one (and using the 3 remaining real channels alongside the mixed one), or mixing 4x2 channels down to 4x1?
05 July 2022, 23:06   #2
Don_Adan
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Quote:
Originally Posted by TCH
Out of curiosity, which approach should be faster: mixing 5 channels into one (and using the 3 remaining real channels alongside the mixed one), or mixing 4x2 channels down to 4x1?
I don't think 5-to-1 mixing exists; I only know of 4-to-1 mixing plus 3 real channels. For very simple mixing (same period, same volume), 2-to-1 is fastest, which is useful for SFX. For music, 4-to-1 is fastest. In any case, this also depends on the coder's knowledge and the quality of the code.
06 July 2022, 08:57   #3
TCH
Newbie Amiga programmer
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
4 channels mixed into 1, plus 3 real channels, gives only 7 channels. There are 8-channel trackers, like OctaMED. If 4-to-1 is faster for music, then they must use 5-to-1, right? And why is 4-to-1 faster if volume or period differences can occur?
06 July 2022, 08:58   #4
Hannibal
Registered User
Join Date: May 2015
Location: Kirkland, Washington, USA
Posts: 56
I think 4-to-1 mixing + 3 real channels is faster than three 2-to-1 mixes + 1 real channel, because the CPU only does 4 reads and 1 write per sample, as opposed to 6 reads and 3 writes.
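The read/write counts above can be extended to the two 8-voice layouts from the original question. This is a simplified model under my own assumption: one read per software-mixed input voice and one write per software-mixed output channel, per sample period. It ignores resampling, volume and clamping.

```python
# Rough per-sample memory traffic for the mixing layouts discussed in this thread.
# Assumption: one read per software-mixed input voice, one write per software-mixed
# output channel; resampling, volume scaling and clamping are ignored.
# Hardware channels that play a sample directly cost the CPU nothing and are omitted.

def memory_ops(mix_groups):
    """mix_groups: number of input voices for each software-mixed output channel.
    Returns (reads, writes) per sample period."""
    return sum(mix_groups), len(mix_groups)

LAYOUTS = {
    "4-to-1 + 3 real (7 voices)": [4],
    "three 2-to-1 + 1 real (7 voices)": [2, 2, 2],
    "5-to-1 + 3 real (8 voices)": [5],
    "4x2 to 4x1 (8 voices)": [2, 2, 2, 2],
}

for name, groups in LAYOUTS.items():
    reads, writes = memory_ops(groups)
    print(f"{name}: {reads} reads, {writes} writes")
```

By this count, 5-to-1 (5 reads, 1 write) stays close to 4-to-1. 4x2 to 4x1 doubles the reads and quadruples the writes. The model deliberately leaves out how each layout copes with differing periods and volumes, which in practice can dominate the cost.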

I never tried the 3x2-to-1 mixer, but I made a 4-channel sfx mixer (limited to same period and full or half volume per sample) for fun a while back. I got some really fast inner loops by using codegen for all permutations of how many active samples were at half or full volume level. I used the 3 regular channels for music, and the other 4 could then either be sfx drums or sfx. Since they are software mixed, those samples can stay in fastmem, and I only needed just over 1k chip mem for a double buffer that was updated every frame.
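The "just over 1k of chip mem" figure can be checked against the `cMixingFrameBufferSize` expression in the mixer source Hannibal posts later in this thread. This sketch assumes that the double buffer is simply two buffers of that size.

```python
# Reproduces cMixingFrameBufferSize from the posted mixer source.
PAL_CLOCK_HZ = 3546895   # cColorClocksPerSecondPAL
MIN_PERIOD = 124         # cMinColorClocksPerSamplePAL (about 28.6 kHz)
SEGMENT = 1 << 3         # cMixingBufferSegmentSize: 8 bytes per unrolled loop pass
FRAMES_PER_SECOND = 50   # PAL vblank rate

def frame_buffer_size(clock=PAL_CLOCK_HZ, period=MIN_PERIOD,
                      segment=SEGMENT, fps=FRAMES_PER_SECOND):
    samples_per_frame = clock // (period * fps)   # 572 for the defaults
    # Pad by one segment, then round down to a multiple of the segment size,
    # mirroring the assembler's "& (-cMixingBufferSegmentSize)".
    return (samples_per_frame + segment) & -segment

size = frame_buffer_size()
print(size, 2 * size)   # bytes per frame buffer, bytes for the double buffer
```

This prints `576 1152`, i.e. about 1.1 KB of chip RAM for two frame buffers.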

It feels fast enough to use in games even on A500. If I had thought about that back in the day, I could probably have used it on Banshee to have music+sfx together.
06 July 2022, 09:15   #5
TCH
Newbie Amiga programmer
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
That makes sense. And 8 reads and 4 writes would take even more time. I should have thought of that.

I'm trying to make something similar, although only for music right now.

You wrote Banshee? That game was awesome.
06 July 2022, 11:58   #6
pink^abyss
Registered User
Join Date: Aug 2018
Location: Untergrund/Germany
Posts: 408
Quote:
Originally Posted by Hannibal
It feels fast enough to use in games even on A500. If I had thought about that back in the day, I could probably have used it on Banshee to have music+sfx together.

Music+Sfx in Amiga games: Twice the fun, double the trouble
06 July 2022, 12:45   #7
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by Hannibal
I think 4-to-1 mixing + 3 real channels is faster than three 2-to-1 mixes + 1 real channel, because the CPU only does 4 reads and 1 write per sample, as opposed to 6 reads and 3 writes.
I experimented with software mixing a while back, and I concur: you can certainly mix 3x2 to 1 (and it will most likely sound better than 4 to 1), but there's more overhead involved.

Plus, a 4-to-1 mixer can exploit the fact that reading/adding/writing longwords (4 samples at a time) is quite a bit quicker than reading/adding/writing bytes or words, without a noticeable difference in sound quality.
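Here is one way to picture the longword trick. It is a sketch under my own assumptions and not necessarily how roondar's mixer does it. Each voice is stored as unsigned offset-binary bytes pre-scaled to 6 bits (0..63, silence = 32). Four such bytes can never sum past 252, so a single 32-bit add mixes four output samples at once without a carry spilling into the neighbouring byte. A final EOR with $80808080 then turns the biased sum back into the signed bytes Paula plays. The cost is 2 bits of precision per voice.

```python
# Longword ("SWAR") mixing sketch: four 8-bit lanes per 32-bit word.
# Assumption: each input lane holds 0..63, so the per-lane sum of four voices
# is at most 252 and no carry ever crosses into the next byte lane.
MASK32 = 0xFFFFFFFF

def pack(samples):
    """Pack four unsigned bytes into a big-endian 32-bit word (like a 68000 move.l)."""
    b0, b1, b2, b3 = samples
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3

def unpack_signed(word):
    """Split a 32-bit word into four signed bytes, most significant first."""
    out = []
    for shift in (24, 16, 8, 0):
        b = (word >> shift) & 0xFF
        out.append(b - 256 if b >= 128 else b)
    return out

def mix4_long(v0, v1, v2, v3):
    """Mix four voices, four output samples at a time, with plain 32-bit adds.
    Silence (32 per voice) sums to 128 ($80) per lane, so EOR $80808080 removes
    the bias and yields signed bytes."""
    acc = (v0 + v1 + v2 + v3) & MASK32
    return acc ^ 0x80808080
```

For example, with `silence = pack([32] * 4)`, `unpack_signed(mix4_long(pack([63, 0, 32, 40]), silence, silence, silence))` gives `[31, -32, 0, 8]`.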
Quote:
It feels fast enough to use in games even on A500. If I had thought about that back in the day, I could probably have used it on Banshee to have music+sfx together.
That was exactly the point of my experiments, yes. Software mixing can be reasonably fast, and probably fast enough for SFX in most games. I wrote an article about it which includes a video explanation and source code/demo program for such a mixer (it mixes 4 channels into one at 11 kHz using 3.2% CPU time on an A500), so if you (or any other thread readers) are interested, it can be found here: http://powerprograms.nl/amiga/audio-mixing.html
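To put the 3.2% figure in perspective, here is the arithmetic. It is compared against a hypothetical unrolled longword loop body (my illustration, not roondar's code) using standard 68000 timings: move.l (An)+,Dn 12 cycles, add.l Dn,Dn 8 cycles, move.l Dn,(An)+ 12 cycles. The comparison assumes no chip RAM wait states, ignores loop overhead, and assumes samples pre-scaled so the adds cannot overflow.

```python
# Cycle budget implied by "4 channels into one at 11 kHz using 3.2% CPU on an A500".
CPU_HZ = 7_093_790      # PAL 68000 clock
SAMPLE_RATE = 11_000    # "11 kHz" from the post (approximate)
CPU_SHARE = 0.032       # 3.2% CPU time

cycles_per_sample = CPU_SHARE * CPU_HZ / SAMPLE_RATE

# Hypothetical loop body mixing 4 voices into 4 output samples (one longword):
#   4 x move.l (An)+,Dn   (12 cycles each)
#   3 x add.l  Dn,Dn      ( 8 cycles each)
#   1 x move.l Dn,(An)+   (12 cycles)
LOOP_CYCLES = 4 * 12 + 3 * 8 + 12
cycles_per_sample_loop = LOOP_CYCLES / 4

print(round(cycles_per_sample, 1), cycles_per_sample_loop)
```

The budget works out to roughly 20.6 cycles per output sample, in the same range as the 21 cycles per sample of such a longword loop.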
06 July 2022, 13:39   #8
no9
Registered User
Join Date: Feb 2018
Location: Poland
Posts: 352
Quote:
Originally Posted by TCH
I'm trying to make something similar, although only for music right now.
Looking forward to hearing that!


I once found the following information about the Face The Music Amiga tracker on some Atari forum. If it's correct, I wonder whether this approach brings real gains and how much it affects the quality. FTM wasn't that bad quality-wise.

Quote:
There were a few other Amiga music players with 8 channels; FaceTheMusic had a particularly clever implementation which ran the hardware channel at the same rate as the highest sampling rate of the pair. Therefore only one channel needed to be resampled and only that channel needed sample end checking inside the tight mixing loop.
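The Face The Music pairing trick can be sketched like this. Everything beyond the quoted description is my own assumption: 16.16 fixed-point stepping, mixing the pair by averaging, and looping the slower sample. The output runs at the faster voice's rate, so that voice is copied 1:1 with no step or end test. Only the slower voice carries a fractional position and an end check.

```python
# Sketch of pairing two voices on one hardware channel at the faster voice's rate.
# Assumptions: 16.16 fixed-point position, averaging as the mix, slow sample loops.

def mix_pair(fast, slow, fast_rate, slow_rate, slow_pos=0):
    """Mix one buffer of len(fast) output samples at fast_rate.
    fast, slow: lists of signed 8-bit samples; slow_pos: 16.16 index into slow.
    Returns (output samples, updated slow_pos)."""
    if slow_rate > fast_rate:
        raise ValueError("the hardware channel must run at the faster rate")
    step = (slow_rate << 16) // fast_rate   # slow-voice advance per output sample
    end = len(slow) << 16
    out = []
    for s in fast:                          # fast voice: no resampling, no end test
        out.append((s + slow[slow_pos >> 16]) >> 1)
        slow_pos += step
        if slow_pos >= end:                 # only the resampled voice is end-checked
            slow_pos -= end                 # assumption: the slow sample loops
    return out, slow_pos
```

With a 20000 Hz voice over a 10000 Hz one, `mix_pair([10, 20, 30, 40], [0, 100], 20000, 10000)` returns `([5, 10, 65, 70], 0)`. Each slow sample is used twice, and the slow voice wraps exactly at its end.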
06 July 2022, 14:21   #9
Don_Adan
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Quote:
Originally Posted by no9
Looking forward to hearing that!


I once found the following information about the Face The Music Amiga tracker on some Atari forum. If it's correct, I wonder whether this approach brings real gains and how much it affects the quality. FTM wasn't that bad quality-wise.
Face The Music's mixing idea/replay was used in Delitracker 2 (for the 8 Voices NotePlayer). It is good enough, but it can often freeze a 68000 (7 MHz). You can test the quality of this mixer with any 4-8 channel sound format that has a "_note" version of its replay.
06 July 2022, 15:55   #10
no9
Registered User
Join Date: Feb 2018
Location: Poland
Posts: 352
@Don_Adan thanks. I did actually listen to this in Face The Music itself. It was OK, with mild occasional distortions. But for some reason I doubt that FTM's implementation of this algorithm was optimized to the last bit. Maybe it could be implemented better, or there are other limitations in this approach that I'm not aware of.
06 July 2022, 21:09   #11
Hannibal
Registered User
Join Date: May 2015
Location: Kirkland, Washington, USA
Posts: 56
In case anyone is interested, here are my inner loops from the mixer for fixed-rate, half/full-volume mixing. You need samples EORed with $80808080 (i.e. converted from signed to unsigned 8-bit), as that was faster for the mixing code. The loop is unrolled to do 8 samples per iteration, so samples must be 8-byte aligned.
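A hypothetical pre-processing helper for this input format is sketched below. It flips bit 7 of every byte, which is the same as EORing each longword with $80808080, turning signed samples into unsigned offset binary. It then pads the length to a multiple of the 8-byte unrolled segment. Padding with silence is my assumption. "8-byte aligned" may also refer to the start address, which this sketch does not model.

```python
# Hypothetical helper: prepare a signed 8-bit sample for the unsigned-input mixer.
SEGMENT = 8   # bytes mixed per unrolled loop pass (cMixingBufferSegmentSize)

def prepare_sample(signed_bytes):
    """signed_bytes: iterable of ints in -128..127. Returns the converted bytes."""
    data = bytearray((s & 0xFF) ^ 0x80 for s in signed_bytes)  # signed -> offset binary
    data.extend(b"\x80" * (-len(data) % SEGMENT))               # $80 = silence, pad to 8
    return bytes(data)
```

For example, `prepare_sample([0, -128, 127])` yields `80 00 ff` followed by five `80` padding bytes.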

Quality loss: there is no quality loss for a single full-volume 8-bit sample, and only a very slight loss for multiple samples if they are loud, as the 8-bit values are summed and then clamped to the range -128 to +127. Half-volume samples lose 1 bit of precision. The main reason I stayed away from variable pitch is that you lose a lot of quality, even if you invest in linear interpolation and use a 28 kHz playback buffer. Plus, it's slow as hell to mix :-)
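The quality trade-offs can be illustrated with a scalar reference model. This is my own approximation of what the inner loops compute per output sample, not a transcription of them. Half-volume voices are summed and then halved (the flooring halving is the lost bit). Full-volume voices are added as-is, and the total is clamped to -128..+127.

```python
# Scalar reference model (approximation) of one output sample of the fixed-rate mixer.

def clamp8(x):
    """Clamp to the signed 8-bit range Paula plays."""
    return max(-128, min(127, x))

def mix_sample(half_voices, full_voices):
    """half_voices, full_voices: signed 8-bit values of the active voices."""
    total = (sum(half_voices) >> 1) + sum(full_voices)  # >> floors, like an arithmetic shift
    return clamp8(total)
```

`mix_sample([], [100])` is 100, so a single full-volume voice passes through unchanged. `mix_sample([], [100, 100])` clamps to 127, and `mix_sample([-101], [])` gives -51 because the halving floors.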

Other uses: having a looping sample buffer like this also makes it easy (or at least easier) to stream samples from disk. I haven't tried it, but I also think you could use the buffers for basic echo/reverb, which, combined with the low-pass filter, could add a little more realism and variety to different regions of a game.
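Here is a speculative sketch of the echo idea, which, like the post, is untested on real hardware. It is a feedback delay line that adds each output sample back in, attenuated by a shift, `delay` samples later. The `Echo` class, the shift-based gain and the clamp are my assumptions. The history is kept separate from the playback buffer so the delay can exceed one frame, since one frame at about 28.6 kHz is only about 572 samples (20 ms).

```python
# Speculative feedback echo over a ring of past output samples (hypothetical helper).

class Echo:
    def __init__(self, delay, shift=1):
        self.history = [0] * delay   # ring holding the last `delay` output samples
        self.pos = 0
        self.shift = shift           # feedback gain = 1 / 2**shift

    def process(self, samples):
        """Process one chunk (e.g. one frame) of signed 8-bit samples; state persists."""
        out = []
        for s in samples:
            delayed = self.history[self.pos]                  # output from `delay` samples ago
            y = max(-128, min(127, s + (delayed >> self.shift)))
            self.history[self.pos] = y
            self.pos = (self.pos + 1) % len(self.history)
            out.append(y)
        return out
```

For example, `Echo(2).process([64, 0, 0, 0, 0, 0])` returns `[64, 0, 32, 0, 16, 0]`, a decaying repeat every two samples.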

Playback buffers: I didn't include the code that manages the double buffering, which was pretty tricky (and that code isn't clean enough to share). In my case it runs in the vblank interrupt and switches back and forth between the 2 nearest buffer sizes. You could also do the mixing in an audio interrupt, but then you can't control when, relative to vblank, the mixing happens; or you could triple-buffer and just skip the third buffer on some frames.
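Since the buffer management code wasn't shared, here is a hypothetical sketch of the sizing part only. Each vblank, it mixes as many whole 8-byte segments as Paula will have consumed by then. The count is tracked with exact integer arithmetic, so no drift accumulates. The constants come from the posted source; the scheme itself is my assumption.

```python
# Hypothetical per-frame mix sizes that alternate between the two nearest segment multiples.
PAL_CLOCK_HZ = 3546895   # cColorClocksPerSecondPAL
PERIOD = 124             # cMinColorClocksPerSamplePAL
FPS = 50
SEGMENT = 8              # cMixingBufferSegmentSize

def frame_sizes(frames, clock=PAL_CLOCK_HZ, period=PERIOD, fps=FPS, segment=SEGMENT):
    produced = 0
    sizes = []
    for n in range(1, frames + 1):
        due = n * clock // (period * fps)             # samples played after n frames
        size = (due - produced) // segment * segment  # whole segments only
        sizes.append(size)
        produced += size
    return sizes

print(frame_sizes(6))
```

This prints `[568, 576, 568, 576, 568, 576]`. At about 572.08 samples per frame, the mix size settles on the two multiples of 8 either side, and the shortfall never reaches one segment.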

Code format: sorry about all the weird alignment; my autoformatter assumes tab = 4 spaces.

Code:
cColorClocksPerSecondPAL:		equ	3546895
cMinColorClocksPerSamplePAL:	equ	124	; approx 28603hz samples - 121-124 for channel 1 to 4


cMixingBufferSegmentSizeBits:	equ	3
cMixingBufferSegmentSize:		equ	(1<<cMixingBufferSegmentSizeBits)
cMixingFrameBufferSize:			equ	((cColorClocksPerSecondPAL/(cMinColorClocksPerSamplePAL*50))+cMixingBufferSegmentSize)&(-cMixingBufferSegmentSize)



		; word offsets to the inner loops, indexed by (half-volume voices * 8 + full-volume voices)
.mixerInnerLoopTable:
		dc.w	.innerLoop_0_0-.mixerInnerLoopTable
		dc.w	.innerLoop_0_1-.mixerInnerLoopTable
		dc.w	.innerLoop_0_2-.mixerInnerLoopTable
		dc.w	.innerLoop_0_3-.mixerInnerLoopTable
		dc.w	.innerLoop_0_4-.mixerInnerLoopTable
		dcb.w	3, 0

		dc.w	.innerLoop_1_0-.mixerInnerLoopTable
		dc.w	.innerLoop_1_1-.mixerInnerLoopTable
		dc.w	.innerLoop_1_2-.mixerInnerLoopTable
		dc.w	.innerLoop_1_3-.mixerInnerLoopTable
		dcb.w	4, 0

		dc.w	.innerLoop_2_0-.mixerInnerLoopTable
		dc.w	.innerLoop_2_1-.mixerInnerLoopTable
		dc.w	.innerLoop_2_2-.mixerInnerLoopTable
		dcb.w	5, 0

		dc.w	.innerLoop_3_0-.mixerInnerLoopTable
		dc.w	.innerLoop_3_1-.mixerInnerLoopTable
		dcb.w	6, 0

		dc.w	.innerLoop_4_0-.mixerInnerLoopTable

		; registers used when entering mixer inner loop
		; a5=destination (frame B)
		; d0=samples needed this inner loop-1



		;mixer inner loop registers
			; a1-a4=samples (unsigned 8-bit: 0-255)
			; a0=clamptable
			; d1=accumulator for sample 1 and 3
			; d2=accumulator for sample 0 and 2
			; d3/d4=work registers
			; d5=mask for sample 1 and 3 ($00ff00ff)
			; d6=mask for sample 0 and 2 ($ff00ff00)
			; d7=0
ReadFirstVoice: macro								; Private    : used for inner loop
				move.l	(a1)+, d1
				move.l	d1, d2
				and.l	d5, d1
				and.l	d6, d2
			endm
AddVoice:	macro									; Private    : used for inner loop
				move.l	(\1)+, d3
				move.l	d3, d4
				and.l	d5, d3
				and.l	d6, d4
				add.l	d3, d1
				add.l	d4, d2
				addx.l	d7, d2
			endm
ShiftHalfVolume: macro								; Private    : used for inner loop
				lsr.l	#1, d1
				ror.l	#1, d2
				and.l	#$0fff0fff, d1
				and.l	#$ff0fff0f, d2
			endm
StoreVoice:	macro									; Private    : used for inner loop
				ror.l	#8, d2
				move.b	0(a0, d2.w), 2(a5)
				move.b	0(a0, d1.w), 3(a5)
				swap	d2
				swap	d1
				move.b	0(a0, d2.w), 0(a5)
				move.b	0(a0, d1.w), 1(a5)
				add.w	#4, a5
			endm
StoreVoiceHalfsOnly: macro							; Private    : used for inner loop
			ShiftHalfVolume
			StoreVoice
		endm

DefineInnerLoop: macro								; Private    : Generates optimized mixer inner loop
													; Inputs     : Number of half volume samples, number of full volume samples
.innerLoop_\1_\2:
			; read active voice data pointers into a1-a4
			if		(\1+\2)=0
				moveq	#0, d5
			endif
			if		(\1+\2)=1
				move.l	MixerData_SortedVoicePtrs(a4), a1
				move.l	MixerVoice_Ptr(a1), a1
			endif
			if		(\1+\2)=2
				movem.l	MixerData_SortedVoicePtrs(a4), a1-a2
				move.l	MixerVoice_Ptr(a1), a1
				move.l	MixerVoice_Ptr(a2), a2
			endif
			if		(\1+\2)=3
				movem.l	MixerData_SortedVoicePtrs(a4), a1-a3
				move.l	MixerVoice_Ptr(a1), a1
				move.l	MixerVoice_Ptr(a2), a2
				move.l	MixerVoice_Ptr(a3), a3
			endif
			if		(\1+\2)=4
				movem.l	MixerData_SortedVoicePtrs(a4), a1-a4
				move.l	MixerVoice_Ptr(a1), a1
				move.l	MixerVoice_Ptr(a2), a2
				move.l	MixerVoice_Ptr(a3), a3
				move.l	MixerVoice_Ptr(a4), a4
			endif
			if		(\1+\2)>1
				lea		ClampTable-(\1*$40+\2*$80)(pc), a0
				move.l	#$00ff00ff, d5
				if		MixerVersion=1
					move.l	#$ff00ff00, d6
				endif
				if		MixerVersion=2
					move.w	#$0fff, d6
				endif

				moveq	#0, d7
			else
				if		(\1=1) & (\2=0)
					move.l	#$80808080, d4
					move.l	#$fefefefe, d5
				endif
				if		(\1=0) & (\2=1)
					move.l	#$80808080, d4
				endif

			endif
.loop_\1_\2:
			; mixes one segment from a1-a4 into a5
			if		(\1+\2)=0
				; no channels
				rept	cMixingBufferSegmentSize/4
					move.l	d5, (a5)+
				endr
			endif
			if		(\1=1) & (\2=0)
				; single channel half volume
				rept	cMixingBufferSegmentSize/4
					move.l	(a1)+, d2
					if		1
						; correct
						eor.l	d4, d2
						and.l	d5, d2
						move.l	d2, d3
						ror.l	#1, d2
						and.l	d4, d3
						add.l	d3, d2
					else
						; fast
						and.l	#$fefefefe, d2
						ror.l	#1, d2
						sub.l	#$40404040, d2
					endif
					move.l	d2, (a5)+
				endr
			endif
			if		(\1=0) & (\2=1)
				; single channel full volume
				rept	cMixingBufferSegmentSize/4
					move.l	(a1)+, d2
					eor.l	d4, d2
					move.l	d2, (a5)+
				endr
			endif
			if		(\1+\2)>1
				rept	cMixingBufferSegmentSize/4
					; available registers:
					; d2-d7
					; d2-d6=work registers
					if		\1>=1
						ReadFirstVoice
					endif
					if		\1>=2
						AddVoice a2
					endif
					if		\1>=3
						AddVoice a3
					endif
					if		\1>=4
						AddVoice a4
					endif
					if		\2=0
						StoreVoiceHalfsOnly
					else
						if		((\1+\2)>=1) & (\1<1)
							; there are only full volume voices
							ReadFirstVoice
						else
							; there is a mix between half and full voices, so shift the accumulated values before adding the full volume voices
							ShiftHalfVolume
						endif
						; if there are at least 2 voices in total, but less than 2 are half volume, add voice 2 at full volume
						if		((\1+\2)>=2) & (\1<2)
							AddVoice a2
						endif
						if		((\1+\2)>=3) & (\1<3)
							AddVoice a3
						endif
						if		((\1+\2)>=4) & (\1<4)
							AddVoice a4
						endif
						StoreVoice
					endif
				endr
			endif
			dbra	d0, .loop_\1_\2
			if		(\1+\2)>1
				lea		MixerData(pc), a4
			endif

			bra		.innerLoopComplete
		endm
		DefineInnerLoop 0, 0
		DefineInnerLoop 0, 1
		DefineInnerLoop 0, 2
		DefineInnerLoop 0, 3
		DefineInnerLoop 0, 4
		DefineInnerLoop 1, 0
		DefineInnerLoop 1, 1
		DefineInnerLoop 1, 2
		DefineInnerLoop 1, 3
		DefineInnerLoop 2, 0
		DefineInnerLoop 2, 1
		DefineInnerLoop 2, 2
		DefineInnerLoop 3, 0
		DefineInnerLoop 3, 1
		DefineInnerLoop 4, 0



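	; clamp table: the byte at ClampTable+t is t clamped to -128..127, for t in -cMixerVoiceCount*$80 .. cMixerVoiceCount*$80-1
	dcb.b	cMixerVoiceCount*$80-$80, $80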
.temp:									set		$80
	rept	128
		dc.b	.temp
.temp:									set		.temp+1
	endr
ClampTable:
.temp:									set		$0
	rept	128
		dc.b	.temp
.temp:									set		.temp+1
	endr
	dcb.b	cMixerVoiceCount*$80-$80, $7f
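The clamp table above can be checked with a quick model. `cMixerVoiceCount` is not defined in the posted snippet, so 4 is an assumption here. The inner loop points a0 at ClampTable-(half*$40+full*$80), which subtracts the offset-binary bias of the accumulated voices: each full voice adds $80 of bias, and each halved one adds $40. The indexed byte is therefore the signed sum, clamped to -128..+127.

```python
# Model of the ClampTable layout, assuming cMixerVoiceCount = 4.
VOICES = 4

def build_clamp_table(voices=VOICES):
    """Returns (table bytes, index of the ClampTable label within them)."""
    before = [0x80] * (voices * 0x80 - 0x80) + list(range(0x80, 0x100))
    after = list(range(0x00, 0x80)) + [0x7F] * (voices * 0x80 - 0x80)
    return before + after, len(before)

def lookup(table, label, offset):
    """Byte at ClampTable+offset, read as a signed 8-bit value."""
    b = table[label + offset]
    return b - 256 if b >= 0x80 else b

table, label = build_clamp_table()
# Four full-volume voices: unsigned sum 0..1020, bias 4*$80 = 512.
print(lookup(table, label, 0 - 512), lookup(table, label, 1020 - 512))
```

This prints `-128 127`, the two saturated extremes of a four-voice mix.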
07 July 2022, 00:18   #12
DanScott
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
There might be a quicker way of doing the clamping that avoids those indexed memory reads from the clamp table.
07 July 2022, 22:50   #13
TCH
Newbie Amiga programmer
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
Quote:
Originally Posted by no9
Looking forward to hearing that!
I'm only experimenting so far, so it won't be released any time soon.
Quote:
Originally Posted by Hannibal
In case anyone is interested, here are my inner loops from the mixer for fixed-rate, half/full-volume mixing.
Thank you for sharing it; I'm sure I can learn some tricks from it.

Last edited by TCH; 07 July 2022 at 22:50. Reason: Half of the post was on another tab...
08 July 2022, 02:53   #14
Hannibal
Registered User
Join Date: May 2015
Location: Kirkland, Washington, USA
Posts: 56
Dan, yeah, I'm not too happy about having to do the clamping like that, but I couldn't think of another way without losing sample precision and volume (storing as 6-bit only).
What are you thinking?
08 July 2022, 19:10   #15
DanScott
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
Quote:
Originally Posted by Hannibal
Dan, yeah, I'm not too happy about having to do the clamping like that, but I couldn't think of another way without losing sample precision and volume (storing as 6-bit only).
What are you thinking?
There's a way of clamping unsigned byte adds to 255, so I was thinking there might be a similar way for the signed -128 to +127 range too.

The way for unsigned, right after adding into d0:

subx.b d1,d1   ; d1 = 0 - 0 - X: $FF if the add carried, else $00
or.b d1,d0     ; saturate d0 to $FF on overflow
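The two-instruction unsigned clamp can be modelled step by step. This Python sketch of the flag behaviour assumes that the preceding instruction was an add.b into d0, which leaves the carry in X.

```python
# Model of: add.b dX,d0 / subx.b d1,d1 / or.b d1,d0  (unsigned saturating byte add)

def add_sat_u8(a, b):
    """a, b: unsigned bytes 0..255. Returns min(a + b, 255)."""
    total = a + b
    result = total & 0xFF          # add.b: 8-bit wraparound result in d0
    x = 1 if total > 0xFF else 0   # X flag = carry out of bit 7
    d1 = (0 - 0 - x) & 0xFF        # subx.b d1,d1 -> $FF on carry, else $00
    return result | d1             # or.b d1,d0 -> forces $FF on overflow
```

For example, `add_sat_u8(200, 100)` returns 255, while `add_sat_u8(100, 100)` returns 200.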


EDIT: I actually started a thread about this some time ago:

https://eab.abime.net/showthread.php?t=106727

So maybe a lookup table is the best way for signed...

Last edited by DanScott; 08 July 2022 at 19:16. Reason: Updated
08 July 2022, 19:58   #16
ross
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by DanScott
The way for unsigned after adding to d0:

subx.b d1,d1
or.b d1,d0
This is nice, but the biggest problem is that it doesn't scale (it only works when mixing two samples).
08 July 2022, 20:10   #17
saimon69
J.M.D - Bedroom Musician
Join Date: Apr 2014
Location: los angeles,ca
Posts: 3,519
Quote:
Originally Posted by ross
This is nice, but the biggest problem is that it doesn't scale (it only works when mixing two samples).
I think that is what we used in Powder to add some sound effects, and I remember the coder telling me to keep one of the channels (0? I'm not sure now) as empty as possible.
All times are GMT +2.