22 October 2020, 13:41 | #41 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,050
|
Ah, so the a1/a4 tables are a small part of a large volume table. OK, that wasn't clear from the source. Not worth it on its own, assuming there's no use for d4 to make the code even faster, but good to hear that you managed to salvage it.
There is a typo in MIXCF that keeps surviving: it should be rol d5, not rol d7.
22 October 2020, 13:53 | #42 | |
Registered User
Join Date: Aug 2014
Location: Zagreb / Croatia
Posts: 302
|
Quote:
Code:
move.w (a3,d2.l),d3   ; d3.w = 2x signed 8-bit samples
move.b d3,d7
ext.w  d7
asr.w  #8,d3
ext.w  d3             ; <- this one
sub.w  d3,d7          ; d7.w = sample2-sample1
But my first assumption was that both byte values are signed ("; d3.w = 2x signed 8-bit samples"), and that you may have forgotten to sign-extend the second one.
|
22 October 2020, 14:26 | #43 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
a/b: Ah yes, that's just a typo I made when writing my posts; it's correct in the actual source code. I'll edit them again.
Tomislav: It works as intended, and I don't see any problems with it. I'm working with signed word samples at that stage, and the upper word of d3 is never used in the mixing loop. The second sample is properly converted to a sign-extended word by the ASR shift (which copies the sign bit).
Also, here's how it sounds as of now. Playing a 20-channel XM song with linear interpolation at 14-bit 28604Hz on my Amiga 1200 (68030 50MHz): https://www.dropbox.com/s/s6d24ng9hv...play2.mp3?dl=1
It has some quantization noise because all 16-bit samples are converted to 8-bit at load time. There is also no volume ramping, so it clicks/pops sometimes. 18-20 channels seems to be about the absolute max for a 68030 50MHz, and it will use almost all available CPU time.
Last edited by 8bitbubsy; 22 October 2020 at 14:55.
22 October 2020, 14:51 | #44 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,024
|
Quote:
Only after lsr.w #8,d3 would ext.w d3 make sense.
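The sign-extension question can be checked exhaustively on a PC. The following is a minimal C sketch, not code from the thread: the function names (asr_w8, lsr_w8, ext_w, check_sign_extension) are made up here, and each helper models the 68k instruction it is named after, acting on the low word of a data register kept as a uint16_t bit pattern. It confirms that asr.w #8 alone already yields the sign-extended first sample, that an ext.w after it changes nothing, and that lsr.w #8 followed by ext.w gives the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Each helper models one 68k instruction acting on the low word of a data
   register; values are plain 16-bit bit patterns. */

/* asr.w #8: arithmetic shift right, bit 15 is copied into bits 15..8. */
uint16_t asr_w8(uint16_t w) {
    uint16_t r = (uint16_t)(w >> 8);
    return (w & 0x8000u) ? (uint16_t)(r | 0xFF00u) : r;
}

/* lsr.w #8: logical shift right, zeros are shifted into bits 15..8. */
uint16_t lsr_w8(uint16_t w) {
    return (uint16_t)(w >> 8);
}

/* ext.w: copy bit 7 into bits 15..8 (sign-extend the low byte to a word). */
uint16_t ext_w(uint16_t w) {
    return (w & 0x80u) ? (uint16_t)(w | 0xFF00u) : (uint16_t)(w & 0x00FFu);
}

/* Check every word that move.w (a3,d2.l),d3 could load. */
void check_sign_extension(void) {
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint16_t w = (uint16_t)v;
        /* asr.w #8 already sign-extends the high byte, so an extra ext.w is a no-op. */
        assert(ext_w(asr_w8(w)) == asr_w8(w));
        /* lsr.w #8 followed by ext.w gives the same word as asr.w #8 alone. */
        assert(ext_w(lsr_w8(w)) == asr_w8(w));
    }
}
```

Calling check_sign_extension() runs all 65536 cases; if none of the assertions fire, the extra ext.w d3 after asr.w #8,d3 is indeed redundant.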
|
23 October 2020, 06:58 | #45 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,024
|
Quote:
Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF   MACRO
        move.w  (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
        move.b  d3,d7
        ext.w   d7
        asr.w   #8,d3
        sub.w   d3,d7           ; d7.w = sample2-sample1
        move.l  d7,d5
        rol.l   #7,d5
        add.w   (a6,d5.w*2),d3
        move.w  (a1,d3.w*4),d5  ; d5.w = left output sample (from volume LUT)
        swap    d5
        move.w  (a4,d3.w*2),d5  ; d5.l = (leftSample << 16) | rightSample
        ; if A1 and A4 used same table, use move.w (a4,d3.w*4),d5
        add.l   d5,(a5)+
        add.l   d6,d7           ; increase sampling position
        addx.l  d1,d2
        ENDM
Code:
; d0.w = bytes to mix
MIXCF   MACRO
        move.w  (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
        move.b  d3,d7
        ext.w   d7
        asr.w   #8,d3
        sub.w   d3,d7           ; d7.w = sample2-sample1
        move.l  d7,d5
        rol.l   #7,d5
        add.w   (a6,d5.w*2),d3
        move.l  (a1,d3.w*4),d3  ; d3.l = output sample (from volume LUT)
        add.l   d3,(a5)+
        add.l   d6,d7           ; increase sampling position
        addx.l  d1,d2
        ENDM
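To make the fixed-point and table tricks in these macros easier to follow, here is a hedged C model of one MIXCF-style iteration, with the volume lookup through a1 left out. Several parts are assumptions rather than facts from the thread: the contents of the a6 table (modelled by a hypothetical interp_lut() returning (delta*f7)/128, where f7 is the top 7 bits of the fraction), the low word of d6 being zero so that add.l d6,d7 only advances the fraction held in the upper word of d7, and the MixState/mix_step names. What it illustrates is that rol.l #7 turns d7 into a signed word index of delta*128 + f7, and that the carry out of add.l d6,d7 feeds addx.l d1,d2 to advance the integer position.

```c
#include <assert.h>
#include <stdint.h>

/* rol.l #n on a 32-bit register (valid for n = 1..31). */
uint32_t rol32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32u - n));
}

/* Signed value of the low word of a register (what d5.w means as an index). */
int low_word_signed(uint32_t x) {
    uint32_t w = x & 0xFFFFu;
    return (w & 0x8000u) ? (int)w - 0x10000 : (int)w;
}

/* HYPOTHETICAL a6 table: entry delta*128 + f7 holds (delta * f7) / 128.
   The thread does not show how the real table is generated. */
int interp_lut(int index) {
    int f7 = index & 127;            /* top 7 bits of the fraction */
    int delta = (index - f7) / 128;  /* sample2 - sample1, range -255..255 */
    return (delta * f7) / 128;
}

typedef struct {
    uint32_t d1; /* integer part of the step */
    uint32_t d2; /* integer sample position, relative to a3 */
    uint32_t d6; /* fractional step in the upper word; low word assumed 0 */
    uint32_t d7; /* fractional position in the upper word; low word is scratch */
} MixState;

/* One iteration up to (but not including) the volume lookup. */
int mix_step(MixState *s, const int8_t *a3) {
    int s1 = a3[s->d2];     /* high byte of move.w (a3,d2.l),d3: asr.w #8 */
    int s2 = a3[s->d2 + 1]; /* low byte: move.b d3,d7 ; ext.w d7 */
    int delta = s2 - s1;    /* sub.w d3,d7 */
    s->d7 = (s->d7 & 0xFFFF0000u) | ((uint32_t)delta & 0xFFFFu);

    /* move.l d7,d5 ; rol.l #7,d5 -> d5.w = delta*128 + (fraction >> 9) */
    int index = low_word_signed(rol32(s->d7, 7));
    int out = s1 + interp_lut(index); /* add.w (a6,d5.w*2),d3 */

    uint32_t old = s->d7;
    s->d7 += s->d6;                   /* add.l d6,d7 */
    uint32_t x = (s->d7 < old);       /* carry out of bit 31 -> X flag */
    s->d2 += s->d1 + x;               /* addx.l d1,d2 */
    return out;
}
```

With a step of 0.5 (d1 = 0, d6 = 0x80000000) and samples {0, 100, 50, -50}, four calls return 0, 50, 100 and 75, and d2 advances by one on every second call.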
|
23 October 2020, 07:42 | #46 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,024
|
One more trick that saves one instruction; you can use it for the single-sided table version too:
Stereo mix:
Code:
; d0.w = bytes to mix
MIXSF   MACRO
        move.w  (a3,d2.l),d3    ; d3.w = 2x signed 8-bit samples
        move.b  d3,d7
        ext.w   d7
        asr.w   #8,d3
        sub.w   d3,d7           ; d7.w = sample2-sample1
        move.l  d7,d5
        rol.l   #7,d5
        add.w   (a6,d5.w*2),d3
        move.l  (a1,d3.w*4),d5  ; d5 high word = left output sample (from volume LUT)
        ; swap  d5
        move.w  (a4,d3.w*2),d5  ; d5.l = (leftSample << 16) | rightSample
        ; if A1 and A4 used same table, use move.w (a4,d3.w*4),d5
        add.l   d5,(a5)+
        add.l   d6,d7           ; increase sampling position
        addx.l  d1,d2
        ENDM
23 October 2020, 10:42 | #47 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
That table is the same for both L and R volume (it's pre-offset with the current voice volume), and it's already 256kB in size. I'd rather not double that just to make it a few percent faster at max. Thanks anyway!
|
23 October 2020, 11:19 | #48 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,024
|
Quote:
move.l  (a1,d3.w*2),d5  ; d5 high word = left output sample (from volume LUT)
; swap  d5
move.w  (a4,d3.w*2),d5  ; d5.l = (leftSample << 16) | rightSample
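The variant above relies on 68k memory being big-endian: a longword read from a table of words returns entry i in the high word and entry i+1 in the low word, and the following move.w then replaces the low word with the right-channel value. Below is a small C model of this; the helpers rd_w, rd_l, pack_lr and long_read_aligned are hypothetical, and memory is a plain byte array. It also shows the catch: when a1 is longword-aligned, the long read lands at offset 2 within a longword (not longword-aligned) for every odd index.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian word and longword reads from a byte array (68k byte order). */
uint16_t rd_w(const uint8_t *mem, uint32_t addr) {
    return (uint16_t)((mem[addr] << 8) | mem[addr + 1]);
}

uint32_t rd_l(const uint8_t *mem, uint32_t addr) {
    return ((uint32_t)rd_w(mem, addr) << 16) | rd_w(mem, addr + 2);
}

/* move.l (a1,d3.w*2),d5 ; move.w (a4,d3.w*2),d5
   The long read puts word entry i of the a1 table into the high word (and entry
   i+1 into the low word); move.w then overwrites the low word with the a4 entry. */
uint32_t pack_lr(const uint8_t *mem, uint32_t a1, uint32_t a4, int32_t i) {
    uint32_t d5 = rd_l(mem, a1 + (uint32_t)(i * 2));
    return (d5 & 0xFFFF0000u) | rd_w(mem, a4 + (uint32_t)(i * 2));
}

/* Is the longword read for index i longword-aligned? */
int long_read_aligned(uint32_t a1, int32_t i) {
    return ((a1 + (uint32_t)(i * 2)) & 3u) == 0;
}
```

With a left table {0x1111, 0x2222, ...} at address 0 and a right table {0xAAAA, 0xBBBB, ...} at address 8, index 1 yields 0x2222BBBB, but its long read starts at address 2.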
|
23 October 2020, 11:29 | #49 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
But is a longword read as fast as a word read on 68020?
|
23 October 2020, 12:09 | #50 |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
|
|
23 October 2020, 13:17 | #51 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,024
|
You can check this, but if I remember right, word and longword reads have the same speed on the 68020 at even addresses. Only odd-address reads have a penalty, for example this one:
move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples
23 October 2020, 14:07 | #52 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,050
|
020+ has a 32-bit bus, so misaligned 32-bit transfers (e.g. a longword at address 2, 6, ..., as well as at any odd address) are slower, since they require two 32-bit bus transfers.
|
23 October 2020, 15:51 | #53 |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
That might be a problem, since the pre-offsetting of the volume LUTs may mess up the 32-bit alignment in the actual look-up.
|
23 October 2020, 15:57 | #54 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,483
|
I follow this thread casually, but it's very interesting.
I looked at the latest proposed routines and I don't think there are any misalignment penalties: 32-bit accesses are longword-aligned and 16-bit accesses are word-aligned, so memory access runs at full speed.
23 October 2020, 16:07 | #55 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,050
|
Yeah, because the suggested changes rely on doubling the table size to ensure the alignment. And the table size has already been doubled, so that would be 4x the original size.
It's his call, of course ;p. |
23 October 2020, 16:09 | #56 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,483
|
|
23 October 2020, 16:23 | #57 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,024
|
Quote:
BTW, because of the extra odd-address penalty for move.w (a3,d2.l),d3 ; d3.w = 2x signed 8-bit samples, I also thought about using move.b (a3),d3 and move.b 1(a3),d7 instead, but that needs too many other changes to how A3 is handled, and it wouldn't be the fastest anyway.
|
23 October 2020, 17:22 | #58 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
IIRC a misaligned longword access is more costly than a misaligned word access, because a word access in the middle of a longword has no penalty.
This means that if a0 is longword-aligned, word accesses will be:
- 0(a0): aligned word, OK
- 1(a0): inside the longword, OK
- 2(a0): aligned word, OK
- 3(a0): misaligned access (only 25% of cases)
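The alignment rules described in the last few posts can be summarised in a simplified C model: an access costs one bus transfer for every aligned longword it touches. This assumes a 32-bit memory port and ignores caches, wait states and 16-bit ports, and the function names are made up for illustration. Counting over the four offsets within a longword reproduces the figures above: one word offset out of four (25%) needs two transfers, while three out of four longword offsets do.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model, assuming a 32-bit memory port and no cache: an access of
   `size` bytes (1, 2 or 4) at `addr` needs one bus transfer per aligned
   longword it touches. */
unsigned bus_transfers(uint32_t addr, unsigned size) {
    uint32_t first = addr >> 2;
    uint32_t last = (addr + size - 1u) >> 2;
    return (unsigned)(last - first + 1u);
}

/* How many of the four offsets within a longword split the access in two? */
unsigned split_offsets(unsigned size) {
    unsigned n = 0;
    for (uint32_t off = 0; off < 4u; off++)
        if (bus_transfers(off, size) > 1u)
            n++;
    return n;
}
```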
24 October 2020, 10:18 | #59 | |
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,712
|
Quote:
I could still use that method by using d2 as the relative sampling position (d2 = a3+d2 before the loop). Then I'd do "move.l d2,a3" as the first instruction in the loop. That's one extra move instruction, so probably slower in the end?
|
24 October 2020, 11:18 | #60 | |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
|
Quote:
It may be beneficial to have two sets of mixing routines. One for an integer delta <= 1, where you access every sample and can potentially also re-use the difference between two samples, so that "move.w (a3,d2.l),d3" and the asr and ext instructions are only needed once per input sample rather than once per output sample. And one for delta > 1.
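A rough C sketch of the delta <= 1 idea, assuming a 16.16 fixed-point position and a step of at most 1.0 (0x10000). The names mix_generic, mix_step_le1 and lerp7 are hypothetical, and the interpolation is a plain linear one rather than the table lookup used in the thread. Because the integer position advances by at most one sample per output sample, the pair (s1, delta) only has to be updated when the position crosses into the next input sample, and then only one new byte needs to be read; in 68k terms, that is where the move.w/asr/ext work would move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain linear interpolation using the top 7 bits of a 16-bit fraction. */
int lerp7(int s1, int delta, uint32_t frac16) {
    return s1 + (delta * (int)(frac16 >> 9)) / 128;
}

/* Generic routine: fetch both samples and rebuild the difference for every
   output sample (pos and step are 16.16 fixed point). */
void mix_generic(const int8_t *smp, int *out, size_t n, uint32_t pos, uint32_t step) {
    for (size_t k = 0; k < n; k++) {
        uint32_t i = pos >> 16;
        int s1 = smp[i];
        int delta = smp[i + 1] - s1;
        out[k] = lerp7(s1, delta, pos & 0xFFFFu);
        pos += step;
    }
}

/* Variant for step <= 0x10000 (1.0): the integer position advances by at most
   one per output sample, so (s1, delta) is only updated on a crossing. */
void mix_step_le1(const int8_t *smp, int *out, size_t n, uint32_t pos, uint32_t step) {
    uint32_t i = pos >> 16;
    int s1 = smp[i];
    int delta = smp[i + 1] - s1;
    for (size_t k = 0; k < n; k++) {
        if ((pos >> 16) != i) {      /* moved on to the next input sample */
            i = pos >> 16;
            s1 += delta;             /* old second sample becomes the new first */
            delta = smp[i + 1] - s1; /* only one new byte is read */
        }
        out[k] = lerp7(s1, delta, pos & 0xFFFFu);
        pos += step;
    }
}
```

Both functions read exactly the same sample bytes and produce identical output for any step up to 1.0; the second one simply does the per-input-sample work less often when the step is below 1.0.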
|