English Amiga Board


Old 24 May 2018, 12:54   #541
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Gorf View Post
We are talking about next-gen here - so targeting a 7MHz 68000 is not necessary. But the performance problems on low-end processors show that there is room for improvement.
Exactly, something that runs fine on a low end machine will just fly on a fast one !


Quote:
Originally Posted by Gorf View Post
Can you give me some examples of code that take two or more instructions on 68k, but are just one instruction on RISC?
An example of this is ARM's predication plus barrel shifter. But neither is as useful as claimed on its own, and they are almost never used together.
So rather than one-liner examples, that can be biased toward some particular architecture, a whole routine would be better (especially one that puts some pressure on the register file).

Why not a code contest ?
Everyone interested designs his own ISA (or chooses an existing one to defend) and then writes some routine (doing something useful).
We could then finally see who's powerful and who's not.
meynaf is offline  
Old 24 May 2018, 13:05   #542
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Gorf View Post
I still do not understand why one would hold multi-gigabytes of instrument-samples in RAM - especially in a highly redundant resolution.
you mentioned some latency related reasons, but that makes still no sense to me at all, since we are in the digital realm:
a higher sampling rate means just more data that needs to be processed. More data representing the same period of time. That means higher demand of RAM, bandwidths and processing power. How can this possibly reduce latency?
Do not search for a technical reason why to do this - there is no valid one.
It's always the same old reason why folks do complicated things in place of simple ones.
Something like :
- Hey, i'm managing complex projects handling several gigabytes of data !
In comparison to :
- Huh, i'm doing my work with only a few MB of memory.
You see ? Usual "we've got the biggest balls" stuff.
Unfortunately they do it without realising, and telling them does not help.
meynaf is offline  
Old 24 May 2018, 13:21   #543
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,294
I do not want to defend any ISA, but I will start with the one most familiar around here:

Find the maximum and the minimum of two values - both are already in registers d0 and d1 - result needs to be in the same registers.
Code:
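sub.l  d1,d0    ; d0 = d0-d1, X set on (unsigned) borrow
subx.l d2,d2    ; d2 = -X: $FFFFFFFF if d0 was below d1, else 0
and.l  d0,d2    ; d2 = difference if borrow, else 0
eor.l  d2,d0    ; d0 = 0 if borrow, else difference
add.l  d1,d0    ; d0 = max
add.l  d2,d1    ; d1 = min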
max is now in d0, min is in d1
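A minimal Python model of the six-instruction sequence, added as a sanity check rather than taken from the thread. It assumes the first instruction subtracts d1 from d0, models the registers as 32-bit values, and represents the X flag as a plain integer; the function name is made up for illustration.

```python
MASK = 0xFFFFFFFF  # 68k long-word registers are 32 bits wide


def minmax_branchless(d0, d1):
    """Model of the branchless sequence; returns (d0, d1) = (max, min), unsigned."""
    x = 1 if d0 < d1 else 0   # X flag: borrow out of sub.l d1,d0
    d0 = (d0 - d1) & MASK     # sub.l  d1,d0
    d2 = (-x) & MASK          # subx.l d2,d2  -> 0 or $FFFFFFFF
    d2 = d0 & d2              # and.l  d0,d2
    d0 = d0 ^ d2              # eor.l  d2,d0
    d0 = (d0 + d1) & MASK     # add.l  d1,d0
    d1 = (d1 + d2) & MASK     # add.l  d2,d1
    return d0, d1


print(minmax_branchless(3, 7))
print(minmax_branchless(7, 3))
```

Because the trick keys off the borrow flag, it orders the values as unsigned numbers; $FFFFFFFF counts as the largest value, not as -1.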
Gorf is offline  
Old 24 May 2018, 13:34   #544
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,757
Faster on 68020 and 68030:
Code:
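   cmp.l    d0,d1      ; compare d1 with d0 (signed)
   bgt.s    .l1        ; d1 > d0: already ordered, skip the swap
   exg      d0,d1      ; otherwise swap
.l1                    ; here: min in d0, max in d1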
Thorham is offline  
Old 24 May 2018, 14:05   #545
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,294
shorter but not faster ;-)
(my example always needs 12 cycles, yours 6/10 + jmp)
Gorf is offline  
Old 24 May 2018, 14:05   #546
Dunny
Registered User
 
 
Join Date: Aug 2006
Location: Scunthorpe/United Kingdom
Posts: 1,986
Quote:
Originally Posted by Gorf View Post
While I am not against more than 2 or 4 GB and can see the benefit of a larger address-space for some (rare) applications, your example does not convince me.

I still do not understand why one would hold multi-gigabytes of instrument-samples in RAM - especially in a highly redundant resolution.
you mentioned some latency related reasons, but that makes still no sense to me at all, since we are in the digital realm:
a higher sampling rate means just more data that needs to be processed. More data representing the same period of time. That means higher demand of RAM, bandwidths and processing power. How can this possibly reduce latency?
Imagine you have a track made of around 20 or 30 sampled instruments. Each one has a sample per note (127 of those), one per velocity per note (so each of those 127 notes has 127 samples) and each of those velocities was recorded with 12 or 16 microphones, each having their own spatial properties. That's a pretty extreme example - most instruments have only 30 velocities and are recorded with only three or four mics - but we have to allow for it.

Now for playback of such a track you could render the whole thing to a WAV file (we allow for this) but it takes minutes to do, and making adjustments involves going back to individual samples, so to make things flow a little easier, we keep the lot in memory where necessary. That means that there may be a slight delay when loading in samples that haven't been used yet, but that's fine for editing.

Where it's absolutely not fine is in a live performance. In that situation we cannot tell ahead of time which samples will be needed and we certainly cannot allow any time at all to pull samples off a disk. We need the whole lot in memory.

Then there's effect mixing, which if done on 44.1 kHz 16-bit sound samples quickly gathers aliasing errors, so to minimise that we use 192 kHz 32-bit float samples.

It all adds up, I'm afraid, and HDDs (even SSDs) are not yet fast enough.
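To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The note, velocity and microphone counts are the ones given above; the 2-second length per recording and the mono channel layout are assumptions made purely for illustration, as is the function name.

```python
def library_bytes(notes, velocities, mics, seconds, rate_hz, bytes_per_frame):
    """Size of one instrument if every recording is kept fully in RAM."""
    frames = int(seconds * rate_hz)  # frames per recording
    return notes * velocities * mics * frames * bytes_per_frame


# "Typical" instrument: 30 velocities, 4 mics, 192 kHz 32-bit float.
typical = library_bytes(127, 30, 4, 2, 192_000, 4)
# "Extreme" instrument: 127 velocities, 16 mics.
extreme = library_bytes(127, 127, 16, 2, 192_000, 4)

print(f"typical: {typical / 1e9:.1f} GB")
print(f"extreme: {extreme / 1e9:.1f} GB")
```

Under these assumptions a single typical instrument already lands in the tens of gigabytes, and the result scales linearly with whatever recording length is actually used.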
Dunny is online now  
Old 24 May 2018, 14:18   #547
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Gorf View Post
shorter but not faster ;-)
(my example always needs 12 cycles, yours 6/10 + jmp)
You're both wrong.
They have more or less the same speed on 020/030.
Linear example is always 12.
Branch example is 10/14 depending on the case (2+8 if taken, 2+6+6 if not taken).

Of course this can be just 2 cycles on a cpu doing instruction fusing.
meynaf is offline  
Old 24 May 2018, 14:19   #548
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,757
Quote:
Originally Posted by Gorf View Post
shorter but not faster ;-)
(my example always needs 12 cycles, yours 6/10 + jmp)
The cmp is 2 cycles, the exg is 4 cycles, the bgt.s is 4 cycles when it's not taken and 8 cycles when it is. This adds up to 10 cycles in both cases. Note that this is based on 68030 timings, so it may actually not be faster on 68020.
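The competing tallies can be checked with a few lines of Python. The per-instruction figures are the ones quoted in posts #547 and #548; reading the 2+6+6 from #547 as cmp + branch + exg is an assumption, and the dictionary keys are made-up names.

```python
# Cycle counts as claimed in the thread (68030, code running from cache).
CLAIMS = {
    "meynaf (#547)":  {"cmp": 2, "exg": 6, "bcc_taken": 8, "bcc_not_taken": 6},
    "Thorham (#548)": {"cmp": 2, "exg": 4, "bcc_taken": 8, "bcc_not_taken": 4},
}

LINEAR_CYCLES = 12  # the branchless version, according to both sides


def branch_version_cycles(c):
    """Return (cycles if branch taken, cycles if branch not taken)."""
    taken = c["cmp"] + c["bcc_taken"]                     # exg is skipped
    not_taken = c["cmp"] + c["bcc_not_taken"] + c["exg"]  # falls through into exg
    return taken, not_taken


for who, counts in CLAIMS.items():
    print(who, branch_version_cycles(counts), "vs linear", LINEAR_CYCLES)
```

With the first set of figures the branch version is 10 or 14 cycles, so it averages out close to the linear one; with the second it is 10 cycles on both paths and always ahead.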
Thorham is offline  
Old 24 May 2018, 14:22   #549
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
From memory, exg is 6 and branch is 6 when not taken and 8 if taken.
meynaf is offline  
Old 24 May 2018, 14:23   #550
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Dunny View Post
Imagine you have a track made of around 20 or 30 sampled instruments. Each one has a sample per note (127 of those), one per velocity per note (so each of those 127 notes has 127 samples) and each of those velocities was recorded with 12 or 16 microphones, each having their own spatial properties. That's a pretty extreme example - most instruments have only 30 velocities and are recorded with only three or four mics - but we have to allow for it.

Now for playback of such a track you could render the whole thing to a WAV file (we allow for this) but it takes minutes to do, and making adjustments involves going back to individual samples, so to make things flow a little easier, we keep the lot in memory where necessary. That means that there may be a slight delay when loading in samples that haven't been used yet, but that's fine for editing.

Where it's absolutely not fine is in a live performance. In that situation we cannot tell ahead of time which samples will be needed and we certainly cannot allow any time at all to pull samples off a disk. We need the whole lot in memory.

Then there's effect mixing, which if done on 44.1 kHz 16-bit sound samples quickly gathers aliasing errors, so to minimise that we use 192 kHz 32-bit float samples.

It all adds up, I'm afraid, and HDDs (even SSDs) are not yet fast enough.
Yet this does not require such large amounts of memory, because you could preload just the start of each sample (enough to cover the latency) and load what follows only for samples that are actually used.
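A rough sketch of what that preloading would cost, in Python. The 50 ms latency to cover is an assumed figure, not one from the thread; the count of 15,240 recordings is Dunny's "typical" instrument (127 notes x 30 velocities x 4 mics), and both function names are hypothetical.

```python
def head_bytes(latency_ms, rate_hz, bytes_per_frame):
    """Bytes of one recording that must stay resident to cover the disk latency."""
    frames = -(-latency_ms * rate_hz // 1000)  # ceiling division, integers only
    return frames * bytes_per_frame


def resident_bytes(recordings, latency_ms, rate_hz, bytes_per_frame):
    """RAM needed if only the head of every recording is preloaded."""
    return recordings * head_bytes(latency_ms, rate_hz, bytes_per_frame)


recordings = 127 * 30 * 4  # one "typical" instrument
total = resident_bytes(recordings, 50, 192_000, 4)
print(f"{total / 1e6:.0f} MB resident for one instrument")
```

The resident size depends only on the latency to be hidden, not on how long the recordings are, which is the point of the scheme.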
meynaf is offline  
Old 24 May 2018, 14:26   #551
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,294
Quote:
Originally Posted by Dunny View Post
Where it's absolutely not fine is in a live performance. In that situation we cannot tell ahead of time which samples will be needed and we certainly cannot allow any time at all to pull samples off a disk. We need the whole lot in memory.
You have a limited number of keyboards with a limited number of keys.
The maximum a single person can handle is probably an arrangement similar to a big pipe organ in a church with all the panels and registers.

So yes, you do know what limited options of samples are needed.
You are not going to evaluate different microphone settings of your samples in a live performance - these are things you choose up front.

Quote:
Then there's effect mixing, which if done on 44.1 kHz 16-bit sound samples quickly gathers aliasing errors, so to minimise that we use 192 kHz 32-bit float samples.
Sure, you need some higher (virtual) sample rate while calculating the mix - but upsampling should be part of your algorithm. Doubling the value range to 32-bit makes sense - doubling the rate to 88.2 kHz might also be useful - anything more is useless, since any aliasing errors still left then cannot influence your hearing experience.

But again: you only need to do that within your calculation; there is no need to store the instruments in this "quality", since it is only intermediate, redundant information.

Quote:
It all adds up, I'm afraid, and HDDs (even SSDs) are not yet fast enough.
With your approach they aren't, of course!
First you blow up your data by a factor of >8 without adding information, and then you complain about the transfer speed...
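The factor follows directly from the two formats mentioned; a quick check in Python (mono data rate in both cases, function name chosen for illustration):

```python
def bytes_per_second(rate_hz, bits):
    """Data rate of one uncompressed mono stream."""
    return rate_hz * bits // 8


hi_res = bytes_per_second(192_000, 32)  # 192 kHz, 32-bit float
cd_res = bytes_per_second(44_100, 16)   # 44.1 kHz, 16-bit

ratio = hi_res / cd_res
print(f"{hi_res} B/s vs {cd_res} B/s -> factor {ratio:.2f}")
```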

Last edited by Gorf; 24 May 2018 at 15:14.
Gorf is offline  
Old 24 May 2018, 14:27   #552
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,757
Quote:
Originally Posted by meynaf View Post
In my memory exg is 6 and branch is 6 when not taken and 8 if taken.
Check the manual. Exg really is 4 and non taken byte branches are 4. Just benched it, and my code always executes in 10 cycles.
Thorham is offline  
Old 24 May 2018, 14:41   #553
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Check the manual. Exg really is 4 and non taken byte branches are 4. Just benched it, and my code always executes in 10 cycles.
Hmm... well... what can i say...

Anyway, instruction timings depend heavily on the implementation, so we'd better favor small code - simply because it's small everywhere.
meynaf is offline  
Old 24 May 2018, 14:54   #554
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,294
Quote:
Originally Posted by Thorham View Post
Check the manual. Exg really is 4 and non taken byte branches are 4. Just benched it, and my code always executes in 10 cycles.
hmm - according to the user manual EXG is 4 but branching at least 6 - more if it misses the cache.
https://www.nxp.com/docs/en/referenc.../MC68030UM.pdf
(page 11-48)
Gorf is offline  
Old 24 May 2018, 15:04   #555
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
One sure thing is that the code is 6 bytes
meynaf is offline  
Old 24 May 2018, 15:09   #556
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,294
ok - now we got the 68k case more than covered.
Next ISA please ;-)
Gorf is offline  
Old 24 May 2018, 15:41   #557
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,757
Quote:
Originally Posted by Gorf View Post
hmm - according to the user manual EXG is 4 but branching at least 6 - more if it misses the cache.
https://www.nxp.com/docs/en/referenc.../MC68030UM.pdf
(page 11-48)
I benchmarked my code on a 50 MHz 68030, and it really is 10 cycles. Obviously, the code was executed entirely from the cache.
Thorham is offline  
Old 24 May 2018, 16:05   #558
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,294
Quote:
Originally Posted by Thorham View Post
I benchmarked my code on a 50 MHz 68030, and it really is 10 cycles. Obviously, the code was executed entirely from the cache.
good to know! thank you for evaluating
Gorf is offline  
Old 24 May 2018, 16:17   #559
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Gorf View Post
ok - now we got the 68k case more than covered.
Next ISA please ;-)
What ? You ask risc lovers to write actual asm code ? Ahem...

Oh, by the way: this is a rather boring example; with my own instruction set it would be a single instruction. But who cares.
meynaf is offline  
Old 24 May 2018, 17:13   #560
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,294
Quote:
Originally Posted by meynaf View Post
What ? You ask risc lovers to write actual asm code ? Ahem...

Oh, by the way: this is a rather boring example; with my own instruction set it would be a single instruction. But who cares.
I do.

You mentioned instruction fusing... and maybe your instruction set would be a good intermediate representation:

A sophisticated decoder/translator in FPGA would find that both code snippets do the same thing in the end and can be represented by a single (intermediate) instruction.

The FPGA would take every instruction and identify its group. It can do that for many instructions in parallel (parallelism).

In the second step it compares every instruction with the one that follows, if it belongs to the right group and such a comparison makes sense. Meanwhile the next group of instructions is passing through step one (pipelining).

In the third step matching pairs of instructions are fused - there can be more than one fusing step. (Meanwhile another group of instructions enters step one and the former step-one instructions go on to comparison in step two...)

Now we would end up with an architecture-independent and very short intermediate representation of the code.
By traversing a LUT or a tree, each intermediate instruction would be translated into host-CPU code or sent to some special SIMD unit in the FPGA.

There could be more than one of these decoders/translators, allowing for some kind of "speculative translation" of branches.
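The fusing idea can be illustrated in software with a toy peephole pass, sketched here in Python. This is only a sequential model, not the parallel FPGA pipeline described above: the tuple encoding of instructions, the `MINMAX` intermediate opcode and the function name are all hypothetical, and only the compare/branch/exchange pattern from post #544 is recognised.

```python
from typing import List, Tuple

Insn = Tuple[str, ...]  # (mnemonic, operand, ...); labels are ("label", name)


def fuse_minmax(code: List[Insn]) -> List[Insn]:
    """Replace 'cmp.l a,b / bgt.s L / exg a,b / L:' with one MINMAX a,b."""
    out: List[Insn] = []
    i = 0
    while i < len(code):
        w = code[i:i + 4]
        if (len(w) == 4
                and len(w[0]) == 3 and w[0][0] == "cmp.l"
                and len(w[1]) == 2 and w[1][0] == "bgt.s"
                and len(w[2]) == 3 and w[2][0] == "exg"
                and set(w[0][1:]) == set(w[2][1:])  # same register pair
                and w[3] == ("label", w[1][1])):    # branch only skips the exg
            # Hypothetical fused op: min ends up in the first register,
            # max in the second, matching what the three instructions do.
            out.append(("MINMAX", w[0][1], w[0][2]))
            i += 3  # keep the label; other code may still jump to it
        else:
            out.append(code[i])
            i += 1
    return out


snippet = [
    ("cmp.l", "d0", "d1"),
    ("bgt.s", ".l1"),
    ("exg", "d0", "d1"),
    ("label", ".l1"),
]
print(fuse_minmax(snippet))
```

Recognising the branchless six-instruction variant as the same operation would need a further pattern, or a more general equivalence check, which this sketch does not attempt.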
Gorf is offline  
 

