Quote:
Originally Posted by Thorham
Right, I get it. But that means an indexed lea is faster than an indexed move, that's great! Now tell me the other ones, please.
|
An lea is always faster than the equivalent move, because it doesn't access memory; it just computes the address. And you certainly know that memory accesses aren't exactly the fastest thing a CPU does nowadays
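Here is a minimal Python model of the difference. It assumes the 68020+ `(d8,An,Xn*scale)` indexed mode, which the '030 supports. Both instructions do the same address arithmetic, but only the move then has to read memory. The function names are hypothetical, and the model covers only the semantics, not cycle timings.

```python
def effective_address(an, xn, scale=1, d8=0):
    """Compute (d8,An,Xn*scale) as on the 68020+, wrapping at 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if not -128 <= d8 <= 127:
        raise ValueError("d8 is a signed 8-bit displacement")
    return (an + xn * scale + d8) & 0xFFFFFFFF


def lea(an, xn, scale=1, d8=0):
    """lea: only the address arithmetic, no data memory access."""
    return effective_address(an, xn, scale, d8)


def move_b(memory, an, xn, scale=1, d8=0):
    """move.b: the same address arithmetic, followed by a memory read."""
    return memory[effective_address(an, xn, scale, d8)]


# Usage: a0 = $1000, d0 = 2, scale factor 4
mem = {0x1008: 0x42}
print(hex(lea(0x1000, 2, 4)))           # 0x1008
print(hex(move_b(mem, 0x1000, 2, 4)))   # 0x42
```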
As for the next one:
Code:
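move.b (a0)+,d0 ; d0 = first byte, kept unmasked
move.b (a0)+,d2 ; d2 = second byte
move.b (a0)+,d3 ; d3 = third byte
moveq #-4,d4 ; d4 = $FFFFFFFC (mask clearing the 2 low bits)
and.b d4,d2 ; d2 low byte &= $FC
and.b d4,d3 ; d3 low byte &= $FC
and.l d0,d4 ; d0=original, d4=d0 AND $FFFFFFFC
move.l d4,d1 ; d1 = copy of the masked first byte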
We can then use d0 wherever the 2 lower bits don't need to be cleared: it still holds a copy of the original data, and that copy costs nothing to make. The part computing the 4-4-4 RGB value is a good place to use it, because it only looks at the upper bits of each component.
Okay, that's only 2 cycles ;-)
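This quick Python check shows why keeping d0 unmasked is safe for the 4-4-4 part: clearing the low 2 bits (AND $FC) never changes the top nibble of a byte. The `rgb444` packing below is an assumption about what that part computes. It takes the top 4 bits of each component, with red highest.

```python
def rgb444(r, g, b):
    """Pack 8-bit components into 12-bit 4-4-4 RGB (top nibble of each)."""
    return ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)


def mask_is_irrelevant_for_444():
    """True if AND $FC (clearing the low 2 bits) never changes a byte's
    top nibble, i.e. masked and unmasked bytes give the same 4-4-4 value."""
    return all(((x & 0xFC) >> 4) == (x >> 4) for x in range(256))


print(mask_is_irrelevant_for_444())      # True
print(hex(rgb444(0xFF, 0x80, 0x13)))     # 0xf81
```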
The last one was more complex to do. The idea was to move the fixed pixel computation to the end, so that before doing it we already know whether this particular pixel is identical to its predecessor.
It involved a lot of code duplication, due to a severe register shortage.
But now this code:
Code:
add.l d4,d5 ; d4=r d5=r+v d6=b
add.l d6,d4 ; r+b r+v b
add.l d6,d6 ; r+b r+v 2b
add.l d5,d6 ; r+b r+v 2b+r+v
beq.s .vbrb ; all together = 0 -> gbrb
is executed earlier, and images with identical rows of pixels will decode faster (this isn't spectacular, but now Visage is ALWAYS beaten :D)
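The general idea is to test first whether a pixel equals its predecessor, and only then do the expensive computation. It can be sketched in Python as follows. This is a hypothetical model: `convert` stands in for the real per-pixel work, and the test is a plain equality check rather than the register arithmetic used in the asm above.

```python
def convert(pixel):
    """Stand-in for the expensive per-pixel work (hypothetical)."""
    r, g, b = pixel
    return ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)


def decode_row(pixels):
    """Cheap 'same as predecessor?' test first; reuse the previous result
    when it succeeds. Returns (output, number_of_expensive_conversions)."""
    out = []
    prev_in = None
    prev_out = None
    conversions = 0
    for px in pixels:
        if px != prev_in:            # only pay for the conversion on a change
            prev_out = convert(px)
            prev_in = px
            conversions += 1
        out.append(prev_out)
    return out, conversions


out, n = decode_row([(255, 128, 19)] * 4 + [(0, 0, 0)])
print(n)  # 2
```

A row of identical pixels costs one conversion plus a cheap comparison per pixel, which is why such images gain the most.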
Quote:
Originally Posted by Thorham
Yes, they're long. They're an hour of continuous music each. They were originally encoded in MP3 format, and the guy I got them from burned them to CD, so I encoded them at 320 kbps to keep them in good quality. They're both 182 MB. Still a lot better than 650 MB per CD.
|
Perhaps you can gain a little more without altering quality. There is a PC program around called MP3 optimizer or something like that (I was able to use it at work some time ago). For me, it turned a 20 MB 320 kbps MP3 into a 16 MB VBR MP3 without any loss.
Quote:
Originally Posted by Thorham
Yeah, the interface sucks, but only because it dumps data to chip mem, which then has to be copied to fast mem. That's the real problem. If only it left the CPU idle and used little memory bandwidth, it would've been a lot better.
|
For that you'd need a SCSI extension plugged into your A1230 board, or something like that. But IDE has no DMA channel at all: the CPU simply reads a 16-bit hardware register (at chip mem speed) and copies the data wherever needed. Not efficient (but the data really won't go into chip mem).
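Here is a toy Python model of such a PIO transfer. It assumes a hypothetical `DataRegister` class whose every read returns the next 16-bit word from the drive. The CPU does one register read per word and stores the result itself, big-endian as on the 68k. With 512-byte sectors, that is 256 register reads per sector.

```python
class DataRegister:
    """Hypothetical model of a 16-bit IDE data register: every read
    returns the next word from the drive's sector buffer."""

    def __init__(self, words):
        self._words = iter(words)

    def read(self):
        return next(self._words)


def pio_read(reg, n_words):
    """CPU-driven transfer with no DMA: one register read per word,
    stored big-endian into a buffer that can live anywhere (e.g. fast RAM)."""
    buf = bytearray()
    for _ in range(n_words):
        buf += reg.read().to_bytes(2, "big")
    return bytes(buf)


print(pio_read(DataRegister([0x1234, 0xABCD]), 2).hex())  # 1234abcd
```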
Quote:
Originally Posted by Thorham
Not me. You might as well get a peecee with those speeds. Trying to get things fast on a 50 MHz '030 is part of the fun!
|
But I firmly believe a 3 GHz peecee would be easily beaten by an equivalent 68k.
And trying to get things fast on your 030-50 is fun only when you succeed
Quote:
Originally Posted by Thorham
But aren't layers one and two video layers? You shouldn't really need those for mp3s as far as I know.
|
If you look at it that way, they're ALL video layers. MP3 is just layer 3 of the audio part, just as MP1 and MP2 are layers 1 and 2.
All the MPx formats are just Moving Picture Experts Group, audio part, layer x.
Note: MP1 and MP2 are somewhat misnamed and can be confused with the MPEG-1 and MPEG-2 video formats.
You have MPEG-1 layer 3 and MPEG-2 layer 3 for audio (there is also an unofficial MPEG-2.5), just to make things simpler :crazy:
Drop an MP3 into DT and you will see e.g. MPEG1-III as the format.
What you need to know is that MPEG audio has 3 layers, each one more efficient and more complex than its predecessor. AAC (M4A) is sometimes called the 4th, but it is actually a separate MPEG-2/MPEG-4 codec rather than a real layer 4.
It's just accidental that MP3 got so widely used as standalone audio.
All three layers are supported by the original mpega.library, so I want to keep that, even though I have no layer 1 stream at all and only one layer 2 stream to test
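For reference, the version and layer that DT shows come straight from the 4-byte header of each MPEG audio frame. After the 11-bit frame sync come 2 version bits and 2 layer bits:

- Version bits: 00 = MPEG-2.5, 10 = MPEG-2, 11 = MPEG-1, 01 reserved.
- Layer bits: 01 = Layer III, 10 = Layer II, 11 = Layer I, 00 reserved.

The minimal Python sketch below decodes them. The `describe_header` name and the output strings (modelled on DT's `MPEG1-III`) are illustrative choices.

```python
VERSIONS = {0b00: "MPEG2.5", 0b01: None, 0b10: "MPEG2", 0b11: "MPEG1"}
LAYERS = {0b00: None, 0b01: "III", 0b10: "II", 0b11: "I"}


def describe_header(header: bytes) -> str:
    """Return e.g. 'MPEG1-III' from the first bytes of an MPEG audio frame."""
    if len(header) < 2:
        raise ValueError("need at least 2 header bytes")
    word = (header[0] << 8) | header[1]
    if word >> 5 != 0x7FF:               # 11-bit frame sync, all ones
        raise ValueError("no frame sync")
    version = VERSIONS[(word >> 3) & 0b11]
    layer = LAYERS[(word >> 1) & 0b11]
    if version is None or layer is None:
        raise ValueError("reserved version or layer bits")
    return f"{version}-{layer}"


print(describe_header(bytes([0xFF, 0xFB, 0x90, 0x00])))  # MPEG1-III
```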
Quote:
Originally Posted by Thorham
That's very interesting, didn't know that, thanks.
|
Any time
Quote:
Originally Posted by Thorham
Now that's optimizing!
|
Indeed.
Quote:
Originally Posted by Thorham
If I comment code properly, I usually just split comments over several lines. If it doesn't fit, I'll compact the comment. Also, English comments are usually short enough ;)
|
English comments may be short, but they're rarely precise enough
I don't see the point in putting more than one space to the left of code anyway. Code formatted like that leaves the left half for code and the right half for comments, and that suits me.
Anyway, code formatting is a source of endless discussions between programmers; perhaps in the end it's just a matter of personal taste.
Btw, where do you put your opening curly braces? In the wrong place, I bet
Quote:
Originally Posted by Thorham
This:
(...)
And this:
(...)
|
Perhaps it's a bit of an exaggeration. For me, it is.
If I did my 44,000-line DM1 source like that, it'd end up 60,000 lines or so
Quote:
Originally Posted by Thorham
Code which is executed a large number of times in a loop. It doesn't have to fit in the cache, and it can call outside routines if need be. Not much difference.
|
Well, now you can have a look in that big source to see if you spot things which look like that.
Quote:
Originally Posted by Thorham
Great, I forgot about the flush libs thing. Bah :banghead:
|
Made any tests so far?
Quote:
Originally Posted by Thorham
Two lines, huh? Then such a program should still be easy to write.
|
As I said, the only problem is actually calling the library to do the job.
Quote:
Originally Posted by Thorham
I don't know if I have the software. If not I can probably download it. But, who uses mp1/mp2? My guess is no one.
|
I have one mp2 (it came from Aminet), so what? Should I really remove functionality from the lib and delete it?
Quote:
Originally Posted by Thorham
I think it is.
|
So it is.
But now that I think about it, using an external program might add some fairly random I/O time and lose accuracy. Unsure, though.
Quote:
Originally Posted by Thorham
Not yet. This code is quite tough. Maybe I'll never spot a single thing, so I'm not promising anything. It's fun to try, though.
|
C'mon, kill a few seconds of decoding for me...
Quote:
Originally Posted by Thorham
Not yet. I'm going to keep looking first. Maybe I'll spot more interesting optimizations. If tables won't increase the speed much, it's perhaps better to try them later, when all else has failed.
|
Good luck, pal. You'll need it.
Quote:
Originally Posted by Thorham
Yeah, that's true. For '060 you don't need shifts at all, as far as I know. Makes life simple, though.
|
If it's too simple for you, you can still reschedule things for the '060's dual execution pipelines... should be fun too.
Quote:
Originally Posted by Thorham
I'd still forget about it for now, if I were you. Just focus on the cpu that needs optimizations the most.
|
But now I see nothing left to do...
Quote:
Originally Posted by Thorham
Haven't seen anything so far...
|
In 3 blocks, adda.l a6,a4 can be removed because A4 is not used afterwards; it's a known thing, so don't bother with those. The gain would be negligible anyway...
Quote:
Originally Posted by Thorham
Yes, I think it is. Just try decoding a single three minute song in good quality to wav and play it back. If it sounds good, it's the program.
|
So if mpega.library can sound good enough, have you tried the version in my archive, with its default settings?