Old 24 May 2009, 13:39   #122
meynaf
Quote:
Originally Posted by Thorham View Post
Right, I get it. But that means an indexed lea is faster than an indexed move, that's great! Now tell me the other ones, please.
An lea is always faster than the equivalent move, because it doesn't access memory; it just computes the address. And you certainly know that memory accesses aren't exactly the fastest thing for a CPU nowadays.
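A rough C model of the difference, just as a sketch: the function names are made up for illustration, and C scales the index by the element size whereas the 68k index register is a plain byte offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Models an indexed lea: only the effective address is computed,
   memory is never touched. */
static const uint32_t *lea_indexed(const uint32_t *base, size_t index)
{
    return base + index;
}

/* Models an indexed move.l: same address computation, followed by
   an actual read from memory. */
static uint32_t move_indexed(const uint32_t *base, size_t index)
{
    return base[index];
}
```

The first function can be evaluated without the table ever being read; the second one always pays for the memory access.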

As for the next one:
Code:
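 move.b (a0)+,d0   ; d0 = 1st byte, kept unmasked
 move.b (a0)+,d2   ; d2 = 2nd byte
 move.b (a0)+,d3   ; d3 = 3rd byte
 moveq #-4,d4      ; d4 = $fffffffc (low byte = $fc)
 and.b d4,d2       ; clear the 2 low bits of d2
 and.b d4,d3       ; clear the 2 low bits of d3
 and.l d0,d4       ; d4 = d0 and $fc, d0 = still original
 move.l d4,d1      ; d1 = masked copy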
We can then use d0 wherever the 2 lower bits don't need to be cleared; we get an unmasked copy of the data for free. The part computing the 4-4-4 RGB value is a good place to use it.
Okay, that's only 2 cycles ;-)
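
For those who prefer to read it in C, here is a sketch of what the snippet above leaves in the registers. The `Regs` struct and `load_and_mask` name are hypothetical, and I assume the upper 24 bits of d0, d2 and d3 are zero beforehand (move.b doesn't touch them).

```c
#include <stdint.h>

/* Hypothetical container for the data registers used by the snippet. */
typedef struct {
    uint32_t d0; /* first byte, left unmasked */
    uint32_t d1; /* masked copy of d0 */
    uint32_t d2; /* second byte, 2 low bits cleared */
    uint32_t d3; /* third byte, 2 low bits cleared */
    uint32_t d4; /* same value as d1 */
} Regs;

static Regs load_and_mask(const uint8_t *src)
{
    Regs r;
    r.d0 = src[0];                 /* move.b (a0)+,d0 (upper bits assumed 0) */
    r.d2 = src[1];                 /* move.b (a0)+,d2 */
    r.d3 = src[2];                 /* move.b (a0)+,d3 */
    r.d4 = 0xFFFFFFFCu;            /* moveq #-4,d4 */
    r.d2 &= (r.d4 | 0xFFFFFF00u);  /* and.b d4,d2 : only the low byte is affected */
    r.d3 &= (r.d4 | 0xFFFFFF00u);  /* and.b d4,d3 */
    r.d4 &= r.d0;                  /* and.l d0,d4 : the mask register receives the
                                      result, so d0 is not modified */
    r.d1 = r.d4;                   /* move.l d4,d1 */
    return r;
}
```

The point is the direction of the and.l: by and-ing into the mask register instead of into d0, the original value survives at no extra cost.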

The last one was more complex to do. The idea was to move the fixed pixel computation to the end, so that before doing it we already know whether this particular pixel is identical to its predecessor.
It involved a lot of code duplication, due to a massive register shortage.
But now this code:
Code:
 add.l d4,d5   ; d4=r d5=r+v d6=b
 add.l d6,d4   ; r+b r+v b
 add.l d6,d6   ; r+b r+v 2b
 add.l d5,d6   ; r+b r+v 2b+r+v
 beq.s .vbrb   ; all together = 0 -> gbrb
is executed earlier, and images with rows of identical pixels will decode faster (this isn't spectacular, but now Visage is ALWAYS beaten).
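
A C sketch of the arithmetic in that snippet, assuming d4, d5 and d6 hold r, v and b on entry (my reading of the comments). The `Sums` struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the three results left in d4, d5, d6. */
typedef struct {
    uint32_t rb;  /* d4 = r + b */
    uint32_t rv;  /* d5 = r + v */
    uint32_t sum; /* d6 = 2b + r + v */
} Sums;

static Sums combine(uint32_t r, uint32_t v, uint32_t b)
{
    uint32_t d4 = r, d5 = v, d6 = b;
    d5 += d4;   /* add.l d4,d5 : d5 = r + v */
    d4 += d6;   /* add.l d6,d4 : d4 = r + b */
    d6 += d6;   /* add.l d6,d6 : d6 = 2b */
    d6 += d5;   /* add.l d5,d6 : d6 = 2b + r + v, sets the Z flag */
    Sums s = { d4, d5, d6 };
    return s;
}

/* beq.s is taken when the last add produced zero. */
static int branch_taken(Sums s)
{
    return s.sum == 0;
}
```

Note that the test is on the 32-bit sum only: it means "all components are zero" as long as the components can't cancel each other out (for example, when they are all non-negative and small). With arbitrary signed values the sum could wrap to zero as well.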

Quote:
Originally Posted by Thorham View Post
Yes, they're long. They're an hour of continuous music each. They were originally encoded in mp3 format, and the guy I got them from burned them to cd, so I encoded them at 320 kbps to keep them in good quality. They're both 182 MB. Still a lot better than 650 MB per cd.
Perhaps you can gain a little more without altering quality. There is a PC program called MP3 optimizer or something like that (I was able to use it at work some time ago); for me it turned a 20 MB 320 kbps MP3 into a 16 MB VBR MP3 without any loss.

Quote:
Originally Posted by Thorham View Post
Yeah, the interface sucks, but only because it dumps data to chip mem, which then has to be copied to fast; that's the real problem. If only it was cpu idle and low mem bandwidth, it would've been a lot better.
For that you'd need a SCSI extension plugged into your A1230 board, or something like that. But for IDE there is no DMA channel at all: the CPU simply reads a 16-bit hardware register (at chipmem speed) and copies the data wherever it's needed. Not efficient (but the data never actually goes into chipmem).
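
In C terms the transfer loop looks roughly like this. It's only a sketch: `pio_read` is a made-up name, and the register address is passed in by the caller because I'm not quoting the real hardware address here.

```c
#include <stddef.h>
#include <stdint.h>

/* Programmed I/O: the CPU itself fetches every 16-bit word from one
   data register and stores it in the destination buffer.
   data_reg is volatile so that each read really hits the hardware. */
static void pio_read(volatile const uint16_t *data_reg,
                     uint16_t *dst, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        dst[i] = *data_reg;  /* one register read per word transferred */
    }
}
```

The destination can be anywhere (fast mem included), but the CPU is busy for the whole transfer, which is why it's not efficient.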

Quote:
Originally Posted by Thorham View Post
Not me. You might as well get a peecee with those speeds. Trying to get things fast on a 50mhz '030 is part of the fun!
But I firmly believe a 3 GHz peecee would be soundly beaten by an equivalent 68k.
And trying to get things fast on your 030-50 is fun only when you're successful.

Quote:
Originally Posted by Thorham View Post
But aren't layers one and two video layers? You shouldn't really need those for mp3s as far as I know.
They are ALL video layers if you look at it that way. MP3 is just layer 3 of the audio part, just as layers 1 and 2 came before it.
All the MPx formats are just Moving Picture Experts Group, audio part, layer x.

Note: MP1 and MP2 are somewhat misnamed and can be confused with the MPEG-1 and MPEG-2 video formats.
You have MPEG-1 layer 3 and MPEG-2 layer 3 for audio (there is also an unofficial MPEG-2.5), just to make things simpler.
Drop an MP3 into DT and you will read e.g. MPEG1-III as the format.
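
The version and the layer are both stored in the second byte of each MPEG audio frame header, right after the sync bits. A small C sketch that extracts them follows; the function and struct names are mine, and the strings merely imitate the MPEG1-III style of label, they are not what any particular program prints.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: version and layer as printable strings. */
typedef struct {
    const char *version;
    const char *layer;
} MpegAudioId;

/* Returns 1 and fills *out if hdr starts with a valid frame sync and
   non-reserved version/layer fields, 0 otherwise. */
static int parse_mpeg_audio_id(const uint8_t hdr[2], MpegAudioId *out)
{
    /* 2-bit version field: 00 = MPEG2.5, 01 = reserved, 10 = MPEG2, 11 = MPEG1 */
    static const char *const versions[4] = { "MPEG2.5", NULL, "MPEG2", "MPEG1" };
    /* 2-bit layer field: 00 = reserved, 01 = III, 10 = II, 11 = I */
    static const char *const layers[4] = { NULL, "III", "II", "I" };

    /* Frame sync: 11 set bits (all of byte 0, top 3 bits of byte 1). */
    if (hdr[0] != 0xFF || (hdr[1] & 0xE0) != 0xE0) {
        return 0;
    }
    out->version = versions[(hdr[1] >> 3) & 3];
    out->layer = layers[(hdr[1] >> 1) & 3];
    return out->version != NULL && out->layer != NULL;
}
```

A typical MP3 frame starts with the bytes FF FB, which this decodes as MPEG1 / III.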

What you need to know is that MPEG audio has 3 layers, each one more efficient and more complex than its predecessor. AAC (M4A) came afterwards, but it is a separate codec rather than a fourth layer.
It's just accidental that MP3 became so widely used as standalone audio.

All of the first 3 layers are supported by the original mpega.library, so I want to keep that, even though I have no layer 1 stream at all and only one layer 2 stream to test.

Quote:
Originally Posted by Thorham View Post
That's very interesting, didn't know that, thanks.
Any time

Quote:
Originally Posted by Thorham View Post
Now that's optimizing!
Indeed.

Quote:
Originally Posted by Thorham View Post
If I comment code properly, I usually just split comments over several lines. If it doesn't fit, I'll compact the comment. Also, English comments are usually short enough
English comments may be short, but they're rarely precise enough.

I don't see the point in putting more than one space to the left of the code anyway. Code formatted like that leaves the left half for code and the right half for comments, and that works for me.

Anyway, code formatting is a source of endless discussion between programmers; perhaps in the end it's just a matter of personal taste.
Btw, where do you put your opening curly braces? In the wrong place, I bet.

Quote:
Originally Posted by Thorham View Post
This:
(...)

And this:
(...)
Perhaps it's a bit of an exaggeration. For me, it is.
If I did my 44000-line DM1 source like that, it would end up at 60000 lines or so.

Quote:
Originally Posted by Thorham View Post
Code which is executed a large number of times in a loop. It doesn't have to fit in the cache, and it can call outside routines if need be. Not much difference.
Well, now you can have a look at that big source to see if you spot anything that looks like that.

Quote:
Originally Posted by Thorham View Post
Great, I forgot about the flush libs thing. Bah
Have you run any tests so far?

Quote:
Originally Posted by Thorham View Post
Two lines, huh? Then such a program should still be easy to write.
As I said, the only problem is actually calling the library to do the job.

Quote:
Originally Posted by Thorham View Post
I don't know if I have the software. If not I can probably download it. But, who uses mp1/mp2? My guess is no one.
I have one mp2 (from Aminet), so what? Should I really remove functionality from the lib and delete it?

Quote:
Originally Posted by Thorham View Post
I think it is.
So it is.
But now that I think about it, using an external program would perhaps add some fairly random I/O time and lose accuracy. I'm not sure, though.

Quote:
Originally Posted by Thorham View Post
Not yet. This code is quite tough. Maybe I'll never spot a single thing, so I'm not promising anything. It's fun to try, though.
C'mon, kill a few seconds of decoding for me...

Quote:
Originally Posted by Thorham View Post
Not yet. I'm going to keep looking first. Maybe I'll spot more interesting optimizations. If tables won't increase the speed much, it's perhaps better to try them later, when all else has failed.
Good luck, pal. You'll need it.

Quote:
Originally Posted by Thorham View Post
Yeah, that's true. For '060 you don't need shifts at all, as far as I know. Makes life simple, though.
If it's too simple for you, you can still reschedule things for dual execution... should be fun too.

Quote:
Originally Posted by Thorham View Post
I'd still forget about it for now, if I were you. Just focus on the cpu that needs optimizations the most.
But now I see nothing left to do...

Quote:
Originally Posted by Thorham View Post
Haven't seen anything so far...
For 3 blocks, adda.l a6,a4 can be removed because A4 is not used, so don't worry about those; it's a known thing. But the gain is negligible...

Quote:
Originally Posted by Thorham View Post
Yes, I think it is. Just try decoding a single three minute song in good quality to wav and play it back. If it sounds good, it's the program.
So if mpega.library can sound good enough, have you tried the version in my archive, with its default settings?