English Amiga Board - mpega.library faster than itself


English Amiga Board (https://eab.abime.net/index.php)

- Coders. General (https://eab.abime.net/forumdisplay.php?f=37)

- - mpega.library faster than itself (https://eab.abime.net/showthread.php?t=33574)

meynaf

10 December 2007 11:02

mpega.library faster than itself

Hello coders,

As you know, there is no support for mpega.library and the author can't be contacted (it seems).

I wondered whether the integer version of the library, which is currently the fastest Amiga implementation of such a decoder (or so I think :p), could be optimized, and found that it could.
I re-sourced it and gained around 10% in speed, enough to play at the medium quality setting what I previously played at low. :spin
Now most mp3s (up to 160 kbps) can be played at 22.05 kHz, medium quality, mono, on a 50 MHz 68030.

To use it, DeliTracker's mpega player will do nicely.

I don't know whether it's better to rewrite it all or start from its current code.
I started both, but the rewrite stalled for lack of understanding of layer 3 (that is, lack of docs).

You will find the actual source (reassembler output, so mostly unreadable) here :
http://meynaf.free.fr/tmp/mpega.lzx
(not in the zone because files don't live long enough there)

Included is the library doc. Unfortunately I don't have the lvo's.

More can surely be done. If someone already did something similar or is interested, you now know where to help... :help

StrategyGamer

10 December 2007 11:13

I had wanted to optimize it myself, timeless ages ago, but I could not find the source. :(

Thanks for doing it for me. :)

Thorham

12 December 2007 00:37

Quote:

Originally Posted by meynaf

You will find the actual source (reassembler output, so mostly unreadable) here :

Unreadable indeed :D

But seriously: muls muls muls and more muls :shocked When I saw that, I thought: Hoo boy. This thing is no joke. I must admit I didn't read through all of the code, but that's a serious challenge you've gotten yourself into :)

Optimizing this one isn't the same as the instruction juggling I've been doing for your ham rendering engine, which is small and well commented (translators are far from perfect, but they do help with the French). This is a whole different cup of tea :crazy

Quote:

Originally Posted by meynaf

I don't know whether it's better to rewrite it all or start from its current code.
I started both, but the rewrite stalled for lack of understanding of layer 3 (that is, lack of docs).

That's a pity. Are you sure you can't find what you need on the net? Might be worth taking a look...

This one will separate the men from the boys. One day I will be a man.
Thanks for sharing :great

meynaf

12 December 2007 10:07

Quote:

Originally Posted by Thorham (Post 377935)

Else that would just be too easy :D

Quote:

Originally Posted by Thorham (Post 377935)

That's a pity. Are you sure you can't find what you need on the net? Might be worth taking a look...

Already done ! What I found was way too superficial. When it comes to actually doing something real with the data, you're left with the existing code. I looked into the mpg123 and libmad sources, but... er... errhm...

You can try to find something, too. Maybe you'll be more lucky than I was.

Here is what I have done so far for my rewrite project :
. reading the file (ok, not too hard :p)
. parsing headers and side info (those are quite well documented)
. circular buffer handling for data
. reading the scale factors

But now the next step is to read huffman data. That is, do the work that III_huffdecode() in libmad's layer3.c does (note that I concentrate on layer3, else I wouldn't have any file to test). It's not just reading huffman codes, something is done on the fly with the data.
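As an illustration of the "parsing headers" step in the list above, here is a minimal C sketch that decodes the 32-bit frame header of an MPEG-1 Layer III stream. The struct and function names are hypothetical and are not taken from mpega.library or from meynaf's rewrite; the bit layout and the bitrate and sample-rate tables are the standard MPEG-1 ones. Free-format streams and MPEG-2/2.5 are deliberately rejected to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result struct for this sketch. */
typedef struct {
    int bitrate_kbps;   /* 32..320 */
    int samplerate_hz;  /* 44100, 48000 or 32000 */
    int padding;        /* 0 or 1 extra byte in this frame */
    int channel_mode;   /* 0 stereo, 1 joint stereo, 2 dual channel, 3 mono */
    int frame_bytes;    /* whole frame length, header included */
} Mp3Header;

/* Returns 1 and fills *out if h is a valid MPEG-1 Layer III header,
   returns 0 otherwise. h holds the 4 header bytes, first byte in the
   most significant position. */
int parse_mpeg1_layer3_header(uint32_t h, Mp3Header *out)
{
    static const int bitrates[16] = {
        0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0
    };
    static const int rates[4] = { 44100, 48000, 32000, 0 };
    int br, sr;

    assert(out != 0);
    if ((h >> 21) != 0x7FF) return 0;        /* 11 sync bits, all set */
    if (((h >> 19) & 3) != 3) return 0;      /* version: 3 = MPEG-1 */
    if (((h >> 17) & 3) != 1) return 0;      /* layer:   1 = Layer III */

    br = (int)((h >> 12) & 15);
    sr = (int)((h >> 10) & 3);
    if (br == 0 || br == 15 || sr == 3) return 0; /* free format / invalid */

    out->bitrate_kbps  = bitrates[br];
    out->samplerate_hz = rates[sr];
    out->padding       = (int)((h >> 9) & 1);
    out->channel_mode  = (int)((h >> 6) & 3);
    /* MPEG-1 Layer III: 1152 samples per frame, so 144 * bitrate / rate. */
    out->frame_bytes   = (int)(144L * bitrates[br] * 1000L / rates[sr])
                         + out->padding;
    return 1;
}
```

For example, the common header bytes FF FB 90 00 describe a 128 kbps, 44.1 kHz stereo frame of 417 bytes.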

Quote:

Originally Posted by Thorham (Post 377935)

This one will separate the men from the boys. One day I will be a man.
Thanks for sharing :great

No beginner's stuff, sure. It is not that an implementation in code is that hard, but I simply don't know what to do...

Thorham

12 December 2007 16:15

Quote:

Originally Posted by meynaf

Else that would just be too easy :D

I agree. If it's too easy it's no fun :)

Quote:

Originally Posted by meynaf

Right, I'll see if I can find anything useful. You'd think in this day and age the net is simply filled with the info one needs... But I know, sometimes it's really hard to find what you need, while at other times it's too easy. Maybe I will have more luck.

Quote:

Originally Posted by meynaf

Here is what I have done so far for my rewrite project :
. reading the file (ok, not too hard :p)
. parsing headers and side info (those are quite well documented)
. circular buffer handling for data
. reading the scale factors

But now the next step is to read huffman data. That is, do the work that III_huffdecode() in libmad's layer3.c does (note that I concentrate on layer3, else I wouldn't have any file to test). It's not just reading huffman codes, something is done on the fly with the data.

That already seems like quite a lot of code; it would be a shame to abandon it just because you're lacking some docs.

Quote:

Originally Posted by meynaf

No beginner's stuff, sure. It is not that an implementation in code is that hard, but I simply don't know what to do...

Well, hopefully I can find some docs. I've been interested in how the mp3 format works for quite some time now, so this is the perfect opportunity to get to know more about it. I'll keep you posted on anything interesting/useful I find.

meynaf

12 December 2007 16:53

Quote:

Originally Posted by Thorham (Post 378197)

Fingers crossed...

Quote:

Originally Posted by Thorham (Post 378197)

That already seems like quite a lot of code; it would be a shame to abandon it just because you're lacking some docs.

When I did it, I just wanted to see where I could go without being blocked. But shame on me now if I don't continue :o

Quote:

Originally Posted by Thorham (Post 378197)

Thanks in advance.

Globally, I know how it works, but I need much more than a distant view to write a player...

Thorham

12 December 2007 19:19

Searching for docs and sources didn't yield a whole lot of results, but I think the following links will be interesting.

MP3' Tech - MPEG source codes Has multiple mp3 decoder sources, amongst other things.
MP3 - Hydrogenaudio Knowledgebase Looks like a nice page with some potentially interesting links, including one that describes huffman decoding.
www.eecs.umich.edu/~accheng/doc/MP3Decoder_AllenCheng.pdf Seems to be an in-depth description of mp3 decoding.
www.mp3-tech.org/programmer/docs/fpga_report.pdf Also a more in-depth description. It's a hardware implementation, but that shouldn't matter too much.

These are the best I could find without searching for hours, and although I'm sure they'll make for some interesting reading, I hope some of it is actually of some use to you. If not, I'm going to have to dig a 'little' deeper :D

meynaf

13 December 2007 10:13

These make interesting reading indeed, but they are still not precise enough to actually write a decoder. Existing code is the best doc I have found so far : I have the sources for mpeg3play, mpg123, and mad. They're not heavily commented...

Maybe you'll have to dig a little deeper :D

Thorham

13 December 2007 10:24

Quote:

Originally Posted by meynaf

Have you taken a look at the source code of Amp (and others) on MP3' Tech? If you haven't I suggest you do.

Quote:

Originally Posted by meynaf

Maybe you'll have to dig a little deeper :D

I don't mind digging a little deeper :D

meynaf

13 December 2007 11:11

Quote:

Originally Posted by Thorham (Post 378448)

Have you taken a look at the source code of Amp (and others) on MP3' Tech? If you haven't I suggest you do.

I already have the sources of Amp. Quite unreadable. Couldn't make them compile on amiga. :banghead

The most interesting are mpg123, because it's the most readable I've found (but very very slow as it uses floating-point), and libmad (which is much faster as it uses fixed-point, that is, integer).

Quote:

Originally Posted by Thorham (Post 378448)

I don't mind digging a little deeper :D

If you don't find anything about mp3 then you will maybe find petroleum (if you dig deep enough :D)

What I need right now is either an optimization for mpega, or a detailed algorithm to decode layer3's particular huffman codes (just knowing how huffman works is not enough).
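One part of what makes Layer III Huffman decoding "particular" can be sketched without the code tables themselves. In the big_values region, each Huffman codeword yields a pair of magnitudes (x, y), each in the range 0..15. A magnitude of 15 in a table that has linbits is an escape: that many extra bits are read and added to it. After that, a sign bit follows for every non-zero value. The C sketch below shows only this post-processing step; the bit reader and function names are hypothetical, the Huffman table lookup is assumed to have happened already, and there is no bounds checking on the input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (hypothetical helper for this sketch). */
typedef struct {
    const uint8_t *data;
    size_t bitpos;          /* index of the next bit to read */
} BitReader;

unsigned get_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;
    assert(n <= 16);
    while (n--) {
        unsigned byte = br->data[br->bitpos >> 3];
        unsigned bit  = (byte >> (7 - (br->bitpos & 7))) & 1u;
        v = (v << 1) | bit;
        br->bitpos++;
    }
    return v;
}

/* Turn one table-decoded magnitude (0..15) into a signed value:
   optional linbits escape first, then the sign bit if non-zero. */
int finish_big_value(BitReader *br, unsigned value, unsigned linbits)
{
    if (value == 15 && linbits > 0)
        value += get_bits(br, linbits);     /* escape: extend the range */
    if (value != 0 && get_bits(br, 1))
        return -(int)value;                 /* sign bit set: negative */
    return (int)value;
}

/* The stream order is: x escape, x sign, y escape, y sign. */
void finish_big_value_pair(BitReader *br, unsigned x, unsigned y,
                           unsigned linbits, int out[2])
{
    out[0] = finish_big_value(br, x, linbits);
    out[1] = finish_big_value(br, y, linbits);
}
```

The resulting integers are the is[i] values that then go through requantization. Decoders such as libmad fold that requantization into the same loop, which is why the step looks like more than plain Huffman decoding when reading the source.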

Thorham

13 December 2007 18:06

Quote:

Originally Posted by meynaf

And even these don't make it obvious what must be done with that huffman coding? Well, that is quite a challenge then.

Quote:

Originally Posted by meynaf

If you don't find anything about mp3 then you will maybe find petroleum (if you dig deep enough :D)

That's more or less already what I find, except it's a little bit more sticky :D

Quote:

Originally Posted by meynaf

What I need right now is either an optimization for mpega, or a detailed algorithm to decode layer3's particular huffman codes (just knowing how huffman works is not enough).

At first I thought finding good docs on this couldn't be that hard, but it seems like all of them are in the silly math format. Those math people apparently like to write down just about everything in the form of mathematical formulas. I'm afraid this is going to be harder than I thought (but I will try).

Trying to understand everything from just a source code really should be a last resort, unless it explains all the details.

BippyM

13 December 2007 18:07

You two guys should have your own section ;)

Meynaf & Thorham's asm coding and chat - There only needs to be 2 members :laughing

Only joking guys.. nice to see some asm action in here :D

meynaf

13 December 2007 18:28

Quote:

Originally Posted by Thorham (Post 378648)

And even these don't make it obvious what must be done with that huffman coding? Well, that is quite a challenge then.

It sure is not obvious. :banghead

Quote:

Originally Posted by Thorham (Post 378648)

That's more or less already what I find, except it's a little bit more sticky :D

If you can sell it... :laughing

Quote:

Originally Posted by Thorham (Post 378648)

When it's not maths, then it's code. :shocked
Which one do you prefer ?

Quote:

Originally Posted by Thorham (Post 378648)

Trying to understand everything from just a source code really should be a last resort, unless it explains all the details.

Oh, it sure gives all the details, but it doesn't explain anything. :guru

Quote:

Originally Posted by bippym (Post 378650)

You two guys should have your own section ;)

Meynaf & Thorham's asm coding and chat - There only needs to be 2 members :laughing

Only joking guys.. nice to see some asm action in here :D

I have nothing against a section :D
(and nothing against more people participating, either :agree)

Thorham

14 December 2007 16:24

Quote:

Originally Posted by meynaf

When it's not maths, then it's code. :shocked
Which one do you prefer ?

I will always prefer code.

Quote:

Originally Posted by meynaf

Oh, it sure gives all the details, but it doesn't explain anything. :guru

That surely doesn't make it any easier. I wonder why it's so hard to find good and simple documentation. Doesn't make sense. I have to say I haven't found much that's any more interesting than what I already found, but I won't give up!

meynaf

14 December 2007 16:47

Quote:

Originally Posted by Thorham (Post 379016)

I will always prefer code.

So what do you think about existing code ? Very easy to read and understand, eh ? (who said no ? :D)

Quote:

Originally Posted by Thorham (Post 379016)

I have found some links leading to documents you have to pay for. :shocked
I fear there is some sort of copyright blocking free docs :(

Thorham

17 December 2007 16:07

Quote:

Originally Posted by meynaf

So what do you think about existing code ? Very easy to read and understand, eh ? (who said no ? :D)

That depends. Your ham rendering engine was pretty easy to get to grips with, but then again it was also well documented. It's that mpega re-source that is really hard to understand: no remarks except the ones you write yourself, and not a clue about the workings of the code at all.

Quote:

Originally Posted by meynaf

I have found some links leading to documents you have to pay for. :shocked
I fear there is some sort of copyright blocking free docs :(

I found a book about mp3 on the net. Of course you have to pay to see all of it :banghead It's not as easy as I thought to find understandable and yet complete documentation about mp3. For, say, 680x0 coding this is much easier as you're just going to find the original Motorola docs! You sure picked one :D

meynaf

21 December 2007 11:16

Quote:

Originally Posted by Thorham (Post 379783)

I don't remember how many docs I picked, but it sure was more than one :D

Quote:

Originally Posted by Thorham (Post 379783)

Not a clue, yes. The only thing I could find was how much cpu a routine used, by setting up color #0 (dff180) to something upon entry, then resetting it to black at the end. The more of the color you see, the more cpu the code takes.
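The colour-register timing trick described above can be sketched in C as follows. On a real Amiga the register pointer would be (volatile uint16_t *)0xDFF180, which is COLOR00, the background colour. The helper names are hypothetical, and the register is passed as a parameter so that the sketch can also be tried against an ordinary variable on other machines.

```c
#include <assert.h>
#include <stdint.h>

/* Set the background colour (12-bit 0x0RGB) on entry to the routine
   being measured. */
void profile_begin(volatile uint16_t *color0, uint16_t rgb)
{
    assert(color0 != 0);
    *color0 = rgb;
}

/* Reset the background to black when the routine returns. The height
   of the coloured band on screen then shows roughly how much of each
   frame the routine consumed. */
void profile_end(volatile uint16_t *color0)
{
    assert(color0 != 0);
    *color0 = 0x000;
}
```

A routine would be wrapped as profile_begin(reg, 0x0F00), then the call, then profile_end(reg), using a different colour for each routine under test.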

If you can't find something in here, then I have something else that could be useful to accelerate : my 44.1 khz 14-bit rendering code.
Here :

Thorham

22 December 2007 23:30

Quote:

Originally Posted by meynaf

I can't remember how often I've used that method, too :D

Very tough to find simple explanations for mpeg decoding :scream However, I was able to work out that the huffman code seems to differ from normal huffman code in that it outputs variable length data (has to do with the scaling factors if I'm not mistaken) instead of fixed length data. This has led me to believe that the only thing that happens during the huffman decoding stage is scaling the data to some fixed length. If this is correct, then it should not be too hard to extract this from one of the source codes you have, and make your own routine based on this.

After searching the web for a while, it began to dawn on me that you have two choices: 1. Go through the trouble of learning how layer 3 really works, including the math. 2. Go through the trouble of understanding someone else's source code. Personally I'm not a math guy, as you know, so I would definitely go for option two. Of course, you probably came to the same conclusion :D

Quote:

Originally Posted by meynaf

If you can't find something in here, then I have something else that could be useful to accelerate : my 44.1 khz 14-bit rendering code.

I can certainly have a go at it :) However, the sound output is just about the last thing that should be optimized, because of the low bandwidth requirements cd quality sound has: only about 176 KB per second in raw format (44100 samples x 2 channels x 2 bytes). Although I really don't mind having a go at it (and could actually enjoy doing so), the cpu intensive parts (read: the hard parts) are the parts where the real profit is. But you knew that, didn't you :D

meynaf

24 December 2007 10:50

Quote:

Originally Posted by Thorham (Post 381407)

I can't remember how often I've used that method, too :D

And you can't do that on today's machines. :laughing

Quote:

Originally Posted by Thorham (Post 381407)

Very tough to find simple explanations for mpeg decoding :scream However, I was able to work out that the huffman code seems to differ from normal huffman code in that it outputs variable length data (has to do with the scaling factors if I'm not mistaken) instead of fixed length data. This has led me to believe that the only thing that happens during the huffman decoding stage is scaling the data to some fixed length. If this is correct, then it should not be too hard to extract this from one of the source codes you have, and make your own routine based on this.

There are more computations than that : the code also performs on-the-fly requantization.
According to libmad's layer3.c :

Code:

 * The Layer III formula for requantization and scaling is defined by
 * section 2.4.3.4.7.1 of ISO/IEC 11172-3, as follows:
 *
 *   long blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210)) *
 *           2^-(scalefac_multiplier *
 *               (scalefac_l[sfb] + preflag * pretab[sfb]))
 *
 *   short blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210 - 8 * subblock_gain[w])) *
 *           2^-(scalefac_multiplier * scalefac_s[sfb][w])
 *
 *   where:
 *   scalefac_multiplier = (scalefac_scale + 1) / 2

Not simple, really :banghead
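To make the long-block formula quoted above more concrete, here is a small floating-point sketch in C. It is not how libmad or mpega.library implement it (those use fixed-point arithmetic and tables); the function names are hypothetical. The two power-of-two factors are merged into one exponent counted in quarter steps: since scalefac_multiplier is 0.5 or 1, the combined exponent is (global_gain - 210 - 2 * (scalefac_scale + 1) * (scalefac_l + preflag * pretab)) / 4, which is always a whole number of quarters. The cube root uses Newton's method so that the sketch needs no maths library.

```c
#include <assert.h>

/* |v|^(4/3), computed as |v| * cbrt(|v|) with Newton's method. */
double pow43(int v)
{
    double a = v < 0 ? -(double)v : (double)v;
    double x = a > 1.0 ? a : 1.0;   /* starting guess */
    int i;
    if (a == 0.0) return 0.0;
    for (i = 0; i < 60; i++)        /* ample for Layer III magnitudes */
        x = (2.0 * x + a / (x * x)) / 3.0;
    return a * x;
}

/* 2^(q/4) for any integer q: a table of the four fractional roots
   plus repeated doubling or halving for the whole part. */
double pow2_quarter(int q)
{
    static const double root[4] = {
        1.0, 1.189207115002721, 1.4142135623730951, 1.681792830507429
    };
    int whole = q >= 0 ? q / 4 : -((3 - q) / 4);    /* floor(q / 4) */
    double r = root[q - 4 * whole];
    while (whole > 0) { r *= 2.0; whole--; }
    while (whole < 0) { r *= 0.5; whole++; }
    return r;
}

/* Long-block requantization of one Huffman-decoded value is. */
double requantize_long(int is, int global_gain, int scalefac_scale,
                       int scalefac_l, int preflag, int pretab)
{
    int q;
    double mag;
    assert(scalefac_scale == 0 || scalefac_scale == 1);
    /* 4 * scalefac_multiplier is 2 or 4, so q stays an integer. */
    q = (global_gain - 210)
        - 2 * (scalefac_scale + 1) * (scalefac_l + preflag * pretab);
    mag = pow43(is) * pow2_quarter(q);
    return is < 0 ? -mag : mag;
}
```

An integer decoder would replace pow43 with a lookup table and keep the quarter-step split, so that the power of two becomes a shift plus one multiply by a table constant.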

Quote:

Originally Posted by Thorham (Post 381407)

After searching the web for a while, it began to dawn on me that you have two choices: 1. Go through the trouble of learning how layer 3 really works, including the math. 2. Go through the trouble of understanding someone else's source code. Personally I'm not a math guy, as you know, so I would definitely go for option two. Of course, you probably came to the same conclusion :D

I did. Definitely option 2.

Quote:

Originally Posted by Thorham (Post 381407)

That part is to the mp3 decoder what the ham rendering is to the jpeg decoding proper, so it's not useless to check.

Remember that we can't play that 16-bit 44.1 kHz data directly ; we have to downsample it first, and prepare it for 14-bit output. My code does this in 5:3 instead of the usual 2:1, giving 26460 Hz instead of 22050 Hz (better quality). But, of course, this takes some time.
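A 5:3 rate conversion turns every 5 input samples into 3 output samples, so 44100 Hz becomes 44100 * 3 / 5 = 26460 Hz. The C sketch below uses plain linear interpolation, which is an assumption made for illustration: the thread does not say which method meynaf's routine uses, and a real converter would normally low-pass filter first to limit aliasing. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert whole groups of 5 input samples into 3 output samples.
   Output k of a group sits at input position 5k/3, that is
   0, 1 + 2/3 and 3 + 1/3. Returns the number of samples written;
   leftover input samples (n_in % 5) are ignored. */
size_t resample_5_to_3(const int16_t *in, size_t n_in, int16_t *out)
{
    size_t groups = n_in / 5;
    size_t g;
    assert(in != 0 && out != 0);
    for (g = 0; g < groups; g++) {
        const int16_t *s = in + 5 * g;
        int16_t *d = out + 3 * g;
        d[0] = s[0];
        d[1] = (int16_t)(((int32_t)s[1] + 2 * (int32_t)s[2]) / 3);
        d[2] = (int16_t)((2 * (int32_t)s[3] + (int32_t)s[4]) / 3);
    }
    return groups * 3;
}
```

The divisions by 3 are the kind of operation that is slow on a 68030; one common way around them is a multiply by a precomputed reciprocal followed by a shift.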

When I use mpega I'm often at 95% cpu use (when there aren't gaps in the replay !), so it's worth removing whatever we can.

This code must write to chip memory, and there are nasty divides in it. You sure know these things aren't fast ;)
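For readers unfamiliar with 14-bit output on the Amiga: the usual technique pairs two 8-bit audio channels, one at volume 64 carrying the top 8 bits of the sample and one at volume 1 carrying the next 6 bits. The C sketch below shows only the sample split. It is the generic technique, not meynaf's routine, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split a signed 16-bit sample into the two bytes used for 14-bit
   playback. The lowest 2 bits of the sample are dropped.
   Note: right-shifting a negative value is assumed to be an
   arithmetic shift, which holds on the usual compilers. */
void split_14bit(int16_t s, int8_t *hi, uint8_t *lo)
{
    assert(hi != 0 && lo != 0);
    *hi = (int8_t)(s >> 8);             /* channel played at volume 64 */
    *lo = (uint8_t)((s >> 2) & 0x3F);   /* channel played at volume 1  */
}
```

The two channels add up to hi * 64 + lo, which equals the original sample shifted right by two bits. Both bytes have to be written to separate chip memory buffers, one per channel.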

Thorham

24 December 2007 15:56

Quote:

Originally Posted by meynaf

And you can't do that on today's machines. :laughing

Maybe you can with palette based screen modes :D

Quote:

Originally Posted by meynaf

There are more computations than that : the code also performs on-the-fly requantization.
According to libmad's layer3.c :

Code:

 * The Layer III formula for requantization and scaling is defined by
 * section 2.4.3.4.7.1 of ISO/IEC 11172-3, as follows:
 *
 *   long blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210)) *
 *           2^-(scalefac_multiplier *
 *               (scalefac_l[sfb] + preflag * pretab[sfb]))
 *
 *   short blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210 - 8 * subblock_gain[w])) *
 *           2^-(scalefac_multiplier * scalefac_s[sfb][w])
 *
 *   where:
 *   scalefac_multiplier = (scalefac_scale + 1) / 2

Not simple, really :banghead

So this does both in one go, eh? Doesn't that still mean the huffman decoding simply has to be written to output the variable length data, after which the scaling and re-quantization are handled :confused Maybe I just don't get enough of it, yet :D

Quote:

Originally Posted by meynaf

95% is pretty steep. I suppose optimizing the 14bit routine really should be done then. Although I still believe most of the gain will come from finding optimizations in the really heavy parts of the code :p

It's a big shame the audio dma rate can only be doubled by doubling the screen scan rate; otherwise the down-sampling wouldn't be needed and one could just chop off two bits, which would be faster and sound better.

By the way, have you ever thought of a 15bit routine by any chance? I know I should probably not be bringing this up (will slow things down), but I just couldn't resist :D

All times are GMT +2.