mpega.library faster than itself

meynaf · 10 December 2007, 11:02

Hello coders,

As you know, there is no support for mpega.library and the author can't be contacted (it seems).

I asked myself if the integer version of the library, which is currently the fastest Amiga implementation of such a decoder (or so I think), could be optimized, and found out it could.
I re-sourced it and gained around 10% in speed, enough to play at the medium quality setting what I previously played at low.

Now most mp3s of up to 160 kbps can be played at 22.05 kHz, medium quality, mono, on a 50 MHz 68030.

To use it, DeliTracker's mpega player will do nicely.

I don't know if it's better to rewrite it all or start from its current code.
I started both, but the rewrite stopped because of my lack of understanding of Layer 3 (that is, lack of docs).

You will find the current source (reassembler output, so mostly unreadable) here:
http://meynaf.free.fr/tmp/mpega.lzx
(not in the zone because files don't live long enough there)

Included is the library doc. Unfortunately I don't have the LVOs.

More can surely be done. If someone already did something similar or is interested, you now know where to help...

StrategyGamer · 10 December 2007, 11:13

I had wanted to optimize it myself, timeless ages ago, but I could not find the source.

Thanks for doing it for me.

Thorham · 12 December 2007, 00:37

Quote:

Originally Posted by meynaf

You will find the current source (reassembler output, so mostly unreadable) here:

Unreadable indeed

But seriously: muls muls muls and more muls

When I saw that, I thought: Hoo boy. This thing is no joke. Must admit, I didn't read through all of the code, but that's a serious challenge you've gotten yourself into

Optimizing this one isn't the same as the instruction juggling I've been doing for your ham rendering engine, which is small and well commented (translators are far from perfect, but they do help with the French). This is a whole different cup of tea

Quote:

Originally Posted by meynaf

I don't know if it's better to rewrite it all or start from its current code.
I started both, but the rewrite stopped because of my lack of understanding of Layer 3 (that is, lack of docs).

That's a pity. Are you sure you can't find what you need on the net? Might be worth taking a look...

This one will separate the men from the boys. One day I will be a man.
Thanks for sharing

meynaf · 12 December 2007, 10:07

Quote:

Originally Posted by Thorham

Unreadable indeed

But seriously: muls muls muls and more muls

When I saw that, I thought: Hoo boy. This thing is no joke. Must admit, I didn't read through all of the code, but that's a serious challenge you've gotten yourself into

Optimizing this one isn't the same as the instruction juggling I've been doing for your ham rendering engine, which is small and well commented (translators are far from perfect, but they do help with the French). This is a whole different cup of tea

Else that would just be too easy

Quote:

Originally Posted by Thorham

That's a pity. Are you sure you can't find what you need on the net? Might be worth taking a look...

Already done! What I found was way too superficial. When it comes to actually doing something real with the data, you're left with the existing code. I looked into mpg123 and libmad sources, but... er... errhm...

You can try to find something, too. Maybe you'll be luckier than I was.

Here is what I have done so far for my rewrite project:
. reading the file (ok, not too hard)
. parsing headers and side info (those are quite well documented)
. circular buffer handling for data
. reading the scale factors

But now the next step is to read huffman data. That is, do the work that III_huffdecode() in libmad's layer3.c does (note that I concentrate on layer3, else I wouldn't have any file to test). It's not just reading huffman codes, something is done on the fly with the data.
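To make the "parsing headers" step above concrete, here is a minimal sketch of decoding the 4-byte frame header, restricted to MPEG-1 Layer III. The function name `parse_frame_header` and the returned dictionary layout are purely illustrative and are not taken from mpega.library, libmad, or mpg123; free-format and reserved bitrate indices are simply rejected.

```python
# Illustrative sketch only: MPEG-1 Layer III frame header fields.
# Names and return layout are hypothetical, not from any existing decoder.

# Bitrate table for MPEG-1 Layer III, in kbps (index 0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
# Sample rates for MPEG-1, in Hz (index 3 is reserved).
SAMPLE_RATES = [44100, 48000, 32000, None]


def parse_frame_header(four_bytes: bytes) -> dict:
    h = int.from_bytes(four_bytes[:4], "big")
    if (h >> 21) != 0x7FF:                 # 11-bit frame sync, all ones
        raise ValueError("no frame sync")
    version = (h >> 19) & 3                # 3 = MPEG-1
    layer = (h >> 17) & 3                  # 1 = Layer III
    if version != 3 or layer != 1:
        raise ValueError("not an MPEG-1 Layer III frame")
    bitrate = BITRATES_KBPS[(h >> 12) & 15]
    sample_rate = SAMPLE_RATES[(h >> 10) & 3]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved value, not handled here")
    padding = (h >> 9) & 1
    mono = ((h >> 6) & 3) == 3             # channel mode 3 = single channel
    return {
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "mono": mono,
        "has_crc": ((h >> 16) & 1) == 0,   # protection bit is 0 when a CRC follows
        # Frame length in bytes, header included.
        "frame_length": 144 * bitrate * 1000 // sample_rate + padding,
        # Side info size in bytes for MPEG-1: 17 for mono, 32 otherwise.
        "side_info_length": 17 if mono else 32,
    }
```

For example, the common header bytes `FF FB 90 00` describe a 128 kbps, 44.1 kHz stereo frame of 417 bytes.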

Quote:

Originally Posted by Thorham

This one will separate the men from the boys. One day I will be a man.
Thanks for sharing

No beginner's stuff, sure. It is not that an implementation in code is that hard, but I simply don't know what to do...

Thorham · 12 December 2007, 16:15

Quote:

Originally Posted by meynaf

Else that would just be too easy

I agree. If it's too easy it's no fun

Quote:

Originally Posted by meynaf

Already done! What I found was way too superficial. When it comes to actually doing something real with the data, you're left with the existing code. I looked into mpg123 and libmad sources, but... er... errhm...

You can try to find something, too. Maybe you'll be luckier than I was.

Right, I'll see if I can find anything useful. You'd think in this day and age the net is simply filled with the info one needs... But I know, sometimes it's really hard to find what you need, while at other times it's too easy. Maybe I will have more luck.

Quote:

Originally Posted by meynaf

Here is what I have done so far for my rewrite project:
. reading the file (ok, not too hard)
. parsing headers and side info (those are quite well documented)
. circular buffer handling for data
. reading the scale factors

But now the next step is to read huffman data. That is, do the work that III_huffdecode() in libmad's layer3.c does (note that I concentrate on layer3, else I wouldn't have any file to test). It's not just reading huffman codes, something is done on the fly with the data.

That already seems like quite some code; it would be a shame to abandon it just because you're lacking some docs.

Quote:

Originally Posted by meynaf

No beginner's stuff, sure. It is not that an implementation in code is that hard, but I simply don't know what to do...

Well, hopefully I can find some docs. I've been interested in how the mp3 format works for quite some time now, so this is the perfect opportunity to get to know more about it. I'll keep you posted on anything interesting/useful I find.

meynaf · 12 December 2007, 16:53

Quote:

Originally Posted by Thorham

Right, I'll see if I can find anything useful. You'd think in this day and age the net is simply filled with the info one needs... But I know, sometimes it's really hard to find what you need, while at other times it's too easy. Maybe I will have more luck.

Fingers crossed...

Quote:

Originally Posted by Thorham

That already seems like quite some code; it would be a shame to abandon it just because you're lacking some docs.

When I did it, I just wanted to see where I could go without being blocked. But shame on me now if I don't continue

Quote:

Originally Posted by Thorham

Well, hopefully I can find some docs. I've been interested in how the mp3 format works for quite some time now, so this is the perfect opportunity to get to know more about it. I'll keep you posted on anything interesting/useful I find.

Thanks in advance.

Globally, I know how it works, but I need much more than a distant view to write a player...

Thorham · 12 December 2007, 19:19

Searching for docs and sources didn't yield a whole lot of results, but I think the following links will be interesting.

MP3' Tech - MPEG source codes Has multiple mp3 decoder sources, amongst other things.
MP3 - Hydrogenaudio Knowledgebase Looks like a nice page with some potentially interesting links, including one that describes huffman decoding.
www.eecs.umich.edu/~accheng/doc/MP3Decoder_AllenCheng.pdf Seems to be an in-depth description of mp3 decoding.
www.mp3-tech.org/programmer/docs/fpga_report.pdf Also a more in-depth description. It's a hardware implementation, but that shouldn't matter too much.

These are the best I could find without searching for hours, and although I'm sure they'll make for some interesting reading, I hope some of it is actually of some use to you. If not, I'm going to have to dig a 'little' deeper

meynaf · 13 December 2007, 10:13

These are interesting reading indeed, but still not precise enough to actually write a decoder. Existing code is the best doc I have found so far: I have the sources for mpeg3play, mpg123, and mad. They don't have many comments...

Maybe you'll have to dig a little deeper

Thorham · 13 December 2007, 10:24

Quote:

Originally Posted by meynaf

These are interesting reading indeed, but still not precise enough to actually write a decoder. Existing code is the best doc I have found so far: I have the sources for mpeg3play, mpg123, and mad. They don't have many comments...

Have you taken a look at the source code of Amp (and others) on MP3' Tech? If you haven't I suggest you do.

Quote:

Originally Posted by meynaf

Maybe you'll have to dig a little deeper

I don't mind digging a little deeper

meynaf · 13 December 2007, 11:11

Quote:

Originally Posted by Thorham

Have you taken a look at the source code of Amp (and others) on MP3' Tech? If you haven't I suggest you do.

I already have the sources of Amp. Quite unreadable. Couldn't make them compile on the Amiga.

The most interesting are mpg123, because it's the most readable I've found (but very, very slow as it uses floating point), and libmad (which is much faster as it uses fixed point, that is, integer).
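As a rough illustration of what "fixed point, that is, integer" means here: real numbers are stored as integers scaled by a power of two, and a multiplication becomes an integer multiply followed by a shift. The sketch below assumes 28 fractional bits, which I believe matches libmad's `mad_fixed_t`, but treat that and all the helper names as assumptions made for this example.

```python
# Illustrative fixed-point arithmetic; FRACBITS = 28 is an assumption
# (believed to match libmad), and the helper names are hypothetical.
FRACBITS = 28
ONE = 1 << FRACBITS  # the integer that represents 1.0


def to_fixed(x: float) -> int:
    """Convert a real number to its scaled integer representation."""
    return int(round(x * ONE))


def to_float(f: int) -> float:
    """Convert a scaled integer back to a real number."""
    return f / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point numbers.

    The raw product carries 2 * FRACBITS fractional bits, so it is
    shifted right by FRACBITS to get back to the working format.
    """
    return (a * b) >> FRACBITS
```

On a 68020 or later this kind of operation maps naturally to a 32x32 to 64-bit `muls.l` followed by a shift, which may well be where all the `muls` in the disassembly come from.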

Quote:

Originally Posted by Thorham

I don't mind digging a little deeper

If you don't find anything about mp3, then maybe you will find petroleum (if you dig deep enough)

What I need right now is either an optimization for mpega, or a detailed algorithm to decode layer3's particular huffman codes (just knowing how huffman works is not enough).
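For what it's worth, the part of Layer 3 Huffman decoding that goes beyond plain Huffman, at least in the "big values" region, is this: each code word yields a pair (x, y); a value of 15 acts as an escape when the selected table has `linbits` > 0, in which case `linbits` extra bits are read and added; and every nonzero value is followed by one sign bit. The sketch below shows only that logic. `TOY_TABLE` is a made-up prefix code, not one of the ISO tables, and `BitReader` and `decode_pair` are hypothetical names.

```python
# Sketch of decoding one (x, y) pair in the big_values region.
# TOY_TABLE is invented for the example; real decoders use the ISO tables.

class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


# Complete prefix code: bit string -> (x, y). Hypothetical values.
TOY_TABLE = {"1": (0, 0), "01": (1, 0), "001": (0, 1),
             "0001": (15, 1), "0000": (1, 1)}


def decode_pair(br: BitReader, table: dict, linbits: int) -> tuple:
    # Plain Huffman part: read bits until they match a code word.
    code = ""
    while code not in table:
        code += str(br.read(1))
    x, y = table[code]

    values = []
    for v in (x, y):              # x is completed first, then y
        if v == 15 and linbits:   # escape: the value continues in linbits bits
            v += br.read(linbits)
        if v != 0 and br.read(1): # sign bit is present only for nonzero values
            v = -v
        values.append(v)
    return tuple(values)
```

With `linbits = 4`, the bits `0001 0011 1 0` decode as the escape pair (15, 1), then 3 extra for x (giving 18), a set sign bit for x, and a clear sign bit for y, so the result is (-18, 1).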

Thorham · 13 December 2007, 18:06

Quote:

Originally Posted by meynaf

I already have the sources of Amp. Quite unreadable. Couldn't make them compile on the Amiga.

The most interesting are mpg123, because it's the most readable I've found (but very, very slow as it uses floating point), and libmad (which is much faster as it uses fixed point, that is, integer).

And even these don't make it obvious what must be done with that huffman coding? Well, that is quite a challenge then.

Quote:

Originally Posted by meynaf

If you don't find anything about mp3, then maybe you will find petroleum (if you dig deep enough)

That's more or less already what I find, except it's a little bit more sticky

Quote:

Originally Posted by meynaf

What I need right now is either an optimization for mpega, or a detailed algorithm to decode layer3's particular huffman codes (just knowing how huffman works is not enough).

At first I thought finding good docs on this couldn't be that hard, but it seems like all of them are in the silly math format. Those math people apparently like to write down just about everything in the form of mathematical formulas. I'm afraid this is going to be harder than I thought (but I will try).

Trying to understand everything from just a source code really should be a last resort, unless it explains all the details.

BippyM · 13 December 2007, 18:07

You two guys should have your own section

Meynaf & Thorham's asm coding and chat - There only needs to be 2 members

Only joking guys.. nice to see some asm action in here

meynaf · 13 December 2007, 18:28

Quote:

Originally Posted by Thorham

And even these don't make it obvious what must be done with that huffman coding? Well, that is quite a challenge then.

It sure is not obvious.

Quote:

Originally Posted by Thorham

That's more or less already what I find, except it's a little bit more sticky

If you can sell it...

Quote:

Originally Posted by Thorham

At first I thought finding good docs on this couldn't be that hard, but it seems like all of them are in the silly math format. Those math people apparently like to write down just about everything in the form of mathematical formulas. I'm afraid this is going to be harder than I thought (but I will try).

When it's not maths, then it's code.

Which one do you prefer ?

Quote:

Originally Posted by Thorham

Trying to understand everything from just a source code really should be a last resort, unless it explains all the details.

Oh, it sure gives all the details, but it doesn't explain anything.

Quote:

Originally Posted by bippym

You two guys should have your own section

Meynaf & Thorham's asm coding and chat - There only needs to be 2 members

Only joking guys.. nice to see some asm action in here

I have nothing against a section

(and nothing against more people participating, either)

Thorham · 14 December 2007, 16:24

Quote:

Originally Posted by meynaf

When it's not maths, then it's code.

Which one do you prefer ?

I will always prefer code.

Quote:

Originally Posted by meynaf

Oh, it sure gives all the details, but it doesn't explain anything.

That surely doesn't make it any easier. I wonder why it's so hard to find good and simple documentation. Doesn't make sense. I have to say I haven't found much stuff that's any more interesting than what I already found, but I won't give up!

meynaf · 14 December 2007, 16:47

Quote:

Originally Posted by Thorham

I will always prefer code.

So what do you think about existing code? Very easy to read and understand, eh? (Who said no?)

Quote:

Originally Posted by Thorham

That surely doesn't make it any easier. I wonder why it's so hard to find good and simple documentation. Doesn't make sense. I have to say I haven't found much stuff that's any more interesting than what I already found, but I won't give up!

I have found some links leading to documents you have to pay for.

I fear there is some sort of copyright blocking free docs

Thorham · 17 December 2007, 16:07

Quote:

Originally Posted by meynaf

So what do you think about existing code? Very easy to read and understand, eh? (Who said no?)

That depends. Your ham rendering engine was pretty easy to get to grips with, but then again it was also well documented. It's that mpega re-source that is really hard to understand: no remarks except the ones you write yourself, and not a clue about the workings of the code at all.

Quote:

Originally Posted by meynaf

I have found some links leading to documents you have to pay for.

I fear there is some sort of copyright blocking free docs

I found a book about mp3 on the net. Of course you have to pay to see all of it

It's not as easy as I thought to find understandable and yet complete documentation about mp3. For, say, 680x0 coding this is much easier as you're just going to find the original Motorola docs! You sure picked one

meynaf · 21 December 2007, 11:16

Quote:

Originally Posted by Thorham

I found a book about mp3 on the net. Of course you have to pay to see all of it

It's not as easy as I thought to find understandable and yet complete documentation about mp3. For, say, 680x0 coding this is much easier as you're just going to find the original Motorola docs! You sure picked one

I don't remember how many docs I picked, but it sure was more than one

Quote:

Originally Posted by Thorham

That depends. Your ham rendering engine was pretty easy to get to grips with, but then again it was also well documented. It's that mpega re-source that is really hard to understand: no remarks except the ones you write yourself, and not a clue about the workings of the code at all.

Not a clue, yes. The only thing I could find was how much CPU a routine used, by setting color #0 ($dff180) to something upon entry, then resetting it to black at the end. The more of the color you see, the more CPU time the code takes.

If you can't find something in here, then I have something else that could be useful to accelerate: my 44.1 kHz 14-bit rendering code.
Here :

Thorham · 22 December 2007, 23:30

Quote:

Originally Posted by meynaf

Not a clue, yes. The only thing I could find was how much CPU a routine used, by setting color #0 ($dff180) to something upon entry, then resetting it to black at the end. The more of the color you see, the more CPU time the code takes.

I can't remember how often I've used that method, too

Very tough to find simple explanations for mpeg decoding

However, I was able to work out that the Huffman code seems to differ from normal Huffman code in that it outputs variable-length data (it has to do with the scaling factors, if I'm not mistaken) instead of fixed-length data. This has led me to believe that the only thing that happens during the Huffman decoding stage is scaling the data to some fixed length. If this is correct, then it should not be too hard to extract this from one of the source codes you have, and make your own routine based on it.

After searching the web for a while, it began to dawn on me that you have two choices: 1. Go through the trouble of learning how Layer 3 really works, including the math. 2. Go through the trouble of understanding someone else's source code. Personally I'm not a math guy, as you know, so I would definitely go for option two. Of course, you probably came to the same conclusion

Quote:

Originally Posted by meynaf

If you can't find something in here, then I have something else that could be useful to accelerate: my 44.1 kHz 14-bit rendering code.

I can certainly have a go at it

However, the sound output is just about the last thing that should be optimized, because of the low bandwidth CD-quality sound requires: only about 176 KB per second in raw format. Although I really don't mind having a go at it (and could actually enjoy doing so), the CPU-intensive parts (read: the hard parts) are where the real profit is. But you knew that, didn't you

meynaf · 24 December 2007, 10:50

Quote:

Originally Posted by Thorham

I can't remember how often I've used that method, too

And you can't do that on today's machines.

Quote:

Originally Posted by Thorham

Very tough to find simple explanations for mpeg decoding

However, I was able to work out that the Huffman code seems to differ from normal Huffman code in that it outputs variable-length data (it has to do with the scaling factors, if I'm not mistaken) instead of fixed-length data. This has led me to believe that the only thing that happens during the Huffman decoding stage is scaling the data to some fixed length. If this is correct, then it should not be too hard to extract this from one of the source codes you have, and make your own routine based on it.

There are more computations than that : the code also performs on-the-fly requantization.
According to libmad's layer3.c :

Code:

 * The Layer III formula for requantization and scaling is defined by
 * section 2.4.3.4.7.1 of ISO/IEC 11172-3, as follows:
 *
 *   long blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210)) *
 *           2^-(scalefac_multiplier *
 *               (scalefac_l[sfb] + preflag * pretab[sfb]))
 *
 *   short blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210 - 8 * subblock_gain[w])) *
 *           2^-(scalefac_multiplier * scalefac_s[sfb][w])
 *
 *   where:
 *   scalefac_multiplier = (scalefac_scale + 1) / 2

Not simple, really
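Written out as straight floating-point code, the long-block formula quoted above is less scary than it looks. This is only a direct transcription of the comment for illustration: a real integer decoder would use lookup tables and fixed-point arithmetic instead, and the function name `requantize_long` is made up for this example.

```python
import math


def requantize_long(is_value: int, global_gain: int, scalefac_scale: int,
                    scalefac_l: int, preflag: int, pretab: int) -> float:
    """Direct transcription of the long-block formula from the comment.

    is_value is the Huffman-decoded integer is[i]; scalefac_l and pretab
    are the entries for the scalefactor band that sample i belongs to.
    """
    if is_value == 0:
        return 0.0
    scalefac_multiplier = (scalefac_scale + 1) / 2       # 0.5 or 1.0
    magnitude = abs(is_value) ** (4 / 3)                 # abs(is[i])^(4/3)
    gain = 2.0 ** ((global_gain - 210) / 4)              # 2^((1/4)*(global_gain-210))
    scale = 2.0 ** -(scalefac_multiplier *
                     (scalefac_l + preflag * pretab))
    return math.copysign(magnitude * gain * scale, is_value)
```

For instance, with `global_gain = 210` and all scalefactors at zero, an input of 8 comes out as 8^(4/3) = 16; raising `global_gain` by 4 doubles the result.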

Quote:

Originally Posted by Thorham

After searching the web for a while, it began to dawn on me that you have two choices: 1. Go through the trouble of learning how Layer 3 really works, including the math. 2. Go through the trouble of understanding someone else's source code. Personally I'm not a math guy, as you know, so I would definitely go for option two. Of course, you probably came to the same conclusion

I did. Definitely option 2.

Quote:

Originally Posted by Thorham

I can certainly have a go at it

However, the sound output is just about the last thing that should be optimized, because of the low bandwidth CD-quality sound requires: only about 176 KB per second in raw format. Although I really don't mind having a go at it (and could actually enjoy doing so), the CPU-intensive parts (read: the hard parts) are where the real profit is. But you knew that, didn't you

Its role is similar to that of the HAM rendering relative to the JPEG decoding proper, so it's not useless to check.

Remember that we can't play that 16-bit 44.1 kHz data directly; we have to downsample it first, and prepare it for 14-bit output. My code does this in 5:3 instead of the usual 2:1, leading to 26460 Hz instead of 22050 Hz (better quality). But, of course, this takes some time.
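To illustrate the two steps just described, here is a small sketch. The resampler uses plain linear interpolation, which is an assumption for the example and not necessarily what the actual routine does; the 14-bit split follows the usual Amiga trick of sending the top 8 bits to a channel at full volume and the next 6 bits to a channel at volume 1. Both function names are hypothetical.

```python
# Sketch only: 5:3 downsampling (44100 Hz -> 26460 Hz) and 14-bit splitting.
# Linear interpolation is assumed here; the real routine may differ.

def resample_5_to_3(samples: list) -> list:
    """Produce 3 output samples for every 5 input samples."""
    out = []
    for k in range(len(samples) * 3 // 5):
        # Output sample k sits at input position k * 5/3.
        # idx is the whole part, frac counts thirds (0, 1 or 2).
        idx, frac = divmod(k * 5, 3)
        a = samples[idx]
        b = samples[idx + 1] if idx + 1 < len(samples) else a
        out.append((a * (3 - frac) + b * frac) // 3)
    return out


def split_14bit(sample: int) -> tuple:
    """Split a signed 16-bit sample into (high, low) parts for 14-bit output.

    high: signed top 8 bits, for the channel at volume 64.
    low:  next 6 bits (0..63), for the channel at volume 1.
    The 2 lowest bits of the 16-bit sample are dropped.
    """
    high = sample >> 8
    low = (sample >> 2) & 0x3F
    return high, low
```

The output rate follows directly from the ratio: 44100 * 3 / 5 = 26460. Note that the interpolation above already contains a divide per sample, which is the kind of thing worth replacing with a table or a multiply on a 68030.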

When I use mpega I'm often at 95% cpu use (when there aren't gaps in the replay !), so it's worth removing whatever we can.

This code must write to chip memory, and there are nasty divides in it. You sure know these things aren't fast

Thorham · 24 December 2007, 15:56

Quote:

Originally Posted by meynaf

And you can't do that on today's machines.

Maybe you can with palette based screen modes

Quote:

Originally Posted by meynaf

There are more computations than that : the code also performs on-the-fly requantization.
According to libmad's layer3.c :

Code:

 * The Layer III formula for requantization and scaling is defined by
 * section 2.4.3.4.7.1 of ISO/IEC 11172-3, as follows:
 *
 *   long blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210)) *
 *           2^-(scalefac_multiplier *
 *               (scalefac_l[sfb] + preflag * pretab[sfb]))
 *
 *   short blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210 - 8 * subblock_gain[w])) *
 *           2^-(scalefac_multiplier * scalefac_s[sfb][w])
 *
 *   where:
 *   scalefac_multiplier = (scalefac_scale + 1) / 2

Not simple, really

So this does both in one go, eh? Doesn't that still mean the Huffman decoding simply has to be written to output the variable-length data, after which the scaling and re-quantization are handled?

Maybe I just don't get enough of it, yet

Quote:

Originally Posted by meynaf

Its role is similar to that of the HAM rendering relative to the JPEG decoding proper, so it's not useless to check.

Remember that we can't play that 16-bit 44.1 kHz data directly; we have to downsample it first, and prepare it for 14-bit output. My code does this in 5:3 instead of the usual 2:1, leading to 26460 Hz instead of 22050 Hz (better quality). But, of course, this takes some time.

When I use mpega I'm often at 95% cpu use (when there aren't gaps in the replay !), so it's worth removing whatever we can.

This code must write to chip memory, and there are nasty divides in it. You sure know these things aren't fast

95% is pretty steep. I suppose optimizing the 14-bit routine really should be done then. Although I still believe most of the gain will come from finding optimizations in the really heavy parts of the code

It's a big shame the audio dma can only be doubled by doubling the screen scan rate, otherwise the down-sampling wouldn't be needed and one could just chop off two bits, would be faster and sound better.

By the way, have you ever thought of a 15bit routine by any chance? I know I should probably not be bringing this up (will slow things down), but I just couldn't resist
