English Amiga Board


Old 10 December 2007, 11:02   #1
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
mpega.library faster than itself

Hello coders,


As you know, there is no support for mpega.library and the author can't be contacted (it seems).

I asked myself whether the integer version of the library, which is currently the fastest Amiga implementation of such a decoder (or so I think), could be optimized, and found out it could.
I re-sourced it and gained around 10% in speed, enough to play at the medium quality setting what I previously played at low.
Now most MP3s of up to 160 kbps can be played at 22.05 kHz, medium quality, mono, on a 50 MHz 68030.

To use it, DeliTracker's mpega player will do nicely.

I don't know whether it's better to rewrite it all or to start from its current code.
I started both, but the rewrite stopped for lack of understanding of Layer III (that is, lack of docs).

You will find the current source (reassembler output, so mostly unreadable) here:
http://meynaf.free.fr/tmp/mpega.lzx
(not in the zone because files don't live long enough there)

Included is the library doc. Unfortunately I don't have the lvo's.

More can surely be done. If someone already did something similar or is interested, you now know where to help...
Old 10 December 2007, 11:13   #2
StrategyGamer
Total Chaos AGA is fun!
 
Join Date: Jun 2005
Location: USA
Posts: 873
I had wanted to optimize it myself, timeless ages ago, but I could not find the source.

Thanks for doing it for me.
Old 12 December 2007, 00:37   #3
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,757
Quote:
Originally Posted by meynaf
You will find the current source (reassembler output, so mostly unreadable) here:
Unreadable indeed

But seriously: muls, muls, muls and more muls. When I saw that, I thought: hoo boy, this thing is no joke. I must admit I didn't read through all of the code, but that's a serious challenge you've gotten yourself into.

Optimizing this one isn't the same as the instruction juggling I've been doing for your HAM rendering engine, which is small and well commented (translators are far from perfect, but they do help with the French). This is a whole different cup of tea.

Quote:
Originally Posted by meynaf
I don't know whether it's better to rewrite it all or to start from its current code.
I started both, but the rewrite stopped for lack of understanding of Layer III (that is, lack of docs).
That's a pity. Are you sure you can't find what you need on the net? Might be worth taking a look...

This one will separate the men from the boys. One day I will be a man.
Thanks for sharing

Last edited by Thorham; 12 December 2007 at 00:47.
Old 12 December 2007, 10:07   #4
meynaf
Quote:
Originally Posted by Thorham View Post
Unreadable indeed

But seriously: muls, muls, muls and more muls. When I saw that, I thought: hoo boy, this thing is no joke. I must admit I didn't read through all of the code, but that's a serious challenge you've gotten yourself into.

Optimizing this one isn't the same as the instruction juggling I've been doing for your HAM rendering engine, which is small and well commented (translators are far from perfect, but they do help with the French). This is a whole different cup of tea.
Else that would just be too easy

Quote:
Originally Posted by Thorham View Post
That's a pity. Are you sure you can't find what you need on the net? Might be worth taking a look...
Already done! What I found was way too superficial. When it comes to actually doing something real with the data, you're left with the existing code. I looked into the mpg123 and libmad sources, but... er... errhm...

You can try to find something, too. Maybe you'll be luckier than I was.

Here is what I have done so far for my rewrite project:
. reading the file (ok, not too hard)
. parsing headers and side info (those are quite well documented)
. circular buffer handling for data
. reading the scale factors

But now the next step is to read huffman data. That is, do the work that III_huffdecode() in libmad's layer3.c does (note that I concentrate on layer3, else I wouldn't have any file to test). It's not just reading huffman codes, something is done on the fly with the data.
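
For reference, the header-parsing step from the list above can be sketched in C roughly as follows. This is not taken from meynaf's rewrite: the struct and function names are invented for illustration, only MPEG-1 Layer III is handled, and free-format streams are rejected. The bit layout and the 144 * bitrate / sample rate frame length rule are the usual ones for MPEG-1 Layer III.

```c
#include <stdint.h>

/* Fields of a 32-bit MPEG-1 audio frame header that a Layer III
 * decoder needs first. All names here are made up for this sketch. */
struct frame_info {
    int bitrate_kbps;   /* e.g. 128 */
    int sample_rate;    /* 44100, 48000 or 32000 */
    int padding;        /* 0 or 1 extra byte */
    int channels;       /* 1 or 2 */
    int frame_bytes;    /* total frame length, header included */
};

/* MPEG-1 Layer III bitrates in kbps (index 0 = free format, 15 = invalid). */
static const int l3_bitrate[16] = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, -1
};
static const int l3_rate[4] = { 44100, 48000, 32000, -1 };

/* Returns 0 on success, -1 if the word is not a usable
 * MPEG-1 Layer III header. */
int parse_header(uint32_t h, struct frame_info *fi)
{
    int br, sr;

    if ((h >> 21) != 0x7FF) return -1;        /* 11-bit sync word */
    if (((h >> 19) & 3) != 3) return -1;      /* version: 3 = MPEG-1 */
    if (((h >> 17) & 3) != 1) return -1;      /* layer: 1 = Layer III */

    br = l3_bitrate[(h >> 12) & 15];
    sr = l3_rate[(h >> 10) & 3];
    if (br <= 0 || sr <= 0) return -1;        /* free format or invalid */

    fi->bitrate_kbps = br;
    fi->sample_rate  = sr;
    fi->padding      = (int)((h >> 9) & 1);
    fi->channels     = (((h >> 6) & 3) == 3) ? 1 : 2;   /* mode 3 = mono */
    fi->frame_bytes  = 144 * br * 1000 / sr + fi->padding;
    return 0;
}
```

With the common header word 0xFFFB9000 (128 kbps, 44.1 kHz, stereo, no padding) this gives a frame length of 417 bytes. The side info and scale factors follow the header, and that is where the circular buffer comes in.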
Quote:
Originally Posted by Thorham View Post
This one will separate the men from the boys. One day I will be a man.
Thanks for sharing
No beginner's stuff, sure. It is not that an implementation in code is that hard, but I simply don't know what to do...
Old 12 December 2007, 16:15   #5
Thorham
Quote:
Originally Posted by meynaf
Else that would just be too easy
I agree. If it's too easy it's no fun

Quote:
Originally Posted by meynaf
Already done! What I found was way too superficial. When it comes to actually doing something real with the data, you're left with the existing code. I looked into the mpg123 and libmad sources, but... er... errhm...

You can try to find something, too. Maybe you'll be luckier than I was.
Right, I'll see if I can find anything useful. You'd think in this day and age the net is simply filled with the info one needs... But I know, sometimes it's really hard to find what you need, while at other times it's too easy. Maybe I will have more luck.

Quote:
Originally Posted by meynaf
Here is what I have done so far for my rewrite project:
. reading the file (ok, not too hard)
. parsing headers and side info (those are quite well documented)
. circular buffer handling for data
. reading the scale factors

But now the next step is to read huffman data. That is, do the work that III_huffdecode() in libmad's layer3.c does (note that I concentrate on layer3, else I wouldn't have any file to test). It's not just reading huffman codes, something is done on the fly with the data.
That already seems like quite a lot of code; it would be a shame to abandon it just because you're lacking some docs.

Quote:
Originally Posted by meynaf
No beginner's stuff, sure. It is not that an implementation in code is that hard, but I simply don't know what to do...
Well, hopefully I can find some docs. I've been interested in how the MP3 format works for quite some time now, so this is the perfect opportunity to get to know more about it. I'll keep you posted on anything interesting or useful I find.
Old 12 December 2007, 16:53   #6
meynaf
Quote:
Originally Posted by Thorham View Post
Right, I'll see if I can find anything useful. You'd think in this day and age the net is simply filled with the info one needs... But I know, sometimes it's really hard to find what you need, while at other times it's too easy. Maybe I will have more luck.
Fingers crossed...

Quote:
Originally Posted by Thorham View Post
That already seems like quite a lot of code; it would be a shame to abandon it just because you're lacking some docs.
When I did it, I just wanted to see how far I could go without getting blocked. But shame on me now if I don't continue.

Quote:
Originally Posted by Thorham View Post
Well, hopefully I can find some docs. I've been interested in how the MP3 format works for quite some time now, so this is the perfect opportunity to get to know more about it. I'll keep you posted on anything interesting or useful I find.
Thanks in advance.

Globally, I know how it works, but I need much more than a distant view to write a player...
Old 12 December 2007, 19:19   #7
Thorham
Searching for docs and sources didn't yield a whole lot of results, but I think the following links will be interesting.

. MP3' Tech - MPEG source codes: has multiple MP3 decoder sources, amongst other things.
. MP3 - Hydrogenaudio Knowledgebase: looks like a nice page with some potentially interesting links, including one that describes huffman decoding.
. www.eecs.umich.edu/~accheng/doc/MP3Decoder_AllenCheng.pdf: seems to be an in-depth description of MP3 decoding.
. www.mp3-tech.org/programmer/docs/fpga_report.pdf: also a more in-depth description. It's a hardware implementation, but that shouldn't matter too much.

These are the best I could find without searching for hours, and although I'm sure they'll make for some interesting reading, I hope some of it is actually of some use to you. If not, I'm going to have to dig a 'little' deeper.
Old 13 December 2007, 10:13   #8
meynaf
These are interesting reading indeed, but still not precise enough to actually write a decoder. Existing code is the best doc I have found so far: I have the sources for mpeg3play, mpg123, and mad. They're not heavily commented...

Maybe you'll have to dig a little deeper
Old 13 December 2007, 10:24   #9
Thorham
Quote:
Originally Posted by meynaf
These are interesting reading indeed, but still not precise enough to actually write a decoder. Existing code is the best doc I have found so far: I have the sources for mpeg3play, mpg123, and mad. They're not heavily commented...
Have you taken a look at the source code of Amp (and others) on MP3' Tech? If you haven't I suggest you do.

Quote:
Originally Posted by meynaf
Maybe you'll have to dig a little deeper
I don't mind digging a little deeper
Old 13 December 2007, 11:11   #10
meynaf
Quote:
Originally Posted by Thorham View Post
Have you taken a look at the source code of Amp (and others) on MP3' Tech? If you haven't I suggest you do.
I already have the sources of Amp. Quite unreadable. I couldn't make them compile on the Amiga.

The most interesting ones are mpg123, because it's the most readable I've found (but very, very slow, as it uses floating point), and libmad (which is much faster, as it uses fixed point, that is, integer arithmetic).
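
To illustrate why fixed point suits an FPU-less 68030, here is a minimal sketch of a fixed-point multiply in C. The type and function names are hypothetical and this is not libmad's actual macro set. The choice of 28 fractional bits matches what libmad's mad_fixed_t uses, as far as I recall. The sketch also assumes that right-shifting a negative value is an arithmetic shift, which holds on the usual compilers.

```c
#include <stdint.h>

#define FRAC_BITS 28            /* fractional bits in a 32-bit word */

typedef int32_t fixed_t;        /* hypothetical name for this sketch */

/* Convert a double to fixed point. Meant for building tables ahead of
 * time, not for use inside the decoding loops. */
fixed_t fx_from_double(double x)
{
    return (fixed_t)(x * (double)(1L << FRAC_BITS));
}

/* Multiply two fixed-point values through a 64-bit intermediate.
 * On a 68020/030 this roughly corresponds to a muls.l producing a
 * 64-bit result, followed by shifts to realign the binary point. */
fixed_t fx_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) >> FRAC_BITS);
}
```

For example, fx_mul(fx_from_double(0.5), fx_from_double(0.5)) yields the same bit pattern as fx_from_double(0.25), with no floating-point instruction executed in the multiply itself.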

Quote:
Originally Posted by Thorham View Post
I don't mind digging a little deeper
If you don't find anything about MP3, then maybe you will find petroleum (if you dig deep enough).

What I need right now is either an optimization for mpega, or a detailed algorithm to decode layer3's particular huffman codes (just knowing how huffman works is not enough).
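
For what it's worth, the part of Layer III that goes beyond plain huffman decoding in the big_values region is the handling of escape bits and sign bits. A rough C sketch, assuming the table lookup has already produced the magnitudes x and y (0..15) and that linbits comes from the selected table: the bit reader and all function names are invented for this example, and the count1 region (which codes four values at a time) is not covered.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal MSB-first bit reader. Names are hypothetical. */
struct bitreader {
    const uint8_t *data;
    size_t bitpos;              /* absolute bit offset from data[0] */
};

static unsigned get_bits(struct bitreader *br, int n)
{
    unsigned v = 0;
    while (n-- > 0) {
        unsigned byte = br->data[br->bitpos >> 3];
        v = (v << 1) | ((byte >> (7 - (br->bitpos & 7))) & 1u);
        br->bitpos++;
    }
    return v;
}

/* Finish one value of a pair. 'v' is the magnitude delivered by the
 * huffman table lookup. A magnitude of 15 is an escape: 'linbits'
 * extra bits follow and are added to it. A sign bit follows only
 * when the final magnitude is nonzero. */
static int finish_value(struct bitreader *br, int v, int linbits)
{
    if (v == 15 && linbits > 0)
        v += (int)get_bits(br, linbits);
    if (v != 0 && get_bits(br, 1))
        v = -v;
    return v;
}

/* Bitstream order after the codeword:
 * linbits(x), sign(x), linbits(y), sign(y). */
void finish_pair(struct bitreader *br, int x, int y, int linbits, int out[2])
{
    out[0] = finish_value(br, x, linbits);
    out[1] = finish_value(br, y, linbits);
}
```

The integers that come out of this are the is[i] values that the decoder then has to scale, so this sketch covers only the bitstream side of the problem.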
Old 13 December 2007, 18:06   #11
Thorham
Quote:
Originally Posted by meynaf
I already have the sources of Amp. Quite unreadable. I couldn't make them compile on the Amiga.

The most interesting ones are mpg123, because it's the most readable I've found (but very, very slow, as it uses floating point), and libmad (which is much faster, as it uses fixed point, that is, integer arithmetic).
And even these don't make it obvious what must be done with that huffman coding? Well, that is quite a challenge then.

Quote:
Originally Posted by meynaf
If you don't find anything about MP3, then maybe you will find petroleum (if you dig deep enough).
That's more or less already what I find, except it's a little bit more sticky

Quote:
Originally Posted by meynaf
What I need right now is either an optimization for mpega, or a detailed algorithm to decode layer3's particular huffman codes (just knowing how huffman works is not enough).
At first I thought finding good docs on this couldn't be that hard, but it seems like all of them are in the silly math format. Those math people apparently like to write down just about everything in the form of mathematical formulas. I'm afraid this is going to be harder than I thought (but I will try).

Trying to understand everything from source code alone really should be a last resort, unless it explains all the details.
Old 13 December 2007, 18:07   #12
BippyM
Global Moderator
 
 
Join Date: Nov 2001
Location: Derby, UK
Age: 48
Posts: 9,355
You two guys should have your own section.

Meynaf & Thorham's asm coding and chat - There only needs to be 2 members.

Only joking guys... nice to see some asm action in here.
Old 13 December 2007, 18:28   #13
meynaf
Quote:
Originally Posted by Thorham View Post
And even these don't make it obvious what must be done with that huffman coding? Well, that is quite a challenge then.
It sure is not obvious.

Quote:
Originally Posted by Thorham View Post
That's more or less already what I find, except it's a little bit more sticky
If you can sell it...

Quote:
Originally Posted by Thorham View Post
At first I thought finding good docs on this couldn't be that hard, but it seems like all of them are in the silly math format. Those math people apparently like to write down just about everything in the form of mathematical formulas. I'm afraid this is going to be harder than I thought (but I will try).
When it's not maths, then it's code.
Which one do you prefer ?

Quote:
Originally Posted by Thorham View Post
Trying to understand everything from source code alone really should be a last resort, unless it explains all the details.
Oh, it sure gives all the details, but it doesn't explain anything.

Quote:
Originally Posted by bippym View Post
You two guys should have your own section.

Meynaf & Thorham's asm coding and chat - There only needs to be 2 members.

Only joking guys... nice to see some asm action in here.
I have nothing against a section
(nor anything against more people participating).
Old 14 December 2007, 16:24   #14
Thorham
Quote:
Originally Posted by meynaf
When it's not maths, then it's code.
Which one do you prefer ?
I will always prefer code.

Quote:
Originally Posted by meynaf
Oh, it sure gives all the details, but it doesn't explain anything.
That surely doesn't make it any easier. I wonder why it's so hard to find good and simple documentation. It doesn't make sense. I have to say I haven't found much that's any more interesting than what I already found, but I won't give up!
Old 14 December 2007, 16:47   #15
meynaf
Quote:
Originally Posted by Thorham View Post
I will always prefer code.
So what do you think about the existing code? Very easy to read and understand, eh? (Who said no?)

Quote:
Originally Posted by Thorham View Post
That surely doesn't make it any easier. I wonder why it's so hard to find good and simple documentation. It doesn't make sense. I have to say I haven't found much that's any more interesting than what I already found, but I won't give up!
I have found some links leading to documents you have to pay for.
I fear there is some sort of copyright blocking free docs.
Old 17 December 2007, 16:07   #16
Thorham
Quote:
Originally Posted by meynaf
So what do you think about the existing code? Very easy to read and understand, eh? (Who said no?)
That depends. Your HAM rendering engine was pretty easy to get to grips with, but then again it was also well documented. It's that mpega re-source that is really hard to understand: no remarks except the ones you write yourself, and not a clue about the workings of the code at all.

Quote:
Originally Posted by meynaf
I have found some links leading to documents you have to pay for.
I fear there is some sort of copyright blocking free docs.
I found a book about MP3 on the net. Of course, you have to pay to see all of it. It's not as easy as I thought to find understandable and yet complete documentation about MP3. For, say, 680x0 coding this is much easier, as you're just going to find the original Motorola docs! You sure picked one.
Old 21 December 2007, 11:16   #17
meynaf
Quote:
Originally Posted by Thorham View Post
I found a book about MP3 on the net. Of course, you have to pay to see all of it. It's not as easy as I thought to find understandable and yet complete documentation about MP3. For, say, 680x0 coding this is much easier, as you're just going to find the original Motorola docs! You sure picked one.
I don't remember how many docs I picked, but it sure was more than one.

Quote:
Originally Posted by Thorham View Post
That depends. Your HAM rendering engine was pretty easy to get to grips with, but then again it was also well documented. It's that mpega re-source that is really hard to understand: no remarks except the ones you write yourself, and not a clue about the workings of the code at all.
Not a clue, yes. The only thing I could find was how much cpu a routine used, by setting up color #0 (dff180) to something upon entry, then resetting it to black at the end. The more of the color you see, the more cpu the code takes.
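
The colour trick can be sketched in C like this. It is only an illustration: timed_call and example_routine are made-up names, and the register pointer is passed in as a parameter so the same code can run against an ordinary variable on a machine without custom chips. On a real Amiga the pointer would be COLOR00 at $DFF180, and the 68k version is just a move.w on entry and another on exit.

```c
#include <stdint.h>

/* On a real Amiga:
 *   volatile uint16_t *color00 = (volatile uint16_t *)0xDFF180;
 * (in asm: move.w #$0f00,$dff180 ... move.w #$0000,$dff180) */
void timed_call(volatile uint16_t *color00, uint16_t rgb, void (*routine)(void))
{
    *color00 = rgb;      /* background turns e.g. red ($0F00) on entry */
    routine();           /* the code being measured */
    *color00 = 0x0000;   /* back to black on exit */
}

/* Stand-in for the routine under test: sums 0..999 into 'sink'. */
static volatile unsigned long sink;

static void example_routine(void)
{
    int i;
    for (i = 0; i < 1000; i++)
        sink += (unsigned long)i;
}
```

Calling timed_call(color00, 0x0F00, example_routine) once per frame shows the routine's cost as the height of the coloured band on screen.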

If you can't find something in here, then I have something else that could be useful to accelerate : my 44.1 khz 14-bit rendering code.
Here :

Last edited by meynaf; 12 May 2011 at 08:32.
Old 22 December 2007, 23:30   #18
Thorham
Quote:
Originally Posted by meynaf
Not a clue, yes. The only thing I could find was how much cpu a routine used, by setting up color #0 (dff180) to something upon entry, then resetting it to black at the end. The more of the color you see, the more cpu the code takes.
I can't remember how often I've used that method, too

It's very tough to find simple explanations of MPEG decoding. However, I was able to work out that the huffman code seems to differ from normal huffman code in that it outputs variable length data (it has to do with the scaling factors, if I'm not mistaken) instead of fixed length data. This has led me to believe that the only thing that happens during the huffman decoding stage is scaling the data to some fixed length. If this is correct, then it should not be too hard to extract this from one of the source codes you have, and make your own routine based on it.

After searching the web for a while, it began to dawn on me that you have two choices: 1. Go through the trouble of learning how Layer III really works, including the math. 2. Go through the trouble of understanding someone else's source code. Personally I'm not a math guy, as you know, so I would definitely go for option two. Of course, you probably came to the same conclusion.
Quote:
Originally Posted by meynaf
If you can't find something in here, then I have something else that could be useful to accelerate : my 44.1 khz 14-bit rendering code.
I can certainly have a go at it. However, the sound output is just about the last thing that should be optimized, because of the low bandwidth requirements CD quality sound has: only about 176 KB per second in raw format. Although I really don't mind having a go at it (and could actually enjoy doing so), the CPU-intensive parts (read: the hard parts) are where the real profit is. But you knew that, didn't you?
Old 24 December 2007, 10:50   #19
meynaf
Quote:
Originally Posted by Thorham View Post
I can't remember how often I've used that method, too
And you can't do that on today's machines.

Quote:
Originally Posted by Thorham View Post
It's very tough to find simple explanations of MPEG decoding. However, I was able to work out that the huffman code seems to differ from normal huffman code in that it outputs variable length data (it has to do with the scaling factors, if I'm not mistaken) instead of fixed length data. This has led me to believe that the only thing that happens during the huffman decoding stage is scaling the data to some fixed length. If this is correct, then it should not be too hard to extract this from one of the source codes you have, and make your own routine based on it.
There are more computations than that : the code also performs on-the-fly requantization.
According to libmad's layer3.c :
Code:
 * The Layer III formula for requantization and scaling is defined by
 * section 2.4.3.4.7.1 of ISO/IEC 11172-3, as follows:
 *
 *   long blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210)) *
 *           2^-(scalefac_multiplier *
 *               (scalefac_l[sfb] + preflag * pretab[sfb]))
 *
 *   short blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210 - 8 * subblock_gain[w])) *
 *           2^-(scalefac_multiplier * scalefac_s[sfb][w])
 *
 *   where:
 *   scalefac_multiplier = (scalefac_scale + 1) / 2
Not simple, really

Quote:
Originally Posted by Thorham View Post
After searching the web for a while, it began to dawn on me that you have two choices: 1. Go through the trouble of learning how Layer III really works, including the math. 2. Go through the trouble of understanding someone else's source code. Personally I'm not a math guy, as you know, so I would definitely go for option two. Of course, you probably came to the same conclusion.
I did. Definitely option 2.

Quote:
Originally Posted by Thorham View Post
I can certainly have a go at it. However, the sound output is just about the last thing that should be optimized, because of the low bandwidth requirements CD quality sound has: only about 176 KB per second in raw format. Although I really don't mind having a go at it (and could actually enjoy doing so), the CPU-intensive parts (read: the hard parts) are where the real profit is. But you knew that, didn't you?
That part is to the decoder what the HAM rendering is to the JPEG decoding proper, so it's not useless to check.

Remember that we can't play that 16-bit 44.1 kHz data directly; we have to downsample it first and prepare it for 14-bit output. My code does this in 5:3 instead of the usual 2:1, leading to 26460 Hz instead of 22050 Hz (better quality). But, of course, this takes some time.
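
The two steps described here can be sketched in C as follows. This is not meynaf's routine, which is not shown in the thread. The sketch assumes plain linear interpolation for the 5:3 conversion (44100 * 3 / 5 = 26460) and the usual Amiga 14-bit trick of pairing a channel at volume 64 with one at volume 1. All names are invented, and right shifts of negative values are assumed to be arithmetic.

```c
#include <stdint.h>
#include <stddef.h>

/* 5:3 rate conversion by linear interpolation: every 5 input samples
 * yield 3 output samples. Output sample k sits at input position
 * k * 5/3. Returns the number of samples written to 'out'. */
size_t resample_5_to_3(const int16_t *in, size_t n_in, int16_t *out)
{
    size_t n_out = 0;
    size_t k;

    for (k = 0; ; k++) {
        size_t pos3 = k * 5;           /* position in thirds of a sample */
        size_t i = pos3 / 3;
        int frac = (int)(pos3 % 3);    /* 0, 1 or 2 thirds past in[i] */

        if (i + 1 >= n_in)             /* need both in[i] and in[i + 1] */
            break;
        out[n_out++] = (int16_t)(((3 - frac) * (int32_t)in[i]
                                  + frac * (int32_t)in[i + 1]) / 3);
    }
    return n_out;
}

/* Split a signed 16-bit sample for 14-bit playback: the top 8 bits go
 * to a channel at volume 64, the next 6 bits to a channel at volume 1,
 * so hi * 64 + lo equals the sample shifted right by two bits. */
void split_14bit(int16_t s, int8_t *hi, int8_t *lo)
{
    *hi = (int8_t)(s >> 8);            /* signed high byte */
    *lo = (int8_t)((s & 0xFF) >> 2);   /* 6-bit remainder, 0..63 */
}
```

The division by 3 in the interpolation is exactly the kind of divide that hurts on a 68030, so a tuned version would presumably replace it with a multiply by a reciprocal or a lookup table.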

When I use mpega I'm often at 95% cpu use (when there aren't gaps in the replay !), so it's worth removing whatever we can.

This code must write to chip memory, and there are nasty divides in it. You surely know these things aren't fast.
Old 24 December 2007, 15:56   #20
Thorham
Quote:
Originally Posted by meynaf
And you can't do that on today's machines.
Maybe you can with palette-based screen modes.
Quote:
Originally Posted by meynaf
There are more computations than that : the code also performs on-the-fly requantization.
According to libmad's layer3.c :
Code:
 * The Layer III formula for requantization and scaling is defined by
 * section 2.4.3.4.7.1 of ISO/IEC 11172-3, as follows:
 *
 *   long blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210)) *
 *           2^-(scalefac_multiplier *
 *               (scalefac_l[sfb] + preflag * pretab[sfb]))
 *
 *   short blocks:
 *   xr[i] = sign(is[i]) * abs(is[i])^(4/3) *
 *           2^((1/4) * (global_gain - 210 - 8 * subblock_gain[w])) *
 *           2^-(scalefac_multiplier * scalefac_s[sfb][w])
 *
 *   where:
 *   scalefac_multiplier = (scalefac_scale + 1) / 2
Not simple, really
So this does both in one go, eh? Doesn't that still mean the huffman decoding simply has to be written to output the variable length data, after which the scaling and requantization are handled? Maybe I just don't understand enough of it yet.
Quote:
Originally Posted by meynaf
That part is to the decoder what the HAM rendering is to the JPEG decoding proper, so it's not useless to check.

Remember that we can't play that 16-bit 44.1 kHz data directly; we have to downsample it first and prepare it for 14-bit output. My code does this in 5:3 instead of the usual 2:1, leading to 26460 Hz instead of 22050 Hz (better quality). But, of course, this takes some time.

When I use mpega I'm often at 95% cpu use (when there aren't gaps in the replay !), so it's worth removing whatever we can.

This code must write to chip memory, and there are nasty divides in it. You surely know these things aren't fast.
95% is pretty steep. I suppose optimizing the 14-bit routine really should be done then, although I still believe most of the gain will come from finding optimizations in the really heavy parts of the code.

It's a big shame the audio DMA rate can only be doubled by doubling the screen scan rate; otherwise the downsampling wouldn't be needed and one could just chop off two bits, which would be faster and sound better.
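
To put rough numbers on that limit: Paula plays one sample every "period" clock ticks, and with DMA at normal 15 kHz scan rates the period cannot go much below about 124. The sketch below assumes the commonly quoted PAL clock of 3546895 Hz and that minimum period of 124. Both constants and the function names are assumptions of this example, so check the hardware reference before relying on them.

```c
#define PAULA_CLOCK_PAL   3546895L   /* Hz, assumed PAL audio clock */
#define PAULA_MIN_PERIOD  124L       /* assumed DMA limit at 15 kHz scan */

/* Nearest period for a wanted sample rate, clamped to the DMA limit. */
long paula_period(long clock_hz, long rate_hz)
{
    long p = (clock_hz + rate_hz / 2) / rate_hz;
    return p < PAULA_MIN_PERIOD ? PAULA_MIN_PERIOD : p;
}

/* Playback rate actually obtained for a given period (rounded down). */
long paula_rate(long clock_hz, long period)
{
    return clock_hz / period;
}
```

Under these assumptions the ceiling is about 28.6 kHz, a request for 44100 Hz gets clamped to that, and the 5:3 rate of 26460 Hz maps to period 134 (about 26469 Hz), which fits.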

By the way, have you ever thought of a 15-bit routine by any chance? I know I should probably not be bringing this up (it will slow things down), but I just couldn't resist.