English Amiga Board


Old 09 June 2016, 21:10   #221
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,847
Quote:
Originally Posted by meynaf View Post
Part of mpeg audio layer 3 (my accelerated mpega.library) ; huff quad decode (a part simple enough to be submitted here).
Why not just use and + table read? It's a 512 byte table. Doesn't seem excessive.
Old 09 June 2016, 21:18   #222
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Quote:
Originally Posted by Thorham View Post
Why not just use and + table read? It's a 512 byte table. Doesn't seem excessive.
I don't get it. Do you mean 4 table reads each giving 16 bits ? Or 2 table reads each giving 32 bits ?
What would it look like when turned into code ?
Old 09 June 2016, 22:22   #223
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,847
Quote:
Originally Posted by meynaf View Post
I don't get it. Do you mean 4 table reads each giving 16 bits ? Or 2 table reads each giving 32 bits ?
What would it look like when turned into code ?
I made a mistake there, but this should be nice and simple, and it's still only a 512 byte table:
Code:
    clr.w   d6
    move.b  d0,d6
    move.w  table(pc,d6.w*2),(a1)+
Old 09 June 2016, 22:42   #224
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
That's not enough. You did only one value this way and there are 4 to do.
Old 09 June 2016, 23:25   #225
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
Try a sequence of 4 "bfexts d1{x:2}, d6" + "move.w d6, (a1)+".
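For readers who do not have the bit-field semantics in their head, here is a minimal C sketch of what four such bfexts/move.w pairs appear to compute: each 2-bit field of the input byte, most significant pair first, is sign-extended to a 16-bit word. The function names `sext2` and `quad_decode_ref` are made up for this illustration, and the bit order is an assumption taken from the later remark in the thread that b7 and b6 produce the first word.

```c
#include <stdint.h>

/* Sign-extend a 2-bit field: 00 -> 0, 01 -> 1, 10 -> -2, 11 -> -1.
   This mirrors what a signed 2-bit bit-field extract returns. */
int16_t sext2(unsigned field)
{
    int f = (int)(field & 3u);
    return (int16_t)((f ^ 2) - 2);
}

/* Decode one packed byte into four signed 16-bit words.
   Assumption: bits 7-6 give the first word, bits 1-0 the last. */
void quad_decode_ref(uint8_t packed, int16_t out[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned shift = 6u - 2u * (unsigned)i;
        out[i] = sext2(((unsigned)packed >> shift) & 3u);
    }
}
```

With the input byte $1B (binary 00 01 10 11) this yields the words 0, 1, -2, -1.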
Old 09 June 2016, 23:59   #226
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,847
Quote:
Originally Posted by meynaf View Post
That's not enough. You did only one value this way and there are 4 to do.
No, it does four values in one go, hence the 512 byte table. There's no need to look up each bit pair individually, you can just do all four at once.
Old 10 June 2016, 00:56   #227
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,047
Thorham's idea is the best and fastest; the code could look like this:

moveq #31,D6
and.b D0,D6
move.l Table(PC,D6.W*4),(A1)+

Table will be 128x4.
Old 10 June 2016, 08:15   #228
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Quote:
Originally Posted by Leffmann View Post
Try a sequence of 4 "bfexts d1{x:2}, d6" + "move.w d6, (a1)+".
Assuredly the shortest solution
Unfortunately bfexts is slow (10 clocks on 030).


Quote:
Originally Posted by Thorham View Post
No, it does four values in one go, hence the 512 byte table. There's no need to look up each bit pair individually, you can just do all four at once.
All four at once give 64 bits of data (4 16-bit blocks). You can't read that in one go. The code you presented does only one 16-bit value.
To fetch the whole 64 bits (8 bytes) you need a table of 256*8=2048 bytes.

Please present the full code doing it, to prevent any misunderstanding.


Quote:
Originally Posted by Don_Adan View Post
Thorham's idea is the best and fastest; the code could look like this:

moveq #31,D6
and.b D0,D6
move.l Table(PC,D6.W*4),(A1)+

Table will be 128x4.
Full code please

This thread isn't for theoretical ideas, it's for submitting working code directly - if i'm not mistaken. So please submit full working code.

Last edited by meynaf; 10 June 2016 at 08:32. Reason: avoiding back-to-back posts
Old 10 June 2016, 08:56   #229
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
If you want the fastest and still keep it within 100 bytes, it should be two lookups from a 64-byte table:
Code:
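moveq    #15, d6                        ; d6 = $0000000F
and.b    d1, d6                         ; d6 = low nibble (bits 3-0)
move.l   (table, pc, d6.w*4), (4, a1)   ; words 3 and 4
move.b   d1, d6
lsr.b    #4, d6                         ; d6 = high nibble (bits 7-4)
move.l   (table, pc, d6.w*4), (a1)      ; words 1 and 2
addq.l   #8, a1                         ; skip the 4 words just written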
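The post does not list the table contents. As a rough sketch, assuming each 2-bit field is sign-extended to a word (which is what the bfexts variant in this thread produces), the 16-entry table and the two lookups could be modelled in C like this. The names `nibble_table`, `build_nibble_table` and `quad_decode_nibbles` are hypothetical.

```c
#include <stdint.h>

/* 16 entries x 2 words x 2 bytes = 64 bytes, one entry per nibble. */
int16_t nibble_table[16][2];

/* Sign-extend a 2-bit field: 00 -> 0, 01 -> 1, 10 -> -2, 11 -> -1. */
int sext2_field(int f)
{
    return ((f & 3) ^ 2) - 2;
}

void build_nibble_table(void)
{
    for (int n = 0; n < 16; n++) {
        nibble_table[n][0] = (int16_t)sext2_field(n >> 2); /* upper pair */
        nibble_table[n][1] = (int16_t)sext2_field(n);      /* lower pair */
    }
}

/* Two lookups per input byte: the high nibble gives words 1 and 2,
   the low nibble gives words 3 and 4. */
void quad_decode_nibbles(uint8_t packed, int16_t out[4])
{
    const int16_t *hi = nibble_table[packed >> 4];
    const int16_t *lo = nibble_table[packed & 15];
    out[0] = hi[0];
    out[1] = hi[1];
    out[2] = lo[0];
    out[3] = lo[1];
}
```

Call `build_nibble_table()` once before decoding. On the 68k side the same 32 words would simply be emitted as dc.w data.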
Old 10 June 2016, 09:11   #230
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Not bad. But compare it to :
Code:
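 rept 4
 add.b d1,d1        ; shift the pair's top bit into X
 subx.w d6,d6       ; d6 = -X (0 or -1)
 add.b d1,d1        ; shift the pair's low bit into X
 addx.w d6,d6       ; d6 = 2*d6 + X
 move.w d6,(a1)+    ; store the sign-extended value
 endr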
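To see why this table-free sequence works, the X-flag arithmetic can be traced in C. This is only a model of the five instructions inside the rept block, with the flag kept in a local variable; `quad_decode_flags` is a hypothetical name. It assumes the usual two's-complement conversion when a 16-bit pattern is stored as `int16_t`.

```c
#include <stdint.h>

/* Step-by-step model of the add.b / subx.w / add.b / addx.w sequence.
   x plays the role of the 68k X (extend) flag. */
void quad_decode_flags(uint8_t d1, int16_t out[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned x;
        uint16_t d6;

        x  = (d1 >> 7) & 1u;              /* add.b d1,d1: bit 7 -> X  */
        d1 = (uint8_t)(d1 << 1);
        d6 = (uint16_t)(0u - x);          /* subx.w d6,d6: d6-d6-X    */

        x  = (d1 >> 7) & 1u;              /* add.b d1,d1: next bit    */
        d1 = (uint8_t)(d1 << 1);
        d6 = (uint16_t)(d6 + d6 + x);     /* addx.w d6,d6: 2*d6 + X   */

        out[i] = (int16_t)d6;             /* move.w d6,(a1)+          */
    }
}
```

The first bit turns d6 into 0 or -1, and the second bit then gives 0 or 1 in the first case and -2 or -1 in the second, so each pair ends up sign-extended without any table.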
Old 10 June 2016, 11:23   #231
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,047
Quote:
Originally Posted by meynaf View Post
Assuredly the shortest solution
Unfortunately bfexts is slow (10 clocks on 030).



All four at once give 64 bits of data (4 16-bit blocks). You can't read that in one go. The code you presented does only one 16-bit value.
To fetch the whole 64 bits (8 bytes) you need a table of 256*8=2048 bytes.

Please present the full code doing it, to prevent any misunderstanding.



Full code please

This thread isn't for theoretical ideas, it's for submitting working code directly - if i'm not mistaken. So please submit full working code.
Okay, I forgot that it must be word output for every 2 bits of input. So if the output cannot be changed to byte size, like $00, $01, $xx, $FF, then the table must be 1024 bytes long.
The code could look like this:
Code:
moveq    #63,D6
and.b    D1, D6
move.l   Table(PC,D6.W*8),(A1)+
move.l   Table+4(PC,D6.W*8),(A1)+

Table
dc.w 0,0,0,0
dc.w 0,0,0,1
....
dc.w -1,-1,-1,-1
Old 10 June 2016, 11:32   #232
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
It doesn't work. You clear b7 and b6 so first word will be wrong.

This method can work but it needs 256*8=2048 bytes. This is overkill (actually even a 64-byte table is too big for my taste ; this code isn't that important).
Old 10 June 2016, 12:25   #233
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,047
Quote:
Originally Posted by meynaf View Post
It doesn't work. You clear b7 and b6 so first word will be wrong.

This method can work but it needs 256*8=2048 bytes. This is overkill (actually even a 64-byte table is too big for my taste ; this code isn't that important).
You're right, sorry. Only Leffmann's table version can be optimised a little. But you can benchmark both versions for speed. I don't know why you need word output rather than byte output for your code. You waste half of the destination buffer; a single ext.w when the value is read from the table would be better in my opinion.
Old 10 June 2016, 12:40   #234
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Quote:
Originally Posted by Don_Adan View Post
I don't know why you need word output rather than byte output for your code. You waste half of the destination buffer; a single ext.w when the value is read from the table would be better in my opinion.
Taken out of context it seems strange indeed.
But the huffquad data follows regular huff data (which is full 16-bit) in the same buffer, to be read by the same routines after that (first stereo handling, then imdct). The position at which quad data starts isn't constant : there may be a lot of it, or very few, or even none at all.


That said, the quickest method appears to be :
Code:
 moveq #0,d6
 move.b d1,d6
 move.l table(pc,d6.w*8),(a1)+
 move.l table+4(pc,d6.w*8),(a1)+
(with a 2048-byte table).

But... Did i say that i don't like tables, especially when they're put right in the middle of the code ?

Btw. if we're done with this one and/or ppl are ready i already have my next challenge idea...
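The 2048-byte table itself is not listed in the thread. A small generator along the following lines could produce it as dc.w lines, assuming each 2-bit field is sign-extended (00 -> 0, 01 -> 1, 10 -> -2, 11 -> -1) and that bits 7-6 give the first word. Whether the 10 pattern ever occurs in real data is not stated here, so its entry is simply what sign extension yields. `full_table`, `build_full_table` and `print_full_table` are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* 256 entries x 4 words x 2 bytes = 2048 bytes, one entry per byte. */
int16_t full_table[256][4];

void build_full_table(void)
{
    for (int b = 0; b < 256; b++) {
        for (int i = 0; i < 4; i++) {
            int f = (b >> (6 - 2 * i)) & 3;          /* i-th pair, MSB first */
            full_table[b][i] = (int16_t)((f ^ 2) - 2); /* sign-extend 2 bits */
        }
    }
}

/* Emit the table as assembler data, one entry per line. */
void print_full_table(FILE *out)
{
    for (int b = 0; b < 256; b++) {
        fprintf(out, " dc.w %d,%d,%d,%d\n",
                full_table[b][0], full_table[b][1],
                full_table[b][2], full_table[b][3]);
    }
}
```

Calling `build_full_table()` and then `print_full_table(stdout)` gives 256 lines starting with `dc.w 0,0,0,0` and `dc.w 0,0,0,1` and ending with `dc.w -1,-1,-1,-1`.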
Old 10 June 2016, 12:54   #235
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,847
Quote:
Originally Posted by meynaf View Post
All four at once give 64 bits of data (4 16-bit blocks). You can't read that in one go. The code you presented does only one 16-bit value.
To fetch the whole 64 bits (8 bytes) you need a table of 256*8=2048 bytes.
Yeah, I misread it. I thought the output cases were in binary
Old 10 June 2016, 13:02   #236
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Quote:
Originally Posted by Thorham View Post
Yeah, I misread it. I thought the output cases were in binary
I suppose i wasn't too clear either

Ready for the next one ?
Old 10 June 2016, 14:59   #237
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,047
Quote:
Originally Posted by meynaf View Post
Taken out of context it seems strange indeed.
But the huffquad data follows regular huff data (which is full 16-bit) in the same buffer, to be read by the same routines after that (first stereo handling, then imdct). The position at which quad data starts isn't constant : there may be a lot of it, or very few, or even none at all.


That said, the quickest method appears to be :
Code:
 moveq #0,d6
 move.b d1,d6
 move.l table(pc,d6.w*8),(a1)+
 move.l table+4(pc,d6.w*8),(a1)+
(with a 2048-byte table).

But... Did i say that i don't like tables, especially when they're put right in the middle of the code ?

Btw. if we're done with this one and/or ppl are ready i already have my next challenge idea...
You can use Leffmann's version with a 64-byte table. It is about 10 cycles slower than the 2048-byte table version, if I remember the 68030 timings correctly.
Code:
 moveq #0,d6
 move.b d1,d6
 ror.l #4,d6
 move.l table(pc,d6.w*4),(a1)+
 clr.w d6
 rol.l #4,d6
 move.l table(pc,d6.w*4),(a1)+
(with a 64-byte table).
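The rotate trick is easy to misread, so here is a small C trace of just the index computation, under the assumption that d6.w is used as the table index exactly as written. `ror32`, `rol32` and `rotate_indices` are names invented for this sketch.

```c
#include <stdint.h>

/* 32-bit rotates; n must be in 1..31 (only 4 is used here). */
uint32_t ror32(uint32_t v, unsigned n) { return (v >> n) | (v << (32u - n)); }
uint32_t rol32(uint32_t v, unsigned n) { return (v << n) | (v >> (32u - n)); }

/* Returns the two table indices the sequence produces for one byte. */
void rotate_indices(uint8_t d1, unsigned *first, unsigned *second)
{
    uint32_t d6 = d1;            /* moveq #0,d6 / move.b d1,d6          */
    d6 = ror32(d6, 4);           /* ror.l #4: low nibble to bits 31-28  */
    *first = d6 & 0xFFFFu;       /* d6.w is now the high nibble         */
    d6 &= 0xFFFF0000u;           /* clr.w d6: drop the high nibble      */
    d6 = rol32(d6, 4);           /* rol.l #4: low nibble to bits 3-0    */
    *second = d6 & 0xFFFFu;      /* d6.w is now the low nibble          */
}
```

For the byte $AB the first index is $A and the second is $B, so the high nibble is looked up first and both long words can be written with (a1)+.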

What is the next challenge? And what is the main prize?
Old 10 June 2016, 15:10   #238
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Quote:
Originally Posted by Don_Adan View Post
What is the next challenge? And what is the main prize?
No prize awarded, sorry

But i can explain my next challenge.

Not the same project, but still real life - and still used in some audio decoder.
This is middle-side stereo decode of flac.

We have :
c0 = longword data coming from (a0)+
c1 = longword data coming from (a1)+
l,r = left and right pcm samples - double word data to write to (a2)+

And the computation to get l,r from c0,c1 is (mid,side are temporaries) :
side = c1
mid = c0 *2 + (side & 1)
l = (mid+side) /2
r = (mid-side) /2

This is a lossless format and the data can just be truncated without any need to clamp.
However the end result must be exact.
(Anyone interested can read the original libflac C source ; the stuff comes from there.)

All regs can be used (personally i used d1,d2,d3). Again assume 030 timing.

I wonder if ppl will still try to use tables for this.
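As a reference to check any 68k submission against, the four formulas above can be written directly in C. This is a plain transcription, not the libflac source; `midside_decode` is a hypothetical name, and the inputs are assumed to stay within the 17-bit range mentioned in the thread so that nothing overflows.

```c
#include <stdint.h>

/* Mid/side to left/right, following the formulas in the post:
   side = c1, mid = c0*2 + (side & 1), l = (mid+side)/2, r = (mid-side)/2. */
void midside_decode(int32_t c0, int32_t c1, int32_t *l, int32_t *r)
{
    int32_t side = c1;
    int32_t mid  = c0 * 2 + (side & 1);  /* mid gets the same parity as side */

    /* mid and side have equal parity, so both sums are even and the
       division by 2 is exact, whatever the rounding mode. */
    *l = (mid + side) / 2;
    *r = (mid - side) / 2;
}
```

For example, c0 = 75 and c1 = 50 give l = 100 and r = 50, and c0 = 0 with c1 = -7 gives l = -3 and r = 4. Because the division is always exact, an arithmetic shift right by one gives the same result.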
Old 10 June 2016, 16:20   #239
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,847
Questions:

1. What's read from (a0) and (a1) exactly? Two 16 bit samples per long word, or one 32 bit sample? If it's two samples per long word, then what order are they in?
2. What's the order of the 2 x 16 bit output samples per longword?
Old 10 June 2016, 16:43   #240
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Quote:
Originally Posted by Thorham View Post
1. What's read from (a0) and (a1) exactly? Two 16 bit samples per long word, or one 32 bit sample? If it's two samples per long word, then what order are they in?
That's one 32 bit sample (up to 17 bits are actually used as it might be the sum of two 16-bit values).
As you have two sources you will end up with two 32 bit samples, of course.


Quote:
Originally Posted by Thorham View Post
2. What's the order of the 2 x 16 bit output samples per longword?
Samples are output left channel, then right - like in a wave file (actually more like in aiff as it's signed and big endian).
 

