English Amiga Board


Old 16 November 2007, 15:27   #21
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Doc Mindie View Post
Sorry, I saw a maths problem where there probably wasn't one...
No problemo. You know, I can convert 50,000,000 images per second

Ok... more seriously, the 12 clock cycles are what my "equilibration" method adds, not the time taken for the whole conversion (which is more like 200-300).
Old 18 November 2007, 04:15   #22
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Thanks

I've been thinking about the stupid scaling problem for the 3x1 frame buffer, and I thought: What if one would simply take the average of three pixels? I sat down in front of my PeeCee and coded it up in FreeBasic (great for testing stuff) and lo and behold, it actually worked! The quality is pretty good.

I then did a version in assembler on my 50 MHz 68030, and the completely unoptimized version scaled down an 800x600 picture in one second.

Here is the code (this is not the full loop, it only scales three pixels into one):

;
;Scaling idea for reducing x axis to 33% while preserving quality
;
move.l #0,d3 ;Component input
move.l #data,a0 ;RGB data

;Start of the loop
move.l #0,d0 ;Red
move.l #0,d1 ;Green
move.l #0,d2 ;Blue

move.b (a0)+,d3 ;Pixel 1
add.l d3,d0
move.b (a0)+,d3
add.l d3,d1
move.b (a0)+,d3
add.l d3,d2

move.b (a0)+,d3 ;Pixel 2
add.l d3,d0
move.b (a0)+,d3
add.l d3,d1
move.b (a0)+,d3
add.l d3,d2

move.b (a0)+,d3 ;Pixel 3
add.l d3,d0
move.b (a0)+,d3
add.l d3,d1
move.b (a0)+,d3
add.l d3,d2

divu.w #3,d0
divu.w #3,d1
divu.w #3,d2

and.l #255,d0
and.l #255,d1
and.l #255,d2
;
;At this point, d0,d1 and d2 contain the scaled down rgb data.
;

As you can see, this is ridiculously simple code.
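For readers who don't speak 68k, here is a rough Python sketch of the same idea (an illustration, not the posted code). It assumes packed 24-bit RGB bytes like the assembler routine; the function name and the `offset` parameter are made up for this example.

```python
def average_three_pixels(data, offset=0):
    """Average three consecutive packed RGB pixels (9 bytes) into one pixel."""
    red = green = blue = 0
    for pixel in range(3):
        base = offset + pixel * 3
        red += data[base]        # same role as d0 in the assembler version
        green += data[base + 1]  # d1
        blue += data[base + 2]   # d2
    # Integer division truncates, like keeping only the quotient of divu.w #3
    return red // 3, green // 3, blue // 3


# One pure red, one pure green and one pure blue pixel
pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255])
print(average_three_pixels(pixels))  # (85, 85, 85)
```

Each component sum is at most 3 * 255 = 765, so it fits comfortably in a 16-bit word before the division.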

Smack/Infect's JPG2HAM8 actually seems to do the same thing, as it scales down the image if you don't have enough chip mem for the super hires screen (seeing this was pretty fast gave me the idea of trying the simple average method).

Edited: I actually forgot you have to scale down the y axis, too. This time to 50%. Speed wise this makes no difference at all, as the 33% part and the 50% part can be done in one go (thank goodness).

I should have taken this into account, but I totally forgot that the FreeBasic program runs in 1280x1024, not 1280x512

I have actually made a crude version in assembler which now does all this, simply to verify I got it right this time (it just outputs a bmp file).

Last edited by Thorham; 18 November 2007 at 13:09. Reason: I forgot something...
Old 19 November 2007, 10:54   #23
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Taking the average of three pixels involves three divisions, which is not what I would call fast (at least on a 030; on a 060 it's no big deal).

Ok, so you asked for it, you'll get it. See ham8+viewer.zip in the zone.
Sources for my viewer and ham8-converter are here. Along with my library.
The "v" executable is the viewer in itself (iffs and gifs), the "vj" is the same but with experimental jpeg support (note the *much* bigger size !)
The directory ham8-optim-tests contains a rudimentary pbmplus/pnm viewer, to test the rendering method (output of djpeg is ok for it).
I hope I didn't forget a thing !

Some benchmarks on 030/50 (for ham8 conversion, including c2p) :
1024*768 -> 234 frames (@50 hz)
500*333 -> 53 frames
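To put those frame counts into more familiar units, a small Python sketch follows. It only assumes that one frame is 1/50 s, as the "@50 hz" note says; the function names are invented for this example. 234 frames works out to 4.68 s and 53 frames to 1.06 s.

```python
FRAME_RATE_HZ = 50  # from the "@50 hz" note in the benchmark


def frames_to_seconds(frames, hz=FRAME_RATE_HZ):
    """Convert a duration counted in video frames to seconds."""
    return frames / hz


def pixels_per_second(width, height, frames, hz=FRAME_RATE_HZ):
    """Rough throughput of the whole conversion, in source pixels per second."""
    return width * height / frames_to_seconds(frames, hz)


for width, height, frames in [(1024, 768, 234), (500, 333, 53)]:
    seconds = frames_to_seconds(frames)
    rate = pixels_per_second(width, height, frames)
    print(f"{width}*{height}: {seconds:.2f} s, about {rate:,.0f} pixels/s")
```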

About the scrolling :
When scrolling an intuition screen, you won't get a left overscan (at least on OS3.0), so the leftmost pixel is always visible !
This sort of defeats the method of changing the three left pixels...

I checked Smack/Infect!'s code. Good rendering, even if there are some visible artifacts. And not fast enough for my taste
Then again, zooming a jpeg image is NOT a good idea.
Old 19 November 2007, 11:57   #24
StrategyGamer
Total Chaos AGA is fun!
 
Join Date: Jun 2005
Location: USA
Posts: 873
Do you have a high-quality, slow version coded for absolute best picture quality?
Old 19 November 2007, 12:16   #25
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by StrategyGamer View Post
Do you have a high-quality, slow version coded for absolute best picture quality?
This version is a high-quality one, just try it.

I once tried to apply some smoothing (error diffusion) but ended up with worse quality and dropped the idea. Also, I saw no difference against programs that adapt the palette to the image.
If you know a program which renders better than the one I gave, then please let me know. Same if you have a better algorithm !
Old 20 November 2007, 09:15   #26
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Thanks for the source codes. Always nice to see how other people do things.

The jpeg viewer's speed is very good, and so is the quality. Good job

If you want to speed this up any further then maybe you could use the blitter to assist in the c2p routine (I didn't see any blitter code in your source, hope I didn't miss it). However, it seems the real bottlenecks are the jpeg decoder and the ham rendering.

Ultimately, it's going to be quite difficult to get the viewer much faster than this.

One last word about the scaling: The three divisions apply to the 24bit output pixel. If all the pixels needed can be averaged in one go (as in 33%x50%), then they collectively only need three divs.
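A minimal Python sketch of what "in one go" could look like: each output pixel is the average of a 3x2 block of source pixels, so there is still only one division per colour component per output pixel (by 6 instead of by 3). The image layout (a list of rows of `(r, g, b)` tuples) and the function name are assumptions made for this example.

```python
def downscale_3x2(image):
    """Shrink to 1/3 width and 1/2 height by averaging 3x2 pixel blocks.

    image is a list of rows; each row is a list of (r, g, b) tuples.
    Leftover columns or rows that do not fill a whole block are ignored.
    """
    out = []
    for y in range(0, len(image) - 1, 2):
        out_row = []
        for x in range(0, len(image[y]) - 2, 3):
            sums = [0, 0, 0]
            for row in (image[y], image[y + 1]):
                for pixel in row[x:x + 3]:
                    for c in range(3):
                        sums[c] += pixel[c]
            # Six source pixels, but only three divisions per output pixel
            out_row.append(tuple(s // 6 for s in sums))
        out.append(out_row)
    return out


source = [[(6, 12, 18)] * 3, [(0, 0, 0)] * 3]
print(downscale_3x2(source))  # [[(3, 6, 9)]]
```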

Anyway, your best bet might still be in optimizing the ham rendering and the jpeg decoder. I'll take a good look at the ham8 test program to see if I can find anything today. As for the jpeg decoder, the source for vj is missing!

If you're interested in blitter optimizing, I'll upload an article about it. One note about that: it's a full explanation of c2p, which you don't need. I only mentioned it because of the blitter optimization (I also have a bunch of c2p source codes for different processors, probably not very useful).

Let me know if you want the articles and sources, and please upload the source code for vj!
Old 20 November 2007, 10:30   #27
alexh
Thalion Webshrine
 
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,332
I am amazed to learn that AGA Sliced-HAM (a regular 256-colour AGA screen with dynamic palette) isn't better than regular HAM8.

On a 320 pixel wide screen AGA Sliced-HAM almost gives you true colour as you can have 256 colours per line and no fringing.
Old 20 November 2007, 10:55   #28
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Thanks for the source codes. Always nice to see how other people do things.

The jpeg viewer's speed is very good, and so is the quality. Good job
Thanks.

But try FastJPEG from Christoph Feck :
http://aminet.net/package/gfx/show/FastJPEG_1.10
and you'll see what I want to get at the end.

Quote:
Originally Posted by Thorham View Post
If you want to speed this up any further then maybe you could use the blitter to assist in the c2p routine (I didn't see any blitter code in your source, hope I didn't miss it). However, it seems the real bottlenecks are the jpeg decoder and the ham rendering.
You didn't see any blitter code, and hopefully because there is none.
If only we had an updated 32-bit blitter...

The interest of a blitter c2p is that it frees the cpu while it works, but in fact it is slower.
And the c2p isn't much of the overall time, say, something like 2%.
Even if it could gain 20% speed (I have serious doubts about this), that would be 20% of 2%. Not worth the trouble.
And it can't be made twice as fast, as that would be more than the speed of a bare copymem.
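The "20% of 2%" argument can be written out as a tiny Python sketch. The 2% share and the 20% gain are the rough estimates from this post, not measurements, and the function name is made up for this example.

```python
def overall_time_saved(fraction_of_total, local_saving):
    """Fraction of total run time saved when one part gets faster.

    fraction_of_total: share of run time spent in that part (0.02 for c2p here)
    local_saving: share of that part's own time that is saved (0.20 here)
    """
    return fraction_of_total * local_saving


saved = overall_time_saved(0.02, 0.20)
print(f"overall saving: {saved:.1%}")  # overall saving: 0.4%

# Upper bound: even a c2p that took no time at all would only save 2% overall
print(f"best case: {overall_time_saved(0.02, 1.0):.1%}")  # best case: 2.0%
```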

The main bottleneck is undoubtedly the jpeg decoding, and always will be.
Then it's the ham rendering, which can be used for other formats as well (e.g. bmp, if one day I decide to add it) and then become the speed-critical part.

Quote:
Originally Posted by Thorham View Post
Ultimately, it's going to be quite difficult to get the viewer much faster than this.
For iffs and gifs, this is right. However, for jpeg there is still room for optimization.

Quote:
Originally Posted by Thorham View Post
One last word about the scaling: The three divisions apply to the 24bit output pixel. If all the pixels needed can be averaged in one go (as in 33%x50%), then they collectively only need three divs.
That's still too many divs for me, especially if they can't be put right after a write to chipmem.
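One common way to avoid the divisions altogether (not something proposed in this thread, just a well-known trick) is to multiply by a scaled reciprocal and shift. A Python sketch, assuming component sums of at most 3 * 255 = 765 (three pixels) or 6 * 255 = 1530 (six pixels); the constant 43691 is ceil(2^17 / 3) and the function names are invented for this example. Whether a multiply plus shift actually beats `divu` on a given CPU would have to be measured.

```python
RECIPROCAL_OF_3 = 43691  # ceil(2**17 / 3), still fits in an unsigned 16-bit word


def div3(total):
    """total // 3 without a division, exact for every sum of three bytes."""
    return (total * RECIPROCAL_OF_3) >> 17


def div6(total):
    """total // 6 without a division, exact for every sum of six bytes."""
    return (total * RECIPROCAL_OF_3) >> 18


# Exhaustive check over every sum that can actually occur
assert all(div3(x) == x // 3 for x in range(3 * 255 + 1))
assert all(div6(x) == x // 6 for x in range(6 * 255 + 1))
print(div3(765), div6(1530))  # 255 255
```

The largest product, 1530 * 43691, stays well below 2^32, so a 16x16 to 32-bit multiply is enough.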

Quote:
Originally Posted by Thorham View Post
Anyway, your best bet might still be in optimizing the ham rendering and the jpeg decoder. I'll take a good look at the ham8 test program to see if I can find anything today. As for the jpeg decoder, the source for vj is missing!
I wanted to do : first the ham rendering (hence this thread), then the jpeg decoder, which will be much more work.

There is no single source for vj. There are instead a bunch of sources, as it's a (somewhat hacked) jpeg library v6. So that's C code. And a lot of it.
Yes, I didn't include the sources for that, as they often no longer compile when I work on them, and rebuilding the thing is very tricky because of the link with ASM.

I dunno which part of that C code can be rewritten in asm first, but rewriting the whole thing is something I intended to do.

Quote:
Originally Posted by Thorham View Post
If you're interested in blitter optimizing, I'll upload an article about it. One note about that: it's a full explanation of c2p, which you don't need. I only mentioned it because of the blitter optimization (I also have a bunch of c2p source codes for different processors, probably not very useful).
Not useful for that particular case, yes, but still very interesting.

Quote:
Originally Posted by Thorham View Post
Let me know if you want the articles and sources, and please upload the source code for vj!
Yes, I'd be happy if you gave me both the article and the bunch of source code - just for the sake of curiosity.

For the jpeg part, well, get the IJG jpeg library v6 and you'll get most of vj's sources.
I can upload the sources if you insist (not before this week-end). They're made to work with phxass and hisoft c++. Other compilers and/or settings may or may not work.
There is no C startup/cleanup code as everything is handled by my asm library (which does all the resource tracking). The C parts are called from asm, not the other way, so the C should not use any address register for globals - in fact it mustn't even make a single OS call !
Old 20 November 2007, 11:18   #29
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by alexh View Post
I am amazed to learn that AGA Sliced-HAM (a regular 256-colour AGA screen with dynamic palette) isn't better than regular HAM8.

On a 320 pixel wide screen AGA Sliced-HAM almost gives you true colour as you can have 256 colours per line and no fringing.
Alas, you can't have 256 colours per line, because the copper is much too slow for that...
It requires 4 lowres pixels or so to change a color palette entry if I am not mistaken. For 256 colors it's a minimum of 4*256=1024 pixels, much more than 320 + the borders.
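The arithmetic above, as a small Python sketch. The cost per palette change is the estimate from this post (hedged with "if I am not mistaken"), so it is kept as a parameter; the function names are made up for this example.

```python
def pixels_needed(color_changes, lowres_pixels_per_change):
    """Lowres pixel times the copper needs to rewrite that many palette entries."""
    return color_changes * lowres_pixels_per_change


def max_changes_per_line(line_pixels, lowres_pixels_per_change):
    """Palette entries that can be rewritten while line_pixels are displayed."""
    return line_pixels // lowres_pixels_per_change


# With the estimate of 4 lowres pixels per palette write:
print(pixels_needed(256, 4))         # 1024, far more than a 320-pixel line
print(max_changes_per_line(320, 4))  # 80 changes within the visible 320 pixels
```

Plugging in a higher cost per change only makes the budget tighter.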
Old 20 November 2007, 11:45   #30
alexh
Thalion Webshrine
 
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,332
Quote:
Originally Posted by meynaf View Post
Alas, you can't have 256 colours per line
No?

Quote:
because the copper is much too slow for that... It requires 4 lowres pixels or so to change a color palette entry if I am not mistaken.
Not sure.

Quote:
For 256 colors it's a minimum of 4*256=1024 pixels, much more than 320 + the borders.
Yeah, I see what you mean.
Old 20 November 2007, 11:47   #31
alexh
Thalion Webshrine
 
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,332
I tried to look up what the maximum colours per line there were in the original Sliced-HAM. I think that SHAM modified the standard HAM6 base palette of 16 entries every line in LoRes, but I needed confirmation.

Why is there so much re-iterated GARBAGE on the internet? Google for SHM, S-HAM and Sliced HAM and you get hundreds of pages quoting:

Quote:
SHAM (Sliced HAM), in which each raster line of the image could have its own 16-bit palette
Now forgive me if I am wrong but OCS/ECS had a 32 entry, 12-bit palette and AGA was 256 entry 24-bit palette! Where did this 16-bit palette come from?

Last edited by alexh; 20 November 2007 at 11:54.
Old 20 November 2007, 12:09   #32
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by alexh View Post
I tried to look up what the maximum colours per line there were in the original Sliced-HAM. I think that SHAM modified the standard HAM6 base palette of 16 entries every line in LoRes, but I needed confirmation.

Why is there so much re-iterated GARBAGE on the internet? Google for SHM, S-HAM and Sliced HAM and you get hundreds of pages quoting:
What you've found on the net isn't garbage, it is true... for OCS/ECS. It simply doesn't apply for AGA.

Quote:
Originally Posted by alexh View Post
Now forgive me if I am wrong but OCS/ECS had a 32 entry, 12-bit palette and AGA was 256 entry 24-bit palette! Where did this 16-bit palette come from?
You're not wrong. The 16-bit palette comes from the fact that you have 16-bit hardware registers, not 12-bit. So a 16-bit palette is in fact a 12-bit palette.
Old 20 November 2007, 13:50   #33
alexh
Thalion Webshrine
 
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,332
Quote:
Originally Posted by meynaf View Post
What you've found on the net isn't garbage, it is true...
Surely it is wrong? No Amiga had a 16-bit palette, so the statement "each raster line of the image could have its own 16-bit palette" is not true?

Quote:
The 16-bit palette comes from the fact that you have 16-bit hardware registers, not 12-bit. So a 16-bit palette is in fact a 12-bit palette.
Eh?
Old 20 November 2007, 14:04   #34
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by alexh View Post
Surely it is wrong? No Amiga had a 16-bit palette, so the statement "each raster line of the image could have its own 16-bit palette" is not true?


Eh?
The statement is true if you consider the amount of data moved, but false if you look at the actual color components.

So, a 16-bit color here is in fact a 12-bit one :
0000 rrrr gggg bbbb (4 unused, 4 red, 4 green, 4 blue)

And not a 16-bit one like on a PC :
rrrr rggg gggb bbbb (0 unused, 5 red, 6 green, 5 blue)
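The two layouts can be shown as a short Python sketch (purely illustrative; the function names are made up). The first packs 4-bit components into the low 12 bits of a 16-bit word, the second is the 5-6-5 layout.

```python
def pack_amiga_rgb4(r, g, b):
    """16-bit word laid out as 0000 rrrr gggg bbbb (4 bits per component)."""
    return ((r & 0xF) << 8) | ((g & 0xF) << 4) | (b & 0xF)


def pack_rgb565(r, g, b):
    """16-bit word laid out as rrrr rggg gggb bbbb (5, 6 and 5 bits)."""
    return ((r & 0x1F) << 11) | ((g & 0x3F) << 5) | (b & 0x1F)


# Brightest white in each format
print(f"{pack_amiga_rgb4(15, 15, 15):016b}")  # 0000111111111111
print(f"{pack_rgb565(31, 63, 31):016b}")      # 1111111111111111
```

Both values travel as 16 bits, but only 12 of them carry colour information in the first case.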

I hope this is clear enough.
... what ? who said "no" ?
Old 20 November 2007, 14:15   #35
StrategyGamer
Total Chaos AGA is fun!
 
Join Date: Jun 2005
Location: USA
Posts: 873
A500 has a 12-bit palette.
A1200 has a 24-bit palette.
Old 20 November 2007, 14:30   #36
alexh
Thalion Webshrine
 
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,332
Quote:
Originally Posted by meynaf View Post
So, a 16-bit color here is in fact a 12-bit one :
0000 rrrr gggg bbbb (4 unused, 4 red, 4 green, 4 blue)
Ah! I see what you mean.

To me, regardless of the bus width, if a register only uses the 12 LSB's, then it is a 12-bit register.

I think the rest of the world also thinks like this

Last edited by alexh; 20 November 2007 at 14:37.
Old 20 November 2007, 14:41   #37
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
I've uploaded the files to the zone (C2P.zip). I hope you find it interesting.

I've had a little bit of trouble trying to assemble your source code, until I used the right version of PhxAss. Took me way too many tries to figure it out! On the other hand, the two versions on Aminet support different directives (one handles basereg ok, the other doesn't). Also tried it in AsmOne 1.49, and, guess what, it didn't work. What's up with all the different assemblers trying to do the same thing in different ways? Seems crazy.

Anyway, now I can take a serious look at your code, and if I can't understand some of the French (should have paid more attention in French class 17 years ago), I'll just run it through a translator. These things can even handle Japanese! The fact that your code is a bit messy doesn't bother me at all, I just underestimated the French comments

You don't have to upload the jpeg part of vj, I've downloaded the ijg source. Now there is something to not look forward to getting your teeth into.

As for fastjpeg, what exactly does it do better? I didn't get it!
Old 20 November 2007, 14:57   #38
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Small side note about the copper speed: I think it's eight lowres pixels per register in color modes up to 16 colors on AGA and 8 colors on non-AGA. Use more colors on screen and the copper becomes even slower, except in the border. On non-AGA in 16-color hires it's even worse: you get to set sixteen colors per scan-line in the border, and for the visible part of the line the copper takes the whole line to set a register! I once tried to do it with the CPU (68030), and I could still only change somewhere between 32 and 64 colors per scan-line.
Old 20 November 2007, 15:06   #39
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by alexh View Post
Ah! I see what you mean.

To me, regardless of the bus width, if a register only uses the 12 LSB's, then it is a 12-bit register.

I think the rest of the world also thinks like this
I knew I didn't think like the rest of the world

But from this point of view it's a 13-bit register, as bit #15 is the transparency (for genlock video).
Old 20 November 2007, 15:37   #40
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
I've uploaded the files to the zone (C2P.zip). I hope you find it interesting.
Oh yes gimme more c2p

Quote:
Originally Posted by Thorham View Post
I've had a little bit of trouble trying to assemble your source code, until I used the right version of PhxAss. Took me way too many tries to figure it out! On the other hand, the two versions on Aminet support different directives (one handles basereg ok, the other doesn't). Also tried it in AsmOne 1.49, and, guess what, it didn't work. What's up with all the different assemblers trying to do the same thing in different ways? Seems crazy.
PhxAss tries to be compatible with the others, but I chose it mainly because it's able to optimize forward branches.

Quote:
Originally Posted by Thorham View Post
Anyway, now I can take a serious look at your code, and if I can't understand some of the French (should have paid more attention in French class 17 years ago), I'll just run it through a translator. These things can even handle Japanese! The fact that your code is a bit messy doesn't bother me at all, I just underestimated the French comments
For me it isn't messy at all. But it hasn't been done to be read by someone else
And I do not trust translators. You should really learn French

Quote:
Originally Posted by Thorham View Post
You don't have to upload the jpeg part of vj, I've downloaded the ijg source. Now there is something to not look forward to getting your teeth into.

As for fastjpeg, what exactly does it do better? I didn't get it!
FastJpeg has roughly the same quality, but it's faster.
Alternatively you can have a look at FastView. Even faster, but lower quality.
But AFAIK neither of them handles progressive jpegs.

Quote:
Small side note about the copper speed: I think it's eight lowres pixels per register in color modes up to 16 colors on AGA and 8 colors on non-AGA. Use more colors on screen and the copper becomes even slower, except in the border. On non-AGA in 16-color hires it's even worse: you get to set sixteen colors per scan-line in the border, and for the visible part of the line the copper takes the whole line to set a register! I once tried to do it with the CPU (68030), and I could still only change somewhere between 32 and 64 colors per scan-line.
Anyway it's too slow. <sigh>