English Amiga Board


Old 26 November 2007, 14:53   #61
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
I don't see the point of the div thing. You're supposed to average 3 pixels -> 1 to do 33%, not 9 pixels -> 1.
Old 26 November 2007, 16:56   #62
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
The nine pixel average is for scaling down both the x axis and y axis to 33% in one go (for 1600x1200 in hq ham). You are indeed right in saying you only need to average three pixels for 33% on one axis.

The code I uploaded scales down to 33%x50% (six pixels), so that's good for seeing this principle in action. It's done by the 'SetupBmp' subroutine.

And remember, the divs have nothing to do with the number of pixels, just with the gun colors, so it's always three divs or shifts per pixel set.
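To make the "three divs per pixel set" point concrete, here is a minimal C sketch (not the posted 68k code; the `Rgb` type and `average_block` name are made up for illustration). Each gun is summed over the whole pixel set and divided exactly once, so the division count stays at three whether the set holds six or nine pixels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pixel type: one byte per colour gun. */
typedef struct { uint8_t r, g, b; } Rgb;

/* Average `count` pixels into one. The loop only adds;
   the three divisions happen once per pixel set. */
static Rgb average_block(const Rgb *px, unsigned count)
{
    unsigned r = 0, g = 0, b = 0;

    assert(count > 0);
    for (unsigned i = 0; i < count; i++) {
        r += px[i].r;
        g += px[i].g;
        b += px[i].b;
    }

    Rgb out = {
        (uint8_t)(r / count),   /* div 1: red   */
        (uint8_t)(g / count),   /* div 2: green */
        (uint8_t)(b / count)    /* div 3: blue  */
    };
    return out;
}
```

For 33%x50% the caller would pass the six pixels of a 3x2 block, for 33%x33% the nine pixels of a 3x3 block.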
Old 26 November 2007, 17:31   #63
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Now that I know it was for 33%/33% scaling, things are much clearer !

But divides are basically slow. Exactly how many of the 60 frames (for an 800*600 image) are used by the downsampling ?
Old 26 November 2007, 18:47   #64
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Yes, I know! Divs are terrible at 44 cycles. And signed divs are even worse...

The down sampling takes 26 frames for 33%x50%, which isn't that bad. My c2p routine can be optimized, though; it's probably possible to strip off 30-40 instructions (first timer at the c2p business), so the 60 frames can go down further. Also, the c2p always converts 1280 x bmp height, so it can be optimized even more. In the end the down sampling is going to be a bit slower than the rendering.
Old 27 November 2007, 00:11   #65
Kalms
Registered User
 
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
You can replace a division by a constant with a multiplication by its reciprocal. The basic idea is: instead of dividing by 9, multiply by 65536/9 and then shift down 16 steps. For more info, search on terms like "division by constant" and "reciprocal multiplication".
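As a rough illustration of the reciprocal trick, here is a C sketch for sums of nine 8-bit samples. The names `RECIP9` and `div9_recip` are made up here, and the choice to round 65536/9 up to 7282 is an assumption of this sketch: with the rounded-down value 7281, an input of 9 would give 0 instead of 1.

```c
#include <assert.h>
#include <stdint.h>

/* Largest possible sum of nine 8-bit samples: 9 * 255. */
#define MAX_SUM9 2295u

/* 65536 / 9 rounded UP. 9 * 7282 = 65538, which is just above
   1 << 16, so exact multiples of 9 land on the right quotient. */
#define RECIP9 7282u

/* Divide by 9 with one 16x16->32 multiply and a 16-bit shift.
   On a 68k the shift would be a SWAP of the 32-bit product. */
static uint8_t div9_recip(uint16_t sum)
{
    assert(sum <= MAX_SUM9);
    return (uint8_t)(((uint32_t)sum * RECIP9) >> 16);
}
```

The rounding error introduced by 7282 stays below one ninth for every input up to 2295 (in fact up to 32767), so the result matches a truncating integer division across the whole range needed here.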

Also, given that you are (in this example) adding up to 9 terms in the range 0..255, the end result will be in the range 0..2295, so you could use a lookup table for the division-by-9.
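The lookup-table variant could be sketched in C like this (names such as `div9_table` and `average9` are invented for the example). One byte per possible sum gives 2296 entries for nine pixels; a six-pixel set would need 6 * 255 + 1 = 1531 entries.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SUM9 2295u               /* 9 pixels * 255 */

/* One byte per possible sum, indexed directly by the sum. */
static uint8_t div9_table[MAX_SUM9 + 1];

/* Fill the table once, before the scaling loop runs. */
static void init_div9_table(void)
{
    for (unsigned sum = 0; sum <= MAX_SUM9; sum++)
        div9_table[sum] = (uint8_t)(sum / 9);
}

/* Average one colour gun over nine pixels: adds plus one table read. */
static uint8_t average9(const uint8_t px[9])
{
    unsigned sum = 0;

    for (int i = 0; i < 9; i++)
        sum += px[i];
    assert(sum <= MAX_SUM9);
    return div9_table[sum];
}
```

The division only happens while building the table, so the inner loop pays for an indexed byte read instead of a divu.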
Old 27 November 2007, 10:16   #66
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Divs on a '060 are only 2 cycles

The multiply would lead you from 44 to 28 (in fact 32 because you'll have to swap).
A table look-up would cost 14 or so (and 1531 bytes for 33%x50%).

However if you can put a multiply right after a write to chipmem it'll only cost 5. Maybe you can use this to make the scaling+c2p in one pass ?

Also, if you consider using 25%x50% instead, shifting is 4 cycles.

When I'm at home this weekend I'll check if I can do a 50%x50% with my method.
Old 27 November 2007, 14:27   #67
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Cool, guys! Those are some mighty fine optimizations indeed.

Implementing 50%x50% for hq ham should be a cake walk for you, meynaf! Doing the c2p and scaling in one pass would be really great, but it's out of the question for my current code. I can only hope you have more luck than me with that one. However, even with two passes, 50%x50% is already going to be pretty fast.

Can't wait to see the finished jpeg viewer. So long, Visage!

Edited: Tried the table lookup for scaling 800x600 down to 33%x50%. The number of frames dropped from 60 to 50! Also, using a table means you can do a divide and a write to fast mem in one instruction. With divu, you first divide and then write to memory, so the table is effectively better than 14 cycles. Or is move.b (a0,d0.w),-(a1) slower than 14 cycles?

Last edited by Thorham; 27 November 2007 at 15:28. Reason: Update
Old 27 November 2007, 17:02   #68
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Errhm... for the "finished" jpeg viewer, you might have to wait a little. There is still the jpeg decoding part to rewrite in asm

Doing this down-scale for my high-quality rendering is a piece of cake, well maybe, but I have to remember the position of the next line (forcing me to swap a reg).

Doing move.b (a0,d0.w),-(a1) is probably slower than 14 cycles. So you might try to put register-only instructions after it, just to use the pipeline...

Last edited by meynaf; 27 November 2007 at 17:05. Reason: oops
Old 27 November 2007, 19:32   #69
Kalms
Registered User
 
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
Division on 060 is a bit slower than 2 cycles unfortunately (they are at 17 or so cycles). It's the multiplications that are very quick on that chip.

Anyway, you should pay some consideration to what sort of filter kernel you are using when you are downsampling your image.

Right now you are using a filter whose kernel is like this for a 33%/33% shrink:

1 1 1
1 1 1 / 9
1 1 1

The 3x3 numbers are weights (scale factors) for the corresponding pixel, and the "/ 9" is a global scale factor for the entire filter.
Or described in plain English: you take the sum of the current pixel and its 8 neighbours, divide by 9, and use that as the final (downsampled) color value.

For better visual results you should use a filter shape where
A) the kernel is larger than 3x3 pixels and
B) pixels further away from the origin have less weight than pixels close to the origin. Signal processing theory and transform theory describe how to construct a "good" filter kernel.

Since you have performance concerns, you might want to test with:

1 2 1
2 4 2 / 16
1 2 1

You can filter using this kernel with just shifts & adds.
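A shifts-and-adds version of this kernel might look like the following C sketch for a single colour gun (the function name `tent3x3` and the row-major sample layout are assumptions made for the example). Grouping samples that share a weight means each group needs only one shift.

```c
#include <assert.h>
#include <stdint.h>

/* Apply the 1 2 1 / 2 4 2 / 1 2 1 kernel to one colour gun.
   p holds the nine samples in row-major order:
     p[0] p[1] p[2]
     p[3] p[4] p[5]
     p[6] p[7] p[8] */
static uint8_t tent3x3(const uint8_t p[9])
{
    unsigned corners = p[0] + p[2] + p[6] + p[8];   /* weight 1 */
    unsigned edges   = p[1] + p[3] + p[5] + p[7];   /* weight 2 */
    unsigned centre  = p[4];                        /* weight 4 */

    unsigned sum = corners + (edges << 1) + (centre << 2);

    assert(sum <= 16u * 255u);     /* weights add up to 16 */
    return (uint8_t)(sum >> 4);    /* the "/ 16" is a 4-bit shift */
}
```

Because the weights sum to 16, a block of all-255 samples still maps to 255 and no division or table is needed.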
Old 27 November 2007, 20:54   #70
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by meynaf
Errhm... for the "finished" jpeg viewer, you might have to wait a little. There is still the jpeg decoding part to rewrite in asm

Doing this down-scale for my high-quality rendering is a piece of cake, well maybe, but I have to remember the position of the next line (forcing me to swap a reg).

Doing move.b (a0,d0.w),-(a1) is probably slower than 14 cycles. So you might try to put register-only instructions after it, just to use the pipeline...
Thanks for the advice, I'll check it out immediately
Gonna have another go at your ham rendering engine, but it's starting to look like it's impossible to get it any faster

Quote:
Originally Posted by Kalms
Division on 060 is a bit slower than 2 cycles unfortunately (they are at 17 or so cycles). It's the multiplications that are very quick on that chip.

Anyway, you should pay some consideration to what sort of filter kernel you are using when you are downsampling your image.

Right now you are using a filter whose kernel is like this for a 33%/33% shrink:

1 1 1
1 1 1 / 9
1 1 1

The 3x3 numbers are weights (scale factors) for the corresponding pixel, and the "/ 9" is a global scale factor for the entire filter.
Or described in plain English: you take the sum of the current pixel and its 8 neighbours, divide by 9, and use that as the final (downsampled) color value.

For better visual results you should use a filter shape where
A) the kernel is larger than 3x3 pixels and
B) pixels further away from the origin have less weight than pixels close to the origin. Signal processing theory and transform theory describe how to construct a "good" filter kernel.

Since you have performance concerns, you might want to test with:

1 2 1
2 4 2 / 16
1 2 1

You can filter using this kernel with just shifts & adds.
So, I guess for six pixels it would be something like this:

1 2 1
1 2 1 / 8

I don't want to sound ungrateful, but:

Simple averaging is already very good. I've tried your nine pixel idea, but the quality does not seem to improve (might just be some lameness on my part). Also, it's going to be slower than a divu table. I've tried this idea with six pixels (as above), which is one frame faster than a divu table (for a 1280x1024 bmp). The reason is that you only need six extra adds to multiply two of the pixels by two. For nine pixels, it's quite different: four pixels need multiplying by two, so that's 12 adds, and one needs multiplying by 4, that's three shifts. This is a lot more extra work than what's needed for six pixels.
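For reference, the six-pixel 1 2 1 / 1 2 1 / 8 variant described above could be sketched in C as follows, for one colour gun (the function name and the two-row argument layout are invented for the example, not taken from the posted code). The doubling is done with an extra add, and the final divide by 8 is a 3-bit shift.

```c
#include <assert.h>
#include <stdint.h>

/* Weighted average of a 3x2 block for one colour gun:
     top:    1 2 1
     bottom: 1 2 1   / 8 */
static uint8_t weighted3x2(const uint8_t top[3], const uint8_t bottom[3])
{
    unsigned outer  = top[0] + top[2] + bottom[0] + bottom[2]; /* weight 1 */
    unsigned middle = top[1] + bottom[1];                      /* weight 2 */

    /* Adding `middle` twice is the "multiply by two". */
    unsigned sum = outer + middle + middle;

    assert(sum <= 8u * 255u);      /* weights add up to 8 */
    return (uint8_t)(sum >> 3);    /* divide by 8 */
}
```

Summing the two middle samples before doubling them costs one extra add per gun in this form; whether that maps well onto the real 68k loop depends on how the pixels are fetched there.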

Sorry for that

Last edited by Thorham; 28 November 2007 at 07:43. Reason: Correction.
Old 27 November 2007, 21:12   #71
StrategyGamer
Total Chaos AGA is fun!
 
Join Date: Jun 2005
Location: USA
Posts: 873
If you are going for speed then just do
1 2 1 /4

That way you can get rid of that nasty divu
Old 27 November 2007, 21:22   #72
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
True, but for a 1280x1024 bmp scaled down to 33%x50% it only saves one frame compared to a divu table... The real optimizations can be done in the c2p department.

Last edited by Thorham; 28 November 2007 at 07:50.
Old 28 November 2007, 10:09   #73
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham
Gonna have another go at your ham rendering engine, but it's starting to look like it's impossible to get it any faster
If you're looking at it, then I'll give you the latest modifications I did.
I used your 32-bit palette idea, pushing it a little bit further.

You may want to take a look at the attached file. Dunno if it works or even assembles (I might have broken something).

The basic idea is 32-bit palette entries, of which the 4th byte is color# *4.
This allowed me to remove the temporary variable on the stack !
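A palette entry of this kind could be packed as in the C sketch below. The byte order is an assumption: R, G and B are placed in the upper three bytes and color# * 4 in the lowest byte, which is the 4th byte in memory on a big-endian 68k. The helper names are invented, and the attached source may lay things out differently.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one 32-bit palette entry (assumed layout, see above):
   bits 31..24 = red, 23..16 = green, 15..8 = blue,
   bits 7..0   = colour number * 4. */
static uint32_t pack_entry(uint8_t r, uint8_t g, uint8_t b, uint8_t colour)
{
    assert(colour < 64);   /* HAM8 has 64 base palette colours */
    return ((uint32_t)r << 24) |
           ((uint32_t)g << 16) |
           ((uint32_t)b << 8)  |
           ((uint32_t)colour << 2);
}

/* Fetch the pre-multiplied colour number from the low byte. */
static uint8_t entry_colour_times4(uint32_t entry)
{
    return (uint8_t)(entry & 0xFFu);
}
```

With everything in one longword, a single 32-bit load brings in both the RGB value and the pre-multiplied colour number, which fits the remark about no longer needing a temporary on the stack.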

I have put some repetitive code in macros to ease testing of different methods. That might have made the thing even more unreadable

To be faster if quality loss is acceptable, it is also possible to use a more regular palette, thus removing the need of a table to find the closest color.
You may activate it with the "quick" equ. Enjoy !

Last edited by meynaf; 12 May 2011 at 08:32.
Old 28 November 2007, 14:43   #74
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Thanks for uploading

I tried it immediately, and I think there's a bug. I'm replying right after finding out, but maybe I can track it down myself. It might be that I haven't set up the table properly; also, while trying things myself, I've seen exactly the same render bugs. It's either my table or something in the code.

It did assemble the first time I tried it. A while ago, you said you were going to try this at the weekend when you got home; does that mean you made the modifications without an amiga/uae? If so, then I can tell you the quick mode seems to be ok, and is indeed faster, about 34 frames I think. I have to admit that I'm using my amiga's composite output, which gives bad image quality, meaning I can't always see everything, but it seems ok! I'll try it on winuae today as well (I hate using uae for this, because the number of frames drops below realistic values, making it useless for speed testing).
Old 28 November 2007, 15:20   #75
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
The quality seems good for the quick method; however, text gets blurred and you see green pixels to the left of important color changes.

And, yes, you're right: I made the modifications without amiga/uae.
What does the bug look like?
Old 28 November 2007, 15:38   #76
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
In the test image I'm using, some parts which should be grayish, become a bit colored, and the ham fringing increases.

By the way, could you tell me what the two other data files are for? Currently I haven't even started to get into those parts, yet.
Old 28 November 2007, 16:07   #77
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
The symptoms make me think of a fixed-pixels error. To check this I would uncomment one of the two moveq,d0 to force fixed or relative pixels.
But I can't do this here at work.

The two data files are simply for building the palette table (which gives the closest palette entry for each color).
Old 29 November 2007, 00:52   #78
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
I've done some more optimizations, and got the number of frames down to 139.

Your code is fine, by the way. I made my own palette table without realizing your code generates its own palette table, and screwed up completely.

The optimizations are in the attached file, and are commented. They should be easy to spot because of the formatting. They're based on the version I already had, as I know that one better than the new one.

ham8.zip
Old 29 November 2007, 11:13   #79
Kalms
Registered User
 
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
Are you displaying the SHRES screen while performing the HAM8 conversion? A 1280x512 SHRES screen eats 85% of the chipram bandwidth.

If so, here are three things that might be worth considering:

* don't display the image until you have finished conversion
* display the image in HAM6 until conversion finishes (this should take approx 30% less chipram bandwidth), if HAM6 is supported in SHRES (not sure)
* adjust the bottom end of the display window (DIWSTOP) each frame such that you don't show any region of the screen which has not finished processing yet
Old 29 November 2007, 13:15   #80
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
I had a quick look at the code you posted.
Some remarks about it :
- I saw both moveq.b and moveq.l - as moveq has no size, giving it one could be misleading
- move.b (a6),d0 followed by move.b d0,(a1)+ can be replaced by move.b (a6),(a1)+
- is move.l (sp),a5 really faster than move.l #adr,a5 ?

For SHRES display, it's easy to open an intuition screen in the background, and bring it to the front once finished. My current viewer already has the option to do this.
 

