English Amiga Board


Old 20 November 2007, 15:43   #41
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,507
Quote:
Originally Posted by meynaf View Post
I knew I didn't think like the rest of the world

But from this point of view it's a 13-bit register, as bit #15 is the transparency (for genlock video).
No, it is a 12.5-bit register, because the genlock bit only exists when BPLCON3 LOCT=0.
Old 20 November 2007, 15:53   #42
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Toni Wilen View Post
No, it is a 12.5-bit register, because the genlock bit only exists when BPLCON3 LOCT=0.
Old 21 November 2007, 10:04   #43
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,753
I've been looking at the ham8 test code. So far I've only been able to strip off about 10 frames (for an 800x600 picture), and that's only because you can chop off bits instead of rounding them. Also, using a 32-bit palette table (currently the code reads the original table for the library, too) allows you to strip off one whole instruction (wow).

I know this isn't much yet...

As for the translators: the drawback is that they can't understand the context, but in this case they're quite helpful (better than nothing), and of course learning a language is always best!

I'll see if I can find some more optimizations (tough).
Old 21 November 2007, 10:42   #44
meynaf
Could you please post exactly what you did here? (Or PM it to me?)
Old 21 November 2007, 11:05   #45
Thorham
Of course, here goes (for the ham8.s file):

I replaced the rounding code with this:

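moveq.l #4,d0 ; d0 = $00000004
neg.b d0 ; negates the low byte only: d0 = $000000fc

move.b (a0)+,d1
and.l d0,d1 ; keep the top 6 bits of the byte, clear bits 8-31
move.b (a0)+,d2
and.l d0,d2
move.b (a0)+,d3
and.l d0,d3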
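
To make the effect of that mask concrete, here is a small C model of the snippet above. It is only a sketch: the function names are made up for illustration, and `round_component` is a hypothetical rounding scheme shown for comparison, since the original rounding code from ham8.s is not quoted in this thread.

```c
#include <stdint.h>

/* Model of "moveq.l #4,d0 / neg.b d0": neg.b negates only the low byte,
   so the register ends up as 0x000000FC, not 0xFFFFFFFC. */
uint32_t build_mask(void)
{
    uint32_t d0 = 4;                             /* moveq.l #4,d0 */
    uint8_t low = (uint8_t)(0u - (d0 & 0xFFu));  /* neg.b d0      */
    return (d0 & 0xFFFFFF00u) | low;
}

/* Model of "move.b (a0)+,d1 / and.l d0,d1": move.b leaves bits 8-31 of
   the register untouched, and.l then clears them along with bits 0-1. */
uint32_t truncate_component(uint32_t stale_register, uint8_t component)
{
    uint32_t d1 = (stale_register & 0xFFFFFF00u) | component;  /* move.b */
    return d1 & build_mask();                                  /* and.l  */
}

/* Hypothetical rounding to 6 bits, for comparison only (an assumption,
   not the code from ham8.s): add half a step, clamp, then mask. */
uint32_t round_component(uint8_t component)
{
    uint32_t v = component + 2u;
    if (v > 255u)
        v = 255u;
    return v & 0xFCu;
}
```

Truncation maps 3 to 0 where this particular rounding would give 4, which is the small quality trade-off being made for speed.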

And I changed this:

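lea zv_h8pal(pc),a6
add.l d4,a6 ; a6 = table + index
add.l d4,d4 ; d4 = 2 * index
add.l d4,a6 ; a6 = table + 3 * index

To this:

lea Palette32(pc),a6
lsl.l #2,d4 ; d4 = 4 * index
add.l d4,a6 ; a6 = table + 4 * index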

Where Palette32 points to the palette table with the following format:

dc.b value,value,value,0

Where value is always the original value.

As I said, the library still uses the original format, so both tables are loaded, which doesn't seem to slow anything down.
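
For readers without the source at hand, the two table layouts can be modelled in C roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the padded layout is taken from the `dc.b value,value,value,0` description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Original layout: 3 bytes per entry. The add/add/add sequence computes
   index + 2*index, i.e. an offset of 3*index. */
size_t offset_packed(size_t index)
{
    size_t offset = index;   /* add.l d4,a6 */
    index += index;          /* add.l d4,d4 */
    offset += index;         /* add.l d4,a6 */
    return offset;
}

/* Padded layout: 4 bytes per entry, so the offset is a single shift. */
size_t offset_padded(size_t index)
{
    return index << 2;       /* lsl.l #2,d4 */
}

/* Build the padded table from the packed one; the fourth byte of every
   entry is zero. */
void build_palette32(const uint8_t *packed, uint8_t *padded, size_t entries)
{
    for (size_t i = 0; i < entries; i++) {
        const uint8_t *src = packed + offset_packed(i);
        uint8_t *dst = padded + offset_padded(i);
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = 0;
    }
}
```

The padded table costs one extra byte per entry in exchange for the simpler address calculation.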
Old 21 November 2007, 11:17   #46
meynaf
I dunno which CPU you're testing on, but on a 030, this:

Code:
lea Palette32(pc),a6
lsl.l #2,d4
add.l d4,a6
isn't faster than this:
Code:
lea zv_h8pal(pc),a6
add.l d4,a6
add.l d4,d4
add.l d4,a6
because of the lsl taking 4 clock cycles (adds take only 2).

Also, I just noticed that the high part of d1-d2-d3 stays 0 all along, so three moveq #0 outside of the loop, then moveq #$fc / and.b, should do the trick.
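
The cycle argument can be tallied in a few lines of C. This is only a back-of-the-envelope sketch using the two figures quoted above (2 cycles per add, 4 for the lsl); the lea is left out because it appears in both versions, and cache or pipeline effects are ignored.

```c
/* Cycle figures as quoted in this post for the 030. */
enum { CYCLES_ADD = 2, CYCLES_LSL = 4 };

/* lsl.l #2,d4 / add.l d4,a6 */
unsigned cycles_palette32(void)
{
    return CYCLES_LSL + CYCLES_ADD;
}

/* add.l d4,a6 / add.l d4,d4 / add.l d4,a6 */
unsigned cycles_packed(void)
{
    return 3 * CYCLES_ADD;
}
```

With these numbers both sequences come out the same, which is why the shorter version is not faster on this CPU.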
Old 21 November 2007, 11:39   #47
Thorham
I'm testing on a 68030 at 50 MHz. I didn't know about lsl being twice as slow as add. The main reason for doing this is that I was trying to do some further optimization by reading a whole long word (I even have a third version of the palette table for that) instead of reading three bytes separately, but I found out it didn't work in this case, because you need two swaps. Also, this got me in trouble further down the source, where the table is read again.

Edited:
By the way, I don't have a good doc on instruction timings. Any suggestions? I only just found out rol can be slower than reading bytes from fast mem, for example.

Last edited by Thorham; 21 November 2007 at 12:02.
Old 21 November 2007, 13:13   #48
meynaf
I think I'll boot up my miga this Saturday to check your 32-bit palette idea.
Results here on Monday, maybe with a new version?

For the timings Flint/Darkness did a great job with his guide :
http://aminet.net/package/dev/asm/mc680x0
Timings when accessing memory can vary, however on my configuration (030/50 with 60ns EDO ram) you need 8 cycles for a fastmem access, and 26 for a chipmem access. Both can be pipelined if they are writes.
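
To put those cycle counts into wall-clock terms, a tiny C helper is enough. It is a sketch that assumes the counts are CPU clock cycles at the stated 50 MHz; the function name is made up for illustration.

```c
/* Convert a CPU cycle count to nanoseconds for a given clock in MHz.
   One cycle at N MHz lasts 1000/N nanoseconds. */
unsigned cycles_to_ns(unsigned cycles, unsigned clock_mhz)
{
    return cycles * 1000u / clock_mhz;
}
```

Under that assumption, 8 cycles is 160 ns per fastmem access and 26 cycles is 520 ns per chipmem access.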

I wonder if we should continue this by PM, since few people here will understand what we're talking about (who has looked into the code?)...
Old 21 November 2007, 14:08   #49
StrategyGamer
Total Chaos AGA is fun!
 
Join Date: Jun 2005
Location: USA
Posts: 873
On 060 this is faster because first 2 instructions execute simultaneously:
Code:
lea Palette32(pc),a6
lsl.l #2,d4
add.l d4,a6
In this version all instructions depend on output of previous instruction. So none of them can dual execute on 060.
Code:
lea zv_h8pal(pc),a6
add.l d4,a6
add.l d4,d4
add.l d4,a6
Old 21 November 2007, 14:25   #50
meynaf
Quote:
Originally Posted by StrategyGamer View Post
On 060 this is faster because first 2 instructions execute simultaneously:
Code:
lea Palette32(pc),a6
lsl.l #2,d4
add.l d4,a6
In this version all instructions depend on output of previous instruction. So none of them can dual execute on 060.
Code:
lea zv_h8pal(pc),a6
add.l d4,a6
add.l d4,d4
add.l d4,a6
And what about the 040?
Does anyone here have 040/060 timings?
Old 22 November 2007, 13:41   #51
Thorham
Yes, we could keep each other posted through pm. On the other hand, people are still replying, so there seems to be some interest in the topic. Just let me know what you prefer.
Old 22 November 2007, 13:52   #52
meynaf
What I prefer is the forum for whatever can be understood by other ppl, and PM for the rest. Staying here doesn't bother me, but I thought about those who will read us and not understand a thing because they didn't see the code.
Old 22 November 2007, 13:53   #53
Spellcoder
Spellcoder
 
Join Date: Aug 2006
Location: The Netherlands
Age: 44
Posts: 27
Although I'm not familiar with the techniques or with calculating instruction timings, I find it interesting to read about the techniques and optimizing. So please, do continue.
Those who don't understand/care should just not read the thread.
Old 22 November 2007, 17:46   #54
Wepl
Moderator
 
Join Date: Nov 2001
Location: Germany
Posts: 866
Quote:
Originally Posted by meynaf View Post
And what about the 040?
Does anyone here have 040/060 timings?
Some time ago the CPU manuals could be downloaded as PDFs from the web. I also got the printed books for free from Motorola years ago.
In the manuals you will find the instruction timings. But especially on the 40/60 they depend on *.
The 40 is not superscalar and has only one integer unit.
Most of the time it is impossible to write code which is the fastest on all CPUs, e.g. avoid pc-relative on the 40, but it's no problem on the 30/60.
The timing also depends on the previous instruction because of the pipeline and alignment.
Old 23 November 2007, 15:12   #55
Thorham
The MC680x0 manuals can be downloaded from www.freescale.com

Here's the link:

http://www.freescale.com/webapp/sear...Order=default&

The manuals cover the full 680x0 family
Old 23 November 2007, 15:45   #56
BippyM
Global Moderator
 
Join Date: Nov 2001
Location: Derby, UK
Age: 48
Posts: 9,355
Quote:
Originally Posted by Thorham View Post
The MC680x0 manuals can be downloaded from www.freescale.com

Here's the link:

http://www.freescale.com/webapp/sear...Order=default&

The manuals cover the full 680x0 family
I have grabbed all the relevant manuals and 7zipped them.

There's 51 MB worth, zipped, across 8 files. If anyone wants them, lemme know and I'll zone it all.
Old 23 November 2007, 16:05   #57
meynaf
For me it's too late : I grabbed them already.
Old 24 November 2007, 08:27   #58
Thorham
To get back to the topic, I made a simple bmp viewer in assembler which uses the 3x1 'screen mode'. It scales down (sorry) an 800x600 24bit image to 33%x50% and displays the image in 60 frames. The program does kill the os, and thus lacks an intuition screen (pure metal banging).

The 60 frames include scaling and drawing, but exclude reading the bmp file and memory allocation. Furthermore, the c2p currently draws 1280 x (scaled height).

It turns out the scaling/3x1 mode combination is less expensive than a high quality ham rendering engine, and yours is definitely high quality (the only way to improve it is to calculate the base palette for the whole image, maybe an idea for a super high quality mode).

Since you showed interest in the 3x1 screen for a quick and dirty mode, I thought I'd write a proper test program. If you want, I can put it in the zone, including the right include files, and AsmOne (if you don't have it), as you'll need that to do the speed test.

Another advantage of this quick and dirty stuff is that any image up to 1280x1024 fits on the screen completely. However, it's definitely not a replacement for the hq code you already have.

Have edited this reply far too many times now! One last go:

One more thing about scaling: it would be a great option. If someone doesn't want it, they don't have to use it, and if they do, it's there! The reason I'm mentioning the whole scaling thing again: have you considered images which are way too large to fit on the screen, such as 1280x1024 jpegs, or worse, 1600x1200?

For your hq mode, 1280x1024 can be scaled down to 640x512 by averaging 4 pixels, which means you can shift and don't have to use divs. Also, consider the chip mem needed for 1280x1024: it's a whopping 1.2 megs! Scaled down it's only 320kb, and it fits the screen snugly!

For 1600x1200 you can scale down to 33%x33%. Now you do need divs, but since it can be done in one go, you have three divs for nine pixels, which seems pretty good to me. And for 1600x1200 you have to scale, as it takes up 1.9 megs, which simply makes it impossible to display directly. Scaling like this also has the advantage of fewer ham pixels to render, so the scaling isn't much of an issue anyway!

It's this kind of feature I've always wanted to see in jpeg viewers, but never found. Just thought I'd let you know.
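
The numbers and the shift trick above can be checked with a short C sketch. It rests on assumptions that are not spelled out in the post: 8 bitplanes with no row padding (so one byte of chip memory per pixel), even image dimensions, and one 8-bit plane per colour gun. All function names are made up for illustration.

```c
#include <stdint.h>

/* Assumes 8 bitplanes and no row padding: one byte per pixel. */
uint32_t ham8_chip_bytes(uint32_t width, uint32_t height)
{
    return width * height;
}

/* Average four samples of one colour gun. Dividing by 4 is a shift
   right by 2, so no div instruction is needed. */
uint8_t average4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint8_t)(((uint32_t)a + b + c + d) >> 2);
}

/* Halve one 8-bit plane in both directions by averaging 2x2 blocks.
   width and height are assumed to be even. */
void halve_plane(const uint8_t *src, uint8_t *dst,
                 uint32_t width, uint32_t height)
{
    uint32_t out_width = width / 2;
    for (uint32_t y = 0; y < height / 2; y++) {
        for (uint32_t x = 0; x < out_width; x++) {
            const uint8_t *p = src + (2 * y) * width + 2 * x;
            dst[y * out_width + x] =
                average4(p[0], p[1], p[width], p[width + 1]);
        }
    }
}
```

Under these assumptions 1280x1024 needs 1,310,720 bytes, 640x512 needs 327,680 bytes (exactly 320 KB), and 1600x1200 needs 1,920,000 bytes.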

Last edited by Thorham; 26 November 2007 at 09:11. Reason: Added some text, and corrected mistakes.
Old 26 November 2007, 11:27   #59
meynaf
Yes, you can upload it, with the bmp file if possible.
I'm sure I have AsmOne somewhere, but I'm also sure that I'll try to change the source to make it assemble with PhxAss first.

Of course scaling of enormous pics is useful, and it's among the planned features of my viewer, though not on the urgent to-do list.
I intended to do it by averaging 50%/50% (hi-q mode) or simple pixel skipping (fast mode). Alternatively, the jpeg library has code for down-sampled rendering, which could be even faster and give the best quality.

Speaking of quality, I don't believe that palette adaptation is much better (though I'm sure it will be much slower). I compared my rendering to other viewers which do that and didn't see a real difference.
Anyway, how do you find out what the "best" colors are?

EDIT: how can you possibly average 9 pixels in 3 divs???

Last edited by meynaf; 26 November 2007 at 11:33.
Old 26 November 2007, 12:37   #60
Thorham
The files will be in the zone today, including some test bmps.

Palette adaptation only improves the image quality if the colors are chosen just right. Adpro is the only program I know of which does this properly. Choosing the best colors is probably done by analyzing the image first to determine which pixels would generate the largest errors when rendered in ham. Then you probably just have to pick the 64 pixels with the highest error ratio, and use their values in the palette. However, I am not very sure about this; it's more of an educated guess. I'll also include an Adpro rendering, as this program probably does the best quality ham rendering.

Taking the average of nine pixels can be done with three divs like so:

move.b (a0)+,d3
add.l d3,d0 ;Red
move.b (a0)+,d3
add.l d3,d1 ;Green
move.b (a0)+,d3
add.l d3,d2 ;Blue

This code is then applied to all nine pixels, and when that is done you need one div per gun color, hence three divs only. Just make sure the upper 24 bits of d3 are cleared before the loop, and make sure d0/d1/d2 get cleared for each set of pixels. This principle works for any number of pixels which can be averaged in one go, so if you want to scale down to 25%x25% (16 pixels) you'd still only need three divs, or in this case shifts.
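
The same idea in C, as a sketch: sum each gun over the 3x3 block first, then divide once per gun. The packed red/green/blue byte order follows the comments in the snippet above; the function name and the `stride` parameter (bytes per source row) are assumptions made for illustration.

```c
#include <stdint.h>

/* Average a 3x3 block of packed RGB pixels using three divisions.
   src points at the top-left pixel, stride is the number of bytes
   per source row, out receives the averaged red, green and blue. */
void average3x3(const uint8_t *src, uint32_t stride, uint8_t out[3])
{
    uint32_t red = 0, green = 0, blue = 0;  /* cleared per block */

    for (uint32_t row = 0; row < 3; row++) {
        const uint8_t *p = src + row * stride;
        for (uint32_t col = 0; col < 3; col++) {
            red   += *p++;   /* move.b (a0)+,d3 / add.l d3,d0 */
            green += *p++;   /* move.b (a0)+,d3 / add.l d3,d1 */
            blue  += *p++;   /* move.b (a0)+,d3 / add.l d3,d2 */
        }
    }

    /* One division per gun: three in total for nine pixels. */
    out[0] = (uint8_t)(red / 9);
    out[1] = (uint8_t)(green / 9);
    out[2] = (uint8_t)(blue / 9);
}
```

For a 4x4 block the three divisions by 16 would become shifts by 4, as mentioned above.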

I still recommend AsmOne, for one reason, and that's the speed test, which stores the number of frames in a memory location, and AsmOne has a nice mem viewer. But you don't really need it.