#101
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Anyway, the gain on 020/030 would not be enormous, assuming you gain anything at all. If you think you can really get a good deal of performance that way, then I want to see it.

Then again, I've just disassembled too much code with holes in it. I remember seeing code where each function was aligned to long boundaries... or wanted to be. The linker apparently didn't respect this, and all the code ended up unaligned... and, of course, completely unreadable.

The big archive is no problem, I've got room on my CF card.

EDIT: thanks a lot for your pics, Thorham. They let me definitively validate my little "vbrb" code, which you can enable or disable via an equate at the beginning of the source. Just disable it, then display your 1024x768 image (use bmptoppm to convert it): scroll it all the way to the right and look. Now do it again with the equ re-enabled...

Last edited by meynaf; 03 December 2007 at 12:17.

#102
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,840
Ah, now I've got a good reason to test the software under WinUAE. I'm using my miggy's composite output with a video/svideo to VGA converter which refuses to display max overscan properly. WinUAE does this properly, I believe, so I'll check it out.

#103
Global Moderator
Join Date: Nov 2001
Location: Derby, UK
Age: 48
Posts: 9,355

#104
son of 68k
Alternatively, you can use the viewer I wrote; it supports HAM pictures.

And you'll see that, without special treatment, the HAM fringing reaches nearly half of the screen!

If you simply want a viewer which scales the image to fit the screen, then John Hendrikx already did the job with fastview (see attachment). So long with Visage!

Last edited by meynaf; 12 May 2011 at 08:32.

#105
Computer Nerd
I've included a crude command-line version of my 24-bit bmp viewer here so you can get an idea of what I want for bmp viewing. Just use it with the test pictures, in particular the 1280x1024 one, and copy them to the ram disk to get the best impression of the speed. Check it out (if you haven't already). By the way, this time it's only the executable.

ShowBmp24.zip

#106
son of 68k
Your bmpview.s assembled correctly with phxass (with the "case" argument, because there are some lowercase/uppercase mismatches). However, for buffer storage you should really use a bss section and the ds (not dcb) directive: your code is 80 kb! Anyway, it worked OK.

You may want to test my 50%/50% quick rendering. See attached file. The 800x600 -> 400x300 image is done in 31 frames. Now you've got some work to do to beat it.

As I still don't display BMPs, you will have to use bmptoppm first (if you haven't done so already). Also, the goal was speed, not quality. The code for high quality is present in the source given here, but it is untested and probably doesn't work (the low-quality one didn't when I first tried it). Note that a modified version of ham8_test.s is necessary for that.

Last edited by meynaf; 12 May 2011 at 08:32.

#107
Computer Nerd
Haven't tried the hq code yet. We'll see if it works or not. I'll keep you posted, as I have access to my miggy all week.

#108
son of 68k
Curiously, the case thingy annoys me much less than the arrays of null bytes in the executable. If you like optimisations, even the smallest ones, then you have the option here to reduce your exe loading time.

For the source, you won't see much difference from what you knew, apart from the scale. However, there *was* a possible optimization, and when I spotted it, I thought: oooooooops... Indeed, cmp.l a5,d6 is faster than cmp.l d6,a5 because the latter is cmpa, not cmp! (4 cycles for cmpa vs 2 for cmp.) I have modified my code to take this into account: 3 cmpa became regular cmp (and bhs became bls). Not much effect though (3 frames or so in 1024x768).

Finally, I did my speed tests this weekend. Here are the results.

Test #1
Code:
    move.l (a7),a5          ; 4 clock cycles if dcache active, 8 else
vs
    move.l #adr,a5          ; 6 clock cycles

On 030 you gain 2, but you lose 2 on 020. Clock cycles matter more @14mhz than @50mhz. Sorry pal, but I won't put your optim in my version because of this.

Test #2
Code:
    move.l #adr,a5          ; 6 cycles
    move.l (a5,d4.l),a6     ; 11 cycles (7 if in dcache)
vs
    move.l (adr.l,d4.l),a6  ; 17 cycles (14 if in dcache)

And take care, as you must write:
Code:
    move.l (adr.l,d4.l),a6
Code:
    move.l (adr,d4.l),a6
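
The cmpa-to-cmp change described in this post swaps the two compare operands, so the branch condition has to flip with it (bhs becomes bls). Here is a minimal Python sketch of why the two forms branch identically, assuming unsigned 32-bit values. The helper names (`cmp_flags`, `old_branch_taken`, `new_branch_taken`) are made up for illustration, and only the carry and zero flags are modeled.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(src, dst):
    """Model 68k 'cmp src,dst': flags are set from dst - src (32-bit)."""
    src &= MASK32
    dst &= MASK32
    carry = dst < src   # borrow out of the unsigned subtraction
    zero = dst == src
    return carry, zero


def bhs(carry, zero):
    """bhs (same as bcc): taken when carry is clear, i.e. unsigned dst >= src."""
    return not carry


def bls(carry, zero):
    """bls: taken when carry or zero is set, i.e. unsigned dst <= src."""
    return carry or zero


def old_branch_taken(a5, d6):
    # cmp.l d6,a5 (assembled as cmpa.l) followed by bhs: taken when a5 >= d6
    return bhs(*cmp_flags(src=d6, dst=a5))


def new_branch_taken(a5, d6):
    # cmp.l a5,d6 followed by bls: taken when d6 <= a5
    return bls(*cmp_flags(src=a5, dst=d6))


if __name__ == "__main__":
    samples = [0, 1, 2, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF]
    same = all(
        old_branch_taken(a, d) == new_branch_taken(a, d)
        for a in samples
        for d in samples
    )
    print("both forms branch identically:", same)
```

This only checks the branch logic. The cycle counts quoted in the post are not modeled.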

#109
Computer Nerd
And yes, I like optimizations. Frankly, I didn't know about bss sections. Asmone should be able to handle them, though, and if it does, I'll start using them; it saves some allocs in the code.

#110
son of 68k
No big deal for me either, because nearly all my labels are in lower case.

Wanna get thousands of lines of code to optimize? I might have something for you then.

However, would you make an optimization which uses 240k of memory, just to gain 0.02 seconds at best (unnoticeable) on an (unscaled) 1280x1024 image?

Fortunately it didn't call the guru. The bad thing about this is that the bug is clearly documented in the phxass docs.

#111
Computer Nerd

#112
son of 68k
I knew you'd bite.

It has nothing to do with the current subject, so it will require opening a new thread. The basic idea is that I re-sourced the (no longer maintained) mpega.library to get more speed (-> more quality) out of it. I gained 10% or so; still unsatisfying, and a lot of code hasn't been touched. The job looks like jpeg decoding because of the DCT and all that sort of thing. Interested?

But please compute the gain on a 400x300 image, then count the time taken to fill your 256k buffer... (I can give you the results if you feel lazy.)

#113
Computer Nerd

#114
son of 68k
I'll open a thread as soon as I can upload the code.
Hint: 8 clock cycles per write; *16 (because you write the same thing 16 times), *4096 (number of colors to write). Just compare this to 4 clock cycles per pixel.
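
The hint can be worked through as plain arithmetic. The sketch below is only an estimate under stated assumptions: it reads the "4 clock cycles per pixel" as the per-pixel saving from the larger table, and it uses the 400x300 image size mentioned earlier in the thread. The function names are made up for illustration.

```python
CYCLES_PER_WRITE = 8         # from the hint
COPIES_PER_COLOR = 16        # from the hint: the same value is written 16 times
COLORS = 4096                # from the hint: number of colors to write
SAVED_CYCLES_PER_PIXEL = 4   # assumption: the per-pixel gain of the big table


def table_fill_cycles():
    """Cycles spent filling the 4096*16 entry table once."""
    return CYCLES_PER_WRITE * COPIES_PER_COLOR * COLORS


def saved_cycles(width, height):
    """Cycles saved while rendering a width x height image."""
    return width * height * SAVED_CYCLES_PER_PIXEL


def break_even_pixels():
    """Smallest pixel count at which the saving covers the fill cost."""
    return -(-table_fill_cycles() // SAVED_CYCLES_PER_PIXEL)  # ceiling division


if __name__ == "__main__":
    fill = table_fill_cycles()
    gain = saved_cycles(400, 300)
    print("fill cost :", fill)
    print("gain      :", gain)
    print("net       :", gain - fill)
    print("break-even:", break_even_pixels(), "pixels")
```

Under these assumptions the fill costs 524,288 cycles against 480,000 cycles saved on a 400x300 image, so the table only starts to pay off above 131,072 pixels.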
However, I still suggest you try pressing the "2" key (not numpad) while booting. And I think Vesalia Computer still sells A1200 keyboards.

#115
Computer Nerd
I checked Vesalia and they do still have them, but at almost 20 bucks I'm not buying any unless I'm going to fully restore my a1200 (it's a bit messy at the moment, but electronically in great shape), and that's not happening any time soon, I'm afraid.

By the way: if you want a brand new a1200, then amikit has them (old stock of new amigas).

#116
son of 68k
You don't give up, do you?

But how are you going to make a row of 16 move.l a6,(a1)+ go faster? You, surely not. But your miga? Seeing itself looking like a dumb peecee? Seeing that it has - horror! - windows keys???

Alternatively you could look for an A4k keyboard...

Else we can make a new, open-sourced, asm.

Hmmm... it looks like we're a bit off-topic right now. I think I can go with my renderer as it is; thanks for your help. ... or did you spot some further opt. that we can do?

After we're done with the rescaling, the next step for me is the jpeg decoder; this may require opening another thread.

#117
Computer Nerd
First, I'm sorry for not doing what I said I would do.

Filling the larger table actually doesn't make the program slower. Since your bench program takes everything into account, including file reading, and the number of frames still went down, I'd say that speed-wise it's no problem, but I can understand that you don't like the extra memory overhead.

Edit: I've tested pre-calculating the palette table addresses, but unless I'm doing something wrong, this actually made it slower again, so I changed it back to indexes (64kb for 4096x16). Changing the cmpa to cmp does make a difference here: the number of frames for 800x600 is now 136! Since it was 155, it's now 22% faster.

Last edited by Thorham; 05 December 2007 at 12:46.

#118
son of 68k
(sorry... it was just too easy...)

You still can...

Here are my current values (in frames):

500x333 : old=54, new=50, quick=33
800x600 : old=145, new=135, quick=86
1024x768 : old=234, new=217, quick=137

EDIT: I'm at 135 frames right now. Look at my code for the palette table addresses. But maybe this is incompatible with the 4096*16 writes (because you end up with 256 kb).

Last edited by meynaf; 05 December 2007 at 13:53. Reason: answer to an edit...

#119
Computer Nerd
Anyway, I've found another optimization:

    move.b (a6)+,d4
    move.l d1,d0
    sub.l d4,d0
    bpl.s .n0
    neg.b d0

can be replaced with:

    move.l d1,d0
    sub.b (a6)+,d0
    bpl.s .n0
    neg.b d0

This brings the frame count down to 133 for 800x600. By the way, the ham table is 64kb!

#120
son of 68k
EDIT: it *can* be done! It may overflow a signed byte, but not an unsigned one. So it must be written this way (same, but with bcc instead of bpl):

    move.l d1,d0
    sub.b (a6)+,d0
    bcc.s .n0
    neg.b d0

Thanks for it. I wouldn't have believed something could still be done!

Last edited by meynaf; 05 December 2007 at 15:18.
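
The signed/unsigned point can be checked by brute force. The following Python sketch models the byte-sized sequence (sub.b, conditional branch, neg.b) for every pair of unsigned byte values, assuming both operands are in the 0..255 range. The helper names (`sub_b`, `neg_b`, `absdiff_bcc`, `absdiff_bpl`) are made up for illustration; this is a model of the flag logic, not the viewer's actual code.

```python
def sub_b(dst, src):
    """Model 'sub.b src,dst': return (result byte, carry flag, negative flag)."""
    result = (dst - src) & 0xFF
    carry = src > dst                 # unsigned borrow
    negative = bool(result & 0x80)    # bit 7 of the byte result
    return result, carry, negative


def neg_b(value):
    """Model 'neg.b': two's complement negation of a byte."""
    return (-value) & 0xFF


def absdiff_bcc(a, b):
    """sub.b / bcc / neg.b: negate only when the subtraction borrowed."""
    result, carry, _ = sub_b(a, b)
    return result if not carry else neg_b(result)


def absdiff_bpl(a, b):
    """sub.b / bpl / neg.b: negate whenever bit 7 of the result is set."""
    result, _, negative = sub_b(a, b)
    return result if not negative else neg_b(result)


if __name__ == "__main__":
    pairs = [(a, b) for a in range(256) for b in range(256)]
    bcc_ok = all(absdiff_bcc(a, b) == abs(a - b) for a, b in pairs)
    bpl_bad = sum(absdiff_bpl(a, b) != abs(a - b) for a, b in pairs)
    print("bcc version correct for all pairs:", bcc_ok)
    print("pairs where the bpl version is wrong:", bpl_bad)
```

In this model the bcc version matches the true absolute difference for every pair, while the bpl version goes wrong as soon as the difference exceeds 127 (for example 200 and 10).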