12 January 2014, 17:00 | #1
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
Most optimized Atari ST to Amiga real-time screen converter
Right, so I've been tackling this two different ways. The first routine I wrote was fast enough, but it was optimized more for size. For the second, I did away with size optimization and cut the routine down to only the essential instructions, with no counter to decrement and no bne.
So here's my first routine, the one optimized for size:
Code:
Process_game_screen:
	movem.l	d0/a0-a4,-(a7)
	move.l	videobase(pc),a0	;Base address of Atari ST screen
	lea	Amiga_screen,a1		;Base address of Amiga screen
	move.l	#$1f40,d0		;Size of bitplane
	move.l	a1,a2
	add.l	d0,a2
	move.l	a2,a3
	add.l	d0,a3
	move.l	a3,a4
	add.l	d0,a4
loop_until_copied:
	move.w	(a0)+,(a1)+
	move.w	(a0)+,(a2)+
	move.w	(a0)+,(a3)+
	move.w	(a0)+,(a4)+
	subq.l	#2,d0
	bne.s	loop_until_copied
	movem.l	(a7)+,d0/a0-a4
	rts
However, it's not a great routine, because that tight loop of four word moves runs 4000 times ($1f40 = 8000 bytes per bitplane, two bytes per plane per pass). So I thought that removing the subq.l #2,d0 and the bne would make it slightly quicker. Removing those means I have to write the four moves out 4000 times in a row instead, but then I'm also not executing the subq.l and the bne 4000 times. Clearly that leads to a massive routine, but I have all the memory I need in extra memory, so that's not an issue.
So, can anyone see a better way of doing this that will give a faster routine? Please note, I'm not looking for coding elegance; I want to see whether my routine can be sped up significantly, or even slightly. I have only tested Where Time Stood Still on an emulated A500, so I have no idea whether it will behave exactly the same on a real machine. If it does, it runs at an acceptable speed, but any improvements would be welcome.
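For checking any faster variant against the original, here is a host-side C model of what Process_game_screen does to the data. It is a sketch for running on a PC, not 68000 code, and the function and constant names are mine. It assumes the ST low-resolution layout the routine relies on: four bitplanes interleaved one word at a time (planes 0 to 3 for each 16-pixel group), written out to four consecutive Amiga planes of $1f40 bytes. The inner statement corresponds to the four move.w (a0)+,(aN)+ instructions; unrolling only removes the counter and branch around them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: C model of Process_game_screen (not 68000 code).
   ST low-res: planes 0..3 interleaved word by word per 16-pixel group.
   Amiga: 4 separate planes of $1f40 bytes, one after another. */
enum {
    PLANES      = 4,
    PLANE_BYTES = 0x1f40,          /* 8000 bytes per plane */
    PLANE_WORDS = PLANE_BYTES / 2  /* 4000 words per plane */
};

/* st: PLANES * PLANE_WORDS interleaved words.
   amiga: PLANES consecutive planes of PLANE_WORDS words (a1..a4). */
static void st_to_amiga(const uint16_t *st, uint16_t *amiga)
{
    assert(st != NULL && amiga != NULL);
    for (size_t w = 0; w < PLANE_WORDS; w++)     /* one pass of the asm loop */
        for (size_t p = 0; p < PLANES; p++)      /* move.w (a0)+,(aN)+       */
            amiga[p * PLANE_WORDS + w] = *st++;
}
```

Tagging every ST word with its plane number and group index, then checking where each tag lands, is a quick way to validate an unrolled or movem-based version against this one.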
12 January 2014, 17:55 | #2
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
add.w and subq.w are faster on the 68000 than add.l and subq.l. Anyway, a simple dbf is faster than subq.w plus bne.b.
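To put rough numbers on this, here is a small C sketch that counts loop-control cycles for the 4000 passes in the first routine. The timings are assumptions taken from the MC68000 User's Manual as I read it, with no DMA wait states: subq.l #q,Dn 8 cycles, subq.w #q,Dn 4, bne.s 10 when taken and 8 when not, dbf 10 when it loops and 14 when the counter expires. It also models dbf's counter rule: only the low word is decremented and the loop ends when it wraps to -1, so the counter is loaded with N-1. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Loop-control cycle estimates (assumed MC68000 timings, no wait states):
   subq.l #q,Dn = 8, subq.w #q,Dn = 4, bne.s taken = 10 / not taken = 8,
   dbf looping = 10, dbf counter expired = 14. */

/* subq + bne.s around a body executed n times. */
static unsigned long subq_bne_cycles(unsigned long n, unsigned subq_cycles)
{
    assert(n > 0);
    /* subq runs every pass; bne is taken n-1 times and falls through once */
    return n * subq_cycles + (n - 1) * 10 + 8;
}

/* Simulates dbf: decrement the low word, exit when it wraps to $ffff.
   Returns the number of passes and stores the dbf cycles in *cycles. */
static unsigned long dbf_passes(uint16_t counter, unsigned long *cycles)
{
    unsigned long passes = 0;
    *cycles = 0;
    for (;;) {
        passes++;                    /* loop body */
        counter--;                   /* dbf decrements the low word only */
        if (counter == 0xffff) {     /* expired: fall through */
            *cycles += 14;
            return passes;
        }
        *cycles += 10;               /* branch back to the body */
    }
}
```

Under those assumptions, 4000 passes cost about 72,000 cycles of loop control with subq.l/bne.s, about 56,000 with subq.w/bne.s and about 40,000 with dbf (counter loaded with 3999), which matches the ordering described above.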
12 January 2014, 18:00 | #3
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
Just a shame the blitter can't be used to any degree here.
Last edited by Galahad/FLT; 12 January 2014 at 18:47.
12 January 2014, 19:59 | #4
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
12 January 2014, 20:29 | #5
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
@Galahad/FLT
You can check this routine. I haven't tested it yet, but it should work; I'll do some tests this evening.
Code:
	move.l	#amount,A5	;Damn, I'm a mathematician and I'll calculate this this evening :)
.loop
	movem.l	(A0)+,D0-D7	;D0 - planes 0 and 1
				;D1 - planes 2 and 3
				;D2 - planes 0 and 1
				;D3 - planes 2 and 3
				;D4 - planes 0 and 1
				;D5 - planes 2 and 3
				;D6 - planes 0 and 1
				;D7 - planes 2 and 3
	movem.w	D0/D2/D4/D6,(A2)
	swap	D0
	swap	D2
	swap	D4
	swap	D6
	movem.w	D0/D2/D4/D6,(A1)
	addq.l	#8,A1
	addq.l	#8,A2
	movem.w	D1/D3/D5/D7,(A4)
	swap	D1
	swap	D3
	swap	D5
	swap	D7
	movem.w	D1/D3/D5/D7,(A3)
	addq.l	#8,A3
	addq.l	#8,A4
	subq.w	#1,A5		;NB: subq to an address register does not set the flags,
				;so this bne tests the flags left by swap D7
	bne	.loop
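Here is a C sketch of what the movem version does to the data, assuming A1..A4 point at Amiga planes 0..3 as in the first routine, and that amount works out to 32000/32 = 1000 passes (my arithmetic, not from the post). Each movem.l reads four 16-pixel groups into D0-D7, each register holding two plane words, with the first word in the high half because the 68000 is big-endian. movem.w stores the low words, then swap brings the high words down. The model uses a plain counter, as a dbf on a data register would; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PLANE_WORDS = 4000, GROUPS_PER_PASS = 4 };

/* One pass: movem.l (a0)+,d0-d7 followed by the four movem.w stores.
   st points at 16 interleaved words (4 groups x planes 0..3);
   w is the destination word index in each plane. */
static void movem_pass(const uint16_t *st, uint16_t *planes[4], size_t w)
{
    assert(w % GROUPS_PER_PASS == 0);
    uint32_t d[8];
    for (int r = 0; r < 8; r++)   /* big-endian longword: first word is high */
        d[r] = ((uint32_t)st[2 * r] << 16) | st[2 * r + 1];

    for (int g = 0; g < GROUPS_PER_PASS; g++) {
        uint32_t even = d[2 * g];      /* D0/D2/D4/D6: planes 0 and 1 */
        uint32_t odd  = d[2 * g + 1];  /* D1/D3/D5/D7: planes 2 and 3 */
        planes[1][w + g] = (uint16_t)even;         /* movem.w ...,(A2) */
        planes[0][w + g] = (uint16_t)(even >> 16); /* after swap, (A1) */
        planes[3][w + g] = (uint16_t)odd;          /* movem.w ...,(A4) */
        planes[2][w + g] = (uint16_t)(odd >> 16);  /* after swap, (A3) */
    }
}

/* Whole screen: 32 bytes per pass; returns the number of passes. */
static size_t movem_convert(const uint16_t *st, uint16_t *planes[4])
{
    size_t passes = 0;
    for (size_t w = 0; w < PLANE_WORDS; w += GROUPS_PER_PASS, passes++)
        movem_pass(st + 4 * w, planes, w);
    return passes;
}
```

The model confirms the register comments in the post: even registers feed planes 0 and 1, odd registers feed planes 2 and 3.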
12 January 2014, 20:42 | #6
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Quote:
	subq.w	#1,A5
	bne	.loop
12 January 2014, 21:11 | #7
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
12 January 2014, 21:21 | #8
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
@Don_Adan
Right, thanks.
@Galahad/FLT
I have another idea: use the blitter to copy one bitplane, but I'll check it first.
12 January 2014, 21:36 | #9
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
12 January 2014, 22:02 | #10
Registered User
Join Date: Aug 2006
Location: Finland
Age: 51
Posts: 241
An example off the top of my head, no warranties, as I haven't thought this through much: use the A and D channels. Set the A modulo to 6 and the D modulo to 0, point A at word 0 of the ST framebuffer and D at Amiga plane 0, and start with width 1, height 1024 for plane 0. In the blitter interrupt, just restart the blitter until plane 0 has been copied, then move on to plane 1, etc. Just a thought.
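To make the numbers concrete: a plane is $1f40/2 = 4000 words, and on the original chipset the BLTSIZE height field is 10 bits (a value of 0 meaning 1024 rows), so a one-word-wide copy of a plane needs four blits of 1024, 1024, 1024 and 928 rows. The A modulo of 6 skips the other three planes' words after each 2-byte read. Below is a C sketch of the BLTSIZE encoding and the row split, assuming OCS limits; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* OCS BLTSIZE: bits 15-6 = height in rows, bits 5-0 = width in words;
   a field value of 0 means the maximum (1024 rows / 64 words). */
static unsigned short bltsize(unsigned height, unsigned width)
{
    assert(height >= 1 && height <= 1024 && width >= 1 && width <= 64);
    return (unsigned short)(((height & 1023u) << 6) | (width & 63u));
}

/* Split a one-word-wide plane copy into blits of at most 1024 rows.
   Writes the row counts to rows[] and returns how many blits are needed. */
static size_t split_rows(unsigned total, unsigned rows[], size_t max)
{
    size_t n = 0;
    while (total > 0 && n < max) {
        unsigned h = total > 1024 ? 1024 : total;
        rows[n++] = h;
        total -= h;
    }
    return n;
}
```

The resulting values, 0*64+1 for a full 1024-row blit and 928*64+1 for the last one, are the ones used in Asman's CopySt later in the thread.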
13 January 2014, 08:06 | #11
Zone Friend
Join Date: May 2006
Location: France
Posts: 1,801
13 January 2014, 11:36 | #12
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
13 January 2014, 21:42 | #13
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
Unfortunately, because of the weird way the Atari ST displays its graphics (the four bitplanes are interleaved a word at a time), it's not possible to simply do a straight copy, which is what that MacPaint example uses.
13 January 2014, 22:59 | #14
Registered User
Join Date: Aug 2006
Location: Finland
Age: 51
Posts: 241
Check your PM.
13 January 2014, 23:10 | #15
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
@Galahad/FLT
I did some tests, and here it is. It uses mr.spiv's method (thanks a lot, mr.spiv) plus a CPU copy. It must be called twice (once per pair of planes), or you can copy and paste it. (For testing I used the Degas picture LOADER.PI1 from Rolling Thunder; I still hope that some day I'll get angry enough to convert this game properly.) You'll certainly need to adapt a few things, and some things can be optimized.
Code:
;use WAITBLITTER somewhere at the beginning of the program
	move.w	#6,bltamod(a5)		;skip the other 3 planes' words
	move.w	#0,bltdmod(a5)
	move.l	#$09f00000,bltcon0(a5)	;USEA+USED, minterm $F0: D = A
	move.l	#$ffffffff,bltafwm(a5)	;first/last word masks all ones

	lea	degas+34,a0		;skip the 34-byte Degas header
	move.l	screen(a6),a1
	bsr	CopySt			;planes 0 and 1

	lea	degas+34+4,a0
	move.l	screen(a6),a1
	add.l	#$1f40*2,a1
	bsr	CopySt			;planes 2 and 3
Code:
CopySt:
	move.l	a0,bltapt(a5)		;A = this plane's first ST word
	move.l	a1,bltdpt(a5)		;D = Amiga plane
	move.w	#0*64+1,bltsize(a5)	;1024 height
	move.l	#$1f40,D0
	move.l	a1,a2
	add.l	d0,a2			;a2 = next Amiga plane
	lea	2(a0),a3		;a3 = next plane's first ST word
	move.w	#$1f40/8-1,D1		;1000 words per CPU pass
.1	move.w	(a3),(a2)+
	addq.l	#8,a3
	dbf	D1,.1
	WAITBLITTER
	move.w	#0*64+1,bltsize(a5)	;1024 height
	move.w	#$1f40/8-1,D1
.2	move.w	(a3),(a2)+
	addq.l	#8,a3
	dbf	d1,.2
	WAITBLITTER
	move.w	#0*64+1,bltsize(a5)	;1024 height
	move.w	#$1f40/8-1,D1
.3	move.w	(a3),(a2)+
	addq.l	#8,a3
	dbf	d1,.3
	WAITBLITTER
	move.w	#928*64+1,bltsize(a5)	;remaining 928 rows
	move.w	#$1f40/8-1,D1
.4	move.w	(a3),(a2)+
	addq.l	#8,a3
	dbf	d1,.4
	rts
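Below is a host-side C model of how CopySt splits the work, under my reading of the code. The blitter copies the plane whose first word is at a0 (D = A, A modulo 6), and the CPU copies the next plane (a0+2) into a1+$1f40 in four chunks of $1f40/8 = 1000 words that interleave with the four blits. The model relies on the blitter pointers continuing from where the previous blit stopped, which is why the asm only rewrites bltsize. It runs the two halves one after the other; on real hardware they overlap, which is the point of the split. Word indices stand in for addresses, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PLANE_WORDS = 0x1f40 / 2 };   /* 4000 words per Amiga plane */

/* Blitter part: D = A, width 1 word, A modulo 6 bytes (3 words), D modulo 0.
   Like the BLTxPT registers, *ai and *di end up just past the blit. */
static void blit_rows(const uint16_t *st, size_t *ai,
                      uint16_t *dst, size_t *di, unsigned rows)
{
    for (unsigned r = 0; r < rows; r++) {
        dst[(*di)++] = st[*ai];
        *ai += 1 + 3;                 /* one word read + 6-byte modulo */
    }
}

/* One CopySt call. first: word index of the first plane in st
   (0 for degas+34, 2 for degas+34+4). out: that plane, followed by
   the next plane PLANE_WORDS words later (a1 and a1+$1f40). */
static void copy_st(const uint16_t *st, size_t first, uint16_t *out)
{
    static const unsigned rows[4] = { 1024, 1024, 1024, 928 };
    assert(first == 0 || first == 2);
    size_t ai = first, di = 0;                 /* blitter A and D      */
    size_t a3 = first + 1, a2 = PLANE_WORDS;   /* CPU: lea 2(a0),a3    */

    for (int chunk = 0; chunk < 4; chunk++) {
        blit_rows(st, &ai, out, &di, rows[chunk]);   /* write to bltsize */
        for (int i = 0; i < PLANE_WORDS / 4; i++) {  /* 1000-word dbf loop */
            out[a2++] = st[a3];       /* move.w (a3),(a2)+ */
            a3 += 4;                  /* addq.l #8,a3      */
        }
    }
}
```

Calling it once with first = 0 and once with first = 2 (two planes further on in the output) fills all four planes, matching the two bsr CopySt calls in the setup code.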
13 January 2014, 23:25 | #16
HOL/FTP busy bee
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,518
for this thread
13 January 2014, 23:37 | #17
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
@Asman, great work dude, it's definitely faster, but the last few lines are missing from the bottom of the screen, as if a couple of bitplanes haven't been written properly. I'll check that I've actually copied your code correctly!
EDIT: Right, it was a typo on my part. I've got a feeling that the CPU routine you wrote before, with the movem.w instructions, was quicker: I'm getting flickering when moving that I didn't have before, and I'm not so sure this one is quicker. I'll have to do more testing to see.
Last edited by Galahad/FLT; 14 January 2014 at 01:26.
14 January 2014, 18:36 | #18
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
Hmm... my next idea was to use the blitter, but this attempt is slower than the previous one. I use the blitter with four channels to speed up the previous blitter copy (a longword per row instead of a word). It uses the operation D = A + BC, with A masked by $ffff0000 (first/last word masks) and C holding the mask $0000ffff.
Code:
	lea	maskC,a3
	lea	degas+34,a0
	lea	6(a0),a1
	move.l	screen(a6),a2
	WAITBLITTER
	move.w	#12,bltamod(a5)
	move.w	#12,bltbmod(a5)
	move.w	#-4,bltcmod(a5)		;rewind to maskC every row
	move.w	#0,bltdmod(a5)
	move.l	#$0df80000,bltcon0(a5)	;minterm $F8: D = A + BC
					;NB: $0d enables A, B and D only; fetching C also needs USEC ($0ff8)
	move.l	#$ffff0000,bltafwm(a5)	;first word mask $ffff, last word mask $0000
	move.l	a0,bltapt(a5)
	move.l	a1,bltbpt(a5)
	move.l	a3,bltcpt(a5)
	move.l	a2,bltdpt(a5)
	move.w	#0*64+2,bltsize(a5)	;1024 longwords
	rts

;must be located in CHIP
maskC:	dc.w	0,-1
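Here is a C model of this blit as I read it: rows 2 words wide, A first-word mask $ffff and last-word mask $0000 (from the move.l to bltafwm), C fetching the mask words 0,$ffff and rewinding with modulo -4, and minterm $F8, i.e. D = A | (B & C). Each row then emits the plane-0 words of two consecutive 16-pixel groups: the first comes from A, the second from B, which starts 6 bytes in, so its first word (the previous group's plane-3 word) is masked off by C. This assumes all four channels are fetching, as the post describes; with USEC clear the blitter would not read maskC at all. No shifts are modelled because the shift fields are 0. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* D = A | (B & C), 2 words wide, A/B modulo 12 bytes, C modulo -4,
   D modulo 0. st: interleaved ST screen; d: one Amiga plane. */
static void blit_masked(const uint16_t *st, uint16_t *d, unsigned rows)
{
    static const uint16_t maskC[2] = { 0x0000, 0xffff };
    const uint16_t afwm = 0xffff, alwm = 0x0000;  /* bltafwm / bltalwm */
    size_t a = 0;        /* bltapt = degas+34   (word index) */
    size_t b = 3;        /* bltbpt = degas+34+6 (word index) */
    size_t out = 0;

    assert(rows >= 1 && rows <= 1024);
    for (unsigned r = 0; r < rows; r++) {
        for (int x = 0; x < 2; x++) {
            uint16_t av = (uint16_t)(st[a + x] & (x == 0 ? afwm : alwm));
            uint16_t bv = st[b + x];
            uint16_t cv = maskC[x];   /* C modulo -4: same 2 words each row */
            d[out++] = (uint16_t)(av | (bv & cv));
        }
        a += 2 + 6;      /* 2 words read + 12-byte modulo */
        b += 2 + 6;
    }
}
```

With 1024 rows this produces 2048 plane-0 words, i.e. the first half of the plane, consistent with the ";1024 longwords" comment.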
14 January 2014, 20:07 | #19
Registered User
Join Date: Aug 2006
Location: Finland
Age: 51
Posts: 241
As originally hinted, I would chain blits using the blitter interrupt. Since we are only using two channels, I would blit two planes with the blitter and, once that has started, do the other two with the CPU. Then you don't need blitter waits between CPU passes.
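One way to sketch this is a small job state that the blitter-finished interrupt advances, starting the next chunk of at most 1024 rows and reloading the A/D pointers only when it moves to a new plane. Within a plane the pointer registers already hold the next addresses when a blit ends, which Asman's CopySt relies on by rewriting only bltsize. Which two planes go to the blitter is my assumption (planes 0 and 1 here, with the CPU doing 2 and 3 meanwhile, not modelled). The struct, the function and the "registers" are hypothetical stand-ins using byte offsets, not real hardware access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shadow of the blitter registers the handler writes. */
struct blit_regs {
    uint32_t bltapt, bltdpt;   /* byte offsets instead of chip addresses */
    uint16_t bltsize;
};

/* Chain state: plane 0 then plane 1, 4000 one-word rows each. */
struct blit_chain {
    unsigned plane;            /* current plane (0 or 1)         */
    unsigned rows_left;        /* rows still to copy in plane    */
    int started;               /* pointers loaded for this plane */
};

enum { PLANE_BYTES = 0x1f40, PLANE_ROWS = PLANE_BYTES / 2, LAST_PLANE = 1 };

/* Call once at start-up, then from the blitter interrupt. Returns 1 after
   "starting" the next blit, 0 when both planes are done. */
static int blit_next(struct blit_chain *c, struct blit_regs *r)
{
    assert(c->plane <= LAST_PLANE);
    if (c->rows_left == 0) {
        if (c->started && c->plane == LAST_PLANE)
            return 0;
        if (c->started)
            c->plane++;
        c->rows_left = PLANE_ROWS;
        c->started = 1;
        r->bltapt = 2u * c->plane;            /* first ST word of plane */
        r->bltdpt = PLANE_BYTES * c->plane;   /* Amiga plane base       */
    }
    unsigned h = c->rows_left > 1024 ? 1024 : c->rows_left;
    c->rows_left -= h;
    r->bltsize = (uint16_t)(((h & 1023u) << 6) | 1u);  /* writing starts it */
    return 1;
}
```

A test loop can stand in for the hardware by advancing bltapt by 8 bytes and bltdpt by 2 bytes per row after each "blit"; the chain then issues eight blits (1024, 1024, 1024, 928 rows per plane) and reloads the pointers only at the switch to plane 1.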
14 January 2014, 21:18 | #20
Registered User
Join Date: Jul 2009
Location: Lala Land
Posts: 520
Has anyone measured the timings for these, and could they post them?
I should also note that this thread is some top shit.