Old 12 January 2014, 17:00   #1
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
Most optimized Atari ST to Amiga real time screen converter

Right, so I've been tackling this in two different ways. The first routine I did was fast enough, but it was optimized more for size. For the second I gave up on size and reduced the routine to only the most essential instructions, with no counter to decrement and no bne.

So here's my first routine, the one that's optimized for size:

Process_game_screen:

	movem.l	d0/a0-a4,-(a7)
	move.l	videobase(pc),a0	;Base address of Atari ST screen
	lea	Amiga_screen,a1		;Base address of Amiga screen (plane 0)
	move.l	#$1f40,d0		;Size of one bitplane (8000 bytes)
	move.l	a1,a2
	add.l	d0,a2			;Plane 1
	move.l	a2,a3
	add.l	d0,a3			;Plane 2
	move.l	a3,a4
	add.l	d0,a4			;Plane 3
loop_until_copied:
	move.w	(a0)+,(a1)+		;ST words are interleaved: plane 0,
	move.w	(a0)+,(a2)+		;plane 1,
	move.w	(a0)+,(a3)+		;plane 2,
	move.w	(a0)+,(a4)+		;plane 3
	subq.l	#2,d0			;2 bytes done per plane, so 4000 passes
	bne.s	loop_until_copied
	movem.l	(a7)+,d0/a0-a4
	rts

However, it's not a great routine, because that tight loop of four moves through the address registers is repeated 4000 times, and so are the subq.l and the bne!

So I thought that removing the subq.l #2,d0 and the bne would make it slightly quicker. Obviously that means the four moves now have to be written out 4000 times instead, but in return the subq.l and the bne are no longer executed 4000 times either.

Clearly that leads to a massive routine, but I have all the room I need in extra memory, so that's not an issue.

So, can anyone see a better way of doing this that leads to a faster routine?

Please note, I'm not looking for coding elegance. I'm looking to see whether my routine can be sped up significantly, or even slightly, because I have only tested Where Time Stood Still on an emulated A500 and have no clue whether it will behave exactly the same on a physical machine.

If it is the same, then it runs at an acceptable speed, but any improvements would be welcome.
Old 12 January 2014, 17:55   #2
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
add.w and subq.w are faster on the 68000 than add.l and subq.l. And faster still than subq.w plus bne.b is a simple dbf.
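A minimal sketch of how that suggestion might look when applied to the routine from the first post. It assumes the same videobase and Amiga_screen labels, and swaps the move/add pairs for lea with a 16-bit displacement. Going by the nominal 68000 timings (and ignoring chip RAM contention), the four moves cost 12 cycles each, subq.l plus a taken bne.s cost 8+10, and a taken dbf costs 10, so one pass should drop from about 66 to about 58 cycles.

```asm
Process_game_screen:
	movem.l	d0/a0-a4,-(a7)
	move.l	videobase(pc),a0	;ST screen (interleaved words)
	lea	Amiga_screen,a1		;plane 0
	lea	$1f40(a1),a2		;plane 1
	lea	$1f40(a2),a3		;plane 2
	lea	$1f40(a3),a4		;plane 3
	move.w	#$1f40/2-1,d0		;4000 passes, dbf runs d0+1 times
.loop	move.w	(a0)+,(a1)+
	move.w	(a0)+,(a2)+
	move.w	(a0)+,(a3)+
	move.w	(a0)+,(a4)+
	dbf	d0,.loop		;decrement and branch in one instruction
	movem.l	(a7)+,d0/a0-a4
	rts
```

dbf only uses the low word of d0, which is fine here because 3999 fits in 16 bits.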
Old 12 January 2014, 18:00   #3
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
Quote:
Originally Posted by Don_Adan View Post
add.w and subq.w are faster on the 68000 than add.l and subq.l. And faster still than subq.w plus bne.b is a simple dbf.
But I'm thinking that doing away with the sub, bne or dbf altogether and repeating the four moves inline instead is going to be quicker still, although that is 32K of instructions it has to run through to build the screen.

Just a shame the blitter can't be used to any degree here.
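A sketch of what the fully unrolled version could look like without typing the moves 16000 times. It assumes the assembler supports REPT/ENDR (Devpac, AsmOne and vasm do), and it uses the same videobase and Amiga_screen labels as the first routine. Each move.w (a0)+,(an)+ is one word of code, so the block comes to 16000 x 2 = 32000 bytes, and at a nominal 12 cycles per move that is roughly 192000 cycles against about 264000 for the subq.l/bne loop.

```asm
Process_game_screen:
	movem.l	a0-a4,-(a7)
	move.l	videobase(pc),a0	;videobase must stay within 32K of here
	lea	Amiga_screen,a1		;plane 0
	lea	$1f40(a1),a2		;plane 1
	lea	$1f40(a2),a3		;plane 2
	lea	$1f40(a3),a4		;plane 3
	REPT	4000			;one group of four words per repeat
	move.w	(a0)+,(a1)+
	move.w	(a0)+,(a2)+
	move.w	(a0)+,(a3)+
	move.w	(a0)+,(a4)+
	ENDR
	movem.l	(a7)+,a0-a4
	rts
```

A smaller REPT count inside a dbf loop would be the middle ground if the full 32K ever turns out to be too much.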

Last edited by Galahad/FLT; 12 January 2014 at 18:47.
Old 12 January 2014, 19:59   #4
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Quote:
Originally Posted by Galahad/FLT View Post
But I'm thinking that doing away with the sub, bne or dbf altogether and repeating the four moves inline instead is going to be quicker still, although that is 32K of instructions it has to run through to build the screen.

Just a shame the blitter can't be used to any degree here.
You could check the game Lethal Xcess. It is a dual-format game and uses ST graphics on the Amiga. Perhaps Mad Max used something interesting there.
Old 12 January 2014, 20:29   #5
Asman
68k
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
@Galahad/FLT

You can check this routine. I haven't tested it yet, but it should work. I will do some tests this evening.

Code:
    move.l  #amount,A5  ;Damn, I'm a mathematician, I will calculate this this evening :)

.loop
    movem.l (A0)+,D0-D7
    
    ;D0 - planes 0 and 1    ;D1 - planes 2 and 3
    ;D2 - planes 0 and 1    ;D3 - planes 2 and 3
    ;D4 - planes 0 and 1    ;D5 - planes 2 and 3
    ;D6 - planes 0 and 1    ;D7 - planes 2 and 3
    
    movem.w D0/D2/d4/D6,(A2)
    swap    D0
    swap    D2
    swap    D4
    swap    D6
    movem.w D0/D2/D4/D6,(A1)
    addq.l  #8,A1
    addq.l  #8,A2
    
    movem.w D1/D3/D5/D7,(A4)
    swap    D1
    swap    D3
    swap    D5
    swap    D7
    movem.w D1/D3/D5/D7,(A3)
    addq.l  #8,A3
    addq.l  #8,A4

    subq.w  #1,A5
    bne .loop
Old 12 January 2014, 20:42   #6
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
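This code can't work:
subq.w #1,A5
bne .loop
subq on an address register does not change the condition codes, so the bne never sees the result of the decrement.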
Old 12 January 2014, 21:11   #7
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
Quote:
Originally Posted by Asman View Post
@Galahad/FLT

You can check this routine. I haven't tested it yet, but it should work. I will do some tests this evening.

Code:
amount = $3e8 ; added: 32000 bytes / 32 bytes per pass = 1000 passes
     move.l  #amount,A5  ;Damn, I'm a mathematician, I will calculate this this evening :)
     suba.l a6,a6       ;Added: a6 = 0 for the compare at the end of the loop
.loop
    movem.l (A0)+,D0-D7
    
    ;D0 - planes 0 and 1    ;D1 - planes 2 and 3
    ;D2 - planes 0 and 1    ;D3 - planes 2 and 3
    ;D4 - planes 0 and 1    ;D5 - planes 2 and 3
    ;D6 - planes 0 and 1    ;D7 - planes 2 and 3
    
    movem.w D0/D2/d4/D6,(A2)
    swap    D0
    swap    D2
    swap    D4
    swap    D6
    movem.w D0/D2/D4/D6,(A1)
    addq.l  #8,A1
    addq.l  #8,A2
    
    movem.w D1/D3/D5/D7,(A4)
    swap    D1
    swap    D3
    swap    D5
    swap    D7
    movem.w D1/D3/D5/D7,(A3)
    addq.l  #8,A3
    addq.l  #8,A4

    subq.w  #1,A5
    cmp.l    a5,a6    ;added
    bne .loop
I've added some stuff to get it to work. I'm not entirely sure it's faster, but it does in fact work, so nice one.
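One possible variant of the loop above, as a sketch only: instead of counting passes in a5 and comparing against a zeroed a6, a5 can hold the end address of the ST screen and be compared with a0 directly. It assumes a0 to a4 are set up as in the first routine and that a5 is free. cmpa does set the condition codes, so the subq goes away (nominally 8 cycles per pass) and a6 is no longer needed.

```asm
	lea	32000(a0),a5		;a5 = first byte past the ST screen
.loop
	movem.l	(a0)+,d0-d7		;4 ST groups: d0/d2/d4/d6 = planes 0|1, d1/d3/d5/d7 = planes 2|3
	movem.w	d0/d2/d4/d6,(a2)	;low words  -> plane 1
	swap	d0
	swap	d2
	swap	d4
	swap	d6
	movem.w	d0/d2/d4/d6,(a1)	;high words -> plane 0
	addq.l	#8,a1
	addq.l	#8,a2
	movem.w	d1/d3/d5/d7,(a4)	;low words  -> plane 3
	swap	d1
	swap	d3
	swap	d5
	swap	d7
	movem.w	d1/d3/d5/d7,(a3)	;high words -> plane 2
	addq.l	#8,a3
	addq.l	#8,a4
	cmpa.l	a5,a0			;reached the end of the ST screen?
	bne.s	.loop
```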
Old 12 January 2014, 21:21   #8
Asman
68k
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
@Don_Adan

Right. Thanks.

@Galahad/FLT
I have another idea, using the blitter to copy one bitplane, but I will definitely check it first.
Old 12 January 2014, 21:36   #9
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
Good man, enthusiasm, I love it
Old 12 January 2014, 22:02   #10
mr.spiv
Registered User
 
Join Date: Aug 2006
Location: Finland
Age: 51
Posts: 241
Quote:
Originally Posted by Galahad/FLT View Post
But I'm thinking that doing away with the sub, bne or dbf altogether and repeating the four moves inline instead is going to be quicker still, although that is 32K of instructions it has to run through to build the screen.

Just a shame the blitter can't be used to any degree here.
Multiple blitter passes do the job. And when using blitter interrupts your code does not need to wait between passes, so you can use the CPU to do other stuff in the meantime.

An example off the top of my head, no warranties as I did not think about it too much: use the A and D channels. Set the A modulo to 6 and the D modulo to 0, point A at word 0 of the ST framebuffer and D at Amiga plane 0, and start a blit of width 1 word and height 1024 for plane 0. In the blitter interrupt just restart the blitter until plane 0 has been copied (a plane is 4000 words, so that is three blits of height 1024 plus one of 928). Then move on to plane 1, and so on. Just a thought.
Old 13 January 2014, 08:06   #11
kamelito
Zone Friend
 
Join Date: May 2006
Location: France
Posts: 1,801
This might contain some interesting ideas.
http://www.looksgoodworkswell.com/el...macpaint-code/

Kamelito
Old 13 January 2014, 11:36   #12
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
Quote:
Originally Posted by Don_Adan View Post
add.w and subq.w are faster on the 68000 than add.l and subq.l.
There is no difference between subq.w and subq.l when the destination is an address register (same for addq, of course).
Old 13 January 2014, 21:42   #13
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
Quote:
Originally Posted by kamelito View Post
This might contain some interesting ideas.
http://www.looksgoodworkswell.com/el...macpaint-code/

Kamelito
Yes, Asman's example grew out of that idea.

Unfortunately, because the Atari ST interleaves its four bitplanes word by word instead of storing them one after another, it's not possible to simply do a straight copy, which is what that MacPaint example does.

Quote:
Originally Posted by mr.spiv View Post
Multiple blitter passes do the job. And when using blitter interrupts your code does not need to wait between passes, so you can use the CPU to do other stuff in the meantime.

An example off the top of my head, no warranties as I did not think about it too much: use the A and D channels. Set the A modulo to 6 and the D modulo to 0, point A at word 0 of the ST framebuffer and D at Amiga plane 0, and start a blit of width 1 word and height 1024 for plane 0. In the blitter interrupt just restart the blitter until plane 0 has been copied (a plane is 4000 words, so that is three blits of height 1024 plus one of 928). Then move on to plane 1, and so on. Just a thought.
Care to elaborate with an example?
Old 13 January 2014, 22:59   #14
mr.spiv
Registered User
 
Join Date: Aug 2006
Location: Finland
Age: 51
Posts: 241
Check your PM.
Old 13 January 2014, 23:10   #15
Asman
68k
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
@Galahad/FLT

I did some tests and here it is. It uses mr.spiv's method (thanks a lot, mr.spiv) plus a CPU copy: each call blits one plane while the CPU copies the next one. It must be called twice, or you can use the copy-and-paste method. (For tests I use a Degas picture from Rolling Thunder, LOADER.PI1. I still hope that some day I will get angry enough to convert this game the way it should be.) So you will certainly have to adapt some things, and some things can be optimized.
Code:
;use WAITBLITTER somewhere at the beginning of the program, before these register writes
		move.w	#6,bltamod(a5)
		move.w	#0,bltdmod(a5)
		move.l	#$09f00000,bltcon0(a5)
		move.l	#$ffffffff,bltafwm(a5)

		lea	degas+34,a0
		move.l	screen(a6),a1 
		bsr	CopySt
		lea	degas+34+4,a0
		move.l	screen(a6),a1
		add.l	#$1f40*2,a1
		bsr	CopySt
Everything should be clear. If not, just ask.

Code:
CopySt:
		move.l	a0,bltapt(a5)
		move.l	a1,bltdpt(a5)
		move.w	#0*64+1,bltsize(a5)	;1024 height

		move.l	#$1f40,D0
		move.l	a1,a2
		add.l	d0,a2
		lea	2(a0),a3
		
		move.w	#$1f40/8-1,D1
.1		move.w	(a3),(a2)+
		addq.l	#8,a3
		dbf	D1,.1

		WAITBLITTER
		move.w	#0*64+1,bltsize(a5)	;1024 height

		move.w	#$1f40/8-1,D1
.2		move.w	(a3),(a2)+
		addq.l	#8,a3
		dbf	d1,.2

		WAITBLITTER
		move.w	#0*64+1,bltsize(a5)	;1024 height

		move.w	#$1f40/8-1,D1
.3		move.w	(a3),(a2)+
		addq.l	#8,a3
		dbf	d1,.3

		WAITBLITTER
		move.w	#928*64+1,bltsize(a5)

		move.w	#$1f40/8-1,D1
.4		move.w	(a3),(a2)+
		addq.l	#8,a3
		dbf	d1,.4
		rts
It's faster and I tested it on my A1200. I have another idea but I'm not sure if it works - so I will check it first.
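WAITBLITTER itself is not shown in the thread. A commonly used form might look like the sketch below. It assumes a5 holds the custom chip base $dff000 (as the bltsize(a5) accesses above suggest), that the assembler uses Devpac-style MACRO/ENDM with \@ for unique labels, and that dmaconr is not already defined by your hardware includes.

```asm
dmaconr	equ	$002			;assumed offset, drop this line if your includes define it

WAITBLITTER	MACRO
	tst.w	dmaconr(a5)		;dummy read before polling, for early Agnus chips
.wait\@	btst	#6,dmaconr(a5)		;bit 6 of the high byte = DMACONR bit 14 (blitter busy)
	bne.s	.wait\@			;loop while the blitter is still running
	ENDM
```

With this in place, a WAITBLITTER at the very top of CopySt would also make the second call safe if the last blit of the first call were ever still running.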
Old 13 January 2014, 23:25   #16
TCD
HOL/FTP busy bee
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,518
for this thread
Old 13 January 2014, 23:37   #17
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
@Asman, great work dude, it's definitely faster, but the last few lines are missing from the bottom of the screen, as if a couple of bitplanes haven't been written properly. Will check that I've actually copied your code properly!


EDIT: Right, a typo on my part.

I've got a feeling that the CPU routine you wrote before, with the movem.w instructions, was quicker, because I'm getting flickering when moving which I didn't have before, and I'm not so sure this one is quicker.

Will have to do more testing to see.

Last edited by Galahad/FLT; 14 January 2014 at 01:26.
Old 14 January 2014, 18:36   #18
Asman
68k
 
Join Date: Sep 2005
Location: Somewhere
Posts: 828
Hm... my next idea was to use the blitter with four channels to speed up the previous blitter copy (a longword per row instead of a word), but this attempt is slower than the previous one. It uses the operation D = A + BC, with A masked by $ffff0000 and C containing the mask $0000ffff.

Code:
	lea	maskC,a3
	lea	degas+34,a0
	lea	6(a0),a1
	
	move.l	screen(a6),a2
	
	WAITBLITTER
	move.w	#12,bltamod(a5)
	move.w	#12,bltbmod(a5)
	move.w	#-4,bltcmod(a5)
	move.w	#0,bltdmod(a5)
	move.l	#$0ff80000,bltcon0(a5)	;A+B+C+D enabled ($0d would leave C off), minterm $f8: D = A + BC
	move.l	#$ffff0000,bltafwm(a5)
	move.l	a0,bltapt(a5)
	move.l	a1,bltbpt(a5)
	move.l	a3,bltcpt(a5)
	move.l	a2,bltdpt(a5)
	move.w	#0*64+2,bltsize(a5) ;1024 longwords
	
	rts

	;must be located in CHIP
maskC:	dc.w	0,-1
So I think the best approach will be the previous one, perhaps with another CPU routine.
Old 14 January 2014, 20:07   #19
mr.spiv
Registered User
 
Join Date: Aug 2006
Location: Finland
Age: 51
Posts: 241
I would, as originally hinted, chain the blits using the blitter interrupt. Since we are only using two channels, I would blit two planes with the blitter and, once that has started, do the other two using the CPU. Then you do not need blitter waits between the CPU passes.

Old 14 January 2014, 21:18   #20
copse
Registered User
 
Join Date: Jul 2009
Location: Lala Land
Posts: 520
Has anyone been measuring the timings for these, and can they give them?

Should also note that this thread is some top shit.