English Amiga Board


Old 14 January 2014, 21:43   #21
Asman
68k
 
Join Date: Sep 2005
Location: Somewhere
Posts: 829
@mr.spiv

I've never used blitter interrupts. Can you post or link some example source code? Thanks.
About blitter interrupts: I don't think this will be faster, because every interrupt eats about 50 cycles on an A500 (or something like that). To copy two planes we need 8 blits, so that eats about 400 cycles. I would prefer to use those cycles to copy data. But of course I could be wrong.
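To put the 400-cycle estimate in perspective, here is a rough back-of-envelope sketch in Python. The 50 cycles per interrupt is the guess from the post above; the PAL CPU clock of about 7.09 MHz, the 50 Hz frame rate and the nominal 12 cycles for one move.w (a0)+,(a1)+ on a 68000 are assumptions added for this sketch, and chip RAM contention is ignored.

```python
CPU_HZ = 7_093_790            # assumed PAL A500 CPU clock
FRAME_HZ = 50                 # assumed PAL frame rate (approximate)
CYCLES_PER_INTERRUPT = 50     # the guess quoted in the post
BLITS = 8                     # two planes, four blits each
MOVE_W_CYCLES = 12            # nominal 68000 timing for move.w (a0)+,(a1)+


def interrupt_overhead(blits=BLITS, cycles_per_int=CYCLES_PER_INTERRUPT):
    """Return (cycles, percent of one frame, words the CPU could copy instead)."""
    cycles = blits * cycles_per_int
    frame_cycles = CPU_HZ / FRAME_HZ
    percent = 100.0 * cycles / frame_cycles
    words_instead = cycles // MOVE_W_CYCLES
    return cycles, percent, words_instead


cycles, percent, words_instead = interrupt_overhead()
print(cycles, "cycles,", round(percent, 2), "% of a frame,",
      words_instead, "words of CPU copy")
```

Under these assumptions the interrupt cost is small next to a whole frame, so the real question is whether the handler itself stays short.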
Another problem for me with interrupts is measuring: I don't know how to measure it. In all the examples above I just run a simple loop:

Code:
LOOP:
 WaitVB $30
 move.w #$5,$dff180
 bsr GalahadRoutine ;for example
 move.w #$0,$dff180
 ;check for ESC key or LMB and exit
 bra LOOP
Now I think a nice method would be to use the copper to do the blits, assuming the copper is not used for other things like changing colours every line.

And the best method, for me, is to remove the screen converter and recode all the drawing routines. I know it's sometimes a pain and takes some time, especially for Atari ST bobs, but it is possible.

@copse
Yes, I did some measurements. Now I will check copper-driven blitter + CPU copy. I can post the full source and executable if you want, but the source is very messy.
Old 14 January 2014, 22:07   #22
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 9,017
I'm not going to convert all the graphics and all the routines to make a slight speed difference for A500.

It runs at an acceptable speed; I'm just after any improvements that cost little time to do.

Considering most other Atari ST realtime conversions have stipulated a 68020 as a minimum, I'm doing something different by supporting the A500's 68000.

Oh, if it's possible to define a working copperlist blit routine, then we could try that. Where Time Stood Still does no fancy Timer C colour changing tricks, it's just plain bitmaps, so for sure we have total control over the copperlist.

Last edited by Galahad/FLT; 14 January 2014 at 22:15.
Old 14 January 2014, 22:10   #23
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 9,017
Quote:
Originally Posted by copse

Should also note that this thread is some top shit.
That's what I think. It would be good to get some of the best coding minds behind this to come up with the very fastest realtime Atari ST to Amiga conversion routine going, so that any future game conversions can benefit.

The first routine I did worked and was 'meh', the second routine with the 32K of move.w's was quicker but not elegant, and Asman's movem.w solution was better still. If we can really get the best out of blitter and CPU together, man, it's going to be pretty close to what it should have been had the Amiga got a native version back in the day.
Old 15 January 2014, 09:08   #24
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,658
There's no way the blitter can be slower than the CPU for a chipmem to chipmem copy, basically regardless of CPU. addq and subq take the same time regardless of instruction size. dbf takes 12, subq+bne.x takes 14. Only rarely does movem beat a chunk of move.w/l (a0)+,(a1)+.

So if both src and dst are in chipmem, do $09f00000 copy blits; if either is in fastmem, do n times move.w (a0)+,(an)+ and divide d0 by n*2 (and subtract 1) to get the dbf value.

Code:
n=32
Process_game_screen:
	movem.l	d0/a0-a4,-(a7)
	move.l	videobase(pc),a0	;Base address of Atari ST screen
	lea	Amiga_screen,a1		;Base address of Amiga Screen (plane 1)
	move.l	#$1f40,d0		;Size of bitplane
	move.l	a1,a2
	add.l	d0,a2			;plane 2
	move.l	a2,a3
	add.l	d0,a3			;plane 3
	move.l	a3,a4
	add.l	d0,a4			;plane 4
	move.l	#$1f40/n/2-1,d0		;dbf count: 125 passes for n=32
loop_until_copied:
	REPT	n
	move.w	(a0)+,(a1)+
	move.w	(a0)+,(a2)+
	move.w	(a0)+,(a3)+
	move.w	(a0)+,(a4)+
	ENDR
	dbf	d0,loop_until_copied
	movem.l	(a7)+,d0/a0-a4
	rts
You can get the #repetitions down if you do the last 32 words separately, but a loop count of 125 is already near-ideal.
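For anyone who wants a reference to check a conversion routine against, here is a minimal Python model of what the loop above does: it splits the word-interleaved ST low-res screen into four contiguous 8000-byte planes. The function names are made up for this sketch, and it assumes a plain 32000-byte screen with no header or palette in front.

```python
PLANE_BYTES = 0x1F40                 # 8000 bytes per 320x200 bitplane
PLANES = 4
WORDS_PER_PLANE = PLANE_BYTES // 2   # 4000 words


def st_to_amiga_planes(st_screen):
    """Split a word-interleaved ST screen into four contiguous planes."""
    if len(st_screen) != PLANE_BYTES * PLANES:
        raise ValueError("expected a 32000-byte ST low-res screen")
    planes = [bytearray() for _ in range(PLANES)]
    for group in range(WORDS_PER_PLANE):
        base = group * PLANES * 2    # 8 bytes per group of four plane words
        for p in range(PLANES):
            planes[p] += st_screen[base + 2 * p: base + 2 * p + 2]
    return [bytes(p) for p in planes]


def dbf_start_value(n):
    """Loop counter for an n-times unrolled copy, as in $1f40/n/2-1."""
    return PLANE_BYTES // n // 2 - 1


print(dbf_start_value(32))   # 124, i.e. 125 passes through the loop
```

Dumping the Amiga planes from an emulator and comparing them with this output is one way to confirm that a faster routine still produces the same picture.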

As I see it, on a 68000 only a 4 x copyblit (with a source stride of 8 bytes and a destination stride of 2, i.e. modulos of 6 and 0 for a one-word-wide blit) would beat this. Do heavy blits after the last line of display. From memory, you can do up to 25% (minus CPU inefficiency) of the blit with the CPU while blitting with two channels on. In this case it will likely be close to 10%.

And using bg-color is good for measuring the time the CPU is wasting waiting for the blitter. Just set one bg-color after starting the blit and another colour after your blitwait and you'll see.

Last edited by Photon; 15 January 2014 at 09:35.
Old 15 January 2014, 19:12   #25
Asman
68k
 
Join Date: Sep 2005
Location: Somewhere
Posts: 829
@Galahad/FLT

Now it's time for a copper-driven blitter example.

First, the routine which sets the bltapt and bltdpt pointers:

Code:
InitCopper:

;do not forget to set copdang!!! (move.w #2,copcon(a5))

	;set first bitplane
	lea	chipCpB0,a0
	move.l	#degas+34,d0
	move.w	d0,4(a0)
	swap	d0
	move.w	d0,(a0)

	move.l	screen(a6),d0
	move.w	d0,12(a0)
	swap	d0
	move.w	d0,8(a0)

	;set third bitplane
	lea	chipCpB1,a0
	move.l	#degas+34+4,d0
	move.w	d0,4(a0)
	swap	d0
	move.w	d0,(a0)

	move.l	screen(a6),d0
	add.l	#$1f40*2,d0
	move.w	d0,12(a0)
	swap	d0
	move.w	d0,8(a0)

	rts
And here is the copperlist part:

Code:
		;dc.w	$0180,$0a00
		
		dc.w	$0001,$0000	;blitter wait
		dc.w	$0001,$0000	;twice
		dc.w	bltamod,6
		dc.w	bltdmod,0
		dc.w	bltcon0,$09f0	;D = A
		dc.w	bltcon1,$0000
		dc.w	bltafwm,$ffff
		dc.w	bltalwm,$ffff
	
	;first
		dc.w	bltapt
chipCpB0:	dc.w	0
		dc.w	bltapt+2,0
		dc.w	bltdpt,0,bltdpt+2,0
		dc.w	bltsize,1	;1024 height
		dc.w	$0001,$0000	;blitter wait
		dc.w	$0001,$0000	;twice
		dc.w	bltsize,1
		dc.w	$0001,$0000	;blitter wait
		dc.w	$0001,$0000	;twice
		dc.w	bltsize,1
		dc.w	$0001,$0000	;blitter wait
		dc.w	$0001,$0000	;twice
		dc.w	bltsize,928*64+1
	;third
		dc.w	bltapt
chipCpB1:	dc.w	0
		dc.w	bltapt+2,0
		dc.w	bltdpt,0,bltdpt+2,0
		dc.w	bltsize,1	;1024 height
		dc.w	$0001,$0000	;blitter wait
		dc.w	$0001,$0000	;twice
		dc.w	bltsize,1
		dc.w	$0001,$0000	;blitter wait
		dc.w	$0001,$0000	;twice
		dc.w	bltsize,1
		dc.w	$0001,$0000	;blitter wait
		dc.w	$0001,$0000	;twice
		dc.w	bltsize,928*64+1

		;dc.w	$0001,$0000	;blitter wait
		;dc.w	$0001,$0000	;twice
		;dc.w	$0180,$0000
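The bltsize values in the copperlist above can be checked with a small Python sketch. It assumes the OCS BLTSIZE layout (height in the top 10 bits, width in words in the low 6 bits, with 0 standing for 1024 rows or 64 words); the helper names are made up for this example.

```python
def encode_bltsize(height, width_words):
    """Pack an OCS BLTSIZE word; 1024 rows / 64 words are stored as 0."""
    if not 1 <= height <= 1024:
        raise ValueError("OCS height must be 1..1024")
    if not 1 <= width_words <= 64:
        raise ValueError("OCS width must be 1..64 words")
    return ((height & 0x3FF) << 6) | (width_words & 0x3F)


def decode_bltsize(value):
    """Return (height, width_words) for an OCS BLTSIZE word."""
    height = (value >> 6) & 0x3FF or 1024
    width_words = value & 0x3F or 64
    return height, width_words


# The four writes used per plane in the copperlist above.
COPPERLIST_SIZES = [1, 1, 1, 928 * 64 + 1]
total_rows = sum(decode_bltsize(v)[0] for v in COPPERLIST_SIZES)
print(total_rows)   # 4000 one-word rows = one 8000-byte plane
```

So three blits of 1024 rows plus one of 928 rows add up to the 4000 words of a plane, with the blitter pointers simply carrying on from where the previous blit stopped.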
On my A1200/030 Blizzard IV it takes about 35% of a frame. So that means it is possible to use only the blitter, and it should be fast.

Edit: This example copies the first and third bitplanes from the Atari ST screen.

EDIT: when I set copdang, sometimes I can't get back to the OS. That means there is something wrong with my DisableOS/EnableOS routines; perhaps I should clear copdang when I go back to the OS. Any ideas?

EDIT2: clearing copdang helped. I also added two blitter waits to my copper list. Thanks a lot to mr.spiv.
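As a rough sanity check on the 35% figure, here is an idealized estimate in Python. It assumes an A to D copy needs 2 DMA slots (colour clocks) per word, a PAL frame of 313 lines of 227 colour clocks, and no contention from bitplane DMA, the copper or the CPU. All of these constants are assumptions of this sketch, so the result is a lower bound rather than a prediction.

```python
WORDS_PER_PLANE = 4000      # $1f40 bytes / 2
CCK_PER_WORD = 2            # assumed: one read slot + one write slot per word
CCK_PER_LINE = 227          # assumed PAL line length in colour clocks
LINES_PER_FRAME = 313       # assumed PAL non-interlaced frame


def ideal_blit_share(planes):
    """Fraction of one frame an uncontended copy blit of `planes` planes takes."""
    blit_cck = planes * WORDS_PER_PLANE * CCK_PER_WORD
    return blit_cck / (CCK_PER_LINE * LINES_PER_FRAME)


for planes in (2, 4):
    print(planes, "planes:", round(100 * ideal_blit_share(planes), 1), "% of a frame")
```

The measured value for two planes is higher than this ideal figure, which would be consistent with display DMA taking slots away from the blitter, though that is a guess and not something measured in this thread.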

Last edited by Asman; 15 January 2014 at 21:08.
Old 15 January 2014, 19:54   #26
mr.spiv
Registered User
 
Join Date: Aug 2006
Location: Finland
Age: 52
Posts: 244
Just a note/hint.. from experience use two blitter waits in your copper list. Back in day my A500 required two waits or boom.

Quote:
Originally Posted by Asman
@Galahad/FLT
Code:
        ;dc.w    $0180,$0a000
        
        dc.w    $0001,$0000    ;blitter wait
        dc.w    bltamod,6
        dc.w    bltdmod,0
        dc.w    bltcon0,$09f0    ;D = A
        dc.w    bltcon1,$0000
        dc.w    bltafwm,$ffff
        dc.w    bltalwm,$ffff
    
    ;first
        dc.w    bltapt
Its take on my 1200/030 blizz IV. about 35% of frame. So its
Old 15 January 2014, 21:09   #27
Asman
68k
 
Join Date: Sep 2005
Location: Somewhere
Posts: 829
Quote:
Originally Posted by mr.spiv
Just a note/hint.. from experience use two blitter waits in your copper list. Back in day my A500 required two waits or boom.
Thanks a lot. I just edited my previous post.
Old 16 January 2014, 00:58   #28
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,658
Huh, my current copper blitting engine doesn't have double blitwaits, and it works fine on all Amigas. I use dc.w $0007,$7ffe. And of course you certainly don't need double blitwaits because of the CPU, it's on vacation... dreaming... :P
Old 16 January 2014, 01:35   #29
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 9,017
Quote:
Originally Posted by Photon
Huh, my current copper blitting engine doesn't have double blitwaits, and it works fine on all Amigas. I use dc.w $0007,$7ffe. And of course you certainly don't need double blitwaits because of the CPU, it's on vacation... dreaming... :P
Don't some Kickstart 1.2 Amigas have a bug that if you don't wait for the blitter twice, it can sometimes ignore one of them?
Old 16 January 2014, 08:17   #30
musashi5150
move.w #$4489,$dff07e
 
Join Date: Sep 2005
Location: Norfolk, UK
Age: 43
Posts: 2,351
Quote:
Originally Posted by Galahad/FLT
Don't some Kickstart 1.2 Amigas have a bug that if you don't wait for the blitter twice, it can sometimes ignore one of them?
IIRC it was mentioned in the Hardware RKM for early A1000 models. It's probably not required for most (all?) machines. YMMV
Old 16 January 2014, 08:23   #31
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,573
Quote:
Originally Posted by mr.spiv
Just a note/hint.. from experience use two blitter waits in your copper list. Back in day my A500 required two waits or boom.
Most blits will work fine with a single wait, so it must be some special case, as usual.

Do you remember if the wait ended slightly too early or if it never waited at all? Ending early (before the last write is done) is possible if the blit is a fill blit that adds one extra cycle (for example A->D with fill) and there are enough bitplanes active.

Do you have any example code (A500 compatible)? I'd like to check what really happens using a logic analyzer. (Does it end early, does it decide not to wait, or perhaps something else?)

Quote:
Originally Posted by Galahad/FLT
Don't some Kickstart 1.2 Amigas have a bug that if you don't wait for the blitter twice, it can sometimes ignore one of them?
This bug was only in the A1000/early A2000 Agnus chip.
Old 16 January 2014, 08:46   #32
mr.spiv
Registered User
 
Join Date: Aug 2006
Location: Finland
Age: 52
Posts: 244
Quote:
Originally Posted by Photon
Huh, my current copper blitting engine doesn't have double blitwaits, and it works fine on all Amigas. I use dc.w $0007,$7ffe. And of course you certainly don't need double blitwaits because of the CPU, it's on vacation... dreaming... :P
I don't want to get into an argument on this, but this intro http://janeway.exotica.org.uk/release.php?id=15513 that I put together had to use double blitter waits in the copperlist (the sinus scroller is done using the copperlist). Otherwise it never worked on a set of Amigas, including mine. That's why I said "from experience". The wait instruction in there is $0101,$0000, i.e. ignore any VPos & HPos comparisons. If there is an explanation for the behaviour, like a buggy copperlist, it would be great to know.

EDIT: ahh.. A2000 had this. I think I might have had a very old A2000 during that time.
EDIT2: cannot really remember which hardware I had.. could have been A500 or A2000A (German whatever model that had all kinds of weird issues with e.g. accelerator cards but the best keyboard ever, heh)
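Three different blitter-wait encodings have come up in this thread ($0001,$0000, $0101,$0000 and $0007,$7ffe). The Python sketch below decodes a copper WAIT into its fields so they can be compared. It assumes the standard copper WAIT layout, where bit 15 of the second word is the blitter-finished-disable (BFD) flag and the remaining bits are the compare-enable masks; the function and field names are made up for this example.

```python
def decode_copper_wait(ir1, ir2):
    """Decode the two words of a copper WAIT instruction into its fields."""
    if not (ir1 & 1) or (ir2 & 1):
        raise ValueError("not a WAIT (bit 0 must be 1 in word 1, 0 in word 2)")
    ve = (ir2 >> 8) & 0x7F          # vertical compare-enable mask
    he = ir2 & 0xFE                 # horizontal compare-enable mask
    return {
        "vp": (ir1 >> 8) & 0xFF,    # vertical position to wait for
        "hp": ir1 & 0xFE,           # horizontal position to wait for
        "ve": ve,
        "he": he,
        "waits_for_blitter": not (ir2 & 0x8000),   # BFD clear
        "position_masked_out": ve == 0 and he == 0,
    }


for words in ((0x0001, 0x0000), (0x0101, 0x0000), (0x0007, 0x7FFE)):
    print([hex(w) for w in words], decode_copper_wait(*words))
```

All three have BFD clear, so each one waits for the blitter. The first two mask out the beam position entirely, while $0007,$7ffe also compares against a position at the very top of the frame.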

Last edited by mr.spiv; 16 January 2014 at 08:54.
Old 16 January 2014, 12:02   #33
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 9,017
There are some great ideas here, chaps, so thanks to Asman and mr.spiv. I will implement them today and see what works best.

I might even just have a graphics option at the start of the game and let the user select whatever they think will work best for their system. It's not tricky to implement at all, so we'll see what happens.
Old 16 January 2014, 12:59   #34
mr.spiv
Registered User
 
Join Date: Aug 2006
Location: Finland
Age: 52
Posts: 244
Quote:
Originally Posted by Asman
@Galahad/FLT
On my A1200/030 Blizzard IV it takes about 35% of a frame. So that means it is possible to use only the blitter, and it should be fast.
Just one more thing. For Amigas that have an ECS Agnus you could do the conversion of one ST plane in a single blit.

Instead of four
Code:
move.w #1,$dff058
do
Code:
move.l #$0fa00001,$dff05c
and then, the second time you start the blitter for another plane, just
Code:
move.w #$0001,$dff05e
since the blitter remembers the old height of 4000 lines.
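To compare the two approaches, here is a small Python sketch that works out the register values for a blit of a given number of one-word rows. It assumes the OCS BLTSIZE limit of 1024 rows per blit and the ECS BLTSIZV/BLTSIZH pair at $dff05c/$dff05e with up to 32768 rows and 2048 words; the function names are hypothetical.

```python
OCS_MAX_ROWS = 1024


def ocs_bltsize_writes(rows, width_words=1):
    """BLTSIZE values needed to cover `rows` rows on OCS, at most 1024 per blit."""
    writes = []
    while rows > 0:
        chunk = min(rows, OCS_MAX_ROWS)
        writes.append(((chunk & 0x3FF) << 6) | (width_words & 0x3F))
        rows -= chunk
    return writes


def ecs_bltsiz_pair(rows, width_words=1):
    """(BLTSIZV, BLTSIZH) for a single ECS blit; writing BLTSIZH starts the blit."""
    if not 1 <= rows <= 32768 or not 1 <= width_words <= 2048:
        raise ValueError("outside the ECS blit size range")
    return rows & 0x7FFF, width_words & 0x7FF


print([hex(v) for v in ocs_bltsize_writes(4000)])
print([hex(v) for v in ecs_bltsiz_pair(4000)])
```

For a 4000-row plane this gives four BLTSIZE writes on OCS and the single pair $0fa0, $0001 on ECS, which is the longword written to $dff05c.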
Old 16 January 2014, 17:49   #35
xxxxx
Registered User
 
Join Date: Jan 2012
Location: N/A
Posts: 38
A couple of thoughts on this fun thread:
1. The movem solution is clever, but you can optimize out 3 instructions: if you start from the end of the image instead of the top and change movem.l (a0)+,d0-d7 to movem.l (a0),d0-d7 followed by sub.w a6,a0 (where a6 contains 32), you can change the other movems to movem.w regs,-(ax) and get rid of the add instructions. (This would require you to double buffer your output to avoid tearing, unless you do something clever.)
2. About loop unrolling: nobody is saying that it has to be all or nothing. You can unroll 4 or 8 times instead of 4000 times. It takes less memory but you still get most of the savings. Plus, if you target something with a CPU cache, then a limited unroll should be faster than a full unroll (if you target the 68020, it has a 256 byte instruction cache if I remember correctly).
3. You can also do a mix of CPU and blitter: blit a percentage while the CPU is doing the rest. This is easier to manage if you use blitter interrupts.
4. If your code does other things, the blitter can do the conversion while your CPU runs all the rest of the logic. That way the most important thing isn't which standalone routine is fastest, but which is fastest considering what else needs to run.
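Point 2 can be put into numbers with a short Python sketch. It assumes the four-plane move.w (a0)+,(an)+ loop posted earlier, the nominal 12 cycles per move.w on a 68000, the 12 cycles per dbf quoted in this thread, 2 bytes per move.w and 4 bytes per dbf. The cache check is only a size comparison against 256 bytes and ignores alignment, so treat it as a hint.

```python
MOVE_W_CYCLES = 12    # nominal 68000 timing for move.w (An)+,(An)+
DBF_CYCLES = 12       # figure quoted in this thread for a taken dbf
MOVE_W_BYTES = 2
DBF_BYTES = 4
PLANES = 4
ICACHE_68020 = 256    # bytes


def unroll_stats(n):
    """Loop size and dbf overhead for an n-times unrolled four-plane copy."""
    body_cycles = PLANES * n * MOVE_W_CYCLES
    loop_bytes = PLANES * n * MOVE_W_BYTES + DBF_BYTES
    return {
        "loop_bytes": loop_bytes,
        "dbf_overhead": DBF_CYCLES / (body_cycles + DBF_CYCLES),
        "fits_68020_icache": loop_bytes <= ICACHE_68020,
    }


for n in (1, 4, 8, 31, 32):
    s = unroll_stats(n)
    print(n, s["loop_bytes"], "bytes,",
          round(100 * s["dbf_overhead"], 2), "% dbf overhead,",
          "fits" if s["fits_68020_icache"] else "does not fit")
```

With these assumptions most of the gain is already there at an unroll of 8, and an unroll of 32 comes out a few bytes larger than a 256-byte cache.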
Old 16 January 2014, 18:33   #36
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 9,017
Well, I'm going to cheat somewhat, and the reason for that is that on 020/030/040 this game absolutely flies along!

So it's going to be necessary to include an in-game configuration options screen (easily done, as I have control over the screen anyway) which will let the end user pick whichever process suits them.

I will also have to include an option to set a delay in the game to throttle it, as on faster processors it's stupidly fast.

So I think I'll incorporate a few of these ideas as selectable options, which will hopefully make the end experience good for all.

I was only considering 68000 users until I saw the damned thing run at full speed on a 68020, and then I realised I need to broaden my approach to it all!
Old 17 January 2014, 06:59   #37
tomcat666
Retro Freak
 
Join Date: Nov 2001
Location: Slovenia
Age: 51
Posts: 1,665
Quote:
Originally Posted by Galahad/FLT
I will also have to include an option to set a delay in the game to throttle it, as on faster processors it's stupidly fast.
Wouldn't waiting for the vertical blank tie the game to 50Hz on all machines, instead of using a delay loop? And just add an option to disable the VBL wait if running on a slow machine...