English Amiga Board


31 May 2018, 23:58   #1
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Fast tile flipping on CD32

Hi all,

Does anyone know if there are any system calls I can make on a CD32 platform that will take a graphical tile and flip it on the X, Y or both axes?

I have the CD32 developer documentation but for some reason I can't find anything specific to this Akiko chip and how to access it.

I want to take a 32x32 pixel tile and flip it.

Any help as always is really appreciated.

Geezer
01 June 2018, 08:06   #2
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by mcgeezer
Does anyone know if there are any system calls I can make on a CD32 platform that will take a graphical tile and flip it on the X, Y or both axes?
Hi mcgeezer,
I don't know whether such a call exists, and in any case I imagine it would be slow.

Quote:
I want to take a 32x32 pixel tile and flip it.
The only fast approach that comes to mind for a flip around the y axis is a lookup table (an 8-bit flip through a 256-byte table followed by shuffling the byte positions, or a 16-bit flip through a 64K-word table followed by a swap). Doing it with the blitter is possible but much more convoluted and slow.
Around the x axis it is very simple with either the CPU or the blitter (modulo is your friend).
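To make the difference between the two flips concrete, here is a small C model (not Amiga code). The tile layout, `Tile`, `TILE_PLANES` and the function names are hypothetical. `rev32_naive` is a deliberately slow bit loop standing in for whichever fast reversal you pick.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a 32x32 tile stored as TILE_PLANES bitplanes of
   32 rows, one 32-bit word per row, bit 31 = leftmost pixel. */
#define TILE_H      32
#define TILE_PLANES 4   /* assumption: 16-colour tile */

typedef struct {
    uint32_t row[TILE_PLANES][TILE_H];
} Tile;

/* Reverse the 32 bits of a word one bit at a time (slow, but obviously correct). */
uint32_t rev32_naive(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Left/right mirror ("flip around the y axis"): the bits of every row
   word of every plane are reversed. This is the expensive case on bitplanes. */
void tile_mirror_horizontal(Tile *dst, const Tile *src)
{
    for (int p = 0; p < TILE_PLANES; p++)
        for (int y = 0; y < TILE_H; y++)
            dst->row[p][y] = rev32_naive(src->row[p][y]);
}

/* Upside-down flip ("flip around the x axis"): only the row order changes
   and the words are copied untouched. dst must not alias src. */
void tile_flip_vertical(Tile *dst, const Tile *src)
{
    assert(dst != src);
    for (int p = 0; p < TILE_PLANES; p++)
        for (int y = 0; y < TILE_H; y++)
            dst->row[p][y] = src->row[p][TILE_H - 1 - y];
}
```

The vertical flip only walks the source rows backwards, which is presumably what "modulo is your friend" refers to: a blitter can do the same by starting at the last row and using a negative modulo. The mirror needs a per-word bit reversal, which is where the LUT and shift tricks come in.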

The alternative is to work completely in chunky, forgetting bitplanes and working only with bytes, and then do the conversion with Akiko (which I have no idea how to program) or with one of the many chunky-to-planar routines available.
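In chunky mode a left/right mirror needs no bit twiddling at all. It is just a byte reversal of each row, and the cost moves into the chunky-to-planar step, which is not modelled here. A minimal C sketch, with a hypothetical function name and a caller-supplied stride:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror a chunky image (one byte per pixel) left/right in place.
   width and height are in pixels; stride is the number of bytes
   from the start of one row to the start of the next. */
void chunky_mirror_horizontal(uint8_t *pix, size_t width, size_t height, size_t stride)
{
    assert(width <= stride);
    if (width < 2)
        return;                        /* nothing to swap */
    for (size_t y = 0; y < height; y++) {
        uint8_t *row = pix + y * stride;
        /* swap the outermost pixels and work inwards */
        for (size_t l = 0, r = width - 1; l < r; l++, r--) {
            uint8_t t = row[l];
            row[l] = row[r];
            row[r] = t;
        }
    }
}
```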

Obviously the fastest option is to use double the memory (or triple if you also want the composite flips) and keep separate copies of the same tile.
01 June 2018, 09:42   #3
hooverphonique
ex. demoscener "Bigmama"
 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
Quote:
Originally Posted by mcgeezer
I have the CD32 developer documentation but for some reason I can't find anything specific to this Akiko chip and how to access it.
As far as data transformation goes, Akiko only does chunky-to-planar conversion.
01 June 2018, 16:31   #4
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by ross
The only fast approach that comes to mind for a flip around the y axis is a lookup table (an 8-bit flip through a 256-byte table followed by shuffling the byte positions, or a 16-bit flip through a 64K-word table followed by a swap). Doing it with the blitter is possible but much more convoluted and slow.
Perhaps this:

This code is wrong; see my post below for the corrected version.

Code:
   move.l   #$55555555,d1
   eor.l    d0,d1
   eor.l    d1,d0
   add.l    d1,d1
   lsr.l    #1,d0
   or.l     d1,d0
   
   move.l   #$33333333,d1
   eor.l    d0,d1
   eor.l    d1,d0
   lsl.l    #2,d1
   lsr.l    #2,d0
   or.l     d1,d0

   move.l   #$0f0f0f0f,d1
   eor.l    d0,d1
   eor.l    d1,d0
   lsl.l    #4,d1
   lsr.l    #4,d0
   or.l     d1,d0

   rol.w    #8,d0
   swap     d0
   rol.w    #8,d0

Last edited by Thorham; 03 June 2018 at 00:10.
01 June 2018, 16:53   #5
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Thanks for the suggestions, guys. I guess I'm looking at doing it with the CPU.

There are a couple of reasons I can't use more memory: one is capacity, the other is the complications it brings.

On the plus side, I only need to do the flip when required, depending on what is in the Side Arms tile map. The other plus is that the scrolling only runs at 25 FPS, so I should have plenty of time.

I'll write the scroll routine over the next week or so. The challenge is getting all of the palettes to mesh together during scrolling without having to alter the arcade ROM tile map. Ugh.
Attached thumbnails: sa_splitset1.png, sa_splitset2.png, sa_splitset3.png
01 June 2018, 18:07   #6
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
If there isn't enough memory to keep mirrored copies of every tile, you could use some kind of graphics cache holding the last few flipped tiles that were used.

If you want to do it purely dynamically, then the 256-byte table seems the best compromise.
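The cache idea can be sketched as a handful of slots keyed by (tile, flip mode) with least-recently-used eviction. Everything here is hypothetical: the slot count, tile size, flip encoding and the `FlipFn` callback are assumptions, and `demo_flip` is only a stand-in so the cache can be exercised.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 8                  /* assumption: tune to the spare memory */
#define TILE_BYTES  (32 * 32 * 4 / 8)  /* 32x32 tile, 4 bitplanes = 512 bytes (assumption) */

typedef struct {
    int      tile_id;                  /* -1 = slot never used */
    int      flip;                     /* hypothetical encoding: 1 = X, 2 = Y, 3 = both */
    uint32_t last_used;                /* tick of the last access, for LRU eviction */
    uint8_t  data[TILE_BYTES];
} CacheSlot;

typedef struct {
    CacheSlot slot[CACHE_SLOTS];
    uint32_t  tick;
    uint32_t  misses;                  /* how many times a flip actually ran */
} FlipCache;

/* Caller-supplied routine that writes the flipped tile into dst. */
typedef void (*FlipFn)(uint8_t *dst, int tile_id, int flip);

void flip_cache_init(FlipCache *c)
{
    memset(c, 0, sizeof *c);
    for (int i = 0; i < CACHE_SLOTS; i++)
        c->slot[i].tile_id = -1;
}

/* Return the flipped tile, running do_flip only on a miss. Unused slots keep
   last_used == 0, so they are filled before any used slot is evicted. */
const uint8_t *flip_cache_get(FlipCache *c, int tile_id, int flip, FlipFn do_flip)
{
    int victim = 0;
    assert(do_flip != NULL);
    c->tick++;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        CacheSlot *s = &c->slot[i];
        if (s->tile_id == tile_id && s->flip == flip) {
            s->last_used = c->tick;    /* hit: no flipping needed */
            return s->data;
        }
        if (s->last_used < c->slot[victim].last_used)
            victim = i;                /* least recently used so far */
    }
    CacheSlot *v = &c->slot[victim];
    do_flip(v->data, tile_id, flip);   /* miss: flip once into the evicted slot */
    v->tile_id = tile_id;
    v->flip = flip;
    v->last_used = c->tick;
    c->misses++;
    return v->data;
}

/* Stand-in for a real flip routine, only so the cache can be tested. */
void demo_flip(uint8_t *dst, int tile_id, int flip)
{
    memset(dst, (tile_id * 4 + flip) & 0xFF, TILE_BYTES);
}
```

How well this pays off depends on how often the tile map reuses the same flipped tiles within a short window. Note that the tick counter is not protected against wrap-around in this sketch.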
01 June 2018, 21:08   #7
saimon69
J.M.D - Bedroom Musician
 
Join Date: Apr 2014
Location: los angeles,ca
Posts: 3,519
Good ol' Side Arms! So underrated, but also with some playability problems. I would like to see it ported decently and improved over its original incarnation...
01 June 2018, 21:41   #8
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by saimon69
Good ol' Side Arms! So underrated, but also with some playability problems. I would like to see it ported decently and improved over its original incarnation...
I'm just running a feasibility study at the moment, and it is likely to fail. There are simply too many colours in the tile sets and sprites to do the game justice.

However, I will get a nice 8-way scrolling routine out of it, supporting 16- or 32-pixel tile sets, that I could use on other projects.
01 June 2018, 23:14   #9
saimon69
J.M.D - Bedroom Musician
 
Join Date: Apr 2014
Location: los angeles,ca
Posts: 3,519
Powder used a scrolling technique similar to Side Arms, plus some tricks to run a lot of stuff with 16 colors. I have the source code if you want to take a look.
01 June 2018, 23:20   #10
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by saimon69
Powder used a scrolling technique similar to Side Arms, plus some tricks to run a lot of stuff with 16 colors. I have the source code if you want to take a look.
Thanks for the offer, but I need to write my own code, as I know what I need to implement.
02 June 2018, 21:58   #11
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by meynaf
If you want to do it purely dynamically, then the 256-byte table seems the best compromise.
I somehow doubt that that's going to be faster than doing it in code. The 4x indexed addressing mode alone seems slower than the code I posted.
02 June 2018, 23:07   #12
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Thorham
I somehow doubt that that's going to be faster than doing it in code. The 4x indexed addressing mode alone seems slower than the code I posted.
Hi Thorham, your routine is not working.

This is a working version:
(I haven't thought much about whether it can be optimized)
Code:
    move.l  d0,d1
    move.l  #$55555555,d2
    lsr.l   #1,d0
    add.l   d1,d1
    and.l   d2,d0
    add.l   d2,d2
    and.l   d2,d1
    or.l    d1,d0
    
    move.l  d0,d1
    move.l  #$33333333,d2
    lsr.l   #2,d0
    lsl.l   #2,d1
    and.l   d2,d0
    lsl.l   #2,d2
    and.l   d2,d1
    or.l    d1,d0

    move.l  d0,d1
    move.l  #$0f0f0f0f,d2
    lsr.l   #4,d0
    lsl.l   #4,d1
    and.l   d2,d0
    lsl.l   #4,d2
    and.l   d2,d1
    or.l    d1,d0
    
    rol.w   #8,d0
    swap    d0
    rol.w   #8,d0
I have serious doubts that it will be faster than a LUT version, especially on a CD32 (an 020 with chip RAM only).
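To check what this routine computes, here is a line-by-line C transcription. It is a sketch, and the function names are illustrative. In the asm the high mask is derived from the low one (`add.l d2,d2` or `lsl.l #n,d2`); in C both masks are written out as constants.

```c
#include <assert.h>
#include <stdint.h>

/* Each stage moves every group down (lsr + and with the low mask) and
   up (lsl + and with the high mask), then ors the halves together. */
uint32_t rev32_masks(uint32_t d0)
{
    d0 = ((d0 >> 1) & 0x55555555u) | ((d0 << 1) & 0xAAAAAAAAu); /* odd/even bits */
    d0 = ((d0 >> 2) & 0x33333333u) | ((d0 << 2) & 0xCCCCCCCCu); /* bit pairs     */
    d0 = ((d0 >> 4) & 0x0F0F0F0Fu) | ((d0 << 4) & 0xF0F0F0F0u); /* nibbles       */
    /* rol.w #8 / swap / rol.w #8: reverse the order of the four bytes */
    d0 = (d0 >> 24) | ((d0 >> 8) & 0x0000FF00u)
       | ((d0 << 8) & 0x00FF0000u) | (d0 << 24);
    return d0;
}

/* A few known answers; reversing twice must give the input back. */
void check_rev32_masks(void)
{
    assert(rev32_masks(0x00000001u) == 0x80000000u);
    assert(rev32_masks(0x12345678u) == 0x1E6A2C48u);
    assert(rev32_masks(rev32_masks(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```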

03 June 2018, 00:09   #13
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by ross
Hi Thorham, your routine is not working.
Thanks for pointing that out. Some of the eors have to be ands. Remind me to test code before posting it.
Code:
   move.l   #$55555555,d1
   and.l    d0,d1
   eor.l    d1,d0
   add.l    d1,d1
   lsr.l    #1,d0
   or.l     d1,d0

   move.l   #$33333333,d1
   and.l    d0,d1
   eor.l    d1,d0
   lsl.l    #2,d1
   lsr.l    #2,d0
   or.l     d1,d0

   move.l   #$0f0f0f0f,d1
   and.l    d0,d1
   eor.l    d1,d0
   lsl.l    #4,d1
   lsr.l    #4,d0
   or.l     d1,d0

   rol.w    #8,d0
   swap     d0
   rol.w    #8,d0
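A C transcription of this corrected routine, as a sketch with illustrative names. It relies on one eor property: once `d1 = d0 & m`, `d0 ^ d1` equals `d0 & ~m`, so each stage needs only one mask constant. The byte reversal is modelled literally as `rol.w #8` / `swap` / `rol.w #8`.

```c
#include <assert.h>
#include <stdint.h>

/* rol.w #8: rotate only the low word by 8, i.e. swap its two bytes */
static uint32_t rol_w8(uint32_t x)
{
    return (x & 0xFFFF0000u) | ((x & 0x00FFu) << 8) | ((x >> 8) & 0x00FFu);
}

uint32_t rev32_eor(uint32_t d0)
{
    static const uint32_t mask[3] = { 0x55555555u, 0x33333333u, 0x0F0F0F0Fu };
    for (int i = 0; i < 3; i++) {
        unsigned n = 1u << i;          /* shift: 1, 2, 4 */
        uint32_t d1 = d0 & mask[i];    /* move.l #mask,d1 / and.l d0,d1: low groups */
        d0 ^= d1;                      /* eor.l d1,d0: only the high groups remain */
        d0 = (d0 >> n) | (d1 << n);    /* lsr / lsl (add for n=1) / or */
    }
    d0 = rol_w8(d0);                   /* rol.w #8,d0 */
    d0 = (d0 >> 16) | (d0 << 16);      /* swap d0 */
    return rol_w8(d0);                 /* rol.w #8,d0 */
}

void check_rev32_eor(void)
{
    assert(rev32_eor(0x80000000u) == 0x00000001u);
    assert(rev32_eor(0x12345678u) == 0x1E6A2C48u);
    /* the eor identity the routine relies on */
    assert((0xCAFEBABEu ^ (0xCAFEBABEu & 0x33333333u)) == (0xCAFEBABEu & ~0x33333333u));
}
```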
03 June 2018, 00:18   #14
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by Thorham
Thanks for pointing that out. Some of the eors have to be ands. Remind me to test code before posting it.
Would it be OK to ask for a little explanation of what this code is doing?

I haven't debugged or tried it yet, but a short explanation of the source data and destination would be really useful.

Cheers,
Geezer
03 June 2018, 00:25   #15
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by mcgeezer
Would it be OK to ask for a little explanation of what this code is doing?

I haven't debugged or tried it yet, but a short explanation of the source data and destination would be really useful.

Cheers,
Geezer
Well, this is based on swapping progressively larger groups (first single bits, then bit pairs, then nibbles, then bytes, then words).
Basically it's a SIMD-like approach: the masks keep each group in its own lane and no carry crosses between lanes, so every group in the register is swapped by the same few instructions.
On input, D0 contains 32 bits from a bitplane; on output, D0 holds the same bits in reverse order.
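The same idea written as a table of stages, as a C sketch with illustrative names. Each stage swaps adjacent groups of `shift` bits. The byte and word stages are the ones the asm replaces with `rol.w #8` / `swap` / `rol.w #8`. Running fewer stages shows the intermediate results.

```c
#include <assert.h>
#include <stdint.h>

/* mask selects the lower group of every adjacent pair; shift = group width */
static const struct { uint32_t mask; unsigned shift; } stages[5] = {
    { 0x55555555u,  1 },   /* single bits */
    { 0x33333333u,  2 },   /* bit pairs   */
    { 0x0F0F0F0Fu,  4 },   /* nibbles     */
    { 0x00FF00FFu,  8 },   /* bytes       */
    { 0x0000FFFFu, 16 },   /* words       */
};

/* Apply the first nstages swaps. The masks keep every group inside its own
   lane, so one and/shift/or sequence swaps all pairs of a stage at once.
   All five stages together reverse the whole 32-bit word. */
uint32_t rev32_stages(uint32_t x, int nstages)
{
    assert(nstages >= 0 && nstages <= 5);
    for (int i = 0; i < nstages; i++)
        x = ((x >> stages[i].shift) & stages[i].mask)
          | ((x & stages[i].mask) << stages[i].shift);
    return x;
}
```

For example, after three stages each byte of 0x12345678 is reversed in place (0x482C6A1E). The byte stage then gives 0x2C481E6A, and the word stage gives the fully reversed 0x1E6A2C48.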

Last edited by ross; 03 June 2018 at 00:35. Reason: typo... bitplane not bitblane :)
03 June 2018, 00:25   #16
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by mcgeezer
Would it be OK to ask for a little explanation of what this code is doing?
It simply swaps odd and even bits, bit pairs, nibbles, bytes and finally words.

Quote:
Originally Posted by mcgeezer
I haven't debugged or tried it yet, but a short explanation of the source data and destination would be really useful.
D0 is both source and destination.
03 June 2018, 00:31   #17
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Thorham
It simply swaps odd and even bits, bit pairs, nibbles, bytes and finally words.

D0 is both source and destination.
We posted at the same time.
03 June 2018, 00:32   #18
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by ross
Well, this is based on swapping progressively larger groups (first single bits, then bit pairs, then nibbles, then bytes, then words).
Basically it's a SIMD-like approach: the masks keep each group in its own lane and no carry crosses between lanes, so every group in the register is swapped by the same few instructions.
On input, D0 contains 32 bits from a bitplane; on output, D0 holds the same bits in reverse order.
Quote:
Originally Posted by Thorham
It simply swaps odd and even bits, bit pairs, nibbles, bytes and finally words.

D0 is both source and destination.
Thanks, guys.

I like this because it fits in the 68020's instruction cache, so it will run at full speed.

Appreciate it.
03 June 2018, 01:02   #19
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by mcgeezer
Appreciate it.
Mine is a didactic implementation (the algorithm is explicit).
Thorham's is a more optimized version based on a property of eor (I can't see any better optimization).

At this point we need to test it against a LUT. Which one will win?
03 June 2018, 11:31   #20
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Some quick, non-scientific tests.
The pure code seems slightly faster than this lazy bfextu-based 8-bit LUT implementation:
Code:
_lut8flip:
	lea	_8lut(pc),a0
	move.l	d0,d1
	bfextu  d1{8:8},d2
	move.b	(a0,d2.w),d0
	ror.l	#8,d0
	bfextu  d1{16:8},d2
	move.b	(a0,d2.w),d0
	ror.l	#8,d0
	bfextu  d1{24:8},d2
	move.b	(a0,d2.w),d0
	ror.l	#8,d0
	bfextu  d1{0:8},d2
	move.b	(a0,d2.w),d0
	rts
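A C sketch of what this routine computes. `lut8`, `lut8_init` and `lut8flip` are illustrative names. The thread doesn't show how `_8lut` is filled, so the init loop is an assumption about its contents (each byte with its bits reversed). The C version visits the bytes in a different order from the asm, but each reversed byte ends up at the mirrored position, so the result is the same.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t lut8[256];   /* lut8[b] = b with its 8 bits reversed (the 256-byte table) */

void lut8_init(void)
{
    for (int b = 0; b < 256; b++) {
        uint8_t r = 0;
        for (int i = 0; i < 8; i++)
            if (b & (1 << i))
                r |= (uint8_t)(0x80 >> i);   /* bit i moves to bit 7 - i */
        lut8[b] = r;
    }
}

/* Reverse each byte through the table and place it at the mirrored byte
   position, which is what the bfextu / move.b / ror.l #8 sequence does. */
uint32_t lut8flip(uint32_t x)
{
    assert(lut8[1] == 0x80);   /* lut8_init() must have run */
    return ((uint32_t)lut8[x & 0xFF] << 24)
         | ((uint32_t)lut8[(x >> 8) & 0xFF] << 16)
         | ((uint32_t)lut8[(x >> 16) & 0xFF] << 8)
         |  (uint32_t)lut8[x >> 24];
}
```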
But the clear winner is the 16-bit LUT approach (as much as 50% faster).
It's as simple as:
Code:
_lut16flip:
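	lea	_16lut+65536,a0		; a0 = middle of the 128 KB table (d0.w is a signed index)
	move.w	(a0,d0.w*2),d0		; low word -> its bit-reversed value
	swap	d0			; bring the high word down
	move.w	(a0,d0.w*2),d0		; reverse it too: both words now reversed and swapped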
	rts
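The `+65536` in the `lea` is needed because `d0.w*2` uses the word as a sign-extended index. The C model below assumes the table layout that implies: words 0x8000..0xFFFF in the first 64 KB and 0x0000..0x7FFF in the second. That layout is inferred from the code, since the thread doesn't show how `_16lut` is built. `lut16_init`, `lut16flip`, `sext16` and `rev16` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 64K words = 128 KB, like _16lut. lut16_mid plays the role of a0
   (_16lut + 65536 bytes = 32768 words in). */
static uint16_t lut16_storage[65536];
static const uint16_t *const lut16_mid = lut16_storage + 32768;

/* Reverse 16 bits one at a time; only used to fill the table. */
static uint16_t rev16(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 16; i++) {
        r = (uint16_t)((r << 1) | (v & 1u));
        v >>= 1;
    }
    return r;
}

/* Sign-extend the low word, as the 68020 does for a dn.w index. */
static int32_t sext16(uint32_t w)
{
    return (int32_t)((w & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Signed index i in -32768..32767 covers words 0x8000..0xFFFF (first half
   of the table) and 0x0000..0x7FFF (second half). */
void lut16_init(void)
{
    for (int32_t i = -32768; i <= 32767; i++)
        lut16_storage[i + 32768] = rev16((uint16_t)i);
}

uint32_t lut16flip(uint32_t d0)
{
    assert(lut16_mid[1] == 0x8000);                   /* lut16_init() must have run */
    d0 = (d0 & 0xFFFF0000u) | lut16_mid[sext16(d0)];  /* move.w (a0,d0.w*2),d0 */
    d0 = (d0 >> 16) | (d0 << 16);                     /* swap d0 */
    d0 = (d0 & 0xFFFF0000u) | lut16_mid[sext16(d0)];  /* move.w (a0,d0.w*2),d0 */
    return d0;
}
```

The table costs 65536 words × 2 bytes = 128 KB. That is the figure to weigh against the amount of graphics that needs flipping.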
The memory cost is debatable, BUT:
suppose you have a lot of big AGA sprites (64x64, 4 planes) and a lot of tiles (32x32, 4 or 8 planes), for a total of 1 MB of data, all of which need to be flipped.
In that case the LUT may be worthwhile (its memory cost becomes proportionally less and less significant, and CPU time is precious on an 020).

But the pure code Thorham suggested is certainly a great deal too!


[EDIT, PS]
Why non-scientific?
I don't have a CD32, nor an Amiga for that matter.
So it's all based on WinUAE emulation, which is not cycle-exact (CE) for the 020 (or maybe it is for code this simple? Well, it's not that important).
Also, I didn't feel like writing any version other than the bfextu one, and in any case the speed difference between the pure code and the 8-bit LUT doesn't seem big enough to justify using the LUT exclusively.

Last edited by ross; 03 June 2018 at 12:04. Reason: PS