English Amiga Board


Today, 17:08   #1
jotd
can someone help me to optimize this blitter routine?

I wrote that myself, so I'm not questioning it too much but maybe I'm missing something BIG...


The inputs are pretty easy to understand. The code supports vertical clipping and works on A0, which is a pointer to a list of bitplanes: 16 pixel width. If a bitplane pointer is 0 then that plane is skipped (which is a big optimization already). Also, I chose not to "cookie cut" the background if a bitplane is 0 (which can lead to strange effects when BOBs are overlaid, but in practice it's barely noticeable).

CHECK_BLITTER_BOUNDS is only enabled in "developer" mode.
WAIT_BLIT is a macro that sets the "blitter nasty" flag, waits for the blitter and clears "blitter nasty" again.

Code:
.macro	WAIT_BLIT
	move.w	#0x8400,(dmacon,a5)		| blitter high priority
wait\@:
	BTST	#6,(dmaconr,a5)
	BNE.S	wait\@
	move.w	#0x0400,(dmacon,a5)		| blitter normal priority
.endm
It also uses a multiplication table to compute the offset from the Y value (mulNB_BYTES_PER_ROW_table). The blitter mask is all 0xFFFF throughout the game.
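
For readers who don't have the table in front of them, here is a minimal sketch of how such a table could be generated at assembly time with GNU as. It assumes NB_BYTES_PER_ROW is 40 and NB_LINES is 288 (inferred from the "y times 40" and "288*40" comments in the routine, the real definitions are not shown in this post); row_offset is a hypothetical helper symbol.

```asm
    .equ    NB_BYTES_PER_ROW,40     | assumption: taken from the "y times 40" comment
    .equ    NB_LINES,288            | assumption: taken from the "288*40" comment

    .set    row_offset,0            | hypothetical helper symbol
mulNB_BYTES_PER_ROW_table:
    .rept   NB_LINES
    .word   row_offset              | entry y holds y * NB_BYTES_PER_ROW
    .set    row_offset,row_offset+NB_BYTES_PER_ROW
    .endr
```

With these values the largest entry is 287*40 = 11480, so every offset fits in a word and the word-sized lookup in the routine is enough.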

Code:
* < A5: custom
* < D0.W,D1.W: x,y
* < A0: source (pointer on array of planes)
* < A1: destination fg plane, also background to mix with cookie cut fg plane
* < A3: source mask for cookie cut
* < D2: width in bytes (inc. 2 extra for shifting)
* < D3: number of planes
* < D4: height. If negative, source is copied with negative modulo (flip)
* < D5: y offset for source planes

* blit mask set
* returns: start of destination in A1 (computed from old A1+X,Y)
* trashes: a0, a1, a3
blit_planes_any_internal_cookie_cut:
    movem.l d0-d7/a2/a4,-(a7)
    * pre-compute the maximum of shit here
    tst.w    d4
    bpl.b    1f
    * inverted y blit
    
    sub.w    d4,d1    | pre-add height to d1
    subq.w    #1,d1    | minus one
1:
    tst    d1
    beq.b   2f    | optim
    cmp.w    #NB_LINES,d1
    jcc        8f            | too low, won't be drawn, may as well optimize
    lea        mulNB_BYTES_PER_ROW_table,a4
    .ifdef    NO68020
    add.w    d1,d1
    move.w  (a4,d1.w),d1    | y times 40
    .else
    move.w  (a4,d1.w*2),d1    | y times 40
    .endif
2:
    move.w    d5,-(a7)
    moveq    #0,d5
    move.w  #0x0fca,d5    | B+C-A->D cookie cut   
    swap    d5
    moveq    #0,d6        | make sure D6.L is zero!!
    move.w  d0,d6
    beq.b   4f
    lsr.w   #3,d0
    bclr    #0,d0
    and.w   #0xF,d6
    beq.b    3f                | if 0 shift, optimize a few instructions
    lsl.l   #8,d6
    lsl.l   #4,d6
    or.w    d6,d5            | add shift to mask (bltcon1)
    swap    d6
    clr.w   d6
    or.l    d6,d5            | add shift
3:   
    add.w   d0,d1
4:
    * make offset even. Blitter will ignore odd address
    * but a 68000 CPU doesn't and since we RETURN A1...
    bclr    #0,d1
    add.w   d1,a1       | plane position (D1 < 0x7FFF, 288*40=0x2D00)
    move.w    #NB_BYTES_PER_ROW,d0
    tst.w    d4
    bpl.b    5f
    neg.w    d0
    neg.w    d4    | make d4 positive again
5:

    sub.w   d2,d0       | C/D modulo = bytes per row - blit width
    lsl.w   #6,d4
    lsr.w   #1,d2
    add.w   d2,d4       | bltsize = height<<6 + width in words
    * always the same settings (ATM)

    * prepare d1
    moveq    #0,d1
    move.w    #0x0BCA,d1
    swap    d1
    or.l    d6,d1

    * now just wait for blitter ready to write all registers
    WAIT_BLIT
    
    * blitter registers set

    clr.w bltamod(a5)        |A modulo=bytes to skip between lines
    clr.w bltbmod(a5)        |B modulo=bytes to skip between lines
    move.l    d5,d7            | save cookie cut bltcon
    move.w    (a7)+,d5
    
    move.w  d0,bltcmod(a5)    |C modulo
    move.w  d0,bltdmod(a5)    |D modulo
                    
    add.w    d5,a3            | apply to mask too
    subq    #1,d3
    beq.b    7f
    subq    #1,d3
6:
    jbsr    process_1_plane
    lea        (BG_SCREEN_PLANE_SIZE,a1),a1
    WAIT_BLIT
    dbf        d3,6b
7:
    jbsr    process_1_plane
8:   
    movem.l (a7)+,d0-d7/a2/a4
    rts
    
process_1_plane:
    move.l a3,bltapt(a5)    |  source graphic top left corner (mask)
    move.l (a0)+,d0
    jeq    63f    | do nothing ATM see if it works
    move.l    d0,a4
    add.w    d5,a4
    bra.b    61f
60:
    * source is 0: just apply mask (less bandwidth lost) and change bltcon
    move.l    d1,bltcon0(a5)    | sets con0 and con1: C-A->D cookie cut, B fixed
     clr.w    bltbdat(a5)    |B word is zero
    bra.b    62f
61:
    * non-zero: set data source & bltcon
    move.l    d7,bltcon0(a5)    | sets con0 and con1: C-A+B->D cookie cut full
    move.l    a4,bltbpt(a5)    |source graphic top left corner
62:
    CHECK_BLITTER_BOUNDS
     move.l    a1,bltcpt(a5)    |pristine background top (bottom) left corner
    move.l    a1,bltdpt(a5)    |destination top (bottom) left corner
    move.w  d4,bltsize(a5)    |rectangle size, starts blit
63:    
    rts
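
To make the register interface concrete, here is a hedged sketch of what a call for a 16 pixel wide, 4 plane BOB might look like. The labels bob_plane_list, screen_plane_0 and bob_mask are hypothetical, the coordinates and plane count are made up, and the blitter masks are assumed to be set already.

```asm
    lea     0xDFF000,a5             | custom chip base
    move.w  #100,d0                 | x in pixels
    move.w  #50,d1                  | y in lines
    lea     bob_plane_list,a0       | hypothetical: one longword pointer per plane, 0 = skip
    lea     screen_plane_0,a1       | hypothetical: first destination bitplane
    lea     bob_mask,a3             | hypothetical: cookie cut mask
    moveq   #4,d2                   | 16 pixels = 2 bytes, plus 2 extra bytes for shifting
    moveq   #4,d3                   | number of planes (assumed)
    moveq   #16,d4                  | height in lines, a negative value would flip vertically
    moveq   #0,d5                   | no source y offset
    jbsr    blit_planes_any_internal_cookie_cut
```

On return D0-D7 are restored by the movem, while A0, A1 and A3 have been advanced by the routine.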

Today, 17:59   #2
paraj
A few micro optimizations that spring to mind:
Code:
    lsl.l   #8,d6
    lsl.l   #4,d6
Faster to use
ror.w #4,d6
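
A small worked example of why the two are equivalent here, using a hypothetical shift value of 0xB. It relies on bits 4-15 of d6 being zero after the and.w #0xF. The cycle counts are the standard 68000 register shift timings (6+2n for word, 8+2n for long) and do not carry over to the 68020.

```asm
    | original, starting from d6.l = 0x0000000B
    lsl.l   #8,d6                   | d6.l = 0x00000B00  (24 cycles)
    lsl.l   #4,d6                   | d6.l = 0x0000B000  (16 cycles)

    | replacement, starting from d6.l = 0x0000000B
    ror.w   #4,d6                   | d6.l = 0x0000B000  (14 cycles)
```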


But really, looking at that whole sequence: are you blitting a lot at x=0? That special case is probably not worth it. So:

Code:
   moveq #$f,d6
   and.w d0,d6
   beq.b .noshift
   ror.w #4,d6
   ...
Also, clr.w to memory is not great on the 68000 since it does a useless read before the write (not dangerous here, but beware). Using moveq #0,tempreg followed by move.w tempreg,xxx(a5) and move.w tempreg,yyy(a5) is faster than 2x clr.w.
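
Applied to the routine above, that could look like the sketch below. It assumes d6 is free at that point (its last use in the original is "or.l d6,d1"), so treat the register choice as an assumption.

```asm
    moveq   #0,d6                   | d6 assumed free here (last used by "or.l d6,d1")
    move.w  d6,bltamod(a5)          | A modulo = 0, plain write with no read cycle
    move.w  d6,bltbmod(a5)          | B modulo = 0
```

Since BLTBMOD (0x062) and BLTAMOD (0x064) are adjacent registers, a single "move.l d6,bltbmod(a5)" should be able to clear both at once, but I haven't tested that in this routine.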

And of course make process_1_plane a macro and inline it
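
A rough sketch of what the macro version might look like in GNU as, assuming the unused "source is 0" branch (label 60) is dropped and CHECK_BLITTER_BOUNDS stays as it is. PROCESS_1_PLANE is a hypothetical name.

```asm
.macro  PROCESS_1_PLANE
    move.l  a3,bltapt(a5)           | mask pointer, reloaded for every plane
    move.l  (a0)+,d0
    jeq     63f                     | null plane pointer: skip this plane
    move.l  d0,a4
    add.w   d5,a4                   | apply source y offset
    move.l  d7,bltcon0(a5)          | con0 and con1: C-A+B->D cookie cut
    move.l  a4,bltbpt(a5)           | source graphic
    CHECK_BLITTER_BOUNDS
    move.l  a1,bltcpt(a5)           | background
    move.l  a1,bltdpt(a5)           | destination
    move.w  d4,bltsize(a5)          | starts the blit
63:
.endm
```

The two "jbsr process_1_plane" lines would then become "PROCESS_1_PLANE". The numeric local label can be defined once per expansion, so using the macro twice in the same routine should not clash.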

Today, 18:06   #3
jotd
Makes sense. Except that you probably mean ror.l #4,d6. Oh, since d6 can't be > 0xFFFF, OK, I get it!!

About inlining the big routine, yes, it would be good for 68000 (but the overhead is negligible given the size of the routine), probably not so much for 68020.

Last edited by jotd; Today at 18:20.

Today, 18:29   #4
paraj
Quote:
Originally Posted by jotd
Makes sense. Except that you probably mean ror.l #4,d6. Oh, since d6 can't be > 0xFFFF, OK, I get it!!
It would move the bits into the wrong position with ror.l. Also, it looks like you don't need the clr.w d6 after swap d6 (the upper word is always clear, so the low word is zero after the swap), and maybe you can rearrange the instructions to avoid some of the swap logic anyway.
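
Putting the ror.w and the dropped clr.w together, the shift setup could be sketched like this (s stands for the 4-bit shift value, d5 is assumed to hold 0x0fca0000 on entry as in the original routine):

```asm
    and.w   #0xF,d6                 | d6.l = 0x0000000s, upper word is 0
    beq.b   3f                      | no shift needed
    ror.w   #4,d6                   | d6.l = 0x0000s000
    or.w    d6,d5                   | B shift into bltcon1 (low word of d5)
    swap    d6                      | d6.l = 0xs0000000, low word is already 0
    or.l    d6,d5                   | A shift into bltcon0 (high word of d5)
3:
```

d6 still ends up with the A shift in its high word, so the later "or.l d6,d1" keeps working unchanged.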

Quote:
Originally Posted by jotd
About inlining the big routine, yes, it would be good for 68000 (but the overhead is negligible given the size of the routine), probably not so much for 68020.
It probably won't be a win on the 020, but measurement is king, of course. Whether you consider 34 cycles per loop iteration (for bsr.b+rts) negligible is up to you, but if it is, then the optimizations above are even less worthwhile.
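
One cheap way to measure, sketched here as an assumption rather than a recommendation for this particular game, is the classic background colour trick: change COLOR00 around the call and look at the height of the coloured band on screen. color00 is a hypothetical symbol name for the register offset.

```asm
    .equ    color00,0x180           | COLOR00 offset from the custom base (hypothetical symbol name)

    move.w  #0x0F00,(color00,a5)    | background red: routine starts
    jbsr    blit_planes_any_internal_cookie_cut
    move.w  #0x0000,(color00,a5)    | background black: routine returned
```

This only shows the CPU time spent in the routine, including the WAIT_BLIT loops. The last blit is still running when the routine returns, so it is not included.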

Today, 20:02   #5
Don_Adan
For me, this:

Code:
    move.w    d5,-(a7)    | this is a bug, because a longword is used
    moveq    #0,d5
    move.w  #0x0fca,d5    | B+C-A->D cookie cut
    swap    d5
could be replaced with:

Code:
   move.l    d5,-(a7)
 
    move.l  #0x0fca0000,d5    | B+C-A->D cookie cut
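
A hedged note on the second snippet: pushing a longword only works if the matching pop later in the routine is widened too, otherwise the stack is off by two bytes when the movem restores the registers. A minimal sketch of the matched pair, assuming the rest of the routine is unchanged:

```asm
    move.l  d5,-(a7)                | push the source y offset as a longword
    move.l  #0x0fca0000,d5          | bltcon0 = 0x0fca, bltcon1 = 0 (B+C-A->D cookie cut)
    | shift computation and WAIT_BLIT as in the original routine
    move.l  d5,d7                   | save cookie cut bltcon
    move.l  (a7)+,d5                | pop must match the push size
```

As far as I can tell the original only consumes d5 through add.w (on a3 and a4), so the word-sized push and pop look consistent as posted. The single move.l immediate is the part that saves instructions.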