10 February 2010, 18:57 | #1 |
Registered User
Join Date: Mar 2009
Location: N/A
Posts: 23
|
CPU Filling vs. Blitter Filling Routine
Hello coder Boys !
At the moment I am trying to program a new 3D engine. I am currently trying to write a better CPU fill routine for the 68020 to 68060 processors. Do you have any ideas how I can make the routine better, or is there another solution? What do you think about flood-fill or scanline algorithms? My first experiment is very slow; here is my result. Code:
;--------------------------------------------------------------------------------
;---PROCESSOR-FILLING-Routine 10.02.2010 Sascha Müller alias Victim of Savage ---
;at first fill the tables
        bsr     InitFillTable
;now come some other routines and then fill
        bsr     ProcFill
        rts

ProcFill:
        move.l  planebufferwork,a0
        moveq.l #0,d0
        moveq.l #0,d1
        lea     intab(pc),a2
        lea     dbtab(pc),a3
        move    #255,d7
pr_fyl:
        lea     fntab(pc),a1
        move.l  a1,4(a3)
        lea     fitab(pc),a1
        move.l  a1,(a3)
        moveq   #40-1,d6        ;width
pr_fxl:
        tst.b   d1              ;speed up by ignoring 0 bytes
        beq     pr_zer
pr_set:
        tst.b   (a0)
        bne     pr_lin
        move.b  #$ff,(a0)+
        dbf     d6,pr_set
        bra     pr_ny
pr_zer:
        tst.b   (a0)
        bne     pr_lin
        adda.l  #1,a0
        dbf     d6,pr_zer
        bra     pr_ny
pr_lin:
        ;move   #$1ff,d5
;llo:   nop
        ;dbf    d5,llo
        move.b  (a0),d0
        move.b  (a1,d0.l),(a0)+
        tst.b   (a2,d0.l)
        beq     pr_nic          ;no insert change
        move.l  0(a3),d2        ;change filltables
        move.l  4(a3),0(a3)
        move.l  d2,4(a3)
        move.l  0(a3),a1        ;->a1
        not.b   d1              ;change insert in d1
pr_nic:
        dbf     d6,pr_fxl
pr_ny:
        dbf     d7,pr_fyl
        rts

;--- creates three tables later used for filling ---
InitFillTable:
        moveq.l #0,d0           ;init routine
        moveq.l #0,d3
        lea     fitab(pc),a0    ;fill table
        lea     intab(pc),a1    ;insert table
        move    #255,d4         ;256 different bytes to fill
pr_bl:
        move    d3,d0
        bsr     pr_bfi
        move.b  d0,(a0)+
        move.b  d1,(a1)+
        addq    #1,d3
        dbf     d4,pr_bl
        lea     fitab(pc),a0
        lea     fntab(pc),a1
        move    #255,d1
pr_inv:
        move.b  (a0)+,d0
        not.b   d0
        move.b  d0,(a1)+
        dbf     d1,pr_inv
        rts

pr_bfi:
        moveq.l #0,d1           ;fill one byte (d0) as the blitter does
        moveq   #7,d2           ;to test all eight bits
pr_bfl:
        btst    d2,d0
        bne     pr_ich
        tst.b   d1
        beq     pr_nb
        bset    d2,d0
pr_nb:
        dbf     d2,pr_bfl
        rts
pr_ich:
        tst.b   d1
        bne     pr_nof
        bclr    d2,d0
pr_nof:
        not.b   d1
        bra     pr_nb

dbtab:  dc.l    fitab,fntab     ;change table
fitab:  blk.b   256,0           ;fill table insert=0
fntab:  blk.b   256,0           ;fill table insert=1
intab:  blk.b   256,0           ;insert change ($ff)
Victim |
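Since the routine above is hard to follow without comments, here is a rough C model of what the three tables appear to hold, sketched from reading pr_bfi. It is a portable model for checking the idea, not Amiga code. The function names and the reset of the inside flag at the start of every row are assumptions of this sketch, not something taken from the posted code.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t fitab[256];  /* filled byte when the byte is entered "outside" */
static uint8_t fntab[256];  /* filled byte when the byte is entered "inside"  */
static uint8_t intab[256];  /* 0xFF if the byte flips the inside/outside state */

/* Model of InitFillTable/pr_bfi: scan each byte from bit 7 (leftmost pixel)
   down to bit 0, toggling the state at every set bit. */
static void init_fill_tables(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t out = (uint8_t)v;
        uint8_t inside = 0;
        for (int bit = 7; bit >= 0; bit--) {
            if (v & (1 << bit)) {
                if (!inside)
                    out &= (uint8_t)~(1 << bit);   /* opening edge bit is cleared */
                inside = (uint8_t)~inside;         /* closing edge bit stays set  */
            } else if (inside) {
                out |= (uint8_t)(1 << bit);        /* fill between the edges */
            }
        }
        fitab[v] = out;
        fntab[v] = (uint8_t)~out;
        intab[v] = inside;
    }
}

/* Fill one row in place, one table lookup per byte.
   Assumption: the state starts as "outside" on every row. */
static void fill_row(uint8_t *row, int bytes)
{
    assert(row);
    int inside = 0;
    for (int i = 0; i < bytes; i++) {
        uint8_t v = row[i];
        row[i] = inside ? fntab[v] : fitab[v];
        if (intab[v])
            inside = !inside;
    }
}
```

Note that in this model the opening edge pixel is cleared and the closing edge pixel is kept, so a span is exclusive on the left and inclusive on the right.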
10 February 2010, 19:16 | #2 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Since your code is not really commented (which parameters etc pp?) I don't know how it is supposed to work (and I don't feel like doing "guess work"). Your code looks kinda strange to me anyway.
Anyway, here's my flat filler which should be easy to understand. Code:
******************************************* *** Draw flat filled Polygons *** ******************************************* ; $VER: POLYFILLER FLAT v3.o, Wed, 24-Mar-2oo4 ; (c)oded by StingRay/[S]carab^Scoopex ; ; derived from my texture-mapper, no division table used ; maximum 3 divisions per polygon, writes 4 pixels at once ; ; d0-d2: coords ; d3: color ; a4: surface ; a6: ptr to object structure DRAW_FLAT moveq #0,d5 move.b d3,d5 ; ......cc cmp.w #P_T50,SURF_FLAGS(a4) beq.b .nolc lsl.w #8,d5 ; ....cc00 move.b d3,d5 ; ....cccc move.w d5,d3 swap d5 ; cccc.... move.w d3,d5 ; cccccccc .nolc cmp.w d0,d2 bge.b .ok1 exg.l d0,d2 .ok1 cmp.w d1,d2 bge.b .ok2 exg.l d1,d2 .ok2 cmp.w d0,d1 bge.b .ok3 exg.l d0,d1 .ok3 move.b OBJ_CLIP(a6),d7 lea .VARS(pc),a6 movem.l d0-d2,.X1(a6) ; x/y 1-3 move.b d7,.CLIP(a6) ; clipping status move.w SURF_FLAGS(a4),.FLAGS(a6) move.w d2,d7 sub.w d0,d7 ; y3-y1 beq.w .exit move.l d2,d6 sub.l d0,d6 ; x3-x1 asr.l #8,d6 ext.l d7 divs.l d7,d6 move.l d6,.DXDY2(a6) move.w d1,d7 sub.w d0,d7 ; y2-y1 bne.b .noflat ; xend for 2nd section = x1 move.w .X1(a6),d1 ext.l d1 lsl.l #8,d1 bra.w .part2 .noflat move.w .X2(a6),d1 sub.w .X1(a6),d1 ext.l d7 ext.l d1 lsl.l #8,d1 divs.l d7,d1 move.l d1,.DXDY1(a6) *** Draw top part of triangle *** move.w .X1(a6),d0 move.w d0,d1 move.l .DXDY2(a6),a2 move.l .DXDY1(a6),a3 tst.b .CLIP(a6) beq.b .nclip move.w .Y1(a6),a5 move.w .Y2(a6),d7 bsr.b .draw bra.b .part2 .nclip move.w .Y2(a6),d7 sub.w .Y1(a6),d7 move.l ENG_CHUNKYBUFFER(pc),a0 move.w .Y1(a6),d6 mulu.w #CHUNKYX,d6 add.l d6,a0 bsr.b .draw *** Draw bottom part of triangle *** .part2 move.w .Y3(a6),d7 sub.w .Y2(a6),d7 beq.w .exit move.w .X3(a6),d0 sub.w .X2(a6),d0 ext.l d7 ext.l d0 lsl.l #8,d0 divs.l d7,d0 move.l d0,.DXDY1(a6) move.w .X2(a6),d0 move.l .DXDY1(a6),a2 move.l .DXDY2(a6),a3 tst.b .CLIP(a6) beq.b .nclip2 move.w .Y2(a6),a5 move.w .Y3(a6),d7 bra.b .draw2 .nclip2 move.l ENG_CHUNKYBUFFER(pc),a0 move.w .Y2(a6),d6 mulu.w #CHUNKYX,d6 add.l d6,a0 bra.b .draw2 
******************************************* *** NON-CLIPPED DRAW NON-TRANSPARENT *** ******************************************* ; d0: x1 ; d1: x2 ; d5: color ; d7: height ; a0: chunkybuffer + yoffset ; a2: DXDY2 (DXDY1 for the 2nd section) ; a3: DXDY1 (DXDY2 for the 2nd section) .draw ext.l d1 lsl.l #8,d1 .draw2 ext.l d0 lsl.l #8,d0 subq.w #1,d7 ; adapt "dbf" cmp.l a2,a3 bge.b .swap move.l d0,a2 ; x left move.l d1,a1 ; x right move.l .DXDY1(a6),d6 ; delta xleft move.l .DXDY2(a6),a4 ; delta xright bsr.b .go move.l a1,d1 .exit rts .swap move.l d1,a2 ; x left move.l d0,a1 ; x right move.l .DXDY2(a6),d6 ; delta xleft move.l .DXDY1(a6),a4 ; delta xright bsr.b .go move.l a2,d1 rts CNOP 0,4 .go tst.b .CLIP(a6) bne.w .cl_go cmp.w #P_T50,.FLAGS(a6) beq.b .trans .loopY move.l a2,d0 ; x start move.l a1,d1 ; x end lsr.l #8,d0 lsr.l #8,d1 sub.w d0,d1 ; delta X = width of scanline ble.w .noX lea (a0,d0.w),a5 lsr.w #1,d1 bcc.b .nobyte move.b d5,(a5)+ .nobyte lsr.w #1,d1 bcc.b .noword move.w d5,(a5)+ .noword subq.w #1,d1 bmi.b .nolong .loopX move.l d5,(a5)+ dbf d1,.loopX .nolong .noX add.l a4,a1 add.l d6,a2 lea CHUNKYX(a0),a0 dbf d7,.loopY rts ******************************************* *** NON-CLIPPED DRAW 50% TRANSPARENT *** ******************************************* .trans .tloopY move.l a2,d0 ; x start move.l a1,d1 ; x end lsr.l #8,d0 lsr.l #8,d1 sub.w d0,d1 ; delta X = width of scanline ble.b .tnoX lea (a0,d0.w),a5 .tloopX moveq #0,d4 move.b (a5),d4 add.w d5,d4 lsr.w #1,d4 move.b d4,(a5)+ subq.w #1,d1 bne.b .tloopX .tnox add.l a4,a1 add.l d6,a2 lea CHUNKYX(a0),a0 dbf d7,.tloopY rts ******************************************* *** CLIPPED DRAW NON-TRANSPARENT *** ******************************************* ; d0: x1 ; d1: x2 ; d5: color ; d7: y1 ; a5: y2 ; a2: DXDY2 (DXDY1 for the 2nd section) ; a3: DXDY1 (DXDY2 for the 2nd section) CNOP 0,4 .cl_go .cl_loopY cmp.w #P_T50,.FLAGS(a6) beq.b .cltrans cmp.w a5,d7 blt.b .cl_exit move.w a5,d0 cmp.w #CLIPY_MAX,d0 bgt.b .cl_exit cmp.w 
#CLIPY_MIN,d0 blt.b .cl_noX mulu.w #CHUNKYX,d0 move.l ENG_CHUNKYBUFFER(pc),a0 add.l d0,a0 move.l a2,d0 ; x start move.l a1,d1 ; x end lsr.l #8,d0 lsr.l #8,d1 movem.w ([ENG_CLIPX_TABPTR,pc],a5.w*4),d2/d3 .cl_cl cmp.w d3,d0 bgt.b .cl_noX cmp.w d2,d1 blt.b .cl_noX cmp.w d3,d1 ble.b .cl_xmaxok move.w d3,d1 .cl_xmaxok cmp.w d2,d0 bge.b .cl_xminok move.w d2,d0 .cl_xminok ; cmp.w #CLIPX_MAX,d0 ; bgt.b .cl_noX ; cmp.w #CLIPX_MIN,d1 ; blt.b .cl_noX ; cmp.w #CLIPX_MAX,d1 ; ble.b .cl_xmaxok ; move.w #CLIPX_MAX,d1 ;.cl_xmaxok ; cmp.w #CLIPX_MIN,d0 ; bge.b .cl_xminok ; moveq #CLIPX_MIN,d0 ; x1 = CLIPX_MIN ;.cl_xminok sub.w d0,d1 ; delta X = width of scanline ble.w .cl_nox add.w d0,a0 lsr.w #1,d1 bcc.b .cl_nob move.b d5,(a0)+ .cl_nob lsr.w #1,d1 bcc.b .cl_now move.w d5,(a0)+ .cl_now subq.w #1,d1 bmi.b .cl_nol .cl_loopX move.l d5,(a0)+ dbf d1,.cl_loopX .cl_nol .cl_noX add.l a4,a1 add.l d6,a2 addq.w #1,a5 bra.b .cl_loopY .cl_exit rts ******************************************* *** CLIPPED DRAW 50% TRANSPARENT *** ******************************************* .cltrans .clt_loopY cmp.w a5,d7 blt.b .cl_exit move.w a5,d0 cmp.w #CLIPY_MAX,d0 bgt.b .cl_exit cmp.w #CLIPY_MIN,d0 blt.b .clt_noX mulu.w #CHUNKYX,d0 move.l ENG_CHUNKYBUFFER(pc),a0 add.l d0,a0 move.l a2,d0 ; x start move.l a1,d1 ; x end lsr.l #8,d0 lsr.l #8,d1 cmp.w #CLIPX_MAX,d0 bgt.b .clt_noX cmp.w #CLIPX_MIN,d1 blt.b .clt_noX cmp.w #CLIPX_MAX,d1 ble.b .clt_xmaxok move.w #CLIPX_MAX,d1 .clt_xmaxok cmp.w #CLIPX_MIN,d0 bge.b .clt_xminok moveq #CLIPX_MIN,d0 ; x1 = CLIPX_MIN .clt_xminok sub.w d0,d1 ; delta X = width of scanline ble.w .clt_nox add.w d0,a0 .clt_loopX moveq #0,d4 move.b (a0),d4 add.w d5,d4 lsr.w #1,d4 move.b d4,(a0)+ subq.w #1,d1 bne.b .clt_loopX .clt_noX add.l a4,a1 add.l d6,a2 addq.w #1,a5 bra.b .clt_loopY .VARS RSRESET .X1 rs.w 1 .Y1 rs.w 1 .X2 rs.w 1 .Y2 rs.w 1 .X3 rs.w 1 .Y3 rs.w 1 .DXDY1 rs.l 1 .DXDY2 rs.l 1 .CLIP rs.w 1 .FLAGS rs.w 1 .SIZE rs.b 0 dcb.b .SIZE |
10 February 2010, 19:31 | #3 |
gone
Join Date: Apr 2007
Location: completely gone
Posts: 1,596
|
@ Sting - I take it in general as you go above 68020 it starts to get quicker to do blitter type stuff using the processor? - perhaps gaining additional speed because you're able to use fast RAM...?
Last edited by pmc; 11 February 2010 at 07:51. Reason: Edited... |
10 February 2010, 20:21 | #4 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,187
|
@pmc
RE: '020+ CPU blitting routines The '020 has a 256 byte code cache and 32 bit memory bus, making small loops go much faster and outstripping the bandwidth of a 16-bit ECS blitter. The main reason you'd want to use a CPU-blitting routine though, is that chunky graphics are almost always faster than planar. |
10 February 2010, 21:30 | #5 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,613
|
I think he probably means 'filled polygons'?
In the Stunner Dentro I had 25% of the screen height filled by the CPU while the blitter filled the rest IIRC. Could have been 12.5%. Stunner uses a completely different (set of) 'fill methods', since its vectors are non-convex. Let me know if you want me to dig it up, Victim. I'm not sure he means filling blitter-drawn polys ofc. If it's only going to work on expanded A1200s (and better), it'd be better to keep it all in fastram and do all the work with the CPU, including the final copy to the chipmem screen. I know little about cache vs stock A1200 behavior - I'd have to learn more about the behavior if it fills the chipmem screen directly. |
11 February 2010, 00:04 | #6 |
Banned
Join Date: Jan 2009
Location: U.K.
Posts: 93
|
I did some time tests on this recently and found that you can get a large speed increase if you do alternate blitting and filling on a 68000 Amiga. plane 1 = blit, plane 2 = fill etc....
However, you get a HUGE increase in speed if you only cpu-fill on an 68020 with some fast ram. Hope that helps. Kev G |
11 February 2010, 07:55 | #7 | |
gone
Join Date: Apr 2007
Location: completely gone
Posts: 1,596
|
Quote:
Speed increase is there purely because of processor / blitter doing these operations concurrently...? |
|
11 February 2010, 18:13 | #8 | |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,613
|
Quote:
|
|
12 February 2010, 00:18 | #9 | ||
Registered User
Join Date: Mar 2009
Location: N/A
Posts: 23
|
Hi coder boys !
Many thanks for your quick replies, and first of all please excuse my bad English; I am still learning. Quote:
My first goal is a routine without any chunky mode programming. Chunky mode will come in a later version. Quote:
What can I still improve in my routine? Do you have any ideas? How do I use the cache of the 68040 or 68060? Here is one of my current routines but, as I said, it is unfortunately very slow.
- Pure CPU vector calculation
- Pure CPU clear
- Pure CPU line
- Pure CPU fill
Code:
;******* ;**** Date: 11.02.2010 Prog: CPU-Fill-Vector ;**** ;**** Done by Victim of Savage (Sascha Mueller) ;******* auto cs\Sinus\0\451\451\$7fff-572\572\w0\ny auto e\o\j section prog,CODE_C start: movem.l d0-d7/a0-a6,-(sp) move #$4000,$dff09a lea CopperList(pc),a0 move.l a0,$dff084 ;Init CopperList bsr InitCPUFillTab ;MAKE FILL TAB bsr InitMulTab MouseWait: bsr WaitRaster bsr TribbleBuffer bsr CPUClear bsr Rotate bsr Angels movem.l d0-d7/a0-a6,-(sp) bsr CPUFill movem.l (sp)+,d0-d7/a0-a6 move #$ff,$dff180 btst #6,$bfe001 bne.w MouseWait movem.l (sp)+,d0-d7/a0-a6 rts WaitRaster: move.l $dff004,d0 asr.l #8,d0 and #$1ff,d0 cmp #300,d0 ;300 bne.s WaitRaster rts TribbleBuffer: move.l PlaneBufferShow(pc),d0 move.l PlaneBufferWork(pc),PlaneBufferShow move.l PlaneBufferClear(pc),PlaneBufferWork move.l d0,PlaneBufferClear lea.l CopperPlanes(pc),a3 move.l PlaneBufferShow(pc),d0 move d0,6(a3) swap d0 move d0,2(a3) swap d0 add.l #40,d0 move d0,14(a3) swap d0 move d0,10(a3) rts ;***************************************************** ;** CPU-Clear for two Bitplanes ;***************************************************** CPUClear: lea leer(pc),a6 movem.l (a6)+,d0-d7/a0-a5 move.l PlaneBufferClear(pc),a6 lea.l 256*40*2(a6),a6 blk.l 364,$48e6fffc ;2 Planes 366 rts leer: blk.l 14,0 ;***************************************************** ;** CPU-Vector-Rotation with 12 Muls ;** In a later version i will make a matrix with 9 muls ;***************************************************** Rotate: lea Vektor(pc),a1 lea XY(pc),a2 lea Sinus(pc),a3 moveq #7,d7 Rot: movem (a1)+,d0-d2 tst AngleXSpeed beq RotateY move alpha(pc),d6 ;Angle move (a3,d6.w*2),d5 ;Sinus move (a3,d6.w*2,180),d6 ;Cosinus move d1,d3 ;y save move d2,d4 ;z save muls d6,d1 ;y*cos(alpha) muls d5,d4 ;z*sin(alpha) sub.l d4,d1 ;y*cos(alpha) - z*sin(alpha) add.l d1,d1 ;2^15 = $8000 swap d1 muls d5,d3 ;y*sin(alpha) muls d6,d2 ;z*cos(alpha) add.l d3,d2 ;y*sin(alpha) + z*cos(alpha) add.l d2,d2 ;2^15 swap d2 RotateY: tst AngleYSpeed beq 
RotateZ move Beta(pc),d6 ;Angle move (a3,d6.w*2),d5 ;Sinus move (a3,d6.w*2,180),d6 ;Cosinus move d0,d3 ;x retten move d2,d4 ;z retten muls d6,d0 ;x*cos(beta) muls d5,d4 ;z*sin(beta) add.l d4,d0 ;x*cos(beta) + z*sin(beta) add.l d0,d0 ;2^15 swap d0 muls d5,d3 ;-x*sin(beta) muls d6,d2 ;z*cos(beta) sub.l d3,d2 ;-x*sin(beta) + z*cos(beta) (because -x = neg) ;z*cos(beta)-x*sin(beta) add.l d2,d2 ;2^15 swap d2 RotateZ: tst AngleZSpeed beq CentralProj move Gamma(pc),d6 ;Angle move (a3,d6.w*2),d5 ;Sinus move (a3,d6.w*2,180),d6 ;Cosinus move d0,d3 ;x retten move d1,d4 ;y retten muls d6,d0 ;x*cos(gamma) muls d5,d4 ;y*sin(gamma) sub.l d4,d0 ;x*cos(gamma) - y*sin(gamma) add.l d0,d0 ;2^15 swap d0 muls d5,d3 ;x*sin(gamma) muls d6,d1 ;y*cos(gamma) add.l d3,d1 ;x*sin(gamma) + y*cos(gamma) add.l d1,d1 ;2^15 swap d1 CentralProj: add xpos(pc),d0 add ypos(pc),d1 sub zpos(pc),d2 ;* Qx,Qy and Qz are the angles additions ;* ;* Zx,Zy and Zz are the central projection values (constant charged) ;* Px = Zx-Zz*Qx-Zx/Qz-Zz move d0,d4 sub x(pc),d4 ;Qx-Zx move d2,d5 sub z(pc),d5 ;Qz-Zz muls z(pc),d4 ;Zz*Qx-Zx divs d5,d4 ;Zz*Qx-Zx/Qz-Zz sub x(pc),d4 ;Zx-Zz*Qx-Zx/Qz-Zz ;* Py = Zy-Zz*Qy-Zy/Qz-Zz move d1,d6 sub y(pc),d6 ;Qy-Zy muls z(pc),d6 ;Zz*Qy-Zy divs d5,d6 ;Zz*Qy-Zy/Qz-Zz sub y(pc),d6 ;Zy-Zz*Qy-Zy/Qz-Zz add #320/2,d4 add #256/2,d6 move d4,(a2)+ move d6,(a2)+ dbf d7,Rot lea.l connect(pc),a3 ;connect lea xy(pc),a2 ;xy move (a3)+,d7 subq #1,d7 PolySort: move (a3)+,d6 subq #1,d6 move (a3)+,color move (a3),d4 movem (a2,d4.w),d0/d1 ;x1,y1 move 2(a3),d4 movem (a2,d4.w),d2/d3 ;x2,y2 move 4(a3),d4 movem (a2,d4.w),d4/d5 ;x3,y3 sub d1,d5 sub d0,d2 sub d0,d4 sub d1,d3 muls d2,d5 muls d3,d4 sub.l d4,d5 bmi nopoly DrawLines: move (a3)+,d4 movem (a2,d4.w),d0/d1 move (a3),d4 movem (a2,d4.w),d2/d3 move.l PlaneBufferWork(pc),a0 moveq #40,d4 move color(pc),d5 movem d0-d3,-(sp) btst #0,d5 beq SkipPoly movem.l d4-d7/a0-a6,-(a7) bsr CPULine movem.l (a7)+,d4-d7/a0-a6 SkipPoly: movem (sp)+,d0-d3 btst #1,d5 beq SkipP2 
add.l #40,a0 movem.l d4-d7/a0-a6,-(a7) bsr CPULine movem.l (a7)+,d4-d7/a0-a6 SkipP2: dbra d6,DrawLines bra NextStep NoPoly: addq.l #8,a3 add d6,d6 add (a3,d6.w),d6 NextStep: addq.l #2,a3 dbra d7,PolySort rts color: dc.w 0 ZAdd: dc.w -60 ;Z World XY: blk.w 80*2 xpos: dc.w 0 ypos: dc.w 0 zpos: dc.w -1050 ;Viewer x: dc.w 0 y: dc.w 0 z: dc.w 600 ;***************************************************** ;** Angels addition ;***************************************************** Angels: move AngleXSpeed(pc),d0 add d0,Alpha cmp #360,Alpha ;360 grad blt NextYAngle ;branching if smaller move #0,Alpha ;if zero then 360 = 0 NextYAngle: move AngleYSpeed(pc),d0 add d0,Beta cmp #360,Beta blt NextZAngle move #0,Beta NextZAngle: move AngleZSpeed(pc),d0 add d0,Gamma cmp #360,Gamma blt NextAngleEnd move #0,Gamma NextAngleEnd: rts AngleXSpeed: dc.w 2 AngleYSpeed: dc.w 2 AngleZSpeed: dc.w 2 Alpha: dc.w 0 Beta: dc.w 0 Gamma: dc.w 0 ** x,y,z Vektor: dc.w -50,50,-50 dc.w -50,-50,-50 dc.w 50,-50,-50 dc.w 50,50,-50 dc.w -50,50,50 dc.w -50,-50,50 dc.w 50,-50,50 dc.w 50,50,50 connect: dc.w 6 dc.w 4, 1,0*4,1*4,2*4,3*4,0*4 dc.w 4, 1,4*4,7*4,6*4,5*4,4*4 dc.w 4, 2,0*4,4*4,5*4,1*4,0*4 dc.w 4, 2,3*4,2*4,6*4,7*4,3*4 dc.w 4, 3,1*4,5*4,6*4,2*4,1*4 dc.w 4, 3,0*4,3*4,7*4,4*4,0*4 ;***************************************************** ;** Mul tab for the CPU-Line routine ;***************************************************** InitMulTab: move #255,d7 lea MulTab(pc),a0 moveq #0,d0 loop1: move d0,(a0)+ add #80,d0 dbf d7,loop1 rts ;***************************************************** ;** CPU-Draw-Line routine with special Fill BIT ;***************************************************** ;--------------------------------------------------------------- ; d0 = x1 ; d1 = y1 ; d2 = x2 ; d3 = y2 ; a0 = PlanePointer CPULine: lea multab(pc),a1 cmp d1,d3 bgt pl_ord beq pl_out exg.l d0,d2 exg.l d1,d3 pl_ord: move d2,d4 move d3,d5 sub d1,d5 sub d0,d4 bge pl_o78 pl_o56: move #-1,a3 ;x-symmetry neg d4 move d4,d2 add d0,d2 cmp 
d4,d5 bgt pl_d67 bra pl_d58 pl_o78: move #1,a3 cmp d4,d5 bgt pl_d67 pl_d58: moveq.l #0,d4 move d0,d6 move d1,d7 move d2,a2 move d6,d0 move d7,d1 sub d0,a2 ;dx sub d1,d3 ;dy sub a2,d4 asr #1,d4 ;error=-dy/2 move a2,d5 subq #1,d5 pl_l58: move d6,d0 move d7,d1 add d3,d4 ;error=error+dy blt pl_t58 ;add d1,d1 ;--- pset ---------- move (a1,d1.w*2),d1 ;mulu #40,d1 move d0,d2 asr #3,d2 add d2,d1 not d0 bchg d0,(a0,d1) ;------------------- addq #1,d7 sub a2,d4 ;error=error-dx pl_t58: add a3,d6 ;a3:={-1,1} dbf d5,pl_l58 rts pl_d67: moveq.l #0,d4 move d0,d6 move d1,d7 move d2,a2 sub d0,a2 ;dx sub d1,d3 ;dy sub a2,d4 asr #1,d4 ;error=-dx/2 move d3,d5 subq #1,d5 pl_l67: move d6,d0 move d7,d1 ;add d1,d1 ;--- pset --------- move (a1,d1.w*2),d1 ;mulu #40,d1 move d0,d2 asr #3,d2 add d2,d1 not d0 bchg d0,(a0,d1) ;------------------ add a2,d4 ;error=error+dx blt pl_t67 add a3,d6 ;a3:={-1,1} sub d3,d4 ;error=error-dy pl_t67: addq #1,d7 dbf d5,pl_l67 pl_out: rts MulTab: blk.w 256,0 ;***************************************************** ;** CPU-Fill Routine ;***************************************************** CPUFill: move.l planebufferwork(pc),a0 moveq.l #0,d0 moveq.l #0,d1 lea intab(pc),a2 lea dbtab(pc),a3 move.w #255*2,d7 pr_fyl: lea fntab(pc),a1 move.l a1,4(a3) lea fitab(pc),a1 move.l a1,(a3) move.w #40-1,d6 ;width pr_fxl: ;--- tst.b d1 ;speed up by ignoring 0 bytes beq pr_zer pr_set: tst.b (a0) bne pr_lin move.b #$ff,(a0)+ dbf d6,pr_set bra pr_ny pr_zer: tst.b (a0) bne pr_lin adda.l #1,a0 dbf d6,pr_zer bra pr_ny pr_lin: ;--- ; move.w #$1ff,d5 ;llo: nop ; dbf d5,llo move.b (a0),d0 move.b (a1,d0.l),(a0)+ tst.b (a2,d0.l) beq pr_nic ;no insert change move.l 0(a3),d2 ;change filltables move.l 4(a3),0(a3) move.l d2,4(a3) move.l 0(a3),a1 ;->a1 not.b d1 ;change insert in d1 pr_nic: dbf d6,pr_fxl pr_ny: dbf d7,pr_fyl rts ;--- creates three tables later used for filling --- InitCPUFillTab: moveq.l #0,d0 ;init routine moveq.l #0,d3 lea fitab(pc),a0 ;fill table lea intab(pc),a1 ;insert table 
move #255,d4 ;256 different bytes to fill pr_bl: move d3,d0 bsr pr_bfi move.b d0,(a0)+ move.b d1,(a1)+ addq #1,d3 dbf d4,pr_bl lea fitab,a0 lea fntab,a1 move #255,d1 pr_inv: move.b (a0)+,d0 not.b d0 move.b d0,(a1)+ dbf d1,pr_inv rts pr_bfi: moveq.l #0,d1 ;fill one byte (d0) as the blitter moveq #7,d2 ;test all eight bits pr_bfl: btst d2,d0 bne pr_ich tst.b d1 beq pr_nb bset d2,d0 pr_nb: dbf d2,pr_bfl rts pr_ich: tst.b d1 bne pr_nof bclr d2,d0 pr_nof: not.b d1 bra pr_nb dbtab: dc.l fitab,fntab ;change table fitab: blk.b 256,0 ;fill table insert=0 fntab: blk.b 256,0 ;fill table insert=1 intab: blk.b 256,0 ;insert change ($ff) CopperList: dc.l $01200000 dc.l $01220000 dc.l $01fc0000 dc.l $01020000 dc.l $01040000 dc.l $01060000 dc.l $01080028 dc.l $010a0028 CopperPlanes: dc.l $00e00000 dc.l $00e20000 dc.l $00e40000 dc.l $00e60000 dc.l $01002200 dc.l $008e2981 dc.l $009029d1 dc.l $00920038 dc.l $009400d0 dc.l $01800000 dc.l $0182000f dc.l $0184000a dc.l $01860006 dc.l -2 ; 2^15 = $8000 Sinus: blk.w 452,0 PlaneBufferShow: dc.l Buffer1 PlaneBufferWork: dc.l Buffer2 PlaneBufferClear: dc.l Buffer3 Buffer1: blk.b 320*256/8*2,0 Buffer2: blk.b 320*256/8*2,0 Buffer3: blk.b 320*256/8*2,0 END so long.... victim |
||
12 February 2010, 02:13 | #10 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
Try changing your line drawing routine so it plots 1 pixel per column instead of per row, then you can fill vertically and do 32 pixels in one go, f.ex:
Code:
        lea     Bitplane, a0
        moveq   #40, d2                 ; Screen width in bytes
        moveq   #10-1, d3               ; Screen width in 32-bit longwords
.xloop  moveq   #0, d1                  ; Set fill carry to 0
        move.w  #256-1, d0              ; 256 rows
.yloop  eor.l   (a0), d1                ; Fill
        move.l  d1, (a0)
        add.w   d2, a0                  ; Step to next row
        dbf     d0, .yloop
        lea     -40*256+4(a0), a0       ; Step to top of next column
        dbf     d3, .xloop |
12 February 2010, 09:44 | #11 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Hmm, not sure if longword eor filler will work correctly in bitplane mode. I am sure that Leffmann's code won't assemble though (eor.l (a0),d1 = you wish ). :P
Anyway, why use such a brute force method when you can do it in a different way (that's how I would do it anyway):
- Have a table of 256*2 words which holds xstart/xend coords; initialize this table with "invalid" coords (e.g. moveq #-1,dx move.l dx,(ax)+ )
- Instead of drawing pixels in your line draw routine you save x coords. You need to check if a coord has already been written (that's why you need to initialize the table); if so, store the value as xend, otherwise it's xstart
- Also save ymin/ymax so you don't have to read the whole buffer later
- Code a simple horizontal line drawer which reads the coords from the table and draws a line from x1 to x2; obviously you want a "write 32 pixels at once" routine |
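The four steps in that list can be sketched in C roughly as follows. This is only a model under stated assumptions: a 320x256 bitplane stored as 10 longwords per row with bit 31 as the leftmost pixel, a convex polygon so that every row gets at most two x coords, and hypothetical names (Span, span_add, hline, fill_spans) that do not come from anyone's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROWS      256
#define ROW_LONGS 10                  /* 320 pixels = 10 longwords per row */

typedef struct { int16_t xstart, xend; } Span;

static Span spans[ROWS];
static int  ymin, ymax;

/* Step 1: mark every coord as "invalid" (-1). */
static void spans_reset(void)
{
    memset(spans, 0xFF, sizeof spans);
    ymin = ROWS;
    ymax = -1;
}

/* Steps 2 and 3: the line routine calls this once per row instead of
   plotting a pixel. First coord becomes xstart, second becomes xend. */
static void span_add(int x, int y)
{
    assert(y >= 0 && y < ROWS);
    Span *s = &spans[y];
    if (s->xstart < 0)
        s->xstart = (int16_t)x;
    else
        s->xend = (int16_t)x;
    if (y < ymin) ymin = y;
    if (y > ymax) ymax = y;
}

/* Step 4: horizontal line from x1 to x2 inclusive, 32 pixels per write. */
static void hline(uint32_t *row, int x1, int x2)
{
    if (x1 > x2) { int t = x1; x1 = x2; x2 = t; }
    uint32_t left  = 0xFFFFFFFFu >> (x1 & 31);         /* x1 and everything right of it */
    uint32_t right = 0xFFFFFFFFu << (31 - (x2 & 31));  /* x2 and everything left of it  */
    int a = x1 >> 5, b = x2 >> 5;
    if (a == b) { row[a] |= left & right; return; }
    row[a] |= left;
    for (int i = a + 1; i < b; i++)
        row[i] = 0xFFFFFFFFu;                          /* full longwords in between */
    row[b] |= right;
}

/* Only rows between ymin and ymax are visited. */
static void fill_spans(uint32_t *plane)
{
    for (int y = ymin; y <= ymax; y++) {
        int x1 = spans[y].xstart, x2 = spans[y].xend;
        if (x1 < 0) continue;          /* row never touched */
        if (x2 < 0) x2 = x1;           /* single coord: one pixel */
        hline(plane + y * ROW_LONGS, x1, x2);
    }
}
```

The model swaps x1/x2 in hline so it does not matter which edge of the polygon was traced first.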
12 February 2010, 12:07 | #12 | |
Registered User
Join Date: Mar 2009
Location: N/A
Posts: 23
|
Quote:
A very good example of perhaps one of the best CPU vector routines is the demo ARTE by SANITY; I mean the end part of the demo: http://www.pouet.net/prod.php?which=1477 The faster the CPU, the faster and smoother the routines in the demo run. Only half the screen is used there, but given the large number of objects it is still very fast. I will try to implement your suggestions soon. so long... Victim |
|
12 February 2010, 12:36 | #13 | |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Quote:
In chunky mode I would just use my chunky triangle filler you can find in my first post. Also, vectors look the same in chunky and bitplane mode, just the way you draw them is different, nothing else. I don't see how they look "rough" in chunky mode, care to elaborate? Last edited by StingRay; 12 February 2010 at 12:46. |
|
12 February 2010, 13:01 | #14 | |
Registered User
Join Date: Mar 2009
Location: N/A
Posts: 23
|
Quote:
Now I understand your routine: first the data is calculated normally, and only later is it translated with a "chunky to planar" converter and then drawn to the screen. so long.... Victim |
|
12 February 2010, 13:05 | #15 | |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Quote:
|
|
12 February 2010, 13:09 | #16 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
Oops, should've checked the code in an assembler first. The 68K is convenient, but it's not as orthogonal as you'd wish, is it?
XOR filling is well tested and is exactly what the blitter does, except we do 32 bits in parallel and this is why it works like it should in bitplane mode. Here's the corrected algorithm: Code:
        lea     Bitplane, a0
        moveq   #40, d2                 ; Screen width in bytes
        moveq   #10-1, d3               ; Screen width in 32-bit longwords
.xloop  moveq   #0, d1                  ; Set fill carry to 0
        move.w  #256-1, d0              ; 256 rows
.yloop  move.l  (a0), d4
        eor.l   d4, d1                  ; Fill
        move.l  d1, (a0)
        add.w   d2, a0                  ; Step to next row
        dbf     d0, .yloop
        lea     -40*256+4(a0), a0       ; Step to top of next column
        dbf     d3, .xloop |
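For anyone who wants to check the vertical XOR fill off the Amiga, the same loop can be modelled in C. This is a sketch assuming a 320x256 bitplane held as 10 longwords per row; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS      256
#define ROW_LONGS 10                  /* 320 pixels = 10 longwords per row */

/* Walk each 32-pixel column from top to bottom. Every set bit toggles the
   fill state of its own pixel column, so 32 columns are filled in parallel. */
static void xor_fill_vertical(uint32_t *plane)
{
    assert(plane);
    for (int col = 0; col < ROW_LONGS; col++) {
        uint32_t carry = 0;                        /* fill carry, as d1 above */
        for (int y = 0; y < ROWS; y++) {
            carry ^= plane[y * ROW_LONGS + col];
            plane[y * ROW_LONGS + col] = carry;
        }
    }
}
```

With this scheme the row holding the upper edge pixel is filled and the row holding the lower edge pixel is cleared, which is why the line routine has to plot exactly one pixel per column for each edge.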
12 February 2010, 13:13 | #17 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
I know how eor fillers work (yes, I once did C64 coding too ;D), just wasn't sure if it would work for a whole longword. Anyway, I still think my approach is faster as it'll only draw the really needed pixels without having to go through the whole scanline. And it's more flexible too since you can use it to do pencil vectors f.e.
|
12 February 2010, 14:32 | #18 | |
Registered User
Join Date: Mar 2009
Location: N/A
Posts: 23
|
Quote:
I think I will try another experiment with the scanline algorithm. Code:
CPUFill:
        movea.l PlaneBufferWork(pc),a0  ;Bit Plane pointer
        move    #255,d6                 ; 256 rows
        movea.l a0,a1
FillVert:
        moveq.l #0,d4
        moveq   #9,d5
FillHori:
        tst.l   (a0)
        bne     Fit
        move.l  d4,(a1)+
        lea     4(a0),a0
        dbf     d5,FillHori
        dbf     d6,FillVert
        rts
Fit:
        move.l  (a0),d0
        move.l  d0,d1
        subq.l  #1,d0
        move.l  d0,d2
        and.l   d1,d0
        bne     Betw
FBit:
        not.l   d4
        beq     Left
Right:
        move.l  (a0)+,d0
        subq.l  #1,d0
        move.l  d0,(a1)+
        dbf     d5,FillHori
        dbf     d6,FillVert
        rts
Left:
        move.l  (a0)+,d0
        subq.l  #1,d0
        not.l   d0
        move.l  d0,(a1)+
        dbf     d5,FillHori
        dbf     d6,FillVert
        rts
Betw:
        lea     4(a0),a0
        subq.l  #1,d0
        ;or.l   d1,d0
        eor.l   d2,d0                   ;exclusive fill
        move.l  d0,(a1)+
        dbf     d5,FillHori
        dbf     d6,FillVert
        rts
victim Last edited by victim; 12 February 2010 at 14:37. |
|
26 January 2014, 02:15 | #19 | |
Registered User
Join Date: Dec 2013
Location: Fredrikstad/Norway
Age: 46
Posts: 17
|
Quote:
Code:
*****************************************************************************
** CPU FILL:
*****************************************************************************
** After the "dbne loop" I have to subtract one from d6 because when the
** "dbne" instruction doesn't branch, it doesn't decrement. I'm using a
** dbf to take care of that.
**
** Made by Lekman/Hemoraiders 1997
*****************************************************************************
CPU_FILL:
        lea     DrawBuffer(pc),a0
        lea     Screen(pc),a1
        move.l  #ScreenSize*2,d2
        add.l   d2,a0                   ; ***
        add.l   d2,a1                   ; ***
        move.w  #(ScrHeight*2)-1,d7
.FillLoop
        lea     (a0),a2                 ; Source
        lea     (a1),a3                 ; Destination
        moveq   #(ScrWidth/4)-1,d6      ; Screenwidth in longwords
        moveq   #0,d5
.FindPoint
        move.l  d5,-(a3)                ; Clear/Fill
        move.l  -(a2),d0
        dbne    d6,.FindPoint           ;loops if d0=$00000000
        beq.s   .NextLine               ;line finished?
        move.l  d5,d2
        bmi.s   .FEndBit                ; Find End Bit
.FStartBit
        move.l  d0,d1
        subq.l  #1,d1
        and.l   d1,d0                   ; Mask out first bit
        eor.l   d0,d1                   ; Mask out other bits
        not.l   d1
        or.l    d1,d2                   ; Mask
        tst.l   d0
        beq.s   .Filling                ;only one bit in longword?
.FEndBit
        move.l  d0,d1
        subq.l  #1,d1
        and.l   d1,d0                   ; Mask out first bit
        eor.l   d0,d1                   ; Mask out other bits
        and.l   d1,d2                   ; Mask
        tst.l   d0
        bne.s   .FStartBit
        moveq   #0,d5                   ; Clearing
        bra.s   .StoreLongWord
.Filling
        moveq   #-1,d5                  ; Filling
.StoreLongWord
        move.l  d2,(a3)
        dbf     d6,.FindPoint           ; Fill all long-words
.NextLine
        lea     -ScrWidth(a0),a0
        lea     -ScrWidth(a1),a1
        dbf     d7,.FillLoop
        rts
Code:
*****************************************************************************
** CPU FILL:
*****************************************************************************
        lea     DrawBuffer+ScreenSize+ScrWidth,a0
        move.l  ScreenP(pc),a1
        lea     ScreenSize(a1),a1
        move.w  (a2)+,d0                ; CPUFill_Pos
        add.w   d0,a0
        add.w   d0,a1
        movem.w (a2),d3/d4/d7           ; CPUFill_Mod/Size
.FillLoop
        move.w  d4,d6                   ; Screenwidth in longwords
        moveq   #0,d5
.FindPoint
        move.l  d5,-(a1)                ; Clear
        move.l  -(a0),d0
        dbne    d6,.FindPoint
        beq.s   .NextLine
        move.l  d5,d2
        bmi.s   .FEndBit                ; Find End Bit
.FStartBit
        move.l  d0,d1
        subq.l  #1,d1
        and.l   d1,d0                   ; Mask out first bit
        eor.l   d0,d1                   ; Mask out other bits
        not.l   d1
        or.l    d1,d2                   ; Mask
        tst.l   d0
        beq.s   .Filling
.FEndBit
        move.l  d0,d1
        subq.l  #1,d1
        and.l   d1,d0                   ; Mask out first bit
        eor.l   d0,d1                   ; Mask out other bits
        and.l   d1,d2                   ; Mask
        tst.l   d0
        bne.s   .FStartBit
        moveq   #0,d5                   ; Clearing
        bra.s   .StoreLongWord
.Filling
        moveq   #-1,d5                  ; Filling
.StoreLongWord
        move.l  d2,(a1)
        dbf     d6,.FindPoint
.NextLine
        sub.w   d3,a0
        sub.w   d3,a1
        dbf     d7,.FillLoop
.NoFill rts |
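The .FStartBit/.FEndBit pair in both listings relies on the fact that x & (x-1) clears the lowest set bit of x, which lets the routine peel edge bits off a longword one at a time and build the fill mask without a per-pixel loop. Here is a C model of that inner step as I read it; the function names are hypothetical, and bit 0 is taken to be the rightmost pixel since the rows are scanned from right to left.

```c
#include <assert.h>
#include <stdint.h>

/* Turn the edge bits of one longword into a fill mask.
   *filling carries the state in and out, like d5 in the listings. */
static uint32_t fill_word(uint32_t edges, int *filling)
{
    uint32_t out = *filling ? 0xFFFFFFFFu : 0u;
    while (edges) {
        uint32_t rest  = edges & (edges - 1);   /* edges without the lowest set bit */
        uint32_t below = (edges - 1) ^ rest;    /* bits strictly below that bit     */
        if (*filling)
            out &= below;                       /* end edge: keep fill below it only */
        else
            out |= ~below;                      /* start edge: fill from it upwards  */
        *filling = !*filling;
        edges = rest;
    }
    return out;
}

/* One row, scanned from the last longword to the first. */
static void fill_row(const uint32_t *src, uint32_t *dst, int longs)
{
    assert(src && dst);
    int filling = 0;
    for (int i = longs - 1; i >= 0; i--)
        dst[i] = fill_word(src[i], &filling);
}
```

In this model the start edge pixel is part of the span and the end edge pixel is not, which matches how the masks are combined with or.l and and.l in the listings.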
|
|
|