27 July 2021, 22:17 | #1 |
Registered User
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
|
Fastest way to multiply by 200?
Hi all,
I'm not too familiar with exact cycle count timings, but I'm looking for a fast way to multiply a number by 200; the return value can be within 64K. At the moment I have:
Code:
    lea .mulu200(pc),a3
    add.w d2,d2
    move.w (a3,d2.w),d2
    .
    .
    .
.mulu200:
    rept 256
    dc.w REPTN*200
    endr
Any faster ways you guys can think of?
Thanks,
Graeme
|
27 July 2021, 22:46 | #2 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,053
|
Assuming 68000... mulu.w #200,dx is: 200=c8=11001000, 38+4+6=48 cycles.
If you break it down into shifts and adds it should be around 40 cycles. Your table approach is 12+4+14=30 cycles worst case (meaning you execute all 3 instructions each time). Can be made 28 if you can 64kb align the table (move.l dx,ax + move.w (ax),dx is 12 cycles vs. 14 cycles). |
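The 64KB-alignment idea above could look roughly like the sketch below. This is only one reading of the hint, not code from the thread. The label Mulu200Aligned is hypothetical, the table is assumed to sit on a 64KB boundary (on the Amiga that usually means allocating memory and copying the table at runtime), and the upper word of d2 is assumed to be preloaded once with the upper half of the table address.

```asm
; Sketch only. Assumes Mulu200Aligned (hypothetical label) lies on a
; 64KB boundary, so the low word of its address is $0000.

; Once, outside the hot loop:
        move.l  #Mulu200Aligned,d2  ; upper word = table base, lower word = 0

; Per lookup (y, range 0..255, has been moved into the low word of d2):
        add.w   d2,d2               ; word index -> byte offset, upper word untouched
        move.l  d2,a3               ; a3 = table base + y*2
        move.w  (a3),d2             ; d2.w = y*200, upper word still holds the base
```

Any later long-sized operation on d2 would see the table base in its upper word, so the result has to be used as a word.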
27 July 2021, 22:58 | #3 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,029
|
The table version will be the fastest on the 68000. I would use just something like this:
Code:
    add.w d2,d2
    move.w .mulu200(PC,D2.W),D2
|
27 July 2021, 23:12 | #4 |
Registered User
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
|
|
28 July 2021, 00:39 | #5 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,029
|
The best approach is to place this routine at the end of your code; I always put the sample mixing routine at the end of my players. For PC-relative indexed addressing the table must start within a range of 126 bytes. Alternatively, you can use a register as the table base. I don't do that, because it wastes a register in a critical routine. In my mixing routine I actually used PC-relative addressing for 2 tables, not just 1.
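To illustrate the placement point: the d8(PC,Xn) addressing mode only has a signed 8-bit displacement, so the table label has to sit very close to the instruction that reads it. A minimal sketch, with hypothetical labels, where the lookup is the last thing in the routine and the table follows straight after:

```asm
; Sketch only, labels are hypothetical.
; d2.w = y (0..255) on entry, d2.w = y*200 on exit.
RoutineTail:
        ; ... rest of the routine ...
        add.w   d2,d2                   ; word index -> byte offset
        move.w  .mulu200(pc,d2.w),d2    ; table must be within the 8-bit displacement
        rts
.mulu200:
        rept    256
        dc.w    REPTN*200               ; entry n holds n*200 (max 255*200 = 51000)
        endr
```

If the table ends up too far away, the assembler should reject the displacement, and the lea version with a base register is the fallback.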
|
28 July 2021, 07:46 | #6 | |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 681
|
Quote:
used cycles: 4 + 14 = 18
the lea approach will work within a +-32k range
used cycles 8 + 4 + 14 = 26
If you end up putting that code into a subroutine...
... shift/add is faster (40 cycles), since bsr/rts eats the advantage up.
and the shift/add code can be inlined everywhere - ok, you need a scratch register...
... so if that is not available or size matters: use the mul ^^
Last edited by bebbo; 28 July 2021 at 10:05. |
|
28 July 2021, 08:41 | #7 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
For reference, shift and add version :
Code:
    move.w d0,d1 ; d1 = x
    add.w d0,d0 ; d0 = 2x
    add.w d0,d1 ; d1 = 3x
    lsl.w #2,d0 ; d0 = 8x
    lsl.w #6,d1 ; d1 = 192x
    add.w d1,d0 ; d0 = 200x
|
28 July 2021, 10:06 | #8 | |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 681
|
Quote:
That's 44 cycles. Faster is:
Code:
    lsl.w #3,d2 ; *8 ; 6+6
    move.w d2,d3 ; 4
    lsl.w #3,d3 ; *64 ; 6+6
    add.w d3,d2 ; *72 ; 4
    add.w d3,d2 ; *136 ; 4
    add.w d3,d2 ; *200 ; 4
|
|
28 July 2021, 11:17 | #9 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,029
|
In some code (I haven't seen yours) it's possible, without any speed penalty, to change the input D2 value from 0,1,2,3,4,5... to 0,2,4,6,8,10 etc. Then the add.w d2,d2 can be removed, and only the move.w from the table is needed.
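As a rough sketch of that idea, assuming the caller keeps y pre-doubled in d2 and that the .mulu200 table from earlier in the thread is within PC-relative indexed range:

```asm
; Sketch only. d2.w holds y*2 (0,2,4,...,510), maintained by the caller.
        addq.w  #2,d2                   ; "next line" now steps by 2 instead of 1
        ; ...
        move.w  .mulu200(pc,d2.w),d3    ; d3.w = y*200, no add.w d2,d2 needed
```

Whether this is free depends on how y is produced elsewhere; any offsets added to y would have to be doubled as well.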
|
28 July 2021, 17:47 | #10 |
Registered User
Join Date: Aug 2018
Location: Untergrund/Germany
Posts: 410
|
If you run the code while the blitter is active in parallel or a lot of bitplane DMA is going on then it may be fastest to simply do the mul (on 68k without fastmem).
|
30 July 2021, 02:41 | #11 | |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
Quote:
Code:
    move.w d2,d3 ; 4
    lsl.w #3,d3 ; *8 ; 6+6
    add.w d3,d2 ; *9 ; 4
    add.w d3,d3 ; *16 ; 4
    add.w d3,d2 ; *25 ; 4
    lsl.w #3,d2 ; *200 ; 6+6
|
|
30 July 2021, 11:18 | #12 |
Registered User
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
|
OK thanks for the replies guys...
In the end I settled on this. It's basically a blitter routine which reconstructs bob parts (for sprites). The idea is to save memory and cycles: for example, instead of blitting a whole 64x80 bob (the player character), which may contain a lot of blank space, I only blit the parts that have pixels in them. Screen size is 320x256 with 5 bitplanes... hence the 200-byte width. Bobs are interleaved. I haven't optimised the code segments yet; normally I have a6 pointing into a data segment, and I will do that soon to speed it up. Code:
; d0 = Frame Number
; d1 = xpos
; d2 = ypos
; a0 = Sprite Sheet struct
; a1 = Screen struct
; a2 = Restore pointer a3
agdPlotFrame:
    movem.l d0-d2/a0-a1,-(a7)
; Get the frame structure.
    lea SPRITES_FRAMES,a0
    add.w d0,d0
    add.w d0,d0
    move.l (a0,d0),a0
    move.l hScreenPointers(a1),a1
; This would be the point for entry to the routine.
    add.w 2(a0),d1 ; add x offset for this frame
    add.w 4(a0),d2 ; add y offset for this frame
    lea .mulu200(pc),a3
    add.w d2,d2
    move.w (a3,d2.w),d2
    move.w d1,d4 ; Make a copy of the xpos to d4
    and.w #$fff0,d1 ; Get the Xposition nearest word position
    lsr.w #3,d1 ; d1 now has nearest word
    add.l d1,d2 ; d2=Byte position
    ror.w #4,d4 ; Barrel Shift amount for BLTCON0 Source Mask
    clr.b d4
    move.w d4,d5
    or.w #$fca,d4 ; We want Source A,B & C with D = $F, and Cookie Cut $CA = $FCA
    move.l d2,d3 ; save plot offset
    move.w d4,d6 ; save bltcon0 value
    move.w d5,d7 ; save bltcon1 value
    move.l #16,a3
.part:
    move.l d3,d2 ; restore offset
    move.w d6,d4 ; restore bltcon0
    move.w d7,d5 ; restore bltcon1
    add.l a3,a0 ; advance 16 bytes
    move.w (a0)+,d0 ; bltsize
    move.w (a0),d1 ; mod
    swap d1
    move.w (a0)+,d1
    WAIT_FOR_BLITTER
    move.l #$ffff0000,BLTAFWM(a5)
    move.l d1,BLTAMOD(a5)
    move.l d1,BLTCMOD(a5)
    move.l (a0)+,BLTBPTH(a5) ; bob
    move.l (a0)+,BLTAPTH(a5) ; mask
    add.w (a0)+,d2 ; plot offset y+x word
    add.w (a0),d4 ; barrel adjust
    add.w (a0)+,d5 ; barrel adjust
    bcc.s .ovf
    addq.w #2,d2 ; overflow to next word
.ovf:
    move.w d0,(a2)+ ; Save Blit size
    move.w d1,(a2)+ ; Save Modulo
    move.w d2,(a2)+ ; Save offset
    add.l a1,d2
    move.w d4,BLTCON0(a5)
    move.w d5,BLTCON1(a5)
    move.l d2,BLTCPTH(a5)
    move.l d2,BLTDPTH(a5)
    move.w d0,BLTSIZE(a5)
    tst.w (a0) ; Terminate?
    bpl.s .part
.exit:
    movem.l (a7)+,d0-d2/a0-a1
    rts
.mulu200:
    rept 256
    dc.w REPTN*200
    endr
Code:
SPRITES_FRAMES:
    dc.l .gripper_fall_left_frame1
    dc.l .gripper_fall_left_frame2
    dc.l .gripper_fall_left_frame3
    dc.l .gripper_fall_left_frame4
    dc.l .gripper_fall_left_frame5
    dc.l .gripper_fall_left_frame5
    dc.l -1
.gripper_fall_left_frame1:
; Part 1
    dc.w 0 ; 0 sprite lock start
    dc.w 0 ; 2 x src offset
    dc.w 0 ; 4 y src offset
    dc.w 1 ; 6 DER x spr size (words)
    dc.w 40 ; 8 DER y spr size
    dc.w 0 ; 10 DER x dst offset
    dc.w 0 ; 12 DER y dst offset
    dc.w 0 ; 14
    ds.b 16 ; Compiled blitter values here
; Part 2
    dc.w 60 ; sprite lock start
    dc.w $DEAD ; x src offset
    dc.w $BEEF ; y src offset
    dc.w 1 ; x spr size (words)
    dc.w 19 ; y spr size
    dc.w 7 ; x dst offset
    dc.w 40 ; y dst offset (40 pixels down)
    dc.w 0
    ds.b 16 ; Compiled blitter values here
    dc.l -1
.gripper_fall_left_frame2:
|
30 July 2021, 11:41 | #13 | |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 681
|
Quote:
Referring to Don Adams' comment: you are counting y like 0, 1, 2. Can't you count y as 0, 200, 400, ...? Then there would be no need for the * 200. |
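A minimal sketch of that suggestion, assuming the y position is stored as a byte offset (y*200) in a hypothetical word variable PlayerYOffset rather than as a line number:

```asm
; Sketch only. PlayerYOffset is a hypothetical word variable holding y*200.
        add.w   #200,PlayerYOffset      ; move down one line
        sub.w   #200,PlayerYOffset      ; move up one line
        move.w  PlayerYOffset,d2        ; d2.w = y*200, no multiply and no table
```

For this to fit the routine posted above, the per-frame y offsets that get added to d2 would presumably also have to be stored pre-multiplied by 200.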
|
30 July 2021, 11:57 | #14 | |
Registered User
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
|
Quote:
|
|
30 July 2021, 20:26 | #15 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 852
|
|
30 July 2021, 22:06 | #16 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,029
|
Or you can try this table version. It may be OK for the PC range, but I don't know the size of the wait-for-blitter routine.
Code:
; d0 = Frame Number
; d1 = xpos
; d2 = ypos
; a0 = Sprite Sheet struct
; a1 = Screen struct
; a2 = Restore pointer a3
agdPlotFrame:
    movem.l d0-d2/a0-a1,-(a7)
; Get the frame structure.
    lea SPRITES_FRAMES,a0
    add.w d0,d0
    add.w d0,d0
    move.l (a0,d0),a0
    move.l hScreenPointers(a1),a1
; This would be the point for entry to the routine.
    add.w 2(a0),d1 ; add x offset for this frame
    add.w 4(a0),d2 ; add y offset for this frame
;   lea .mulu200(pc),a3
;   add.w d2,d2
;   move.w (a3,d2.w),d2
    lea 16.W,A3
    move.w d1,d4 ; Make a copy of the xpos to d4
    and.w #$fff0,d1 ; Get the Xposition nearest word position
    lsr.w #3,d1 ; d1 now has nearest word
;   add.l d1,d2 ; d2=Byte position why add longword not word?
    ror.w #4,d4 ; Barrel Shift amount for BLTCON0 Source Mask
    clr.b d4
    move.w d4,d5
    or.w #$fca,d4 ; We want Source A,B & C with D = $F, and Cookie Cut $CA = $FCA
;   move.l d2,d3 ; save plot offset
    move.w d4,d6 ; save bltcon0 value
    move.w d5,d7 ; save bltcon1 value
    add.w d2,d2
    move.w .mulu200(PC,D2.W),D3
    add.w D1,D3
;   move.l #16,a3
.part:
    move.l d3,d2 ; restore offset ; 88 bytes
    move.w d6,d4 ; restore bltcon0 ; 86 bytes
    move.w d7,d5 ; restore bltcon1 ; 84 bytes
    add.l a3,a0 ; advance 16 bytes ; 82 bytes
    move.w (a0)+,d0 ; bltsize ; 80 bytes
    move.w (a0),d1 ; mod ; 78 bytes
    swap d1 ; 76 bytes
    move.w (a0)+,d1 ; 74 bytes
    WAIT_FOR_BLITTER ; unknown size
    move.l #$ffff0000,BLTAFWM(a5) ; 72 bytes
    move.l d1,BLTAMOD(a5) ; 64 bytes
    move.l d1,BLTCMOD(a5) ; 60 bytes
    move.l (a0)+,BLTBPTH(a5) ; bob ; 56 bytes
    move.l (a0)+,BLTAPTH(a5) ; mask ; 52 bytes
    add.w (a0)+,d2 ; plot offset y+x word ; 48 bytes
    add.w (a0),d4 ; barrel adjust ; 46 bytes
    add.w (a0)+,d5 ; barrel adjust ; 44 bytes
    bcc.s .ovf ; 42 bytes
    addq.w #2,d2 ; overflow to next word ; 40 bytes
.ovf:
    move.w d0,(a2)+ ; Save Blit size ; 38 bytes
    move.w d1,(a2)+ ; Save Modulo ; 36 bytes
    move.w d2,(a2)+ ; Save offset ; 34 bytes
    add.l a1,d2 ; 32 bytes
    move.w d4,BLTCON0(a5) ; 30 bytes
    move.w d5,BLTCON1(a5) ; 26 bytes
    move.l d2,BLTCPTH(a5) ; 22 bytes
    move.l d2,BLTDPTH(a5) ; 18 bytes
    move.w d0,BLTSIZE(a5) ; 14 bytes
    tst.w (a0) ; Terminate? ; 10 bytes
    bpl.s .part ; 8 bytes
.exit:
    movem.l (a7)+,d0-d2/a0-a1 ; 6 bytes
    rts ; 2 bytes
.mulu200:
    rept 256
    dc.w REPTN*200
    endr
|