English Amiga Board Fastest way to multiply by 200?

 27 July 2021, 23:17 #1 mcgeezer Registered User   Join Date: Oct 2017 Location: Sunderland, England Posts: 2,507
Fastest way to multiply by 200?
Hi all,
I'm not too familiar with exact cycle timings, but I'm looking for a fast way to multiply a number by 200. The return value stays within 64 KB. At the moment I have:
Code:
```
    lea     .mulu200(pc),a3
    add.w   d2,d2
    move.w  (a3,d2.w),d2
    .
    .
    .
.mulu200:
    rept    256
    dc.w    REPTN*200
    endr
```
I'm assuming this will be faster than a mulu #200,d2, which I recall takes something like 70 cycles. Any faster ways you guys can think of?
Thanks,
Graeme
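As a rough illustration of what the table and the `add.w`/`move.w` pair compute, here is a minimal Python model. The names `MULU200_TABLE` and `lookup_mul200` are invented for this sketch and do not appear in the thread; it assumes y is limited to 0..255, as the `rept 256` implies.

```python
# Hypothetical model of the .mulu200 table: 256 words, entry i holds i*200.
MULU200_TABLE = [i * 200 for i in range(256)]

def lookup_mul200(y):
    """Mimic add.w d2,d2 / move.w (a3,d2.w),d2 for 0 <= y <= 255."""
    byte_offset = y + y                  # add.w d2,d2: each entry is 2 bytes wide
    return MULU200_TABLE[byte_offset // 2]

largest = max(MULU200_TABLE)             # 255 * 200
table_bytes = 2 * len(MULU200_TABLE)     # size of the table in memory
```

The largest entry is 51000, so every product fits in an unsigned 16-bit word, and the table costs 512 bytes.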
 27 July 2021, 23:46 #2 a/b Registered User   Join Date: Jun 2016 Location: europe Posts: 536
Assuming 68000...
mulu.w #200,dx is: 200 = $C8 = %11001000, 38+4+6 = 48 cycles.
If you break it down into shifts and adds it should be around 40 cycles.
Your table approach is 12+4+14 = 30 cycles worst case (meaning you execute all 3 instructions each time). It can be made 28 if you can align the table to 64 KB (move.l dx,ax + move.w (ax),dx is 12 cycles vs. 14 cycles).
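The 38+4+6 breakdown follows the 68000 timing rule for `mulu.w`: 38 cycles, plus 2 per set bit in the 16-bit source operand, plus 4 to fetch an immediate. A small sketch of that rule, with `mulu_cycles` being a made-up helper name:

```python
def mulu_cycles(multiplier, immediate=True):
    """68000 MULU.W timing: 38 + 2 cycles per set bit in the 16-bit source,
    plus 4 cycles when the source is an immediate operand."""
    ones = bin(multiplier & 0xFFFF).count("1")
    return 38 + 2 * ones + (4 if immediate else 0)

cycles_for_200 = mulu_cycles(200)                      # 200 = %11001000, three set bits
worst_case = mulu_cycles(0xFFFF, immediate=False)      # all 16 bits set, register source
```

This gives 48 cycles for `#200`, while the "something like 70" figure in the first post matches the worst case with all 16 bits set.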
 27 July 2021, 23:58 #3 Don_Adan Registered User   Join Date: Jan 2008 Location: Warsaw/Poland Age: 53 Posts: 1,424
The table version will be the fastest for 68000. I would use only something like this:
Code:
```
    add.w   d2,d2
    move.w  .mulu200(PC,D2.W),D2
```
28 July 2021, 00:12   #4
mcgeezer
Registered User

Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,507
Quote:
 Originally Posted by Don_Adan
The table version will be the fastest for 68000. I would use only something like this:
Code:
```
    add.w   d2,d2
    move.w  .mulu200(PC,D2.W),D2
```
Thanks - indeed this is 68000

That code won't assemble as I'm getting an out of range displacement.

 28 July 2021, 01:39 #5 Don_Adan Registered User   Join Date: Jan 2008 Location: Warsaw/Poland Age: 53 Posts: 1,424
The best option is to place this routine at the end of your code. I always placed the sample mixing routine at the end of my players. The PC-relative table must start within a 126-byte range. Or you can use one register as the base. I don't use a register as the table base because that wastes a register in a critical routine. In fact, my mixing routine used PC to address 2 tables, not just 1.
28 July 2021, 08:46   #6
bebbo
botcher

Join Date: Jun 2016
Location: Hamburg/Germany
Posts: 565
Quote:
 Originally Posted by mcgeezer Thanks - indeed this is 68000 That code won't assemble as I'm getting an out of range displacement.
the displacement must fit into 1 byte. Thus the code needs to be very close to the table.
used cycles: 4 + 14 = 18

the lea approach will work within a +-32k range
used cycles 8 + 4 + 14 = 26

If you end up putting that code into a subroutine...
... shift/add is faster (40 cycles), since bsr/rts eats the advantage up.
and the shift/add code can be inlined everywhere - ok, you need a scratch register...

... so if that is not available or size matters: use the mul ^^

Last edited by bebbo; 28 July 2021 at 11:05.
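The two ranges mentioned above come from the width of the displacement field: 8 bits signed for `d8(PC,Xn)` and 16 bits signed for `d16(PC)`. A minimal sketch of that check follows; `pc_relative_modes` is a hypothetical helper, and the displacement is assumed to be measured in bytes from the instruction's extension word to the table.

```python
def pc_relative_modes(displacement):
    """Return the 68000 PC-relative modes that can encode this byte displacement."""
    modes = []
    if -128 <= displacement <= 127:
        modes.append("d8(PC,Xn)")    # e.g. move.w .mulu200(pc,d2.w),d2
    if -32768 <= displacement <= 32767:
        modes.append("d16(PC)")      # e.g. lea .mulu200(pc),a3
    return modes

near = pc_relative_modes(100)        # table right after the routine
far = pc_relative_modes(200)         # too far for the indexed form
```

With the table 200 bytes away only the `lea` form assembles, which is consistent with the "out of range displacement" error reported earlier.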

 28 July 2021, 09:41 #7 meynaf son of 68k   Join Date: Nov 2007 Location: Lyon / France Age: 48 Posts: 4,421
For reference, shift and add version:
Code:
```
    move.w  d0,d1
    add.w   d0,d0
    add.w   d0,d1
    lsl.w   #2,d0
    lsl.w   #6,d1
    add.w   d1,d0
```
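To see why this sequence yields 200x, it can be modelled step by step with 16-bit wraparound. This is only a Python sketch of the arithmetic; `mul200_meynaf` and `lsl_w_cycles` are names invented here, and the cycle figures assume the standard 68000 timings (4 cycles for a register `move.w`/`add.w`, 6+2n for `lsl.w #n,dn`).

```python
MASK = 0xFFFF  # .w operations keep only the low 16 bits

def lsl_w_cycles(count):
    """68000 word shift with an immediate count: 6 + 2n cycles."""
    return 6 + 2 * count

def mul200_meynaf(x):
    d0 = x & MASK
    d1 = d0                    # move.w d0,d1   d1 = x
    d0 = (d0 + d0) & MASK      # add.w  d0,d0   d0 = 2x
    d1 = (d1 + d0) & MASK      # add.w  d0,d1   d1 = 3x
    d0 = (d0 << 2) & MASK      # lsl.w  #2,d0   d0 = 8x
    d1 = (d1 << 6) & MASK      # lsl.w  #6,d1   d1 = 192x
    return (d0 + d1) & MASK    # add.w  d1,d0   d0 = 200x

MEYNAF_CYCLES = 4 + 4 + 4 + lsl_w_cycles(2) + lsl_w_cycles(6) + 4
```

So 200x is built as 8x + 192x, and the long `lsl.w #6` accounts for most of the cost.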
28 July 2021, 11:06   #8
bebbo
botcher

Join Date: Jun 2016
Location: Hamburg/Germany
Posts: 565
Quote:
 Originally Posted by meynaf
For reference, shift and add version:
Code:
```
    move.w  d0,d1
    add.w   d0,d0
    add.w   d0,d1
    lsl.w   #2,d0
    lsl.w   #6,d1
    add.w   d1,d0
```

that's 44 cycles

faster is
Code:
```lsl.w #3,d2 ; *8     6+6
move.w d2,d3           4
lsl.w #3,d3 ; *64    6+6
add.w d3,d2 ; *72      4
add.w d3,d2 ; *136     4
add.w d3,d2 ; *200     4```
40 cycles

 28 July 2021, 12:17 #9 Don_Adan Registered User   Join Date: Jan 2008 Location: Warsaw/Poland Age: 53 Posts: 1,424
For some code (I can't see yours) it is possible, without a speed penalty, to change the input D2 value from 0,1,2,3,4,5... to 0,2,4,6,8,10 etc. Then add.w d2,d2 can be removed, and only the move.w from the table is needed.
 28 July 2021, 18:47 #10 pink^abyss Registered User   Join Date: Aug 2018 Location: Untergrund/Germany Posts: 321 If you run the code while the blitter is active in parallel or a lot of bitplane DMA is going on then it may be fastest to simply do the mul (on 68k without fastmem).
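One way to read this point: when DMA keeps the chip bus busy, the CPU mostly stalls when it needs a bus access, so the number of accesses can matter more than the nominal cycle count. The sketch below only tallies nominal cycles and bus reads per variant from the 68000 timing tables; it does not model DMA contention itself, and the list names and `totals` helper are made up for illustration.

```python
# (instruction, nominal cycles, bus reads) using standard 68000 timings.
MULU = [("mulu.w #200,d2", 48, 2)]
TABLE = [
    ("lea .mulu200(pc),a3", 8, 2),
    ("add.w d2,d2", 4, 1),
    ("move.w (a3,d2.w),d2", 14, 3),   # two instruction words plus the table read
]
SHIFT_ADD = [
    ("lsl.w #3,d2", 12, 1),
    ("move.w d2,d3", 4, 1),
    ("lsl.w #3,d3", 12, 1),
    ("add.w d3,d2", 4, 1),
    ("add.w d3,d2", 4, 1),
    ("add.w d3,d2", 4, 1),
]

def totals(sequence):
    """Return (total cycles, total bus reads) for a list of instructions."""
    return (sum(cycles for _, cycles, _ in sequence),
            sum(reads for _, _, reads in sequence))
```

The multiply needs only 2 bus reads against 6 for either alternative, so under heavy contention its 48 nominal cycles may hide more of the waiting. Whether it actually wins depends on the DMA load.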
30 July 2021, 03:41   #11
Gorf
Registered User

Join Date: May 2017
Location: Munich/Bavaria
Posts: 1,453
Quote:
 Originally Posted by bebbo
that's 44 cycles
faster is
Code:
```lsl.w #3,d2 ; *8     6+6
move.w d2,d3           4
lsl.w #3,d3 ; *64    6+6
add.w d3,d2 ; *72      4
add.w d3,d2 ; *136     4
add.w d3,d2 ; *200     4```
40 cycles
or:
Code:
```move.w d2,d3           4
lsl.w #3,d3 ; *8     6+6
add.w d3,d2 ; *9       4
add.w d3,d2 ; *17      4
add.w d3,d2 ; *25      4
lsl.w #3,d2 ; *200   6+6```
same
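Both 40-cycle variants can be checked with a small Python model using 16-bit wraparound. The function names are invented for this sketch. In the second function, the three `add.w d3,d2` steps between the shifts are inferred from the `*200` comment and the "same" cycle total, not confirmed by the author.

```python
MASK = 0xFFFF  # .w operations keep only the low 16 bits

def mul200_bebbo(x):
    d2 = (x << 3) & MASK       # lsl.w #3,d2   d2 = 8x
    d3 = d2                    # move.w d2,d3
    d3 = (d3 << 3) & MASK      # lsl.w #3,d3   d3 = 64x
    for _ in range(3):
        d2 = (d2 + d3) & MASK  # add.w d3,d2   72x, 136x, 200x
    return d2

def mul200_gorf(x):
    d2 = x & MASK
    d3 = (d2 << 3) & MASK      # move.w d2,d3 / lsl.w #3,d3   d3 = 8x
    for _ in range(3):
        d2 = (d2 + d3) & MASK  # add.w d3,d2 (assumed)   9x, 17x, 25x
    return (d2 << 3) & MASK    # lsl.w #3,d2   d2 = 200x

CYCLES = 12 + 4 + 12 + 4 + 4 + 4   # two lsl.w #3, one move.w, three add.w
```

The first builds 200x as 8x + 3*64x, the second as (x + 3*8x)*8. Both use two 12-cycle shifts, one move and three adds.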

 30 July 2021, 12:18 #12 mcgeezer Registered User   Join Date: Oct 2017 Location: Sunderland, England Posts: 2,507
OK thanks for the replies guys...
I did in the end settle on this - it's basically a blitter routine which reconstructs bob parts (for sprites). The idea is to save memory and cycles, so for example instead of blitting a 64x80 bob (player character) which may contain a lot of blank space I'm only blitting those that have pixels in them.
Screen size is 320/256x5 bitplanes... hence the 200 bytes width. Bob's are interleaved.
I haven't optimised the code segments yet - normally I have a6 pointing into a data segment but will do that soon to speed it up.
Code:
```
; d0 = Frame Number
; d1 = xpos
; d2 = ypos
; a0 = Sprite Sheet struct
; a1 = Screen struct
; a2 = Restore pointer a3
agdPlotFrame:
    movem.l d0-d2/a0-a1,-(a7)
; Get the frame structure.
    lea     SPRITES_FRAMES,a0
    add.w   d0,d0
    add.w   d0,d0
    move.l  (a0,d0),a0
    move.l  hScreenPointers(a1),a1
; This would be the point for entry to the routine.
    add.w   2(a0),d1            ; add x offset for this frame
    add.w   4(a0),d2            ; add y offset for this frame
    lea     .mulu200(pc),a3
    add.w   d2,d2
    move.w  (a3,d2.w),d2
    move.w  d1,d4               ; Make a copy of the xpos to d4
    and.w   #$fff0,d1           ; Get the Xposition nearest word position
    lsr.w   #3,d1               ; d1 now has nearest word
    add.l   d1,d2               ; d2=Byte position
    ror.w   #4,d4               ; Barrel Shift amount for BLTCON0 Source Mask
    clr.b   d4
    move.w  d4,d5
    or.w    #$fca,d4            ; We want Source A,B & C with D = $F, and Cookie Cut $CA = $FCA
    move.l  d2,d3               ; save plot offset
    move.w  d4,d6               ; save bltcon0 value
    move.w  d5,d7               ; save bltcon1 value
    move.l  #16,a3
.part:
    move.l  d3,d2               ; restore offset
    move.w  d6,d4               ; restore bltcon0
    move.w  d7,d5               ; restore bltcon1
    add.l   a3,a0               ; advance 16 bytes
    move.w  (a0)+,d0            ; bltsize
    move.w  (a0),d1             ; mod
    swap    d1
    move.w  (a0)+,d1
    WAIT_FOR_BLITTER
    move.l  #$ffff0000,BLTAFWM(a5)
    move.l  d1,BLTAMOD(a5)
    move.l  d1,BLTCMOD(a5)
    move.l  (a0)+,BLTBPTH(a5)   ; bob
    move.l  (a0)+,BLTAPTH(a5)   ; mask
    add.w   (a0)+,d2            ; plot offset y+x word
    add.w   (a0),d4             ; barrel adjust
    add.w   (a0)+,d5            ; barrel adjust
    bcc.s   .ovf
    addq.w  #2,d2               ; overflow to next word
.ovf:
    move.w  d0,(a2)+            ; Save Blit size
    move.w  d1,(a2)+            ; Save Modulo
    move.w  d2,(a2)+            ; Save offset
    add.l   a1,d2
    move.w  d4,BLTCON0(a5)
    move.w  d5,BLTCON1(a5)
    move.l  d2,BLTCPTH(a5)
    move.l  d2,BLTDPTH(a5)
    move.w  d0,BLTSIZE(a5)
    tst.w   (a0)                ; Terminate?
    bpl.s   .part
.exit:
    movem.l (a7)+,d0-d2/a0-a1
    rts

.mulu200:
    rept    256
    dc.w    REPTN*200
    endr
```
Structure of a frame is like this...
Code:
```
SPRITES_FRAMES:
    dc.l    .gripper_fall_left_frame1
    dc.l    .gripper_fall_left_frame2
    dc.l    .gripper_fall_left_frame3
    dc.l    .gripper_fall_left_frame4
    dc.l    .gripper_fall_left_frame5
    dc.l    .gripper_fall_left_frame5
    dc.l    -1

.gripper_fall_left_frame1:
; Part 1
    dc.w    0       ; 0 sprite lock start
    dc.w    0       ; 2 x src offset
    dc.w    0       ; 4 y src offset
    dc.w    1       ; 6 DER x spr size (words)
    dc.w    40      ; 8 DER y spr size
    dc.w    0       ; 10 DER x dst offset
    dc.w    0       ; 12 DER y dst offset
    dc.w    0       ; 14
    ds.b    16      ; Compiled blitter values here
; Part 2
    dc.w    60      ; sprite lock start
    dc.w    $DEAD   ; x src offset
    dc.w    $BEEF   ; y src offset
    dc.w    1       ; x spr size (words)
    dc.w    19      ; y spr size
    dc.w    7       ; x dst offset
    dc.w    40      ; y dst offset (40 pixels down)
    dc.w    0
    ds.b    16      ; Compiled blitter values here
    dc.l    -1

.gripper_fall_left_frame2:
```
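The x-position handling near the top of `agdPlotFrame` splits xpos into a word-aligned byte offset and a 4-bit barrel shift placed in the top nibble of the BLTCON values. A minimal Python model of just those instructions, assuming a 16-bit xpos and a row offset already multiplied by 200; `ror_w` and `plot_setup` are hypothetical names:

```python
def ror_w(value, count):
    """Rotate a 16-bit word right, as ror.w does."""
    value &= 0xFFFF
    return ((value >> count) | (value << (16 - count))) & 0xFFFF

def plot_setup(xpos, row_offset):
    """Model the setup before .part: byte offset, BLTCON0 and BLTCON1 values."""
    byte_offset = row_offset + ((xpos & 0xFFF0) >> 3)  # and.w #$fff0 / lsr.w #3 / add.l
    shift = ror_w(xpos, 4) & 0xFF00                    # ror.w #4,d4 / clr.b d4
    bltcon0 = shift | 0x0FCA                           # or.w #$fca,d4
    bltcon1 = shift                                    # move.w d4,d5 (taken before the or.w)
    return byte_offset, bltcon0, bltcon1

example = plot_setup(37, 2 * 200)   # x = 37, third row
```

For x = 37 this gives byte offset 404 (row 2 plus 4 bytes), BLTCON0 = $5FCA and BLTCON1 = $5000, i.e. a shift of 5 pixels within the word.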
30 July 2021, 12:41   #13
bebbo
botcher

Join Date: Jun 2016
Location: Hamburg/Germany
Posts: 565
Quote:
 Originally Posted by mcgeezer OK thanks for the replies guys... ... Screen size is 320/256x5 bitplanes... hence the 200 bytes width. Bob's are interleaved. ...

you are counting y like 0, 1, 2
can't you count y as 0, 200, 400, ... ?

then would be no need for * 200

30 July 2021, 12:57   #14
mcgeezer
Registered User

Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,507
Quote:
 Originally Posted by bebbo referring to Don_Adan's comment: you are counting y like 0, 1, 2 can't you count y as 0, 200, 400, ... ? then would be no need for * 200
Yes potentially I can do that and it's a nice idea, things might become a little tricky though when I start doing collisions. I'll keep the idea on ice for later.

30 July 2021, 21:26   #15
NorthWay
Registered User

Join Date: May 2013
Posts: 721
Quote:
 Originally Posted by mcgeezer Yes potentially I can do that and it's a nice idea, things might become a little tricky though when I start doing collisions. I'll keep the idea on ice for later.
Now you make it sound like you should do both. Unless that has more overhead than the 40 cycles?
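Keeping both forms could look roughly like the sketch below: the row number stays available for collision checks while the byte offset is maintained by adding or subtracting 200 per row. `BobPosition` and `step` are made-up names, and the 200-byte row assumes the 5 interleaved bitplanes of 40 bytes each described earlier.

```python
ROW_BYTES = 200  # 5 interleaved bitplanes * 40 bytes per 320-pixel line

class BobPosition:
    """Hypothetical record holding y both as a row and as a byte offset."""

    def __init__(self, y=0):
        self.y = y
        self.y_offset = y * ROW_BYTES   # computed once at setup, not per frame

    def step(self, direction):
        """Move one row down (+1) or up (-1); each field changes by a constant."""
        self.y += direction
        self.y_offset += ROW_BYTES if direction > 0 else -ROW_BYTES

pos = BobPosition(10)
pos.step(1)
pos.step(1)
pos.step(-1)
```

Each one-row move then costs two constant adds instead of a multiply or table lookup at plot time. Larger jumps would still need a multiply or repeated steps.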

 30 July 2021, 23:06 #16 Don_Adan Registered User   Join Date: Jan 2008 Location: Warsaw/Poland Age: 53 Posts: 1,424
Or you can try this table version. It may be within PC range, but I don't know the size of the WAIT_FOR_BLITTER routine.
Code:
```
; d0 = Frame Number
; d1 = xpos
; d2 = ypos
; a0 = Sprite Sheet struct
; a1 = Screen struct
; a2 = Restore pointer a3
agdPlotFrame:
    movem.l d0-d2/a0-a1,-(a7)
; Get the frame structure.
    lea     SPRITES_FRAMES,a0
    add.w   d0,d0
    add.w   d0,d0
    move.l  (a0,d0),a0
    move.l  hScreenPointers(a1),a1
; This would be the point for entry to the routine.
    add.w   2(a0),d1            ; add x offset for this frame
    add.w   4(a0),d2            ; add y offset for this frame
;   lea     .mulu200(pc),a3
;   add.w   d2,d2
;   move.w  (a3,d2.w),d2
    lea     16.W,A3
    move.w  d1,d4               ; Make a copy of the xpos to d4
    and.w   #$fff0,d1           ; Get the Xposition nearest word position
    lsr.w   #3,d1               ; d1 now has nearest word
;   add.l   d1,d2               ; d2=Byte position why add longword not word?
    ror.w   #4,d4               ; Barrel Shift amount for BLTCON0 Source Mask
    clr.b   d4
    move.w  d4,d5
    or.w    #$fca,d4            ; We want Source A,B & C with D = $F, and Cookie Cut $CA = $FCA
;   move.l  d2,d3               ; save plot offset
    move.w  d4,d6               ; save bltcon0 value
    move.w  d5,d7               ; save bltcon1 value
    add.w   d2,d2
    move.w  .mulu200(PC,D2.W),D3
    add.w   D1,D3
;   move.l  #16,a3
.part:
    move.l  d3,d2               ; restore offset ; 88 bytes
    move.w  d6,d4               ; restore bltcon0 ; 86 bytes
    move.w  d7,d5               ; restore bltcon1 ; 84 bytes
    add.l   a3,a0               ; advance 16 bytes ; 82 bytes
    move.w  (a0)+,d0            ; bltsize ; 80 bytes
    move.w  (a0),d1             ; mod ; 78 bytes
    swap    d1                  ; 76 bytes
    move.w  (a0)+,d1            ; 74 bytes
    WAIT_FOR_BLITTER            ; unknown size
    move.l  #$ffff0000,BLTAFWM(a5) ; 72 bytes
    move.l  d1,BLTAMOD(a5)      ; 64 bytes
    move.l  d1,BLTCMOD(a5)      ; 60 bytes
    move.l  (a0)+,BLTBPTH(a5)   ; bob ; 56 bytes
    move.l  (a0)+,BLTAPTH(a5)   ; mask ; 52 bytes
    add.w   (a0)+,d2            ; plot offset y+x word ; 48 bytes
    add.w   (a0),d4             ; barrel adjust ; 46 bytes
    add.w   (a0)+,d5            ; barrel adjust ; 44 bytes
    bcc.s   .ovf                ; 42 bytes
    addq.w  #2,d2               ; overflow to next word ; 40 bytes
.ovf:
    move.w  d0,(a2)+            ; Save Blit size ; 38 bytes
    move.w  d1,(a2)+            ; Save Modulo ; 36 bytes
    move.w  d2,(a2)+            ; Save offset ; 34 bytes
    add.l   a1,d2               ; 32 bytes
    move.w  d4,BLTCON0(a5)      ; 30 bytes
    move.w  d5,BLTCON1(a5)      ; 26 bytes
    move.l  d2,BLTCPTH(a5)      ; 22 bytes
    move.l  d2,BLTDPTH(a5)      ; 18 bytes
    move.w  d0,BLTSIZE(a5)      ; 14 bytes
    tst.w   (a0)                ; Terminate? ; 10 bytes
    bpl.s   .part               ; 8 bytes
.exit:
    movem.l (a7)+,d0-d2/a0-a1   ; 6 bytes
    rts                         ; 2 bytes

.mulu200:
    rept    256
    dc.w    REPTN*200
    endr
```

All times are GMT +2.
