25 June 2018, 17:33 | #121 |
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
|
Most functions in Blitz will automatically (and silently) cast numeric variables from one type to another when required - I'm struggling to think of any that don't. This can be useful, but can also lead to bugs later on with overflows, loss of precision and all that good stuff. I don't know how much CPU time that casting actually takes, but I can't imagine it's free.
I also suspect there might be a more pronounced difference in performance on a real 68000 system, where conversions involving 32-bit variables (like a quick) will be slower than 16-bit word-specific versions. |
25 June 2018, 22:20 | #122 |
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
|
With the most bare-bones ASM I can think of (am I missing a trick?) I can't beat QABS. It must use voodoo.
-- edited to add -- Trying the shift, XOR, mask approach here gives me the same results as ABS, leading me to think that must be how it's done internally. I don't think it can be beaten. Code:
WBStartup
DEFTYPE.w

ResetTimer
For i=0 To 9999
  a = Rnd(500)
  b = Rnd(500)
  c = Abs(a - b)
Next i
NPrint "ABS " , Ticks

ResetTimer
For i=0 To 9999
  a = Rnd(500)
  b = Rnd(500)
  GetReg d0, a
  GetReg d1, b
  SUB.w d1, d0
  BMI adIsNeg2
  JMP ed
adIsNeg2:
  NEG.w d0
ed:
  PutReg d0, a
Next i
NPrint "AbsDiff " , Ticks

ResetTimer
For i=0 To 9999
  a = Rnd(500)
  b = Rnd(500)
  c = QAbs(a - b)
Next i
NPrint "QABS " , Ticks

VWait 500
End
Last edited by E-Penguin; 25 June 2018 at 23:13. |
25 June 2018, 23:38 | #123 |
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
|
What's also interesting is that the speed of Abs() and QAbs() is the same... What CPU was that run on? I don't know about the relative speeds, but perhaps Abs() and QAbs() are using a bit test and EOR internally instead, gaining some cycles that way?
Edit: sorry, didn't see your edit adding that you've already tried it Last edited by Daedalus; 26 June 2018 at 12:25. |
26 June 2018, 00:12 | #124 |
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
|
Abs and QAbs are usually within a tick of each other; I'm putting that down to variance in the Rnd command (I avoided literals to ensure they weren't optimised away). Standard A1200 WinUAE config.
QAbs and Abs can't be doing any branching; it's too slow. I suppose I could try using the shiny new debugger in winuae4 and step through the ASM, but ain't nobody got time for that. Summary: Abs/QAbs are more or less equivalent, and there's little-to-no scope for optimisation. |
26 June 2018, 10:52 | #125 |
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
|
I wonder what difference WinUAE might be making... If I have time I might try it out on a 68000 machine later today to see how it goes. I don't think the A1200 configuration is fully cycle-exact, which means there could be shortcuts taken in calculations that are more or less 1:1 with x86 equivalents for example, and the 16-bit bus of the 68000 won't be slowing things down either...
|
26 June 2018, 11:56 | #126 | |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
|
Quote:
|
|
26 June 2018, 13:19 | #127 |
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
|
I guess it's a matter of shift vs a conditional branch + jmp. They look about the same order of duration.
Obviously this could be done very quickly with a lookup table if one doesn't mind creating an array of 128 KB (65,536 word-sized entries)... (that's not necessarily a silly suggestion if you have a bit of Fast RAM going spare). |
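To make the lookup-table idea concrete, here is a minimal sketch. It is written in Python rather than Blitz or 68k assembly purely so it is easy to run, and the names `build_abs_table` and `abs_word` are hypothetical. It assumes 16-bit two's-complement words, so the table needs one entry for each of the 65,536 possible bit patterns.

```python
from array import array

WORD_BITS = 16
TABLE_ENTRIES = 1 << WORD_BITS  # 65,536 possible 16-bit bit patterns


def build_abs_table():
    """Precompute |x| for every 16-bit two's-complement bit pattern."""
    table = array("H", [0]) * TABLE_ENTRIES  # unsigned 16-bit entries
    for pattern in range(TABLE_ENTRIES):
        # Reinterpret the unsigned pattern as a signed word.
        signed = pattern - TABLE_ENTRIES if pattern & 0x8000 else pattern
        table[pattern] = abs(signed)
    return table


ABS_TABLE = build_abs_table()


def abs_word(value):
    """Return |value| for a signed 16-bit value with a single table read."""
    return ABS_TABLE[value & 0xFFFF]  # masking turns the value into an index
```

At two bytes per entry that is 65,536 x 2 = 131,072 bytes, i.e. the 128 KB mentioned above. Note that -32768 has no positive counterpart in a signed word, so the sketch stores its magnitude as the unsigned value 32768.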
26 June 2018, 13:57 | #128 |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
|
Here is a branchless solution I found. It might not be any faster on a non-pipelined CPU, though. https://gist.github.com/cahirwpz/19c...f03025874530fc
|
26 June 2018, 14:53 | #129 |
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 525
|
Different versions of Blitz are another factor that can affect speed. ABS and QABS may give different results on AmiBlitz and Classic Blitz 2.1, because the underlying code might differ, and some AmiBlitz commands use the FPU, although I don't know whether ABS/QABS are among them.
But I only use Classic Blitz, and I tested ABS vs QABS on 4 different WinUAE configurations, using this code: Code:
loop=0
Repeat
  a = RND (100)
  b = ABS (a)
  loop + 1
Until loop = 1000

A500, No Fast RAM
ABS : Frame 11, VPOS at 14
QABS : Frame 5, VPOS at 275

A500 + Fast RAM
ABS : Frame 9, VPOS at 200
QABS : Frame 4, VPOS at 300

A1200, No Fast RAM
ABS : Frame 4, VPOS at 250
QABS : Frame 2, VPOS at 275

A1200 + Fast RAM
ABS : Frame 3, VPOS at 130
QABS : Frame 2, VPOS at 50
---
I also tested E-Penguin's code: Code:
ResetTimer
For i=0 To 9999
  a = Rnd(500)
  b = Rnd(500)
  c = Abs(a - b)
Next i

A1200, No Fast
ABS: 51 Ticks
QABS: 25 Ticks

A1200 + Fast RAM
ABS: 32 Ticks
QABS: 19 Ticks

A500, No Fast
ABS: 136 Ticks
QABS: 87 Ticks

So in all cases QABS was faster than ABS. The Blitz manual also says that because QABS handles only Quick variables, it improves the command's speed "quite dramatically", although it doesn't say how this speed increase is achieved. So if you got results where ABS and QABS run at the same speed, maybe that is the case on AmiBlitz only, and not on Classic Blitz 2.1? |
26 June 2018, 17:01 | #130 |
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
|
I was using 2.1, but didn't have cycle exact on. Maybe it makes a difference in this case. I'll code up an ASM function per idrougge's link when I get a chance.
|
26 June 2018, 17:46 | #131 | |
Registered User
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
|
Quote:
Code:
move.l d0,d1 ; 4
add.l d1,d1 ; 8
subx.l d1,d1 ; 8
eor.l d1,d0 ; 8
sub.l d1,d0 ; 8
; =36 cycles
Code:
tst.l d0 ; 4
bpl.b done ; 10 if taken, 8 if not taken
neg.l d0 ; 6
done:
; =14 or 18 cycles, depending on the sign of the input value
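For anyone who wants to check why the five-instruction branchless sequence above yields an absolute value, here is a small sketch that steps through it in Python. It is a model written for illustration, not anything Blitz does internally: the function name `abs_branchless_68k` is hypothetical, registers are modelled as 32-bit unsigned integers, and only the X flag is tracked.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit unsigned values


def abs_branchless_68k(d0):
    """Step through the five-instruction sequence on a 32-bit value in d0."""
    d0 &= MASK32                 # treat the input as a 32-bit register
    d1 = d0                      # move.l d0,d1
    x = d1 >> 31                 # add.l d1,d1: the sign bit is carried out into X
    d1 = (d1 + d1) & MASK32
    d1 = (d1 - d1 - x) & MASK32  # subx.l d1,d1: 0 if positive, $FFFFFFFF if negative
    d0 ^= d1                     # eor.l d1,d0: inverts every bit when negative
    d0 = (d0 - d1) & MASK32      # sub.l d1,d0: subtracting -1 adds the missing 1
    return d0
```

For a negative input the mask in d1 is all ones, so the EOR followed by the SUB amounts to "invert and add one", which is two's-complement negation. For a non-negative input the mask is zero and both instructions leave d0 unchanged. The most negative value, $80000000, comes back unchanged because it has no positive counterpart in 32 bits.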
|
27 June 2018, 09:55 | #132 |
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
|
I tried with the logic flipped (BMI rather than BPL) and it was slower than the built-in function. I'll give it a go that way round. Maybe it's the overhead of the statement call.
|
27 June 2018, 15:11 | #133 | |
Registered User
Join Date: Sep 2008
Location: Gainesville U.S.A.
Posts: 771
|
Quote:
Code:
ResetTimer
For i=0 To 9999
  ;switch order so D0 is loaded with last variable
  b = Rnd(500)
  a = Rnd(500)
  ;GetReg d0, a
  ;GetReg d1, b
  ; Here 2(a2)=a 4(a2)=b 6(a2)=c
  ; d0 is already loaded with a
  MOVE.w 4(a2),d1 ;b to d1
  SUB.w d1, d0
  BMI adIsNeg2 ;this part could be adjusted
  JMP ed
adIsNeg2:
  NEG.w d0
ed:
  ; PutReg d0, a
  MOVE.w d0,6(a2) ;d0 to c - changed from a for consistency
Next i
NPrint "AbsDiff " , Ticks
Code:
before          after
===========================
ABS     147     ABS     134
AbsDiff 151     AbsDiff 87
QABS    111     QABS    110

ABS     144     ABS     135
AbsDiff 136     AbsDiff 88
QABS    110     QABS    108

ABS     145     ABS     142
AbsDiff 137     AbsDiff 88
QABS    110     QABS    109

ABS     135     ABS     141
AbsDiff 137     AbsDiff 87
QABS    112     QABS    109

ABS     136     ABS     140
AbsDiff 136     AbsDiff 91
QABS    117     QABS    111
Last edited by clenched; 27 June 2018 at 15:29. |
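As a quick way to read the before/after table above, the five runs can be averaged. The sketch below is Python used only as a calculator: the tick counts are copied from the table, while the names `TICKS` and `mean_ticks` are hypothetical.

```python
# Tick counts copied from the five runs in the before/after table.
TICKS = {
    "ABS":     {"before": [147, 144, 145, 135, 136], "after": [134, 135, 142, 141, 140]},
    "AbsDiff": {"before": [151, 136, 137, 137, 136], "after": [87, 88, 88, 87, 91]},
    "QABS":    {"before": [111, 110, 110, 112, 117], "after": [110, 108, 109, 109, 111]},
}


def mean_ticks(results):
    """Return {name: (mean before, mean after)} for each timed routine."""
    return {
        name: (sum(runs["before"]) / len(runs["before"]),
               sum(runs["after"]) / len(runs["after"]))
        for name, runs in results.items()
    }


for name, (before, after) in mean_ticks(TICKS).items():
    print(f"{name:8s} before {before:6.1f}   after {after:6.1f}")
```

On these figures ABS and QABS barely move (141.4 to 138.4 and 112.0 to 109.4 ticks), while AbsDiff drops from 139.4 to 88.2 ticks, roughly 37% fewer, once the GetReg/PutReg calls are replaced by direct moves.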
|
27 June 2018, 17:00 | #134 |
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
|
Nice. Instructive about how the variables are mapped to the data registers too. Thanks
|
27 June 2018, 21:55 | #135 |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
|
What is located at 0(A2)?
Last edited by idrougge; 27 June 2018 at 22:06. |
27 June 2018, 23:12 | #136 |
Registered User
Join Date: Sep 2008
Location: Gainesville U.S.A.
Posts: 771
|
|
27 June 2018, 23:51 | #137 |
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
|
I'm beginning to think that the art of 68k programming lies in the mastery of the various addressing modes.
|