English Amiga Board


25 June 2018, 17:33   #121
Daedalus
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
Most functions in Blitz will automatically (and silently) cast numeric variables from one type to another when required - I'm struggling to think of any that don't. This can be useful, but can also lead to bugs later on with overflows, loss of precision and all that good stuff. I don't know how much CPU time that casting actually takes, but I can't imagine it's for free.

I also suspect there might be a more pronounced difference in performance on a real 68000 system, where conversions involving 32-bit variables (such as a Quick) will be slower than word-specific 16-bit versions.
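To make the precision and overflow point concrete, here is a minimal C sketch. It assumes a Blitz Quick is a 32-bit signed fixed-point value with 16 fractional bits, and that converting it to a word simply drops the fraction (the real Blitz routine may round instead). `wrap_to_word` and `quick_to_word` are hypothetical helpers, not Blitz functions.

```c
#include <stdint.h>

/* Assumed model of a Blitz Quick: 32-bit signed fixed point, 16.16. */
typedef int32_t quick_t;

#define QUICK_ONE 65536 /* 1.0 in 16.16 fixed point */

/* Model a 16-bit word store: keep the low 16 bits and reinterpret them
   as a signed value, the way a two's-complement truncation behaves. */
static int32_t wrap_to_word(int32_t v) {
    int32_t low = (int32_t)((uint32_t)v & 0xFFFFu);
    return low >= 0x8000 ? low - 0x10000 : low;
}

/* Quick -> word, ASSUMING the fraction is dropped (Blitz may round).
   Division is used instead of a shift so negative values are well
   defined in C; C integer division truncates toward zero. */
static int32_t quick_to_word(quick_t q) {
    return wrap_to_word(q / QUICK_ONE);
}
```

For example, `quick_to_word(245760)` (3.75 as a Quick) gives 3, and `wrap_to_word(40000)` gives -25536: the kind of silent wrap a long-to-word assignment produces.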

25 June 2018, 22:20   #122
E-Penguin
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
With the most bare-bones ASM I can think of (am I missing a trick?) I can't beat QABS. It must use voodoo.

-- edited to add --
Trying the shift, XOR, mask approach here gives me the same results as ABS, leading me to think that must be how it's done internally. I don't think it can be beaten.

Code:
WBStartup
DEFTYPE.w

ResetTimer
For i=0 To 9999
  a =  Rnd(500)
  b =  Rnd(500)
  c = Abs(a - b)
Next i
NPrint "ABS " , Ticks

ResetTimer
For i=0 To 9999
  a =  Rnd(500)
  b =  Rnd(500)

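  GetReg d0, a      ; d0 = a
  GetReg d1, b      ; d1 = b
  SUB.w d1, d0      ; d0 = a - b (sets N if the result is negative)
  BMI adIsNeg2      ; negative: go and negate it
  JMP ed            ; otherwise skip the NEG
adIsNeg2:
  NEG.w d0          ; d0 = b - a
ed:
  PutReg d0, a      ; note: the result goes back into a, not c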
Next i
NPrint "AbsDiff " , Ticks

ResetTimer
For i=0 To 9999
  a =  Rnd(500)
  b =  Rnd(500)
  c = QAbs(a - b)
Next i
NPrint "QABS " , Ticks

VWait 500
End
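For reference, the shift, XOR, mask approach mentioned above can be sketched in C for a 16-bit word. This illustrates the technique only; it is not a claim about how Blitz implements Abs() or QAbs() internally.

```c
#include <stdint.h>

/* Branchless absolute value of a 16-bit word:
   mask is 0 for non-negative x and -1 (all ones) for negative x. */
static int16_t abs_shift_xor(int16_t x) {
    /* An arithmetic right shift by 15 copies the sign bit into every bit.
       (Right-shifting a negative value is implementation-defined in C,
       but compilers for two's-complement targets shift arithmetically.) */
    int16_t mask = (int16_t)(x >> 15);
    /* XOR with all ones flips the bits; subtracting -1 then adds 1,
       which together form the two's-complement negation. */
    return (int16_t)((x ^ mask) - mask);
}
```

As with any two's-complement abs, -32768 has no positive word counterpart, so that single input cannot come out right.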
Attached image: Untitled.png (2.2 KB)

Last edited by E-Penguin; 25 June 2018 at 23:13.

25 June 2018, 23:38   #123
Daedalus
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
What's also interesting is that the speed of Abs() and QAbs() is the same... What CPU was that run on? I don't know about the relative speeds, but perhaps Abs() and QAbs() are using a bit test and EOR internally instead, gaining some cycles that way?

Edit: sorry, didn't see your edit adding that you've already tried it

Last edited by Daedalus; 26 June 2018 at 12:25.

26 June 2018, 00:12   #124
E-Penguin
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
Abs and QAbs are usually within a tick of each other; I'm putting that down to variance in the Rnd command (I avoided literals to ensure they weren't optimised away). Standard A1200 WinUAE config.

QAbs and Abs can't be doing any branching; it's too slow. I suppose I could try using the shiny new debugger in WinUAE 4 and step through the ASM, but ain't nobody got time for that.

Summary: Abs/QAbs are more or less equivalent, and there's little-to-no scope for optimisation.

26 June 2018, 10:52   #125
Daedalus
Registered User
Join Date: Jun 2009
Location: Dublin, then Glasgow
Posts: 6,334
I wonder what difference WinUAE might be making... If I have time I might try it out on a real 68000 machine later today to see how it goes. I don't think the A1200 configuration is fully cycle-exact, which means there could be shortcuts taken in calculations that map more or less 1:1 onto x86 equivalents, for example, and the 16-bit bus of the 68000 won't be slowing things down either...

26 June 2018, 11:56   #126
idrougge
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
Quote:
Originally Posted by E-Penguin
Trying the shift, XOR, mask approach here gives me the same results as ABS, leading me to think that must be how it's done internally. I don't think it can be beaten.
Shifting on the plain 68000 is a moderately expensive operation, at least if you shift that many steps: every extra bit of shift costs two more cycles.
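A rough cost model for register shifts on the 68000, using the figures from the MC68000 timing tables: 6 + 2n cycles for a byte or word shift and 8 + 2n for a long, where n is the shift count. `shift_cycles` is a hypothetical helper for illustration.

```c
/* 68000 register shift timing (ASL/ASR/LSL/LSR/ROL/ROR on a data register):
   byte/word = 6 + 2n cycles, long = 8 + 2n cycles, n = shift count. */
static int shift_cycles(int is_long, int count) {
    return (is_long ? 8 : 6) + 2 * count;
}
```

Building a sign mask from a long with a 31-bit arithmetic shift therefore costs 70 cycles (plus loading the count into a register, since an immediate count can only be 1 to 8), and a word shift by 15 costs 36. That is why tricks that move the sign bit into a flag with a single `add.l` (8 cycles) are attractive on this CPU.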

26 June 2018, 13:19   #127
E-Penguin
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
I guess it's a matter of a shift versus a conditional branch plus a JMP. They look to be of about the same order of duration.

Obviously this could be done very quickly with a lookup table if one doesn't mind creating a 128 KB array... (that's not necessarily a silly suggestion if you have a bit of Fast RAM going spare).
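A minimal C sketch of the lookup-table idea, assuming 16-bit word inputs: one 16-bit entry for every possible word-sized difference gives 65536 × 2 bytes = 128 KB. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Build a table mapping every 16-bit pattern to the absolute value
   of that pattern read as a signed word. Returns NULL on failure. */
static uint16_t *abs_table_create(void) {
    uint16_t *t = malloc(65536u * sizeof *t);
    if (t == NULL)
        return NULL;
    for (uint32_t i = 0; i < 65536u; i++) {
        /* Interpret the index as a signed 16-bit value. */
        int32_t v = (i >= 0x8000u) ? (int32_t)i - 0x10000 : (int32_t)i;
        t[i] = (uint16_t)(v < 0 ? -v : v);
    }
    return t;
}

/* |a - b| via the table: the difference is reduced to 16 bits first,
   as a word-sized SUB would do, and used directly as the index. */
static uint16_t abs_diff_lut(const uint16_t *t, int16_t a, int16_t b) {
    return t[(uint16_t)(a - b)];
}
```

With inputs from Rnd(500), as in the benchmark, the difference stays within about ±500, so a table of around a thousand entries indexed with an offset would cover it in roughly 2 KB.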

26 June 2018, 13:57   #128
idrougge
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
Here is a branchless solution I found. It might not be any faster on a non-pipelined CPU, though. https://gist.github.com/cahirwpz/19c...f03025874530fc

26 June 2018, 14:53   #129
Master484
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 524
Also, the version of Blitz is one factor that can affect speed. ABS and QABS may give different timings on AmiBlitz and Classic Blitz 2.1, because the code might be different, and some AmiBlitz commands also use the FPU, although I don't know if ABS/QABS is one of them.

But I only use Classic Blitz, and I tested ABS vs QABS on 4 different WinUAE configurations, using this code:

Code:
loop=0
Repeat
 a = RND (100)
 b = ABS (a)
 loop + 1
Until loop = 1000
And these were the results, with Cycle Exact ON:

A500, No Fast RAM
ABS : Frame 11, VPOS at 14
QABS : Frame 5, VPOS at 275

A500 + Fast RAM
ABS : Frame 9, VPOS at 200
QABS : Frame 4, VPOS at 300

A1200, No Fast RAM
ABS : Frame 4, VPOS at 250
QABS : Frame 2, VPOS at 275

A1200 + Fast RAM
ABS : Frame 3, VPOS at 130
QABS : Frame 2, VPOS at 50

---

I also tested E-Penguin's code:
Code:
ResetTimer
For i=0 To 9999
  a =  Rnd(500)
  b =  Rnd(500)
  c = Abs(a - b)
 Next i
And got these results:

A1200, No Fast
ABS: 51 Ticks
QABS: 25 Ticks

A1200 + Fast RAM
ABS: 32 Ticks
QABS: 19 Ticks

A500, No Fast
ABS: 136 Ticks
QABS: 87 Ticks

So in all cases QABS was faster than ABS. The Blitz manual also says that because QABS handles only Quick variables, it improves the command's speed "quite dramatically", although it doesn't say how this speed increase happens.
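The manual doesn't explain the mechanism. One plausible reason, offered only as an assumption: if a Quick is a 32-bit 16.16 fixed-point value, its absolute value is just the integer absolute value of the raw 32 bits, with no conversion to or from another numeric type. A minimal C sketch; `quick_abs` is a hypothetical name.

```c
#include <stdint.h>

typedef int32_t quick_t;  /* assumed 16.16 fixed-point layout */

#define QUICK_ONE 65536   /* 1.0 in 16.16 fixed point */

/* |q| computed directly on the raw fixed-point bits: negating the
   32-bit integer negates the represented value exactly.
   The most negative value (-32768.0) has no positive counterpart,
   as with any two's-complement abs, so callers must avoid it. */
static quick_t quick_abs(quick_t q) {
    return q < 0 ? -q : q;
}
```

For example, `quick_abs(-98304)` (that is, -1.5) returns 98304 (1.5).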

So if you have gotten results where the speed of ABS and QABS are the same, then maybe this is the case on AmiBlitz only, but not on Classic Blitz 2.1 ?

26 June 2018, 17:01   #130
E-Penguin
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
I was using 2.1, but didn't have cycle exact on. Maybe it makes a difference in this case. I'll code up an ASM function per idrougge's link when I get a chance.

26 June 2018, 17:46   #131
Niklas
Registered User
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
Quote:
Originally Posted by idrougge
Here is a branchless solution I found. It might not be any faster on a non-pipelined CPU, though. https://gist.github.com/cahirwpz/19c...f03025874530fc
That's a pretty clever solution. Still (as you point out) on a 68000 CPU the branching solution is quite a bit faster:

Code:
    move.l   d0,d1  ; 4
    add.l    d1,d1  ; 8
    subx.l   d1,d1  ; 8
    eor.l    d1,d0  ; 8
    sub.l    d1,d0  ; 8
                    ; =36 cycles
Code:
    tst.l    d0     ; 4
    bpl.b    done   ; 10 if taken, 8 if not
    neg.l    d0     ; 6
done:
                    ; =14 or 18 cycles, depending on the sign of the input value
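To see why the branchless version works, here is a C model of the same five instructions, plus the cycle tallies for both versions. Per the 68000 timing tables, `bpl.b` costs 10 cycles when the branch is taken and 8 when it falls through, so the negative path of the branching version comes to 4 + 8 + 6 = 18 cycles. The function names are hypothetical, and the model covers register operands only.

```c
#include <stdint.h>

/* C model of:  move.l d0,d1 / add.l d1,d1 / subx.l d1,d1 /
                eor.l d1,d0  / sub.l d1,d0                      */
static int32_t abs_subx_model(int32_t d0) {
    uint32_t u = (uint32_t)d0;
    uint32_t x = u >> 31;            /* add.l d1,d1 moves the sign bit into X */
    uint32_t mask = 0u - x;          /* subx.l d1,d1: d1 - d1 - X = 0 or -1   */
    uint32_t r = (u ^ mask) - mask;  /* eor + sub: negate only when mask = -1 */
    return (int32_t)r;
}

/* Cycle totals for register operands on a plain 68000. */
static int cycles_branchless(void) {
    return 4 + 8 + 8 + 8 + 8;        /* move, add, subx, eor, sub = 36 */
}

static int cycles_branching(int32_t d0) {
    /* tst.l = 4; bpl.b = 10 taken / 8 not taken; neg.l = 6 */
    return d0 >= 0 ? 4 + 10 : 4 + 8 + 6;
}
```

Even in its worst case (18 cycles) the branching version is about twice as fast as the fixed 36 cycles of the branchless one on this non-pipelined CPU.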

27 June 2018, 09:55   #132
E-Penguin
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
Quote:
Originally Posted by Niklas
Code:
    tst.l    d0     ; 4
    bpl.b    done   ; 10 if taken, 8 if not
    neg.l    d0     ; 6
done:
                    ; =14 or 18 cycles, depending on the sign of the input value
I tried with the logic flipped (BMI rather than BPL) and it was slower than the built-in function. I'll give it a go with things that way round. Maybe it's the overhead of the statement call.

27 June 2018, 15:11   #133
clenched
Registered User
Join Date: Sep 2008
Location: Gainesville U.S.A.
Posts: 771
Quote:
Originally Posted by E-Penguin
I tried with the logic flipped (BMI rather than BPL) and it was slower than the built-in function. I'll give it a go with things that way round. Maybe it's the overhead of the statement call
What is happening is that the machine code part is actually running more BASIC statements than the other two. There are a few things to be done; hopefully they are commented well enough in the snippet. The before and after timings were made with the latest WinUAE on a stock A1200, cycle-exact.
Code:
 
ResetTimer
For i=0 To 9999
  ;switch order so D0 is loaded with last variable
  b =  Rnd(500) 
  a =  Rnd(500) 
  ;GetReg d0, a
  ;GetReg d1, b
  ; Here 2(a2)=a 4(a2)=b 6(a2)=c
  ; d0 is already loaded with a
  
  MOVE.w 4(a2),d1 ;b to d1
  SUB.w d1, d0
  BMI adIsNeg2  ;this part could be adjusted
  JMP ed
adIsNeg2:
  NEG.w d0
ed:
;  PutReg d0, a
MOVE.w d0,6(a2) ;d0 to c  - changed from a for consistency
Next i
NPrint "AbsDiff " , Ticks
Code:
 
before           after  
=========================
ABS 147          ABS 134
AbsDiff 151      AbsDiff 87
QABS 111         QABS 110
 
ABS 144          ABS 135
AbsDiff 136      AbsDiff 88
QABS 110         QABS 108
 
ABS 145          ABS 142
AbsDiff 137      AbsDiff 88
QABS 110         QABS 109
 
ABS 135          ABS 141
AbsDiff 137      AbsDiff 87
QABS 112         QABS 109
 
ABS 136          ABS 140
AbsDiff 136      AbsDiff 91
QABS 117         QABS 111
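Averaging the five runs in each column summarises the table: the AbsDiff loop drops from about 139 ticks to about 88, which puts it ahead of QABS (about 109), while ABS and QABS change by only a few ticks. A small C sketch of that arithmetic, using only the numbers above:

```c
/* Five timing runs per row, copied from the before/after table above. */
static const int abs_before[5]     = {147, 144, 145, 135, 136};
static const int abs_after[5]      = {134, 135, 142, 141, 140};
static const int absdiff_before[5] = {151, 136, 137, 137, 136};
static const int absdiff_after[5]  = { 87,  88,  88,  87,  91};
static const int qabs_before[5]    = {111, 110, 110, 112, 117};
static const int qabs_after[5]     = {110, 108, 109, 109, 111};

/* Arithmetic mean of one row of five runs. */
static double mean5(const int v[5]) {
    return (v[0] + v[1] + v[2] + v[3] + v[4]) / 5.0;
}
```

The means come out as ABS 141.4 → 138.4, AbsDiff 139.4 → 88.2 and QABS 112.0 → 109.4.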

Last edited by clenched; 27 June 2018 at 15:29.

27 June 2018, 17:00   #134
E-Penguin
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
Nice. Instructive about how the variables are mapped to the data registers too. Thanks

27 June 2018, 21:55   #135
idrougge
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
What is located at 0(A2)?

Last edited by idrougge; 27 June 2018 at 22:06.

27 June 2018, 23:12   #136
clenched
Registered User
Join Date: Sep 2008
Location: Gainesville U.S.A.
Posts: 771
Quote:
Originally Posted by idrougge
What is located at 0(A2)?
Offhand I would say that is i from the For/Next loop.
Splice in move.w (a2),$200 somewhere in the loop.
When the program finishes, $200 contains $270f (9999).

E-Penguin: for a slight reduction, replace the first two ML lines (MOVE.w 4(a2),d1 and SUB.w d1, d0) with:
SUB.w 4(a2),d0
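A C model of the layout described above shows why 4(a2) and 6(a2) address b and c directly: if the word variables i, a, b and c sit one after another starting at a2, each is 2 bytes further on. The struct, the function name and the assumption that this order always holds are illustrative only; the actual Blitz variable layout may differ from program to program.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the word variables as seen from a2. */
struct word_vars {
    int16_t i;  /* 0(a2): For/Next loop counter */
    int16_t a;  /* 2(a2) */
    int16_t b;  /* 4(a2) */
    int16_t c;  /* 6(a2) */
};

/* Equivalent of the tightened loop body, with d0 already holding a:
     SUB.w 4(a2),d0 / BMI adIsNeg2 / NEG.w d0 / MOVE.w d0,6(a2)  */
static void abs_diff_into_c(struct word_vars *v, int16_t d0) {
    /* SUB.w 4(a2),d0; the benchmark values stay within word range */
    int16_t w = (int16_t)(d0 - v->b);
    if (w < 0)              /* BMI adIsNeg2 */
        w = (int16_t)-w;    /* NEG.w d0 */
    v->c = w;               /* MOVE.w d0,6(a2) */
}
```

Folding the load of b into the SUB's memory operand saves both the separate MOVE and the register it used.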

27 June 2018, 23:51   #137
E-Penguin
Banana
Join Date: Jul 2016
Location: Darmstadt
Posts: 1,213
I'm beginning to think that the art of 68k programming lies in the mastery of the various addressing modes.