English Amiga Board


22 May 2021, 19:18   #201
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by Don_Adan
Then you used a buggy program. I calculated the number of instructions manually: 28 instructions, 56 bytes. You can tell me which instruction should not be counted; I marked all 28 instructions from your post.

Really, D3 can be odd?
Wow, that's a surprise to me.
But maybe you can learn something new about 68000 coding. How does this work:

addx.w D3,D3

What is wrong with this?
btst #14,(A0)
Think about it, or check a 68000 assembler book.
I've shown you the listing; the program is OK. So it is your calculations that are buggy. If nobody notices this, then something is very wrong. If nobody comments on this, I'll leave this thread. Thanks a lot to everyone who helped.
What crazy things are you talking about?! How does addx.w D3,D3 relate to your code?
About BTST: just read the manual and study a bit yourself.
Quote:
If a memory location is the destination, a byte is read from that location, and the bit operation performed using the bit number, modulo 8, with zero referring to the least significant bit.
As you can see, any value from 0 to 0xffff is allowed here. It is really very sad that you write so much nonsense.

Quote:
Originally Posted by roondar
Of course, no problem.
Thank you very much. I wish Don_Adan could explain his thoughts so clearly. However, my code doesn't use absolute addressing.

Quote:
Originally Posted by Thorham
I wasn't talking about more digits, I was talking about memory constraints. Two different things. Not having memory constraints enables more optimizations, such as using a table for converting to decimal digits.
As I explained before, these things are connected by the nature of the pi-spigot algorithm. If you want more than 64 KB of memory, you need other data types. This is not a trivial matter; it is a kind of math magic around the number pi.
EDIT: May I ask how you would test 8-bit systems without the 64 KB limit?
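A rough way to see why digit count, cell width and the 64 KB limit are tied together. This Python sketch rests on my own reading of the loop code in this thread, not on any spec: 14 spigot terms per 4-digit group (the `sub.w #28,d6` step on kv = 2*i), one 16-bit word per term, and term i always holding a remainder smaller than its divisor 2i-1. The helper names are hypothetical.

```python
# Sketch: how far can a base-10000 pi spigot go with 16-bit cells?
# Assumptions (mine, read off the loop code in this thread, not a spec):
#   14 terms per 4-digit group, one 16-bit word per term,
#   and term i always holds a remainder smaller than its divisor 2*i - 1.

def terms(digits: int) -> int:
    """Number of spigot terms needed for `digits` decimal digits (a multiple of 4)."""
    return 14 * (digits // 4)

def max_cell(digits: int) -> int:
    """Largest value a cell can hold: the top term i = terms(digits) keeps r < 2*i - 1."""
    return 2 * terms(digits) - 2

def fits_16bit(digits: int) -> bool:
    return max_cell(digits) <= 0xFFFF

def array_bytes(digits: int) -> int:
    """Remainder array size with one 16-bit word per term."""
    return 2 * terms(digits)

def max_digits_16bit() -> int:
    """Largest multiple of 4 whose cells still fit in 16 bits."""
    digits = 4
    while fits_16bit(digits + 4):
        digits += 4
    return digits

for n in (9000, max_digits_16bit(), max_digits_16bit() + 4):
    print(n, max_cell(n), fits_16bit(n), array_bytes(n))
# 9000 62998 True 63000
# 9360 65518 True 65520
# 9364 65546 False 65548
```

Under these assumptions both limits bite at the same point: past 9360 digits the cells need more than 16 bits and the word array alone passes 65,536 bytes, so going further means wider cells and more memory.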
22 May 2021, 19:38   #202
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Quote:
Originally Posted by litwr
I've shown you the listing; the program is OK. So it is your calculations that are buggy. If nobody notices this, then something is very wrong.
It's very simple:
Code:
F00:0160       .longdiv
F00:0161         if __VASM&28              ;68020/30?
F00:0162                divul d4,d7:d3
F00:0163         else
F00:0164                swap d3
               S01:000000CE:  48 43
In your listing, the opcode comes after the line that generated it, but at the end you have:
Code:
F00:0213                subq #2,d4    ;i <- i - 1
               S01:00000102:  55 44
F00:0214                bcc .l2       ;the main loop
So Don_Adan is counting the final bcc (which occupies bytes $104 and $105) but you're not, so you're measuring 2 bytes fewer than he is.
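The disagreement comes down to whether the end address is taken at the start or at the end of the last instruction. A tiny Python check using only the addresses from the vasm listing above, assuming the final `bcc` is the 2-byte short form:

```python
# Addresses taken from the vasm listing quoted above.
LOOP_START = 0xCE   # swap d3, first instruction of the counted code
LAST_INSN = 0x104   # bcc .l2 starts here (subq #2,d4 is the 55 44 at 0x102)
LAST_LEN = 2        # assumption: bcc.b, a single 16-bit opcode word

def code_length(start: int, last_insn: int, last_len: int) -> int:
    """Length of a code block = end address of its last instruction - start address."""
    return (last_insn + last_len) - start

print(hex(LAST_INSN - LOOP_START), LAST_INSN - LOOP_START)   # 0x36 54 (stops before the bcc)
print(code_length(LOOP_START, LAST_INSN, LAST_LEN))          # 56 (includes the bcc)
```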
22 May 2021, 19:44   #203
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by litwr
About BTST: just read the manual and study a bit yourself.
Perhaps you need to know that btst on memory is a byte operation, so the bit number ranges from 0 to 7; hence btst #14,(a0) is incorrect.
22 May 2021, 20:19   #204
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by litwr
As I explained before, these things are connected by the nature of the pi-spigot algorithm. If you want more than 64 KB of memory, you need other data types. This is not a trivial matter; it is a kind of math magic around the number pi.
I'm not talking about the spigot algorithm's table size, but about the total 64 KB limit:
Quote:
3) it uses less than 64 KB RAM for the code and data
This places a limitation on potential optimizations on bigger systems. For example, it makes it impossible to use a base 10000 conversion table (if that would be beneficial) and still get to ~9000 digits.
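Back-of-the-envelope numbers for Thorham's example, in Python. The assumptions are mine: a base-10000 conversion table storing four ASCII digits per entry (40,000 bytes), and a spigot array of one 16-bit word per term at 14 terms per 4-digit group. Code, stack and other data are ignored, so real limits would be lower.

```python
LIMIT = 64 * 1024              # 65,536 bytes reachable with 16-bit addresses
TABLE_BYTES = 10000 * 4        # hypothetical table: 4 ASCII digits for each of 0..9999

def array_bytes(digits: int) -> int:
    """Spigot remainder array: 14 terms per 4 digits, 2 bytes per term (assumption)."""
    return 2 * 14 * (digits // 4)

def max_digits_with_table() -> int:
    """Largest multiple of 4 digits whose array still fits next to the table."""
    digits = 0
    while array_bytes(digits + 4) + TABLE_BYTES <= LIMIT:
        digits += 4
    return digits

print(array_bytes(9000) + TABLE_BYTES)   # 103000, well over LIMIT
print(max_digits_with_table())           # 3648
```

So, under these assumptions, the table and ~9000 digits cannot share one 64 KB space, while on a machine with a flat 32-bit address space the table would fit easily.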

Last edited by Thorham; 22 May 2021 at 20:27.
22 May 2021, 22:55   #205
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
I found one bug: in my version, D7 is not handled correctly for odd values. I must rethink this routine.
23 May 2021, 01:44   #206
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Perhaps fixed now, but the code is longer.

Code:
         clr.l -(SP)   ; cv
         moveq #0,D7

.l0      clr.l d5       ;d <- 0
         move.l d6,d4     ;i <- kv, i <- i*2
         adda.l d4,a3
         subq.l #1,d4     ;b <- 2*i-1
         move.w #10000,d1
         bra.b .l4

.l2      sub.l d3,d5
         sub.l d7,d5
         lsr.l #1,d5
.l4
         move -(a3),d0      ; r[i]
         mulu.w d1,d0       ;r[i]*10000
         add.l d0,d5       ;d += r[i]*10000
         move.l d5,d3
         lsr.l #1,D3
         divu.w d4,d3
         move.w d3,d7
         clr.w d3
         swap d3
         addx.w  D3,D3

         move.w D4,D0 ; D4
         sub.w D3,D0 ; check if D3 is greater or equal D4
         sls D0 ; if yes then $FF, if not then 0
         extb.l D0 ; -1 or 0 (EXTB.L needs a 68020+; a plain 68000 needs ext.w D0 + ext.l D0)
         add.l D7,D7 
         sub.l D0,D7
         and.w D4,D0 ; D4 or 0
         sub.w D0,D3 ; fixed D3

         exg D3,D7
         move.w D7,(A3)     ;r[i] <- d%b

         subq.w #2,d4    ;i <- i - 1
         bcc.b .l2       ;the main loop
         divu.w d1,d5      ;removed with MULU optimization
 
         add.w (SP),D5 ; cv
         move.l D5,(SP) ; cv
         ext.l D5   ; necessary only for litwr version of PR0000 routine
         bsr PR0000

         sub.w #28,d6   ;kv
         bne.b .l0
         addq.l #4,SP ; restore stack

Last edited by Don_Adan; 23 May 2021 at 03:10.
23 May 2021, 04:07   #207
Bruce Abbott
Registered User
 
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,544
Quote:
Originally Posted by Thorham
Removing this limit is crucial if you want to use this Pi spigot as a benchmark. Artificially limiting the more powerful systems just makes them look less powerful than they are. A good benchmark doesn't play favorites.
This isn't a 'good' benchmark anyway, if you want something that gauges real-world performance. Its only real use is to show how an algorithm can be implemented on various retro computers. To this end I think it is 'fairer' to specify limits that allow the less powerful systems to be competitive.

One of my goals in life (if I can find the time) is to do some assembly language programming on all of the retro computers in my collection. litwr's 'pipack' has working examples for many of them in one handy archive, so it is useful to me even if the benchmark results aren't very relevant.

Quote:
Originally Posted by Thorham
This places a limitation on potential optimizations on bigger systems. For example, it makes it impossible to use a base 10000 conversion table (if that would be beneficial) and still get to ~9000 digits.
You should see the new FPGA-based 68k CPU I am designing for the Amiga. One of the extra instructions is called 'picalc', which blasts all 9000 digits into RAM in a single clock cycle. It will blow the other machines out of the water!

Seriously though, who cares what optimizations can be done on larger systems? It's not like computing 9000 digits of pi has any practical application.

I would rather see the original algorithm reproduced 'accurately', i.e. in a form that closely follows it in an obvious way, rather than 'cheating' with lookup tables etc. Otherwise it opens up the possibility of ridiculous developments like what happened with America's Cup racing boats, which had strict hull dimension rules except that nowhere did the rules state it had to be a single hull (!), opening the way for catamarans and hydrofoils that blew conventional designs out of the water.
23 May 2021, 09:33   #208
modrobert
old bearded fool
 
 
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
Quote:
Originally Posted by Bruce Abbott
Seriously though, who cares what optimizations can be done on larger systems? It's not like computing 9000 digits of pi has any practical application.
How about a number system with the base of a circle circumference where pi is a single digit integer?


Last edited by modrobert; 23 May 2021 at 11:22.
23 May 2021, 12:40   #209
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by Bruce Abbott
This isn't a 'good' benchmark anyway, if you want something that gauges real-world performance.
It's a benchmark, and as such should be fair and not artificially limit systems. Whether or not it's a good benchmark doesn't matter.

Quote:
Originally Posted by Bruce Abbott
Seriously though, who cares what optimizations can be done on larger systems?
This thread is eleven pages long.
23 May 2021, 19:43   #210
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by robinsonb5
It's very simple:
In your listing, the opcode comes after the line that generated it, but at the end you have:
Code:
F00:0213                subq #2,d4    ;i <- i - 1
               S01:00000102:  55 44
F00:0214                bcc .l2       ;the main loop
So Don_Adan is counting the final bcc (which occupies bytes $104 and $105) but you're not, so you're measuring 2 bytes fewer than he is.
Dear Sir! Please look at the math I provided for Don_Adan: 0x104-0xCE = 0x36 = 54 bytes. Do you see 0x102 there?! I used 0x104 as the final label.
23 May 2021, 19:47   #211
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by meynaf
Perhaps you need to know that btst on memory is a byte operation, so the bit number ranges from 0 to 7; hence btst #14,(a0) is incorrect.
Cher Monsieur (Dear Sir)!
I already pointed you to the manual snippet about BTST. Please read it now.
Quote:
If a memory location is the destination, a byte is read from that location, and the bit operation performed using the bit number, modulo 8, with zero referring to the least significant bit.
It is perfectly fine to use any number in the range 0..0xffff. Why write this nonsense?
23 May 2021, 19:56   #212
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by Thorham
I'm not talking about the spigot algorithm's table size, but about the total 64 KB limit:

This places a limitation on potential optimizations on bigger systems. For example, it makes it impossible to use a base 10000 conversion table (if that would be beneficial) and still get to ~9000 digits.
I already explained this. Please don't ignore 8-bit and some 16-bit systems. This limit was originally imposed because those systems simply can't address more than 64 KB.
However, if we want 10000 digits on 32-bit and some 16-bit systems, the algorithm becomes slower on those systems, because they have to use bigger tables and variables even for 1000 digits. And let me repeat: this excludes a lot of systems from the benchmark. So it gives nothing good. You know, every sport has rules.
23 May 2021, 20:13   #213
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by Don_Adan
Perhaps fixed now, but the code is longer.
Code:
         add.l d0,d5       ;d += r[i]*10000
         move.l d5,d3
         lsr.l #1,D3
         divu.w d4,d3
         move.w d3,d7
It seems this has become an obsession for you. Thank you for the help, but your latest posts are strange. It is better for you to stop now. The code is good enough now.
Your code snippet is wrong again. Just try D4=1: this can overflow DIVU D4,D3. If you want to continue, please try running your code first.
23 May 2021, 20:29   #214
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Quote:
Originally Posted by litwr
Dear Sir! Please look at the math I provided for Don_Adan: 0x104-0xCE = 0x36 = 54 bytes. Do you see 0x102 there?! I used 0x104 as the final label.
Edit: please note, the information below is not correct. I made an error while verifying the code length. I'll leave it anyway, because it's always good to keep in mind that we can all make mistakes.
Original post follows...

I've had enough with everyone in this thread bickering over this. Litwr is correct: the code is 54 bytes long. And because I no longer want to see any silly discussion about it, you can find attached a screenshot of the output that VASM gives for the following code when assembled:
Code:
.longdiv
        move d3,d7
        divu d4,d7
        swap d7
        move d7,d3
        swap d3
        divu d4,d3
        move d3,d7
        exg.l d3,d7
        clr d7
        swap d7
        move d7,(a3)     ;r[i] <- d%b
        bra.s .enddiv
.l2     sub.l d3,d5
        sub.l d7,d5
        lsr.l d5
.l4
        move -(a3),d0      ; r[i]
        mulu d1,d0       ;r[i]*10000
        add.l d0,d5       ;d += r[i]*10000
        move.l d5,d3
        divu d4,d3
        bvs.s .longdiv
        move d3,d7
        clr d3
        swap d3
        move d3,(a3)     ;r[i] <- d%b
.enddiv
        subq #2,d4    ;i <- i - 1
        bcc .l2       ;the main loop
.endcode
    printv    .endcode-.longdiv
For those not aware, printv prints a value. In this case the length of the code.
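For anyone following the `.longdiv` path: it is the usual way to get a 32-bit quotient out of the 68000's 32/16 DIVU.W, dividing the high word first and then the remainder joined with the low word. The Python model below is a sketch under my own assumptions: the leading `swap d3` (missing from this copy, as later posts point out) is present, d4 holds the odd divisor g = 2i-1 as the comments in Don_Adan's version indicate, and the fast path is used only when DIVU would not overflow. Function names are hypothetical.

```python
def divu_overflows(d: int, g: int) -> bool:
    """DIVU.W sets V when the 32-bit / 16-bit quotient does not fit in 16 bits."""
    return d // g > 0xFFFF           # same as (d >> 16) >= g

def long_divu(d: int, g: int):
    """32-bit quotient from two 16-bit DIVU steps, high word first (the .longdiv idea)."""
    hi, lo = d >> 16, d & 0xFFFF
    q_hi, r1 = divmod(hi, g)                  # swap d3 / move d3,d7 / divu d4,d7
    q_lo, r = divmod((r1 << 16) | lo, g)      # r1 < g, so this quotient fits in 16 bits
    return (q_hi << 16) | q_lo, r

def loop_step(d: int, i: int):
    """One pass of the inner loop for term i: returns (value stored to r[i], next d)."""
    g = 2 * i - 1                              # assumption: d4 holds the odd divisor 2i-1
    q, r = long_divu(d, g) if divu_overflows(d, g) else divmod(d, g)
    # sub.l d3,d5 / sub.l d7,d5 / lsr.l d5: (d - q - r) / 2 equals q * (i - 1)
    return r, (d - q - r) >> 1
```

Since d = q*g + r, the value (d - q - r)/2 is q*(g - 1)/2 = q*(i - 1), which is how the loop avoids a multiplication by the next index.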

Last edited by BippyM; 01 June 2021 at 18:24.
23 May 2021, 20:39   #215
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by roondar
I've had enough with everyone in this thread bickering over this.
Thank you very much. IMHO we've just reached the goals of this thread... I would indeed like to solve the mystery raised by modrobert, but that is another goal. Maybe I need to start a new topic for it.
Quote:
Originally Posted by modrobert
How about a number system with the base of a circle circumference where pi is a single digit integer?
Many algorithms exist for computing pi. IMHO the pi-spigot is the easiest. It is not the fastest, but it is the shortest.
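For reference, the kind of base-10000 spigot being benchmarked fits in a few lines of Python. This is a transcription of the widely circulated C one-liner usually attributed to Dik Winter, not litwr's code; it has the same shape as the 68000 loops in this thread as I read them: 14 terms per 4-digit group, divisor 2i-1 for term i, and the running value scaled by the next index before moving on.

```python
def pi_spigot(groups: int) -> str:
    """Digits of pi ("3141...") produced 4 at a time by a base-10000 spigot."""
    base = 10000
    c = 14 * groups                  # number of terms: 14 per 4-digit group
    f = [base // 5] * c + [0]        # as in the C original: f[0..c-1] = 2000 (f[0] unused), f[c] = 0
    carry = 0                        # 'e' in the C original: low part held over from the last group
    out = []
    while c > 0:
        d = 0
        i = c
        while True:
            d += f[i] * base
            g = 2 * i - 1
            f[i] = d % g             # keep the remainder for term i
            d //= g
            i -= 1
            if i == 0:
                break
            d *= i                   # scale by the next term's index
        c -= 14
        out.append(f"{carry + d // base:04d}")
        carry = d % base
    return "".join(out)

print(pi_spigot(200)[:20])   # 31415926535897932384
```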
23 May 2021, 20:43   #216
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Quote:
Originally Posted by roondar
I've had enough with everyone in this thread bickering over this. Litwr is correct: the code is 54 bytes long. And because I no longer want to see any silly discussion about it, you can find attached a screenshot of the output that VASM gives for the following code when assembled:

That's all well and good, but you haven't counted the swap at the start of the listing, the $4843 at address $ce.


Quote:
Originally Posted by litwr
Dear Sir! Please look at the math I provided for Don_Adan: 0x104-0xCE = 0x36 = 54 bytes. Do you see 0x102 there?! I used 0x104 as the final label.

Yes, indeed, I see the 0x104. However, the 0x5544 at 0x102 is *not* the bcc, it's the "subq #2,d4". The bcc *starts* at 0x104, and thus ends at 0x106; therefore you're not counting the bcc.
23 May 2021, 20:59   #217
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Quote:
Originally Posted by robinsonb5
That's all well and good, but you haven't counted the swap at the start of the listing, the $4843 at address $ce.

Sorry, I find it really funny that I try to get a 100% answer on this and then make an error myself. Especially considering how I phrased it. So errr, yeah... About that...

You're right, I did accidentally cut off the swap while reformatting the listing. I do apologize. I used the program listing as supplied by litwr (the one with all the line numbers, offsets and such added in) and deleted one line more than I should have. Which means, yeah, it is 56 bytes if the instruction I managed to delete is added back in.
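Counting instructions gives the same answer as the addresses. In the Python sketch below, the list is the copy of the loop posted above with the lost `swap d3` restored at the front; the 2-byte size per entry is an assumption that holds for these forms on the 68000, where register-to-register operations, one-bit shifts, (An)/-(An) moves and short branches each encode as a single 16-bit opcode word.

```python
# The loop as posted above, with the missing leading "swap d3" put back.
LOOP = [
    "swap d3", "move d3,d7", "divu d4,d7", "swap d7", "move d7,d3", "swap d3",
    "divu d4,d3", "move d3,d7", "exg.l d3,d7", "clr d7", "swap d7",
    "move d7,(a3)", "bra.s .enddiv",
    "sub.l d3,d5", "sub.l d7,d5", "lsr.l d5",
    "move -(a3),d0", "mulu d1,d0", "add.l d0,d5", "move.l d5,d3",
    "divu d4,d3", "bvs.s .longdiv", "move d3,d7", "clr d3", "swap d3",
    "move d3,(a3)", "subq #2,d4", "bcc .l2",
]
BYTES_PER_INSN = 2   # assumption: each form above is one 16-bit opcode word on the 68000

print(len(LOOP), len(LOOP) * BYTES_PER_INSN)   # 28 56
print((len(LOOP) - 1) * BYTES_PER_INSN)        # 54 without the swap
```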
Quote:
Originally Posted by litwr
It is perfectly fine to use any number in the range 0..0xffff. Why write this nonsense?
In most assemblers, you can certainly use a bit number larger than 0-7 with BTST on memory, but be aware that the instruction only uses the low 3 bits of the bit number when testing memory, and only tests a single byte. So BTST #14,<<memory>> doesn't check bit 14, but bit 6 of the byte it reads.
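A small Python model of the memory form, following the manual text quoted earlier in the thread (one byte is read, bit number taken modulo 8). It also shows why `btst #14,(a0)` can still test what its author intended: the 68000 is big-endian, so the byte at (a0) is the high byte of the word at (a0), and bit 6 of that byte is bit 14 of the word. The helper names are mine.

```python
def btst_mem(memory: bytes, addr: int, bit: int) -> bool:
    """BTST #bit,(addr): read one byte, test bit number modulo 8 (True = bit set, Z clear)."""
    return ((memory[addr] >> (bit % 8)) & 1) == 1

def word_bit(memory: bytes, addr: int, bit: int) -> bool:
    """Bit `bit` of the big-endian 16-bit word at addr."""
    word = int.from_bytes(memory[addr:addr + 2], "big")
    return ((word >> bit) & 1) == 1

mem = (0x4000).to_bytes(2, "big")                   # word $4000 at address 0: only bit 14 set
print(btst_mem(mem, 0, 14), btst_mem(mem, 0, 6))    # True True: the same bit
print(word_bit(mem, 0, 14))                         # True
```

The flip side: for bit numbers 0 to 7, `btst #n,(a0)` tests bit n+8 of the word, not bit n; the low byte of the word is at 1(a0).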

Last edited by roondar; 24 May 2021 at 00:49. Reason: Corrected my statement on BTST
23 May 2021, 21:13   #218
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Quote:
Originally Posted by roondar

Sorry, I find it really funny that I try to get a 100% answer on this and then make an error myself.
Such is life, it's all good.

The funniest part is how much effort and energy we've all expended over two bytes!

It has been an interesting thread, though - I love that we can still learn new things about instruction timings decades after the CPUs were released.
23 May 2021, 22:06   #219
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Quote:
Originally Posted by litwr
It seems this has become an obsession for you. Thank you for the help, but your latest posts are strange. It is better for you to stop now. The code is good enough now.
Your code snippet is wrong again. Just try D4=1: this can overflow DIVU D4,D3. If you want to continue, please try running your code first.
You are very funny. You used a buggy program that can't calculate the size of the loop routine correctly. You were too lazy to read/check my reply, where I counted all the instructions used in the main loop. You know better what is correct when using btst on memory. Now you tell me that my routine will overflow if D4 is 1. I know this. This routine only handles a 1-bit overflow, not more. Maybe you know how lsr.l #1,D3 works? You didn't show example D4 and D3 values for which the overflow occurred. At present I don't have access to my Amiga to check this. The loop code is good enough, but it can be better. You use your program as a CPU benchmark. The same goes for PR0000: your version is only average.
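The overflow question can be checked without an Amiga. Below is a Python sketch, under my own simplifying assumptions, of a plain 68000 DIVU.W and of the halving trick in Don_Adan's routine (lsr.l #1 into X, divide, then addx and a conditional correction). It models only the DIVU overflow condition; 16-bit truncation of the intermediate `addx.w` result is ignored. Function names are hypothetical.

```python
def divu(dividend: int, divisor: int):
    """68000 DIVU.W model: 32/16 -> (quotient, remainder), or None when V would be set."""
    q, r = divmod(dividend, divisor)
    return None if q > 0xFFFF else (q, r)

def divu_halved(d: int, b: int):
    """Divide d by b via (d >> 1) / b, as in the lsr.l/divu/addx sequence."""
    half, x = d >> 1, d & 1          # lsr.l #1,D3 leaves the shifted-out bit in X
    res = divu(half, b)
    if res is None:
        return None                  # still overflows: only one extra bit of range
    q, r = res
    q, r = 2 * q, 2 * r + x          # add.l d7,d7 / addx.w d3,d3
    if r >= b:                       # sls/extb.l/and.w correction
        q, r = q + 1, r - b
    return q, r

print(divu(0x10000, 1), divu_halved(0x10000, 1))   # None (65536, 0)
print(divu_halved(0x20000, 1))                      # None
```

So with D4=1 the trick fails as soon as d reaches $20000, which is consistent with both posts: D4=1 can overflow, and the routine buys exactly one extra bit of range.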
23 May 2021, 23:47   #220
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by litwr
I already explained this. Please don't ignore 8-bit and some 16-bit systems. This limit was originally imposed because those systems simply can't address more than 64 KB.
However, if we want 10000 digits on 32-bit and some 16-bit systems, the algorithm becomes slower on those systems, because they have to use bigger tables and variables even for 1000 digits. And let me repeat: this excludes a lot of systems from the benchmark. So it gives nothing good. You know, every sport has rules.
This is not what I'm talking about. I'm talking about potential speed optimizations. I'm specifically not talking about the number of digits, spigot algorithm table sizes, or changing the algorithm in any way that would make it impractical/unusable on the small systems.

For example, there's a division by 10000 in the original program. It might be possible to make a division table for this and get some benefit. The artificial limitation prevents this. Another option might be a combined division + binary-to-decimal conversion table, where the whole thing is done in one go. This has nothing to do with the spigot algorithm, and therefore doesn't affect the smaller systems at all.
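A sketch of the kind of conversion table described above, in Python for readability. It is hypothetical and not taken from pipack: 10,000 entries of four ASCII digits (40,000 bytes in total) that turn a 0..9999 group into text with one lookup instead of repeated division by 10.

```python
# Hypothetical lookup table: entry n holds the four ASCII digits of n, for n in 0..9999.
DIGITS4 = b"".join(b"%04d" % n for n in range(10000))

def group_by_table(n: int) -> bytes:
    """Four ASCII digits of 0 <= n <= 9999 with a single table lookup."""
    return DIGITS4[4 * n : 4 * n + 4]

def group_by_division(n: int) -> bytes:
    """The same result by repeated division by 10 (the work a table would replace)."""
    out = bytearray(4)
    for pos in range(3, -1, -1):
        n, digit = divmod(n, 10)
        out[pos] = ord("0") + digit
    return bytes(out)

print(len(DIGITS4), group_by_table(7), group_by_table(2653))   # 40000 b'0007' b'2653'
```

The lookup itself is cheap; the cost is the 40,000 bytes, which only pays off where memory is not capped at 64 KB.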
All times are GMT +2.