English Amiga Board


Go Back   English Amiga Board > Coders > Coders. General

 
 
Thread Tools
Old 01 May 2021, 10:54   #21
modrobert
old bearded fool
 
modrobert's Avatar
 
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
Quote:
Originally Posted by litwr View Post
Would you like please to run pi-amiga and pi-amiga1200 on your hardware for 100, 1000, and 3000 digits? A screenshot would be a nice addition for me to insert it in the table.
OK, I used the binaries in your attached archive 'pispigot.zip'. Here are the results for v9:

Code:
pi-amiga
number ? calculator v9 (68000)
number of digits (up to 9248)? 100
314...  .10
Code:
pi-amiga
number ? calculator v9 (68000)
number of digits (up to 9248)? 1000
314...  4.40
Code:
pi-amiga
number ? calculator v9 (68000)
number of digits (up to 9248)? 3000
314...  34.16

Code:
pi-amiga1200
number ? calculator v9 (68020)
number of digits (up to 9252)? 100
314...  .12
Code:
pi-amiga1200
number ? calculator v9 (68020)
number of digits (up to 9252)? 1000
314...  4.42
Code:
pi-amiga1200
number ? calculator v9 (68020)
number of digits (up to 9252)? 3000
314...  34.42

The time results vary a bit between runs, most likely due to multitasking and perhaps accuracy of the frame counter, on the first "100" test I get results between 0.09 and 0.12 running the same program.

I attached a screenshot, quality kind of sucks, it's hard to get good ones from CRT screens (looks better in reality).

Last edited by BippyM; 01 June 2021 at 18:24.
modrobert is offline  
Old 01 May 2021, 12:06   #22
saimo
Registered User
 
saimo's Avatar
 
Join Date: Aug 2010
Location: Italy
Posts: 787
Quote:
Originally Posted by litwr View Post
I showed the CMP-first version only because it was a/b's demand. Sorry I didn't add more information about it afore.
No need to apologize: it's my fault that I didn't read the thread properly (and still haven't ).

Quote:
Thank you. I didn't know this. However CLR vs SWAP timing is rather odd. The 68000 executes SWAP faster than CLR but the 68020 executes CLR faster than SWAP!
Given that your code uses divu.l and divul.l, which are not available on the 68000, I assumed that it was for 68020+, so I didn't hesitate to replace moveq.l with clr.l (just for readability reasons - it's only a personal preference).

Quote:
It is interesting to reduce instruction dependency in the code that may speed up the execution on the 68020 and higher 68k. Actually I didn't think about it.
For completeness, reducing the register conflicts gives an improvement only on the 68060.

Quote:
Your code again corrupts D7.
How so? It puts the quotient in the low word and clears the upper word, just like your code does.

Quote:
Code:
         divu.w d4,d6
         bvs.s .longdiv

         moveq.l #0,d7
         move.w d6,d7
         clr.w d6
         swap d6
         move.w d6,(a3)
One register conflict can be avoided by using the same trick proposed in my previous post:

Code:
   divu.w d4,d6
   bvs.s  .longdiv

   move.l d6,d7
   clr.w  d6
   eor.l  d6,d7
   swap.w d6
   move.w d6,(a3)
If you can change the array item size to 4 bytes, then the code can become:

Code:
   divu.w d4,d6
   bvs.s  .longdiv

   move.l d6,d7
   clr.w  d6
   eor.l  d6,d7
   move.l d6,(a3)
   swap.w d6
This removes one more register conflict and has the swap executing in parallel with the write (but the latter is not necessarily an improvement, as other instructions after this piece of code could execute in parallel as well).
saimo is offline  
Old 01 May 2021, 12:57   #23
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by saimo View Post
For completeness, reducing the register conflicts gives an improvement only on the 68060.
I still don't know when the 68020/30 can perform 2-3 instructions at once.

Quote:
Originally Posted by saimo View Post
How so? It puts the quotient in the low word and clears the upper word, just like your code does.
Sorry again. I was wrong. What a nice EOR-optimization! Thank you. The 68020 oddity can make me mad, the latest modrobert's results help this. However your optimization just replaces MOVEQ and MOVE.W with MOVE.L and EOR.L and IMHO your pair is even a bit slower.

Quote:
Originally Posted by saimo View Post
If you can change the array item size to 4 bytes, then the code can become:
No, because there is the 4th rule of the pi-spigot: it utilizes all available RAM below 64 KB limit to get the maximum number of calculated digits, so it is forbidden to restrict artificially the maximum number of digits. Actually we can took a double word, process two its words and save the double word back. But I doubt that this can accelerate the program.
litwr is offline  
Old 01 May 2021, 13:57   #24
saimo
Registered User
 
saimo's Avatar
 
Join Date: Aug 2010
Location: Italy
Posts: 787
Quote:
Originally Posted by litwr View Post
I still don't know when the 68020/30 can perform 2-3 instructions at once.
They generally can't: they can partly execute stuff in parallel depending on the instructions types, operands and alignment in memory. When the "head" and "tail" (as they're called in the 68030 documentation) timings of instructions overlap, then parallel execution occurs (the head of an instruction can execute in parallel with the tail of the previous instruction; if the head happens to be all that the instruction consists in, then the whole instruction, in practice, takes 0 cycles) The 68040 has a deeper pipeline and the 68060 is also superscalar, so those CPUs can do even better. If you feel like learning the details, check out the user manuals of the 68k CPUs (they're available for download from NXP - here's the one for the M68020).
That said, after an instruction that writes to memory, the CPU can enjoy many "free" cycles for the cached non-memory instructions; for example, I measured that after a write to CHIP RAM, the 68030 on my Blizzard 1230-IV has 26 free cycles which can be used for all the instructions that fit in those cycles; during a write to FAST RAM (which the CPU has exclusive access to and is no-wait-state) the CPU has 4 free cycles. (Exception: if I remember correctly - it's been a while - rotate and, maybe, also shift instructions, for some strange reason, actually can't benefit from the free cycles and cause the CPU to stall until the write finishes.) This is the reason why I proposed the longword write trick.

Quote:
Sorry again. I was wrong. What a nice EOR-optimization! Thank you. The 68020 oddity can make me mad, the latest modrobert's results help this. However your optimization just replaces MOVEQ and MOVE.W with MOVE.L and EOR.L and IMHO your pair is even a bit slower.
The instructions I used are register-only and are as fast as it gets on any 68020+ CPU Basically, my code does not add cycles, but it eliminates a register dependency, thus actually making the code faster on the 68060.

Last edited by saimo; 01 May 2021 at 14:03.
saimo is offline  
Old 01 May 2021, 18:53   #25
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by modrobert View Post
Code:
pi-amiga
number ? calculator v9 (68000)
number of digits (up to 9248)? 100
314...  .10
Code:
pi-amiga
number ? calculator v9 (68000)
number of digits (up to 9248)? 1000
314...  4.40
Code:
pi-amiga
number ? calculator v9 (68000)
number of digits (up to 9248)? 3000
314...  34.16
Code:
pi-amiga1200
number ? calculator v9 (68020)
number of digits (up to 9252)? 100
314...  .12
Code:
pi-amiga1200
number ? calculator v9 (68020)
number of digits (up to 9252)? 1000
314...  4.42
Code:
pi-amiga1200
number ? calculator v9 (68020)
number of digits (up to 9252)? 3000
314...  34.42
Thank you very much. Your results show that the 68020 code optimization is a very uneasy task. They again show that the 68020 code is slower than the pure 68000 code and this is very odd. Would you like please to help me a bit more? I prepared an ultimate test which can show us all unclear details. The attached archive contains the next executable files:
pi-amiga-8 - an old version (68000)
pi-amiga-8mo - an old version with MULUopt=1 (68000)
pi-amiga-9 - BVS optimization (68000)
pi-amiga-9mo - BVS optimization with MULUopt=1 (68000)
pi-amiga1200-8 - an old version (68020)
pi-amiga1200-8mo - an old version with MULUopt=1 (68020)
pi-amiga1200-9 - BVS optimization (68020)
pi-amiga1200-9mo - BVS optimization with MULUopt=1 (68020)

You have already run pi-amiga-9mo and pi-amiga1200-9mo - if you rerun them the results must be the same. So results for pi-amiga-8, pi-amiga-8mo, pi-amiga-9, pi-amiga1200-8, pi-amiga1200-8mo, pi-amiga1200-9 are only required to get information which optimization actually works on the 68020. This time only 3000 digit results are required.

Quote:
Originally Posted by saimo View Post
They generally can't: they can partly execute stuff in parallel depending on the instructions types, operands and alignment in memory. When the "head" and "tail" (as they're called in the 68030 documentation) timings of instructions overlap, then parallel execution occurs (the head of an instruction can execute in parallel with the tail of the previous instruction; if the head happens to be all that the instruction consists in, then the whole instruction, in practice, takes 0 cycles) The 68040 has a deeper pipeline and the 68060 is also superscalar, so those CPUs can do even better. If you feel like learning the details, check out the user manuals of the 68k CPUs (they're available for download from NXP - here's the one for the M68020).
That said, after an instruction that writes to memory, the CPU can enjoy many "free" cycles for the cached non-memory instructions; for example, I measured that after a write to CHIP RAM, the 68030 on my Blizzard 1230-IV has 26 free cycles which can be used for all the instructions that fit in those cycles; during a write to FAST RAM (which the CPU has exclusive access to and is no-wait-state) the CPU has 4 free cycles. (Exception: if I remember correctly - it's been a while - rotate and, maybe, also shift instructions, for some strange reason, actually can't benefit from the free cycles and cause the CPU to stall until the write finishes.) This is the reason why I proposed the longword write trick.
Thank you very much for this very interesting information. This also partially explains why the 68020 simulators are still so far from the perfection.

Quote:
Originally Posted by saimo View Post
The instructions I used are register-only and are as fast as it gets on any 68020+ CPU Basically, my code does not add cycles, but it eliminates a register dependency, thus actually making the code faster on the 68060.
It is sad that Commodore left PC market so early. The 68060 based Amiga would have been quite good until maybe 1998. But now I have large problems even with the 68020 optimization... I doubt that I can discover how to optimize the code even for the 68030 - I will need help from Atari people for this. So the 68060 optimization is rather among pure fantastics for me now.

Last edited by BippyM; 01 June 2021 at 18:24.
litwr is offline  
Old 02 May 2021, 11:23   #26
modrobert
old bearded fool
 
modrobert's Avatar
 
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
While waiting for new electrolytic capacitors to ship for my original A1200 PSU (it needs recapping), I have connected a PC ATX 250W power supply temporarily which gives me more options.

litwr,

Do you want me to run the new tests with the stock A1200 68020 same as before, or using ACA-1232 68030 @ 33MHz (with 128mb fastram)?

EDIT:

Never mind, the ACA-1232 seems to be broken, so can't do 68030 for now.


Amiga 1200 stock 68020 @ 14MHz with 4mb fastram

Code:
pi-amiga-8
number ? calculator v8 [MULUopt](68000)
number of digits (up to 9252)? 3000
314...  35.90
Code:
pi-amiga-8mo
number ? calculator v8 [MULUopt](68000)
number of digits (up to 9248)? 3000
314...  35.30
Code:
pi-amiga-9
number ? calculator v9(68000)
number of digits (up to 9248)? 3000
314...  34.72
Code:
pi-amiga-9mo
number ? calculator v9 [MULUopt](68000)
number of digits (up to 9248)? 3000
314...  34.72
Code:
pi-amiga1200-8
number ? calculator v8 [MULUopt](68020)
number of digits (up to 9252)? 3000
314...  36.72
Code:
pi-amiga1200-8mo
number ? calculator v8 [MULUopt](68020)
number of digits (up to 9248)? 3000
314...  35.54
Code:
pi-amiga1200-9
number ? calculator v9(68020)
number of digits (up to 9252)? 3000
314...  35.00
Code:
pi-amiga1200-9mo
number ? calculator v9 [MULUopt](68020)
number of digits (up to 9248)? 3000
314...  34.94

EDIT 2:

I played around with 'pi-amiga-9.asm' and modified it (edit/compile) to not print the digits via "Write(a6)" to stdout (under PR0000 label), just to see how much it would gain, and the result was 32.80 seconds, so that saved roughly 2 seconds.

BTW: Noticed you have "jmp Write(a6)" on line 307 in 'pi-amiga-9.asm', shouldn't that be "jsr Write(a6)"?

Last edited by modrobert; 02 May 2021 at 21:56.
modrobert is offline  
Old 02 May 2021, 20:26   #27
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by modrobert View Post
While waiting for new electrolytic capacitors to ship for my original A1200 PSU (it needs recapping), I have connected a PC ATX 250W power supply temporarily which gives me more options.
Thank you very much. Your results clarify some things. So it seems that the BVS-optimization saves 2 cycles and the MULUopt-optimization on the 68020 saves 1. However it is a big mystery why the 68020 code (with DIVUL) is always slower than the plain 68000 code?! This is impossible! I can only suggest that we have some random deviations and your results showed cases when these deviations were in favor of the 68000 code. Maybe only the long run of PI-AMIGA-9MO and PI-AMIGA1200-9MO for 9000 digits can help to find the truth (you can redirect the output to a file to minimize the part of IO-timings).
BTW could you detach fast RAM and run PI-AMIGA-9MO for 3000 digits? Indeed it would also be nice to get results from your 68030 hardware sometime in the future.


Quote:
Originally Posted by modrobert View Post
I played around with 'pi-amiga-9.asm' and modified it (edit/compile) to not print the digits via "Write(a6)" to stdout (under PR0000 label), just to see how much it would gain, and the result was 32.80 seconds, so that saved roughly 2 seconds.

BTW: Noticed you have "jmp Write(a6)" on line 307 in 'pi-amiga-9.asm', shouldn't that be "jsr Write(a6)"?
You can just set IO=0 to disable the screen output. If you selected format option for CPU+IO you would get CPU and IO time separately: the ER-value is calculated basing only on the pure CPU time, no IO time is taken into account.
The JMP instruction is ok - how could it work if it was wrong?
litwr is offline  
Old 02 May 2021, 21:19   #28
modrobert
old bearded fool
 
modrobert's Avatar
 
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
Quote:
Originally Posted by litwr View Post
Thank you very much. Your results clarify some things. So it seems that the BVS-optimization saves 2 cycles and the MULUopt-optimization on the 68020 saves 1. However it is a big mystery why the 68020 code (with DIVUL) is always slower than the plain 68000 code?! This is impossible! I can only suggest that we have some random deviations and your results showed cases when these deviations were in favor of the 68000 code. Maybe only the long run of PI-AMIGA-9MO and PI-AMIGA1200-9MO for 9000 digits can help to find the truth (you can redirect the output to a file to minimize the part of IO-timings).
BTW could you detach fast RAM and run PI-AMIGA-9MO for 3000 digits? Indeed it would also be nice to get results from your 68030 hardware sometime in the future.
Code:
pi-amiga-9mo
number ? calculator v9 [MULUopt](68000)
number of digits (up to 9248)? 9000
314...  291.88
Code:
pi-amiga1200-9mo
number ? calculator v9 [MULUopt](68020)
number of digits (up to 9248)? 9000
314...  294.38

Without fastram...
Code:
pi-amiga-9mo
number ? calculator v9 [MULUopt](68000)
number of digits (up to 9248)? 3000
314...  36.84
Quote:
Originally Posted by litwr View Post
You can just set IO=0 to disable the screen output. If you selected format option for CPU+IO you would get CPU and IO time separately: the ER-value is calculated basing only on the pure CPU time, no IO time is taken into account.
The JMP instruction is ok - how could it work if it was wrong?
Aha, OK, missed the IO constant in the source code.

I noticed you don't select any type of RAM "clr.l d1" when allocating "AllocMem(a6)", according to documentation it should pick fastram first, but not sure if that can be overridden by compiler settings?
modrobert is offline  
Old 02 May 2021, 21:45   #29
modrobert
old bearded fool
 
modrobert's Avatar
 
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
litwr,

Check the attached screenshot regarding number of CPU cycles for DIVU.L.

Source:
https://www.nxp.com/docs/en/data-sheet/MC68020UM.pdf

Last edited by BippyM; 01 June 2021 at 18:24.
modrobert is offline  
Old 03 May 2021, 15:18   #30
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by modrobert View Post
Code:
pi-amiga-9mo
number ? calculator v9 [MULUopt](68000)
number of digits (up to 9248)? 9000
314...  291.88
Code:
pi-amiga1200-9mo
number ? calculator v9 [MULUopt](68020)
number of digits (up to 9248)? 9000
314...  294.38

Without fastram...
Code:
pi-amiga-9mo
number ? calculator v9 [MULUopt](68000)
number of digits (up to 9248)? 3000
314...  36.84
I noticed you don't select any type of RAM "clr.l d1" when allocating "AllocMem(a6)", according to documentation it should pick fastram first, but not sure if that can be overridden by compiler settings?
Thank you very much. However your results look impossible. Both programs execute the same code
Code:
        divu d4,d6
         bvs.s .longdiv

         moveq.l #0,d7
and your results show that their timings are different.
Code sequences at .longdiv are different but the branch to .longdiv is taken almost never, it is about 1 branch taken of 10,000,000 cases. I can only suggest one plausible explanation. Maybe your results are caused by programs run order. You ran PI-AMIGA-9 first. I can assume that this processing makes your hardware a bit hotter and slower. IMHO if you run PI-AMIGA1200-9MO first then it will be faster. If it is not right then I am completely baffled.

Thank you for your remark about the AllocMem invocation. But I can't understand what compiler setting can affect memory allocation function call...
litwr is offline  
Old 03 May 2021, 15:20   #31
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by modrobert View Post
litwr,

Check the attached screenshot regarding number of CPU cycles for DIVU.L.

Source:
https://www.nxp.com/docs/en/data-sheet/MC68020UM.pdf
Interestingly that it seems DIVUL.L and two variants of DIVU.L have the same timings despite of they have quite different amount of work to do...
litwr is offline  
Old 03 May 2021, 15:42   #32
modrobert
old bearded fool
 
modrobert's Avatar
 
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
Quote:
Originally Posted by litwr View Post
Interestingly that it seems DIVUL.L and two variants of DIVU.L have the same timings despite of they have quite different amount of work to do...
I meant compared to DIVU.W, close to twice the number of CPU clocks at 76-79 for DIVU.L (and roughly 20 times slower than average instructions) which could be the reason 68020 code doesn't make much difference in this case.

Last edited by modrobert; 03 May 2021 at 15:52.
modrobert is offline  
Old 03 May 2021, 15:53   #33
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Regarding the timings of instructions, It should be pointed out that the timings in the Motorola manuals for the DIV and MUL instructions represent the maximum number of cycles they can take*, not the minimum. The actual timing varies depending on amongst other things the given input, though Motorola has not included information about how much this variation is**.

If I had to guess, I'd say that the 64 bit DIV instructions will probably perform worse than the 32 bit ones.

*) From the manual linked above, page 8-11: "This CC time is a maximum since the times given for the MULU.L and DIVS.L are maximums.".

**) See the introduction to chapter 8: "This section describes the instruction execution and operations (table searches, etc.) of the MC68020/EC020 in terms of external clock cycles. It provides accurate execution and operation timing guidelines but not exact timings for every possible circumstance. This approach is used since exact execution time for an instruction or operation is highly dependent on memory speeds and other variables."
roondar is online now  
Old 03 May 2021, 16:24   #34
modrobert
old bearded fool
 
modrobert's Avatar
 
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
Quote:
Originally Posted by roondar View Post
Regarding the timings of instructions, It should be pointed out that the timings in the Motorola manuals for the DIV and MUL instructions represent the maximum number of cycles they can take*, not the minimum. The actual timing varies depending on amongst other things the given input, though Motorola has not included information about how much this variation is**.

If I had to guess, I'd say that the 64 bit DIV instructions will probably perform worse than the 32 bit ones.

*) From the manual linked above, page 8-11: "This CC time is a maximum since the times given for the MULU.L and DIVS.L are maximums.".

**) See the introduction to chapter 8: "This section describes the instruction execution and operations (table searches, etc.) of the MC68020/EC020 in terms of external clock cycles. It provides accurate execution and operation timing guidelines but not exact timings for every possible circumstance. This approach is used since exact execution time for an instruction or operation is highly dependent on memory speeds and other variables."
Looks pretty clear to me, they have columns ranging from "best case" to "worst case", and this is how to interpret the data in section 8.2.8.



Time wise you can do roughly 20 ("average speed" ~4 clocks) instructions for the cost of one DIVU.L!

Last edited by modrobert; 18 February 2024 at 17:10.
modrobert is offline  
Old 03 May 2021, 16:42   #35
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Best case through worst case in those diagrams only refer to time with instruction overlap/no overlap but in cache/no overlap not in cache*. They do not refer to the differences in timing for DIV/MUL due to different values or operand size being passed (in the case of the 64 bit operations).

*) The last one is especially interesting because it assumes memory access is penalized by just 1 cycle per access, which is way better than what the A1200 without Fast RAM actually manages.

Edit: on a side note, it's very logical that DIV/MUL have different execution speeds based on differing inputs as the amount of work needed to be done for both does vary based on input.
roondar is online now  
Old 03 May 2021, 16:46   #36
modrobert
old bearded fool
 
modrobert's Avatar
 
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
Here are the disassembled files attached for 'pi-amiga-9mo' and 'pi-amiga1200-9mo' binaries from 'pi-amiga-cmp.zip'. As you can see the 'pi-amiga1200-9mo' includes "divul.l d4,d7:d6", but this part of the code is not called much according to litwr. Lots of DIVU.W instructions in both which is slow as well.

Code:
lab_fa:
   divul.l  d4,d7:d6
   move.w   d7,(a3)
   bra.b    lab_12e

Last edited by BippyM; 01 June 2021 at 18:24.
modrobert is offline  
Old 03 May 2021, 17:26   #37
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,215
Quote:
Originally Posted by litwr View Post
Interestingly that it seems DIVUL.L and two variants of DIVU.L have the same timings despite of they have quite different amount of work to do...

Not really, they have the same amount of work to do. Your average division algorithm creates the remainder as by-product anyhow. The typical division implementation is a 2nbits/nbits division.


IOWs, the underlying algorithm is probably much the same for all division versions, just that some of the outputs are thrown away and/or some of the inputs are assumed to be zero.
Thomas Richter is offline  
Old 03 May 2021, 17:41   #38
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Quote:
Originally Posted by modrobert View Post
Here are the disassembled files attached for 'pi-amiga-9mo' and 'pi-amiga1200-9mo' binaries from 'pi-amiga-cmp.zip'. As you can see the 'pi-amiga1200-9mo' includes "divul.l d4,d7:d6", but this part of the code is not called much according to litwr. Lots of DIVU.W instructions in both which is slow as well.

Code:
lab_fa:
   divul.l  d4,d7:d6
   move.w   d7,(a3)
   bra.b    lab_12e
You can add temporary counter how many times this routine is called, if only a few times like about 10 times, i will rewrote fully this pi routine and remove bvs check, because pi is constant. i will call longdiv routine only when is necessary. Something like:

245 times divu.w
1 time divu.l
387 times divu.w
1 time divu.l
1647 times divu.w
2 times divu.l
etc

up to f.e 10000 times

if someone called this routine more than 10000 calls then old routine after 10000 times can be called within bvs check.
Don_Adan is offline  
Old 03 May 2021, 17:50   #39
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by roondar View Post
Regarding the timings of instructions, It should be pointed out that the timings in the Motorola manuals for the DIV and MUL instructions represent the maximum number of cycles they can take*, not the minimum. The actual timing varies depending on amongst other things the given input, though Motorola has not included information about how much this variation is**.

If I had to guess, I'd say that the 64 bit DIV instructions will probably perform worse than the 32 bit ones.

*) From the manual linked above, page 8-11: "This CC time is a maximum since the times given for the MULU.L and DIVS.L are maximums.".

**) See the introduction to chapter 8: "This section describes the instruction execution and operations (table searches, etc.) of the MC68020/EC020 in terms of external clock cycles. It provides accurate execution and operation timing guidelines but not exact timings for every possible circumstance. This approach is used since exact execution time for an instruction or operation is highly dependent on memory speeds and other variables."
My gosh! Have anybody ever tried to find those minimums? IMHO it is quite possible that minimum = maximum because some division implementations have constant time of executions for any their arguments.

Quote:
Originally Posted by modrobert View Post
Here are the disassembled files attached for 'pi-amiga-9mo' and 'pi-amiga1200-9mo' binaries from 'pi-amiga-cmp.zip'. As you can see the 'pi-amiga1200-9mo' includes "divul.l d4,d7:d6", but this part of the code is not called much according to litwr. Lots of DIVU.W instructions in both which is slow as well.

Code:
lab_fa:
   divul.l  d4,d7:d6
   move.w   d7,(a3)
   bra.b    lab_12e
Why make a disassembly, you have sources. I can again confirm that DIVUL is executed almost never.
litwr is offline  
Old 03 May 2021, 17:52   #40
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by Don_Adan View Post
You can add temporary counter how many times this routine is called, if only a few times like about 10 times, i will rewrote fully this pi routine and remove bvs check, because pi is constant.
IMHO it is much easier just to believe the author of the code.
litwr is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
68020 Bit Field Instructions mcgeezer Coders. Asm / Hardware 9 27 October 2023 23:21
68060 64-bit integer math BSzili Coders. Asm / Hardware 7 25 January 2021 21:18
Discovery: Math Audio Snow request.Old Rare Games 30 20 August 2018 12:17
Math apps mtb support.Apps 1 08 September 2002 18:59

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 10:39.

Top

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Page generated in 0.23747 seconds with 16 queries