English Amiga Board


Old 27 October 2011, 09:31   #1
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Optimizing question: instruction order

Yesterday I was reviewing some routines of mine. A couple of optimization questions came to mind,
which may be rather general. Before you tell me: I know that the best answer to my questions is "try different possibilities and measure the results",
but since that is time-consuming, I am asking for some rules of thumb, if any are reasonable.


The routines perform back-face culling of polygons and compute (with different formulas) the
illumination of a surface with respect to a light source whose position varies.
So they use MULS and in some cases also DIVS. Moreover, the routines have to do a pair of memory accesses that
do not depend on the results of the MULS and DIVS, i.e. I can insert the memory access instructions (almost)
anywhere inside the routines.
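For readers who have not written this kind of routine, here is a minimal C sketch of a typical back-face test. It is not the poster's actual code: the `Vec2` type, the function names and the winding convention (counter-clockwise means front-facing, with y pointing up) are all assumptions made for illustration. It shows where the two multiplies come from: each product is a 16x16 -> 32-bit signed multiply, which is what a MULS.W computes.

```c
#include <stdint.h>

/* Hypothetical projected vertex. Coordinates are assumed small enough
   that the differences below still fit in 16 bits. */
typedef struct { int16_t x, y; } Vec2;

/* z component of (b - a) x (c - a), i.e. twice the signed area of the
   triangle. Two 16x16 -> 32-bit multiplies and one subtraction. */
static int32_t signed_area2(Vec2 a, Vec2 b, Vec2 c)
{
    int16_t abx = (int16_t)(b.x - a.x);
    int16_t aby = (int16_t)(b.y - a.y);
    int16_t acx = (int16_t)(c.x - a.x);
    int16_t acy = (int16_t)(c.y - a.y);
    return (int32_t)abx * acy - (int32_t)aby * acx;
}

/* Assumed convention: counter-clockwise (positive area) is front-facing,
   so zero or negative area means the polygon can be culled. */
static int is_back_face(Vec2 a, Vec2 b, Vec2 c)
{
    return signed_area2(a, b, c) <= 0;
}
```

Any loads or stores that do not feed `signed_area2` are the kind of independent memory accesses the questions below are about.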


The main target is a 68000 + OCS (A500) with fast RAM. The secondary target is any other 68k CPU.

Thinking in terms of my main target, I am making the following assumption:
the relative order of the instructions does not change the performance.
I.e. it does not matter (for speed on the main target) where I insert the memory access instructions.

Question 1: is it really true?

On the other hand, I believe that for the sake of efficiency on 020+ it is better to interleave MULS and DIVS
with several other instructions, so I chose to put the memory accesses just before or after a MULS or DIVS.

Question 2: is it (in general) a good idea?

Question 3: is it (in general) better to put the memory access before or after a MULS/DIVS?
(my guess is before, because I expect that, at least on 040 and 060, while the CPU waits for the memory access
to complete it can start the MULS/DIVS)

Question 4: Should I avoid wasting time thinking about general rules of thumb, which are impossible to give,
and stick to a try-and-measure approach?


PS: please forgive my questions if they are dumb. Yesterday I managed to do a one-hour coding session, the first since 2009! What a wonderful experience!
Old 27 October 2011, 14:40   #2
KevG
Banned
 
Join Date: Jan 2009
Location: U.K.
Posts: 93
Question 1: How do you know without test results?

Question 2: How do you know without test results?

Question 3: How do you know without test results?

Question 4: The answer is YES.
Old 27 October 2011, 18:47   #3
Lonewolf10
AMOS Extensions Developer
 
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
1. Depends. If you are activating the blitter, for example, then it definitely matters. If you are only coding using the CPU, then I don't think so (unless you are trying to code for most of the 680x0 family).

2. No idea. Do some tests!

3. I'd say before is best, but I'd wait for the experts (cue Stingray and Leffmann).

4. Yes. It is very hard (impossible?) to code for all the 680x0 CPUs used in Amigas, as well as all the different hardware configurations. Stick with one machine (perhaps a popular one), then you can optimize for it heavily.


Personally, I'd normally think about using tables for MULS and DIVS. However, it sounds like what you are doing would require an awful lot of tables so you'd probably be better off just using the MULS and DIVS instead.
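As an illustration of what "tables instead of MULS" can look like, here is a minimal C sketch of one well-known trick, quarter-square multiplication, which uses the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4). It is only one possible table scheme and not necessarily the one meant above; the 8-bit operand range and the names `quarter_sq`, `init_quarter_sq` and `table_mul` are assumptions chosen to keep the table small.

```c
#include <stdint.h>

#define MAX_OPERAND 255u   /* assumed operand range: unsigned 8-bit */

/* quarter_sq[n] = floor(n*n / 4) for n = 0 .. 2*MAX_OPERAND.
   The largest entry is 510*510/4 = 65025, which fits in 16 bits. */
static uint16_t quarter_sq[2 * MAX_OPERAND + 1];

/* Fill the table once at startup. */
static void init_quarter_sq(void)
{
    for (uint32_t n = 0; n <= 2 * MAX_OPERAND; n++)
        quarter_sq[n] = (uint16_t)((n * n) / 4);
}

/* Multiply two 8-bit values with two table lookups and a subtraction.
   (a+b) and |a-b| always have the same parity, so the rounding errors
   of the two floor operations cancel exactly. */
static uint16_t table_mul(uint8_t a, uint8_t b)
{
    unsigned sum  = (unsigned)a + b;
    unsigned diff = (a > b) ? (unsigned)(a - b) : (unsigned)(b - a);
    return (uint16_t)(quarter_sq[sum] - quarter_sq[diff]);
}
```

The table grows with the operand range, which is why this approach becomes unattractive for full 16-bit operands such as those a 3D routine typically feeds to MULS.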

I'm no expert, still a relative newbie myself, but thought I'd offer my $0.02


Regards,
Lonewolf10
Old 27 October 2011, 20:10   #4
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by TheDarkCoder
Thinking in terms of my main target, I am making the following assumption:
the relative order of the instructions does not change the performance.
I.e. it does not matter (for speed on the main target) where I insert the memory access instructions.

Question 1: is it really true?
For 68000 I doubt that the order of instructions changes performance. For 68030 you can gain some speed by doing something with registers only after a write to memory (probably applies to 020/040 and 060 as well). This works even better when writing to chipmem as far as I know (although the behavior seems weird somehow, needs tests).

For 68060 the order certainly does matter. Reorder instructions so that the next instruction doesn't rely on results of the previous one.
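The idea of not letting the next instruction depend on the previous result can be sketched in C as follows. This is only an analogy under stated assumptions: the function names are hypothetical, a C compiler is free to reorder the code anyway, and in hand-written 68060 assembly the same effect would be achieved by ordering the instructions yourself. The first version forms one long dependency chain through `acc`; the second keeps two independent accumulators so that consecutive multiply-adds do not wait on each other.

```c
#include <stddef.h>
#include <stdint.h>

/* Every addition depends on the result of the previous one. */
static int32_t dot_chained(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Two independent chains: the work on acc1 does not need the
   result of the work on acc0, and vice versa. */
static int32_t dot_interleaved(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += (int32_t)a[i]     * b[i];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
    }
    if (i < n)                      /* odd element count: one left over */
        acc0 += (int32_t)a[i] * b[i];
    return acc0 + acc1;
}
```

Both functions return the same value; whether the second form is actually faster on a given CPU still has to be measured.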
Quote:
Originally Posted by TheDarkCoder
On the other hand, I believe that for the sake of efficiency in 020+ it is better to interleave MULS and DIVS
with several instructions, so I chose to put the memory access just before or after a MULS or DIVS.

Question 2: is it (in general) a good idea?
Put them after a write. Doing it before does nothing as far as I know.
Quote:
Originally Posted by TheDarkCoder
Question 3: is it (in general) better to put the memory access before or after a MULS/DIVS?
(my guess is before, because I expect that, at least on 040 and 060, while the CPU waits for the memory access
to complete it can start the MULS/DIVS)
As far as I know, always after for 68020+. First do a write, then do register work.
Quote:
Originally Posted by TheDarkCoder
Question 4: Should I avoid wasting time thinking about general rules of thumb, which are impossible to give, and stick to a try-and-measure approach?
NO! Rules of thumb are a general guideline that will point you in the right direction, and are therefore quite useful. To get the best speed you'll still need to do tests, but keeping those rules of thumb in mind will mean your code will immediately be faster than when you don't apply them.
Quote:
Originally Posted by Lonewolf10
4. Yes. It is very hard (impossible?) to code for all the 680x0 CPUs used in Amigas, as well as all the different hardware configurations. Stick with one machine (perhaps a popular one), then you can optimize for it heavily.
True, but you can still apply rules of thumb to individual CPUs. It's best to try and see if you can get good performance on your lowest CPU target, finish that code and do a separate version for a higher target.
Old 28 October 2011, 10:26   #5
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Quote:
Originally Posted by Lonewolf10
1. Depends. If you are activating the blitter, for example, then it definitely matters. If you are only coding using the CPU, then I don't think so (unless you are trying to code for most of the 680x0 family).
Thanks for your answer, Lonewolf. I didn't phrase question 1 well.
A clearer formulation is:

1') In a typical A500 setting, i.e. 68000 + OCS (optionally with fast RAM), does the relative order of instructions affect performance?
While the routine runs, the blitter may still be clearing the screen, or it may already have finished. How would you choose the instruction order in this situation?
Old 28 October 2011, 10:32   #6
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Quote:
Originally Posted by Thorham
For 68000 I doubt that the order of instructions changes performance. For 68030 you can gain some speed by doing something with registers only after a write to memory (probably applies to 020/040 and 060 as well). This works even better when writing to chipmem as far as I know (although the behavior seems weird somehow, needs tests).

For 68060 the order certainly does matter. Reorder instructions so that the next instruction doesn't rely on results of the previous one.
Put them after a write. Doing it before does nothing as far as I know.
As far as I know, always after for 68020+. First do a write, then do register work.
Thanks, Thorham, for your advice! :-)
Just to be sure that I understand correctly, since the subject of my question is "the memory access instructions" while the subject of your answer seems to be "the MUL/DIV/register work instructions":
is your advice to put the memory access instructions FIRST and THEN the MUL/DIV/register work instructions?
Old 28 October 2011, 10:42   #7
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
@All

Thanks to everyone who answered my questions! Your advice is highly appreciated! :-)

I would like to clarify that I do NOT believe that "rules of thumb" can substitute for tests.
I agree that to get the absolute best optimization one has to do tests.

I was asking for "rules of thumb" that may help in two ways:

1) to guide the tests, as Thorham said, i.e. to help exclude some non-optimal orderings
2) my main target is the 68000, while as a secondary target I don't have a specific CPU. Since I suspect the order does not matter on the 68000 (or that there are many optimal orderings), I would like to test just on the 68000 to find a set of equally good orderings and then use a "rule of thumb" to select one of them that is not too bad on any 020+
Old 28 October 2011, 12:53   #8
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,515
Instruction re-ordering by itself won't affect speed on a 68000 (it has no caches, write buffers, etc.), but it can affect speed if the code accesses the Agnus bus while other DMA channels are active and the re-ordered instructions have different internal idle cycles.
Old 29 October 2011, 15:34   #9
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,604
The only proper answer imo is to get one of each platform you want to support, and code on them. I think you ask because you want to optimize "theoretically for all models" and not have to code on an A500/A600. In fact, if you just code on an A600 you can set a different background color at the start of each routine to see where the bottlenecks are. It's dead easy and saves time by giving instant answers to optimization questions; it simply ends all "will this run fast enough on..." doubts, which is ace.
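The background-color trick can be sketched in C as below. This is a hedged illustration, not code from the thread: `transform_vertices`, `compute_lighting`, `run_frame` and the marker colors are hypothetical names and values. The only hardware fact used is that the Amiga background color register COLOR00 lives at $DFF180. The register is passed in as a pointer so the sketch can also be exercised on a host machine with an ordinary variable.

```c
#include <stdint.h>

/* Address of COLOR00 (background color) on the Amiga custom chips.
   On real hardware: volatile uint16_t *bg = (volatile uint16_t *)COLOR00_ADDR; */
#define COLOR00_ADDR 0xDFF180UL

/* Hypothetical marker colors in 12-bit $0RGB format. */
enum { MARK_IDLE = 0x000, MARK_TRANSFORM = 0xF00, MARK_LIGHTING = 0x0F0 };

/* Hypothetical stand-ins for the routines being profiled. */
static void transform_vertices(void) { /* ... */ }
static void compute_lighting(void)   { /* ... */ }

/* Change the background color before each routine. On screen, the
   height of each colored band shows how many raster lines it took. */
static void run_frame(volatile uint16_t *bg)
{
    *bg = MARK_TRANSFORM;
    transform_vertices();

    *bg = MARK_LIGHTING;
    compute_lighting();

    *bg = MARK_IDLE;   /* back to black: the rest of the frame is free time */
}
```

The taller the red or green band, the more of the frame that routine consumes, which gives an immediate visual comparison between two instruction orderings.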

I've ended up with only two choices: A500 with 512k slowmem (runs fine, speedups are a bonus) and A1200-060 (runs optimally, slowdowns are acceptable to users of lower-end hardware).


The load-use reordering is fine on any machine but won't give optimization on CPUs without caches. Coding for "backward branch is assumed taken" is also good, nothing wrong with that.

But you will optimize much more by eliminating data redundancy, reordering data for sequential access, and reducing shifts, muls, divs and unnecessary memory accesses and instructions. On a higher level, eliminating unnecessary blits and recalculations.

When the code follows the above, the only optimizations left for 68000 are 1) time DMA start perfectly to somehow interleave DMA with CPU internal cycles (very hard), and 2) support putting [almost all] code+data in fastmem.

Now, the above description fits a coder god. My suggestion is to approach coding as problem-solving:
1. Problem: the program isn't finished. Solution: finish the program.
2. Problem: the program isn't fast enough. Solution: find out why.
3. Problem: routine x and y are taking too long. Solution: optimize the inner loops (ONLY) of routine x and y
4. Problem: Program is finished and fast enough, but I want to write god-code where every line is perfectly optimized. Solution: the problem is with you. Release the program, then fix you.

The last point is based on introspection; you may not have that problem.
Old 29 October 2011, 17:07   #10
pmc
gone
 
Join Date: Apr 2007
Location: completely gone
Posts: 1,596
Quote:
Originally Posted by Photon
My suggestion is to approach coding as problem-solving:
1. Problem: the program isn't finished. Solution: finish the program.
2. Problem: the program isn't fast enough. Solution: find out why.
3. Problem: routine x and y are taking too long. Solution: optimize the inner loops (ONLY) of routine x and y
4. Problem: Program is finished and fast enough, but I want to write god-code where every line is perfectly optimized. Solution: the problem is with you. Release the program, then fix you.
Now that's what I call a good set of rules of thumb!