68000 code optimisations - Page 4

Photon · 03 May 2012, 02:00

Quote:

Originally Posted by TheDarkCoder

Could you explain these two optimizations to me?

Well, the prefetch feature of the 68000 is no secret. My theory was that this will cause an internal stage and another internal stage XOR a word memory access (total 2x 4 cycles) to be executed before a blit starts. The theory hasn't been tested for much more than DIVSing perspective while drawing lines 23 years ago, basically because I couldn't find anything else useful that wasn't a normal sub-12-cycle instruction. Sometimes the lines would be shorter than the DIVS cycle time, of course, but that was rare. The point was that the DIVS started immediately, so it was calculated internally in parallel with the blit and finished sooner.

The other one is aligning table lookups (usually 14c), a taken branch (10c) and similar with the alternating 4-cycle CMA/DMA memory access. In the vblank period, when no DMA is active, it's "as written": just sum up the cycles. But while actually displaying something, some of them would just be out of luck and have their CMA execute in the NEXT 4-cycle slot that the bitplane DMA wasn't hogging.

Normally this is really (really!) too much work, since you can't go "oh, I'll halve the number of colors on screen and I'll be able to fit one or two CMAs between bitplane accesses": you'd have ruined the original idea (by making it look shit), and you would already have gained much more by removing a bitplane's DMA, both for the blitter and the CPU.

So I haven't tried this, unless I got some routine right by accident, and I expect someone is going to debunk it instantly (and thunderously!).

But it's basically the last straw to grasp at when a frameful of effect is a sequence of perfectly and godlikely optimized instructions (according to yourself, of course). Considering you'd likely have to sync with the raster at the start of something, you'd probably lose more by that sync than you gained! But there might be a situation. Not one that wouldn't be completely 'surpassed' by precalc or infinite bobs or whatever, of course. Hah.

meynaf · 20 May 2012, 17:26

A few coding tricks, not specific to the 68000 but still OK there, I think:

Code:

; sgn - returns d1=0 if d0=0, d1=1 if d0>0, or d1=-1 if d0<0 (d0 is destroyed)
 add.l d0,d0
 subx.l d1,d1
 negx.l d0
 addx.l d1,d1

; quick test to check whether any byte of d0 is $00 (d0 is destroyed)
 move.l d0,d1
 not.l d0
 sub.l #$01010101,d1
 and.l #$80808080,d0
 and.l d0,d1
 bne null_byte_found

; check if a=b (d0=a, d1=b, range $0000-$7FFF); b=$FFFF acts as a wildcard matching any a
 eor.w d1,d0
 bgt not_equal

; instead of:
 scs d0
 ext.w d0  ; or extb.l d0 (68020+ only)
 ext.l d0  ;
; write (only valid when X holds the same value as C, e.g. after ADD/SUB/NEG/shifts,
; not after CMP or TST, which leave X untouched):
 subx.l d0,d0

; to see if a value is between $FFFF8000 and $7FFF, better put it in an An register:
 cmpa.w a0,a0  ; compare a0.l with a0.w sign-extended to .l; Z set (beq) if in range
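For anyone who wants to convince themselves these tricks are right without firing up an assembler, here's a rough Python model of the register arithmetic. It assumes 32-bit registers (16-bit for the .w trick), tracks only the X flag where a trick needs it, and the function names are made up for illustration; it's a sanity check, not an emulator.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(v):
    """Read a 32-bit register value as a signed integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def sgn(d0):
    """add.l d0,d0 / subx.l d1,d1 / negx.l d0 / addx.l d1,d1 (X flag modelled)."""
    d0 &= MASK32
    x = d0 >> 31                    # add.l d0,d0: old sign bit goes to X (and C)
    d0 = (d0 << 1) & MASK32
    d1 = -x & MASK32                # subx.l d1,d1: d1-d1-X = -X, X unchanged
    borrow = 1 if d0 + x else 0     # negx.l d0: 0-d0-X borrows unless d0 and X are both 0
    d0 = (-d0 - x) & MASK32
    x = borrow
    d1 = (d1 + d1 + x) & MASK32     # addx.l d1,d1
    return to_signed32(d1)


def has_zero_byte(d0):
    """move.l d0,d1 / not.l d0 / sub.l #$01010101,d1 / and.l #$80808080,d0 / and.l d0,d1."""
    d0 &= MASK32
    d1 = (d0 - 0x01010101) & MASK32
    d0 = ~d0 & 0x80808080
    return (d1 & d0) != 0           # bne null_byte_found


def equal_or_wildcard(a, b):
    """eor.w d1,d0 / bgt not_equal: True means the branch is NOT taken.
    Assumes a is in $0000-$7FFF; b = $FFFF matches any such a."""
    r = (a ^ b) & 0xFFFF
    signed = r - 0x10000 if r & 0x8000 else r
    return not signed > 0           # EOR clears V, so BGT means "N=0 and Z=0"


def in_word_range(a0):
    """cmpa.w a0,a0: Z is set when a0.l equals a0.w sign-extended to 32 bits."""
    a0 &= MASK32
    low = a0 & 0xFFFF
    extended = low | 0xFFFF0000 if low & 0x8000 else low
    return extended == a0
```

Note that the null-byte test is exact as a yes/no answer; only the position of the set bit can be misleading when there are several zero bytes.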

Photon · 25 May 2012, 08:36

Trying to optimize often leads to a few lines of strange new code that does it faster or shorter but looks irrelevant to the task.

Even though this looks more like good code for implementing variable typing in a higher level language, I liked it.

pmc · 25 May 2012, 10:28

Quote:

Originally Posted by Photon

Trying to optimize often leads to a few lines of strange new code that does it faster or shorter but looks irrelevant to the task

So true. Leading to the weird experience of looking at some of your very own code and thinking: huh? what the hell was I doing that for?

Followed by a few minutes of groping through hazy memories and realising: oh, yeah, that's why.

It's another reason why I personally find it difficult, nigh on impossible, to figure out other people's demo code. They did so many weird little things and optimisations that only they understood the reason for, so I've got no chance.

Much easier to code your own effects from scratch than figure out how some other coder did it their way.

phx · 25 May 2012, 12:58

That's why most programming languages allow comments.

meynaf · 28 May 2012, 09:40

Quote:

Originally Posted by phx

That's why most programming languages allow comments.

I second that. Whenever I use a "trick" in asm, I add comments for each line.

But, of course, when you re-source a program, the comments are gone.

Lonewolf10 · 31 May 2012, 22:31

Some great tips here

Anyone else think this thread is worthy of being made "sticky"?

Regards,
Lonewolf10

TheDarkCoder · 04 June 2012, 09:25

I second that, but I would suggest moving the thread to the ASM section.

meynaf · 07 June 2012, 08:47

Perhaps there is a little bit too much OT here to do that, huh?

pmc · 15 June 2012, 17:26

Instead of this:

Code:

                    cmpi.l              #num,Dn

this:

Code:

                    moveq.l             #num,Dn
                    cmp.l               Dn,Dn

where (to suit the moveq.l) num is in the range -128 to +127

Photon · 15 June 2012, 21:21

Yes, and one I use a lot is masking or subtracting stuff "to another register". When you do gfx stuff you often go

Code:

moveq #15,d1
and.w d0,d1

which retains the value in d0 should you need to mirror it or something.
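A quick Python sketch of the idea, assuming a typical gfx use (my assumption, not necessarily Photon's): the masked copy gives the pixel position within a 16-pixel word, e.g. for a blitter shift, while the untouched x still gives the byte offset of that word. `split_x` is a hypothetical name.

```python
def split_x(x):
    """Hypothetical helper modelling 'moveq #15,d1 / and.w d0,d1':
    the mask goes into a second register, so x (d0) survives."""
    d0 = x & 0xFFFF               # x coordinate, left untouched
    d1 = 15                       # moveq #15,d1
    d1 &= d0                      # and.w d0,d1 -> pixel position within a 16-pixel word
    word_offset = (d0 >> 4) * 2   # d0 is still available, e.g. for the byte offset of that word
    return d1, word_offset
```

Had the mask been applied to d0 directly, the second result would need the coordinate reloaded from memory.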

Thread moved to asm forum and made sticky 8)

StingRay · 15 June 2012, 23:23

Quote:

Originally Posted by pmc

Instead of this:

Code:

                    cmpi.l              #num,Dn

this:

Code:

                    moveq.l             #num,Dn
                    cmp.l               Dn,Dn

where (to suit the moveq.l) num is in the range -127 to +127

moveq range is -128 to +127.
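Why exactly -128 to +127: MOVEQ keeps its immediate as a signed byte inside the opcode word (0111 rrr0 dddddddd) and sign-extends it to 32 bits, so no extension word has to be fetched. A small Python sketch of the encoding; the function names are just for illustration.

```python
def encode_moveq(value, reg):
    """Return the 16-bit opcode of MOVEQ #value,Dreg, or None if value
    does not fit the signed 8-bit immediate (-128..+127)."""
    if not -128 <= value <= 127:
        return None
    if not 0 <= reg <= 7:
        raise ValueError("data register must be 0-7")
    return 0x7000 | (reg << 9) | (value & 0xFF)   # 0111 rrr0 dddddddd


def moveq_result(opcode):
    """The 32-bit value MOVEQ leaves in the register: the low byte sign-extended."""
    data = opcode & 0xFF
    return (data - 0x100 if data & 0x80 else data) & 0xFFFFFFFF
```

Going by the Motorola timing tables for a plain 68000, moveq.l plus cmp.l Dx,Dn comes to 4 + 6 = 10 cycles in 4 bytes, against 14 cycles and 6 bytes for cmpi.l #num,Dn, at the cost of a scratch register.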

pmc · 16 June 2012, 11:35

Doh! Stung again!

Original post now edited.

TheDarkCoder · 16 June 2012, 16:55

@pmc: you probably meant to keep the small immediate value in a register different from the one against which you do the cmp:

moveq.l #num,Dx
cmp.l Dx,Dn

pmc · 16 June 2012, 17:21

Yes. That was supposed to be implied but, as you say, it reads rather ambiguously. It's much more clear written your way.

pmc · 20 June 2012, 08:41

Here's an optimisation I worked out ages ago using shifting and adding instead of multiplying.

Maybe it's obvious or well known to others, or maybe not, but well... it might be useful to someone.

As an example, the eight binary digits in a byte represent the decimal numbers:

128 64 32 16 8 4 2 1

So, multiplying a number by shifting is pretty easy: shift left once to multiply a number by 2 for example.

Code:

                    lsl.w               #1,d0

or shift left by three to multiply a number by 8, etc.

But what about multiplying by other numbers?

I've found that it's possible to do a couple of shifts and an add to multiply by other numbers.

For example, if I wanted to multiply a number by 40 - in the binary digits above, there's no 40 but there are a 32 and an 8.

And, as luck would have it, 32 + 8 = 40

So, if I shift a number left by five (multiply by 32), take the same number and shift it left by three (multiply by 8), and then add the two results:

Code:

                    move.w              d0,d1
                    lsl.w               #3,d0
                    lsl.w               #5,d1
                    add.w               d1,d0

I get the original number in d0 multiplied by 40 but without having to use a comparatively slow mulu.w

It can also work with more than two terms; for example, if I wanted to multiply a number by 56 (32 + 16 + 8):

Code:

                    move.w              d0,d1
                    move.w              d0,d2
                    lsl.w               #3,d0
                    lsl.w               #4,d1
                    lsl.w               #5,d2
                    add.w               d2,d1
                    add.w               d1,d0

This works for other numbers too - try it and see what works. Obviously there'll be a cut off somewhere where all the shifting and adding will mount up and it might be quicker or no different to use a mulu.w instead.
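The "see what works" step can be mechanised: the shifts you need are simply the set bits of the constant. A small Python sketch (names hypothetical) that lists them and checks the result; it also hints at where the cut-off comes from, since every extra set bit costs another move, shift and add.

```python
def shift_terms(k):
    """Shift amounts whose powers of two add up to k (the set bits of k)."""
    return [bit for bit in range(k.bit_length()) if (k >> bit) & 1]


def mul_shift_add(x, k, bits=16):
    """x*k using only shifts and adds, truncated like a .w (16) or .l (32) operation."""
    mask = (1 << bits) - 1
    total = 0
    for shift in shift_terms(k):
        total = (total + (x << shift)) & mask   # one lsl + one add per set bit
    return total
```

For example, `shift_terms(40)` gives `[3, 5]` and `shift_terms(56)` gives `[3, 4, 5]`, matching the two sequences above.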

Also, you do have to watch out for shifting digits "off the end" of the operand size in use (16 bits for the .w instructions above).
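To see what "off the end" means in practice, here's a Python model of the .w sequence for x40 above, which only keeps 16 bits, plus a helper for the largest input that still fits. The names are hypothetical, and reading the result as unsigned is an assumption; for a signed result the limit is roughly halved.

```python
def mul40_w(x):
    """pmc's x40 sequence with every step kept to 16 bits, as .w instructions do."""
    d0 = x & 0xFFFF
    d1 = d0                        # move.w d0,d1
    d0 = (d0 << 3) & 0xFFFF        # lsl.w #3,d0 -> x*8
    d1 = (d1 << 5) & 0xFFFF        # lsl.w #5,d1 -> x*32 (bits shifted out are lost)
    return (d0 + d1) & 0xFFFF      # add.w d1,d0 -> x*40 modulo 65536


def max_safe_input(k, bits=16, signed=False):
    """Largest x for which x*k still fits in `bits` bits."""
    limit = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    return limit // k
```

So `mul40_w(2000)` silently yields 14464 (80000 modulo 65536) rather than 80000; switching to .l shifts and adds raises the limit at the cost of slower long-word instructions.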

StingRay · 20 June 2012, 08:52

On 68000, instead of doing shifts/adds, you could also use a multiplication table. Disadvantage is that you might need a spare address register (depending on where in memory your table is) and that the table needs some memory of course. Advantage is that it is faster than lots of shifts+adds.

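So, for example, if you want to multiply a number in d0 by 56, you'd do this:

Code:

lea multab(pc),a0     ; a0 = table of precomputed word products (n*56)
add.w d0,d0           ; index * 2 = byte offset, since each entry is a word
move.w (a0,d0.w),d0   ; d0 = original d0 * 56
...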
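A Python sketch of what that lookup does, assuming multab holds big-endian 16-bit products n*56 (the table itself isn't shown above, so its size here, 320 entries, is just an example). It also shows why the add.w d0,d0 is there: the index has to become a byte offset because each entry is a word, and since the (a0,d0.w) mode sign-extends d0.w, only byte offsets up to $7FFF are reachable.

```python
import struct


def build_multab(factor, entries):
    """Hypothetical table builder: factor*n for n = 0..entries-1 as big-endian words."""
    return b"".join(struct.pack(">H", (n * factor) & 0xFFFF) for n in range(entries))


def lookup(multab, d0):
    """Model of add.w d0,d0 / move.w (a0,d0.w),d0, with multab standing in for (a0)."""
    offset = (d0 + d0) & 0xFFFF          # add.w d0,d0: word index -> byte offset
    if offset & 0x8000:                  # d0.w is sign-extended by the addressing mode
        raise ValueError("index too large for a d0.w index")
    (value,) = struct.unpack_from(">H", multab, offset)
    return value
```

The table trades 2 bytes per entry (640 bytes for 320 entries) for replacing the whole shift/add sequence with one add and one indexed read.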

pmc · 20 June 2012, 09:16

Oh yes, definitely - pre multiplying your values and just dragging them out of a table by index is better than doing calculations in the code but you know, it's nice to have options

StingRay · 20 June 2012, 09:48

Of course it is.

Sometimes you don't even have the memory required for the table f.e. I just mentioned it for the sake of completeness.

Codetapper · 20 June 2012, 10:24

If you want to multiply by "nice" numbers like 40 on a 68000 without using a multiplication table, you should do the following rather than use pmc's method, to avoid the second (slow) lsl operation:

Code:

        lsl.w   #3,d0   ;d0 = Number * 8
        move.w  d0,d1   ;d1 = Number * 8
        add.w   d1,d1   ;d1 = Number * 16
        add.w   d1,d1   ;d1 = Number * 32
        add.w   d1,d0   ;d0 = Number * 40

If shifting left by 2 or less, it's quicker to do one add per bit of shift than a shift (on 68000 only!). If you wanted to multiply by 48, for example, it's actually quicker than multiplying by 40:

Code:

        lsl.w   #4,d0   ;d0 = Number * 16
        move.w  d0,d1   ;d1 = Number * 16
        add.w   d1,d1   ;d1 = Number * 32
        add.w   d1,d0   ;d0 = Number * 48

To multiply by 56:

Code:

        lsl.w   #3,d0   ;d0 = Number * 8
        move.w  d0,d1   ;d1 = Number * 8
        add.w   d1,d1   ;d1 = Number * 16
        move.w  d1,d2   ;d2 = Number * 16
        add.w   d2,d2   ;d2 = Number * 32
        add.w   d2,d1   ;d1 = Number * 48
        add.w   d1,d0   ;d0 = Number * 56

The other alternative is to use the fact that 56 is 64 - 8:

Code:

        lsl.w   #3,d0   ;d0 = Number * 8
        move.w  d0,d1   ;d1 = Number * 8
        lsl.w   #3,d0   ;d0 = Number * 64
        sub.w   d1,d0   ;d0 = Number * 56
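The 64 - 8 idea generalises: a run of consecutive set bits (like 56 = %111000) can be replaced by one subtraction. One standard way to find such forms is the non-adjacent form (NAF) signed-digit recoding; here's a Python sketch with made-up names. It only minimises the number of terms; it does not weigh shift costs, so what it suggests is not necessarily the fastest sequence on a real 68000.

```python
def naf(k):
    """Signed-digit (non-adjacent form) recoding of k > 0 as (shift, +1/-1) pairs.
    56 -> [(3, -1), (6, 1)], i.e. 64 - 8."""
    digits = []
    shift = 0
    while k:
        if k & 1:
            d = 2 - (k % 4)        # +1 if k ends in binary 01, -1 if it ends in 11
            k -= d
            digits.append((shift, d))
        k >>= 1
        shift += 1
    return digits


def mul_naf(x, k, bits=16):
    """x*k with one shift plus one add or sub per nonzero digit, truncated to `bits`."""
    mask = (1 << bits) - 1
    total = 0
    for shift, d in naf(k):
        total = (total + d * (x << shift)) & mask
    return total
```

For 40 (%101000) there is no run to collapse, and `naf(40)` returns the same two terms as plain shift-and-add.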

For a shift of 3, lsl.w and three adds take the same time (12 cycles), but the lsl is one instruction word instead of three; from 4 upwards the shift is faster.
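To compare the sequences in this thread, a small Python cycle counter, assuming the standard MC68000 register-to-register timings with no wait states (lsl.w #n,Dn = 6+2n cycles; move.w, add.w and sub.w Dn,Dn = 4 cycles each). On a real Amiga, DMA contention can stretch any of these, as Photon describes above. The sequence names are just labels for the posted code.

```python
# Register-to-register timings for a plain MC68000 with no wait states (assumed).
FIXED = {"move.w": 4, "add.w": 4, "sub.w": 4}


def cycles(op, shift=0):
    """Cycle count of one instruction; lsl.w #n,Dn costs 6 + 2n."""
    return 6 + 2 * shift if op == "lsl.w" else FIXED[op]


def seq_cycles(seq):
    """Total cycles for a list of (mnemonic, shift) pairs."""
    return sum(cycles(op, n) for op, n in seq)


PMC_X40 = [("move.w", 0), ("lsl.w", 3), ("lsl.w", 5), ("add.w", 0)]
CT_X40 = [("lsl.w", 3), ("move.w", 0), ("add.w", 0), ("add.w", 0), ("add.w", 0)]
CT_X48 = [("lsl.w", 4), ("move.w", 0), ("add.w", 0), ("add.w", 0)]
CT_X56_SUB = [("lsl.w", 3), ("move.w", 0), ("lsl.w", 3), ("sub.w", 0)]

for n in range(1, 6):
    print(f"shift {n}: lsl.w {cycles('lsl.w', n):2} cycles, {n} x add.w {n * 4:2} cycles")
```

Under these assumptions pmc's x40 sequence comes to 36 cycles and Codetapper's to 28, the x48 version to 26, and the 64 - 8 version of x56 to 32.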
