68000 code optimisations - Page 6

Gunnar · 05 June 2014, 16:44

Quote:

Originally Posted by NorthWay

Use movep to get a byte to the upper half of a word.

A series of subroutine (OS?) calls like
jsr sub1
jsr sub2
jsr sub3
rts

can be turned on the head as
pea sub3
pea sub2
jmp sub1

Optionally push the continue address first. I did this for my Exec optimizations.

While this is OK for an old 68000, doing this is bad for new chips.

Modern cores that have a link stack will run the original code
jsr sub1
jsr sub2
jsr sub3
rts

much faster.
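
To make the control flow of the two variants explicit, here is a sketch with the usual published 68000 timings in the comments (plain 68000, no wait states, absolute long addressing assumed). The labels caller_jsr and caller_pea are made up for illustration; sub1 to sub3 are the placeholder routines from the post. The remark about the link stack is an assumption about how a return-address predictor generally works, not a statement about any specific core.

```asm
; Variant A: ordinary calls. Each rts returns to an address
; that a jsr pushed just before.
caller_jsr:
        jsr     sub1            ; 20 cycles, pushes return address
        jsr     sub2            ; 20
        jsr     sub3            ; 20
        rts                     ; 16    -> 76 cycles of call overhead here

; Variant B: the pea trick. The "return addresses" are pushed by hand.
caller_pea:
        pea     sub3            ; 20 cycles, push address of sub3
        pea     sub2            ; 20, push address of sub2
        jmp     sub1            ; 12    -> 52 cycles of call overhead here
; sub1's rts pops the address of sub2 and continues there,
; sub2's rts pops the address of sub3,
; sub3's rts pops the return address of caller_pea's own caller.
;
; Assumption: a link stack pairs each rts with an earlier jsr/bsr.
; In variant B none of the three rts targets was pushed by a jsr/bsr,
; so such a predictor has nothing valid to match them against.

sub1:   rts                     ; stubs so the fragment assembles
sub2:   rts
sub3:   rts
```

The rts inside each subroutine costs the same in both variants, so it is left out of the totals.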

Photon · 06 June 2014, 01:01

Remove the rts and make the last jsr a jmp.

meynaf · 09 June 2014, 08:35

Quote:

Originally Posted by Gunnar

While this is OK for an old 68000, doing this is bad for new chips.

Provided there will be new chips at all

Quote:

Originally Posted by Gunnar

Modern cores that have a link stack will run the original code
jsr sub1
jsr sub2
jsr sub3
rts

much faster.

Much faster? Not necessarily. After the first RTS is seen and the link stack is found to be out of date, I think the link stack will/should somehow readjust itself and then run at full speed for subsequent RTS instructions.

Anyway, if you have lots of JSR/RTS in your critical path, you're doing something wrong.

Gunnar · 09 June 2014, 08:57

Quote:

Originally Posted by meynaf

Provided there will be new chips at all

Much faster? Not necessarily. After the first RTS is seen and the link stack is found to be out of date, I think the link stack will/should somehow readjust itself and then run at full speed for subsequent RTS instructions.

Anyway, if you have lots of JSR/RTS in your critical path, you're doing something wrong.

Apollo needs 2 cycles for a BSR and 2 cycles for an RTS (when hitting the link stack).
The BSR and RTS can be executed in parallel with another instruction in the same cycle. (Not on the Vampire600, as the SS is off there because of FPGA size :-( )
This means the 2-cycle subroutine overhead is not much.

meynaf · 09 June 2014, 09:10

And how many clocks does a link stack miss take?

Photon · 09 June 2014, 19:39

Quote:

Originally Posted by Gunnar

Apollo needs 2 cycles for a BSR and 2 cycles for an RTS (when hitting the link stack).
The BSR and RTS can be executed in parallel with another instruction in the same cycle. (Not on the Vampire600, as the SS is off there because of FPGA size :-( )
This means the 2-cycle subroutine overhead is not much.

Define what you mean by "Apollo", "Vampire600", and "Modern core".

Gunnar · 09 June 2014, 20:51

Quote:

Originally Posted by Photon

Define what you mean by "Apollo", "Vampire600", and "Modern core".

Apollo = http://www.apollo-core.com

Vampire 600 = http://www.majsta.com

Modern Core = see Apollo

matthey · 09 June 2014, 22:13

Quote:

Originally Posted by Photon

Remove the rts and make the last jsr a jmp.

I hope so. I've been using this optimization everywhere I can as rts is slow without a link stack. I believe this optimization would be good even with a link stack and on all of the 68k family.

To be clear, optimize this:

Code:

   jsr sub1
   jsr sub2
   jsr sub3
   rts

to this:

Code:

   jsr sub1
   jsr sub2
   jmp sub3

Many assemblers, such as vasm, can optimize jsr->bsr and jmp->bra when possible, saving more cycles.
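
As a rough tally of what this saves on a plain 68000, here is a sketch with the usual published instruction timings in the comments (no wait states assumed). The labels original, tailcall and tailcall_short are hypothetical, and the last variant assumes the targets are close enough for pc-relative branches.

```asm
; Original: three calls, then our own return.
original:
        jsr     sub1            ; 20 cycles (absolute long)
        jsr     sub2            ; 20
        jsr     sub3            ; 20
        rts                     ; 16    -> 76 cycles

; Tail call: sub3's own rts returns straight to our caller.
tailcall:
        jsr     sub1            ; 20
        jsr     sub2            ; 20
        jmp     sub3            ; 12    -> 52 cycles

; Same thing after a jsr->bsr / jmp->bra optimization.
tailcall_short:
        bsr.w   sub1            ; 18
        bsr.w   sub2            ; 18
        bra.w   sub3            ; 10    -> 46 cycles

sub1:   rts                     ; stubs so the fragment assembles
sub2:   rts
sub3:   rts
```

The tail call is only valid when the caller has nothing left on the stack at that point and nothing to do after sub3 returns.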

Thorham · 09 June 2014, 22:29

Shouldn't you inline those subroutines to get rid of the function call overhead?

Mrs Beanbag · 09 June 2014, 22:44

Quote:

Originally Posted by Thorham

Shouldn't you inline those subroutines to get rid of the function call overhead?

depending on how big they are

NorthWay · 09 June 2014, 22:53

Quote:

Originally Posted by Thorham

Shouldn't you inline those subroutines to get rid of the function call overhead?

Sometimes those functions are in fact OS calls, and even if the OS itself can do an internal call I don't like it and think it is messy.

You can always do the tail optimize though.

Thorham · 09 June 2014, 23:20

Quote:

Originally Posted by Mrs Beanbag

depending on how big they are

Is that relevant? You'd only inline larger functions when the code is more or less finished. Keep a tidy reference version, and do a messy optimized version based on that.

Quote:

Originally Posted by NorthWay

Sometimes those functions are in fact OS calls

But what are OS calls doing in tight loops?

Mrs Beanbag · 09 June 2014, 23:34

Quote:

Originally Posted by Thorham

Is that relevant? You'd only inline larger functions when the code is more or less finished. Keep a tidy reference version, and do a messy optimized version based on that.

you wouldn't want a large function copied umpty times in your code, besides the saving in terms of cycles compared to the time spent in the function wouldn't be worth the mess.

Thorham · 10 June 2014, 01:53

Quote:

Originally Posted by Mrs Beanbag

you wouldn't want a large function copied umpty times in your code

But how many different tight loops of 100000+ iterations are going to call the same large function?

Quote:

Originally Posted by Mrs Beanbag

the mess

That's why you keep a tidy reference version. It's the version that's done, except for optimizations which will be messy. You copy that, and make a mess in the copy. Inlining also means there's an opportunity to optimize more than just the function call. Depends on the code if it's worth it or not.
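
A minimal sketch of the reference-versus-inlined idea, with the usual published 68000 timings in the comments. The routine scale and the loop labels are hypothetical, and d7 is assumed to hold the loop count minus one on entry.

```asm
; Tidy reference version: the loop calls a small helper.
ref_loop:
        bsr.w   scale           ; 18 cycles of call overhead
        dbf     d7,ref_loop
        rts

scale:
        add.w   d0,d0           ; 4 cycles of actual work
        rts                     ; 16 cycles of return overhead

; Optimized copy: the helper body is pasted into the loop,
; which removes 18 + 16 = 34 cycles per iteration.
opt_loop:
        add.w   d0,d0           ; 4 cycles
        dbf     d7,opt_loop
        rts
```

For a helper this small the call overhead dominates; for a large one the same 34 cycles are a small fraction of the work, which is the trade-off being argued about above.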

Photon · 10 June 2014, 22:34

Quote:

Originally Posted by Gunnar

Apollo = http://www.apollo-core.com

Vampire 600 = http://www.majsta.com

Modern Core = see Apollo

Please refrain from posting about nextgen here. Use the dedicated forum I created for that purpose.

Optimizations here should be based only on features present in legacy Motorola CPUs, or there will be confusion.

In a way, even within the 680x0, optimizations become more pointless the better the features. Caches take care of a lot, the CPU is already fast for what you can do with the rest of the Amiga, and so on.

The best optimizations are general-purpose with a specific gain. If the gain is "umm... it depends", it's more likely a niche use that not a lot of coders will have occasion to find a use for.

Gunnar · 11 June 2014, 07:01

Quote:

Originally Posted by Photon

Please refrain from posting about nextgen here. Use the dedicated forum I created for that purpose.

Apollo/Phoenix is a member of the 68K family, just like the 68040 or 68060.

britelite · 11 June 2014, 07:11

Quote:

Originally Posted by Gunnar

Apollo/Phoenix is a member of the 68K family, just like the 68040 or 68060.

But he said "legacy Motorola CPU", which the Apollo/Phoenix is not.

EDIT: And considering the topic is "68000 code optimisations", I'd be interested in actually reading about 68000 tricks, not stuff for the later CPUs.

Gunnar · 11 June 2014, 07:36

Quote:

Originally Posted by britelite

EDIT: And considering the topic is "68000 code optimisations", I'd be interested in actually reading about 68000 tricks, not stuff for the later CPUs.

Maybe you should re-read my post:
I did not post optimization tricks for any CPU other than the 68000.
I only pointed out that trick XYZ works well on the 68000 but has a drawback on another 68K CPU.

In the same way, you can say that using MOVEP on a 68000 might save some cycles
but can have a huge penalty on a 68060 system.

Pointing out side effects for other 68K cores is good and helps coders.
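
For reference, a sketch of the MOVEP case mentioned in this thread (getting a byte into the upper half of a word), with the usual published 68000 timings in the comments. a0 is assumed to point at the source byte.

```asm
; Plain version: load the byte, then shift it into bits 15-8.
        move.b  (a0),d0         ; 8 cycles
        lsl.w   #8,d0           ; 6 + 2*8 = 22 cycles, low byte becomes 0

; MOVEP version: one instruction, 16 cycles on a 68000.
        movep.w 0(a0),d0        ; bits 15-8 = byte at (a0),
                                ; bits 7-0  = byte at 2(a0)
```

The two are not identical: MOVEP also loads the byte at 2(a0) into the low half, so it only fits when that byte is wanted or irrelevant. The 68060 does not implement MOVEP in hardware; the instruction traps and is emulated in software, which is where the large penalty comes from.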

britelite · 11 June 2014, 07:41

Quote:

Originally Posted by Gunnar

I did not post optimization tricks for any CPU other than the 68000.

Could be, I mainly reacted to your response to Photon. And he reacted to a post where you specifically wrote about the Apollo/Phoenix.

StingRay · 11 June 2014, 07:53

Quote:

Originally Posted by Gunnar

Maybe you should re-read my post:
I did not post optimization tricks for any CPU other than the 68000.
I only pointed out that trick XYZ works well on the 68000 but has a drawback on another 68K CPU.

Maybe you should re-read what Photon wrote; he certainly has a good reason for that.

Quote:

Originally Posted by Gunnar

In the same way, you can say that using MOVEP on a 68000 might save some cycles
but can have a huge penalty on a 68060 system.

And this thread is about 68040/60 optimisations since when?
