English Amiga Board


Old 05 June 2014, 16:44   #101
Gunnar
Registered User
 
Join Date: Apr 2014
Location: Germany
Posts: 154
Quote:
Originally Posted by NorthWay View Post
Use movep to get a byte to the upper half of a word.

A series of subroutine (OS?) calls like
jsr sub1
jsr sub2
jsr sub3
rts

can be turned on its head as
pea sub3
pea sub2
jmp sub1

Optionally push the continue address first. I did this for my Exec optimizations.
While this is OK for an old 68000, doing this is bad for new chips.

Modern cores that have a link stack will run the original code
jsr sub1
jsr sub2
jsr sub3
rts

much faster.
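
For reference, here is a sketch of how the pushed-address version unwinds. The labels `sub1` to `sub3` come from the post; `caller_ret` is a placeholder name for the return address that was already on the stack, assuming the routine was itself entered with `jsr` or `bsr`.

```asm
; Stack on entry (top first):  caller_ret
        pea     sub3            ; stack: sub3, caller_ret
        pea     sub2            ; stack: sub2, sub3, caller_ret
        jmp     sub1            ; nothing pushed for sub1

sub1:   ; ... body ...
        rts                     ; pops sub2, execution continues at sub2
sub2:   ; ... body ...
        rts                     ; pops sub3, execution continues at sub3
sub3:   ; ... body ...
        rts                     ; pops caller_ret, back to the original caller
```

Each `rts` here pops an address that a `pea` pushed, not one that a `jsr`/`bsr` recorded, which is why a return predictor such as a link stack cannot match them. How a particular core recovers from that is implementation-specific.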
Old 06 June 2014, 01:01   #102
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,604
Remove the rts and make the last jsr a jmp.
Old 09 June 2014, 08:35   #103
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Gunnar View Post
While this is OK for an old 68000, doing this is bad for new chips.
Provided there will be new chips at all

Quote:
Originally Posted by Gunnar View Post
Modern cores that have a link stack will run the original code
jsr sub1
jsr sub2
jsr sub3
rts

much faster.
Much faster? Not necessarily. After the first RTS is seen and the link stack is found to be out of date, I think the link stack will/should somehow readjust itself and then run at full speed for subsequent RTSs.

Anyway, if you have lots of JSR/RTS in your critical path, you're doing something wrong.
Old 09 June 2014, 08:57   #104
Gunnar
Registered User
 
Join Date: Apr 2014
Location: Germany
Posts: 154
Quote:
Originally Posted by meynaf View Post
Provided there will be new chips at all


Much faster? Not necessarily. After the first RTS is seen and the link stack is found to be out of date, I think the link stack will/should somehow readjust itself and then run at full speed for subsequent RTSs.

Anyway, if you have lots of JSR/RTS in your critical path, you're doing something wrong.
Apollo needs 2 cycles for a BSR and 2 cycles for an RTS (hitting the link stack).
The BSR and RTS can be executed in parallel with another instruction in that cycle. (Not on the Vampire600, as the SS is off there because of FPGA size :-( )
This means the 2-cycle subroutine overhead is not much.
Old 09 June 2014, 09:10   #105
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
And how many clocks does a link stack miss take?
Old 09 June 2014, 19:39   #106
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,604
Quote:
Originally Posted by Gunnar View Post
Apollo needs 2 cycles for a BSR and 2 cycles for an RTS (hitting the link stack).
The BSR and RTS can be executed in parallel with another instruction in that cycle. (Not on the Vampire600, as the SS is off there because of FPGA size :-( )
This means the 2-cycle subroutine overhead is not much.
Define what you mean by "Apollo", "Vampire600", and "Modern core".
Old 09 June 2014, 20:51   #107
Gunnar
Registered User
 
Join Date: Apr 2014
Location: Germany
Posts: 154
Quote:
Originally Posted by Photon View Post
Define what you mean by "Apollo", "Vampire600", and "Modern core".
Apollo = http://www.apollo-core.com

Vampire 600 = http://www.majsta.com

Modern Core = see Apollo
Old 09 June 2014, 22:13   #108
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Photon View Post
Remove the rts and make the last jsr a jmp.
I hope so. I've been using this optimization everywhere I can as rts is slow without a link stack. I believe this optimization would be good even with a link stack and on all of the 68k family.

To be clear, optimize this:

Code:
   jsr sub1
   jsr sub2
   jsr sub3
   rts
to this:

Code:
   jsr sub1
   jsr sub2
   jmp sub3
Many assemblers, such as vasm, can also optimize jsr to bsr and jmp to bra where possible, saving more cycles.
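
As a rough illustration of the gain on a plain 68000, the variants can be compared side by side. The cycle counts in the comments are the standard 68000 figures for absolute-long `jsr`/`jmp` and for `bsr`/`bra`, assuming zero wait states; they cover only the call overhead in the calling routine, and `sub1` to `sub3` are empty stand-ins so the fragment assembles on its own.

```asm
; Variant A: plain call sequence
VariantA:
        jsr     sub1            ; 20 cycles (absolute long)
        jsr     sub2            ; 20
        jsr     sub3            ; 20
        rts                     ; 16  -> 76 cycles of overhead

; Variant B: tail call, last jsr + rts folded into one jmp
VariantB:
        jsr     sub1            ; 20
        jsr     sub2            ; 20
        jmp     sub3            ; 12  -> 52; sub3's rts returns to our caller

; Variant C: PC-relative branches, usable when the targets are in range
VariantC:
        bsr.w   sub1            ; 18
        bsr.w   sub2            ; 18
        bra.w   sub3            ; 10  -> 46

sub1:   rts                     ; placeholder bodies
sub2:   rts
sub3:   rts
```

The tail call is only valid when nothing remains to be done after `sub3`: no registers to restore and no stack cleanup in the calling routine.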
Old 09 June 2014, 22:29   #109
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,764
Shouldn't you inline those subroutines to get rid of the function call overhead?
Old 09 June 2014, 22:44   #110
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Quote:
Originally Posted by Thorham View Post
Shouldn't you inline those subroutines to get rid of the function call overhead?
depending on how big they are
Old 09 June 2014, 22:53   #111
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
Quote:
Originally Posted by Thorham View Post
Shouldn't you inline those subroutines to get rid of the function call overhead?
Sometimes those functions are in fact OS calls, and even if the OS itself can do an internal call I don't like it and think it is messy.

You can always do the tail-call optimization, though.
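
A minimal sketch of that tail call applied to an OS function, assuming a routine whose last action is an exec.library call. `FreeBuffer`, `BufPtr` and `BufSize` are hypothetical names made up for this example; `_LVOFreeMem` is the usual exec.library vector offset.

```asm
_LVOFreeMem     equ     -210    ; exec.library FreeMem(a1 = block, d0 = size)

; Hypothetical helper: frees the buffer described by BufPtr/BufSize.
; Note that it leaves a6 = ExecBase, so the caller must not rely on a6.
FreeBuffer:
        move.l  4.w,a6          ; ExecBase
        move.l  BufPtr(pc),a1
        move.l  BufSize(pc),d0
        jmp     _LVOFreeMem(a6) ; instead of: jsr _LVOFreeMem(a6) / rts

BufPtr:  dc.l   0               ; placeholder storage
BufSize: dc.l   0
```

FreeMem's own `rts` then returns straight to whoever called `FreeBuffer`. On a 68000 this replaces an 18-cycle `jsr d16(An)` plus a 16-cycle `rts` with a 10-cycle `jmp d16(An)`.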
Old 09 June 2014, 23:20   #112
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,764
Quote:
Originally Posted by Mrs Beanbag View Post
depending on how big they are
Is that relevant? You'd only inline larger functions when the code is more or less finished. Keep a tidy reference version, and do a messy optimized version based on that.

Quote:
Originally Posted by NorthWay View Post
Sometimes those functions are in fact OS calls
But what are OS calls doing in tight loops?
Old 09 June 2014, 23:34   #113
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Quote:
Originally Posted by Thorham View Post
Is that relevant? You'd only inline larger functions when the code is more or less finished. Keep a tidy reference version, and do a messy optimized version based on that.
You wouldn't want a large function copied umpteen times in your code. Besides, the cycles saved, compared with the time spent in the function, wouldn't be worth the mess.
Old 10 June 2014, 01:53   #114
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,764
Quote:
Originally Posted by Mrs Beanbag View Post
You wouldn't want a large function copied umpteen times in your code
But how many different tight loops of 100000+ iterations are going to call the same large function?

Quote:
Originally Posted by Mrs Beanbag View Post
the mess
That's why you keep a tidy reference version. It's the version that's done, except for optimizations, which will be messy. You copy that, and make a mess in the copy. Inlining also means there's an opportunity to optimize more than just the function call. Whether it's worth it depends on the code.
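
A small sketch of the reference-versus-inlined idea, using a made-up routine (`AddByte`) and made-up register conventions: d7 = count - 1, a0 = source, a1 = destination. The cycle figures are the 68000's `bsr` and `rts` timings.

```asm
; Tidy reference version: the loop calls a subroutine
CopyAddRef:
.loop:  bsr.s   AddByte         ; 18 cycles per call on a 68000
        dbf     d7,.loop
        rts

AddByte:
        move.b  (a0)+,d0
        add.b   d0,(a1)+
        rts                     ; 16 cycles per return

; Optimized copy: body pasted into the loop, 34 cycles saved per iteration
CopyAddInline:
.loop:  move.b  (a0)+,d0
        add.b   d0,(a1)+
        dbf     d7,.loop
        rts
```

With the body in the loop, further changes become possible that the call boundary would have hidden, such as unrolling or keeping values in registers across iterations.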
Old 10 June 2014, 22:34   #115
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,604
Quote:
Originally Posted by Gunnar View Post
Apollo = http://www.apollo-core.com

Vampire 600 = http://www.majsta.com

Modern Core = see Apollo
Please refrain from posting about nextgen here. Use the dedicated forum I created for that purpose.

Optimizations here should be based only on features present in legacy Motorola CPUs, or there will be confusion.

In a way, even within the 680x0, optimizations become more pointless the better the features. Caches take care of a lot, the CPU is already fast for what you can do with the rest of the Amiga, and so on.

The best optimizations are general-purpose with a specific gain. If the gain is "umm... it depends", it's more likely a niche use that few coders will have occasion to use.
Old 11 June 2014, 07:01   #116
Gunnar
Registered User
 
Join Date: Apr 2014
Location: Germany
Posts: 154
Quote:
Originally Posted by Photon View Post
Please refrain from posting about nextgen here. Use the dedicated forum I created for that purpose.
Apollo/Phoenix is a member of the 68K family just like the 68040 or 68060.
Old 11 June 2014, 07:11   #117
britelite
Registered User
 
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 818
Quote:
Originally Posted by Gunnar View Post
Apollo/Phoenix is a member of the 68K family just like the 68040 or 68060.
But he said "legacy Motorola CPU", which the Apollo/Phoenix is not.

EDIT: And considering the topic is "68000 code optimisations", I'd be interested in actually reading about 68000 tricks, not stuff for the later CPUs.

Last edited by britelite; 11 June 2014 at 07:22.
Old 11 June 2014, 07:36   #118
Gunnar
Registered User
 
Join Date: Apr 2014
Location: Germany
Posts: 154
Quote:
Originally Posted by britelite View Post
EDIT: And considering the topic is "68000 code optimisations", I'd be interested in actually reading about 68000 tricks, not stuff for the later CPUs.
Maybe you should re-read my post:
I did not post optimization tricks for any CPU other than the 68000.
I only pointed out that trick XYZ works well on the 68000 but has a drawback on another 68K CPU.

In the same way, you can say that using MOVEP on a 68000 might save some cycles,
but can have a huge penalty on a 68060 system.

Pointing out side effects for other 68K cores is good and helps coders.
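
To make the MOVEP case concrete, here is a sketch of the trick next to a version that uses no MOVEP. It assumes a0 points at the source byte and the result is wanted in bits 15-8 of d0. The 68060 does not implement MOVEP in hardware, so there the instruction traps and is emulated in software.

```asm
; 68000 trick: movep.w loads (a0) into bits 15-8 and 2(a0) into bits 7-0
        movep.w 0(a0),d0        ; 16 cycles on a 68000; 2(a0) must be readable
                                ; low byte of d0 = whatever sits at 2(a0)

; Alternative that avoids MOVEP entirely
        move.b  (a0),d0         ; 8 cycles on a 68000
        lsl.w   #8,d0           ; 22 cycles; low byte of d0 is cleared
```

The two are not identical: the MOVEP form leaves the byte from `2(a0)` in the low half, while the shift form leaves zero there.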
Old 11 June 2014, 07:41   #119
britelite
Registered User
 
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 818
Quote:
Originally Posted by Gunnar View Post
I did not post optimization tricks for any CPU other than the 68000.
Could be, I mainly reacted to your response to Photon. And he reacted to a post where you specifically wrote about the Apollo/Phoenix.
Old 11 June 2014, 07:53   #120
StingRay
move.l #$c0ff33,throat
 
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
Quote:
Originally Posted by Gunnar View Post
Maybe you should re-read my post:
I did not post optimization tricks for any CPU other than the 68000.
I only pointed out that trick XYZ works well on the 68000 but has a drawback on another 68K CPU.
Maybe you should re-read what Photon wrote; he certainly has a good reason for that.

Quote:
Originally Posted by Gunnar View Post
In the same way, you can say that using MOVEP on a 68000 might save some cycles,
but can have a huge penalty on a 68060 system.
And this thread is about 68040/60 optimisations since when?
 

