English Amiga Board


08 February 2020, 13:42   #1
phx
Natteravn

Join Date: Nov 2009
Location: Herford / Germany
Posts: 1,566
68040 const array init

We are currently working on optimizing the initialization of structures or arrays with constants in the 68k backend. There are mainly two cases:

1. Static data
n-times
move.l #x,lab

or
lea lab,An
followed by n-times
move.l #x,(An)+


2. Temporary stack frame data
n-times
move.l #x,d(SP)

or
lea d(SP),An
followed by n-times
move.l #x,(An)+


For the 68000 it is clear that using LEA and an address register is shorter and faster when n is greater than 1. I assume the same is true up to the 68030 (and probably also for the 68060). But what about the 68040? Absolute addressing modes are faster than address register indirect there, aren't they?

So, in the first (static data) case, would it be better to generate a sequence of
move.l #x,lab
for the 68040?

And for the second case (stack frame), would it still be better to use LEA when n>1? Does anybody know the timing of d(An) compared to (An)+ on the 040?
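The "shorter" half of the question can be checked by counting instruction bytes. The following is only a rough sketch of that count, not the backend's actual cost model. It assumes full 32-bit immediates (no moveq or clr substitution), a 32-bit absolute address for lab and a 16-bit displacement for d(SP). The function names are made up for illustration, and it says nothing about timing.

```python
# Encoded sizes in bytes: a 2-byte opcode word plus extension words.
IMM32 = 4    # 32-bit immediate operand
ABS32 = 4    # 32-bit absolute address (assumption: lab is addressed as xxx.L)
DISP16 = 2   # 16-bit displacement (assumption: d fits into 16 bits)

MOVE_IMM_TO_ABS = 2 + IMM32 + ABS32     # move.l #x,lab
MOVE_IMM_TO_POSTINC = 2 + IMM32         # move.l #x,(An)+
MOVE_IMM_TO_DISP = 2 + IMM32 + DISP16   # move.l #x,d(SP)
LEA_ABS = 2 + ABS32                     # lea lab,An
LEA_DISP = 2 + DISP16                   # lea d(SP),An


def static_sizes(n):
    """Return (direct, via_pointer) code size in bytes for n static stores."""
    return n * MOVE_IMM_TO_ABS, LEA_ABS + n * MOVE_IMM_TO_POSTINC


def stack_sizes(n):
    """Return (direct, via_pointer) code size in bytes for n stack stores."""
    return n * MOVE_IMM_TO_DISP, LEA_DISP + n * MOVE_IMM_TO_POSTINC


def first_n_where_pointer_is_shorter(sizes, limit=64):
    """Smallest n for which the LEA variant is strictly shorter, or None."""
    for n in range(1, limit + 1):
        direct, via_pointer = sizes(n)
        if via_pointer < direct:
            return n
    return None


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, static_sizes(n), stack_sizes(n))
```

Under these assumptions the LEA variant is shorter from n=2 for static data (18 versus 20 bytes). For the stack frame it only ties at n=2 (16 bytes each) and becomes shorter from n=3.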
08 February 2020, 18:58   #2
Exodous
Registered User

 
Join Date: Sep 2019
Location: Leicester / England
Posts: 23
All three of your move.l variants have identical timings, for both the <ea> calculation and the execution side:

move.l #x,lab = 2 calc + 1L + 1 execute
move.l #x,(An)+ = 2 calc + 1L + 1 execute
move.l #x,d(SP) = 2 calc + 1L + 1 execute

Therefore, the LEA followed by n MOVEs will be slower by exactly the number of cycles the LEA takes.

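If you have a "spare" data register, load the constant into it first; each "MOVE.L Dn,destination" is then quicker, as it takes only 1 calculate + 1 execute. (The register load costs an instruction itself, so this pays off when the same constant is stored more than once.)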


Reference: https://www.nxp.com/docs/en/referenc.../MC68040UM.pdf

Section 10.4 (page 10.9 / PDF page 300)
08 February 2020, 19:11   #3
phx
Natteravn

Join Date: Nov 2009
Location: Herford / Germany
Posts: 1,566
Great! Thanks. That helps a lot.
08 February 2020, 22:27   #4
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 320
Quote:
Originally Posted by phx
For the 68000 it is clear that using LEA and an address register is shorter and faster when n is greater than 1. I assume the same is true up to the 68030 (and probably also for the 68060). But what about the 68040? Absolute addressing modes are faster than address register indirect there, aren't they?

Frankly, I would put the initialization data in the data segment and initialize the structure with a loop. The loop should fit into the cache, which should give you a speed advantage. SAS/C places the initialization data in the data segment and then calls the generic structure copy function. For shorter structures, immediate data is copied.
09 February 2020, 12:37   #5
phx
Natteravn

Join Date: Nov 2009
Location: Herford / Germany
Posts: 1,566
Correct. That's how we do it. But the question is: where are the limits for each individual CPU? Up to how many instructions should direct assignments be used?
We have the following three parameters to tune (per CPU):
- Max. number of direct static assignments before copy from data
- Max. number of direct stack assignments before copy from data
- Max. number of assignments before setting up a pointer in a temp. register

My question about the 68040 only touched the last point, although I will happily take advice for the other points as well.
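To make the three parameters concrete, here is a minimal sketch of how such per-CPU limits could drive the choice of strategy. Everything in it is hypothetical: the names InitLimits, LIMITS and choose_strategy are invented for illustration, and the numbers are placeholders rather than measured or recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitLimits:
    max_direct_static: int   # direct static assignments before copying from data
    max_direct_stack: int    # direct stack assignments before copying from data
    max_before_pointer: int  # assignments before setting up a pointer register


# Placeholder values only (assumptions, not tuning results).
LIMITS = {
    "68000": InitLimits(max_direct_static=8, max_direct_stack=8, max_before_pointer=1),
    "68040": InitLimits(max_direct_static=8, max_direct_stack=8, max_before_pointer=8),
}


def choose_strategy(cpu, n, on_stack):
    """Pick an initialization strategy for n constant assignments."""
    limits = LIMITS[cpu]
    max_direct = limits.max_direct_stack if on_stack else limits.max_direct_static
    if n > max_direct:
        return "copy-from-data"   # emit constants as data, then copy them
    if n > limits.max_before_pointer:
        return "pointer-stores"   # lea ...,An followed by move.l #x,(An)+
    return "direct-stores"        # move.l #x,lab or move.l #x,d(SP)


if __name__ == "__main__":
    for cpu in LIMITS:
        print(cpu, [choose_strategy(cpu, n, on_stack=False) for n in (1, 4, 20)])
```

With these placeholder limits, four static assignments would use a pointer on the 68000 but direct stores on the 68040, which is the distinction the 68040 question is about.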
10 February 2020, 19:30   #6
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 47
Posts: 3,667
As the initialization code is unlikely to be in the cache when it is called, I would aim for the shortest code. That is, use direct assignments until the overall size gets bigger than with using data.
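A back-of-the-envelope version of that size rule, as a sketch only: it compares pointer-based immediate stores with constants in the data segment plus a small copy loop. The loop shown in the comments is a hypothetical one chosen for the estimate, not what any particular compiler emits, and relocation entries and calls to a shared copy routine are ignored.

```python
LEA_ABS = 6           # lea lab,An with a 32-bit absolute address
MOVE_IMM_POSTINC = 6  # move.l #x,(An)+ with a 32-bit immediate

# Hypothetical copy loop used for the estimate (sizes in bytes):
#       lea     src,a0          6
#       lea     dst,a1          6
#       moveq   #n-1,d0         2
# loop: move.l  (a0)+,(a1)+     2
#       dbf     d0,loop         4
COPY_LOOP_CODE = 6 + 6 + 2 + 2 + 4


def direct_size(n):
    """Bytes for lea plus n immediate stores."""
    return LEA_ABS + n * MOVE_IMM_POSTINC


def copy_size(n):
    """Bytes for the copy loop plus n longwords of initialization data."""
    return COPY_LOOP_CODE + 4 * n


def largest_n_for_direct(limit=128):
    """Largest n for which direct stores are not bigger than data plus loop."""
    best = 0
    for n in range(1, limit + 1):
        if direct_size(n) <= copy_size(n):
            best = n
    return best


if __name__ == "__main__":
    for n in (4, 7, 8, 16):
        print(n, direct_size(n), copy_size(n))
```

Under these assumptions both variants take 48 bytes at n=7, and the data-plus-loop variant is smaller from n=8 onwards.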
14 February 2020, 10:27   #7
Bruce Abbott
Registered User

Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 297
Quote:
Originally Posted by phx
We are currently working on optimizing the initialization of structures or arrays with constants in the 68k backend.
Unless it results in a dramatic speedup I wouldn't bother. Better to concentrate on getting the size down to fit in the cache. Most 68040 systems had relatively slow 'fast' memory, so optimizing cache usage has the potential for greater speedup.

Also
move.l #x,lab
generates a relocation entry for every occurrence, which bloats the executable and increases loading time.
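The relocation overhead can be added to the byte count. This is a sketch under stated assumptions: lab is reached through a 32-bit absolute address (not base-relative), each reference costs one longword offset in a plain HUNK_RELOC32 block, and the block header shared by all entries is ignored. The function names are made up for illustration.

```python
RELOC_OFFSET_BYTES = 4  # assumption: one longword per offset (HUNK_RELOC32)

MOVE_IMM_TO_ABS = 10     # move.l #x,lab
LEA_ABS = 6              # lea lab,An
MOVE_IMM_TO_POSTINC = 6  # move.l #x,(An)+


def direct_file_cost(n):
    """Code bytes plus relocation bytes for n stores to absolute addresses."""
    code = n * MOVE_IMM_TO_ABS
    relocs = n  # every move.l #x,lab references lab once
    return code + relocs * RELOC_OFFSET_BYTES


def pointer_file_cost(n):
    """Code bytes plus relocation bytes for lea plus n postincrement stores."""
    code = LEA_ABS + n * MOVE_IMM_TO_POSTINC
    relocs = 1  # only the lea references lab
    return code + relocs * RELOC_OFFSET_BYTES


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, direct_file_cost(n), pointer_file_cost(n))
```

For four stores this gives 56 bytes in the file for the direct sequence against 34 bytes for the LEA variant.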

But the worst thing about optimizing for specific CPUs is having to compile and distribute different versions for each one. Then we disassemble the code and discover that only 2 or 3 instructions in the entire executable are actually different, and wonder what was the point?