English Amiga Board


27 March 2021, 05:44   #41
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,053
I suggested that already, but it's not 100% reliable. If you start at the bottom you eventually have to go through the FPU stack frame, which could also include a rather large FPU state, so who knows what's in there.
I was also thinking about starting from the top, but it suffers from the same problem, although it could be a little better, depending on stack usage. I don't know what the code looks like or what is happening with the stack. I'm pretty sure there are some subroutine calls, maybe even nested ones; for example, what if you end up storing d6 on the stack, so you have it there multiple times?
It's ~99% likely that the worker task will be switched out during opcode interpretation (and not in the "main loop", which is ~2 instructions), so the stack could look something like this:
<bottom, task->tc_SPReg> <variable FPU stack frame> pc sr d0 ... a6 <local stack> <rts address> <local stack> <rts address> <top, this is where we are in main loop>
Still very sketchy...
27 March 2021, 07:52   #42
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
It is not a problem that the value might be anywhere in stack, IF it's always located at the same place on the same setup.
I mean, i could look for a magic value at startup. Then change the value, trigger a task switch, and check whether the value in the stack has also changed. If yes, keep the offset, else continue scanning.
But i am not sure the FPU stack frame has a constant length over time. Reading the ROM code didn't suggest it does.
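The calibration idea can be modelled off-target before trying it on a real Amiga. A minimal Python sketch, assuming the controlling task has copied the worker's saved stack (from `tc_SPReg` upward) into a byte buffer twice, once per magic value. The names `find_stable_offsets` and `fake_saved_stack`, and the frame layout used for the demo data, are made up for illustration and are not what exec actually writes:

```python
import struct

STALE = 0x11111111  # an old copy of the first magic value left on the local stack


def find_stable_offsets(snapshot_a, snapshot_b, magic_a, magic_b):
    """Return byte offsets holding magic_a in snapshot_a and magic_b in snapshot_b."""
    limit = min(len(snapshot_a), len(snapshot_b)) - 4
    offsets = []
    # 68k stacks are word-aligned, so step by 2 and read big-endian longwords.
    for off in range(0, limit + 1, 2):
        (a,) = struct.unpack_from(">L", snapshot_a, off)
        (b,) = struct.unpack_from(">L", snapshot_b, off)
        if a == magic_a and b == magic_b:
            offsets.append(off)
    return offsets


def fake_saved_stack(magic):
    """Hypothetical layout: 4-byte NULL frame, pc, sr, d0-d7/a0-a6, then a stale value."""
    regs = [0] * 15
    regs[12] = magic  # pretend this slot is a4 (d0-d7 are 0..7, a0-a6 are 8..14)
    frame = struct.pack(">LLH15L", 0, 0x00F80000, 0x2000, *regs)
    return frame + struct.pack(">L", STALE)


first = fake_saved_stack(0x11111111)
second = fake_saved_stack(0x22222222)
print(find_stable_offsets(first, second, 0x11111111, 0x22222222))
```

Only offsets that follow the change are kept, so the stale copy of the first magic value further up the stack is rejected. The sketch says nothing about whether the offset stays valid later, which is the open question here.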
27 March 2021, 11:13   #43
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,029
If you want to be 1000% sure that this is the correct D6 word value, you can add a second (different) ID in the high word of D4 too. Then only one extra check is needed later.
27 March 2021, 11:50   #44
meynaf
Quote:
Originally Posted by Don_Adan
If you want to be 1000% sure that this is the correct D6 word value, you can add a second (different) ID in the high word of D4 too. Then only one extra check is needed later.
This requires extra space in high parts of registers, something not guaranteed (in my current register allocation the only high part that's available is D7).

My current idea anyway is to repeat this code at the end of every routine :
Code:
 move.w (a6)+,d7
 jmp ([a4,d7.w*4])
This means i will not change D6, but A4 (ok, could be D7, but A4 seems more handy here).
It comes to the same thing in the end, except that it's faster (it removes the branch back to the main loop) and frees D6 (in which i would prefer to have the full 32 bits available).
27 March 2021, 13:19   #45
a/b
OK, one thing I completely missed. If *your* worker task is not using the FPU at all, or not using it once you go into the main loop, the FPU stack frame will *always* be NULL (or maybe IDLE is also possible?). So if you kind of calibrate/synchronize your tasks (for the current hw/system) before you enter the main loop, a4 should always be at the same offset.
27 March 2021, 13:56   #46
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,300
Quote:
Originally Posted by meynaf
It is not a problem that the value might be anywhere in stack, IF it's always located at the same place on the same setup.
It isn't. The stack frame is FPU-model dependent, and state-dependent as well. The NULL-stateframe is 4 bytes on the 68881 through 68040, but that's all. The 68060 NULL-stateframe is different and is 12 bytes. Depending on the state of the FPU, the stateframe may also be an "idle" frame (with less information), a "busy" frame (with more information) or an "exception stack frame". Nor do I know where or how the Vampire puts its registers there. This is really off-limits, and system-, hardware- and state-dependent.
27 March 2021, 14:08   #47
meynaf
Quote:
Originally Posted by Thomas Richter
It isn't. The stack frame is FPU-model dependent, and state-dependent as well. The NULL-stateframe is 4 bytes on the 68881 through 68040, but that's all. The 68060 NULL-stateframe is different and is 12 bytes. Depending on the state of the FPU, the stateframe may also be an "idle" frame (with less information), a "busy" frame (with more information) or an "exception stack frame". Nor do I know where or how the Vampire puts its registers there. This is really off-limits, and system-, hardware- and state-dependent.
But my task will not be using the FPU at all (not even through some math lib), and, let's be honest, i don't care if it fails on the Vampire - i never agreed with adding all these registers in the first place.
What matters isn't that the stack frame depends on the config of the machine; the question is : does a register remain at the same place once it has been located ?
27 March 2021, 14:31   #48
a/b
That's what I addressed in my previous post. If your task is not using the FPU at all, then the state is either its initial state (set when the task was created: NULL=0.L), or it doesn't matter because you are running on an older KS version that doesn't support the FPU (no extra 4 zero bytes).
27 March 2021, 14:38   #49
meynaf
What's certain is that MY task is not going to use the FPU. But what if another one does in the meantime ? Can this change the state ?
27 March 2021, 14:56   #50
a/b
Yes, but not the FPU state of *your* task. If another task is using it, this happens during task switch:
- FPU state is saved to the other task's stack (if not NULL, it's followed by fp0-7, fpcr/fpsr/fpiar and possibly other stuff, but we don't have to know that at all)
- task switch
- FPU state is restored from your stack (=NULL)
- you do your thing until task switch
- FPU state is saved to your stack (=NULL)
etc.
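The sequence above can be written down as a toy model. This is only a Python sketch of the behaviour as a/b describes it, not exec's actual switch code. `Task`, `task_switch` and `simulate` are invented names, and the non-NULL frame is an opaque placeholder, since its real size depends on the FPU model and state:

```python
NULL_FRAME = bytes(4)  # 4-byte NULL state frame, as quoted for the 68881 through 68040


class Task:
    def __init__(self, name, fpu_state=NULL_FRAME):
        self.name = name
        self.fpu_state = fpu_state  # FPU state this task owns while it runs
        self.saved_frames = []      # frames written to this task's stack so far


def task_switch(outgoing, incoming):
    """Save the FPU state on the outgoing task's stack, then load the incoming task's."""
    outgoing.saved_frames.append(outgoing.fpu_state)
    return incoming.fpu_state


def simulate(rounds):
    worker = Task("worker")  # never touches the FPU
    other = Task("other", fpu_state=b"non-NULL frame, model-specific size")
    for _ in range(rounds):
        task_switch(other, worker)  # other is switched out, worker runs
        task_switch(worker, other)  # worker is switched out, other runs
    return worker, other


worker, other = simulate(3)
print([len(frame) for frame in worker.saved_frames])
```

In this model the worker's saved frame has the same length on every switch, whatever the other task does with the FPU. Whether a real system behaves this way is exactly what is disputed in the following posts.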
27 March 2021, 15:17   #51
Thomas Richter
Well, it depends on what your task does... If anything in your task uses the FPU, even if only indirectly by opening a math library, or opening something that opens the math library, then the stack frame changes mid-term.

The problem is really that you depend on something the Os does not document, and it does not document this to be extensible.

The old amiga problem. Failing to understand the difference between interface and implementation.
27 March 2021, 15:29   #52
meynaf
Quote:
Originally Posted by Thomas Richter
Well, it depends on what your task does... If anything in your task uses the FPU, even if only indirectly by opening a math library, or opening something that opens the math library, then the stack frame changes mid-term.
As i said my task will not, directly or indirectly, use FPU.


Quote:
Originally Posted by Thomas Richter
The problem is really that you depend on something the Os does not document, and it does not document this to be extensible.
It's not a problem. Lots of things the Os does not document have been used already.


Quote:
Originally Posted by Thomas Richter
The old amiga problem. Failing to understand the difference between interface and implementation.
This is better than failing to provide an alternative path. At least there is something that can work.
But maybe you have a better idea ? Something that can work as fast but does not depend on anything undocumented ? As currently it's the choice between doing it this way and not doing it at all...
27 March 2021, 15:32   #53
a/b
Quote:
Originally Posted by Thomas Richter
The old amiga problem. Failing to understand the difference between interface and implementation.
There's no such failure, I completely understand what I'm suggesting and what risks it implies. I'd avoid that in any public/commercial software of my own, but private stuff... I've done 2^(a lot) worse.
Yeah, it *is* an OS-internal implementation detail, and relying on it is 100% unsupported. I can accept that and still proceed with a certain probability of success. If the pattern is present in KS1.2 to KS3.1.x, it's 100% and I can live with that.
Well, it's up to Meynaf in this case.
27 March 2021, 16:48   #54
meynaf
Quote:
Originally Posted by a/b
Well, it's up to Meynaf in this case.
Yep. And i'll go for it as long as there is no alternative giving the same level of performance.
27 March 2021, 16:57   #55
Thomas Richter
Quote:
Originally Posted by meynaf
It's not a problem. Lots of things the Os does not document have been used already.
Such practice blocks the development of the Os and the platform, that is the problem.


Quote:
Originally Posted by meynaf

This is better than failing to provide an alternative path. At least there is something that can work.
There are many things that can work. Some work with the system, some against the system, and some by pure chance.



Quote:
Originally Posted by meynaf

But maybe you have a better idea ?
I already gave you a better idea. But first things first: a) measure, b) improve if necessary. I am not convinced that there is much of a noticeable difference, and you should first establish that there is a problem that needs to be solved.



Quote:
Originally Posted by meynaf


Something that can work as fast but does not depend on anything undocumented ? As currently it's the choice between doing it this way and not doing it at all...
And that is just not true - why do you state something that is obviously false? You haven't even measured the various implementation choices. It might or might not be faster, depending on what your problem is, or it might be slower by a small margin that does not matter. If that means the end result is stable and independent of undocumented internals, it may be worth it.
27 March 2021, 18:15   #56
meynaf
Quote:
Originally Posted by Thomas Richter
Such practice blocks the development of the Os and the platform, that is the problem.
Frankly, others have done a lot worse. I won't block anything.


Quote:
Originally Posted by Thomas Richter
There are many things that can work. Some work with the system, some against the system, and some by pure chance.
There are also things that don't work.


Quote:
Originally Posted by Thomas Richter
I already gave you a better idea. But first things first: a) measure, b) improve if necessary. I am not convinced that there is much of a noticeable difference, and you should first establish that there is a problem that needs to be solved.
You have not in any manner given me a better idea.
What was it again ? The tst.b on a variable ? 4 instructions instead of 2 in the most critical code is not a clever idea.


Quote:
Originally Posted by Thomas Richter
And that is just not true - why do you state something that is obviously false? You haven't even measured the various implementation choices. It might or might not be faster, depending on what your problem is, or it might be slower by a small margin that does not matter. If that means the end result is stable and independent of undocumented internals, it may be worth it.
Why would i measure ? If you really need a measurement to know that any instruction added to the innermost critical loop of a program will make it slower, you should learn to code.

Sure, cpu designers should measure whether clock cycles added to every instruction will make their cpu slower.
(No, really, adding 1 clock decoding all our instructions is no big deal - most of them already take 4.)
27 March 2021, 20:19   #57
a/b
Quote:
Originally Posted by Thomas Richter
Well, it depends on what your task does... If anything in your task uses the FPU, even if only indirectly by opening a math library, or opening something that opens the math library, then the stack frame changes mid-term.
Sure, actually using it is a possibility, but only Meynaf can answer that since I don't know what dependencies his software has. If it's only a matter of something probing or opening the FPU for whatever reason without actually using it (yeah, it's all hypothetical here because, again, I haven't seen the code/project), you can eliminate that before you start doing any "nasty" things:
Code:
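; check exec->AttnFlags to see if an FPU is present, and if it is, load a NULL state
; (frestore is a privileged instruction, so this part has to run in supervisor mode)
	clr.l	-(a7)		; 4-byte NULL state frame (68881 through 68040)
	frestore	(a7)+
And you can also do a state+regs save/restore on your own if you are worried about the FPU being used after you're done with your "main loop".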
27 March 2021, 21:18   #58
Bruce Abbott
Registered User
 
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,708
I did some testing on my A1200 with Blizzard 1230-IV 50MHz 030 to see what execution speed can be expected. First I timed 50 million nops (a block of 1000 nops repeated 50,000 times) which took 3 seconds. That's 16.7 mips.

Then I timed the following code, threading its way through 1000 different interpreted instructions that all did nothing (ie. equivalent to 50 million nops).

Code:
   move.w   (a6)+,d7
   jmp      ([a5,d7.l*4])
This took 34 seconds, which is ~1.5 mips. That is probably the upper limit on interpretation speed.

Finally I added code to break out of it if any 'flag' bits are set in a particular memory location (pointed to by A4), like this:-

Code:
   move.w   (a6)+,d7
   tst.b    (a4)
   bne.s    break
   jmp      ([a5,d7.l*4])
break:
   jmp     stop
This took 44 seconds, which is ~1.1 mips.

With CPU caches disabled it was slower of course, taking 37 seconds and 51 seconds respectively (about 9% and 16% slower).

In practice a lot more code will be required to interpret most instructions, so the difference between the 'fastest possible' code that is difficult to break out of, and the more useful code with test and branch, will be much less than these numbers suggest.
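The figures above can be turned into cycles per dispatch, which makes the point about the shrinking difference concrete. A rough Python sketch, assuming the CPU really runs at 50 MHz, that loop overhead is negligible, and that the whole 10-second difference comes from the added `tst.b`/`bne.s` pair. The 66-cycle handler body is an invented figure for illustration:

```python
CPU_HZ = 50_000_000      # 50 MHz 68030, from the post
DISPATCHES = 50_000_000  # 1000 interpreted instructions x 50,000 repetitions


def mips(seconds):
    """Millions of dispatches per second for a run of the given length."""
    return DISPATCHES / seconds / 1e6


def cycles_per_dispatch(seconds):
    return seconds * CPU_HZ / DISPATCHES


plain = cycles_per_dispatch(34)    # dispatch only
checked = cycles_per_dispatch(44)  # dispatch plus tst.b and untaken bne.s
check_cost = checked - plain


def relative_slowdown(handler_cycles):
    """Fraction of extra time the check adds when each handler body costs handler_cycles."""
    return check_cost / (plain + handler_cycles)


print(round(mips(34), 2), round(mips(44), 2), check_cost)
print(round(relative_slowdown(0), 3), round(relative_slowdown(66), 3))
```

Under these assumptions the check costs about 10 cycles per interpreted instruction. That is roughly 29% extra with empty handlers, but only 10% once a handler body takes 66 cycles.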

Rather than wasting time trying to figure out some sneaky and problematic way to break out of the execution sequence (like examining stack frames or poking the interpreter code) I suggest using the simple technique above. You can always try changing it later if you manage to make the rest of the interpreter fast enough to justify it.
Attached Files
File Type: lha profile.lha (839 Bytes, 56 views)
29 March 2021, 08:15   #59
meynaf
Quote:
Originally Posted by a/b
only Meynaf can answer that since I don't know what dependencies his software has
Dependencies are limited : a few libraries (dos, intuition, graphics, keymap, asl), some devices (timer, input, audio), and one resource (ciab).
Not all of them will be used every time.


Quote:
Originally Posted by Bruce Abbott
This took 34 seconds, which is ~1.5 mips. That is probably the upper limit on interpretation speed.
This "upper limit on interpretation speed" is the reason why i'd like to keep it this way.

It is quite obvious that fast instructions will be more sensitive to this than slow instructions.
Some will be very swift :
Code:
; rts - fast if we keep stack ptr in a3
 move.l (a3)+,a6

; 8-bit bra
 extb.l d7
 add.l d7,a6
More typical ones can look like this :
Code:
 moveq #7,d0
 and.w d7,d0
 lsr.w #6,d7
 andi.w #15,d7
 move.l (a5,d7.w*4),$20(a5,d0.w*4)
Note that currently i don't have a real strategy to handle the ccr.
But you can already try to time the above if you want.


Quote:
Originally Posted by Bruce Abbott
Rather than wasting time trying to figure out some sneaky and problematic way to break out of the execution sequence (like examining stack frames or poking the interpreter code) I suggest using the simple technique above. You can always try changing it later if you manage to make the rest of the interpreter fast enough to justify it.
I will be using macros for this, so it could change anytime regardless of the initial choice. So nothing wrong in discussing it right now.
29 March 2021, 11:10   #60
Don_Adan
I think you could add the break check to only a few interpreter routines, not to all of them, e.g. your rts routine and perhaps a few others. That would speed up your main loop. To me, checking for the break signal in every interpreter routine only wastes CPU time.
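The trade-off in this suggestion can be sketched with a small model: fewer polls of the break flag, at the price of a few more interpreted instructions before the break is noticed. This is a hypothetical Python model, not the interpreter discussed in the thread. `interpret`, the opcode names and the step at which the flag is raised are all invented:

```python
def interpret(program, poll_on, break_at_step):
    """Run `program` (a list of opcode names) until a polling handler sees the break flag.

    poll_on: set of opcode names whose handlers test the break flag.
    break_at_step: step at which the (simulated) other task raises the flag.
    Returns (instructions_executed, polls_done).
    """
    break_flag = False
    polls = 0
    for step, opcode in enumerate(program):
        if step == break_at_step:
            break_flag = True  # in the real thing this is set asynchronously
        # ... the handler body for `opcode` would run here ...
        if opcode in poll_on:
            polls += 1
            if break_flag:
                return step + 1, polls
    return len(program), polls


program = ["move", "add", "move", "rts"] * 3
print(interpret(program, {"move", "add", "rts"}, 5))  # every handler polls
print(interpret(program, {"rts"}, 5))                 # only the rts handler polls
```

In this run, polling everywhere stops after 6 instructions and 6 polls, while polling only in rts stops after 8 instructions and 2 polls. One caveat with the selective approach: the polling set would have to cover every way the interpreted code can loop (backward branches as well as rts), otherwise a tight loop never notices the break.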