14 January 2023, 01:53 | #1 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,440
|
ARexx, WaitPort() Doesn't Return under Some Condition
Hi there!
I have this code that sends messages to the ARexx port of some module players, for example EaglePlayer. I looked at a lot of examples on-line, like this one, and they all do something like this: Code:
Forbid();
if((_arexx_port = FindPort(...)) == NULL)
{
    Permit();
    goto _RETURN_ERROR;
}
PutMsg(_arexx_port, &rexx_msg->rm_Node);
Permit();
WaitPort(reply_port);
GetMsg(reply_port);
It seems that, upon quitting, EaglePlayer keeps its port open (FindPort() succeeds) but, immediately after, stops answering messages (WaitPort() never returns). Does this make sense or is it my code that's buggy?
Cheers!
Last edited by tygre; 14 January 2023 at 01:54. Reason: Typos |
14 January 2023, 02:25 | #2 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,068
|
If the server isn't shutting down atomically, then there's not much you can do. For example, it might reply to all pending messages but then not Forbid() while checking that the port is empty and deleting it. A new message that arrives right between the check for remaining messages and the port actually being deleted and made unavailable will potentially brick any client, regardless of whether said client uses Forbid() (smaller chance) or not.
Your code looks safe, and the problem is on the other side, as far as I can see. |
14 January 2023, 07:13 | #3 |
Registered User
Join Date: Sep 2019
Location: Leicester / England
Posts: 203
|
Putting a Forbid/Permit around the whole code, including the WaitPort, won't work as the scheduler would never switch task to the one you're waiting on, so you would never get a reply even if it did send a reply.
However, if you know the other end of the message port may not respond, you could just use GetMsg in a timeout/delay loop, as GetMsg is effectively asynchronous. If there is a message, it gets it; if not, it returns zero. http://amigadev.elowar.com/read/ADCD.../node035A.html Here's a bit of pseudo code to demonstrate what I'm thinking, which tries 5 times with a 200ms delay between each check. At the end of the loop, if received is "true" then there was a message, otherwise there wasn't... Code:
count = 0
received = false
while (count < 5 and received == false) {
    if (GetMsg(reply_port) == null) {
        Delay(10) // 10 ticks = 200ms
        count = count + 1
    } else {
        received = true
    }
} |
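The retry loop described in this post could be sketched in C roughly as follows. This is only an illustration under stated assumptions: so that it compiles outside AmigaOS, the system calls are hidden behind two hypothetical callbacks (`try_get` and `nap`), which on an Amiga would wrap `GetMsg(reply_port)` and `Delay(ticks)`. The `fake_port` stand-in and every function name here are invented for the example and do not come from the thread.

```c
#include <stddef.h>

/* Hypothetical callback types: on AmigaOS, try_get would wrap
 * GetMsg(reply_port) and nap would wrap Delay(ticks). */
typedef void *(*try_get_fn)(void *ctx);
typedef void (*nap_fn)(void *ctx, long ticks);

/* Poll up to max_tries times, sleeping `ticks` after each empty poll.
 * Returns the message, or NULL if nothing arrived in time. */
void *poll_for_reply(try_get_fn try_get, nap_fn nap, void *ctx,
                     int max_tries, long ticks)
{
    int tries;
    for (tries = 0; tries < max_tries; tries++) {
        void *msg = try_get(ctx);
        if (msg != NULL)
            return msg;      /* reply arrived: stop polling */
        nap(ctx, ticks);     /* nothing yet: sleep, then retry */
    }
    return NULL;             /* gave up after max_tries empty polls */
}

/* Stand-in for a reply port (illustration only): it "receives" its
 * reply on poll number reply_on_poll, or never if that is 0. */
struct fake_port {
    int reply_on_poll;
    int polls;
    long slept_ticks;
};

void *fake_get(void *ctx)
{
    struct fake_port *p = ctx;
    p->polls++;
    if (p->reply_on_poll > 0 && p->polls >= p->reply_on_poll)
        return p;            /* any non-NULL pointer stands in for a message */
    return NULL;
}

void fake_nap(void *ctx, long ticks)
{
    struct fake_port *p = ctx;
    p->slept_ticks += ticks; /* record the sleep instead of really sleeping */
}
```

Calling `poll_for_reply(fake_get, fake_nap, &port, 5, 10)` mirrors the "5 tries, 10 ticks each" figures from the post. Returning the message pointer, rather than only a flag, lets the caller inspect the reply without a second `GetMsg()`.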
14 January 2023, 07:32 | #4 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,358
|
Normally waiting for something will 'break' the Forbid state.
|
14 January 2023, 08:18 | #5 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,322
|
Err... no. That's totally safe and a not so uncommon programming pattern. As soon as the code runs into Wait() (or the Wait() implicit in WaitPort()), the Forbid() or Disable() state is broken. An exec task that voluntarily gives up the CPU thereby implicitly re-allows interrupts and task switching. The Forbid() or Disable() state will be restored as soon as the signal the task waits for arrives.
|
14 January 2023, 08:20 | #6 |
Registered User
Join Date: Sep 2019
Location: Leicester / England
Posts: 203
|
For the original code, this may or may not be the case as it isn't explicitly documented in the AutoDocs.
The AutoDocs for Wait() state that it breaks the Forbid status until the next time the task scheduler allocates time to the task which called Forbid. For WaitPort(), they say "If necessary, the Wait() function will be called". I presume that if there is a message waiting, it won't call Wait() and will just return, and otherwise it calls Wait(). But it's making these sorts of undocumented presumptions that always comes back to bite you when you're least expecting it, and then causes hours of head scratching over why things sometimes work and sometimes don't. Though, whilst this discussion is interesting, it's not relevant for my suggested pseudo code, as that doesn't use WaitPort and therefore must not be within a Forbid/Permit pair. However, there are still other problems: if another task created the port, then technically it could "go away" between finding the port and attempting to use it, if the other task is scheduled to run and closes the port at that time. This means the task reading the message could just be accessing arbitrary data from memory. At best, it would be reading what was there before. At worst, it could lead to corruption and a crash. Within my pseudo code loop, it would probably be best to do the following to prevent this:
Forbid()
FindPort()
If port found
    GetMsg()
Permit() |
14 January 2023, 08:33 | #7 | |
Registered User
Join Date: Sep 2019
Location: Leicester / England
Posts: 203
|
Quote:
Making assumptions about how "internal" functions operate is dangerous and comes back to bite you. Therefore, based upon the documentation available, surely it cannot be recommended to use WaitPort() within a Forbid() section? I know how things written can be misunderstood, so want to note that this is a legitimate question as, whilst it may work on all current releases, I can't see it documented anywhere that this is guaranteed to work in the way described. |
|
14 January 2023, 09:56 | #8 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,322
|
You do not need the Os sources for that. WaitPort() waits whenever there is no message in the port to remove, thus when waiting is necessary. Otherwise, waiting is not necessary. It is really quite simple. That is not an "assumption".... Check also the RKRMs.
|
14 January 2023, 10:17 | #9 | ||||||||
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,322
|
Quote:
From the RKRM: Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
The problem is not that the port goes away. It cannot. The problem is that the program handling the port does not reply all messages before the port is removed, or your program already removed the message upfront and calls WaitPort() even though the message has already been delivered and removed. Quote:
If you must wait for the message to return, and if a message is returned by protocol, then Code:
Forbid();
if (port = FindPort(...)) {
    WaitPort();
    msg = GetMsg()
}
Permit(); |
||||||||
14 January 2023, 11:46 | #10 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,068
|
Forbid() story aside...
If we are talking alternative approaches, other than the obvious back-to-basics polling, if you happen to have an async piece of code running periodically (interrupt handler) you could send yourself a wake-up msg. Or if you use multiple sources instead, e.g. a fat Cancel button that a user could mash if the app becomes unresponsive that would send you an intuimsg and wake you up. Hard to tell without context, polling being the most obvious approach. |
14 January 2023, 13:23 | #11 | |
Registered User
Join Date: Jan 2002
Location: Germany
Posts: 7,032
|
Quote:
FindPort gives you the address of a foreign port, i.e. another task's port. You can only wait for messages on ports your own task has allocated. And if you allocated the port, you know its address, so you don't need to call FindPort for it. Also I don't see why you would have to embed WaitPort and GetMsg in Forbid/Permit. When the message has returned to your reply port, you are the owner of the message. It cannot disappear between WaitPort and GetMsg. Only foreign ports can disappear unexpectedly. Therefore FindPort and PutMsg may need Forbid, but not the wait for the reply. And if the remote port disappears before it has removed itself from the public port list, then it is a bug in the server program. I doubt there is any workaround you can add to the client program. |
|
14 January 2023, 13:31 | #12 |
Semi-Retired
Join Date: Mar 2012
Location: Leiden / The Netherlands
Posts: 2,049
|
Regarding GetMsg() and Forbid() / Permit(): is Remove() et al. also protected by Forbid() / Permit() in OS3? If you decide not to use it with GetMsg(), and a task switch occurs during the removal of a message/node, and the new task also reads the same list, won't nasty things happen?
|
14 January 2023, 13:51 | #13 |
Registered User
Join Date: Sep 2019
Location: Leicester / England
Posts: 203
|
OK, I'm not sure what I read to miss that line, but simply pointing out that I'd missed this bit would have been sufficient.
You seriously didn't need to write more than a screenful of quotes with such a condescending reply! EDIT: In fact, why did it actually need the second response as you had responded once - it almost looks like you were deliberately trying to be antagonistic with the second response? Apologies to OP for the slight derail. |
14 January 2023, 15:03 | #14 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,322
|
Quote:
|
|
14 January 2023, 15:05 | #15 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,322
|
|
14 January 2023, 18:10 | #16 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,440
|
Hi all!
Thank you all! I really appreciate the thorough discussion! While I understand that the following code would be the right thing to do: Code:
Forbid();
if(arexx_port = FindPort(...))
{
    PutMsg(arexx_port, rexx_msg);
    WaitPort(reply_port);
    GetMsg(reply_port);
}
Permit();
Code:
while(wait_count < 25 && msg_received == FALSE)
{
    if(GetMsg(reply_port) == NULL)
    {
        Delay(10);
        wait_count++;
    }
    else
    {
        msg_received = TRUE;
    }
}
Cheers!
Last edited by tygre; 15 January 2023 at 20:42. Reason: Fixed broken logic! Added longer delay and explanation... |
14 January 2023, 21:55 | #17 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,322
|
If a program with an ARexx port does not reply to an ARexx command, then something is fishy about this program, I would say.
Typically, however, you would not wait for the reply from the port in the very same place. Instead, retrieving a returned Rexx message would be part of the event loop of your program, in the sense of: you first fire off the rexx command (Forbid(), FindPort(), PutMsg(), Permit()), and then in the event loop of your program, you check multiple ports for incoming messages to react upon, and the reply port of the rexx message would be just one of them. WaitPort() is then, of course, not the right answer. You rather need to wait on the signal mask of all ports combined, and then check one port after another for any incoming message. You still have the "problem" of what to do about non-replied rexx messages then, though. If the user chooses to terminate your program, you had better check whether all rexx messages sent out have been returned, as you cannot safely delete the reply port without having retrieved all messages. |
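The "signal mask of all ports combined" idea can be sketched as plain bit arithmetic. This is a minimal illustration, assuming each port's signal bit number is already known (on AmigaOS it is the `mp_SigBit` field of `struct MsgPort`); the two helper names are hypothetical. On a real system the combined mask would be passed to `Wait()`, and the value `Wait()` returns would be the `received` argument below.

```c
#include <stdint.h>

/* Build the mask to wait on from each port's signal bit number
 * (valid bit numbers are 0..31). */
uint32_t combined_signal_mask(const unsigned char *sig_bits, int count)
{
    uint32_t mask = 0;
    int i;
    for (i = 0; i < count; i++)
        mask |= (uint32_t)1 << sig_bits[i];
    return mask;
}

/* After waking up, tell whether a given port's signal was among
 * the signals actually received. */
int port_was_signalled(uint32_t received, unsigned char sig_bit)
{
    return (received & ((uint32_t)1 << sig_bit)) != 0;
}
```

For every port whose bit is set, the event loop would then call `GetMsg()` repeatedly until it returns NULL, since one signal can stand for several queued messages.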
15 January 2023, 00:37 | #18 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,440
|
Thanks Thomas! Yes, that makes perfect sense.
But then what to do if a "rogue" program doesn't reply at all, for example when I quit EaglePlayer: it won't be able to answer anymore at all... Is there a safe way for me to stop my program then, without having to wait and retrieve all messages? |
15 January 2023, 01:41 | #19 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,322
|
First, I would suggest contacting the author of the program.
As practical advice, I would suggest that, upon exit of your program, you check which messages are still pending to be replied back to you. If the target port still exists, you can still wait on them before quitting your task, because your messages could still be replied to at some point, and if their reply port is no longer present by then, bad things will happen when the destination port owner attempts to reply to them. If the target port does not exist anymore (and thus there is a defect in the destination program), I would zero out the mn_ReplyPort of those messages you are still waiting on (so nothing bad happens if someone still attempts to reply to them), and then exit your program without releasing the messages. You then have a memory leak (unfortunately), but at least if someone picks up a message and attempts to reply to it, nothing bad will happen. A message with a NULL reply port will just have its node type set to NT_FREEMSG (or something like it, I forgot the precise type). |
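The exit-time advice in this post boils down to a three-way decision, which might be written down as below. This is a sketch only: the enum, the function name, and the idea of tracking a count of unanswered messages are assumptions made for illustration, and the actual AmigaOS steps (checking the port with `FindPort()` under `Forbid()`, clearing `mn_ReplyPort`) are described in the comments rather than performed.

```c
typedef enum {
    EXIT_CLEANLY,     /* nothing outstanding: free messages and reply port */
    WAIT_FOR_REPLIES, /* target port still exists: keep waiting for replies */
    DETACH_AND_LEAK   /* target port gone: set mn_ReplyPort to NULL on the
                         pending messages and leave them allocated */
} exit_action;

/* pending_msgs: messages sent but not yet replied (hypothetical counter,
 * incremented after PutMsg() and decremented when a reply is retrieved).
 * target_port_exists: nonzero if the destination port can still be found. */
exit_action choose_exit_action(int pending_msgs, int target_port_exists)
{
    if (pending_msgs == 0)
        return EXIT_CLEANLY;
    return target_port_exists ? WAIT_FOR_REPLIES : DETACH_AND_LEAK;
}
```

Keeping the decision separate from the system calls makes it easy to see that memory is deliberately leaked in exactly one case: replies are outstanding and the destination port has already disappeared.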
15 January 2023, 20:47 | #20 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,440
|
Hi all!
Again, thanks for the help! I updated the code snippet above because I had put a || where it should be a &&. I also increased the retry count (to 25 instead of 5) to give the other program a chance to actually reply! With a count of 5, for example, EaglePlayer didn't have time to reply legitimately... I think that increasing the waiting time to max. 5s could make sense: if after 5s the other program hasn't replied, something must be wrong anyway, mustn't it? Cheers! Last edited by tygre; 15 January 2023 at 21:13. |
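The "max. 5s" figure follows from the numbers used in the thread: 25 tries with `Delay(10)` each, where 10 ticks are 200 ms (that is, 50 ticks per second). A small helper, with a hypothetical name, makes the arithmetic explicit:

```c
/* Delay() counts in ticks of 1/50 s, as noted earlier in the thread
 * (10 ticks = 200 ms). */
#define TICKS_PER_SEC 50L

/* Longest time the polling loop can block, in milliseconds, ignoring
 * the (small) time spent in the GetMsg() calls themselves. */
long worst_case_wait_ms(long max_tries, long ticks_per_try)
{
    return max_tries * ticks_per_try * 1000L / TICKS_PER_SEC;
}
```

With these assumptions, `worst_case_wait_ms(25, 10)` gives 5000 ms, while the earlier setting of 5 tries gives 1000 ms.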