ARexx, WaitPort() Doesn't Return under Some Condition

tygre · 14 January 2023, 01:53

Hi there!

I have this code that sends messages to the ARexx port of some module players, for example EaglePlayer. I looked at a lot of examples online, like this one, and they all do something like this:

Code:

Forbid();
if((_arexx_port = FindPort(...)) == NULL)
{
	Permit();
	goto _RETURN_ERROR;
}
PutMsg(_arexx_port, &rexx_msg->rm_Node);
Permit();
WaitPort(reply_port);
GetMsg(reply_port);

Forbid() and Permit() are around FindPort(). But even if I put them around the whole block, I still have this problem: WaitPort() never returns, gets "stuck", if I quit EaglePlayer just at the right moment.

It seems that, upon quitting, EaglePlayer keeps its port open (FindPort() succeeds) but, immediately after, stops answering messages (WaitPort() never returns). Does this make sense or is it my code that's buggy?

Cheers!

a/b · 14 January 2023, 02:25

If the server isn't shutting down atomically, then there's not much you can do. For example, it might reply to all the messages but then not Forbid() while checking whether the port is empty and deleting it. A new message arriving right between the check for anything left to reply to and the port actually being deleted and made unavailable will potentially brick any client, regardless of whether said client uses Forbid() (smaller chance) or not.
Your code looks safe, and the problem is on the other side, as far as I can see.
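
To make the ordering concrete, here is a minimal host-side sketch of a server shutting down in the safe order. It is only a model: `SimMsg`, `SimPort` and the `sim_*` functions are hypothetical stand-ins, not the exec.library API. On AmigaOS the corresponding calls would presumably be RemPort() and ReplyMsg(), with the first two steps inside a single Forbid()/Permit() pair.

```c
#include <stddef.h>

/* Host-side model only: SimMsg/SimPort are hypothetical stand-ins for
 * exec's Message and MsgPort, used to show the ordering without AmigaOS. */
struct SimMsg {
    struct SimMsg *next;
    int replied;            /* set once the "server" has answered */
};

struct SimPort {
    struct SimMsg *head;    /* queued, not yet answered messages */
    int is_public;          /* 1 while a FindPort()-style lookup succeeds */
};

/* Models FindPort(): only ports still on the public list can be found. */
struct SimPort *sim_find_port(struct SimPort *p)
{
    return (p != NULL && p->is_public) ? p : NULL;
}

/* Models PutMsg(): queue a message at the port (order is irrelevant here). */
void sim_put_msg(struct SimPort *p, struct SimMsg *m)
{
    m->next = p->head;
    p->head = m;
}

/* Shut down in the safe order and return how many messages were answered.
 * Assumption: on AmigaOS, steps 1 and 2 run under one Forbid()/Permit()
 * pair, so no client can slip a message in between them. */
int sim_shutdown(struct SimPort *p)
{
    int answered = 0;

    p->is_public = 0;                 /* 1. hide the port: no new senders */
    while (p->head != NULL) {         /* 2. drain and answer what is left */
        struct SimMsg *m = p->head;
        p->head = m->next;
        m->next = NULL;
        m->replied = 1;
        answered++;
    }
    return answered;                  /* 3. only now may the port be freed */
}
```

A server that does step 2 before step 1, or leaves a task switch possible between them, opens exactly the window described above: a client can still find the port and queue a message that nobody will ever answer.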

Exodous · 14 January 2023, 07:13

Putting a Forbid/Permit around the whole code, including the WaitPort won't work as the scheduler would never switch task to the one you're waiting on, so you would never get a reply even if it did send a reply.

However, if you know the other end of the message port may not respond, you could just use GetMsg() in a timeout/delay loop, as GetMsg() is effectively asynchronous. If there is a message, it gets it; if not, it returns zero.

http://amigadev.elowar.com/read/ADCD.../node035A.html

Here's a bit of pseudo code to demonstrate what I'm thinking which tries 5 times with a 200ms delay between each check. At the end of the loop, if received is "true" then there was a message, otherwise there wasn't...

Code:

count = 0
received = false
while (count < 5 and received == false)
{
  if (GetMsg(reply_port) == null)
  {
    Delay(10)    // 10 ticks = 200ms
    count = count + 1
  }
  else
  {
    received = true
  }
}

This way your program will never wait indefinitely.

meynaf · 14 January 2023, 07:32

Quote:

Originally Posted by Exodous

Putting a Forbid/Permit around the whole code, including the WaitPort won't work as the scheduler would never switch task to the one you're waiting on, so you would never get a reply even if it did send a reply.

Normally waiting for something will 'break' the Forbid state.

Thomas Richter · 14 January 2023, 08:18

Quote:

Originally Posted by Exodous

Putting a Forbid/Permit around the whole code, including the WaitPort won't work as the scheduler would never switch task to the one you're waiting on, so you would never get a reply even if it did send a reply.

Err... no. That's totally safe and a not so uncommon programming pattern. As soon as the code runs into Wait() (or the Wait() implicit in WaitPort()), the Forbid() or Disable() state is broken. An exec task that is voluntarily giving up the CPU by that implicitly re-allows interrupts and task switching. The Forbid() or Disable() state will be restored as soon as the signal the task waits for arrives.

Exodous · 14 January 2023, 08:20

For the original code, this may or may not be the case as it isn't explicitly documented in the AutoDocs.

They state that calling Wait() breaks the Forbid state until the next time the task scheduler allocates time to the task that called Forbid().

For WaitPort(), they say "If necessary, the Wait() function will be called". Whilst I presume that if there is a message waiting, it won't call Wait() and will just return, otherwise it calls Wait(), it's making this sort of undocumented presumption that always comes back to bite you when you're least expecting it and then causes hours of head scratching over why things sometimes work and sometimes don't.

Though, whilst this discussion is interesting, it's not relevant for my suggested pseudo code, as that doesn't use WaitPort and therefore must not be within a Forbid/Permit pair.

However, there are still other problems - if another task created the port, then technically it could "go away" between finding the port and attempting to use it, if the other task is scheduled to run and closes the port at that time. This means the task reading the message could just be accessing arbitrary data from memory. At best, it would be reading what was there before. At worst, it could lead to corruption and a crash.

Within my pseudo code loop, it would probably be best to do the following to prevent this:

Forbid()
FindPort()
If port found GetMsg()
Permit()

Exodous · 14 January 2023, 08:33

Quote:

Originally Posted by Thomas Richter

Err... no. That's totally safe and a not so uncommon programming pattern.

Thomas, whilst I appreciate you have probably seen how this works within the OS code, so can answer this with conviction, the documentation we "mere mortals" have available doesn't explicitly state under what circumstances Wait() is called when using WaitPort().

Making assumptions about how "internal" functions operate is dangerous and comes back to bite you. Therefore, based upon the documentation available, surely it cannot be recommended to use WaitPort() within a Forbid() section?

I know how things written can be misunderstood, so want to note that this is a legitimate question as, whilst it may work on all current releases, I can't see it documented anywhere that this is guaranteed to work in the way described.

Thomas Richter · 14 January 2023, 09:56

You do not need the OS sources for that. WaitPort() waits whenever there is no message in the port to remove, thus when waiting is necessary. Otherwise, waiting is not necessary. It is really quite simple. That is not an "assumption".... Check also the RKRMs.

Thomas Richter · 14 January 2023, 10:17

Quote:

Originally Posted by Exodous

For the original code, this may or may not be the case as it isn't explicitly documented in the AutoDocs.

From the RKRM:

Quote:

You can call the WaitPort() function to wait for a message to arrive at a port. This function will return the first message (it may not be the only) queued to a port. Note that your application must still call GetMsg() to remove the message from the port. If the port is empty, your task will go to sleep waiting for the first message. If the port is not empty, your task will not go to sleep. It is possible to receive a signal for a port without a message being present yet. The code processing the messages should be able to handle this. The following code illustrates WaitPort().

Quote:

Originally Posted by Exodous

They state that calling Wait() breaks the Forbid state until the next time the task scheduler allocates time to the task that called Forbid().

Precisely. Thus, for example, if the signal you waited on was received. Or, if you like, the port becomes non-empty and as a result, WaitPort() receives task time.

Quote:

Originally Posted by Exodous

For WaitPort(), they say "If necessary, the Wait() function will be called". Whilst I presume that if there is a message waiting, it won't call Wait() and will just return, otherwise it calls Wait(), it's making this sort of undocumented presumption that always comes back to bite you when you're least expecting it and then causes hours of head scratching over why things sometimes work and sometimes don't.

WaitPort() is bug-free (it is rather trivial, actually), and the only reason why it does not return is that there is no message ever delivered, or alternatively, if the message was received and removed from the port before the discussed code fragment is executed. WaitPort() as part of an event loop is only useful if there is only a single port on which messages shall be retrieved.

Quote:

Originally Posted by Exodous

Though, whilst this discussion is interesting, it's not relevant for my suggested pseudo code, as that doesn't use WaitPort and therefore must not be within a Forbid/Permit pair.

That suggested pseudo-code is over-complicated and sub-optimal. It is over-complicated as it needs another library for the job (which also only runs into a Wait() at some point, and it needs the timer.device), and it is sub-optimal as it waits longer than necessary if a message arrives.

Quote:

Originally Posted by Exodous

However, there are still other problems - if another task created the port, then technically it could "go away" between finding the port and attempting to use it, if the other task is scheduled to run and closes the port at that time.

Not if there is a Forbid() before the FindPort() and no other call before the WaitPort() that may break a Forbid(). Note that AddPort() and RemPort() both call Forbid(), and thus will never be called by someone else while your task holds the Forbid(). Thus, the port cannot go away under your feet.

Quote:

Originally Posted by Exodous

This means the task reading the message could just be accessing arbitrary data from memory. At best, it would be reading what was there before. At worst, it could lead to corruption and a crash.

Not if you do it properly. That is, protect the FindPort() with a Forbid() so the CPU cannot be stolen while your task operates and continues into the wait.

The problem is not that the port goes away. It cannot. The problem is that the program handling the port does not reply to all messages before the port is removed, or that your program already removed the message beforehand and calls WaitPort() even though the message has already been delivered and removed.

Quote:

Originally Posted by Exodous

Within my pseudo code loop, it would probably be best to do the following to prevent this:

Forbid()
FindPort()
If port found GetMsg()
Permit()

That also works, but is a non-blocking version.

If you must wait for the message to return, and if a message is returned by protocol, then

Code:

Forbid();
if (port = FindPort(...)) {
 WaitPort();
 msg = GetMsg()
}
Permit();

does what it should do.

a/b · 14 January 2023, 11:46

Forbid() story aside...
If we are talking alternative approaches, other than the obvious back-to-basics polling: if you happen to have an async piece of code running periodically (an interrupt handler), you could send yourself a wake-up msg. Or you could use multiple sources instead, e.g. a fat Cancel button that a user could mash if the app becomes unresponsive, which would send you an intuimsg and wake you up.
Hard to tell without context, polling being the most obvious approach.

thomas · 14 January 2023, 13:23

Quote:

Originally Posted by Thomas Richter

If you must wait for the message to return, and if a message is returned by protocol, then

Code:

Forbid();
if (port = FindPort(...)) {
 WaitPort();
 msg = GetMsg()
}
Permit();

does what it should do.

I don't see in which occasion you would use this code.

FindPort gives you the address of a foreign, i.e. another task's port. You can only wait for messages on ports your own task has allocated. And if you allocated the port, you know its address, you don't need to call FindPort for it.

Also I don't see why you would have to embed WaitPort and GetMsg in Forbid/Permit. When the message has returned to your reply port, you are the owner of the message. It cannot disappear between WaitPort and GetMsg.

Only foreign ports can disappear unexpectedly. Therefore FindPort and PutMsg may need Forbid. But not the wait for reply.

And if the remote port disappears before it has removed itself from the public port list, then it is a bug of the server program. I doubt there is any workaround you can add to the client program.

Hedeon · 14 January 2023, 13:31

Regarding GetMsg() and Forbid() / Permit (). Is Remove() et al also protected by Forbid() / Permit () in OS3? If you decide not to use it with GetMsg(), and a task switch occurs during the removal of a message/node and the new task also reads the same list, do nasty things happen?

Exodous · 14 January 2023, 13:51

Quote:

Originally Posted by Thomas Richter

From the RKRM:
....

OK, I'm not sure what I read to miss that line, but simply pointing out that I'd missed this bit would have been sufficient.

You seriously didn't need to write more than a screenful of quotes with such a condescending reply!

EDIT: In fact, why did it actually need the second response as you had responded once - it almost looks like you were deliberately trying to be antagonistic with the second response?

Apologies to OP for the slight derail.

Thomas Richter · 14 January 2023, 15:03

Quote:

Originally Posted by Hedeon

Regarding GetMsg() and Forbid() / Permit(). Is Remove() et al also protected by Forbid() / Permit () in OS3?

No, of course not. Remove() is just removing a node from a list, assuming that you are the exclusive user of the list. It is one of the list-related calls.

Quote:

Originally Posted by Hedeon

If you decide not to use it with GetMsg(), and a task switch occurs during the removal of a message/node and the new task also reads the same list, do nasty things happen?

You are typically not reading shared lists. You are reading from your own ports. If you want to share a list that is not a port, you need to serialize access to it, typically by a semaphore or a forbid/permit pair.

Thomas Richter · 14 January 2023, 15:05

Quote:

Originally Posted by thomas

I don't see in which occasion you would use this code.

Sorry, there is a PutMsg() to the found port missing for the message to send, and exclusively wait for. That happens if you do not put arguments. *sigh*

tygre · 14 January 2023, 18:10

Hi all!

Thank you all! I really appreciate the thorough discussion!

While I understand that the following code would be the right thing to do:

Code:

Forbid();
if(arexx_port = FindPort(...))
{
	PutMsg(arexx_port, rexx_msg);
	WaitPort(reply_port);
	GetMsg(reply_port);
}
Permit();

I think that Exodous' code would be better in my case, because some external programs, like EaglePlayer, may never reply, which would leave WaitPort(reply_port) blocked forever:

Code:

while(wait_count < 25 && msg_received == FALSE)
{
	if(GetMsg(reply_port) == NULL)
	{
		Delay(10);
		wait_count++;
	}
	else
	{
		msg_received = TRUE;
	}
}

I'm going to experiment with this code and let you know!
Cheers!

Thomas Richter · 14 January 2023, 21:55

If a program with an ARexx port does not reply to an ARexx command, then something is fishy about this program, I would say.

Typically, however, you would not wait for the reply from the port in the very same place. Instead, retrieving a returned Rexx message would be part of the event loop of your program, in the sense of: you first fire off the Rexx command (Forbid(), FindPort(), PutMsg(), Permit()), and then in the event loop of your program, you check multiple ports for incoming messages to react upon, and the reply port of the Rexx message would be just one of them. WaitPort() is then, of course, not the right answer. You rather need to wait on the signal mask of all ports combined, and then check one port after another for any incoming message.

You still have the "problem" of what to do about non-replied Rexx messages then, though. If the user chooses to terminate your program, you had better check whether all Rexx messages sent out have been returned, as you cannot safely delete the reply port without having retrieved all messages.
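
The "signal mask of all ports combined" idea can be sketched as follows. This is a host-side model under stated assumptions: `SimPort`, `combined_mask` and `drain_signalled` are hypothetical names, `sig_bit` stands in for the port's mp_SigBit, and the `pending` counter stands in for a real message queue. On AmigaOS the mask would presumably be passed to Wait(), and each signalled port emptied with a GetMsg() loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for the part of a message port used here. */
struct SimPort {
    unsigned char sig_bit;   /* signal bit number of the port, 0..31 */
    int pending;             /* number of queued messages */
};

/* OR together one signal bit per port: the value an event loop waits on. */
unsigned long combined_mask(const struct SimPort *ports, size_t n)
{
    unsigned long mask = 0;
    size_t i;

    for (i = 0; i < n; i++)
        mask |= 1UL << ports[i].sig_bit;
    return mask;
}

/* Given the signals that woke the task, empty every port whose bit is set.
 * A set bit does not guarantee a message, so an empty port is tolerated.
 * Returns the number of messages handled. */
int drain_signalled(struct SimPort *ports, size_t n, unsigned long signals)
{
    int handled = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if ((signals & (1UL << ports[i].sig_bit)) == 0)
            continue;                    /* this port did not signal */
        while (ports[i].pending > 0) {   /* stands in for a GetMsg() loop */
            ports[i].pending--;
            handled++;
        }
    }
    return handled;
}
```

In a real program one of the ports would be the Rexx reply port and another, say, a window's user port, so a missing ARexx reply no longer blocks the rest of the program.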

tygre · 15 January 2023, 00:37

Thanks Thomas! Yes, that makes perfect sense

But then what to do if a "rogue" program doesn't reply at all, for example when I quit EaglePlayer: it won't be able to answer anymore at all... Is there a safe way for me to stop my program then, without having to wait and retrieve all messages?

Thomas Richter · 15 January 2023, 01:41

First, I would suggest contacting the author of the program.

As practical advice, I would suggest that, upon exit of your program, you check which messages are still waiting to be replied to. If the target port still exists, you can still wait on them before quitting your task, because your messages may yet be replied to at some point, and if their reply port is no longer present by then, bad things will happen when the destination port owner attempts to reply to them.

If the target port does not exist anymore (and thus there is a defect in the destination program), I would zero out the mn_ReplyPort of those messages you are still waiting on (so nothing bad happens if someone still attempts to reply to them), and then exit your program without releasing the messages. You then have a memory leak (unfortunately), but at least if someone picks up a message and attempts to reply to it, nothing bad will happen.

A message with NULL reply port will just have its node type set to NT_FREEMSG (or something like it, I forgot the precise type).
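
A small host-side model of that exit strategy, as a sketch only: `SimMsg`, `SimPort`, `sim_reply_msg` and `sim_abandon` are hypothetical names, and `sim_reply_msg` merely imitates the ReplyMsg() behaviour described above (no reply port means the message is just marked free). On AmigaOS the field to clear would be mn_ReplyPort, presumably under Forbid().

```c
#include <stddef.h>

enum SimType { SIM_MESSAGE, SIM_REPLYMSG, SIM_FREEMSG };

/* Hypothetical stand-in for a reply port: only counts returned messages. */
struct SimPort {
    int queued;
};

struct SimMsg {
    enum SimType type;           /* stands in for mn_Node.ln_Type */
    struct SimPort *reply_port;  /* stands in for mn_ReplyPort */
};

/* Imitates the reply step: with no reply port, the message is only marked
 * free and no port is touched; otherwise it is queued at the reply port. */
void sim_reply_msg(struct SimMsg *m)
{
    if (m->reply_port == NULL) {
        m->type = SIM_FREEMSG;
        return;
    }
    m->type = SIM_REPLYMSG;
    m->reply_port->queued++;
}

/* Client side at exit: detach an unanswered message from a reply port that
 * is about to be deleted. The message memory is deliberately not freed,
 * because the other program may still hold a pointer to it. */
void sim_abandon(struct SimMsg *m)
{
    m->reply_port = NULL;
}
```

The cost is the leak mentioned above: the abandoned message has to stay allocated for as long as the other program might still touch it.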

tygre · 15 January 2023, 20:47

Hi all!

Again, thanks for the help

I updated the code snippet above because I put a || where it should be a &&.

I also increased the total waiting time (count to 25 instead of 5) to give the other program a chance to actually reply! With a count to 5, for example, EaglePlayer didn't have the time to reply legitimately...

I think that increasing the waiting time to max. 5s could make sense: if after 5s the other program hasn't replied, something must be wrong anyway, mustn't it?

Cheers!
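
For reference, the timeout budget of the loop above can be checked with a small host-side sketch. Everything here is an assumption for illustration: `poll_with_timeout`, the callback types and the `FakeServer` test double are hypothetical, with the callbacks standing in for GetMsg(reply_port) and Delay(). The only fact carried over from the thread is that 10 ticks equal 200ms, i.e. 50 ticks per second.

```c
#include <stddef.h>

#define TICKS_PER_SECOND 50   /* 10 ticks = 200ms, as in the loop above */

/* Hypothetical callbacks so the loop can run on any host: on AmigaOS,
 * poll would wrap GetMsg(reply_port) and pause would wrap Delay(). */
typedef void *(*poll_fn)(void *ctx);
typedef void (*pause_fn)(long ticks, void *ctx);

/* Poll up to max_polls times, pausing ticks_per_poll after each miss.
 * Returns the message, or NULL on timeout; *waited_ticks receives the
 * total number of ticks spent pausing. */
void *poll_with_timeout(poll_fn poll, pause_fn pause, void *ctx,
                        int max_polls, long ticks_per_poll,
                        long *waited_ticks)
{
    int i;

    *waited_ticks = 0;
    for (i = 0; i < max_polls; i++) {
        void *msg = poll(ctx);
        if (msg != NULL)
            return msg;
        pause(ticks_per_poll, ctx);
        *waited_ticks += ticks_per_poll;
    }
    return NULL;
}

/* Test double: misses `misses_left` polls, then "replies" with its token. */
struct FakeServer {
    int misses_left;
    int token;
};

void *fake_poll(void *ctx)
{
    struct FakeServer *s = ctx;

    if (s->misses_left > 0) {
        s->misses_left--;
        return NULL;
    }
    return &s->token;
}

/* Test double: does not really sleep. */
void fake_pause(long ticks, void *ctx)
{
    (void)ticks;
    (void)ctx;
}
```

With 25 polls of 10 ticks each, a server that never answers costs 250 ticks, which is the 5s mentioned above. Note that after a timeout the message is still outstanding, so the reply port has to stay alive, as discussed earlier in the thread.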
