Re: Forte stops running on Raspberry Pi after a certain time [message #1772758 is a reply to message #1772734] |
Thu, 14 September 2017 19:59 |
Marc Jakobi Messages: 67 Registered: April 2017 |
Member |
|
|
Okay, now something strange has happened.
I added the and flags to CMAKE_CXX_FLAGS.
Since then my compiler is giving me the error message:
In file C:/.../forte_thread.cpp:18:0:
c:\sysgcc\raspberry\arm-linux-gnueabihf\include\sys\wait.h:102:22: error: '__WAIT_STATUS' was not declared in this scope
extern __pid_t wait (__WAIT_STATUS __stat_loc);
And even after removing the two flags again the message won't go away. I created a new directory and went through the whole process of setting it up for cross-compilation in CMake-GUI and the error message keeps popping up when attempting to build. Other than setting the CMAKE_CXX_FLAGS, I did not knowingly make any changes between the last successful build and the failed one.
Any idea what could be causing it?
[Updated on: Thu, 14 September 2017 20:01]
|
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772791 is a reply to message #1772779] |
Fri, 15 September 2017 09:03 |
Marc Jakobi Messages: 67 Registered: April 2017 |
Member |
|
|
No, I'm still on the release branch. I just reinstalled the Raspberry Pi cross-compile toolchain and updated the sysroot. I will try uninstalling it and reinstalling it now.
I'm cross-compiling from a Windows machine.
Building the Win32 version of forte works fine. It seems like it's just either the Posix version or the Raspberry Pi's sysroot that got corrupted somehow.
Update: Completely removing the toolchain and updating the sysroot fixed the make build error.
I performed the fixes suggested by CPPCheck, but it doesn't seem to fix the issue. And using scanf without field width limits should only cause crashes for "huge input data". The received data can only have a maximum width of 1499.
I caught forte in a frozen state on the RPi this morning. It is still listed in the running processes, but it appears to be stuck in 4diac-IDE's monitoring. I have the CLIENT FB set up so that if the output qualifier is false (i.e. if it receives an HTTP 500 INTERNAL_SERVER_ERROR response), it re-initializes the CLIENT and attempts to send the request again. This is where it appears stuck. It re-initialized the CLIENT and sent a REQ event, but no CNF event has arrived. It got stuck after about 10 attempts.
Triggering the CNF event manually issues an output event, but nothing arrives at the function block that is connected to the output. I assume this means forte is completely frozen.
For now, I will see if adding a 5 s delay between re-initialization and REQ attempts helps. I know that sending too many requests in quick succession causes an internal server error on the REST device. I will also attempt to add some more stability fixes and logging to my HTTP layer code.
[Updated on: Fri, 15 September 2017 11:35]
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772799 is a reply to message #1772791] |
Fri, 15 September 2017 12:17 |
Jose Maria Jesus Cabral Lassalle Messages: 199 Registered: February 2016 |
Senior Member |
|
|
Good that you found more clues.
First thing, from the little I have read about -fsanitize, it's not available for Windows or ARM compilers, at least in most cases.
Second, the build error was weird, but good that you solved it.
Third, I wasn't sure about the maximum response data, but cleaning up the cppcheck findings is always good.
Fourth, let's go to the important stuff. Let me see if I understood correctly. You have a CLIENT FB using your HTTP layer, and some other FBs that, when you get an error (QO = 0), re-initialize the CLIENT and then try a REQ? Is that right?
If yes: what do you mean by re-initialize? De-initialize and initialize again, or just initialize again? It is always better to de-initialize first.
It looks like the re-initialization doesn't finish and gets stuck (this might explain why you can't send a CNF event manually). When sending the REQ after re-initializing, do you wait for the INITO of the CLIENT FB? This will give you a clue whether it gets stuck there.
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772817 is a reply to message #1772799] |
Fri, 15 September 2017 15:45 |
Marc Jakobi Messages: 67 Registered: April 2017 |
Member |
|
|
Thanks for going through this with me :)
Here's the function block network.
I think it's easiest to understand if I explain how the http layer works.
Using forte's available CIPComLayer class to receive the data is too slow, because usually, the REST server will close the connection before the data is received.
So I created a slightly modified copy of the CIPComLayer (CHttpIPComLayer) that performs all operations (opening the connection, sending data, getting the response and closing the connection) from within the sendData() method.
This way it is fast enough to get the response before the peer closes the connection.
An incoming INIT+ event doesn't actually open a connection. It parses the ID and writes the connection params to a cache, because the socket handler's methods consume the data.
Since closeConnection() is already called at the end of every sendData() call, an INIT- event effectively does nothing.
The com layer is designed in such a way that there are four possible outputs:
INITO+
INITO- (only possible if the ID is invalid or QI is false)
CNF+ (if an HTTP OK response is received and the body contains valid data)
CNF- (if an unexpected response is received, the timeout is exceeded or the response is invalid)
I only just added the checks for invalid responses, so maybe that'll fix it.
To explain the most important parts of the function block network:
If a CNF- event is output, it loops back to the INIT+ event and then to the E_DELAY FB, which then goes back to the REQ event. This loop goes on until a CNF+ event occurs.
Attachment: Capture.PNG (39.08 KB)
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772822 is a reply to message #1772817] |
Fri, 15 September 2017 18:00 |
Marc Jakobi Messages: 67 Registered: April 2017 |
Member |
|
|
I may have found a potential cause for a buffer overflow in the CHttpIPComLayer.
I forgot to reset the m_acRecvBuffer and its m_unBufFillSize at the beginning of the sendData() method.
So it only got reset if receiving the response failed. This could also explain why it worked on my Windows laptop and not on the Raspberry Pi.
My Windows laptop communicates over a WAN connection, while the RPi is connected to the same LAN as the REST server.
The WAN requests probably failed more often, causing the buffer to be reset before it could overflow, while the LAN connection may have resulted in enough successful requests for m_unBufFillSize to grow and grow.
I will deploy the fix and see if that solves the problem.
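The reset described above can be sketched roughly as follows. This is only an illustration under assumptions: RecvBuffer and its methods are hypothetical stand-ins for m_acRecvBuffer and m_unBufFillSize, not the actual CHttpIPComLayer code, and the capacity of 1500 is just a placeholder value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the layer's receive buffer. The names only
// mirror m_acRecvBuffer / m_unBufFillSize from the post; this is not FORTE code.
class RecvBuffer {
public:
  static constexpr std::size_t kCapacity = 1500; // placeholder size

  // Clear contents and fill level. Intended to be called at the start
  // of every request, not only after a failed receive.
  void reset() {
    std::memset(mData, 0, sizeof(mData));
    mFill = 0;
  }

  // Append up to len bytes. Returns the number of bytes actually copied,
  // so the fill level can never grow past the capacity.
  std::size_t append(const char *src, std::size_t len) {
    assert(src != nullptr || len == 0);
    const std::size_t room = kCapacity - mFill;
    const std::size_t n = (len < room) ? len : room;
    if (n > 0) {
      std::memcpy(mData + mFill, src, n);
    }
    mFill += n;
    return n;
  }

  std::size_t fill() const { return mFill; }
  std::size_t room() const { return kCapacity - mFill; }

private:
  char mData[kCapacity] = {};
  std::size_t mFill = 0;
};
```

Calling reset() as the first statement of each send/receive cycle keeps one request's leftovers from accumulating into the next, and the clamped append() bounds the fill level even if a reset is missed.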
Update: It froze after about 16 hours...
I can still initialize a connection with a CSIFB I added for debugging purposes, but I cannot request data from it.
I also have a CSV_WRITER_6 function block that freezes in the middle of writing a line.
Update2: I just saw that calling INIT- on the function block actually deletes the whole communication stack. So I guess doing so could actually fix it. I will give this a try.
Update3: De-initializing first doesn't fix it either. This time it froze after 21 hours.
[Updated on: Mon, 18 September 2017 06:59]
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772871 is a reply to message #1772822] |
Mon, 18 September 2017 07:44 |
Jose Maria Jesus Cabral Lassalle Messages: 199 Registered: February 2016 |
Senior Member |
|
|
Thanks for the detailed info! I went through your HTTP layer code. If forte freezes, it is normally because of an infinite loop. I think you might have a problem with this section of code:
while ((cg_unIPLayerRecvBufferSize - m_unBufFillSize) <= 0) {
#ifdef WIN32
  Sleep(0);
#else
  sleep(0);
#endif
}
Once the code enters this loop, I don't see how it can get out. Maybe I missed something, but you could put a DEVLOG there to see when this section of code is entered. If your buffer is full, it won't leave this loop.
Here:
openConnection();
if (0 >= CIPComSocketHandler::sendDataOnTCP(m_nSocketID, request, pa_unSize)) {
  m_eInterruptResp = e_ProcessDataSendFailed;
}
you could first check the return value of openConnection() before trying to send, and also break out of the while loop if it fails, to avoid waiting for the timeout and to skip handleConnectedDataRecv().
Regarding the CSV_WRITER, I'm not sure why it freezes. The only possible problem I see is if one of the inputs of the FB is longer than 100 characters, but even then it shouldn't freeze, just write a shorter string.
And as always, I recommend adding a lot of DEVLOG output, especially where the FB could fail (and in the corresponding success branch), since this code is not compiled when a NODEVLOG configuration is selected.
[Updated on: Mon, 18 September 2017 07:46]
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772884 is a reply to message #1772871] |
Mon, 18 September 2017 11:29 |
Marc Jakobi Messages: 67 Registered: April 2017 |
Member |
|
|
Thanks for the hint. That may have been just what I needed.
The while loop in handleConnectedDataRecv() is actually residue from forte's built-in ipcomlayer (on which the httpipcomlayer is based). Technically, it shouldn't be needed, because the HTTP response should never be longer than 55 characters.
But I was able to catch the class with the following corrupted buffer in debug mode:
HTTP/1.1 200 OK
Content-Type: text/plain
1858.656
e="http[192.168.10.193:7000/rest/devices/battery/M03]" forced="false"></Data></Port><Port name="QI"><Data value="TRUE" forced="false"></Data></Port><Port name="QO"><Data value="FALSE" forced="false"></Data></Port><Port name="STATUS"><Data value="" forced="false"></Data></Port><Port name="RD_1"><Data value="0" forced="false"></Data></Port><Port name="INIT"><Data value="0" time="0"></Data></Port><Port name="REQ"><Data value=
It appears to be mixed with monitoring info being sent to 4diac-IDE.
I changed the while loop to the following:
if ((cg_unIPLayerRecvBufferSize - m_unBufFillSize) <= 0) {
  // If buffer is full, clear and return
  memset(&m_acRecvBuffer[0], 0, sizeof(m_acRecvBuffer));
  m_unBufFillSize = 0;
  m_eInterruptResp = e_ProcessDataRecvFaild;
  DEVLOG_INFO("HTTP recv buffer full\n");
  return;
}
So if it enters, it clears the buffer and notifies a failure.
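One detail about the condition itself may be worth checking. Assuming both cg_unIPLayerRecvBufferSize and m_unBufFillSize are unsigned (the "un" prefix suggests so, but this is an assumption), the difference can never be negative: if the fill size ever exceeded the buffer size, the subtraction would wrap around to a large positive value and the check would not fire. A minimal sketch with hypothetical helper names, comparing the two forms:

```cpp
#include <cassert>

// Hypothetical helper, not part of FORTE: true when no room is left.
// Comparing directly avoids the unsigned subtraction, which would wrap
// around to a large positive value if fillSize ever exceeded capacity.
inline bool bufferIsFull(unsigned int capacity, unsigned int fillSize) {
  return fillSize >= capacity;
}

// The subtraction-based test from the post, reproduced for comparison.
// With unsigned operands it is only true when both values are equal.
inline bool bufferIsFullBySubtraction(unsigned int capacity, unsigned int fillSize) {
  return (capacity - fillSize) <= 0;
}

// Room left in the buffer. The precondition documents the invariant that
// makes the subtraction safe.
inline unsigned int remainingRoom(unsigned int capacity, unsigned int fillSize) {
  assert(fillSize <= capacity);
  return capacity - fillSize;
}
```

Both forms agree as long as the fill size never exceeds the capacity. The direct comparison additionally catches the overrun case, which is the situation a corrupted buffer would produce.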
I also noticed some more potential issues.
The REST server neither includes a content-length in its response headers, nor does it send the data in encoded chunks. I was relying on the detection of "\r\n\r\n" to end the loop in which the response is received. I didn't take into account that a response to a GET request has an additional body.
As a result, maybe the '\0' terminator wasn't always received (although I would have assumed that the socket handler's receiveDataFromTCP() method terminates the data it receives).
I changed it so that it always waits for the peer to close the connection (so far I haven't noticed much of a performance impact). That seems to be the only way to ensure that all of the data is received.
If it works well, I will later add some code to detect if the header has a content-length field or if the data is sent in encoded chunks.
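The Content-Length detection mentioned above could look roughly like this. It is a sketch under assumptions: HttpFraming, inspectResponse() and bodyComplete() are hypothetical names, the input is assumed to be the bytes received so far collected into a std::string, and chunked transfer encoding is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not FORTE API).
struct HttpFraming {
  bool headerComplete = false;   // "\r\n\r\n" was found
  bool hasContentLength = false; // a Content-Length header was present
  std::size_t contentLength = 0; // value of that header, if present
  std::size_t bodyOffset = 0;    // index of the first body byte
};

// Lower-case copy, so header names can be matched case-insensitively.
static std::string toLower(const std::string &s) {
  std::string out(s);
  for (char &c : out) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return out;
}

// Locate the end of the header block and read Content-Length if present.
HttpFraming inspectResponse(const std::string &raw) {
  HttpFraming f;
  const std::size_t headerEnd = raw.find("\r\n\r\n");
  if (headerEnd == std::string::npos) {
    return f; // header not fully received yet
  }
  f.headerComplete = true;
  f.bodyOffset = headerEnd + 4;

  const std::string header = toLower(raw.substr(0, headerEnd));
  const std::string key = "\r\ncontent-length:";
  const std::size_t pos = header.find(key);
  if (pos != std::string::npos) {
    std::size_t i = pos + key.size();
    while (i < header.size() && header[i] == ' ') {
      ++i;
    }
    while (i < header.size() && std::isdigit(static_cast<unsigned char>(header[i]))) {
      f.contentLength = f.contentLength * 10 + static_cast<std::size_t>(header[i] - '0');
      f.hasContentLength = true;
      ++i;
    }
  }
  return f;
}

// True when the body is known to be complete without waiting for the peer
// to close the connection. Only decidable here if Content-Length is known.
bool bodyComplete(const std::string &raw, const HttpFraming &f) {
  assert(f.bodyOffset <= raw.size());
  return f.headerComplete && f.hasContentLength &&
         raw.size() - f.bodyOffset >= f.contentLength;
}
```

For a server like the one described here, which sends neither Content-Length nor chunks, bodyComplete() never returns true, so waiting for the peer to close the connection remains the fallback.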
By the way, the code I am currently using is on the development branch of the project. I will merge it over to master if there are no new issues.
I have DEVLOGs in all critical places and will see if I can start forte in a terminal that saves the output to a text file.
|
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772894 is a reply to message #1772886] |
Mon, 18 September 2017 13:23 |
|
receiveDataFromTCP() will not add '\0' to your string. It will not even give you a valid C string! It will just give you an array of bytes that were received from the network, and the return value of receiveDataFromTCP() is the number of bytes that have been written into that array. The content of these bytes depends on the application protocol. For example, the default encoding we use for publish and subscribe will contain several '\0' characters, which are in fact 0x00 values in the byte array. If the data you are receiving is to be interpreted as a string, you need to do your own application-layer-specific string handling.
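A minimal sketch of such application-level handling, assuming the received bytes sit in a plain char array and the receive call reported a byte count. bytesToString() and its parameter names are hypothetical, not part of FORTE:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build a string from exactly `received` bytes instead of relying on a
// '\0' terminator. `received` stands for the byte count that a receive
// call reported; zero or negative values yield an empty string.
std::string bytesToString(const char *buffer, std::size_t capacity, long received) {
  assert(buffer != nullptr);
  if (received <= 0) {
    return std::string();
  }
  std::size_t n = static_cast<std::size_t>(received);
  if (n > capacity) {
    n = capacity; // never read past the end of the array
  }
  return std::string(buffer, n);
}
```

Because the length is passed explicitly, embedded 0x00 bytes are preserved and stale data beyond the reported count is never read.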
|
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772909 is a reply to message #1772899] |
Mon, 18 September 2017 15:30 |
|
Yes, this should be sufficient. One reason for the corruption could be that, as monitoring needs quite some space, we have a memory overrun in the IP layer. This should not really happen, but who knows. You could try monitoring only very few values and see if this still happens, or increase the receive buffer size CMake option.
Monitoring uses port 61499 by default, the same as for downloading. Furthermore, in monitoring forte is the server, while in your case forte is the client. So I would expect that it is a rather strange memory issue. I'm currently traveling, but I could try to look at your code later this week.
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772926 is a reply to message #1772909] |
Mon, 18 September 2017 19:32 |
Marc Jakobi Messages: 67 Registered: April 2017 |
Member |
|
|
Okay, it froze again.
I will have to add some more log entries again.
At the end of the current log, there are 7 repetitions of
INFO: T#014761348ms: Unexpected HTTP GET response code
INFO: T#014761348ms: CBSDSocketInterface: Opening TCP-Client connection at: 192.168.10.101:7979
So it happens at some point after opening the connection. Could receiveDataFromTCP() run an infinite loop under any circumstances?
I have spammed the code with log entries and will hopefully be able to pinpoint it more precisely.
Update:
Here's the end of the new log:
INFO: T#017081859ms: Handling received HTTP response
INFO: T#017081859ms: Unexpected HTTP PUT response code
INFO: T#017081859ms: CBSDSocketInterface: Opening TCP-Client connection at: 192.168.10.101:7979
INFO: T#017081860ms: Sending request on TCP
INFO: T#017081860ms: Attempting to receive data from TCP
The last line is just before receiveDataFromTCP(), so I guess that method must somehow be causing an infinite loop when the remote server has an error (unless for some reason, closeSocket() is the culprit).
I suppose I will have to put it in a separate thread with a timeout or something.
[Updated on: Tue, 19 September 2017 07:53]
|
|
|
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772956 is a reply to message #1772947] |
Tue, 19 September 2017 12:51 |
Marc Jakobi Messages: 67 Registered: April 2017 |
Member |
|
|
Thanks so much!
That's exactly the kind of thing I was looking for. Much simpler than a separate thread!
I still have one small problem though. I added the following code to handleConnectedDataRecv()
struct timeval tv; // Timeout
tv.tv_sec = 10;
tv.tv_usec = 10000;
fd_set fdset;
FD_ZERO(&fdset);
FD_SET(m_nSocketID, &fdset);
if (select(1, &fdset, NULL, NULL, &tv) > 0) {
  // call receiveDataFromTCP()
}
else {
  nRetVal = -1;
  DEVLOG_INFO("No data received from TCP\n");
}
It works fine on the Win32 version of forte, but on the Posix version, the select() return value is never greater than 0.
Update:
I got it to work by passing (m_nSocketID + 1) as the first argument to select().
Thanks again! I will post an update if it is still running smoothly in the next 2 days.
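For reference, the working variant can be sketched as a small POSIX-only helper. waitReadable() is a hypothetical name, not FORTE API. On POSIX systems the first argument of select() is the highest descriptor number plus one, whereas Winsock ignores that argument, which would explain why the original call only worked in the Win32 build.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper (POSIX only): wait until fd is readable or the
// timeout expires. Returns >0 if readable, 0 on timeout, -1 on error.
int waitReadable(int fd, long seconds, long microseconds) {
  assert(fd >= 0 && fd < FD_SETSIZE);

  fd_set readSet;
  FD_ZERO(&readSet);
  FD_SET(fd, &readSet);

  struct timeval tv;
  tv.tv_sec = seconds;
  tv.tv_usec = microseconds;

  // The first argument is the highest descriptor number plus one,
  // not the number of descriptors in the set.
  return select(fd + 1, &readSet, nullptr, nullptr, &tv);
}
```

A receive call would then only be issued when waitReadable() returns a value greater than zero. A return of 0 can be treated as a timeout and reported as a failed request instead of blocking indefinitely.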
[Updated on: Tue, 19 September 2017 12:59]
|
|
|
|
|
|
Re: Forte stops running on Raspberry Pi after a certain time [message #1772985 is a reply to message #1772959] |
Tue, 19 September 2017 17:54 |
Marc Jakobi Messages: 67 Registered: April 2017 |
Member |
|
|
No problem :)
The first iteration of the HTTP com layer actually worked in that way, using forte's standard IP com layer that the FBDK layer uses as its bottom layer.
But it turned out to have a lot of issues.
HTTP servers usually close the connection immediately after sending the response. In the HTTP standard, this is intended behaviour, while in forte, it is treated as unwanted.
As a result, the process would be interrupted (often before the response was handled), causing the function block to issue an INIT- event almost immediately after opening the connection and receiving data.
Attempts to work around the processInterrupt quickly got messy, so I decided to create my own IP layer that is based on the FORTE one.
But if the current version works and I end up adding it to FORTE after my thesis, I could create a common parent class for the two IP layers with shared code in its protected methods.
Update:
I played around with it and got it working with forte's select loop (see development branch). But I think either changing the standard CIPComLayer's private methods to protected and subclassing it or creating a common parent class would be necessary to truly reduce code duplication.
I will leave the changes in the development branch for now, since the master branch seems to be working so far.
[Updated on: Wed, 20 September 2017 06:09]