File descriptors leak (Some sockets don't seem to be properly closed under certain circumstances)
File descriptors leak [message #1839865] Tue, 30 March 2021 14:34
Alfredo Quesada
After having detected a problem with a C app I developed that uses the mosquitto library, I finally found where the problem is and how to force it.

Attached there's a simple test source file that I've used to show how to force the problem (I've skipped some checks, that was not the point).

First of all, my scenario includes a Raspberry Pi with Raspberry Pi OS (the new name for Raspbian) Buster (2021-01-11 / kernel 5.4.83 and libmosquitto1 1.5.7-1+deb10u1) connected using a switch to my PC which has 2 network interfaces. The machine hosts the mosquitto server and a DHCP server (DHCP for the internal network including the RBPI).

Under normal circumstances everything works fine and the file descriptors related to the socket used by the library are closed and reused once it's time to reconnect. If I just stop the mosquitto server, the TCP/SYN gets no response and the TCP layer works fine.

However, if I put down the internal interface from the PC right after closing mosquitto, in some cases the socket isn't fully closed and the file descriptor remains open. As a result a new call to mosquitto_connect_async (and indirectly to socket) returns a new file descriptor.

The steps to follow to reproduce the problem are these:
- Start mosquitto (server) with a configuration that allows anonymous users to connect.
- Start the app (RBPI).
- Wait until the connection is established. You should see a debug message main: WAITING_ACK_CONNECT / returnCode 0.
- In another terminal get the PID of the app and execute sudo lsof -p $PID. You should see something like this:
app 8039 pi 5u IPv4 55275 0t0 TCP 192.168.1.100:58976->192.168.1.88:1883 (ESTABLISHED)
- Stop mosquitto and disable the internal interface at the server with sudo ip link set dev ethX down.
- In the output of the app you should see it trying to reconnect and you should see mosquitto_connect_async getting a new file descriptor in some cases. If you don't, just repeat the process (enable the interface at the PC, start mosquitto again, wait until the app connects and wait for a while to stop mosquitto and disable the interface).
- Once this happens, you can check lsof again and you'll see now multiple entries for those file descriptors that haven't been closed.

Is there any way to fix this or prevent this from happening? As you can imagine, eventually the app runs out of file descriptors and you get an errno 24 (EMFILE -> Too many open files) when you call mosquitto_connect_async. After that there isn't much more you can do, at least not in a thread-safe way.
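
Editor's sketch: one way to notice the leak from inside the app, before it reaches EMFILE, is to count the descriptors the process currently has open and compare that against a baseline taken after the first successful connection. The helper name `count_open_fds` and the 65536 cap are assumptions made for illustration, not part of libmosquitto or of the attached main.c.

```c
#include <fcntl.h>
#include <sys/resource.h>

/* Hypothetical helper: count the descriptors currently open in this
 * process by probing every candidate number with fcntl(F_GETFD). */
static int count_open_fds(void)
{
    struct rlimit lim;
    long max_fd = 1024;   /* fallback if getrlimit() fails */
    long fd;
    int count = 0;

    /* Scan up to the soft RLIMIT_NOFILE limit, capped at an arbitrary
     * 65536 so the loop stays cheap when the limit is very large. */
    if (getrlimit(RLIMIT_NOFILE, &lim) == 0)
        max_fd = lim.rlim_cur < 65536 ? (long)lim.rlim_cur : 65536;

    for (fd = 0; fd < max_fd; fd++) {
        /* fcntl() fails with EBADF for numbers that are not open. */
        if (fcntl((int)fd, F_GETFD) != -1)
            count++;
    }
    return count;
}
```

If the count keeps growing across reconnect attempts, the process is leaking descriptors, which matches the extra entries that show up in lsof.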

Regards
  • Attachment: main.c
    (Size: 3.37KB, Downloaded 80 times)

[Updated on: Tue, 30 March 2021 17:14]

Re: File descriptors leak [message #1839962 is a reply to message #1839865] Fri, 02 April 2021 10:07
Roger Light
Thank you for the description and example program, it's made it much easier - although it's taken me more than two nights to track down.

This is now fixed in the `fixes` branch and will be part of 2.0.10 - I'll look into back porting it to 1.6.x and 1.5.x as well, although they won't be getting a release for a while.

A general note for you though, if you're using `mosquitto_loop_start()`, then the library will try and reconnect for you.

Re: File descriptors leak [message #1839963 is a reply to message #1839962] Fri, 02 April 2021 10:29
Alfredo Quesada
That's good to hear, thank you :)

Just out of curiosity, where was the problem? Although I didn't properly debug the library because I didn't want to have to recompile it, I analyzed the main parts of the source code and they looked correct. I was starting to think there was a problem with close considering it may not close the file descriptor in certain cases as stated in the man page.

Well, there's actually a well-known potential leak point there that can't always be fixed as it depends on the implementation and POSIX is not 100% strict on this function's behavior.
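
Editor's sketch of the close issue mentioned above, assuming Linux semantics: there the descriptor is released even when close fails with EINTR, so retrying can end up closing an unrelated descriptor that another thread has just been given. Other systems may behave differently. The wrapper name `close_once` is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: close *fd at most once and mark it invalid.
 * Assumption: Linux behaviour, where the descriptor is gone even if
 * close() reports EINTR, so close() is never retried here. */
static int close_once(int *fd)
{
    int rc;

    if (*fd < 0)
        return 0;          /* nothing to close */

    rc = close(*fd);
    *fd = -1;              /* never reuse the old number, whatever rc is */

    if (rc == -1 && errno == EINTR)
        return 0;          /* treated as closed under the Linux assumption */
    return rc;
}
```

Resetting the variable to -1 in the same place as the close call also makes a second call harmless.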

Regards

[Updated on: Fri, 02 April 2021 10:30]

Re: File descriptors leak [message #1839985 is a reply to message #1839962] Fri, 02 April 2021 21:35
Alfredo Quesada
Ok, I just took a look at the patch and I understand the problem. Because a temporary variable was used, the socket it referred to wouldn't be closed if net__try_connect didn't return MOSQ_ERR_SUCCESS.

However this should happen (and that was something to be fixed indeed) as long as the server was out of reach including those cases when it was just offline. IIRC I managed to reproduce the problem only with the put-the-link-down thing and just stopping the server wasn't enough. I'll try to recheck it next week when I'm back to work in order to confirm it.
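
Editor's sketch of the temporary-variable pattern described above. This is not the libmosquitto code or the actual patch: `client_sketch`, `connect_sketch` and the `connect_ok` flag are hypothetical stand-ins, and the connect attempt is simulated so the example runs without a broker.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical client state: sock is -1 while there is no connection. */
struct client_sketch {
    int sock;
};

/* Create a socket in a local variable and publish it to c->sock only
 * on success. connect_ok simulates the result of the connect attempt. */
static int connect_sketch(struct client_sketch *c, int connect_ok)
{
    int tmp = socket(AF_INET, SOCK_STREAM, 0);

    if (tmp < 0)
        return -1;

    if (!connect_ok) {
        /* Only the local variable knows about tmp at this point.
         * Returning without this close() leaks the descriptor, because
         * c->sock is still -1 and later cleanup cannot find it. */
        close(tmp);
        return -1;
    }

    c->sock = tmp;
    return 0;
}
```

With the close call on the failure path, every failed attempt gives its descriptor back, so the next attempt reuses the same number instead of getting a new one.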

Regards

[Updated on: Fri, 02 April 2021 23:24]

Re: File descriptors leak [message #1840555 is a reply to message #1839985] Mon, 19 April 2021 10:05
allen deng
In my opinion, this may be a problem caused by thread cancellation:
- Since the socket created in the thread started by mosquitto_loop_start is non-blocking, the thread waits inside the connect function for it to return.
- Because this is asynchronous, when we call mosquitto_loop_stop the thread is cancelled. connect is a thread cancellation point defined by POSIX, so the thread exits directly from connect without closing the socket.
- Because net__try_connect is called with a local variable, the value in mosq->sock is still -1, so the socket is not closed correctly when mosquitto__reconnect is called again.
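
Editor's sketch: the cancellation hypothesis above is not confirmed anywhere in this thread, but the general POSIX way to keep a descriptor from leaking when a thread is cancelled at a cancellation point is a cleanup handler. All names below are hypothetical, and sleep stands in for a blocking connect call. This is not how libmosquitto is written.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-thread state shared with the cleanup handler. */
struct worker_state {
    int fd;           /* -1 while no socket is open */
    int cleaned_up;   /* set to 1 once the handler has run */
};

/* Cleanup handler: runs if the thread is cancelled (or on normal exit,
 * because of pthread_cleanup_pop(1) below). */
static void close_socket_on_cancel(void *arg)
{
    struct worker_state *st = arg;

    if (st->fd >= 0) {
        close(st->fd);
        st->fd = -1;
    }
    st->cleaned_up = 1;
}

static void *worker(void *arg)
{
    struct worker_state *st = arg;

    /* Register the handler before the socket exists, so there is no
     * window in which a cancellation could skip the close. */
    pthread_cleanup_push(close_socket_on_cancel, st);
    st->fd = socket(AF_INET, SOCK_STREAM, 0);
    sleep(30);   /* cancellation point, stand-in for a blocking connect() */
    pthread_cleanup_pop(1);
    return NULL;
}

/* Start the worker, cancel it, and report whether the handler ran. */
static int cancel_worker_demo(struct worker_state *st)
{
    pthread_t tid;

    st->fd = -1;
    st->cleaned_up = 0;
    if (pthread_create(&tid, NULL, worker, st) != 0)
        return -1;
    pthread_cancel(tid);
    pthread_join(tid, NULL);
    return st->cleaned_up;
}
```

With default (deferred) cancellation the thread only stops at a cancellation point, and the handler closes the socket on the way out.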

What's your view? @Alfredo Quesada @Roger Light