Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [paho-dev] MQTT C Client - Segmentation fault

Hi Franz,

yes, please raise a bug. Thanks very much for your analysis. I'll consider a fix today, and let you know.

Ian

On 10/16/2014 09:08 PM, Franz Schnyder wrote:
Hi

I'm still working on implement a MQTT-SN gateway using the Paho 'MQTT
C Client for Posix and Windows'. And now had another problem. This
time I think it is a concurrency bug of the MQTT.C library:

My gateway run for more than a week an crashed with a Segmentation
fault. My guess is that the Segmentation fault is due to a race
condition caused by a missing lock(mutex) for s.cur_clientsds inside
of Socket.c.

Followed I provide the information and reasoning for my guess:

Info of the segmentation fault:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xb4509460 (LWP 17845)]
    0x00080750 in Socket_getReadySocket (more_work=0, tp=0xb4508d88)
        at source/lib/Mqtt.C/src/Socket.c:278

the code there looks like this:

    274 if (s.cur_clientsds == NULL)
    275    rc = 0;
    276 else
    277 {
    278    rc = *((int*)(s.cur_clientsds->content));
    279    ListNextElement(s.clientsds, &s.cur_clientsds);
    280 }

the backtrace of the crash is:

    (gdb) bt
    #0  0x00080750 in Socket_getReadySocket (more_work=0, tp=0xb4508d88)
        at source/lib/Mqtt.C/src/Socket.c:278
    #1  0x00077518 in MQTTClient_cycle (sock=0xb4508dc4, timeout=1000,
rc=0xb4508dc8)
        at source/lib/Mqtt.C/src/MQTTClient.c:1547
    #2  0x00074a90 in MQTTClient_run (n=0x12a9b4)
        at source/lib/Mqtt.C/src/MQTTClient.c:483
    #3  0xb6e40bfc in start_thread (arg=0xb4509460)
        at pthread_create.c:306
    #4  0xb6dd5968 in ?? ()
        at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116

and a the variables are:

    (gdb) p s
    $1 = {rset = {__fds_bits = {2048, 0 <repeats 31 times>}}, rset_saved = {
        __fds_bits = {1024, 0 <repeats 31 times>}}, maxfdp1 = 11,
      clientsds = 0x12a774, cur_clientsds = 0x0, connect_pending = 0x12a804,
      write_pending = 0x12a894, pending_wset = {__fds_bits = {
          0 <repeats 32 times>}}}
    (gdb) p s.cur_clientsds
    $2 = (ListElement *) 0x0

The Segmentation fault is because s.cur_clientsds is NULL when line
278 is executed. Since line 278 is actually inside an else of a
(s.cur_clientsds == NULL) check I guess that  s.cur_clientsds must
have be changed by an other thread in between the check (line 274) and
the access (line 278). So I searched where s.cur_clientsds is changed
and found that the Socket_close() function sets it to NULL:

    void Socket_close(int socket)
    {
       FUNC_ENTRY;
       Socket_close_only(socket);
       FD_CLR(socket, &(s.rset_saved));
       if (FD_ISSET(socket, &(s.pending_wset)))
          FD_CLR(socket, &(s.pending_wset));
       if (s.cur_clientsds != NULL && *(int*)(s.cur_clientsds->content)
== socket)
          s.cur_clientsds = s.cur_clientsds->next;
       ...

Therefore s.cur_clientsds is accessed and changed by multiple threads:
By the libraries 'worker thread' during the MQTTClient_run() and by a
client thread when the client calls MQTTClient_disconnect() since this
calls Socket_close(). So I checked the call trees for mutex locks: The
Socket_close() is inside a Thread_lock_mutex(mqttclient_mutex) but the
Socket_getReadySocket() is not inside any mutex lock since
MQTTClient_cycle() is explicitly called outside of the
mqttclient_mutex:

    Thread_unlock_mutex(mqttclient_mutex);
    pack = MQTTClient_cycle(&sock, timeout, &rc);  ==> Socket_getReadySocket()
    Thread_lock_mutex(mqttclient_mutex);

Therefore I think my guess is correct that a race condition, because
of missing synchronization for s.cur_clientsds, caused the
Segmentation fault.

My quick solution would be to introduce a new socket_mutex inside
Socket.c to protect accesses to :
    /**
    * Structure to hold all socket data for the module
    */
    Sockets s;

but I would need to invest some more time to fully grasp the locking
strategy of the library. Therefore should I try to invest some time to
understand the internals and introduce this mutex to create a patch?
Or is it not worth my effort because one of the commiters quickly can
confirm and fix this problem.

Should I raise a bug for this problem?

Thanks and Regards

Franz
_______________________________________________
paho-dev mailing list
paho-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/paho-dev

--
Ian Craggs
icraggs@xxxxxxxxxx                 IBM United Kingdom
Paho Project Lead; Committer on Mosquitto



Back to the top