[
Date Prev][
Date Next][
Thread Prev][
Thread Next][
Date Index][
Thread Index]
[
List Home]
| Re: [paho-dev] MQTT C Client - Segmentation fault | 
Hi Ian
Just at the same time I added a bug. And now just marked it as a
duplicate of yours :-).
Thanks a lot for looking into this and fixing this so quickly. This is
a really cool fast feedback loop.
Regards
Franz
On Fri, Oct 17, 2014 at 11:56 AM, Ian Craggs
<icraggs@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Franz,
>
> I've raised a bug - it's 447672.  I have also implemented a proposed fix,
> which I'll be adding to the development branch very soon.
>
> Ian
>
>
> On 10/16/2014 09:08 PM, Franz Schnyder wrote:
>
> I'm still working on implement a MQTT-SN gateway using the Paho 'MQTT
> C Client for Posix and Windows'. And now had another problem. This
> time I think it is a concurrency bug of the MQTT.C library:
>
> My gateway run for more than a week an crashed with a Segmentation
> fault. My guess is that the Segmentation fault is due to a race
> condition caused by a missing lock(mutex) for s.cur_clientsds inside
> of Socket.c.
>
> Followed I provide the information and reasoning for my guess:
>
> Info of the segmentation fault:
>
>    Program received signal SIGSEGV, Segmentation fault.
>    [Switching to Thread 0xb4509460 (LWP 17845)]
>    0x00080750 in Socket_getReadySocket (more_work=0, tp=0xb4508d88)
>        at source/lib/Mqtt.C/src/Socket.c:278
>
> the code there looks like this:
>
>    274 if (s.cur_clientsds == NULL)
>    275    rc = 0;
>    276 else
>    277 {
>    278    rc = *((int*)(s.cur_clientsds->content));
>    279    ListNextElement(s.clientsds, &s.cur_clientsds);
>    280 }
>
> the backtrace of the crash is:
>
>    (gdb) bt
>    #0  0x00080750 in Socket_getReadySocket (more_work=0, tp=0xb4508d88)
>        at source/lib/Mqtt.C/src/Socket.c:278
>    #1  0x00077518 in MQTTClient_cycle (sock=0xb4508dc4, timeout=1000,
> rc=0xb4508dc8)
>        at source/lib/Mqtt.C/src/MQTTClient.c:1547
>    #2  0x00074a90 in MQTTClient_run (n=0x12a9b4)
>        at source/lib/Mqtt.C/src/MQTTClient.c:483
>    #3  0xb6e40bfc in start_thread (arg=0xb4509460)
>        at pthread_create.c:306
>    #4  0xb6dd5968 in ?? ()
>        at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116
>
> and a the variables are:
>
>    (gdb) p s
>    $1 = {rset = {__fds_bits = {2048, 0 <repeats 31 times>}}, rset_saved = {
>        __fds_bits = {1024, 0 <repeats 31 times>}}, maxfdp1 = 11,
>      clientsds = 0x12a774, cur_clientsds = 0x0, connect_pending = 0x12a804,
>      write_pending = 0x12a894, pending_wset = {__fds_bits = {
>          0 <repeats 32 times>}}}
>    (gdb) p s.cur_clientsds
>    $2 = (ListElement *) 0x0
>
> The Segmentation fault is because s.cur_clientsds is NULL when line
> 278 is executed. Since line 278 is actually inside an else of a
> (s.cur_clientsds == NULL) check I guess that  s.cur_clientsds must
> have be changed by an other thread in between the check (line 274) and
> the access (line 278). So I searched where s.cur_clientsds is changed
> and found that the Socket_close() function sets it to NULL:
>
>    void Socket_close(int socket)
>    {
>       FUNC_ENTRY;
>       Socket_close_only(socket);
>       FD_CLR(socket, &(s.rset_saved));
>       if (FD_ISSET(socket, &(s.pending_wset)))
>          FD_CLR(socket, &(s.pending_wset));
>       if (s.cur_clientsds != NULL && *(int*)(s.cur_clientsds->content)
> == socket)
>          s.cur_clientsds = s.cur_clientsds->next;
>       ...
>
> Therefore s.cur_clientsds is accessed and changed by multiple threads:
> By the libraries 'worker thread' during the MQTTClient_run() and by a
> client thread when the client calls MQTTClient_disconnect() since this
> calls Socket_close(). So I checked the call trees for mutex locks: The
> Socket_close() is inside a Thread_lock_mutex(mqttclient_mutex) but the
> Socket_getReadySocket() is not inside any mutex lock since
> MQTTClient_cycle() is explicitly called outside of the
> mqttclient_mutex:
>
>    Thread_unlock_mutex(mqttclient_mutex);
>    pack = MQTTClient_cycle(&sock, timeout, &rc);  ==>
> Socket_getReadySocket()
>    Thread_lock_mutex(mqttclient_mutex);
>
> Therefore I think my guess is correct that a race condition, because
> of missing synchronization for s.cur_clientsds, caused the
> Segmentation fault.
>
> My quick solution would be to introduce a new socket_mutex inside
> Socket.c to protect accesses to :
>    /**
>    * Structure to hold all socket data for the module
>    */
>    Sockets s;
>
> but I would need to invest some more time to fully grasp the locking
> strategy of the library. Therefore should I try to invest some time to
> understand the internals and introduce this mutex to create a patch?
> Or is it not worth my effort because one of the commiters quickly can
> confirm and fix this problem.
>
>
> --
> Ian Craggs
> icraggs@xxxxxxxxxx                 IBM United Kingdom
> Paho Project Lead; Committer on Mosquitto
>
>
> _______________________________________________
> paho-dev mailing list
> paho-dev@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from
> this list, visit
> https://dev.eclipse.org/mailman/listinfo/paho-dev