[paho-dev] MQTT C Client - Segmentation fault
Hi
I'm still working on implementing an MQTT-SN gateway using the Paho 'MQTT
C Client for Posix and Windows', and I have now hit another problem. This
time I think it is a concurrency bug in the MQTT C library:
My gateway ran for more than a week and then crashed with a segmentation
fault. My guess is that the segmentation fault is due to a race
condition caused by a missing lock (mutex) for s.cur_clientsds inside
Socket.c.
Below I provide the information and reasoning behind my guess.
Info on the segmentation fault:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb4509460 (LWP 17845)]
0x00080750 in Socket_getReadySocket (more_work=0, tp=0xb4508d88)
at source/lib/Mqtt.C/src/Socket.c:278
The code there looks like this:
274     if (s.cur_clientsds == NULL)
275         rc = 0;
276     else
277     {
278         rc = *((int*)(s.cur_clientsds->content));
279         ListNextElement(s.clientsds, &s.cur_clientsds);
280     }
The backtrace of the crash is:
(gdb) bt
#0 0x00080750 in Socket_getReadySocket (more_work=0, tp=0xb4508d88)
at source/lib/Mqtt.C/src/Socket.c:278
#1 0x00077518 in MQTTClient_cycle (sock=0xb4508dc4, timeout=1000,
rc=0xb4508dc8)
at source/lib/Mqtt.C/src/MQTTClient.c:1547
#2 0x00074a90 in MQTTClient_run (n=0x12a9b4)
at source/lib/Mqtt.C/src/MQTTClient.c:483
#3 0xb6e40bfc in start_thread (arg=0xb4509460)
at pthread_create.c:306
#4 0xb6dd5968 in ?? ()
at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116
and the variables are:
(gdb) p s
$1 = {rset = {__fds_bits = {2048, 0 <repeats 31 times>}}, rset_saved = {
__fds_bits = {1024, 0 <repeats 31 times>}}, maxfdp1 = 11,
clientsds = 0x12a774, cur_clientsds = 0x0, connect_pending = 0x12a804,
write_pending = 0x12a894, pending_wset = {__fds_bits = {
0 <repeats 32 times>}}}
(gdb) p s.cur_clientsds
$2 = (ListElement *) 0x0
The segmentation fault happens because s.cur_clientsds is NULL when line
278 is executed. Since line 278 is inside the else branch of the
(s.cur_clientsds == NULL) check, s.cur_clientsds must have been changed
by another thread between the check (line 274) and the access (line
278). So I searched for places where s.cur_clientsds is changed and
found that the Socket_close() function modifies it, possibly setting it to NULL:
void Socket_close(int socket)
{
    FUNC_ENTRY;
    Socket_close_only(socket);
    FD_CLR(socket, &(s.rset_saved));
    if (FD_ISSET(socket, &(s.pending_wset)))
        FD_CLR(socket, &(s.pending_wset));
    if (s.cur_clientsds != NULL && *(int*)(s.cur_clientsds->content) == socket)
        s.cur_clientsds = s.cur_clientsds->next;
    ...
So s.cur_clientsds is accessed and changed by multiple threads: by the
library's worker thread during MQTTClient_run(), and by a client thread
when the client calls MQTTClient_disconnect(), since that calls
Socket_close(). I then checked the call trees for mutex locks:
Socket_close() runs inside a Thread_lock_mutex(mqttclient_mutex)
section, but Socket_getReadySocket() is not protected by any mutex,
because MQTTClient_cycle() is explicitly called outside of the
mqttclient_mutex:

Thread_unlock_mutex(mqttclient_mutex);
pack = MQTTClient_cycle(&sock, timeout, &rc); ==> Socket_getReadySocket()
Thread_lock_mutex(mqttclient_mutex);

Therefore I think my guess is correct: a race condition, caused by the
missing synchronization for s.cur_clientsds, led to the segmentation
fault.
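To make the suspected interleaving concrete, here is a minimal
standalone program (all names here are my own, not library code) that
reproduces the same unsynchronized check-then-use pattern; depending on
timing it can take a while before the race fires and it crashes:

/* race_sketch.c - build with: gcc -pthread race_sketch.c -o race_sketch */
#include <pthread.h>

typedef struct
{
    int *content;
} ListElement;

/* plays the role of s.cur_clientsds; volatile so the pointer is re-read
   between the check and the use */
static ListElement * volatile cur;

/* mimics the check-then-use in Socket_getReadySocket() */
static void *worker(void *arg)
{
    volatile int sink = 0;
    (void)arg;
    for (;;)
    {
        if (cur != NULL)             /* the check, as on line 274 */
            sink += *(cur->content); /* the use, as on line 278:
                                        cur may be NULL again by now */
    }
    return NULL;
}

/* mimics Socket_close() resetting the pointer from another thread */
static void *closer(void *arg)
{
    static int sock = 11;
    static ListElement elem = { &sock };
    (void)arg;
    for (;;)
    {
        cur = &elem;
        cur = NULL;
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, closer, NULL);
    pthread_join(t1, NULL); /* runs until the race fires and it crashes */
    return 0;
}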
My quick solution would be to introduce a new socket_mutex inside
Socket.c to protect all accesses to the module's shared socket data
(a sketch follows below the declaration):
/**
 * Structure to hold all socket data for the module
 */
Sockets s;
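A rough, untested sketch of what I have in mind follows. I am assuming
the same Thread_* mutex helpers that the library already uses for
mqttclient_mutex (mutex_type, Thread_lock_mutex, Thread_unlock_mutex);
the mutex would have to be created once during socket module
initialization, and every other access to s would need the same guard:

/* Socket.c - sketch only, not a tested patch */
static mutex_type socket_mutex; /* created during socket module init */

int Socket_getReadySocket(int more_work, struct timeval *tp)
{
    int rc = 0;
    ...
    Thread_lock_mutex(socket_mutex);
    if (s.cur_clientsds == NULL)
        rc = 0;
    else
    {
        rc = *((int*)(s.cur_clientsds->content));
        ListNextElement(s.clientsds, &s.cur_clientsds);
    }
    Thread_unlock_mutex(socket_mutex);
    ...
}

void Socket_close(int socket)
{
    ...
    Thread_lock_mutex(socket_mutex);
    if (s.cur_clientsds != NULL && *(int*)(s.cur_clientsds->content) == socket)
        s.cur_clientsds = s.cur_clientsds->next;
    Thread_unlock_mutex(socket_mutex);
    ...
}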
But I would need to invest some more time to fully grasp the locking
strategy of the library. So, should I invest the time to understand the
internals and introduce this mutex as a patch? Or is it not worth the
effort, because one of the committers can quickly confirm and fix this
problem?
Should I raise a bug for this problem?
Thanks and Regards
Franz