[tracecompass-dev] Kernel Lock Analysis

Hello List!

As I did with the LTTng people, I would like to share my experience
playing with kernel lock instrumentation, LTTng and Trace Compass,
which might be useful to others analyzing application behavior.

Currently, the Linux kernel features a set of lock tracepoints which
track lock acquisition and contention [1]:

 - lock_acquire
 - lock_acquired
 - lock_contended
 - lock_release
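
To make the role of each tracepoint concrete, here is a minimal sketch
of the per-thread, per-lock state machine they imply. It assumes the
usual lockstat ordering (lock_acquire when a thread starts taking a
lock, lock_contended only if the lock was not immediately available,
lock_acquired once it is obtained, lock_release when it is dropped).
The state names and the helpers next_state() and replay() are
hypothetical and are not part of LTTng or Trace Compass. Time spent in
the "waiting" state is what a contention view would highlight.

```python
# Hypothetical per-thread, per-lock states derived from the lock tracepoints.
# Assumed lockstat ordering:
#   lock_acquire -> [lock_contended] -> lock_acquired -> lock_release
TRANSITIONS = {
    ("idle", "lock_acquire"): "requesting",
    ("requesting", "lock_contended"): "waiting",
    ("requesting", "lock_acquired"): "holding",   # uncontended fast path
    ("waiting", "lock_acquired"): "holding",      # contention resolved
    ("holding", "lock_release"): "idle",
    # Assumed: a successful trylock may emit lock_acquire but no lock_acquired.
    ("requesting", "lock_release"): "idle",
}


def next_state(state: str, event: str) -> str:
    """Return the state after `event`, or raise if it is unexpected here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unexpected {event!r} in state {state!r}") from None


def replay(events: list[str], state: str = "idle") -> list[str]:
    """Apply tracepoint names in order and return every state visited."""
    states = [state]
    for event in events:
        state = next_state(state, event)
        states.append(state)
    return states


if __name__ == "__main__":
    print(replay(["lock_acquire", "lock_contended",
                  "lock_acquired", "lock_release"]))
    # prints ['idle', 'requesting', 'waiting', 'holding', 'idle']
```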

However, the current version of LTTng does not instrument these
tracepoints (the instrumentation code is present in the source but
commented out [2]). The folks on the LTTng IRC channel helped me
re-enable them by simply uncommenting the code. I have reported a bug
about this [3].

Thanks to these tracepoints, I was able to create a couple of Trace
Compass views (attached to this mail) that make it easy to track kernel
lock contention, and they clearly showed me what was going on. One view
shows the locking status of each kernel lock and the other shows
per-thread contention across all locks. Please see the attached
screenshot for an example of the per-thread view.
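
Both views boil down to measuring how long a thread spends between
lock_contended and the matching lock_acquired, aggregated either per
thread or per lock. The sketch below does that aggregation over an
already-decoded, timestamp-sorted list of events. It is not the Trace
Compass XML analysis attached to this mail; the Event layout (timestamp
in ns, thread id, event name, lock address) and contention_time() are
hypothetical simplifications of what a trace reader would provide.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical, simplified view of one decoded lock tracepoint."""
    ts: int      # timestamp in nanoseconds
    tid: int     # thread that emitted the event
    name: str    # "lock_contended", "lock_acquired", ...
    lock: int    # lock address, used as the lock's identity


def contention_time(events):
    """Sum waiting time (lock_contended -> lock_acquired) per thread and per lock.

    Events must be sorted by timestamp; other event types are ignored.
    """
    waiting_since = {}                 # (tid, lock) -> ts of lock_contended
    per_thread = defaultdict(int)
    per_lock = defaultdict(int)
    for ev in events:
        key = (ev.tid, ev.lock)
        if ev.name == "lock_contended":
            waiting_since[key] = ev.ts
        elif ev.name == "lock_acquired" and key in waiting_since:
            waited = ev.ts - waiting_since.pop(key)
            per_thread[ev.tid] += waited
            per_lock[ev.lock] += waited
    return dict(per_thread), dict(per_lock)


if __name__ == "__main__":
    LOCK = 0x1000  # made-up lock address
    trace = [
        Event(100, 1, "lock_acquire", LOCK),
        Event(110, 1, "lock_acquired", LOCK),
        Event(120, 2, "lock_acquire", LOCK),
        Event(125, 2, "lock_contended", LOCK),
        Event(300, 1, "lock_release", LOCK),
        Event(305, 2, "lock_acquired", LOCK),
    ]
    per_thread, per_lock = contention_time(trace)
    print(per_thread)  # prints {2: 180}: thread 2 waited 305 - 125 ns
```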

The application under analysis is a parallel Cholesky benchmark running
on a server with 56 CPUs. I was trying to figure out why almost all
application threads became blocked at some point, as seen in the
screenshot. The lock view showed heavy contention on the mm->mmap_sem
lock when all threads tried to allocate memory by calling mmap() and
mprotect(), and when they triggered page faults by writing to the
freshly mmapped memory.
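
One simple per-lock metric for spotting a lock like mm->mmap_sem that
many threads pile up on is the peak number of simultaneous waiters. This
is my own illustrative metric, not necessarily what the attached
per-lock view computes. It assumes timestamp-sorted (ts, tid, name,
lock) tuples, a hypothetical layout, and treats a thread as waiting
from lock_contended until its matching lock_acquired.

```python
from collections import defaultdict


def peak_waiters(events):
    """Return the largest number of threads simultaneously waiting on each lock.

    `events` is a timestamp-sorted iterable of (ts, tid, name, lock) tuples
    (hypothetical layout). A thread counts as waiting from its lock_contended
    event until its matching lock_acquired event.
    """
    waiting = defaultdict(set)   # lock -> tids currently waiting on it
    peak = defaultdict(int)      # lock -> highest len(waiting[lock]) seen
    for _ts, tid, name, lock in events:
        if name == "lock_contended":
            waiting[lock].add(tid)
            peak[lock] = max(peak[lock], len(waiting[lock]))
        elif name == "lock_acquired":
            waiting[lock].discard(tid)
    return dict(peak)


if __name__ == "__main__":
    HOT, COLD = 0x1000, 0x2000  # made-up lock addresses
    trace = [
        (10, 1, "lock_contended", HOT),
        (11, 2, "lock_contended", HOT),
        (12, 3, "lock_contended", HOT),
        (13, 4, "lock_contended", COLD),
        (20, 1, "lock_acquired", HOT),
        (21, 4, "lock_acquired", COLD),
        (30, 2, "lock_acquired", HOT),
        (40, 3, "lock_acquired", HOT),
    ]
    print(peak_waiters(trace))  # prints {4096: 3, 8192: 1}
```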

Hence, I would like to point out how useful it has been for me to
enable the LTTng lock tracepoints and use the custom lock views. If
the lock tracepoints are enabled again, I think it might be useful
for others to add similar views. However, it is worth noting that these
tracepoints are only available when the kernel is compiled with
CONFIG_LOCK_STAT, which is a kernel debugging feature not set by
default.

Thanks for your work!

 [1] See Documentation/locking/lockstat.txt in the Linux kernel source
     for more information.
 [2] As Compudj pointed out, the reason can be found in this conversation:
     https://lists.lttng.org/pipermail/lttng-dev/2012-December/019256.html
 [3] LTTng lock bug: https://bugs.lttng.org/issues/1157


http://bsc.es/disclaimer

Attachment: cholesky-lock-analysis.png
Description: PNG image

Attachment: lock_contention_analysis.xml
Description: application/xml

Attachment: per_lock_analysis.xml
Description: application/xml

Attachment: cholesky-per-lock.png
Description: PNG image