Any known issues with analyzing huge heap dumps? [message #2628]
Tue, 19 August 2008 17:06
Eclipse User
Originally posted by: danwork.dshunter.co.uk
Hi,
We're currently having problems with a Java process that occupies 12 GB of
heap, using Java 1.5.0_12 on a 64-bit machine. We are getting unexplained
OOM errors after around 24 hours of use, something we have been unable to
replicate in test.
I've got a unix core dump (~13GB) from the misbehaving process and I've
created an hprof file (~9GB) from that to analyse for any potential memory
leaks etc.
I've already created a histogram of the core using jmap but unfortunately,
as this is Java 5, it includes non-live objects. It clearly shows the heap
maxed out. I've had a go with JHat, but haven't really found it very useful
for locating memory leaks.
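For reference, the commands were roughly along these lines (I'm quoting the
options from memory, and the java path and core file name are placeholders;
as far as I remember, on Java 5 the binary dump option is -heap:format=b
rather than the -dump option that 1.6 offers):
  # class histogram straight from the core (Java 5 jmap, so non-live objects are included)
  jmap -histo /path/to/jdk1.5.0_12/bin/java core.12345 > histo.txt
  # binary heap dump in HPROF format from the same core, for analysis in MAT/JHat
  jmap -heap:format=b /path/to/jdk1.5.0_12/bin/java core.12345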
However, Memory Analyzer reports that only 12 MB (megabytes) of the heap is
being used *in total*. It doesn't look like anything is missing (our
service only requires around 8 MB when it isn't busy), so it looks to me
like the garbage collection isn't working properly. I'd like to push for an
upgrade to the JVM if this is the case.
Is this a reliable analysis, or does Memory Analyzer have problems
analysing large heaps? I should point out that I am having to run it in a
32-bit JVM (due to an unrelated problem with Eclipse), but I don't get any
error messages etc.
Cheers,
Daniel
Re: Any known issues with analyzing huge heap dumps? [message #2656 is a reply to message #2628]
Wed, 20 August 2008 07:52
Eclipse User
Hi Daniel,
glad to hear that you are using the Memory Analyzer, sorry to hear that
you are running into problems...
First: Analyzing 64-bit dumps on 32-bit machines is perfectly fine. We often
do this ourselves. The only restriction is that you have less memory (~1.2
GB) available for the analysis. Sometimes it helps to parse the heap dump
(which happens when the dump is opened for the first time) on 64-bit and
then continue the analysis on 32-bit.
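If you can get hold of a 64-bit VM for the parsing step, you can give the
analyzer more heap via the usual Eclipse -vmargs mechanism, roughly like
this (the executable is MemoryAnalyzer for the standalone tool, or eclipse
if you run MAT as a plugin; the numbers are only examples, and the same
-Xmx can also go into the .ini file next to the executable):
  # 32-bit VM: ~1.2 GB is the practical ceiling
  MemoryAnalyzer -vmargs -Xmx1200m
  # 64-bit VM: give it as much as the machine allows
  MemoryAnalyzer -vmargs -Xmx8g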
Second: When reading the initial dump, we remove objects which are not
reachable. These are usually "left-overs" in the form of byte[] arrays
which the Garbage Collector leaves hanging around for performance reasons
(e.g. to avoid moving objects and thereby avoid assigning new addresses).
This is why the reported size is usually smaller than the file size or the
initially reported number of objects.
But, of course, removing 9 GB and leaving only 12 MB sounds strange, so I
don't want to rule out an issue with MAT.
To isolate the problem:
a) We print to the log file (during parsing) the initial number of objects
found. What does this number say? And what does it say on the first page
when the heap dump is opened?
b) You say you were able to read the heap dump with JHat. What is the
number of objects reported there? (Because JHat usually only manages to
open smaller heap dumps, this could indicate either a few really big
primitive arrays or simply very few objects.)
c) How did you generate the heap dump? While jmap with VM 1.5 is sometimes
unreliable (e.g. corrupted dumps for no reason), the
-XX:+HeapDumpOnOutOfMemoryError option usually produces readable dumps (see
the command-line example below this list).
d) What operating system are you using?
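To illustrate c): the option goes on the command line of the monitored
process, roughly like this (the dump path is only an example; both flags
were added in one of the later 5.0 updates, so 1.5.0_12 should have them):
  java -Xmx12288m \
       -XX:+HeapDumpOnOutOfMemoryError \
       -XX:HeapDumpPath=/var/tmp/dumps \
       ... your usual options and main class ...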
Andreas.
Re: Any known issues with analyzing huge heap dumps? [message #3355 is a reply to message #2656]
Tue, 26 August 2008 12:04
Eclipse User
Originally posted by: danwork.dshunter.co.uk
Hi Andreas,
Just to give you a few updates: we upgraded to JVM 1.6 to see if that
eradicated the problem. Unfortunately it didn't, but it did produce a more
realistic heap to analyse.
From a 6 GB heap at OOM, Memory Analyzer showed a 1 GB live heap. This
allowed us to locate a minor memory leak, but there is still a 5 GB
discrepancy that makes me a little nervous that we are missing the root
cause.
The answers to your questions:
a) The log file reports 14,722,946 and the first page reports 14.1M
b) JHat lists 14722945
c) I believe we did a kill -6 on the process... not ideal, but the process
would often go into a GC death spiral rather than throw an OOM.
Unfortunately, this process is pretty mission-critical to what we do, so we
can't wait around for it to die 'gracefully'. However, I don't think the
JVM has entered an unhealthy state, as I have seen similar results on
healthy versions of the process.
d) Red Hat AS3 to run Memory Analyzer and AS4 for the Java process.
Thanks for your help,
Daniel
Re: Any known issues with analyzing huge heap dumps? [message #3556 is a reply to message #3355]
Wed, 27 August 2008 12:17
Eclipse User
Hi Daniel!
> a) The log file reports 14,722,946 [...]
> b) JHat lists 14722945
The one object difference is expected. We create a pseudo object for the
system class loader (which is not included in HPROF) because it makes the
analysis consistent.
> a) [...] and the first page reports 14.1M
As explained earlier, we remove unreachable objects because the various
garbage collector algorithms tend to leave some garbage behind (if the
object is too small, moving it and re-assigning addresses is too
expensive). In this particular case that is about 4% (roughly 600,000 of
the 14.7 million objects)... Not necessarily a problem.
> c) I believe we did a kill -6 on the process...
Okay, I don't know this option. Usually, I use JConsole and/or jps/jmap to
trigger the heap dump.
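For completeness, now that the process runs on 1.6 a dump can also be taken
from the running VM without killing it, roughly like this (the pid and file
name are placeholders):
  jps -l
  # 'live' forces a full GC first and dumps only the reachable objects
  jmap -dump:live,format=b,file=/tmp/heap.hprof 12345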
Let's try to figure out what the 4% are and what size they have. I will
develop a patch that prints out in detail which objects are not reachable.
I have created this Bugzilla entry to track the debugging feature:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=245410 Maybe add yourself to
the CC list.
Andreas.