Re: [openj9-dev] Performance questions for low-latency app



Controls mutator vs. GC CPU utilization (default 70 vs. 30) while a GC cycle is active. Higher values give the application higher throughput during an active GC cycle, but at the cost of higher heap pressure (more frequent GCs, hence lower average throughput over a longer period of time spanning several GC cycles). For example, a value of 85 should in theory give approximately 20% (85/70 = 1.2x) better throughput while GC is active, but it will increase the duration of GC cycles by a factor of (100-70)/(100-85) = 2x. One will therefore need a larger heap to ensure that a GC cycle finishes before the heap runs out of memory and GC switches to non-incremental mode (so-called syncGC), which means larger pauses.
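The arithmetic above can be sketched as a small Python helper. This is only an illustration of the two ratios quoted in the paragraph; the function name is hypothetical and nothing here calls into OpenJ9.

```python
def utilization_tradeoff(target, default=70):
    """Return (throughput_ratio, cycle_duration_factor) for a mutator
    utilization target given in percent, relative to the default.

    Hypothetical helper: it only reproduces the back-of-envelope
    ratios from the text and is not a model of the collector.
    """
    if not 0 < target < 100:
        raise ValueError("target must be strictly between 0 and 100")
    # Mutator share of CPU while a GC cycle is active, relative to default.
    throughput_ratio = target / default
    # GC share shrinks from (100 - default) to (100 - target), so the
    # same amount of GC work takes proportionally longer.
    cycle_duration_factor = (100 - default) / (100 - target)
    return throughput_ratio, cycle_duration_factor


ratio, factor = utilization_tradeoff(85)
print(f"throughput while GC is active: {ratio:.2f}x")
print(f"GC cycle duration: {factor:.1f}x")
```

For a target of 85 this gives roughly 1.21x throughput during a cycle and a cycle that lasts twice as long, which is the trade-off to size the heap against.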


With very large heaps (which indeed seems to be the case here) one can increase the region size to reduce region-related processing overhead (which could somewhat reduce GC cycle time). For example, it's reasonable to try 4x larger than default (64K). The region size will be rounded to the next lower power of 2. The downside of choosing too large a region size is increased heap fragmentation (lower heap utilisation).
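To see what "rounded to the next lower power of 2" means for a requested size, here is a minimal Python sketch. The helper name and the 300K example value are assumptions for illustration; the rounding inside the VM may differ in detail.

```python
def round_down_to_power_of_two(size_bytes):
    """Largest power of two that is <= size_bytes.

    Hypothetical helper illustrating the rounding rule described in
    the text; it does not query or configure the JVM.
    """
    if size_bytes < 1:
        raise ValueError("size must be a positive integer")
    # bit_length() is the position of the highest set bit, so shifting
    # 1 by (bit_length - 1) keeps only that bit.
    return 1 << (size_bytes.bit_length() - 1)


requested = 300 * 1024  # an arbitrary, non-power-of-two request
effective = round_down_to_power_of_two(requested)
print(f"requested {requested} bytes, effective {effective} bytes")
```

A request that is already a power of two (for example 256K) is left unchanged, so it is simplest to pick such a value directly.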

Some more info (in addition to verbose GC) about heap occupancy and other metadata can be obtained with -Xtgc:heap.



Controls the GC cycle trigger point, expressed as heap occupancy in bytes. The default is 50% of the heap. A larger value will delay the trigger, hence increase heap utilisation and reduce GC frequency, but at the cost of a higher probability of not being able to finish the GC cycle incrementally. Since it's a static trigger, the worst-case allocation rate and live set size should be accounted for when overriding this value.
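As a rough way to reason about a static trigger, the Python sketch below converts a heap fraction into bytes and estimates how long the remaining headroom lasts at a given allocation rate. The function names and the 2 GiB/s allocation rate are hypothetical, and the estimate deliberately ignores memory the collector frees during the cycle, so treat it as a pessimistic bound rather than a prediction.

```python
GIB = 2 ** 30


def trigger_bytes(heap_bytes, fraction=0.5):
    """Heap occupancy, in bytes, at which a GC cycle would start.

    The default fraction of 0.5 mirrors the 50% default from the text.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    return int(heap_bytes * fraction)


def headroom_seconds(heap_bytes, trigger, alloc_rate_bytes_per_s):
    """Seconds until the heap is full if allocation continues at a
    constant rate after the trigger and nothing is reclaimed."""
    return (heap_bytes - trigger) / alloc_rate_bytes_per_s


heap = 200 * GIB                 # matches -Xmx200g from the original post
trigger = trigger_bytes(heap)    # default 50% trigger
budget = headroom_seconds(heap, trigger, 2 * GIB)  # assumed 2 GiB/s
print(f"trigger at {trigger} bytes, about {budget:.0f} s of headroom")
```

If the worst-case GC cycle duration is longer than this headroom, raising the trigger further makes a fall-back to a non-incremental collection more likely.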


From: akrus <akrus@xxxxxxxxxxx>
To: openj9 developer discussions <openj9-dev@xxxxxxxxxxx>
Date: 06/20/2018 04:10 AM
Subject: [openj9-dev] Performance questions for low-latency app
Hello everyone!

I was finally able to switch one of our app instances to OpenJ9 and it
works well so far. It was running on Azul Zing before, so I can compare
what we have now and what we had before.

There are a couple of questions:
1) I have the following flags set so far: -Xmx200g
-Xshareclasses:name=myapp -Xgcpolicy:metronome -Xgc:targetPauseTime=1
Are there any other flags to consider for low-latency app?

2) I've tried enabling hugepages support for Java heap, but it was
complaining that such amount of memory is unavailable (which is not
true). I suppose that this can happen because hugepages were enabled on
a working system so it's advised in docs somewhere to reboot the
machine to fix this. Is anyone using hugepages for heap? Is there any
performance comparison for OpenJ9 with hugepages and without?

3) there is CUDA support available. Although we have no nVidia cards on
the servers, is it worth adding one? Is there any performance
comparison for this as well?

4) I've tried switching JIT optLevel to scorching and this is consuming
too much CPU. Is it worth changing default JIT level to any other?

5) I can see that Java memory usage never exceeds 50% of Xmx value, is
it supposed to be like that? Is there any flag to change this?

6) OpenJ9 supports RDMA, but there is no
configuration example I found. As I can see this was tested with
Mellanox cards, did anyone test Mellanox VMA?

7) comparing with Zing again, I can see that some app threads are
sometimes 'hanging' for ~20-40 msec and this happens quite frequently.
While the server is quite loaded and this can be normal behaviour, with
Zing it's happening much less frequently, but pauses are more

8) are there support options available (IBM?)?

Sorry for such a bunch of questions, but this mailing list is the only
place to find out the answers :)

