Re: [openj9-dev] Performance questions for low-latency app
Thank you everyone for your comments!
I'll try some changes and report back how it goes. We have heaps of 500G+ on some servers; does the recommendation for regionSize stay the same?
On Thu, 2018-06-21 at 01:04 -0400, Mark Stoodley wrote:
akrus asked:
>> 5) I can see that Java memory usage never exceeds 50% of the Xmx value, is
>> it supposed to be like that? Is there any flag to change this?
Aleks answered:
> Controls the GC cycle trigger point, expressed as heap occupancy. The
> default is 50% of the heap, expressed in bytes. A larger value will delay
> the trigger, hence increase heap utilisation and reduce GC frequency, but
> at the cost of increasing the probability of not being able to finish the
> GC cycle incrementally. Since it's a static trigger, the worst-case
> allocation rate and live-set size should be accounted for when overriding
> this value.
I'll just add a few details that may help explain why Metronome has this
kind of option and behaviour: Metronome is a snapshot-at-the-beginning
collector (unlike any of our other collectors, I believe), which means
Metronome will only collect memory that was already garbage when the GC
cycle began. It also means you need sufficient free space on the heap to
outlast most of the GC cycle if you want to avoid a STW pause (because
freed memory only becomes available to the application to service
allocations near the end of the GC cycle).
"GC cycle" here refers to the full process of collecting the entire heap,
which may require many individual GC pauses (which we cryptically tend to
call "quanta"), especially if you're tuning for a small pause time (you
mentioned 1ms). Another perhaps subtle implication of the
snapshot-at-the-beginning approach is that the longer the GC cycle
stretches out (i.e. the more pauses you need to complete the full GC
cycle), the more memory the application can allocate during a GC cycle
(see Aleks' comment about "allocation rate", which applies here :) ), and
that memory cannot be reclaimed until near the end of the *next* GC cycle
(remember: snapshot at the *beginning*!).
Specifying a higher utilization for the application (80% versus the
default 70%) has the same lengthening effect on the GC cycle time (because
not as many pauses can be scheduled in any given time window), which may
also warrant a larger heap size / lower trigger value simply to buy more
headroom for the application to allocate memory while the collector is
operating.
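To make the headroom argument concrete, here is a back-of-envelope sketch. This is not an OpenJ9 formula; the allocation rate and cycle time below are illustrative assumptions, and the functions just encode the two proportionality arguments from the paragraphs above:

```python
# Back-of-envelope sketch of the snapshot-at-beginning headroom argument.
# All numbers are illustrative assumptions, not measured OpenJ9 values.

def cycle_stretch(old_util, new_util):
    """GC only runs in the (100 - util)% CPU share it is given, so shrinking
    that share stretches the wall-clock length of a GC cycle proportionally."""
    return (100 - old_util) / (100 - new_util)

def floating_garbage_gb(alloc_rate_gb_s, cycle_s):
    """Memory allocated during cycle N is only reclaimed near the end of
    cycle N+1, so roughly two cycles of allocation can sit unreclaimed."""
    return 2 * alloc_rate_gb_s * cycle_s

stretch = cycle_stretch(70, 80)          # raising utilization from 70 to 80
cycle = 10 * stretch                     # assumed 10 s cycle at 70% utilization
print(round(stretch, 2))                 # 1.5x longer cycle
print(floating_garbage_gb(1.0, cycle))   # 30.0 GB at an assumed 1 GB/s
```

So raising utilization from 70 to 80 in this sketch stretches a 10-second cycle to 15 seconds and roughly doubles the allocation headroom needed, which is exactly why a larger heap or lower trigger goes hand in hand with higher utilization targets.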
I hope that was useful, but if I didn't
make sense just keep asking questions and we'll straighten it out (and
generate lots of words to seed better open documentation as well :) ).
Mark Stoodley | Senior Software Developer | IBM Runtime Technologies
8200 Warden Avenue, Markham, L6G 1C7, Canada
Phone: +1-905-413-5831 | e-mail: mstoodle@xxxxxxxxxx
"We cannot solve our problems with the same thinking we used when we
created them" - Albert Einstein
From: "Aleksandar Micic" <Aleksandar_Micic@xxxxxxxxxx>
To: openj9 developer discussions <openj9-dev@xxxxxxxxxxx>
Date: 2018/06/20 02:54 PM
Subject: Re: [openj9-dev] Performance questions for low-latency app
Sent by: openj9-dev-bounces@xxxxxxxxxxx
1)
-Xgc:targetUtilization=70
Controls mutator vs. GC CPU utilization (default 70 vs. 30) while a GC
cycle is active. With higher numbers you can ensure higher throughput of
the application during an active GC cycle, but at the cost of higher heap
pressure (more frequent GCs, hence lower average throughput across a
longer period spanning several GC cycles). For example, a value of 85 in
theory should give you approximately 20% (85/70 = 1.2x) better throughput
while GC is active, but it will increase the duration of GC cycles by a
factor of (100-70)/(100-85) = 2x, so one will need a larger heap to ensure
a GC cycle finishes before it runs out of memory and switches to
non-incremental mode (so-called syncGC), hence with larger pauses.
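The arithmetic above can be checked directly. This is just the two proportionality arguments from the paragraph written out, not the collector's internal model:

```python
# Mutator throughput while a cycle is active scales with the mutator's CPU
# share; cycle duration scales inversely with the GC's share.

def throughput_gain(old_util, new_util):
    """Relative mutator throughput during an active GC cycle."""
    return new_util / old_util

def cycle_duration_factor(old_util, new_util):
    """How much longer the GC cycle takes when GC's CPU share shrinks."""
    return (100 - old_util) / (100 - new_util)

print(round(throughput_gain(70, 85), 2))   # ~1.21x, i.e. about 20% faster
print(cycle_duration_factor(70, 85))       # 2.0x longer GC cycle
```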
-Xgc:regionSize=65536
With very large heaps (which indeed seems to be the case here) one can
increase the region size to reduce region-related processing overhead
(which could somewhat reduce GC cycle time). For example, it's reasonable
to try 4x larger than the default (64K). The region size will be rounded
down to the nearest power of 2. The downside of choosing too large a
region size is increased heap fragmentation (lower heap utilisation).
Some more info (in addition to verbose GC) about heap occupancy and other
metadata can be obtained with -Xtgc:heap.
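Pulling the flags from this thread together, a launch line might look like the sketch below. The flag names are the ones discussed in the thread; the values, heap size, and "myapp.jar" are placeholders to adapt per deployment:

```python
# Assembling the Metronome-related flags discussed in this thread into a
# launch command. Values are illustrative; "myapp.jar" is a placeholder.
flags = [
    "-Xmx200g",
    "-Xgcpolicy:metronome",
    "-Xgc:targetPauseTime=1",     # target pause time, in milliseconds
    "-Xgc:targetUtilization=70",  # mutator CPU share during an active GC cycle
    "-Xgc:regionSize=262144",     # 4x the 64K default, rounded down to a power of 2
    "-Xtgc:heap",                 # extra heap occupancy / metadata tracing
]
cmd = ["java", *flags, "-jar", "myapp.jar"]
print(" ".join(cmd))
```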
5)
-XXgc:trigger=
Controls the GC cycle trigger point, expressed as heap occupancy. The
default is 50% of the heap, expressed in bytes. A larger value will delay
the trigger, hence increase heap utilisation and reduce GC frequency, but
at the cost of increasing the probability of not being able to finish the
GC cycle incrementally. Since it's a static trigger, the worst-case
allocation rate and live-set size should be accounted for when overriding
this value.
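As a worked example of sizing a static trigger (a hedged sketch, not an OpenJ9 formula): the cycle must finish before the heap is exhausted, so the occupancy at which the cycle starts, plus everything allocated while it runs, must fit in the heap. The heap size, allocation rate, and cycle length below are illustrative assumptions:

```python
def max_safe_trigger_gb(heap_gb, alloc_rate_gb_s, cycle_s, slack=0.10):
    """Highest occupancy at which a GC cycle can start and still finish
    incrementally: trigger + allocation-during-cycle <= heap, minus a
    safety margin (slack) for allocation-rate spikes."""
    return heap_gb * (1 - slack) - alloc_rate_gb_s * cycle_s

# Assumed: 500 GB heap, 2 GB/s worst-case allocation, 30 s GC cycle.
trigger = max_safe_trigger_gb(500, 2.0, 30)
print(trigger)                       # 390.0 GB
print(round(trigger / 500 * 100))    # 78 (% of heap)
```

Under these assumed numbers, the trigger could in principle be raised from the 50% default to roughly 78% of the heap, but the margin shrinks quickly if the worst-case allocation rate or cycle length is underestimated.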
Regards,
Aleks
From: akrus <akrus@xxxxxxxxxxx>
To: openj9 developer discussions <openj9-dev@xxxxxxxxxxx>
Date: 06/20/2018 04:10 AM
Subject: [openj9-dev] Performance questions for low-latency
app
Sent by: openj9-dev-bounces@xxxxxxxxxxx
Hello everyone!
I was finally able to switch one of our app instances to OpenJ9 and it
works well so far. It was running on Azul Zing before, so I can compare
what we have now with what we had before.
There are a couple of questions:
1) I have the following flags set so far: -Xmx200g
-Xshareclasses:name=myapp -Xgcpolicy:metronome -Xgc:targetPauseTime=1
Are there any other flags to consider for a low-latency app?
2) I've tried enabling hugepages support for the Java heap, but it was
complaining that such an amount of memory is unavailable (which is not
true). I suppose this can happen because hugepages were enabled on a
running system, so it's advised somewhere in the docs to reboot the
machine to fix this. Is anyone using hugepages for the heap? Is there any
performance comparison for OpenJ9 with and without hugepages?
3) there is CUDA support available. Although we have no nVidia cards on
the servers, is it worth adding one? Is there any performance
comparison for this as well?
4) I've tried switching the JIT optLevel to scorching and it consumes too
much CPU. Is it worth changing the default JIT level to any other?
5) I can see that Java memory usage never exceeds 50% of Xmx value, is
it supposed to be like that? Is there any flag to change this?
6) OpenJ9 supports RDMA, but I found no -Dcom.ibm.net.rdma.conf
configuration example. As far as I can see this was tested with Mellanox
cards; did anyone test Mellanox VMA?
7) comparing with Zing again, I can see that some app threads sometimes
'hang' for ~20-40 ms, and this happens quite frequently. While the server
is quite loaded and this may be normal behaviour, with Zing it happens
much less frequently, but the pauses are more noticeable.
8) are there support options available (IBM?)?
Sorry for such a bunch of questions, but this mailing list is the only
place to find out the answers :)
Regards,
akrus.
_______________________________________________
openj9-dev mailing list
openj9-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe
from this list, visit
https://dev.eclipse.org/mailman/listinfo/openj9-dev