Re: [openj9-dev] Performance questions for low-latency app

I don't have a specific region size to recommend for any of the heap sizes. I'm just saying that, in general, with very large heaps it's worth trying to increase the region size. We would welcome some feedback on this: whether and how the region size affected 1) cycle duration (in terms of quanta or seconds) and 2) heap utilisation (which could also be expressed as cycle frequency). With some real-life data on such huge heaps, we could add a heuristic that makes region size a function of the heap size, so that we minimise the need for this tuning.
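
For example (purely illustrative, not a tested recommendation; the values here are hypothetical apart from regionSize being 4x the 64K default), a 500G+ run with a larger region size might look like:

    java -Xmx500g -Xgcpolicy:metronome -Xgc:targetPauseTime=1 \
         -Xgc:regionSize=262144 -verbose:gc -Xverbosegclog:gc.log -jar yourapp.jar

and could then be compared against an otherwise identical run with the default region size, using the verbose GC output for cycle duration and frequency.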



From: akrus <akrus@xxxxxxxxxxx>
To: openj9 developer discussions <openj9-dev@xxxxxxxxxxx>
Date: 06/21/2018 07:51 AM
Subject: Re: [openj9-dev] Performance questions for low-latency app
Sent by: openj9-dev-bounces@xxxxxxxxxxx





Thank you everyone for your comments!

I'll try some changes there and report back on how it goes. We have heaps of 500G+ on some servers; does the recommendation for regionSize stay the same?

On Thu, 2018-06-21 at 01:04 -0400, Mark Stoodley wrote:
    akrus asked:
    >>
    5) I can see that Java memory usage never exceeds 50% of the Xmx value;
    is it supposed to be like that? Is there any flag to change this?


    Aleks answered:
    >
    Controls the GC cycle trigger point, expressed as heap occupancy in bytes. The default is 50% of the heap. A larger value will delay the trigger, hence increase heap utilisation and reduce GC frequency, but at the cost of an increased probability of not being able to finish the GC cycle incrementally. Since it's a static trigger, the worst-case allocation rate and live set size should be accounted for when overriding this value.

    I'll just add a few details that may help to explain why Metronome has this kind of option and behaviour: Metronome is a snapshot-at-beginning collector (unlike any of our other collectors, I believe), which means Metronome will only collect memory that was already garbage when the GC cycle began. It also means you need sufficient free space on the heap to outlast most of the GC cycle if you want to avoid an STW pause (because freed memory only becomes available to the application to service allocations near the end of the GC cycle).


    "GC cycle" here refers to the full process of collecting the entire heap, which may require many, many individual GC pauses (which we cryptically tend to call "quanta"), especially if you're tuning for a small pause time (you mentioned 1ms). Another perhaps subtle implication of the "snapshot at beginning" approach is that the longer the GC cycle stretches out (i.e. the more pauses you need to complete the full GC cycle), the more memory the application can allocate during a GC cycle (see Aleks' comment about "allocation rate", which applies here :) ), and that memory cannot be reclaimed until near the end of the *next* GC cycle (remember: snapshot at *beginning*!). Specifying a higher utilization for the application (80% versus the default 70%) has the same "lengthening" effect on the GC cycle time (because not as many pauses can be scheduled in any given time window), which may also warrant a larger heap size / lower trigger value simply to buy more headroom for the application to allocate memory while the collector is operating.

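    To put some rough, made-up numbers on that (none of these come from your system): with a 200g heap and the default 50% trigger, a cycle starts with roughly 100g of free space; if the application allocates at, say, 1 GB/s and the full cycle takes 60 seconds of wall-clock time to complete, about 60g of that headroom is consumed before any freed memory comes back. So the worst-case allocation over a whole cycle has to fit in whatever free space remains at the trigger point.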

    I hope that was useful, but if I didn't make sense just keep asking questions and we'll straighten it out (and generate lots of words to seed better open documentation as well :) ).

    Mark Stoodley
    Senior Software Developer
    IBM Runtime Technologies
    8200 Warden Avenue, Markham, L6G 1C7, Canada
    Phone: +1-905-413-5831
    e-mail: mstoodle@xxxxxxxxxx

    We cannot solve our problems with the same thinking we used when we created them - Albert Einstein






    From: "Aleksandar Micic" <Aleksandar_Micic@xxxxxxxxxx>
    To: openj9 developer discussions <openj9-dev@xxxxxxxxxxx>
    Date: 2018/06/20 02:54 PM
    Subject: Re: [openj9-dev] Performance questions for low-latency app
    Sent by: openj9-dev-bounces@xxxxxxxxxxx




    1)

    -Xgc:targetUtilization=70

    Controls mutator vs GC CPU utilization (default 70 vs 30) while a GC cycle is active. With higher numbers you can ensure higher throughput of the application during an active GC cycle, but at the cost of higher heap pressure (more frequent GCs, hence lower average throughput across a longer period spanning several GC cycles). For example, a value of 85 should in theory give you approximately 20% (85/70 = 1.2x) better throughput while GC is active, but it will increase the duration of GC cycles by a factor of (100-70)/(100-85) = 2x, so one will need a larger heap to ensure a GC cycle finishes before the heap runs out of memory and the collector switches to non-incremental mode (so-called syncGC), hence with larger pauses.
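
    To make that trade-off concrete with hypothetical numbers: if a full GC cycle currently takes about 20 seconds of wall-clock time at the default 70, the same cycle at 85 would take roughly 40 seconds, so the free space above the trigger point has to absorb roughly twice as much allocation before freed memory becomes available again.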


    -Xgc:regionSize=65536


    With very large heaps (which indeed seems to be the case here) one can increase the region size to reduce region-related processing overhead (which could somewhat reduce GC cycle time). For example, it's reasonable to try 4x larger than the default (64K). The region size will be rounded down to the next lower power of 2. The downside of choosing too large a region size is increased heap fragmentation (lower heap utilisation).
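
    As a hypothetical illustration of the rounding: asking for -Xgc:regionSize=200000 (a made-up value) would effectively give 131072-byte (2^17) regions, since the requested size is rounded down to the next lower power of 2.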

    Some more info (in addition to verbose GC) about heap occupancy and other metadata can be obtained with -Xtgc:heap.

    5)

    -XXgc:trigger=

    Controls the GC cycle trigger point, expressed as heap occupancy in bytes. The default is 50% of the heap. A larger value will delay the trigger, hence increase heap utilisation and reduce GC frequency, but at the cost of an increased probability of not being able to finish the GC cycle incrementally. Since it's a static trigger, the worst-case allocation rate and live set size should be accounted for when overriding this value.
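
    As a purely hypothetical sizing exercise (none of these numbers come from your system): on a 200g heap with a live set of about 60g, raising the trigger to 140g (70% occupancy) leaves about 60g free when the cycle starts; that 60g has to cover the worst-case allocation for the entire cycle, so at a worst-case allocation rate of 1 GB/s the cycle would have to reliably finish within about 60 seconds to stay incremental.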



    Regards,
    Aleks




    From: akrus <akrus@xxxxxxxxxxx>
    To: openj9 developer discussions <openj9-dev@xxxxxxxxxxx>
    Date: 06/20/2018 04:10 AM
    Subject: [openj9-dev] Performance questions for low-latency app
    Sent by: openj9-dev-bounces@xxxxxxxxxxx




    Hello everyone!

    I was finally able to switch one of our app instances to OpenJ9 and it
    works well so far. It was running on Azul Zing before, so I can compare
    what we have now with what we had before.

    There are a couple of questions:
    1) I have the following flags set so far: -Xmx200g
    -Xshareclasses:name=myapp -Xgcpolicy:metronome -Xgc:targetPauseTime=1
    Are there any other flags to consider for a low-latency app?

    2) I've tried enabling hugepages support for the Java heap, but it was
    complaining that such an amount of memory is unavailable (which is not
    true). I suppose this can happen because hugepages were enabled on a
    running system, and the docs advise somewhere to reboot the machine to
    fix this. Is anyone using hugepages for the heap? Is there any
    performance comparison for OpenJ9 with hugepages and without?

    3) There is CUDA support available. Although we have no NVIDIA cards in
    the servers, is it worth adding one? Is there any performance
    comparison for this as well?

    4) I've tried switching the JIT optLevel to scorching and it consumes
    too much CPU. Is it worth changing the default JIT level to any other?

    5) I can see that Java memory usage never exceeds 50% of the Xmx value;
    is it supposed to be like that? Is there any flag to change this?

    6) OpenJ9 supports RDMA, but I couldn't find any -Dcom.ibm.net.rdma.conf
    configuration example. As far as I can see, this was tested with
    Mellanox cards; did anyone test Mellanox VMA?

    7) Comparing with Zing again, I can see that some app threads sometimes
    'hang' for ~20-40 msec and this happens quite frequently. While the
    server is quite loaded and this can be normal behaviour, with Zing it
    happens much less frequently, but the pauses are more noticeable.

    8) Are there support options available (from IBM?)?

    Sorry for asking so many questions at once, but this mailing list is the
    only place to find the answers :)

    Regards,
    akrus.
    _______________________________________________
    openj9-dev mailing list
    openj9-dev@xxxxxxxxxxx
    To change your delivery options, retrieve your password, or unsubscribe from this list, visit

    https://dev.eclipse.org/mailman/listinfo/openj9-dev


