Re: [openj9-dev] Performance questions for low-latency app

I don't have a specific region size to recommend for any of the heap sizes. I'm just saying that, in general, with very large heaps it's worth trying to increase the region size. We would welcome some feedback on this: whether and how the region size affected 1) cycle duration (in terms of quanta or seconds) and 2) heap utilisation (which could also be expressed as cycle frequency). With some real-life data on such huge heaps, we could add a heuristic that makes region size a function of the heap size, so that we minimise the need for this tuning.
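
For example (purely illustrative, not a tested recommendation; the values here are hypothetical apart from regionSize being 4x the 64K default), a 500G+ run with a larger region size might look like:

    java -Xmx500g -Xgcpolicy:metronome -Xgc:targetPauseTime=1 \
         -Xgc:regionSize=262144 -verbose:gc -Xverbosegclog:gc.log -jar yourapp.jar

and could then be compared against an otherwise identical run with the default region size, using the verbose GC output for cycle duration and frequency.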



From: akrus <akrus@xxxxxxxxxxx>
To: openj9 developer discussions <openj9-dev@xxxxxxxxxxx>
Date: 06/21/2018 07:51 AM
Subject: Re: [openj9-dev] Performance questions for low-latency app
Sent by: openj9-dev-bounces@xxxxxxxxxxx





Thank you everyone for your comments!

I'll try some changes there and report back on how it goes. We have heaps of 500G+ on some servers; does the recommendation for regionSize stay the same?

On Thu, 2018-06-21 at 01:04 -0400, Mark Stoodley wrote:
    akrus asked:
    >>
    5) I can see that Java memory usage never exceeds 50% of the Xmx value;
    is it supposed to be like that? Is there any flag to change this?


    Aleks answered:
    >
    Controls the GC cycle trigger point, expressed as heap occupancy in bytes. The default is 50% of the heap. A larger value will delay the trigger, hence increase heap utilisation and reduce GC frequency, but at the cost of an increased probability of not being able to finish the GC cycle incrementally. Since it's a static trigger, the worst-case allocation rate and live set size should be accounted for when overriding this value.

    I'll just add a few details that may help to explain why Metronome has this kind of option and behaviour: Metronome is a snapshot-at-beginning collector (unlike any of our other collectors, I believe), which means Metronome will only collect memory that was already garbage when the GC cycle began. It also means you need sufficient free space on the heap to outlast most of the GC cycle if you want to avoid an STW pause (because freed memory only becomes available to the application to service allocations near the end of the GC cycle).


    "GC cycle" here refers to the full process of collecting the entire heap, which may require many, many individual GC pauses (which we cryptically tend to call "quanta"), especially if you're tuning for a small pause time (you mentioned 1ms). Another perhaps subtle implication of the "snapshot at beginning" approach is that the longer the GC cycle stretches out (i.e. the more pauses you need to complete the full GC cycle), the more memory the application can allocate during a GC cycle (see Aleks' comment about "allocation rate", which applies here :) ), and that memory cannot be reclaimed until near the end of the *next* GC cycle (remember: snapshot at *beginning*!). Specifying a higher utilization for the application (80% versus the default 70%) has the same "lengthening" effect on the GC cycle time (because not as many pauses can be scheduled in any given time window), which may also warrant a larger heap size / lower trigger value simply to buy more headroom for the application to allocate memory while the collector is operating.

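    To put some rough, made-up numbers on that (none of these come from your system): with a 200g heap and the default 50% trigger, a cycle starts with roughly 100g of free space; if the application allocates at, say, 1 GB/s and the full cycle takes 60 seconds of wall-clock time to complete, about 60g of that headroom is consumed before any freed memory comes back. So the worst-case allocation over a whole cycle has to fit in whatever free space remains at the trigger point.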

    I hope that was useful, but if I didn't make sense just keep asking questions and we'll straighten it out (and generate lots of words to seed better open documentation as well :) ).

    Mark Stoodley
    Senior Software Developer
    IBM Runtime Technologies
    8200 Warden Avenue, Markham, L6G 1C7, Canada
    Phone: +1-905-413-5831
    e-mail: mstoodle@xxxxxxxxxx

    We cannot solve our problems with the same thinking we used when we created them - Albert Einstein






    From: "Aleksandar Micic" <Aleksandar_Micic@xxxxxxxxxx>
    To: openj9 developer discussions <openj9-dev@xxxxxxxxxxx>
    Date: 2018/06/20 02:54 PM
    Subject: Re: [openj9-dev] Performance questions for low-latency app
    Sent by: openj9-dev-bounces@xxxxxxxxxxx




    1)

    -Xgc:targetUtilization=70

    Controls mutator vs GC CPU utilization (default 70 vs 30) while a GC cycle is active. With higher numbers you can ensure higher throughput of the application during an active GC cycle, but at the cost of higher heap pressure (more frequent GCs, hence lower average throughput across a longer period spanning several GC cycles). For example, a value of 85 should in theory give you approximately 20% (85/70 = 1.2x) better throughput while GC is active, but it will increase the duration of GC cycles by a factor of (100-70)/(100-85) = 2x, so one will need a larger heap to ensure a GC cycle finishes before the heap runs out of memory and the collector switches to non-incremental mode (so-called syncGC), hence with larger pauses.
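
    To make that trade-off concrete with hypothetical numbers: if a full GC cycle currently takes about 20 seconds of wall-clock time at the default 70, the same cycle at 85 would take roughly 40 seconds, so the free space above the trigger point has to absorb roughly twice as much allocation before freed memory becomes available again.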


    -Xgc:regionSize=65536


    With very large heaps (which indeed seems to be the case here) one can increase the region size to reduce region-related processing overhead (which could somewhat reduce GC cycle time). For example, it's reasonable to try 4x larger than the default (64K). The region size will be rounded down to the next lower power of 2. The downside of choosing too large a region size is increased heap fragmentation (lower heap utilisation).
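
    As a hypothetical illustration of the rounding: asking for -Xgc:regionSize=200000 (a made-up value) would effectively give 131072-byte (2^17) regions, since the requested size is rounded down to the next lower power of 2.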

    Some more info (in addition to verbose GC) about heap occupancy and other metadata can be obtained with -Xtgc:heap.

    5)

    -XXgc:trigger=

    Controls the GC cycle trigger point, expressed as heap occupancy in bytes. The default is 50% of the heap. A larger value will delay the trigger, hence increase heap utilisation and reduce GC frequency, but at the cost of an increased probability of not being able to finish the GC cycle incrementally. Since it's a static trigger, the worst-case allocation rate and live set size should be accounted for when overriding this value.
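
    As a purely hypothetical sizing exercise (none of these numbers come from your system): on a 200g heap with a live set of about 60g, raising the trigger to 140g (70% occupancy) leaves about 60g free when the cycle starts; that 60g has to cover the worst-case allocation for the entire cycle, so at a worst-case allocation rate of 1 GB/s the cycle would have to reliably finish within about 60 seconds to stay incremental.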



    Regards,
    Aleks




    From: akrus <akrus@xxxxxxxxxxx>
    To: openj9 developer discussions <openj9-dev@xxxxxxxxxxx>
    Date: 06/20/2018 04:10 AM
    Subject: [openj9-dev] Performance questions for low-latency app
    Sent by: openj9-dev-bounces@xxxxxxxxxxx




    Hello everyone!

    I was finally able to switch one of our app instances to OpenJ9 and it
    works well so far. It was running on Azul Zing before, so I can compare
    what we have now with what we had before.

    There are a couple of questions:
    1) I have the following flags set so far: -Xmx200g
    -Xshareclasses:name=myapp -Xgcpolicy:metronome -Xgc:targetPauseTime=1
    Are there any other flags to consider for a low-latency app?

    2) I've tried enabling hugepages support for the Java heap, but it was
    complaining that such an amount of memory is unavailable (which is not
    true). I suppose this can happen because hugepages were enabled on a
    running system, and the docs advise somewhere to reboot the machine to
    fix this. Is anyone using hugepages for the heap? Is there any
    performance comparison for OpenJ9 with hugepages and without?

    3) There is CUDA support available. Although we have no NVIDIA cards in
    the servers, is it worth adding one? Is there any performance
    comparison for this as well?

    4) I've tried switching the JIT optLevel to scorching and it consumes
    too much CPU. Is it worth changing the default JIT level to any other?

    5) I can see that Java memory usage never exceeds 50% of the Xmx value;
    is it supposed to be like that? Is there any flag to change this?

    6) OpenJ9 supports RDMA, but I couldn't find any -Dcom.ibm.net.rdma.conf
    configuration example. As far as I can see, this was tested with
    Mellanox cards; did anyone test Mellanox VMA?

    7) Comparing with Zing again, I can see that some app threads sometimes
    'hang' for ~20-40 msec and this happens quite frequently. While the
    server is quite loaded and this can be normal behaviour, with Zing it
    happens much less frequently, but the pauses are more noticeable.

    8) Are there support options available (from IBM?)?

    Sorry for asking so many questions at once, but this mailing list is the
    only place to find the answers :)

    Regards,
    akrus.
    _______________________________________________
    openj9-dev mailing list
    openj9-dev@xxxxxxxxxxx
    To change your delivery options, retrieve your password, or unsubscribe from this list, visit

    https://dev.eclipse.org/mailman/listinfo/openj9-dev


