
Re: [jakartaee-tck-dev] [glassfish-dev] Tracking usage data for EE4J working group CI cloud systems

From: Scott Marlow <smarlow@xxxxxxxxxx>
Date: Thu, 15 Oct 2020 21:58:14 -0400
Delivered-to: jakartaee-tck-dev@xxxxxxxxxxx
List-archive: <https://www.eclipse.org/mailman/private/jakartaee-tck-dev>
List-help: <mailto:jakartaee-tck-dev-request@eclipse.org?subject=help>
List-subscribe: <https://www.eclipse.org/mailman/listinfo/jakartaee-tck-dev>, <mailto:jakartaee-tck-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://www.eclipse.org/mailman/options/jakartaee-tck-dev>, <mailto:jakartaee-tck-dev-request@eclipse.org?subject=unsubscribe>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.1

I started https://ci.eclipse.org/jakartaee-tck/job/jakartaee-tck-scottmarlow/job/glassfish_start_twice/2/ running with the workaround suggested on https://github.com/eclipse-ee4j/glassfish/issues/23191#issuecomment-708977368

The workaround changes are added to docker/run_jakartaeetck.sh in https://github.com/scottmarlow/jakartaee-tck/tree/glassfish_start_twice.

Basically, the workaround moves monitoring-core.jar out of GlassFish and starts GlassFish, which is expected to fail. That failed start somehow works around the startup failure that we occasionally see on Jenkins and always see on my local computer. After stopping GlassFish, we then move the monitoring-core.jar file back into place for the next startup of GlassFish.
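The workaround steps described above can be sketched in Python for anyone porting them to other scripts. This is a minimal sketch, not the actual change in docker/run_jakartaeetck.sh: it assumes `asadmin` is on the PATH, that monitoring-core.jar lives in the `glassfish/modules` directory of the installation, and that a plain `start-domain`/`stop-domain` of the default domain is enough. The names `prime_glassfish` and `run_asadmin` are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence

# Assumption: monitoring-core.jar sits in <install>/glassfish/modules.
JAR_NAME = "monitoring-core.jar"


def run_asadmin(args: Sequence[str]) -> int:
    """Run asadmin (assumed to be on PATH) and return its exit code."""
    return subprocess.run(["asadmin", *args]).returncode


def prime_glassfish(modules_dir: Path,
                    asadmin: Callable[[Sequence[str]], int] = run_asadmin) -> int:
    """Hypothetical helper: start GlassFish once without monitoring-core.jar,
    stop it, then put the jar back for the next (real) startup.

    Returns the exit code of the priming start-domain call, which is
    expected to be non-zero because the jar is missing.
    """
    jar = modules_dir / JAR_NAME
    # Park the jar outside the GlassFish installation, in a temp directory.
    parking_dir = Path(tempfile.mkdtemp(prefix="gf-workaround-"))
    parked = parking_dir / JAR_NAME
    shutil.move(str(jar), str(parked))
    try:
        start_rc = asadmin(["start-domain"])  # expected to fail without the jar
        asadmin(["stop-domain"])              # stop whatever did come up
    finally:
        # Always restore the jar, even if an asadmin call raised.
        shutil.move(str(parked), str(jar))
        parking_dir.rmdir()
    return start_rc
```

Usage would be something like `prime_glassfish(Path("/opt/glassfish6/glassfish/modules"))` (a hypothetical install path) followed by the normal `asadmin start-domain`. Passing the asadmin runner in as a parameter keeps the jar-shuffling logic testable without a real GlassFish install, and the `finally` block makes sure a crashed priming run never leaves the jar missing.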

If this test works, we can decide whether we want to move the same change into the Standalone TCK scripts that start up GlassFish as well.


Scott

On 10/1/20 12:04 PM, arjan tijms wrote:

Hi,

A few remarks here: just repeating the startup of GF doesn't seem to work. The test jobs for the GF build themselves already do that; they repeat the entire test 3 times. I've never ever seen it work after the second or third repeat.


Yet, starting the entire process does get it to pass.

This leads me to believe it's something in the Kubernetes pod and/or Docker container that gets corrupted. Once this corruption happens, GF will always fail to start up.

Please see this issue about it: https://bugs.eclipse.org/bugs/show_bug.cgi?id=561229

It discusses the problem from the Jenkins/Kubernetes point of view. There are a few odd things in the Kubernetes setup which may have to be looked at first.


Kind regards,
Arjan Tijms

On Thu, Oct 1, 2020 at 5:41 PM Scott Marlow <smarlow@xxxxxxxxxx> wrote:




    On 10/1/20 10:48 AM, Ed Bratt wrote:
     > I wasn't intending to point any fingers at the stability observations
     > you made -- only to observe that we want to focus on improving the
     > reliability so that we don't need to rely on re-runs, or waits, or
     > other symptomatic only type fixes.

    I agree 100%!  I don't think that we need to make the memory tuning
    changes just yet, as they will not improve reliability.

     > One day, I'd like to see that we can
     > initiate several test runs simultaneously -- and they all reliably
     > complete -- perhaps taking more clock-time, but reliably and
     > consistently repeating.

    +1

     >
     > Ideally, we could fill this compute pipeline with running and waiting
     > tasks and be confident that the system will reliably produce
     > consistent results. It sounds like, for some unknown cause, we aren't
     > there yet. Getting to the root cause of this would be my priority (and
     > we don't know if that's a GlassFish issue, an infrastructure issue or
     > even something else). But I'm not actually doing the work, so it's
     > just my opinion.

    +1

    Part of the original discussion related to
    https://github.com/eclipse-ee4j/glassfish/issues/23191 suggested adding
    the --verbose --debug options. I tried them on my local machine (where I
    can reproduce a similar symptom that could be the same as on CI) and
    saw `org.glassfish.flashlight.MonitoringRuntimeDataRegistry not found by org.glassfish.main.admin.monitoring-core`,
    as shown on
    https://gist.github.com/scottmarlow/eec11ca74b99d021346b270fa29ce4fa
    (which has the full server output, including the exceptions).

    I'm thinking that we could try to reproduce the failure on Jenkins with
    the --verbose --debug options so that we see the actual cause on
    Jenkins.  Perhaps it will be the same exception call stack as I see
    locally.

    I don't know the code involved in the
    https://gist.github.com/scottmarlow/eec11ca74b99d021346b270fa29ce4fa
    exception call stack well enough to debug it and find the cause. If you
    open this gist, please search for
    `java.lang.ClassNotFoundException: org.glassfish.flashlight.MonitoringRuntimeDataRegistry not found by org.glassfish.main.admin.monitoring-core`,
    which seemed like it might be a startup race condition, but I really
    have no idea.
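To check a captured server log (for example, output from a Jenkins run of `asadmin start-domain --verbose --debug`, or a domain's `logs/server.log`) for the same failure, a small scan for the exception text quoted above may be enough. This is a hypothetical helper (`find_cnfe` is an illustrative name), and it assumes the exception message appears on a single log line, as it would in an unwrapped stack trace.

```python
import sys
from typing import Iterable, List, Tuple

# Exception text quoted in this thread; assumed to appear on one log line.
SIGNATURE = ("java.lang.ClassNotFoundException: "
             "org.glassfish.flashlight.MonitoringRuntimeDataRegistry not found by "
             "org.glassfish.main.admin.monitoring-core")


def find_cnfe(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line text) for each line containing SIGNATURE."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if SIGNATURE in line]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python find_cnfe.py server.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        hits = find_cnfe(log)
    for number, text in hits:
        print(f"{number}: {text}")
    # Exit non-zero when the exception is present so a CI step can flag it.
    sys.exit(1 if hits else 0)
```

A substring match is used rather than an exact line match because the logged message may carry extra text around it (such as a log-level prefix or a trailing bundle identifier).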

    Scott

    _______________________________________________
    glassfish-dev mailing list
    glassfish-dev@xxxxxxxxxxx <mailto:glassfish-dev@xxxxxxxxxxx>
    To unsubscribe from this list, visit
    https://www.eclipse.org/mailman/listinfo/glassfish-dev
    <https://www.eclipse.org/mailman/listinfo/glassfish-dev>

