Re: [glassfish-dev] Tracking usage data for EE4J working group CI cloud systems
On 10/1/20 12:04 PM, arjan tijms wrote:
Hi,
A few remarks here: just repeating the startup of GF doesn't seem to
work. The test jobs for the GF build themselves already do that. It
repeats the entire test 3 times. I've never ever seen it work after the
second or third repeat.
Are you seeing the same
https://github.com/eclipse-ee4j/glassfish/issues/23191 (GF server.log
contains `Server shutdown initiated`)?
Have you tried adding the --verbose --debug options to GF? That makes for
a much larger server.log, but the failure that causes GF to shut down
immediately during startup should be logged.
Perhaps it would help to add the --verbose --debug options only in the
GF retry loop, so that the 2nd + 3rd time that we run GF, the cause of
failure is shown on the console + server.log.
IMO, we really need the GF exception that causes the failure.
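
To make the retry idea above concrete, here is a minimal sketch of a loop that adds the extra options only on the 2nd and 3rd attempts. It assumes GF is started with `asadmin start-domain`, which this thread does not state; `start_command`, `start_with_retries`, and the injected `run` callable are hypothetical names used only for illustration.

```python
from typing import Callable, List, Sequence


def start_command(attempt: int,
                  base: Sequence[str] = ("asadmin", "start-domain")) -> List[str]:
    """Build the start command for a given attempt (1-based).

    Assumption: GF is launched via `asadmin start-domain`. The first attempt
    runs as usual; later attempts add --verbose --debug so the cause of the
    failure ends up on the console and in server.log.
    """
    cmd = list(base)
    if attempt > 1:
        cmd += ["--verbose", "--debug"]
    return cmd


def start_with_retries(run: Callable[[List[str]], int],
                       max_attempts: int = 3) -> bool:
    """Try to start GF up to max_attempts times.

    `run` is a hypothetical hook that executes the command and returns its
    exit code (0 meaning success); how the CI job actually runs and waits
    for GF is not described in this thread.
    """
    for attempt in range(1, max_attempts + 1):
        if run(start_command(attempt)) == 0:
            return True
    return False
```

Keeping the first attempt unchanged means the normal, passing case does not produce the much larger server.log.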
Yet, starting the entire process does get it to pass.
This leads me to believe it's something in the Kubernetes pod and/or
Docker container that gets corrupted. Once this corruption happens, GF
will always fail to start up.
Please see this issue about this:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=561229
I do agree that your JNLP is too small, which we also experienced on the
Platform TCK CI (we saw git terminating with the exit code that means an
OOM condition occurred.) We made our JNLP too big and will probably
reduce it down to a smaller number
(https://github.com/eclipse-ee4j/jakartaee-tck/blob/master/Jenkinsfile#L131).
The too big JNLP size doesn't cause any failures though, so it is okay
for now.
It discusses the problem from the Jenkins/Kubernetes point of view.
There are a few odd things in the Kubernetes setup which may have to be
looked at first.
Thanks for communicating this here.
Scott
Kind regards,
Arjan Tijms
On Thu, Oct 1, 2020 at 5:41 PM Scott Marlow <smarlow@xxxxxxxxxx
<mailto:smarlow@xxxxxxxxxx>> wrote:
On 10/1/20 10:48 AM, Ed Bratt wrote:
> I wasn't intending to point any fingers at the stability observations
> you made -- only to observe that we want to focus on improving the
> reliability so that we don't need to rely on re-runs, or waits, or other
> symptomatic only type fixes.
I agree 100%! I don't think that we need to make the memory tuning
changes just yet, as they will not improve reliability.
> One day, I'd like to see that we can
> initiate several test runs simultaneously -- and they all reliably
> complete -- perhaps taking more clock-time, but reliably and
> consistently repeating.
+1
>
> Ideally, we could fill this compute pipeline with running and waiting
> tasks and be confident that the system will reliably produce, consistent
> results. It sounds like, for some unknown cause, we aren't there yet.
> Getting to the root cause of this would be my priority (and we don't
> know if that's a GlassFish issue, an infrastructure issue or even
> something else). But, I'm not actually doing the work so, it's just my
> opinion.
+1
Part of the original discussion related to
https://github.com/eclipse-ee4j/glassfish/issues/23191 suggested adding
the --verbose --debug options, which I tried on my local machine (as I
can reproduce a similar symptom that could be the same as on CI). With
those options I see
`org.glassfish.flashlight.MonitoringRuntimeDataRegistry not found by org.glassfish.main.admin.monitoring-core`
as shown on
https://gist.github.com/scottmarlow/eec11ca74b99d021346b270fa29ce4fa
(which has the full server output including error exceptions).
I'm thinking that we could try to reproduce the failure on Jenkins with
the --verbose --debug options so that we see the actual cause on
Jenkins. Perhaps it will be the same exception call stack as I see
locally.
I don't know the code involved in the
https://gist.github.com/scottmarlow/eec11ca74b99d021346b270fa29ce4fa
exception call stack well enough to debug and find the cause. If you
open this gist, please search for
`java.lang.ClassNotFoundException: org.glassfish.flashlight.MonitoringRuntimeDataRegistry not found by org.glassfish.main.admin.monitoring-core`,
which kind of seemed like a startup race condition, but I have no idea
really.
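
For comparing a Jenkins server.log against the local one, a small scan for the two strings mentioned in this thread might help. This is only a sketch: `find_failure_lines` and `MARKERS` are hypothetical names, and the exact log line layout on CI is an assumption.

```python
from typing import List, Sequence, Tuple

# Strings taken from this thread; extend as other failure signatures turn up.
MARKERS = (
    "Server shutdown initiated",
    "java.lang.ClassNotFoundException",
)


def find_failure_lines(log_text: str,
                       markers: Sequence[str] = MARKERS) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs for log lines containing any marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.strip()))
    return hits
```

It could be called on the contents of server.log after a failed start, for example `find_failure_lines(open("server.log").read())`, where the file path is whatever the job actually uses.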
Scott
_______________________________________________
glassfish-dev mailing list
glassfish-dev@xxxxxxxxxxx <mailto:glassfish-dev@xxxxxxxxxxx>
To unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/glassfish-dev