Hi,
Did you already try some of the suggestions mentioned in the last comment here?
Which suggestions do you mean? The Jenkins server upgrade? JNLP settings? volumeMounts?
We did tune our JNLP settings, which helped avoid "git OOM" failures (see
https://github.com/eclipse-ee4j/jakartaee-tck/blob/master/Jenkinsfile#L127 for what we did so far: JNLP uses -Xmx2048m + memory: "3Gi"). I'm not sure whether the JNLP JVM process will actually grow to 2 GB, but it could, so I used 3Gi for the container to ensure there is enough memory. We might be able to reduce the JNLP -Xmx though.
Regards,
Scott
On 7/29/20 3:16 AM, Alwin Joseph wrote:
>
> On 29/07/20 7:40 am, Scott Marlow wrote:
>>
>>
>> On 7/28/20 3:18 AM, Alwin Joseph wrote:
>>> Hi,
>>>
>>> We are still facing the GF start-domain failure often in our TCK
>>> runs. A failure in one of the suites causes the entire job to run
>>> for a long time. Has anyone found a solution for the
>>> start-domain issue yet?
>>
>> There are a number of ideas from previous discussions:
>>
>> 1. Work with Eclipse CI administrators to get admin access so we can
>> better explore if something is wrong with our current Container/JVM
>> memory settings (e.g. explore why
>> https://ci.eclipse.org/jakartaee-tck/job/jakartaeetck-nightly-run-master-web/3/
>> has been running for two days).
> jakartaeetck-nightly-run-master-web/3 was running for two days because
> GF failed to start and the archives still tried to deploy. Additionally
> the failed tests are rerun once.
>>
>> 2. Instrument the "GF start-domain" command to handle failure by
>> showing some state of the world stats. I would like to see output of
>> "jps -l". I would also like to see available OS memory. It would be
>> nice to also see available (OS) file handles. Perhaps after failure,
>> if we could try once more to start GlassFish with "--verbose --debug",
>> we might get more interesting output.
> I will try "jps -l" before the GF start-domain & start GF with
> "--verbose --debug".
Thanks Alwin! +1
>>
>> 3. Try to stagger the TCK runs as a workaround (only run Web Profile
>> or Full Platform but not both at the same time).
>>
>> 4. Tune our container/JVM memory settings further.
>>
>> 5. Break up our larger TCK test groups into smaller test groups, with
>> a focus on the tests that seem to get stuck (e.g. JSF currently).
> For now we can kill the job if it is stuck and rerun only those test
> groups in the next run.
>>
>> IMO, #2 might be good to explore if anyone has time to contribute such
>> changes. #1 would be good also.
>>
>> #3 could also help. I think that #4 + #5 are longer term options.
>>
>> Scott
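
For #2, here is a minimal shell sketch of what the instrumentation might look like. It is not the TCK's actual script: the function names, the ASADMIN, GF_LOG and VERBOSE_TIMEOUT variables, the default log path, and the 120-second timeout are all assumptions for illustration. It prints "jps -l", OS memory and file-handle stats, dumps the tail of the server log on failure, and retries once with "--verbose --debug" under a timeout because --verbose keeps the process in the foreground.

```shell
#!/usr/bin/env bash
# Assumed locations; adjust per job. Both names are hypothetical.
ASADMIN="${ASADMIN:-asadmin}"
GF_LOG="${GF_LOG:-glassfish6/glassfish/domains/domain1/logs/server.log}"

# Print a snapshot of the node: Java processes, memory, file handles.
dump_state() {
  echo "=== jps -l ==="
  if command -v jps >/dev/null 2>&1; then jps -l || true; else echo "jps not found"; fi
  echo "=== memory ==="
  if command -v free >/dev/null 2>&1; then free -m || true; else echo "free not found"; fi
  echo "=== file handles ==="
  if [ -r /proc/sys/fs/file-nr ]; then cat /proc/sys/fs/file-nr; else echo "file-nr not readable"; fi
  echo "open file limit: $(ulimit -n)"
}

# Start the domain. On failure, dump state and the server log, then retry
# once with --verbose --debug purely to capture early startup messages.
start_domain_with_diagnostics() {
  dump_state
  if "$ASADMIN" start-domain; then
    return 0
  fi
  echo "start-domain failed, collecting diagnostics"
  dump_state
  if [ -f "$GF_LOG" ]; then tail -n 200 "$GF_LOG"; else echo "no server log at $GF_LOG"; fi
  # --verbose does not return while the server runs, so bound it with a timeout.
  timeout "${VERBOSE_TIMEOUT:-120}" "$ASADMIN" start-domain --verbose --debug || true
  return 1
}
```

A job step would call start_domain_with_diagnostics in place of the plain start-domain line and fail fast when it returns non-zero, rather than going on to deploy archives against a server that never came up.
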
>>
>>>
>>> Regards,
>>> Alwin
>>>
>>> On 12/06/20 1:17 am, arjan tijms wrote:
>>>> Hi,
>>>>
>>>> Indeed, --verbose only logs to the console and will hang the current
>>>> process. It doesn't seem there's a port in use (it would explicitly
>>>> complain about that). Here the startup code just doesn't detect the
>>>> server process to be running. This could mean the detection somehow
>>>> fails, or the process, in fact, doesn't start.
>>>>
>>>> Often if the GF process doesn't start there would be errors in the
>>>> log, but that's not the case. The logs are a little hard to
>>>> retrieve, so possibly it would be easier to cat them to the main
>>>> Jenkins log as soon as the script detects failure to start.
>>>>
>>>> Kind regards,
>>>> Arjan
>>>>
>>>>
>>>>
>>>> On Thu, Jun 11, 2020 at 9:16 PM Steve Millidge (Payara)
>>>> <steve.millidge@xxxxxxxxxxx> wrote:
>>>>
>>>> Verbose only switches on console logging; it doesn't affect the
>>>> logging level afaik
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> *From:* glassfish-dev-bounces@xxxxxxxxxxx on behalf of Scott
>>>> Marlow <smarlow@xxxxxxxxxx>
>>>> *Sent:* Thursday, June 11, 2020 8:13:18 PM
>>>> *To:* glassfish developer discussions <glassfish-dev@xxxxxxxxxxx>
>>>> *Subject:* Re: [glassfish-dev] GlassFish status
>>>>
>>>>> Hi,
>>>>>
>>>>> Do you have any idea what it could be? A race condition would
>>>>> be unlikely, since it keeps failing on the same node if
>>>>> repeated. So maybe it's something related to the node, but
>>>>> I'm not sure.
>>>>>
>>>> Would port-in-use errors show in the console? Or do we need to
>>>> start GlassFish with the --verbose option to see errors like that?
>>>>
>>>>> Kind regards,
>>>>> Arjan Tijms
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 11, 2020 at 7:38 PM Alwin Joseph
>>>>> <alwin.joseph@xxxxxxxxxx> wrote:
>>>>>
>>>>> Hi Arjan,
>>>>>
>>>>> We encountered the same issue with the jakartaee-tck platform
>>>>> run yesterday too, on a couple of the nodes. But it went
>>>>> through on all of the other >30 nodes.
>>>>>
>>>>> /+ /root/ri/glassfish6/glassfish/bin/asadmin --user admin
>>>>> --passwordfile /root/admin-password.txt start-domain//
>>>>> //Picked up JAVA_TOOL_OPTIONS: -Xmx6G//
>>>>> //Waiting for domain1 to start
>>>>> ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................//
>>>>>
>>>>> //No response from the Domain Administration Server
>>>>> (domain1) after 600 seconds.//
>>>>> //The command is either taking too long to complete or
>>>>> the server has failed.//
>>>>> //Please see the server log files for command status. //
>>>>> //Please start with the --verbose option in order to see
>>>>> early messages.//
>>>>> //Command start-domain failed./
>>>>>
>>>>>
>>>>> There was a stop-domain failure in glassfish CI some time
>>>>> back which was fixed by correcting the docker image.
>>>>>
>>>>> Regards,
>>>>> Alwin
>>>>>
>>>>> On 11/06/20 10:52 pm, arjan tijms wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I just noticed an old issue with the CI has resurfaced.
>>>>>>
>>>>>> Seemingly randomly, GlassFish will fail to start up:
>>>>>>
>>>>>> 12:11:19 ===== TEST RUN - STARTING GLASSFISH AND DB
>>>>>> =====
>>>>>> 12:11:19
>>>>>> 12:11:19 + /home/jenkins/agent/workspace/_test-using-jenkinsfile_PR-23091/glassfish6/glassfish/bin/asadmin start-domain
>>>>>> 12:11:19 Picked up JAVA_TOOL_OPTIONS: -Xmx2G
>>>>>> 12:21:20 Waiting for domain1 to start
>>>>>> .......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
>>>>>>
>>>>>> 12:21:20 No response from the Domain Administration
>>>>>> Server (domain1) after 600 seconds.
>>>>>> 12:21:20 The command is either taking too long to
>>>>>> complete or the server has failed.
>>>>>> 12:21:20 Please see the server log files for command
>>>>>> status.
>>>>>> 12:21:20 Please start with the --verbose option in
>>>>>> order to see early messages.
>>>>>> 12:21:20 Command start-domain failed.
>>>>>>
>>>>>> See:
>>>>>> https://ci.eclipse.org/glassfish/job/glassfish_build-and-test-using-jenkinsfile/job/PR-23091/2/execution/node/96/log/
>>>>>>
>>>>>>
>>>>>> Repeating the script (within the same test run) never
>>>>>> helps. This is automatically done during the test.
>>>>>> However the exact same build does start up on other
>>>>>> nodes.
>>>>>>
>>>>>> Kind regards,
>>>>>> Arjan
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 10, 2020 at 4:15 PM arjan tijms
>>>>>> <arjan.tijms@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Wed, Jun 10, 2020 at 2:36 AM
>>>>>> sawamura.hiroki@xxxxxxxxxxx wrote:
>>>>>>
>>>>>> - Dropped web_jsp(?):
>>>>>> https://github.com/eclipse-ee4j/glassfish/commit/0dea810da59f757059b1b424fc78060a44461fba#diff-58231b16fdee45a03a4ee3cf94a9f2c3
>>>>>>
>>>>>>
>>>>>>
>>>>>> Good catch! I added it back here:
>>>>>> https://github.com/eclipse-ee4j/glassfish/pull/23090
>>>>>>
>>>>>> Kind regards,
>>>>>> Arjan Tijms
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> glassfish-dev mailing list
>>>>>> glassfish-dev@xxxxxxxxxxx
>>>>>> To unsubscribe from this list, visit
>>>>>> https://www.eclipse.org/mailman/listinfo/glassfish-dev
>>>>>
>>>
>>>
>>