Re: [jakartaee-tck-dev] [glassfish-dev] GlassFish status



On Wed, Jul 29, 2020 at 9:44 AM arjan tijms <arjan.tijms@xxxxxxxxx> wrote:
Hi,

Did you already try some of the suggestions mentioned in the last comment here?


No, thanks for pointing this issue out though!  The https://github.com/eclipse-cbi/jiro link is like gold, I can now see our Jenkins config settings via https://github.com/eclipse-cbi/jiro/blob/master/instances/ee4j.jakartaee-tck/target/jenkins/configuration.yml!  :-)

Which suggestions do you mean?  The Jenkins server upgrade?  JNLP settings?  volumeMounts?

We did tune our JNLP settings, which helped avoid "git OOM" failures (see https://github.com/eclipse-ee4j/jakartaee-tck/blob/master/Jenkinsfile#L127 for what we did so far: JNLP uses -Xmx2048m and memory: "3Gi").  I'm not sure whether the JNLP JVM process will actually grow to 2 GB, but it could, so I used 3Gi for the container to ensure that there is enough memory.  We might be able to reduce the JNLP -Xmx though.

Regards,
Scott



Kind regards,
Arjan

On Wed, Jul 29, 2020 at 2:29 PM Scott Marlow <smarlow@xxxxxxxxxx> wrote:


On 7/29/20 3:16 AM, Alwin Joseph wrote:
>
> On 29/07/20 7:40 am, Scott Marlow wrote:
>>
>>
>> On 7/28/20 3:18 AM, Alwin Joseph wrote:
>>> Hi,
>>>
>>> We are still facing the GF start-domain failure often in our TCK
>>> runs. A failure in one of the suites causes the entire job to
>>> run for a long time. Has anyone found a solution for the
>>> start-domain issue yet?
>>
>> There are a number of ideas from previous discussions:
>>
>> 1.  Work with Eclipse CI administrators to get admin access so we can
>> better explore if something is wrong with our current Container/JVM
>> memory settings (e.g. explore why
>> https://ci.eclipse.org/jakartaee-tck/job/jakartaeetck-nightly-run-master-web/3/
>> has been running for two days).
> jakartaeetck-nightly-run-master-web/3 was running for two days because
> GF failed to start and the job still tried to deploy the archives.
> Additionally, the failed tests are rerun once.
>>
>> 2.  Instrument the "GF start-domain" command to handle failure by
>> showing some state of the world stats.  I would like to see output of
>> "jps -l".  I would also like to see available OS memory.  It would be
>> nice to also see available (OS) file handles.  Perhaps after failure,
>> if we could try once more to start GlassFish with "--verbose --debug",
>> we might get more interesting output.
> I will try "jps -l" before the GF start-domain & start GF with
> "--verbose --debug".

Thanks Alwin!  +1

>>
>> 3.  Try to stagger the TCK runs as a workaround (only run Web Profile
>> or Full Platform but not both at the same time).
>>
>> 4.  Tune our container/JVM memory settings further.
>>
>> 5.  Break up our larger TCK test groups into smaller test groups, with
>> a focus on the tests that seem to get stuck (e.g. JSF currently).
> For now we can kill the job if it is stuck and rerun only those test
> groups in the next run.
>>
>> IMO, #2 might be good to explore if anyone has time to contribute such
>> changes.  #1 would be good also.
>>
>> #3 could also help.  I think that #4 + #5 are longer term options.
>>
>> Scott
>>
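Idea 2 above (show some state-of-the-world stats when "GF start-domain" fails) could be sketched roughly as follows. This is only an illustration, not what the TCK or GlassFish CI scripts actually do: the function names dump_diagnostics and start_with_diagnostics are made up for this example, and the memory and file-handle checks assume a Linux agent where "free" or /proc is available.

```shell
# Hypothetical helper: print a snapshot of the agent's state.
# Assumes a Linux agent; each probe is skipped if its tool is missing.
dump_diagnostics() {
  echo "===== diagnostics after start-domain failure ====="
  echo "--- Java processes (jps -l) ---"
  if command -v jps >/dev/null 2>&1; then
    jps -l || true
  else
    echo "jps not found on PATH"
  fi
  echo "--- memory ---"
  if command -v free >/dev/null 2>&1; then
    free -m || true
  elif [ -r /proc/meminfo ]; then
    head -n 5 /proc/meminfo
  else
    echo "no memory information available"
  fi
  echo "--- file handles ---"
  echo "per-process open file limit (ulimit -n): $(ulimit -n)"
  if [ -r /proc/sys/fs/file-nr ]; then
    # file-nr holds: allocated handles, allocated-but-unused handles, system maximum
    echo "allocated / unused / max: $(cat /proc/sys/fs/file-nr)"
  fi
}

# Hypothetical wrapper: run the given command; if it fails, print the
# diagnostics and return the command's original exit status.
start_with_diagnostics() {
  swd_status=0
  "$@" || swd_status=$?
  if [ "$swd_status" -ne 0 ]; then
    dump_diagnostics
  fi
  return "$swd_status"
}
```

A CI step might then call something like: start_with_diagnostics "$GF_HOME/bin/asadmin" start-domain, where GF_HOME is a placeholder for the GlassFish install directory. The wrapper keeps the original exit status, so the job still fails as before but the console shows what else was running and how much memory was free at that moment.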
>>>
>>> Regards,
>>> Alwin
>>>
>>> On 12/06/20 1:17 am, arjan tijms wrote:
>>>> Hi,
>>>>
>>>> Indeed, --verbose only logs to the console and will hang the current
>>>> process. It doesn't seem there's a port in use (it would explicitly
>>>> complain about that). Here the startup code just doesn't detect the
>>>> server process to be running. This could mean the detection somehow
>>>> fails, or the process, in fact, doesn't start.
>>>>
>>>> Often if the GF process doesn't start there would be errors in the
>>>> log, but that's not the case. The logs are a little hard to
>>>> retrieve, so possibly it would be easier to cat them to the main
>>>> Jenkins log as soon as the script detects failure to start.
>>>>
>>>> Kind regards,
>>>> Arjan
>>>>
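The suggestion above (cat the logs into the main Jenkins log as soon as the script detects a failed start) might look something like the sketch below. The function name dump_server_log and the 200-line default are assumptions made for illustration; nothing here is taken from the actual GlassFish or TCK scripts.

```shell
# Hypothetical helper: copy the tail of a server log to stdout so it ends up
# in the Jenkins console. Takes the log path and an optional line count.
dump_server_log() {
  log_file=$1
  max_lines=${2:-200}
  if [ -r "$log_file" ]; then
    echo "===== last $max_lines lines of $log_file ====="
    tail -n "$max_lines" "$log_file"
  else
    # A missing log is itself a useful hint that the server never got far.
    echo "server log not found or not readable: $log_file"
  fi
}
```

It could be called right after a failed start-domain, for example: dump_server_log "$GF_HOME/glassfish/domains/domain1/logs/server.log", assuming GF_HOME (a placeholder) points at the glassfish6 directory and the domain uses the default log location. Only the tail is printed to keep the console readable; the line count can be raised if the interesting part is earlier.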
>>>>
>>>>
>>>> On Thu, Jun 11, 2020 at 9:16 PM Steve Millidge (Payara)
>>>> <steve.millidge@xxxxxxxxxxx> wrote:
>>>>
>>>>     Verbose only switches on console logging; it doesn't affect the
>>>>     logging level, afaik.
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>     *From:* glassfish-dev-bounces@xxxxxxxxxxx
>>>>     <mailto:glassfish-dev-bounces@xxxxxxxxxxx>
>>>>     <glassfish-dev-bounces@xxxxxxxxxxx
>>>>     <mailto:glassfish-dev-bounces@xxxxxxxxxxx>> on behalf of Scott
>>>>     Marlow <smarlow@xxxxxxxxxx <mailto:smarlow@xxxxxxxxxx>>
>>>>     *Sent:* Thursday, June 11, 2020 8:13:18 PM
>>>>     *To:* glassfish developer discussions <glassfish-dev@xxxxxxxxxxx
>>>>     <mailto:glassfish-dev@xxxxxxxxxxx>>
>>>>     *Subject:* Re: [glassfish-dev] GlassFish status
>>>>
>>>>>         Hi,
>>>>>
>>>>>         Do you have any idea what it could be? A race condition would
>>>>>         be unlikely, since it keeps failing on the same node if
>>>>>         repeated. So maybe it's something related to the node, but
>>>>>         I'm not sure.
>>>>>
>>>>     Would port in use errors show in the console?  Or do we need to
>>>>     start Glassfish with the --verbose option to see errors like that?
>>>>
>>>>>         Kind regards,
>>>>>         Arjan Tijms
>>>>>
>>>>>
>>>>>
>>>>>         On Thu, Jun 11, 2020 at 7:38 PM Alwin Joseph
>>>>>         <alwin.joseph@xxxxxxxxxx <mailto:alwin.joseph@xxxxxxxxxx>>
>>>>> wrote:
>>>>>
>>>>>             Hi Arjan,
>>>>>
>>>>>             We encountered the same issue with the jakartaee-tck platform
>>>>>             run yesterday too, on a couple of the nodes. But it went
>>>>>             through on all of the other >30 nodes.
>>>>>
>>>>>             + /root/ri/glassfish6/glassfish/bin/asadmin --user admin
>>>>>             --passwordfile /root/admin-password.txt start-domain
>>>>>             Picked up JAVA_TOOL_OPTIONS: -Xmx6G
>>>>>             Waiting for domain1 to start
>>>>> ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................//
>>>>>
>>>>>             No response from the Domain Administration Server
>>>>>             (domain1) after 600 seconds.
>>>>>             The command is either taking too long to complete or
>>>>>             the server has failed.
>>>>>             Please see the server log files for command status.
>>>>>             Please start with the --verbose option in order to see
>>>>>             early messages.
>>>>>             Command start-domain failed.
>>>>>
>>>>>
>>>>>             There was a stop-domain failure in glassfish CI some time
>>>>>             back which was fixed by correcting the docker image.
>>>>>
>>>>>             Regards,
>>>>>             Alwin
>>>>>
>>>>>             On 11/06/20 10:52 pm, arjan tijms wrote:
>>>>>>             Hi,
>>>>>>
>>>>>>             I just noticed an old issue with the CI has resurfaced.
>>>>>>
>>>>>>             Seemingly randomly, GlassFish will fail to start up:
>>>>>>
>>>>>>             12:11:19  ===== TEST RUN - STARTING GLASSFISH AND DB =====
>>>>>>             12:11:19
>>>>>>             12:11:19  + /home/jenkins/agent/workspace/_test-using-jenkinsfile_PR-23091/glassfish6/glassfish/bin/asadmin start-domain
>>>>>>             12:11:19  Picked up JAVA_TOOL_OPTIONS: -Xmx2G
>>>>>>             12:21:20  Waiting for domain1 to start
>>>>>> .......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
>>>>>>
>>>>>>             12:21:20  No response from the Domain Administration Server (domain1) after 600 seconds.
>>>>>>             12:21:20  The command is either taking too long to complete or the server has failed.
>>>>>>             12:21:20  Please see the server log files for command status.
>>>>>>             12:21:20  Please start with the --verbose option in order to see early messages.
>>>>>>             12:21:20  Command start-domain failed.
>>>>>>
>>>>>>             See:
>>>>>> https://ci.eclipse.org/glassfish/job/glassfish_build-and-test-using-jenkinsfile/job/PR-23091/2/execution/node/96/log/
>>>>>>
>>>>>>
>>>>>>             Repeating the script (within the same test run) never
>>>>>>             helps. This is automatically done during the test.
>>>>>>             However, the exact same build does start up on other
>>>>>>             nodes.
>>>>>>
>>>>>>             Kind regards,
>>>>>>             Arjan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             On Wed, Jun 10, 2020 at 4:15 PM arjan tijms
>>>>>>             <arjan.tijms@xxxxxxxxx <mailto:arjan.tijms@xxxxxxxxx>>
>>>>>>             wrote:
>>>>>>
>>>>>>                 Hi,
>>>>>>
>>>>>>                 On Wed, Jun 10, 2020 at 2:36 AM
>>>>>>                 sawamura.hiroki@xxxxxxxxxxx
>>>>>>                 <mailto:sawamura.hiroki@xxxxxxxxxxx>
>>>>>>                 <sawamura.hiroki@xxxxxxxxxxx
>>>>>> <mailto:sawamura.hiroki@xxxxxxxxxxx>> wrote:
>>>>>>
>>>>>>                     - Dropped web_jsp(?):
>>>>>> https://github.com/eclipse-ee4j/glassfish/commit/0dea810da59f757059b1b424fc78060a44461fba#diff-58231b16fdee45a03a4ee3cf94a9f2c3
>>>>>>
>>>>>>
>>>>>>
>>>>>>                 Good catch! I added it back here:
>>>>>> https://github.com/eclipse-ee4j/glassfish/pull/23090
>>>>>>
>>>>>>                 Kind regards,
>>>>>>                 Arjan Tijms
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>>             glassfish-dev mailing list
>>>>>>             glassfish-dev@xxxxxxxxxxx
>>>>>> <mailto:glassfish-dev@xxxxxxxxxxx>
>>>>>>             To unsubscribe from this list, visit
>>>>>> https://www.eclipse.org/mailman/listinfo/glassfish-dev
>>>>>
>>>
>>

_______________________________________________
glassfish-dev mailing list
glassfish-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/glassfish-dev