Re: [jakartaee-tck-dev] [glassfish-dev] GlassFish status



On 7/29/20 3:16 AM, Alwin Joseph wrote:

On 29/07/20 7:40 am, Scott Marlow wrote:


On 7/28/20 3:18 AM, Alwin Joseph wrote:
Hi,

We are still frequently hitting the GF start-domain failure in our TCK runs. A failure in one of the suites causes the entire job to run for a long time. Has anyone found a solution for the start-domain issue yet?

There are a number of ideas from previous discussions:

1.  Work with Eclipse CI administrators to get admin access so we can better explore if something is wrong with our current Container/JVM memory settings (e.g. explore why https://ci.eclipse.org/jakartaee-tck/job/jakartaeetck-nightly-run-master-web/3/ has been running for two days).
jakartaeetck-nightly-run-master-web/3 was running for two days because GF failed to start, yet the job still tried to deploy the archives. Additionally, the failed tests are rerun once.

2.  Instrument the "GF start-domain" command to handle failure by printing some state-of-the-world diagnostics.  I would like to see the output of "jps -l", the available OS memory and, ideally, the available (OS) file handles.  After a failure, if we tried once more to start GlassFish with "--verbose --debug", we might get more interesting output.
I will try "jps -l" before the GF start-domain & start GF with "--verbose --debug".

Thanks Alwin!  +1


3.  Try to stagger the TCK runs as a workaround (only run Web Profile or Full Platform but not both at the same time).

4.  Tune our container/JVM memory settings further.

5.  Break up our larger TCK test groups into smaller test groups, with a focus on the tests that seem to get stuck (e.g. JSF currently).
For now we can kill the job if it is stuck and rerun only those test groups in the next run.

IMO, #2 might be good to explore if anyone has time to contribute such changes.  #1 would be good also.

#3 could also help.  I think that #4 + #5 are longer term options.
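
To make idea #2 concrete, here is a rough sketch of such a wrapper. It is only an illustration under stated assumptions: the function names and the ASADMIN and RETRY_TIMEOUT variables are made up for this example, "free" and /proc/sys/fs/file-nr assume a Linux agent, and "timeout" assumes GNU coreutils. Since --verbose keeps asadmin attached to the console, the retry is bounded by a timeout and is meant purely for collecting output.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "asadmin start-domain"; names are illustrative.
ASADMIN="${ASADMIN:-asadmin}"            # path to asadmin (assumption)
RETRY_TIMEOUT="${RETRY_TIMEOUT:-600}"    # seconds allowed for the verbose retry

dump_state() {
  echo "--- jps -l ---"
  if command -v jps >/dev/null 2>&1; then jps -l; else echo "jps not found on PATH"; fi

  echo "--- available memory (MB) ---"
  if command -v free >/dev/null 2>&1; then free -m; else echo "free not available"; fi

  echo "--- file handles ---"
  echo "per-process open file limit: $(ulimit -n)"
  # Linux only: allocated handles, unused handles, system-wide maximum
  if [ -r /proc/sys/fs/file-nr ]; then cat /proc/sys/fs/file-nr; fi
}

start_domain_with_diagnostics() {
  dump_state
  if "$ASADMIN" start-domain; then
    return 0
  fi

  echo "start-domain failed; state after the failure:" >&2
  dump_state

  # Diagnostic retry only: with --verbose the command stays in the foreground,
  # so it is cut off after RETRY_TIMEOUT seconds even if the server comes up.
  timeout "$RETRY_TIMEOUT" "$ASADMIN" start-domain --verbose --debug
}
```

A job script would call start_domain_with_diagnostics in place of the plain start-domain step and fail the stage if it returns non-zero.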

Scott


Regards,
Alwin

On 12/06/20 1:17 am, arjan tijms wrote:
Hi,

Indeed, --verbose only logs to the console and will hang the current process. It doesn't seem there's a port in use (it would explicitly complain about that). Here the startup code just doesn't detect the server process to be running. This could mean the detection somehow fails, or the process, in fact, doesn't start.
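
One way to tell those two cases apart after a failed start would be to check both for a GlassFish JVM and for a listener on the admin port. The following is only a sketch: the function names are invented for illustration, it assumes the default admin port 4848, it assumes that "jps -l" lists the server with "glassfish" somewhere in the class or jar name, and the port probe relies on bash's /dev/tcp redirection.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; none of these names are part of asadmin.
ADMIN_HOST="${ADMIN_HOST:-127.0.0.1}"
ADMIN_PORT="${ADMIN_PORT:-4848}"   # default GlassFish admin port

gf_jvm_running() {
  # Assumption: the server JVM shows up in "jps -l" with "glassfish" in its name.
  command -v jps >/dev/null 2>&1 && jps -l | grep -qi glassfish
}

admin_port_open() {
  # bash-only: opening /dev/tcp/host/port succeeds if something accepts the connection.
  (exec 3<>"/dev/tcp/$ADMIN_HOST/$ADMIN_PORT") 2>/dev/null
}

classify_start_failure() {
  # $1 = yes/no: GlassFish JVM found, $2 = yes/no: admin port accepts connections
  case "$1:$2" in
    yes:yes) echo "server appears to be up; detection probably failed" ;;
    yes:no)  echo "JVM is running but the admin port is closed; startup may be stuck" ;;
    no:yes)  echo "admin port is held by some other process" ;;
    *)       echo "no JVM and no listener; the process did not start" ;;
  esac
}

diagnose_start_failure() {
  jvm=no
  port=no
  if gf_jvm_running; then jvm=yes; fi
  if admin_port_open; then port=yes; fi
  classify_start_failure "$jvm" "$port"
}
```

Calling diagnose_start_failure right after the failed start-domain would put a one-line verdict into the build log.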

Often, if the GF process doesn't start, there are errors in the log, but that's not the case here. The logs are a little hard to retrieve, so it might be easier to cat them to the main Jenkins log as soon as the script detects a failure to start.
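
A minimal sketch of that idea follows. The function names and the GF_DOMAIN_DIR and ASADMIN variables are invented for this example, and the default path simply combines the install location seen in the TCK log with the standard domains/domain1/logs/server.log layout, so it would need adjusting per job.

```shell
#!/usr/bin/env bash
# Illustrative only: dump the server log into the build output when start-domain fails.
ASADMIN="${ASADMIN:-asadmin}"   # path to asadmin (assumption)
# Assumed location of the domain directory; adjust for the actual workspace.
GF_DOMAIN_DIR="${GF_DOMAIN_DIR:-/root/ri/glassfish6/glassfish/domains/domain1}"

dump_server_log() {
  log_file="$1"
  if [ -f "$log_file" ]; then
    echo "===== BEGIN $log_file ====="
    cat "$log_file"
    echo "===== END $log_file ====="
  else
    echo "No server log found at $log_file"
  fi
}

start_domain_or_dump_log() {
  if "$ASADMIN" start-domain; then
    return 0
  fi
  # The start failed: copy the server log to stdout so it lands in the Jenkins log.
  dump_server_log "$GF_DOMAIN_DIR/logs/server.log"
  return 1
}
```

If the log turns out to be large, swapping cat for "tail -n 500" would keep the Jenkins output manageable.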

Kind regards,
Arjan



On Thu, Jun 11, 2020 at 9:16 PM Steve Millidge (Payara) <steve.millidge@xxxxxxxxxxx> wrote:

    Verbose only switches on console logging; it doesn't affect the
    logging level, afaik.

------------------------------------------------------------------------
    *From:* glassfish-dev-bounces@xxxxxxxxxxx on behalf of Scott Marlow <smarlow@xxxxxxxxxx>
    *Sent:* Thursday, June 11, 2020 8:13:18 PM
    *To:* glassfish developer discussions <glassfish-dev@xxxxxxxxxxx>
    *Subject:* Re: [glassfish-dev] GlassFish status

        Hi,

        Do you have any idea what it could be? A race condition would
        be unlikely, since it keeps failing on the same node if
        repeated. So maybe it's something related to the node, but
        I'm not sure.

    Would port in use errors show in the console?  Or do we need to
    start GlassFish with the --verbose option to see errors like that?

        Kind regards,
        Arjan Tijms



        On Thu, Jun 11, 2020 at 7:38 PM Alwin Joseph <alwin.joseph@xxxxxxxxxx> wrote:

            Hi Arjan,

            We encountered the same issue yesterday in the jakartaee-tck
            platform run, on a couple of the nodes, but the run went
            through on all of the other (more than 30) nodes.

            /+ /root/ri/glassfish6/glassfish/bin/asadmin --user admin
            --passwordfile /root/admin-password.txt start-domain//
            //Picked up JAVA_TOOL_OPTIONS: -Xmx6G//
            //Waiting for domain1 to start
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................//
            //No response from the Domain Administration Server
            (domain1) after 600 seconds.//
            //The command is either taking too long to complete or
            the server has failed.//
            //Please see the server log files for command status. //
            //Please start with the --verbose option in order to see
            early messages.//
            //Command start-domain failed./


            There was a stop-domain failure in the GlassFish CI some time
            back, which was fixed by correcting the Docker image.

            Regards,
            Alwin

            On 11/06/20 10:52 pm, arjan tijms wrote:
            Hi,

            I just noticed an old issue with the CI has resurfaced.

            Seemingly randomly, GlassFish will fail to start up:

            12:11:19  ===== TEST RUN - STARTING GLASSFISH AND DB =====
            12:11:19
            12:11:19  + /home/jenkins/agent/workspace/_test-using-jenkinsfile_PR-23091/glassfish6/glassfish/bin/asadmin start-domain
            12:11:19  Picked up JAVA_TOOL_OPTIONS: -Xmx2G
            12:21:20  Waiting for domain1 to start
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
            12:21:20  No response from the Domain Administration Server (domain1) after 600 seconds.
            12:21:20  The command is either taking too long to complete or the server has failed.
            12:21:20  Please see the server log files for command status.
            12:21:20  Please start with the --verbose option in order to see early messages.
            12:21:20  Command start-domain failed.

            See:
https://ci.eclipse.org/glassfish/job/glassfish_build-and-test-using-jenkinsfile/job/PR-23091/2/execution/node/96/log/

            Repeating the script (within the same test run) never
            helps. This is automatically done during the test.
            However, the exact same build does start up on other nodes.

            Kind regards,
            Arjan








            On Wed, Jun 10, 2020 at 4:15 PM arjan tijms <arjan.tijms@xxxxxxxxx> wrote:

                Hi,

                On Wed, Jun 10, 2020 at 2:36 AM sawamura.hiroki@xxxxxxxxxxx wrote:

                    - Dropped web_jsp(?):
https://github.com/eclipse-ee4j/glassfish/commit/0dea810da59f757059b1b424fc78060a44461fba#diff-58231b16fdee45a03a4ee3cf94a9f2c3


                Good catch! I added it back here:
https://github.com/eclipse-ee4j/glassfish/pull/23090

                Kind regards,
                Arjan Tijms



_______________________________________________
glassfish-dev mailing list
glassfish-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/glassfish-dev
