Re: [jakartaee-tck-dev] Being green and reducing our Jenkins CI usage...

One thing to consider:

Some TCK tests have historically been unstable. Typically these are tests where a certain action is supposed to happen within some time frame. When the test system is heavily loaded, these tests fail more often; on lightly loaded systems they pass every time. We wouldn't want to set the wait time too long, because that makes discovering a real failure take longer. I know that these have been improved over the years, but there may still be tests in this stability category.
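As an illustration of the wait-time trade-off described above, here is a minimal sketch of a polling wait with a deadline. It is not taken from the TCK sources; the class name PollingWait and the method waitFor are hypothetical. The idea is that a passing test returns as soon as the condition holds, so a more generous timeout only costs time when the action really never happens.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the TCK: polls a condition until it
// becomes true or the timeout elapses.
public final class PollingWait {
    private PollingWait() {}

    /**
     * Returns true as soon as the condition is met, or false once the
     * timeout has elapsed (or the waiting thread is interrupted).
     */
    public static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // success is reported without waiting out the full timeout
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // a real failure still takes the full timeout to surface
            }
            // Sleep for the polling interval, but never past the deadline.
            long sleepMillis = Math.min(interval.toMillis(), Math.max(1L, remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

A test could call, for example, PollingWait.waitFor(() -> messageReceived.get(), Duration.ofSeconds(30), Duration.ofMillis(200)), where messageReceived is an assumed AtomicBoolean set by the code under test. On a heavily loaded system the longer timeout gives the action room to complete, while on a lightly loaded one the call returns almost immediately.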

Until tests like these are fully resolved, we may need to use some calendar-driven kick-off. Certainly we can be judicious about that kind of usage and, once the TCKs become more stable, the general usage will hopefully start to stabilize.

On a separate front, I have asked Eclipse Admins if they can provide us with information about our actual usage so that we can start to learn more about how the various build and test jobs are actually using compute resources at OpenShift. Hopefully, we'll start to get some better usage data so that we can properly assess and budget for what we need, without overly impacting our feature development efforts.

Cheers,

-- Ed

On 7/10/2020 7:34 AM, Scott Marlow wrote:


On 7/10/20 9:12 AM, Alwin Joseph wrote:

On 09/07/20 9:08 pm, Scott Marlow wrote:


On 7/7/20 2:05 PM, Scott Marlow wrote:
Hi,

I heard some general feedback that we are consuming more resources than expected with our jakartaee-tck environment. I don't have specifics at the moment but am thinking that we could update the jakartaeetck-nightly-build-master [1] job to only run if a change was merged to the TCK repo [2].  Any volunteers from the committers to work on this change?  Only the committers have permissions to update the Jenkins job [1] to poll the SCM (git).

Just to understand, does "consuming more resources" mean we are running jobs more often or we are using more storage?

I think that it is more that we are running jobs more often.  I think that we will still focus on releasing EE 9 and using the resources that we need to accomplish that, without focusing on reducing our Jenkins usage (as Ed mentioned, we should focus on EE 9 but if we do make improvements that is fine also).

I modified jakartaeetck-nightly-build-master [1] to clone the git repo [2] and poll it for changes. The run was faster than the previous one, so cloning the git repo seems to be okay.  Note that I cloned the "master" branch, which matches the job name.

The net result is that on days that we do not merge code changes to the TCK repo, we will not run the (~2 hour) jakartaeetck-nightly-build-master [1] job.

Another improvement that we made recently is adding a tests parameter to https://ci.eclipse.org/jakartaee-tck/job/jakartaeetck-nightly-run-master, so that we can run a small set of tests to verify a fix.


Ideas are welcome :)

The jakartaeetck-nightly-build-master [1] currently runs every night, regardless of whether there are any changes made to the Platform TCK repo.  We could update jakartaeetck-nightly-build-master [1] to reference the jakartaee-tck git repo [2], however that adds the additional cost of cloning the git repo (such that building the Platform TCK takes even more time versus some time savings on days when there are no TCK code changes).

I'm leaning towards trying to add the git repo [2] to the jakartaeetck-nightly-build-master [1] job, so that we reduce the overall time consumed.
I think git hooks could solve this by triggering the job when there is a commit. But the idea above may be better if we want to keep running it on the schedule.

Thanks, I agree that a git hook could solve this also.

Scott


Scott


I also have been thinking about further changes to the JNLP memory settings in [3].  More specifically, we could try reducing the -Xmx2048m setting to a lower value that is still higher than the -Xmx512m that we previously used; it is mostly guesswork.  If we switched to -Xmx1024m, we would likely use more CPU (e.g. running more frequent GCs) but that might free up other VM/OS level memory for kernel use.  We would need to measure the before/after result of such a change.  If the TCK runs are faster with this change, we would keep it.  If the TCK runs are slower or have more test failures, we would revert it.
I suggest we check with Eclipse infra to get the total number of agents/resources available for jakartaee-tck CI. Agree that it is difficult to make adjustments here, as it takes trial and error to figure out the right memory/CPU configuration.
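The keep-or-revert rule for the heap-size trial described above could be written down roughly as follows. This is only a sketch: the class HeapTrial, its method names, and the choice to treat an unchanged run time as inconclusive are assumptions, not something from the Jenkinsfile or the TCK. The maxMemory() call is the standard java.lang.Runtime API and can help confirm which -Xmx value a JVM actually picked up.

```java
// Hypothetical helper for comparing a baseline TCK run with a trial run
// that uses a different -Xmx setting.
public final class HeapTrial {
    public enum Decision { KEEP, REVERT, INCONCLUSIVE }

    private HeapTrial() {}

    /**
     * Applies the rule from the thread: revert if the trial run is slower
     * or has more test failures, keep it if it is faster. Treating an equal
     * run time as INCONCLUSIVE is an assumption of this sketch.
     */
    public static Decision evaluate(long baselineMinutes, int baselineFailures,
                                    long trialMinutes, int trialFailures) {
        if (trialFailures > baselineFailures || trialMinutes > baselineMinutes) {
            return Decision.REVERT;
        }
        if (trialMinutes < baselineMinutes) {
            return Decision.KEEP;
        }
        return Decision.INCONCLUSIVE;
    }

    /** Heap ceiling of the current JVM in MiB, as reported by the runtime. */
    public static long effectiveMaxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }
}
```

Because single runs vary with system load, it is probably worth comparing several runs on each setting before trusting the result.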

Scott

[1] https://ci.eclipse.org/jakartaee-tck/job/jakartaeetck-nightly-build-master/

[2] https://github.com/eclipse-ee4j/jakartaee-tck

[3] https://github.com/eclipse-ee4j/jakartaee-tck/blob/master/Jenkinsfile#L130

_______________________________________________
jakartaee-tck-dev mailing list
jakartaee-tck-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/jakartaee-tck-dev

