Re: [] Compute cluster instability


In addition to the obvious offline problems shown on I see very variable job start delays. It is almost as if each selected node has only a 25% chance of working, with a two-minute timeout after each failure before another try can occur. I have seen one job take nearly an hour to start.
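
As a rough check on that impression, here is a minimal sketch. It assumes, hypothetically, that each node selection succeeds independently with probability 0.25 and that every failure costs a fixed 120-second timeout. Those figures are the guesses above, not measurements, and the function names are purely illustrative.

```python
import math

# Hypothetical model, not measured data: each node selection succeeds
# independently with probability P_SUCCESS; each failure costs TIMEOUT_S seconds.
P_SUCCESS = 0.25
TIMEOUT_S = 120


def expected_delay_s(p: float = P_SUCCESS, timeout_s: float = TIMEOUT_S) -> float:
    """Mean start delay: the expected number of failures before the
    first success is (1 - p) / p, each costing one timeout."""
    return timeout_s * (1 - p) / p


def prob_delay_at_least(delay_s: float, p: float = P_SUCCESS,
                        timeout_s: float = TIMEOUT_S) -> float:
    """Probability that the start delay reaches delay_s seconds,
    i.e. that at least ceil(delay_s / timeout_s) attempts fail in a row."""
    failures_needed = math.ceil(delay_s / timeout_s)
    return (1 - p) ** failures_needed


if __name__ == "__main__":
    print(f"mean delay: {expected_delay_s() / 60:.1f} min")
    print(f"P(delay >= 1 hour): {prob_delay_at_least(3600):.6f}")
```

Under these assumptions the mean delay would be about six minutes and an hour-long wait would be very rare, so the job that took nearly an hour suggests the real behaviour is worse than this simple model.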

Please take a moment to Subscribe to the Eclipse Status page at

Unfortunately there is no deep monitoring of the CI, so on one day when I was seeing slow jobs, the summary still showed 100% perfect.

Isn't it time to revert the last 'improvement'?


        Edward Willink

On 26/02/2024 14:00, Denis Roy via wrote:

Greetings everyone,

If you're operating a Jenkins instance at (Jiro), you may have noticed some instability recently. The last software update we've made to the compute cluster is having an ill effect on compute nodes. Although Kubernetes can reschedule a running instance on a different node, this does not always succeed without intervention and it may leave a CI instance in an odd state for hours, depending on EF IT staff availability.

Even when an instance is rescheduled successfully, it does mean a "reboot" of your instance and a potentially failed build.

We apologize for the inconvenience, and the team is planning the next software update to hopefully resolve this recent instability. Our ultimate goal is to test updates on our staging cluster to establish reliability before upgrading the production cluster.

Please take a moment to Subscribe to the Eclipse Status page at

Thanks for reading.


Denis Roy

Director, IT Services | Eclipse Foundation

Eclipse Foundation: The Community for Open Innovation and Collaboration

Twitter: @droy_eclipse
