[eclipse.org-committers] Major outage Saturday, July 31, 2021

Last Saturday, July 31, at 9:30pm EDT, our primary backend storage server suffered a filesystem deadlock that resulted in a kernel lockup. The server was force-restarted, but the filesystem was corrupted and, despite extensive repair attempts, could not be recovered.

While we have a secondary storage server, it is not kept in sync in real time and can be up to four hours behind the primary. We re-enabled the primary server with a new filesystem and back-synced data from the secondary, but some data was lost in the process.

The areas where data loss may be noticeable are the Git repos and the mailing list archives. Website data, such as that on www.eclipse.org, Bugzilla and the Wiki, is not affected.

As Git is decentralized, there should be no real data loss; we ask all projects hosted on Gerrit to re-sync their repos with Eclipse Gerrit. If the ECA validator complains that you are trying to push changes that are not yours, please file a bug/issue and we will work with you to re-merge your repos.

We apologize for the inconvenience this causes. It was among the worst possible failures at the worst possible time. Next week we will write a complete post-mortem, reflect on the areas where our processes and infrastructure can be made more resilient, and publish our recommendations.

At this time, all core services are 100% operational, and most other services are either 100% operational or experiencing only minor glitches. You can follow our restoration progress, and check service status at any time, at https://www.eclipsestatus.io.

Denis


--
Denis Roy
Director, IT Services
@droy_eclipse
