[eclipse.org-committers] Recent outages explained
In light of the unusual site outages we've experienced recently, I thought it would be good to explain them, as well as outline the corrective steps we've taken to prevent further service disruptions.
Tuesday, Sept. 4: www.eclipse.org was hit by a distributed denial-of-service (DDoS) attack on our Forums. Many IP addresses were accessing forum pages that would take several seconds to load. This strained our database servers and caused our web servers to eventually exhaust available RAM and swap themselves into oblivion.
To help resolve the issue, we've restricted access to the forum pages, implemented query timeouts on our database servers, decreased the number of Apache handlers and increased available RAM on the www.eclipse.org servers. We'll also likely provision a fourth www.eclipse.org server to better tolerate this type of DDoS activity in the future.
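A rough way to see why these three changes work together: the number of handlers a web server can run without swapping is roughly (RAM minus what the OS and other daemons need) divided by the memory each handler uses, while the number of handlers kept busy is, by Little's law, the request rate times how long each request takes. The Python sketch below is a back-of-the-envelope calculation only; the function names and every figure in it are hypothetical and are not eclipse.org's actual configuration or measurements.

```python
def max_handlers(total_ram_mb: int, reserved_mb: int, per_handler_mb: int) -> int:
    """Rough ceiling on concurrent handlers that fit in RAM without swapping.

    Hypothetical sizing rule: each handler process is assumed to hold about
    per_handler_mb of resident memory, and reserved_mb is set aside for the
    OS, page cache and other daemons.
    """
    if per_handler_mb <= 0:
        raise ValueError("per_handler_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_handler_mb


def busy_handlers(requests_per_sec: float, seconds_per_request: float) -> float:
    """Little's law: average requests in flight = arrival rate x time per request."""
    return requests_per_sec * seconds_per_request


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    ceiling = max_handlers(total_ram_mb=4096, reserved_mb=1024, per_handler_mb=30)
    print(ceiling)  # 102
    # If the configured handler limit were above this ceiling, slow pages
    # (say 4 s each) at 50 req/s would keep about 200 handlers busy, more
    # than RAM can hold, so the server would start swapping. Capping handlers
    # at or below the ceiling makes excess requests wait instead.
    print(busy_handlers(50, 4.0))  # 200.0
```

Under these assumptions, query timeouts shorten `seconds_per_request`, a lower handler limit keeps concurrency under the memory ceiling, and more RAM raises the ceiling itself.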
Monday, Sept. 10: eclipsecon.org exhausted its available RAM and swapped itself into oblivion. We've increased available RAM on the server and will enable a second load-balanced server for increased capacity and fault tolerance.
Tuesday, Sept. 11: www, download and a few other services became unresponsive when the NFS server hosting the downloads/archives crashed. Many of our sites, including www.eclipse.org, access the downloads area to query file names and file sizes for the download pages. The crash was caused by the same Linux kernel fault that caused an outage on Feb. 15. We'll need to investigate further to determine what can be done to resolve this issue since, last I checked, the kernel bug was still open.
I apologize for the disruptions these outages have caused; we'll work on improving the stability of these critical services.