Webmaster update

Greetings committers,

A quick infrastructure update for you.

1. Server outages: problem solved?
2. Europa is coming
3. Project vservers
4. Mailing List and newsgroup archives
5. Bugzilla: neat tip

1. Server outages: problem solved?

Since March 2006 we had been having infrequent and short, yet high-impact, outages caused by our back-end file server crashing for no known reason. After lots of digging, I stumbled onto something and applied a fix yesterday. I'm now confident that our back-end servers are rock-solid.

As usual, the technical details are at the end of this e-mail, for those who are inclined to read this kind of stuff.

2. Europa is coming

As Europa approaches, our server infrastructure is being stressed by higher-than-usual Bugzilla and CVS usage. However, our servers are handling this added load quite well.

I'm also pleased to inform you that we'll be getting a Gigabit connection to the Internet to support the Europa download traffic.

The Europa release is being coordinated on the cross-project-issues-dev mailing list [1]. As we approach the release date, we will be freezing the contents of the downloads area to give our mirror sites exclusive access to new bits. Stay tuned on the cross-project list for content freeze notices.

3. Project vservers

Just a reminder that project vserver owners and maintainers must subscribe to the cross-project list [1], as I use it to communicate vserver maintenance issues.

4. Mailing List and newsgroup archives

We finally closed bug 98404 when Matt implemented monthly and yearly indexes for mailing lists and newsgroups. This should prove useful when trying to find specific discussion threads.

5. Bugzilla: neat tip

If you need to find attachments that were submitted by someone specific, you can use the Advanced Search and pick the Attachment Data / Changed By / [email address of attacher] criteria. A few people have asked about this in the past, and this is a good workaround for a feature that does not exist. Props to the submitter of this comment:
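
For those who would rather bookmark a link than click through the search form, the same criteria can probably be expressed as a query URL. The following is only a sketch: it assumes Bugzilla's boolean-chart query parameters (field0-0-0, type0-0-0, value0-0-0), the internal field name attach_data.thedata for "Attachment Data" and the operator name changedby for "Changed By". The helper name and the base URL are made up for illustration, so check the result against the URL your own Advanced Search produces.

```python
from urllib.parse import urlencode


def attachment_search_url(base_url, attacher_email):
    """Build a Bugzilla search URL for attachments submitted by one person.

    Assumption: the installation accepts boolean-chart parameters and uses
    'attach_data.thedata' / 'changedby' for Attachment Data / Changed By.
    """
    params = {
        "query_format": "advanced",
        "field0-0-0": "attach_data.thedata",  # assumed name of "Attachment Data"
        "type0-0-0": "changedby",             # assumed name of "Changed By"
        "value0-0-0": attacher_email,         # email address of the attacher
    }
    # urlencode percent-escapes the '@' in the email address.
    return base_url.rstrip("/") + "/buglist.cgi?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical Bugzilla location and committer address.
    print(attachment_search_url("https://bugs.example.org/", "someone@example.org"))
```

If the parameter names differ on your Bugzilla version, run the search once by hand and copy the names from the resulting URL.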

As usual, thanks for reading, and have a great summer (or winter for those in the southern hemisphere).


Technical details of the server outages:

Two SCSI disk drives in our 14-disk array were older-model disks with a different level of microcode (this was not obvious to find). A code update exists for these disks that fixes a problem that is very likely the cause of our crashes.

Essentially, the older (broken) disk code would put the disk into a "read/write protect" state if the disk failed a self-test due to a time-out, whereas the newer code does not. With one (or two) disks in this state, it seems that the entire disk array would simply wait for them to exit that state. Most often the unit recovered, but occasionally the volume of incoming requests simply crushed the server, which could not service them with an unresponsive disk system.

As neither Matt, Karl nor I had ever updated the microcode on a disk drive before, we had to weigh the risks of attempting to perform such an action on a live, busy and mission-critical system.

Props to Novell's tech support team, who worked hard with IBM to try (in vain) to reproduce this.


Eclipse WebMaster - webmaster@xxxxxxxxxxx
Questions? Consult the WebMaster FAQ at
View my status at
