We are having a very frustrating problem on my project whereby folders are becoming locked on our OpenSUSE Linux server and the only way of releasing the lock is a restart (killing processes accessing the directory does nothing to help!). Symptoms of a directory being locked are that you can 'cd' into the directory in Putty but an 'ls' hangs and never returns; WinSCP hangs if you try and go into the locked folder; and any other processes trying to access a file in the folder or a sub-folder also hang.
Although not certain, we suspect it is something to do with Eclipse or one of the plugins we are using as the folders that become locked are invariably something Eclipse has been accessing. Has anyone seen or heard of this happening before? Or can guess why this might be happening? Is this likely to be something Eclipse could cause or more likely to be an issue with the OS?
We are running Eclipse Helios SR2 and the main plugins we are using are the Maven, Mercurial and Atlassian JIRA connector plugins.
Any help or ideas will be greatly appreciated!! Let me know if you need more information about any of the software/hardware too.
What is the relationship between your development workstation and the Linux server?
Are you running Eclipse on a Windows workstation and accessing a workspace on the Linux server? Are you attempting to share a workspace? I'm guessing that there are a lot of moving parts between Eclipse and your Linux server.
I don't think Eclipse attempts any directory locking. AFAIK, Linux doesn't support any notion of directory locking.
I just did a quick Google search on possible causes of 'ls' hanging. A misconfigured or corrupt NFS mount seems to be a pretty common cause.
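If it helps to rule the NFS theory in or out, here is a rough Python sketch that scans the text of /proc/mounts for network-backed filesystems. The set of filesystem types is my own assumption about what counts as "network-backed" and will probably need extending for your setup.

```python
# Assumed list of filesystem types that are backed by a network share.
# This is illustrative, not exhaustive; adjust it for your environment.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"}


def network_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs for network-backed mounts.

    mounts_text is expected to be in the /proc/mounts layout:
    device, mount point, filesystem type, options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in NETWORK_FS_TYPES:
            found.append((mount_point, fs_type))
    return found
```

On the server you would pass it the contents of /proc/mounts, e.g. `network_mounts(open("/proc/mounts").read())`. An empty list would suggest that no mount of the listed types is involved.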
Thanks for the response - we are still struggling to resolve this issue over a month after my original post!
The relationship between the development workstation and the Linux server is as follows. We run Eclipse on the Linux server and then use X11 forwarding to open up a GUI on our development Windows workstation. We then submit pieces of Java code to run on the Linux server through Eclipse. All workspaces/libraries we map to are on the Linux server and Maven, Mercurial and JIRA are all also on the Linux server, so there are a lot of connections going through this X11 GUI.
We have found Eclipse to be fairly unstable running through X11 and we regularly have it hang/crash. When this happens, it will often leave zombie Java processes running on the Linux server. We invariably have a number of zombie processes present when the directory locking occurs (our load average also goes through the roof!).
We do not use any NFS mounts, although as you say this does seem to be the main reason for this kind of behaviour. I also know that Eclipse does not attempt any directory locking, but this is not a normal OS lock but something causing commands to hang.
In case there are any SUSE/Linux whizzes reading this - if I run 'strace ls' on one of these locked directories it always hangs at a call of getdents64(...
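For anyone who wants to check a directory without tying up their own shell, below is a minimal Python sketch that runs the listing in a child process and gives up after a timeout. The function names and the five second default are my own choices, and it assumes a GNU-style `ls` is on the PATH.

```python
import subprocess


def probe(argv, timeout_s=5.0):
    """Run argv and report 'ok', 'error' or 'hung' without blocking forever."""
    child = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return_code = child.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a kill signal does not take effect while the
        # child is blocked inside the kernel, so we deliberately do not
        # wait for the child again after sending it.
        child.kill()
        return "hung"
    return "ok" if return_code == 0 else "error"


def probe_listing(path, timeout_s=5.0):
    """Check whether listing a directory returns within the timeout."""
    # -f lists entries unsorted, so the probe does little beyond reading them.
    return probe(["ls", "-f", "--", path], timeout_s)
```

Calling `probe_listing(...)` on a healthy directory should come back as "ok" almost immediately, while a "hung" result points at the same kind of stall that `strace ls` shows. Note that a probe that hangs may itself be left behind as a stuck process.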
I've only ever run Eclipse using X11 forwarding (with various versions of Fedora on the backend) a couple of times myself and it's worked brilliantly. Except for performance: the performance was terrible.
Is it possible to test with X11 forwarding out of the equation? I.e. is it possible to run Eclipse directly on the Linux server? Can we at least get Windows out of the equation? I.e. does it work better if you ssh -Y from a Linux client? (In my own limited experience, I've used the -X option successfully.)
The configuration you're using seems overly complicated. Is there some reason why developers can't just run Eclipse on their workstations? Using a DVCS like Mercurial makes this sort of configuration work pretty well.
FWIW, the X11 forwarding just sends graphics commands through the connection. The connections that Eclipse makes from your server to Mercurial, JIRA, etc., are separate from the X11 channel and very likely do not go to the developer workstation.
Denis Roy
I am willing to bet that the processes left behind by one of these "crashes" are in the "D" state, or uninterruptible sleep. Something is causing Eclipse (and/or Linux) to lose a connection (likely with X11) and since the directory is open and waiting for I/O, any other process trying to access the directory must also wait for the Eclipse/java process to complete.
You can't kill processes in a 'D' state -- they must regain control of their I/O and in this case, that may not be possible.
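To see whether anything really is sitting in the 'D' state, a small Python sketch along these lines could scan /proc. It assumes a Linux /proc filesystem where each /proc/<pid>/stat line starts with the PID, the command name in parentheses, and then the one-letter state; the function names are just illustrative.

```python
import os


def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so locate the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    command = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, command, state


def processes_in_state(wanted="D", proc_root="/proc"):
    """Return sorted (pid, command) pairs for processes in the given state."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == wanted:
            matches.append((pid, command))
    return sorted(matches)
```

Running `processes_in_state("D")` a few times in a row while a directory is stuck should show whether the same PIDs stay blocked. A process that appears only briefly in 'D' is normal disk I/O.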
X11 from a Linux server to a Linux client is very reliable. Which X client are you using on Windows?
An alternative to this setup would be to place the workspace on a shared directory (shared on Linux via NFS or Samba) and use a local console to execute it.
Wayne - we are certainly going to try to take X11 out of the equation. This is going to involve running Eclipse on our Windows workstations and then remotely submitting code to the Linux server through an http connection (this is something my company's Eclipse plugin allows). Unfortunately, we cannot just run everything on the workstations, as they are simply not powerful enough for the kind of processing we need to perform, so we have a high-spec central server that we all run on. Additionally, we are dealing with very high data volumes which are all stored on the central server and could not be transferred to the Windows boxes.
That is some good information about X11 too...
Denis - the processes are listed in top with a state of Z I'm afraid. Is this the status you are talking about? Or is there some other status I need to look at? As far as I can tell from my investigation, there is nothing we can do to get rid of these Z processes, and so if other processes have to wait for them to complete to access the same directory, then I would imagine this is definitely our issue!
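For reference, 'Z' and 'D' are different states: a zombie has already exited and only keeps its process-table entry until its parent collects the exit status, whereas 'D' marks a process blocked inside the kernel. A rough Python sketch like the one below could show both in one pass by parsing `ps` output. It assumes a procps-style `ps` that accepts `-eo pid=,ppid=,stat=,comm=`, and the helper names are made up for illustration.

```python
import subprocess


def parse_ps(ps_text):
    """Parse header-less 'pid ppid stat comm' lines into dictionaries."""
    rows = []
    for line in ps_text.splitlines():
        fields = line.split(None, 3)  # the command may contain spaces
        if len(fields) < 4:
            continue
        pid, ppid, stat, command = fields
        rows.append({"pid": int(pid), "ppid": int(ppid),
                     "stat": stat, "command": command.strip()})
    return rows


def zombies_by_parent(rows):
    """Map each parent PID to the PIDs of its zombie ('Z') children."""
    grouped = {}
    for row in rows:
        if row["stat"].startswith("Z"):
            grouped.setdefault(row["ppid"], []).append(row["pid"])
    return grouped


def blocked(rows):
    """Return the rows whose state starts with 'D' (uninterruptible sleep)."""
    return [row for row in rows if row["stat"].startswith("D")]


def snapshot():
    """Take a live process snapshot (assumes a procps-style ps)."""
    result = subprocess.run(["ps", "-eo", "pid=,ppid=,stat=,comm="],
                            capture_output=True, text=True, check=True)
    return parse_ps(result.stdout)
```

With `rows = snapshot()`, `zombies_by_parent(rows)` shows which parent processes have not collected their exited children, and `blocked(rows)` lists anything in 'D' at that moment, which may be the more interesting list when a directory is stuck.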
We are using a client called Xming on the Windows servers. Do you know it? Is there a more reliable one out there? Certainly, almost every time when I go to close it and I have already closed down all of my GUIs, it still tells me there are clients connected.
Do you get the same problem when you access the server via a Linux workstation?
Can you run Eclipse directly on the Linux server box?
Are you attempting to share workspaces, or does each developer get their own?
Why not have the server pull from Mercurial? I'm not familiar with Mercurial, but I assume that it has some notion of branches like Git does. Developers can commit from their workstations into their own branches and then pull the desired branch onto the server and test.