Eclipse Community Forums
Jobs "vanished" after a server crash [message #899329] Tue, 31 July 2012 09:32
Aurelien Pupier
Messages: 558
Registered: July 2009
Location: Grenoble, FRANCE
Senior Member

Hi,

Context:
Hudson 2.2.0
as a service on Linux
Linux installed as a Virtual Machine (Virtualbox)

Issue:
- The VM was unresponsive
- I closed the VM using the "power off" option
- I restarted the VM (and therefore Hudson)
- Some jobs had vanished

In the logs there were two kinds of errors:
- fingerprint: data corrupted
- nextBuildNumber file content is not correct: NumberFormatException

I don't know the exact cause of the crash. I suppose that the jobs that vanished after the restart were the ones running when the VM was powered off.

Some of those jobs were running on a slave and others on the master, so the problem does not seem to be related to the master/slave setup.

How I "solved" my issue: I deleted the fingerprints folder and wrote a number into each affected nextBuildNumber file.
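
For anyone who wants to script the nextBuildNumber part of this workaround, here is a minimal sketch. It assumes each job keeps its counter in `<jobs dir>/<job name>/nextBuildNumber`, as in the `/var/lib/hudson/jobs/...` path that Hudson logs. The function names `find_broken_counters` and `repair_counter` are hypothetical helpers of mine, not part of Hudson, and the new number has to be chosen by hand.

```python
from pathlib import Path


def find_broken_counters(jobs_dir):
    """Return the nextBuildNumber files under jobs_dir that do not hold a plain integer."""
    broken = []
    for counter in sorted(Path(jobs_dir).glob("*/nextBuildNumber")):
        text = counter.read_text(encoding="utf-8", errors="replace").strip()
        # An empty file, or anything other than ASCII digits, counts as broken.
        if not (text.isascii() and text.isdigit()):
            broken.append(counter)
    return broken


def repair_counter(counter, next_number):
    """Write an administrator-chosen build number into a broken counter file."""
    Path(counter).write_text(f"{int(next_number)}\n", encoding="utf-8")
```

Something like `for p in find_broken_counters("/var/lib/hudson/jobs"): print(p)` lists the affected jobs. It is probably safest to stop Hudson first and to pick a number higher than the last build of each job before calling `repair_counter`.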

What is the best way to avoid this issue, or to recover from it? (I mean, ideas other than not crashing the VM :s)

Propositions:
- Have a utility action in Manage Hudson to clean the fingerprints folder on demand
- List jobs whose nextBuildNumber file is empty on a special page that flags them as misconfigured and allows setting a new number in nextBuildNumber
- Back up the nextBuildNumber file? (Currently I back up only the config.xml file.) But would that require a commit on each build?
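
To make the backup idea concrete, here is a rough sketch of a periodic copy that would not need a commit on each build. It assumes the usual `<jobs dir>/<job name>/` layout. `backup_job_files` and the backup directory are hypothetical names of mine, not an existing Hudson feature.

```python
import shutil
from pathlib import Path


def backup_job_files(jobs_dir, backup_dir, names=("config.xml", "nextBuildNumber")):
    """Copy the listed per-job files into backup_dir/<job name>/ and return the copies."""
    copied = []
    for job in sorted(p for p in Path(jobs_dir).iterdir() if p.is_dir()):
        for name in names:
            source = job / name
            if not source.is_file():
                continue
            # Skip a counter that is already empty so a good backup is not overwritten.
            if name == "nextBuildNumber" and not source.read_text().strip():
                continue
            target = Path(backup_dir) / job.name / name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            copied.append(target)
    return copied
```

Run from cron, the saved number can lag behind the real one, so after a restore it may still need to be raised above the number of the last build.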

Is it possible? Do you have other ideas?

Regards,


Aurélien Pupier - BonitaSoft S.A.
Re: Jobs "vanished" after a server crash [message #899404 is a reply to message #899329] Tue, 31 July 2012 13:17
Steve Christou
Messages: 125
Registered: June 2012
Location: Milwaukee, Wisconsin
Senior Member

Aurelien Pupier wrote on Tue, 31 July 2012 08:32

In the logs there were two kinds of errors:
- fingerprint: data corrupted
- nextBuildNumber file content is not correct: NumberFormatException

Could you send the fingerprint and NumberFormatException stack traces? It's odd that you are getting a NumberFormatException for that file. Is the file empty when you run the next build?

Quote:
- Some jobs vanished

I believe this is normal behavior: if you shut down Hudson before the job has had a chance to finish running, Hudson will not publish the job results, because the data may be corrupt.
Quote:
- Have a utility action in Manage Hudson to clean the fingerprints folder on demand

I could see that being a good feature. Please log an enhancement request in Bugzilla under the Hudson component.

Quote:
What is the best way to avoid this issue, or to recover from it? (I mean, ideas other than not crashing the VM :s)

This could possibly be a bug, since Hudson should automatically update nextBuildNumber to the next number, even if the JVM crashes.
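
As an illustration of what a crash-tolerant update of a small file like this could look like (not a description of what Hudson actually does), here is a minimal sketch of the common "write a temporary file, then rename" pattern. `write_atomically` is a hypothetical helper name.

```python
import os
import tempfile


def write_atomically(path, text):
    """Replace path with text so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # atomic replacement on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern, a power-off in the middle of the write should leave the previous number in place rather than an empty file, although the exact guarantees depend on the filesystem.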


/**
 * @author Steven Christou
 * @dev    Hudson-ci
 */
Re: Jobs "vanished" after a server crash [message #899754 is a reply to message #899404] Thu, 02 August 2012 04:40
Aurelien Pupier
Messages: 558
Registered: July 2009
Location: Grenoble, FRANCE
Senior Member

Steve Christou wrote on Tue, 31 July 2012 19:17
Aurelien Pupier wrote on Tue, 31 July 2012 08:32

In the logs there were two kinds of errors:
- fingerprint: data corrupted
- nextBuildNumber file content is not correct: NumberFormatException

Could you send the fingerprint and NumberFormatException stack traces? It's odd that you are getting a NumberFormatException for that file. Is the file empty when you run the next build?


I can't send the fingerprint; I already deleted it.
The nextBuildNumber file is empty. The affected jobs no longer appear in Hudson, so I can't run them anymore.

Here is the stack trace:
SEVERE: Failed Loading job XXX-JOBNAME-XXX
hudson.util.IOException2: /var/lib/hudson/jobs/XXX-JOBNAME-XXX/nextBuildNumber doesn't contain a number
    at hudson.model.Job.onLoad(Job.java:369)
    at hudson.model.AbstractProject.onLoad(AbstractProject.java:342)
    at hudson.model.BaseBuildableProject.onLoad(BaseBuildableProject.java:102)
    at hudson.model.Items.load(Items.java:117)
    at hudson.model.Hudson$13.run(Hudson.java:2368)
    at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:146)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:259)
    at hudson.model.Hudson$4.runTask(Hudson.java:698)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:187)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:94)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:493)
    at java.lang.Integer.parseInt(Integer.java:514)
    at hudson.model.Job.onLoad(Job.java:366)
    ... 12 more

I can't find the stack traces for the fingerprint corruption errors.

Steve Christou wrote on Tue, 31 July 2012 19:17
Aurelien Pupier wrote on Tue, 31 July 2012 08:32

Quote:
- Some jobs vanished

I believe this is normal behavior: if you shut down Hudson before the job has had a chance to finish running, Hudson will not publish the job results, because the data may be corrupt.


I'm OK with losing the job results. What I mean is that the job itself is no longer in the Hudson job list, not just build #x of that job.


Steve Christou wrote on Tue, 31 July 2012 19:17
Aurelien Pupier wrote on Tue, 31 July 2012 08:32

Quote:
- Have a utility action in Manage Hudson to clean the fingerprints folder on demand

I could see that being a good feature. Please log an enhancement request in Bugzilla under the Hudson component.


Enhancement request opened: https://bugs.eclipse.org/bugs/show_bug.cgi?id=386468

Steve Christou wrote on Tue, 31 July 2012 19:17
Aurelien Pupier wrote on Tue, 31 July 2012 08:32

Quote:
What is the best way to avoid this issue, or to recover from it? (I mean, ideas other than not crashing the VM :s)

This could possibly be a bug, since Hudson should automatically update nextBuildNumber to the next number, even if the JVM crashes.



I opened another enhancement request about recovering from this issue:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=386471

I also wrote a blog post about the issue and how to work around it for now: http://www.bonitasoft.org/blog/eclipse/hudson-jobs-missing-after-a-crash-restore-them-from-the-ashes


Aurélien Pupier - BonitaSoft S.A.