Re: [geclipse-dev] A question to workflows
On Wed, 30 Apr 2008, Ariel Garcia wrote:
Hi,
I have a very general question about workflows. I submitted a workflow
consisting of three jobs: A, B and C. B depends on A and C depends on B.
In my case job A failed (the famous hit job shallow retry count). Does
it then make sense to proceed with jobs B and C? I do not believe this
is under our control, right?
A->B->C
In that case job B should not even be able to start, because it depends on
A, so it has to wait for it to finish.
It should be specified in the workflow description. Maybe A is not critical
and B and C can run.
E.g. A prepares precomputed data (a kind of data cache retrieval).
B is the computation; it can benefit from the precomputed data if it
exists, or do the precomputing itself.
In this case, even if A fails for any reason, B can still be started; it
will just run longer.
Marking A as not critical tells the middleware that it may still run B.
This is how Gridge GJD works:
http://www.gridge.org/files/grms/doc/user/html_one/view/GrmsUserGuide.html#d0e847
Look for the "crucial" attribute of <task> (or Example 4.19).
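The "not critical" idea can be sketched in a few lines of Python. This is
only an illustration of the rule described above, not the GRMS schema or
any middleware API: the Task class, the "critical" field, the can_start
function and the status strings are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Hypothetical flag, modelled loosely on the "crucial" idea.
    critical: bool = True
    depends_on: list = field(default_factory=list)


def can_start(task, tasks, status):
    """Return True if each parent is done, or has failed but is not critical."""
    for parent_name in task.depends_on:
        parent = tasks[parent_name]
        state = status[parent_name]
        if state == "done":
            continue
        if state == "failed" and not parent.critical:
            continue
        return False
    return True


# The A->B->C chain from this thread, with A marked as not critical.
tasks = {
    "A": Task("A", critical=False),
    "B": Task("B", depends_on=["A"]),
    "C": Task("C", depends_on=["B"]),
}
status = {"A": "failed", "B": "pending", "C": "pending"}
```

With A marked as not critical, can_start(tasks["B"], tasks, status) is
True although A failed, while C still has to wait for B.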
A different case would be
A-> B <-C
where B depends on the two independent jobs/inputs A,C
If A fails, of course C (running in parallel) could be canceled.
No! C should never be cancelled unless the user clearly specifies so. After
a successful run of C, its results can be stored somewhere, so that partial
results of the workflow are available to the user. And maybe the whole
workflow can be restarted later with C already done.
Whether that happens or not probably depends on the workflow engine
implementation.
My guess would be that gLite cancels the job, but it is just a guess. Just
give it a try ;-)
My guess is that gLite does not care about the subjobs' state and will run
the whole workflow. :) And task B will fail because of missing input data.
In any case, running B or not is not a problem for gEclipse. It is
strictly a middleware decision. We should just have the full workflow
description, possibly with some indications for the middleware on how it
should deal with such cases. A "critical" or "crucial" parameter could be
a good start. Another parameter could say how the middleware should
continue the workflow job when a critical task fails: "cancel_all_tasks"
or "allow_task_finish".
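
To make the two proposed values concrete, here is a small Python sketch of
one possible interpretation. It is an assumption, not existing gLite or
gEclipse behaviour: apply_failure_policy and the status strings are
hypothetical, and I assume that pending tasks are never started once a
critical task has failed, while "allow_task_finish" lets tasks that are
already running complete.

```python
def apply_failure_policy(status, policy):
    """Return a new status map after a critical task has failed.

    Assumed semantics (hypothetical, for illustration only):
      cancel_all_tasks   - cancel both pending and running tasks
      allow_task_finish  - cancel pending tasks, let running ones finish
    """
    if policy not in ("cancel_all_tasks", "allow_task_finish"):
        raise ValueError("unknown policy: " + policy)
    result = {}
    for name, state in status.items():
        if state == "pending":
            result[name] = "cancelled"
        elif state == "running" and policy == "cancel_all_tasks":
            result[name] = "cancelled"
        else:
            # Finished and failed tasks keep their state.
            result[name] = state
    return result


# The A-> B <-C case: A has failed while C is still running.
before = {"A": "failed", "B": "pending", "C": "running"}
cancel_all = apply_failure_policy(before, "cancel_all_tasks")
allow_finish = apply_failure_policy(before, "allow_task_finish")
```

Under "allow_task_finish" C stays in the running state, so its results
can still be stored; under "cancel_all_tasks" C is cancelled together
with B.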
Regards,
Pawel