Re: [geclipse-dev] A question to workflows




On Wed, 30 Apr 2008, Ariel Garcia wrote:

Hi,

I have a very general question about workflows. I submitted a workflow
consisting of three jobs: A, B and C. B depends on A and C depends on B.
In my case job A failed (the famous "hit job shallow retry count"). Does
it then make sense to proceed with jobs B and C? I do not believe this
is under our control, right?

A->B->C

In that case job B should not even be able to start, because it depends on
A, so it has to wait for it to finish.

That should be specified in the workflow description. Maybe A is not critical, and B and C can still run.

For example, A prepares precomputed data (a kind of data cache retrieval).
B does the computing; it can benefit from the precomputed data if it exists, or do the precomputing itself. In this case, even if A fails for any reason, B can be started. It will just run longer.

Marking A as not critical tells the middleware that it can still run B.

This is how Gridge GJD works:
http://www.gridge.org/files/grms/doc/user/html_one/view/GrmsUserGuide.html#d0e847

Look for the "crucial" attribute of <task> (or Example 4.19).
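
As a minimal sketch of the idea (this is not GRMS or gLite code; the Task class, its crucial field and the can_start function are hypothetical names used only for illustration), the decision whether a dependent task may start could look roughly like this in Python:

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Hypothetical flag, modelled loosely on the "crucial" idea:
    # if False, a failure of this task does not block its dependents.
    crucial: bool = True
    depends_on: list = field(default_factory=list)


def can_start(task, states):
    """Return True if no dependency blocks `task`.

    `states` maps a task name to "pending", "running", "done" or "failed".
    A dependency blocks the task unless it is done, or it failed and is
    marked as not crucial.
    """
    for dep in task.depends_on:
        state = states[dep.name]
        if state == "done":
            continue
        if state == "failed" and not dep.crucial:
            continue
        return False
    return True


# A -> B -> C, with A marked as not crucial
a = Task("A", crucial=False)
b = Task("B", depends_on=[a])
c = Task("C", depends_on=[b])
```

With A failed, can_start(b, {"A": "failed"}) allows B to run, while C still has to wait for B.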

A different case would be
 A-> B <-C
where B depends on the two independent jobs/inputs A,C
If A fails, of course C (running in parallel) could be canceled.

No! C should never be cancelled unless the user clearly specifies so. After a successful run of C, its results can be stored somewhere, and the partial results of the workflow are then available to the user. And maybe the whole workflow can be restarted later, with C already done.
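
A rough sketch of such a restart, assuming the engine remembers which tasks already finished successfully (the tasks_to_rerun function and the shape of the deps mapping are illustrative assumptions, not any middleware's API):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def tasks_to_rerun(deps, finished):
    """Return the unfinished tasks in an order that respects dependencies.

    `deps` maps each task name to the set of tasks it depends on.
    `finished` is the set of tasks whose results were stored earlier.
    """
    order = TopologicalSorter(deps).static_order()
    return [name for name in order if name not in finished]


# A -> B <- C: B needs the results of both A and C
deps = {"A": set(), "C": set(), "B": {"A", "C"}}

# A failed on the first attempt, C succeeded and its results were kept
print(tasks_to_rerun(deps, finished={"C"}))  # ['A', 'B']
```

Only A and B are submitted again; the stored result of C is reused.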

Whether that happens or not probably depends on the workflow engine
implementation.
My guess would be that gLite cancels the job, but it is just a guess. Just
give it a try ;-)

My guess is that gLite does not care about the state of subjobs and will run the whole workflow. :) And task B will fail because its input data is missing.


In any case, running B or not is not a problem for g-Eclipse. It is strictly a middleware decision. We should just have a full workflow description, possibly with some indications for the middleware on how it should deal with such cases. A "critical" or "crucial" parameter could be a good start. Another parameter could say how the middleware should continue the workflow job when a critical task fails: "cancel_all_tasks" or "allow_task_finish".
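
To make the two proposed values concrete, here is a small Python sketch. The function name, the state names and the exact meaning given to each policy are assumptions for illustration only: "cancel_all_tasks" is read as cancelling everything that has not finished, "allow_task_finish" as letting running tasks finish while not starting pending ones.

```python
def tasks_to_cancel(failed, crucial, states, policy="allow_task_finish"):
    """Return the names of tasks to cancel after `failed` has failed.

    `crucial` maps task names to bool (tasks default to crucial).
    `states` maps task names to "pending", "running", "done" or "failed".
    Both policy interpretations are assumptions, see the text above.
    """
    if not crucial.get(failed, True):
        # Failure of a non-crucial task does not affect the others.
        return set()
    if policy == "cancel_all_tasks":
        return {n for n, s in states.items()
                if n != failed and s in ("pending", "running")}
    if policy == "allow_task_finish":
        return {n for n, s in states.items()
                if n != failed and s == "pending"}
    raise ValueError(f"unknown policy: {policy}")


# A -> B <- C: A has failed, C is still running, B has not started
states = {"A": "failed", "B": "pending", "C": "running"}
```

Under "allow_task_finish" only B is cancelled here and C keeps running, so its results stay available.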


Regards,

Pawel

