Hello everyone,
we implemented a crawler that walks through an Atlassian Confluence instance. Whenever we ran into an error, we threw a RecoverableException, because those errors seemed to be only temporary; most of them were time-outs that occurred when the Confluence server could not answer quickly enough.
Using this method we found some strange behavior that might not be intended:
First of all: when the maximum number of retries is reached, the whole job is terminated (not just the affected record), leaving the Solr index in an inconsistent state. (Some data is indexed and some is not, and when using the delta worker, data from the previous run was deleted that was not meant to be deleted!)
We circumvented this problem by always catching the error and dropping the record immediately. But this can’t be the final solution: sometimes it is just a time-out, and a retry would be much appreciated!
We also noticed that after the job was marked FAILED, some errors appeared in the log:
2014-08-05 09:53:58,040 WARN [pool-4-thread-7 ] taskworker.DefaultTaskLogFactory - Task df1acc5f-940b-49fc-8d0b-67f0c4ad0561: Task 'df1acc5f-940b-49fc-8d0b-67f0c4ad0561' for job 'crawlConfluence' and run '20140805-095231322337' is unknown, maybe already finished or workflow run was canceled.
org.eclipse.smila.jobmanager.exceptions.IllegalJobStateException: Task 'df1acc5f-940b-49fc-8d0b-67f0c4ad0561' for job 'crawlConfluence' and run '20140805-095231322337' is unknown, maybe already finished or workflow run was canceled.
After some searching we found that these errors may be caused by other workers that are still running after the job has already failed. So the log entry is their way of saying: “we noticed that the job has failed”. Is that correct?
But the really troubling errors were these:
2014-08-05 09:53:51,657 ERROR [pool-4-thread-1 ] taskworker.DefaultTaskLogFactory - Task 2910cefa-4a02-48ff-b4b3-c2666f0b854d: Error while executing task 2910cefa-4a02-48ff-b4b3-c2666f0b854d in worker com.eccenca.importing.confluence.worker.ConfluenceObjectFetcherWorker@6481c861: Object with id 'pageBucket/257543c8-b090-4f34-848a-2e63b0863b1c0' does not exist in store 'temp'.
org.eclipse.smila.objectstore.NoSuchObjectException: Object with id 'pageBucket/257543c8-b090-4f34-848a-2e63b0863b1c0' does not exist in store 'temp'.
All of a sudden some records were missing, leaving the objectstore in an inconsistent state. And when we restarted the job, those errors occurred again. So it seems some cleanup is missing.
Our questions: Is there a way (or is something planned) to keep those RecoverableExceptions from causing such a brutal failure? Something like: “Drop after n retries”. And is the last problem described, concerning the objectstore, perhaps a bug?
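To make the “Drop after n retries” idea concrete, here is a minimal sketch of the behavior we have in mind, written as plain Java. It deliberately uses no SMILA API: `RetryOrDrop`, `TransientException`, `retryOrDrop`, `maxAttempts` and `waitMillis` are hypothetical names chosen for illustration, and `TransientException` only stands in for whatever signals a temporary error such as a time-out.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

public class RetryOrDrop {

    /** Hypothetical stand-in for a temporary error such as a time-out. */
    static class TransientException extends Exception {
        TransientException(String message) {
            super(message);
        }
    }

    /**
     * Calls fetch up to maxAttempts times. Returns the result of the first
     * successful attempt, or Optional.empty() if every attempt failed with a
     * TransientException, so the caller can drop only this record and go on.
     * Any other exception is passed on unchanged. Assumes fetch never
     * returns null.
     */
    static <T> Optional<T> retryOrDrop(Callable<T> fetch, int maxAttempts, long waitMillis)
            throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.of(fetch.call());
            } catch (TransientException e) {
                // Wait a little before the next attempt, but not after the last one.
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated fetch: times out twice, then succeeds on the third call.
        Optional<String> page = retryOrDrop(() -> {
            if (++calls[0] < 3) {
                throw new TransientException("time-out");
            }
            return "page content";
        }, 5, 10L);
        System.out.println(page.orElse("dropped"));
    }
}
```

With something like this inside the worker, an empty result would mean “skip this record”, while the job itself keeps running. Whether the framework could offer the same semantics for RecoverableException is exactly what we are asking.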
Thank you!
Kind Regards
Daniel
Daniel Hänßgen
phone +49 511 33652866
dhaenssgen@xxxxxxx
Postal address:
brox IT-Solutions GmbH | An der Breiten Wiese 9 | 30625 Hannover | Germany
Board of Directors: Hans-Chr. Brockmann
Domicile and Court of Registry: Hannover
Commercial Register No.: 59240
VAT registration No.: DE 199 515 978