| Serialized Java objects remains on disk after deserializing [message #655711] |
Tue, 22 February 2011 07:05  |
SMILANewBee Messages: 42 Registered: August 2010 |
Member |
|
|
The web crawler consists of two threads. The first one downloads the content and post-processes it. The result is written to disk as a serialized Java object in the directory workspace/.metadata/.plugins/org.eclipse.smila.connectivity.framework.crawler.web.
The second thread reads the serialized objects and processes them further to add them to the blackboard. After the second thread has deserialized an object, it doesn't remove the file, so the directory swells with unnecessary data.
To prevent this swelling, one line can be added to the deserializeIndexDocument method of the WebCrawler class. After the statement
IOUtils.closeQuietly(objstream);
a simple
can be written to delete the file after deserializing it.
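A minimal sketch of the idea, for illustration only. The class and method names (DeserializeAndDelete, readAndDelete) are hypothetical, and the try-with-resources block stands in for the IOUtils.closeQuietly call; this is not the actual WebCrawler code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

// Hypothetical helper, not part of SMILA.
class DeserializeAndDelete {

    /** Reads one serialized object from the file, then deletes the file. */
    static Object readAndDelete(File file) throws IOException, ClassNotFoundException {
        Object result;
        // The stream is closed at the end of this block, before the delete,
        // so the file is no longer held open when it is removed.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            result = in.readObject();
        }
        // File.delete() returns false instead of throwing when it fails.
        if (!file.delete()) {
            System.err.println("Could not delete " + file);
        }
        return result;
    }
}
```

The file is only deleted after the object has been read successfully; if deserialization throws, the file stays on disk.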
|
|
|
|
| Re: Serialized Java objects remains on disk after deserializing [message #656247 is a reply to message #655897] |
Thu, 24 February 2011 11:44   |
|
Originally posted by: juergen.schumacher.attensity.com
On 23.02.2011, 09:22, Andrej Rosenheinrich <andrej.rosenheinrich@unister-gmbh.de> wrote:
> Can someone confirm if those thoughts are right? Or would it cause
> side effects to delete those files?
Sorry, I'm not very familiar with the Web Crawler, so I'm not sure. From reading the code, I could imagine situations where a URL is visited twice during crawling and the second visit happens before the first visit is completely processed, so the file would not be recreated and the processing of the first visit would delete it before the second visit can read it.
Or something like this. On the other hand, this sounds like dubious behaviour anyway (:
Daniel, Tom, do you know anything about this?
Regards,
Jürgen
|
|
|
|
|
| Re: Serialized Java objects remains on disk after deserializing [message #658164 is a reply to message #657795] |
Mon, 07 March 2011 03:55   |
|
Originally posted by: juergen.schumacher.attensity.com
Hi,
On 04.03.2011, 06:45, SMILANewBee <nils.thieme@unister.de> wrote:
> Hello,
>
> we have also noticed that deleting the files immediately causes a
> drop in performance, because there are too many hard drive accesses.
> So it would be better if the files were deleted in batches.
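One way the batch deletion suggested above could look, as a rough sketch. BatchFileDeleter and its methods are hypothetical names, not SMILA classes, and whether batching actually reduces the disk load would still have to be measured.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SMILA.
class BatchFileDeleter {
    private final int batchSize;
    private final List<File> pending = new ArrayList<>();

    BatchFileDeleter(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /** Queues a file and deletes the whole batch once batchSize files are pending. */
    synchronized void markForDeletion(File file) {
        pending.add(file);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Deletes all queued files and returns how many were actually removed. */
    synchronized int flush() {
        int deleted = 0;
        for (File f : pending) {
            if (f.delete()) {
                deleted++;
            }
        }
        pending.clear();
        return deleted;
    }
}
```

The crawler thread would call markForDeletion after each deserialization and flush once at the end of the crawl, so leftover files from an incomplete batch are also removed.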
Could you create a bugzilla issue for this, so we are able to track it?
Further discussion should be done in bugzilla then.
Thank you!
Regards,
Jürgen.
|
|
|
| Re: Serialized Java objects remains on disk after deserializing [message #658371 is a reply to message #658164] |
Tue, 08 March 2011 04:05   |
thomas menzel Messages: 81 Registered: July 2009 |
Member |
|
|
to give my 2 cents:
On 24.02.2011 17:44, Jürgen Schumacher wrote:
>
> Daniel, Tom, do you know anything about this?
>
nope, sorry
On 07.03.2011 09:55, Jürgen Schumacher wrote:
>
> Could you create a bugzilla issue for this, so we are able to track it?
> Further discussion should be done in bugzilla then.
> Thank you!
>
+1 and plz post the issue link here, so people know where to go
--
thomas menzel aka tom
|
|
|
|