Eclipse Community Forums
Serialized Java objects remains on disk after deserializing [message #655711] Tue, 22 February 2011 12:05
SMILANewBee
The web crawler consists of two threads. The first one downloads the content and post-processes it. The result is written to disk as a serialized Java object in the directory workspace/.metadata/.plugins/org.eclipse.smila.connectivity.framework.crawler.web.

The second thread reads the serialized objects and processes them further to add them to the blackboard. After the second thread has deserialized an object, it doesn't remove the file, so the directory swells with unnecessary data.

To prevent this swelling, one line can be added to the deserializeIndexDocument method of the WebCrawler class. After the statement

IOUtils.closeQuietly(objstream);

a simple

file.delete();

can be added to delete the file once it has been deserialized.
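
To illustrate the idea outside of SMILA, here is a minimal, self-contained sketch of "deserialize, close the stream, then delete the file". It is not the actual WebCrawler code: the class name DeserializeAndDelete and the method readAndDelete are made up for this example, and it uses java.nio.file and try-with-resources (Java 7 or later) instead of IOUtils.closeQuietly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of SMILA.
public class DeserializeAndDelete {

    /** Reads one serialized object from the file, then removes the file. */
    static Object readAndDelete(Path file) throws IOException, ClassNotFoundException {
        Object result;
        // try-with-resources closes the stream before the delete below runs.
        try (InputStream in = Files.newInputStream(file);
             ObjectInputStream objStream = new ObjectInputStream(in)) {
            result = objStream.readObject();
        }
        // Only reached if reading succeeded; a failed read leaves the file in place.
        Files.deleteIfExists(file);
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Write a small serialized object to a temporary file.
        Path file = Files.createTempFile("indexdoc", ".ser");
        try (OutputStream out = Files.newOutputStream(file);
             ObjectOutputStream objOut = new ObjectOutputStream(out)) {
            objOut.writeObject("example payload");
        }

        Object restored = readAndDelete(file);
        System.out.println(restored + " / file still exists: " + Files.exists(file));
    }
}
```

Deleting only after a successful read means a corrupt or half-written file stays on disk for inspection; whether that is desirable for the crawler is a separate decision.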
Re: Serialized Java objects remains on disk after deserializing [message #655897 is a reply to message #655711] Wed, 23 February 2011 08:22
Andrej Rosenheinrich
Can someone confirm whether those thoughts are right? Or would deleting those files cause side effects?
Re: Serialized Java objects remains on disk after deserializing [message #656247 is a reply to message #655897] Thu, 24 February 2011 16:44
Eclipse User
Originally posted by: juergen.schumacher.attensity.com

On 23.02.2011, 09:22, Andrej Rosenheinrich <andrej.rosenheinrich@unister-gmbh.de> wrote:

> Can someone confirm if those thoughts are right? Or would it cause
> side effects to delete those files?

Sorry, I'm not very familiar with the Web Crawler, so I'm not sure. From reading the code, I could imagine situations where a URL is visited twice during crawling and the second visit happens before the first visit is completely processed, so the file would not be recreated and the processing of the first visit would delete it before the second visit can read it. Or something like this. On the other hand, this sounds like dubious behaviour anyway (:

Daniel, Tom, do you know anything about this?

Regards,
Jürgen
Re: Serialized Java objects remains on disk after deserializing [message #657635 is a reply to message #656247] Thu, 03 March 2011 14:18
Andrej Rosenheinrich
To correct the first post: the serialized files are deleted. As far as we understand the code, this happens in the close() method of the WebCrawler class, so only once the crawl has finished. If the crawl runs for a long time (say days or weeks) and downloads a significant number of pages, this will be a problem. So it might be the better option to delete these files immediately after processing is finished.
Re: Serialized Java objects remains on disk after deserializing [message #657795 is a reply to message #657635] Fri, 04 March 2011 05:45
SMILANewBee
Hello,

we have also noticed that deleting the files immediately causes a performance decline, because there are too many hard drive accesses. So it would be better if the files were deleted in blocks.
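
One way to picture blockwise deletion is to collect the paths of processed files and remove them together once a threshold is reached. The following is only a sketch under assumptions: the class BatchedFileDeleter, its methods, and the batch size are invented for illustration and do not exist in SMILA, and whether batching actually reduces the disk load seen here has not been measured.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of SMILA.
public class BatchedFileDeleter {

    private final int batchSize;
    private final List<Path> pending = new ArrayList<>();

    public BatchedFileDeleter(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
    }

    /** Remembers a processed file; deletes the whole batch once it is full. */
    public synchronized void markProcessed(Path file) throws IOException {
        pending.add(file);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Deletes all remembered files, e.g. when the crawl is closed. */
    public synchronized void flush() throws IOException {
        Iterator<Path> it = pending.iterator();
        while (it.hasNext()) {
            Files.deleteIfExists(it.next());
            // Drop the entry only after its file is gone, so a failure keeps the rest queued.
            it.remove();
        }
    }

    /** Number of files waiting for deletion. */
    public synchronized int pendingCount() {
        return pending.size();
    }
}
```

The second thread would call markProcessed(file) after deserializing each object, and close() would call flush() so that a partially filled batch is not left behind.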
Re: Serialized Java objects remains on disk after deserializing [message #658164 is a reply to message #657795] Mon, 07 March 2011 08:55
Eclipse User
Originally posted by: juergen.schumacher.attensity.com

Hi,

On 04.03.2011, 06:45, SMILANewBee <nils.thieme@unister.de> wrote:
> Hello,
>
> we have also noticed, that deleting the files instantly a performance
> decline was detected. We have too many hard drives accesses. So it would
> be better if the files would be deleted blockwise.

Could you create a Bugzilla issue for this, so we are able to track it?
Further discussion should then take place in Bugzilla.
Thank you!

Regards,
Jürgen.
Re: Serialized Java objects remains on disk after deserializing [message #658371 is a reply to message #658164] Tue, 08 March 2011 09:05
thomas menzel
to give my 2 cents:

On 24.02.2011 17:44, Jürgen Schumacher wrote:
>
> Daniel, Tom, do you know anything about this?
>

nope, sorry


On 07.03.2011 09:55, Jürgen Schumacher wrote:
>
> Could you create a bugzilla issue for this, so we are able to track it?
> Further discussion should be done in bugzilla then.
> Thank you!
>

+1 and plz post the issue link here, so people know where to go

--
thomas menzel aka tom
Re: Serialized Java objects remains on disk after deserializing [message #658579 is a reply to message #658371] Wed, 09 March 2011 05:42
SMILANewBee
Hello,

a new bug was reported at https://bugs.eclipse.org/bugs/show_bug.cgi?id=339315