General memory consumption of SMILA [message #644821] Tue, 14 December 2010 10:45
Martin Röbert
Hi,

I have a question about how much memory SMILA needs to work properly and what the memory behaviour should be in general.

We ran SMILA several times with different options, and the amount of memory used is somewhat... huge... From time to time SMILA even quits with an OutOfMemoryError, but I cannot reproduce this reliably.

At the moment I have one crawl job with 200 listener threads running. After 21 hours of crawling I have a heap size of 2.2GB; the used heap lies between 1.9 and 2.1GB.
Is this normal behaviour?

Here is a list of the VM options used (a quick runtime check of these follows below):
- -Xms512M
- -Xmx2048M
- -XX:PermSize=256M
- -XX:MaxPermSize=768M
- -XX:+HeapDumpOnOutOfMemoryError
- -XX:+UseParallelGC
- -XX:ParallelGCThreads=50
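
As a quick sanity check (a minimal standalone sketch, not SMILA code), the heap limits the JVM actually applied can be logged at runtime:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        // Reports the heap as configured by -Xms (init) and -Xmx (max).
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("init=%dMB used=%dMB committed=%dMB max=%dMB%n",
                heap.getInit() >> 20, heap.getUsed() >> 20,
                heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}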

Thanks in advance,

Martin
Re: General memory consumption of SMILA [message #644836 is a reply to message #644821] Tue, 14 December 2010 12:06
Originally posted by: juergen.schumacher.attensity.com

On 14.12.2010, 11:45, Martin <martin.roebert@gmx.de> wrote:
> Hi, I have a question about how much memory SMILA needs to work properly
> and what the memory behaviour should be in general.
>
> We ran SMILA several times with different options, and the amount of
> memory used is somewhat... huge... From time to time SMILA even
> quits with an OutOfMemoryError, but I cannot reproduce this reliably.
>
> At the moment I have one crawl job with 200 listener threads running. After
> 21 hours of crawling I have a heap size of 2.2GB; the used heap lies
> between 1.9 and 2.1GB. Is this normal behaviour?

I'm not aware of any memory problems in any projects, but then I'm not using the crawlers currently. Also, I don't have any other numbers for comparison. Of course it depends very much on how large the records are which you are pushing through the system. Depending on this, 200 listener threads may be quite a lot on a single machine.

If you want to do further analysis, I think you will find the Eclipse Memory Analyzer [http://eclipse.org/mat/] helpful. It will show you quite easily where in the system large amounts of memory are being referenced.
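
For example (a sketch only, assuming a HotSpot JVM, since com.sun.management is vendor-specific), a heap dump for MAT can also be triggered programmatically instead of waiting for an OutOfMemoryError:

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // true = dump only live (reachable) objects; keeps the .hprof smaller
        bean.dumpHeap("smila-heap.hprof", true);
    }
}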

Regards,
Juergen.
Re: General memory consumption of SMILA [message #644840 is a reply to message #644821] Tue, 14 December 2010 12:29
Martin Röbert
Hi Jürgen,

thanks for the quick answer.

I crawl a site (average size 500KB per page according to FireBug) on a server with 4 cores (2.5GHz) and currently 4GB RAM. What is the upper limit of crawl jobs and threads for this setup?

Have any of your colleagues tested SMILA regarding memory consumption?

Do you have any hints for a stable setup?


Thanks in advance,

Martin
Re: General memory consumption of SMILA [message #644897 is a reply to message #644840] Tue, 14 December 2010 16:09
Originally posted by: juergen.schumacher.attensity.com

On 14.12.2010, 13:29, Martin <martin.roebert@gmx.de> wrote:
> Hi Jürgen,
>
> thanks for the quick answer.
>
> I crawl a site (average size 500KB per page according to FireBug) on a
> server with 4 cores (2.5GHz) and currently 4GB RAM. What is the upper
> limit of crawl jobs and threads for this setup?

I cannot say for sure; it depends on how computationally intensive it is to process the pages, but I would reckon that about 10, maybe 20, worker listener threads should be enough. You can then run as many crawl jobs as needed to always have some messages in the work queue; you can use the JMX monitoring of ActiveMQ to see this. But that's just guessing from my side.
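
As an illustration (a sketch only; the JMX port and the queue name below are placeholders, not SMILA or ActiveMQ defaults, and the MBean naming follows the ActiveMQ 5.x scheme), the queue depth can be read via JMX like this:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class QueueDepth {
    public static void main(String[] args) throws Exception {
        // Standard JMX-over-RMI URL; adjust host/port to your broker setup.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // "smila.connectivity" is a placeholder queue name, not a SMILA default.
            ObjectName queue = new ObjectName(
                    "org.apache.activemq:BrokerName=localhost,Type=Queue,"
                    + "Destination=smila.connectivity");
            Long depth = (Long) conn.getAttribute(queue, "QueueSize");
            System.out.println("Pending messages: " + depth);
        } finally {
            connector.close();
        }
    }
}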

I just asked a colleague who once did some tests, and he used 16 listener threads. He tried up to 64 threads, but it didn't improve anything.

> Have any of your colleagues tested SMILA regarding memory consumption?

We once did tests to ensure that there are no memory leaks (and didn't find any, of course ;-), but I don't think that we have measured memory consumption parameters yet.

Cheers,
Juergen
Re: General memory consumption of SMILA [message #645420 is a reply to message #644821] Fri, 17 December 2010 08:33
Martin Röbert
Hi Juergen,


As you recommended, I used MAT to look at a heap dump of SMILA.

I got the following message:

One instance of "org.eclipse.smila.connectivity.framework.crawler.web.WebSiteIterator" loaded by "...crawler.web" occupies 1,759,810,632 (87.58%) bytes. The memory is accumulated in one instance of "java.util.HashMap$Entry[]"

I uploaded a picture of a part of the report at ImageShack, so you can have a look: http://img232.imageshack.us/img232/8839/matscreen.png

Is my assumption correct that this Map contains the links from my crawled/parsed web pages that are scheduled to be crawled in the future?

Some facts about my web.xml: it contains 15 seeds and does a deep search:
<CrawlingModel Value="5" Type="MaxDepth"></CrawlingModel>
<CrawlScope Type="Host"></CrawlScope>
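
A back-of-the-envelope estimate (purely hypothetical, since I don't know the real branching factor): with 15 seeds, MaxDepth 5, and an assumed average of b in-scope links per page, the frontier grows roughly like 15 * (b + b^2 + ... + b^5). Even at a modest b = 20 that is about 15 * 20^5, i.e. roughly 5 * 10^7 candidate URLs, so a HashMap holding all pending links would plausibly reach the size MAT reports above.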

Any hints on that?


Thanks in advance,

Martin
Re: General memory consumption of SMILA [message #645462 is a reply to message #645420] Fri, 17 December 2010 10:11
Originally posted by: juergen.schumacher.attensity.com

On 17.12.2010, 09:33, Martin <martin.roebert@gmx.de> wrote:

> Hi Juergen,
>
> As you recommended, I used MAT to look at a heap dump of SMILA.
>
> I got the following message:
>
> One instance of "org.eclipse.smila.connectivity.framework.crawler.web.WebSiteIterator"
> loaded by "...crawler.web" occupies 1,759,810,632 (87.58%) bytes. The
> memory is accumulated in one instance of "java.util.HashMap$Entry[]"
>
> I uploaded a picture of a part of the report at ImageShack, so you can
> have a look: http://img232.imageshack.us/img232/8839/matscreen.png
>
> Is my assumption correct that this Map contains the links from my
> crawled/parsed web pages that are scheduled to be crawled in the future?
>
> Some facts about my web.xml: it contains 15 seeds and does a deep search:
> <CrawlingModel Value="5" Type="MaxDepth"></CrawlingModel>
> <CrawlScope Type="Host"></CrawlScope>
>
> Any hints on that?

No, not really; I do not know the crawler code very well. Maybe someone else can comment better. But it sounds like a memory leak to me. Could you please create a Bugzilla issue so we can track this (though I have to admit that our time is rather limited currently...).

Thanks,
Juergen.
Re: General memory consumption of SMILA [message #658938 is a reply to message #645462] Thu, 10 March 2011 14:09
Daniel Stucky
Hi,

I think I solved the issue: https://bugs.eclipse.org/bugs/show_bug.cgi?id=332864

Daniel