Eclipse Community Forums: SeMantic Information Logistics Architecture (SMILA)

Flushing records too late [message #660190]

Thu, 17 March 2011 05:50

Eclipse User

Hello,

The web crawler crawls some pages and the CrawlThread collects them into a record buffer. When the record buffer is full, it is flushed only once the next record is processed. If the crawler waits a long time for new pages, the buffer is not flushed during that wait.

This could be handled easily by moving the flush statement within "processDataReference" to the top.

Have we understood this problem correctly?

Re: Flushing records too late [message #660259 is a reply to message #660190]

Thu, 17 March 2011 11:17

Eclipse User

Hi,

I'm not sure which flush statement you want to move.
- flushRecords() is called in run() after the CrawlThread has stopped, to flush any remaining records. It is also called within checkForFlush() when the buffer size or the time limit is reached.
- checkForFlush() is called in processDataReferences() before a dataReference is processed.

The waiting you described occurs within run() when
dataReferences = _crawler.getNext();
is called. So I don't think you will gain anything by moving any of the flush methods.
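
The flow described above can be modelled roughly as follows. This is only a simplified sketch with a simulated clock, not the actual SMILA CrawlThread: the class name CrawlLoopModel, its fields, and the getNext(waitSeconds, pages) signature are assumptions made for illustration. It shows that while the thread is blocked in getNext(), or when getNext() returns no references, checkForFlush() is never reached.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the described flow; NOT the real SMILA CrawlThread.
class CrawlLoopModel {
  final List<String> buffer = new ArrayList<>();
  final int bufferSize;       // flush when this many records are buffered
  final long flushInterval;   // flush when this many seconds have passed
  long now = 0;               // simulated clock, in seconds
  long lastFlush = 0;
  int flushCount = 0;

  CrawlLoopModel(int bufferSize, long flushInterval) {
    this.bufferSize = bufferSize;
    this.flushInterval = flushInterval;
  }

  // Stands in for _crawler.getNext(): "blocks" for waitSeconds, then returns pages.
  String[] getNext(long waitSeconds, String... pages) {
    now += waitSeconds;       // nothing else runs on this thread while it waits
    return pages;
  }

  void processDataReferences(String[] dataReferences) {
    for (String ref : dataReferences) {
      checkForFlush();        // only reached if there is a reference to process
      buffer.add(ref);
    }
  }

  void checkForFlush() {
    if (buffer.size() >= bufferSize || now - lastFlush >= flushInterval) {
      flushRecords();
    }
  }

  void flushRecords() {
    if (!buffer.isEmpty()) {
      flushCount++;
    }
    buffer.clear();
    lastFlush = now;
  }
}
```

In this model, two records buffered at time 0 with a flush interval of 5 seconds stay in the buffer through a 60-second wait and are flushed only when the next reference arrives.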

To ensure that records are buffered for at most "flushinterval" seconds, we would have to implement it differently, using a separate thread to trigger the flush once the time has elapsed.
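
A minimal sketch of that idea, assuming a standalone buffer class rather than the real CrawlThread: TimedRecordBuffer and all of its members are hypothetical names, and java.util.concurrent's ScheduledExecutorService is used here as one possible way to run the separate timer thread. Access to the buffer is synchronized because the crawl thread and the timer thread both touch it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the CrawlThread's record buffer; not SMILA code.
class TimedRecordBuffer {
  private final List<String> buffer = new ArrayList<>();
  private final List<List<String>> flushedBatches = new ArrayList<>();
  private final int maxSize;
  private final ScheduledExecutorService scheduler;

  TimedRecordBuffer(int maxSize, long flushIntervalMillis) {
    this.maxSize = maxSize;
    // Daemon thread so a forgotten shutdown() does not keep the JVM alive.
    this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "flush-timer");
      t.setDaemon(true);
      return t;
    });
    // Flush periodically, independent of whether the crawler delivers pages.
    scheduler.scheduleAtFixedRate(this::flushRecords,
        flushIntervalMillis, flushIntervalMillis, TimeUnit.MILLISECONDS);
  }

  // Called by the crawl thread for every new record.
  synchronized void add(String record) {
    buffer.add(record);
    if (buffer.size() >= maxSize) {
      flushRecords();
    }
  }

  // Called by the crawl thread (size limit) and by the timer thread (interval).
  synchronized void flushRecords() {
    if (buffer.isEmpty()) {
      return;
    }
    flushedBatches.add(new ArrayList<>(buffer)); // placeholder for the real flush
    buffer.clear();
  }

  synchronized int pendingCount() {
    return buffer.size();
  }

  synchronized int flushedBatchCount() {
    return flushedBatches.size();
  }

  // Stop the timer and flush whatever is left.
  void shutdown() {
    scheduler.shutdown();
    flushRecords();
  }
}
```

With this arrangement a record waits at most roughly one interval before it is flushed, even if the crawl thread is blocked waiting for pages during that time.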

Re: Flushing records too late [message #660357 is a reply to message #660190]

Fri, 18 March 2011 01:34

Eclipse User

Hello,

I meant the second case, where "checkForFlush" is called in the "processDataReferences" method.

You are right. We have modified the "WebCrawler" class in such a way that the crawler returns an empty array if nothing was crawled in an iteration. So in this case no flush is performed.

But your idea with the thread is great because it makes sure that the flush interval will be met independently of the crawling.
