Eclipse Community Forums
EOFException in CrawlThread [message #652876] Mon, 07 February 2011 08:01
Andrej Rosenheinrich
Hi again,

Looking at the log files, we regularly see the following exception:

2011-02-04 21:00:24,326 ERROR [Thread-39 ] impl.CrawlThread - Error while processing record with Id whatever of dataSourceId
org.eclipse.smila.connectivity.framework.CrawlerException: org.eclipse.smila.connectivity.framework.CrawlerException: java.io.EOFException
at org.eclipse.smila.connectivity.framework.crawler.web.WebCrawler.getMObject(WebCrawler.java:361)
at org.eclipse.smila.connectivity.framework.util.internal.DataReferenceImpl.getRecord(DataReferenceImpl.java:100)
at org.eclipse.smila.connectivity.framework.impl.CrawlThread.processDataReferences(CrawlThread.java:352)
at org.eclipse.smila.connectivity.framework.impl.CrawlThread.run(CrawlThread.java:235)
Caused by: org.eclipse.smila.connectivity.framework.CrawlerException: java.io.EOFException
at org.eclipse.smila.connectivity.framework.crawler.web.WebCrawler.deserializeIndexDocument(WebCrawler.java:830)
at org.eclipse.smila.connectivity.framework.crawler.web.WebCrawler.getRecord(WebCrawler.java:577)
at org.eclipse.smila.connectivity.framework.crawler.web.WebCrawler.getMObject(WebCrawler.java:359)
... 3 more
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at java.util.ArrayList.readObject(ArrayList.java:593)
at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at org.eclipse.smila.connectivity.framework.crawler.web.WebCrawler.deserializeIndexDocument(WebCrawler.java:828)


This exception appears whether we crawl with one thread or with several. The crawls run for several hours against the same dataSourceId, so if the config file were wrong I would expect the exception to be thrown from the very beginning and much more often. Is there an explanation for this behavior? Is this a problem in SMILA or in the underlying operating system?
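
As far as I can read the innermost frames, they only say that the serialized data ended before ObjectInputStream had read a complete object (the end of input is hit inside ArrayList.readObject). Here is a minimal sketch of that mechanism with plain JDK classes, independent of SMILA. The class name TruncatedStreamDemo, its helper methods, and the choice of cutting off five bytes are made up for illustration; it says nothing about why the data handed to deserializeIndexDocument would be short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of SMILA.
public class TruncatedStreamDemo {

    /** Serializes a copy of the given list (as an ArrayList) to a byte array. */
    static byte[] serialize(List<String> items) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new ArrayList<>(items));
        }
        return buffer.toByteArray();
    }

    /** Returns true if reading one object from the bytes ends in an EOFException. */
    static boolean failsWithEof(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] complete = serialize(Arrays.asList("a", "b", "c"));
        // Simulate an incomplete write or read by dropping the last few bytes.
        byte[] truncated = Arrays.copyOf(complete, complete.length - 5);

        System.out.println("complete stream hits EOF:  " + failsWithEof(complete));
        System.out.println("truncated stream hits EOF: " + failsWithEof(truncated));
    }
}
```

If that reading is right, the open question is where the serialized records could get cut short while a crawl is running.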

Thanks,
Andrej
 