Re: [smila-dev] SMILA as search engine

From: Andreas Weber <Andreas.Weber@xxxxxxxxxxx>
Date: Thu, 11 Oct 2012 12:26:03 +0200

Hi Rene,

I think the main reason for your problem is that the current SMILA release doesn’t extract text from PDFs out of the box.

We plan to provide this for the next release, but it’s not implemented yet.

So, with the current SMILA, if you want to search on PDF content, you have to implement a Pipelet (or Worker) that performs the PDF-to-text extraction (e.g. by calling third-party software) and add it to your workflow.
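A minimal sketch of what such an extraction step does, written in plain Java so it compiles on its own. It is not SMILA's actual Pipelet API: `SimpleRecord`, `TextExtractor`, `PdfToTextPipelet` and the attachment/attribute names are hypothetical stand-ins, and the `TextExtractor` is where the call to the third-party PDF library would go.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified record: raw attachments plus string attributes.
// In SMILA itself records live on the Blackboard; this class only stands in for that.
class SimpleRecord {
    final String id;
    final Map<String, byte[]> attachments = new HashMap<>();
    final Map<String, String> attributes = new HashMap<>();

    SimpleRecord(String id) {
        this.id = id;
    }
}

// Hypothetical hook for the third-party library that does the actual PDF conversion.
interface TextExtractor {
    String extract(byte[] pdfBytes) throws Exception;
}

// Sketch of the PDF-to-text step: for each record whose MIME type is PDF,
// convert the raw attachment to plain text and store it in an attribute
// that the indexing step can search and display.
class PdfToTextPipelet {
    private final TextExtractor extractor;
    private final String contentAttachment; // attachment holding the raw PDF (name assumed)
    private final String mimeTypeAttribute; // attribute set by the crawler (name assumed)
    private final String textAttribute;     // attribute that receives the plain text (name assumed)

    PdfToTextPipelet(TextExtractor extractor, String contentAttachment,
                     String mimeTypeAttribute, String textAttribute) {
        this.extractor = extractor;
        this.contentAttachment = contentAttachment;
        this.mimeTypeAttribute = mimeTypeAttribute;
        this.textAttribute = textAttribute;
    }

    // Returns how many records received extracted text.
    int process(List<SimpleRecord> records) {
        int converted = 0;
        for (SimpleRecord record : records) {
            byte[] raw = record.attachments.get(contentAttachment);
            String mimeType = record.attributes.get(mimeTypeAttribute);
            if (raw == null || !"application/pdf".equals(mimeType)) {
                continue; // not a PDF: leave the record unchanged
            }
            try {
                record.attributes.put(textAttribute, extractor.extract(raw));
                converted++;
            } catch (Exception e) {
                // A broken PDF should not stop the whole batch; a real pipelet would log this.
                System.err.println("PDF extraction failed for " + record.id + ": " + e.getMessage());
            }
        }
        return converted;
    }
}
```

Non-PDF records pass through untouched, so the step can sit in the same workflow as the web crawler's HTML documents; which crawler fetched the PDF does not matter, only that its text gets extracted before indexing.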

Regards,

Andreas

From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On behalf of Corinth, Rene
Sent: Thursday, 11 October 2012 10:13
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] SMILA as search engine

Hi all,

I have one more question before SMILA goes online in Theseus…

If I use the advanced search in Theseus (http://www.theseus-programm.de/en/75_smila.php?tpl=advanced) and search for “Document Type” PDF, no title or summary is shown.

I think the problem is that I only use the web crawler and not the file crawler, but these PDFs are on the web. So how can I combine the two crawlers, or do I have to take a different approach?

Cheers René