Hi Rene,
I think the main reason for your problem is that the current SMILA release does not extract text from PDFs out of the box.
We plan to provide this in the next release, but it is not implemented yet.
So with the current version, if you want to search PDF content, you have to implement a Pipelet (or Worker) that does the PDF-to-text extraction (e.g. by calling third-party software) and use it in your workflow.
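To give an idea of the extraction step, here is a minimal sketch in plain Java. It is not SMILA code: the class name PdfTextExtractor, the "{input}" placeholder and the choice of the pdftotext tool (from poppler-utils) are only assumptions for illustration, and it uses Java 11 APIs. Wiring it into a Pipelet or Worker (reading the PDF attachment from the record and writing the text back) is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of SMILA: runs an external tool to turn PDF bytes into text. */
final class PdfTextExtractor {

    /** Placeholder in the command line that is replaced by the temp file path. */
    static final String INPUT = "{input}";

    /**
     * Writes the PDF bytes to a temp file, runs the given command on it,
     * and returns whatever the command prints on stdout, decoded as UTF-8.
     */
    static String extract(List<String> command, byte[] pdfBytes)
            throws IOException, InterruptedException {
        Path tmp = Files.createTempFile("pdf-extract-", ".pdf");
        try {
            Files.write(tmp, pdfBytes);

            // Substitute the temp file path into the command arguments.
            List<String> cmd = new ArrayList<>();
            for (String arg : command) {
                cmd.add(arg.replace(INPUT, tmp.toString()));
            }

            Process process = new ProcessBuilder(cmd)
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();

            // Read stdout completely before waiting, so a large output cannot block the tool.
            byte[] out = process.getInputStream().readAllBytes();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("extractor exited with code " + exitCode);
            }
            return new String(out, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumes pdftotext is installed and on the PATH; "-" sends the text to stdout.
        byte[] pdf = Files.readAllBytes(Path.of(args[0]));
        List<String> command = List.of("pdftotext", "-enc", "UTF-8", INPUT, "-");
        System.out.println(extract(command, pdf));
    }
}
```

Keeping the command configurable means the same helper would work with any other command-line converter that takes a file path and prints text to stdout.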
Regards,
Andreas
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Corinth, Rene
Sent: Thursday, 11 October 2012 10:13
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] SMILA as search engine
Hi all,
I have one more question before SMILA goes online in Theseus.
If I use the advanced search in Theseus http://www.theseus-programm.de/en/75_smila.php?tpl=advanced and search for “Document Type” PDF, no title or summary is shown.
I think the problem is that I use only the web crawler and not the file crawler, but these PDFs are on the web. So how can I combine the two crawlers, or do I have to take a different approach?
Cheers René