Indexing of XML (List) Elements [message #639167] |
Mon, 15 November 2010 08:27  |
Eclipse User |
Hello,
I have a question concerning the indexing of XML (list) elements.
I have the following XML data structure (to be crawled and indexed):
<RECIPES>
<RECIPE>
<TI>Agnolotti Ignudi Al Mascarpone (Meat Balls in Mascarpone</TI>
<IN amnt="113398">4 oz Prosciutto; in one piece</IN>
<IN amnt="113398">4 oz Pancetta; in one piece</IN>
<!-- many more ingredients -->
<PR>long Text</PR>
</RECIPE>
<!-- more RECIPES -->
</RECIPES>
To split the RECIPES I use "xmlprocessing.XmlSplitterPipelet", and for the title (TI), the PR text and the ingredients (IN) I use "xmlprocessing.XPathExtractorPipelet". This works very well so far.
This is my DataDictionary index structure (the relevant fragments):
<IndexField FieldNo="13" IndexValue="true" Name="RecipePr" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="12" IndexValue="true" Name="RecipeIngredient" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="11" IndexValue="true" Name="RecipeTitle" StoreText="true" Tokenize="true" Type="Text"/>
The extension activity that extracts the ingredients is this:
<extensionActivity name="extractIN">
<proc:invokePipelet>
<proc:pipelet class="org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet" />
<proc:variables input="request" output="request" />
<proc:PipeletConfiguration>
<proc:Property name="inputType">
<proc:Value>ATTRIBUTE</proc:Value>
</proc:Property>
<proc:Property name="outputType">
<proc:Value>ATTRIBUTE</proc:Value>
</proc:Property>
<proc:Property name="inputName">
<proc:Value>Content</proc:Value>
</proc:Property>
<proc:Property name="outputName">
<proc:Value>RecipeIngredient</proc:Value>
</proc:Property>
<proc:Property name="xpath">
<proc:Value>RECIPE/IN</proc:Value>
</proc:Property>
</proc:PipeletConfiguration>
</proc:invokePipelet>
</extensionActivity>
My problem now is that SMILA concatenates all ingredients into a single string literal/attribute. What I need is a list of string literals in the RecipeIngredient attribute.
Something like this (just a simple illustration):
<IndexField FieldNo="12" IndexValue="true" Name="RecipeIngredients" StoreText="true" Tokenize="true" Type="List"/>
I have no idea how to model this in the XML DataDictionary. Does anyone have an idea and can help me?
Kind regards,
UNI-HI Stud
Re: Indexing of XML (List) Elements [message #639196 is a reply to message #639167] |
Mon, 15 November 2010 10:03   |
Eclipse User |
Originally posted by: juergen.schumacher.attensity.com
Hi,
On 15.11.2010 at 14:27, UNI-HI Stud <schae003@uni-hildesheim.de> wrote:
> ...
> My problem now is that SMILA concatenates all ingredients into one
> string literal/attribute. What I need is a list of string literals in
> the RecipeIngredient attribute.
>
> Something like this (just a simple illustration):
> <IndexField FieldNo="12" IndexValue="true" Name="RecipeIngredients"
> StoreText="true" Tokenize="true" Type="List"/>
I'm not too familiar with these elements of SMILA, but from looking at the code it seems that the XPathExtractorPipelet does not produce multiple literals in a single attribute, so you probably need to write your own pipelet to create this list of literals.
However, the Lucene integration seems to be able to handle multi-literal attributes. So it should merely be a matter of producing the literals; the configuration of the DataDictionary seems to be OK to me.
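To illustrate what "producing the literals" means, here is a minimal sketch outside of SMILA. It is plain Python using only the standard library, not SMILA's pipelet API, and the names RECIPE_XML and extract_ingredients are made up for this example. It only shows the difference between one value per IN element and a single concatenated string; a real pipelet would have to do the equivalent in Java and store each value as a separate literal.

```python
import xml.etree.ElementTree as ET
from typing import List

# Shortened sample record, taken from the XML structure in the question.
RECIPE_XML = """<RECIPE>
<TI>Agnolotti Ignudi Al Mascarpone (Meat Balls in Mascarpone</TI>
<IN amnt="113398">4 oz Prosciutto; in one piece</IN>
<IN amnt="113398">4 oz Pancetta; in one piece</IN>
<PR>long Text</PR>
</RECIPE>"""


def extract_ingredients(recipe_xml: str) -> List[str]:
    """Return one string per IN element instead of a single joined string."""
    recipe = ET.fromstring(recipe_xml)
    # The root element is RECIPE, so the relative path "IN" here
    # selects the same nodes as the XPath RECIPE/IN in the question.
    return [(element.text or "").strip() for element in recipe.findall("IN")]


# One entry per ingredient: this is the shape a multi-literal attribute needs.
ingredients = extract_ingredients(RECIPE_XML)

# Joining the entries gives a single string, similar to the unwanted result.
concatenated = " ".join(ingredients)
```

With the list form, each ingredient can be handed on as its own value; the joined form loses the boundaries between ingredients.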
Regards,
Juergen.