Eclipse Community Forums
Indexing of XML (List) Elements [message #639167] Mon, 15 November 2010 13:27
UNI-HI Stud
Hello,
I have a question about indexing XML (list) elements.

I have the following XML data structure (to be crawled and indexed):
<RECIPES>
  <RECIPE>
    <TI>Agnolotti Ignudi Al Mascarpone (Meat Balls in Mascarpone</TI>
    <IN amnt="113398">4 oz Prosciutto; in one piece</IN>
    <IN amnt="113398">4 oz Pancetta; in one piece</IN>
    <!-- many more ingredients -->
    <PR>long Text</PR>
  </RECIPE>
  <!-- more RECIPES -->
</RECIPES>


To split the RECIPES I use the "xmlprocessing.XmlSplitterPipelet", and for the title (TI), PR, and ingredients (IN) I use the "xmlprocessing.XPathExtractorPipelet". This works very well so far.

This is my DataDictionary index structure (the relevant fragments):
<IndexField FieldNo="13" IndexValue="true" Name="RecipePr" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="12" IndexValue="true" Name="RecipeIngredient" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="11" IndexValue="true" Name="RecipeTitle" StoreText="true" Tokenize="true" Type="Text"/>


The extension activity to extract the ingredients is this:
	
<extensionActivity name="extractIN">
  <proc:invokePipelet>
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet" />
    <proc:variables input="request" output="request" />
    <proc:PipeletConfiguration>
      <proc:Property name="inputType">
        <proc:Value>ATTRIBUTE</proc:Value>
      </proc:Property>
      <proc:Property name="outputType">
        <proc:Value>ATTRIBUTE</proc:Value>
      </proc:Property>
      <proc:Property name="inputName">
        <proc:Value>Content</proc:Value>
      </proc:Property>
      <proc:Property name="outputName">
        <proc:Value>RecipeIngredient</proc:Value>
      </proc:Property>
      <proc:Property name="xpath">
        <proc:Value>RECIPE/IN</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>


My problem now is that SMILA concatenates all ingredients into a single string literal/attribute. What I need is a list of string literals in the RecipeIngredient attribute.

Something like this (just a simple illustration):
<IndexField FieldNo="12" IndexValue="true" Name="RecipeIngredients" StoreText="true" Tokenize="true" Type="List"/>

I have no idea how to model this in the XML DataDictionary. Does anyone have an idea and can help me?

Kind regards,

UNI-HI Stud
Re: Indexing of XML (List) Elements [message #639196 is a reply to message #639167] Mon, 15 November 2010 15:03
Eclipse User
Originally posted by: juergen.schumacher.attensity.com

Hi,

On 15.11.2010 at 14:27, UNI-HI Stud <schae003@uni-hildesheim.de> wrote:
> ...
> My problem now is that SMILA concatenates all ingredients into a single
> string literal/attribute. What I need is a list of string literals in
> the RecipeIngredient attribute.
>
> Something like this (just a simple illustration):
> <IndexField FieldNo="12" IndexValue="true" Name="RecipeIngredients"
> StoreText="true" Tokenize="true" Type="List"/>

I'm not very familiar with these parts of SMILA, but from looking at the
code it seems that the XPathExtractorPipelet does not produce multiple
literals in a single attribute, so you probably need to write your own
pipelet to create this list of literals. However, the Lucene integration
seems to be able to handle multi-literal attributes, so it should merely
be a matter of producing the literals. The configuration of the
DataDictionary seems OK to me.
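
To illustrate what "producing the literals" could look like, here is a minimal sketch of just the extraction step, using only the standard JDK XPath API. It is not SMILA code: the class name IngredientListExtractor and the method extractValues are hypothetical, and wiring the resulting list into a custom pipelet (and into the record's attribute as multiple literals) is left open because it depends on the SMILA version in use. The sketch assumes each split record contains a single RECIPE element as its root, as in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: returns one string per node matched by an XPath. */
final class IngredientListExtractor {

    private IngredientListExtractor() {
        // static helper only
    }

    /**
     * Evaluates the expression as a node set (not as a single string)
     * and collects the text content of every matching node separately.
     */
    static List<String> extractValues(String xml, String xpathExpression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    xpathExpression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // one list entry per <IN> element instead of one joined string
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(
                    "Could not evaluate XPath: " + xpathExpression, e);
        }
    }
}
```

Calling extractValues with the content of one RECIPE record and the expression RECIPE/IN yields a list with one entry per IN element; a custom pipelet would then add each entry as its own literal rather than joining them.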

Regards,
Juergen.
Re: Indexing of XML (List) Elements [message #639358 is a reply to message #639196] Tue, 16 November 2010 09:05
UNI-HI Stud
Ah OK, thanks a lot ;-)