AW: [smila-user] RE: XPathExtractorPipelet

Hi Georg,

 

Daniel Stucky solved the problem and will present the solution.

 

Best

Andreas

 

 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of August Georg Schmidt
Sent: Wednesday, 11 March 2009 12:03
To: Smila project user mailing list
Subject: [smila-user] RE: XPathExtractorPipelet

 

Hi Andreas,

 

From my point of view, this looks like a malformed XML document being passed into the pipelet.

 

Could you give me a sample of the information structure where this problem occurs?

 

How is the XML document parsed? Does it have something like a Unicode header (for example a byte order mark) in front of the document, that is, before <?xml version="1.0" encoding="utf-8" ?>?
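
A minimal sketch of the kind of check this question points at, assuming the document reaches the parser as already-decoded text (a String or Reader) rather than as raw bytes. In that case a leading U+FEFF character is ordinary content in front of the XML declaration, which a parser may reject. The class BomCheck and its methods are hypothetical helpers for illustration, not part of SMILA, and the sample document is made up around the service/serviceKey path used in the configuration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of SMILA.
final class BomCheck {

  /** Removes a single leading U+FEFF (byte order mark) character, if present. */
  static String stripBom(String text) {
    if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
      return text.substring(1);
    }
    return text;
  }

  /** Parses already-decoded XML text with the JDK's default DOM parser. */
  static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
  }

  public static void main(String[] args) throws Exception {
    // Made-up sample data with a BOM character in front of the declaration.
    String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>"
        + "<service><serviceKey>42</serviceKey></service>";
    Document doc = parse(stripBom('\uFEFF' + xml));
    System.out.println(doc.getDocumentElement().getNodeName());
  }
}
```

If the document is handed to the parser as a byte stream instead, the parser's own encoding detection normally deals with the byte order mark, so a check like this matters mainly when the content has been converted to text beforehand.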

 

Kind Regards,

 

Georg

 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of Andreas.Schultz@xxxxxxxxxxx
Sent: Wednesday, 11 March 2009 11:37
To: smila-user@xxxxxxxxxxx
Subject: [smila-user] XPathExtractorPipelet

 

Hello,

 

I have problems using the XPathExtractorPipelet within the AddPipeline.

Has anybody out there used this pipelet?

 

Even the author.xml file from the xmlprocessing unit test fails!

 

I got the following message:

 

 

 

2009-03-11 10:23:29,734 WARN  [ODEServerImpl-5                              ]  memdao.ProcessDaoImpl                         - Discarding in-memory instance 0 because it exceeded its time-to-live: null

 2009-03-11 10:23:29,734 INFO  [ODEServerImpl-5                              ]  bpel.ProcessingServiceManager                 - AddPipeline/extensionActivity-activity-line-35: invoking service SimpleMimeTypeIdentifier, processing request -> request

 2009-03-11 10:23:29,734 INFO  [ODEServerImpl-5                              ]  bpel.PipeletManager                           - AddPipeline/extensionActivity-activity-line-51: invoking pipelet org.eclipse.smila.processing.pipelets.HtmlToTextPipelet, processing request -> request

 2009-03-11 10:23:29,750 INFO  [ODEServerImpl-5                              ]  bpel.PipeletManager                           - AddPipeline/extensionActivity-activity-line-77: invoking pipelet org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet, processing request -> request

 2009-03-11 10:23:29,750 WARN  [ODEServerImpl-5                              ]  pipelets.ATransformationPipelet               - unable to transform document src:file|key:<Path=D:\works\SMILA\SR_USDL\author.xml>

org.eclipse.smila.processing.pipelets.xmlprocessing.util.XMLUtilsException: Error while parsing XML document!

                at org.eclipse.smila.processing.pipelets.xmlprocessing.util.XMLUtils.parse(XMLUtils.java:359)

                at org.eclipse.smila.processing.pipelets.xmlprocessing.util.XMLUtils.parse(XMLUtils.java:251)

                at org.eclipse.smila.processing.pipelets.xmlprocessing.AXmlTransformationPipelet.createDocument(AXmlTransformationPipelet.java:47)

                at org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet.process(XPathExtractorPipelet.java:118)

                at org.eclipse.smila.processing.bpel.PipeletManager.doInvoke(PipeletManager.java:186)

                at org.eclipse.smila.processing.bpel.ExtensionManager.invokeAdapter(ExtensionManager.java:222)

                at org.eclipse.smila.processing.bpel.ExtensionManager.invokeActivity(ExtensionManager.java:167)

                at org.eclipse.smila.processing.bpel.SMILAExtensionBundle$InvokePipeletActivity.run(SMILAExtensionBundle.java:79)

                at org.eclipse.smila.processing.bpel.SMILAExtensionBundle$InvokePipeletActivity.run(SMILAExtensionBundle.java:91)

                at org.apache.ode.bpel.rtrep.v2.EXTENSIONACTIVITY.run(EXTENSIONACTIVITY.java:62)

                at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)

                at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

                at java.lang.reflect.Method.invoke(Unknown Source)

                at org.apache.ode.jacob.vpu.JacobVPU$JacobThreadImpl.run(JacobVPU.java:451)

                at org.apache.ode.jacob.vpu.JacobVPU.execute(JacobVPU.java:139)

                at org.apache.ode.bpel.rtrep.v2.RuntimeInstanceImpl.execute(RuntimeInstanceImpl.java:639)

                at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.execute(BpelRuntimeContextImpl.java:593)

                at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.executeCreateInstance(BpelRuntimeContextImpl.java:581)

                at org.apache.ode.bpel.engine.ODEProcess.executeCreateInstance(ODEProcess.java:373)

                at org.apache.ode.bpel.engine.ODEProcess$2.call(ODEProcess.java:295)

                at org.apache.ode.bpel.engine.ODEProcess$2.call(ODEProcess.java:294)

                at org.apache.ode.bpel.engine.ODEProcess$ProcessCallable.call(ODEProcess.java:1206)

                at org.apache.ode.bpel.engine.BpelInstanceWorker.doInstanceWork(BpelInstanceWorker.java:174)

                at org.apache.ode.bpel.engine.BpelInstanceWorker.execInCurrentThread(BpelInstanceWorker.java:108)

                at org.apache.ode.bpel.engine.ODEProcess.doInstanceWork(ODEProcess.java:487)

                at org.apache.ode.bpel.engine.ODEProcess.invokeProcess(ODEProcess.java:293)

                at org.apache.ode.bpel.engine.MyRoleMessageExchangeImpl.doInvoke(MyRoleMessageExchangeImpl.java:122)

                at org.apache.ode.bpel.engine.UnreliableMyRoleMessageExchangeImpl$1.call(UnreliableMyRoleMessageExchangeImpl.java:44)

                at org.apache.ode.bpel.engine.UnreliableMyRoleMessageExchangeImpl$1.call(UnreliableMyRoleMessageExchangeImpl.java:43)

                at org.apache.ode.bpel.engine.ODEProcess$ProcessCallable.call(ODEProcess.java:1206)

                at org.apache.ode.bpel.engine.Contexts.execTransaction(Contexts.java:106)

                at org.apache.ode.bpel.engine.BpelServerImpl$TransactedCallable.call(BpelServerImpl.java:968)

                at org.apache.ode.bpel.engine.BpelServerImpl$ServerCallable.call(BpelServerImpl.java:948)

                at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)

                at java.util.concurrent.FutureTask.run(Unknown Source)

                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

                at java.lang.Thread.run(Unknown Source)

Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.

                               PublicId:              null

                               SystemId:           null

                               LineNumber:     2

                               ColumnNumber:             2

                at org.eclipse.smila.processing.pipelets.xmlprocessing.util.DOMErrorHandler.fatalError(DOMErrorHandler.java:48)

                at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)

                at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)

                at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)

                at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)

                at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)

                at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)

                at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

                at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

                at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

                at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

                at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)

                at org.eclipse.smila.processing.pipelets.xmlprocessing.util.XMLUtils.parse(XMLUtils.java:355)

                ... 37 more

 2009-03-11 10:23:29,750 INFO  [ODEServerImpl-5                              ]  bpel.ProcessingServiceManager                 - AddPipeline/extensionActivity-activity-line-109: invoking service LuceneIndexService, processing request -> request

 

 

My configuration is as follows (the new part is the extract_USID_From_Content activity):

 

<?xml version="1.0" encoding="utf-8" ?>

<!--

  * Copyright (c) 2008 empolis GmbH and brox IT Solutions GmbH.

                * All rights reserved. This program and the accompanying materials

                * are made available under the terms of the Eclipse Public License v1.0

                * which accompanies this distribution, and is available at

                * http://www.eclipse.org/legal/epl-v10.html

                *

                * Contributors:

                *    Daniel Stucky (empolis GmbH) - initial design

-->

<process name="AddPipeline" targetNamespace="http://www.eclipse.org/smila/processor"

                xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable" xmlns:xsd="http://www.w3.org/2001/XMLSchema"

                xmlns:proc="http://www.eclipse.org/smila/processor" xmlns:rec="http://www.eclipse.org/smila/record">

 

                <import location="processor.wsdl" namespace="http://www.eclipse.org/smila/processor"

                               importType="http://schemas.xmlsoap.org/wsdl/" />

 

                <partnerLinks>

                               <partnerLink name="Pipeline" partnerLinkType="proc:ProcessorPartnerLinkType" myRole="service" />

                </partnerLinks>

 

                <extensions>

                               <extension namespace="http://www.eclipse.org/smila/processor" mustUnderstand="no" />

                </extensions>

 

                <variables>

                               <variable name="request" messageType="proc:ProcessorMessage" />

                </variables>

 

                <sequence>

                               <receive name="start" partnerLink="Pipeline" portType="proc:ProcessorPortType" operation="process" variable="request"

                                               createInstance="yes" />

 

                               <extensionActivity name="invokeSimpleMimeTypeIdentification">

                                               <proc:invokeService>

                                                               <proc:service name="SimpleMimeTypeIdentifier" />

                                                               <proc:variables input="request" output="request" />

                                               </proc:invokeService>

                               </extensionActivity>

 

                               <!-- only process text based content, skip everything else -->

                               <if name="conditionIsText">

      <condition>starts-with($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V, "text/")</condition>

                                               <sequence name="processTextBasedContent">                            

 

                                                               <!-- extract txt from html files -->

                                                               <if name="conditionIsHtml">

          <condition>($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V = "text/html")

            or ($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V = "text/xml")</condition>

                                                                              <extensionActivity name="invokeHtml2Txt">

                                                                                              <proc:invokePipelet>

                                                                                                              <proc:pipelet class="org.eclipse.smila.processing.pipelets.HtmlToTextPipelet" />

                                                                                                              <proc:variables input="request" output="request" />

                                                                                                              <proc:PipeletConfiguration>

                                                                                                                             <proc:Property name="inputType">

                                                                                                                                             <proc:Value>ATTACHMENT</proc:Value>

                                                                                                                             </proc:Property>                                                                

                                                                                                  <proc:Property name="outputType">

                                                                                                      <proc:Value>ATTACHMENT</proc:Value>

                                                                                                  </proc:Property>

                                                                                                  <proc:Property name="inputName">

                                                                                                      <proc:Value>Content</proc:Value>

                                                                                                  </proc:Property>

                                                                                                  <proc:Property name="outputName">

                                                                                                      <proc:Value>Content</proc:Value>

                                                                                                  </proc:Property>

                                                                                                  <proc:Property name="meta:title">

                                                                                                      <proc:Value>Title</proc:Value>

                                                                                                  </proc:Property>                                                                     

                                                                                                              </proc:PipeletConfiguration>                                                                                                                                

                                                                                              </proc:invokePipelet>

                                                                              </extensionActivity>

                                                               </if>                                                    

 

 

                                                                              <extensionActivity name="extract_USID_From_Content">

                                                                                              <proc:invokePipelet>

                                                                                                              <proc:pipelet class="org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet" />

                                                                                                              <proc:variables input="request" output="request" />

                                                                                                                             <proc:PipeletConfiguration>

                                                                                                                                             <proc:Property name="xpath" type="java.lang.String">

                                                                                                                                                             <proc:Value>service/serviceKey</proc:Value>

                                                                                                                                             </proc:Property>                          

                                                                                                                                             <proc:Property name="seperator" type="java.lang.String">

                                                                                                                                                             <proc:Value></proc:Value>

                                                                                                                                             </proc:Property>

                                                                                                                                             <proc:Property name="namespace" type="java.lang.String">

                                                                                                                                                             <proc:Value></proc:Value>

                                                                                                                                             </proc:Property>                                                                                                                                                       

                                                                                                                                             <proc:Property name="inputType" type="java.lang.String">

                                                                                                                                                             <proc:Value>ATTACHMENT</proc:Value>

                                                                                                                                             </proc:Property>          

                                                                                                                                             <proc:Property name="outputType" type="java.lang.String">

                                                                                                                                                             <proc:Value>ATTACHMENT</proc:Value>

                                                                                                                                             </proc:Property>          

                                                                                                                                             <proc:Property name="inputName" type="java.lang.String">

                                                                                                                                                             <proc:Value>Content</proc:Value>

                                                                                                                                             </proc:Property>          

                                                                                                                                             <proc:Property name="outputName" type="java.lang.String">

                                                                                                                                                             <proc:Value>serviceKey</proc:Value>

                                                                                                                                             </proc:Property>          

                                                                                                                             </proc:PipeletConfiguration>

                                                                                              </proc:invokePipelet>

                                                                               </extensionActivity>

 

 

                              

                                                               <extensionActivity name="invokeLuceneService">

                                                                              <proc:invokeService>

                                                                                              <proc:service name="LuceneIndexService" />

                                                                                              <proc:variables input="request" output="request" />

                                                                                              <proc:setAnnotations>

                                                                                                              <rec:An n="org.eclipse.smila.lucene.LuceneIndexService">

                                                                                                                             <rec:V n="indexName">test_index</rec:V>

                                                                                                                             <rec:V n="executionMode">ADD</rec:V>

                                                                                                              </rec:An>

                                                                                              </proc:setAnnotations>

                                                                              </proc:invokeService>

                                                               </extensionActivity>

                                                              

                                               </sequence>                                                  

                               </if>                    

 

                               <reply name="end" partnerLink="Pipeline" portType="proc:ProcessorPortType" operation="process" variable="request" />

                               <exit />

                </sequence>

</process>
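
The SAXParseException in the trace above ("Content is not allowed in prolog") is what an XML parser reports when it finds character data before the root element. In the configuration above, the conditionIsHtml branch also matches "text/xml", and HtmlToTextPipelet both reads and writes the Content attachment, so one possibility, which this thread does not confirm, is that the XPathExtractorPipelet receives plain text instead of XML. The sketch below only demonstrates the parser behaviour with the JDK's default DOM parser; PrologCheck and tryParse are hypothetical names, and the sample strings are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of SMILA.
final class PrologCheck {

  /** Returns null if the text parses as XML, otherwise the parser's fatal error. */
  static SAXParseException tryParse(String text) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // DefaultHandler rethrows fatal errors instead of printing them to stderr.
    builder.setErrorHandler(new DefaultHandler());
    try {
      builder.parse(new InputSource(new StringReader(text)));
      return null;
    } catch (SAXParseException e) {
      return e;
    }
  }

  public static void main(String[] args) throws Exception {
    // Well-formed XML: no error is reported.
    System.out.println(tryParse("<service><serviceKey>42</serviceKey></service>"));
    // Plain text, as a text-extraction step might produce: a fatal error is reported.
    SAXParseException error = tryParse("\nserviceKey 42");
    System.out.println(error.getMessage()
        + " (line " + error.getLineNumber() + ", column " + error.getColumnNumber() + ")");
  }
}
```

Running a check like this on the actual Content attachment just before the XPathExtractorPipelet step would show whether the input is still XML at that point.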

 

 

Best

Andreas Schultz

 

Software Development

--------------------------------------------------------

empolis GmbH

An der Autobahn

Postfach 180

33311 Gütersloh

Germany

http://www.empolis.de/

mailto:andreas.schultz@xxxxxxxxxxx

Tel. +49 (0) 52 41 - 80-3462

Fax. +49 (0) 52 41 - 80-41820

Registered office: Gütersloh | Local court (Amtsgericht) Gütersloh HRB 3971

Managing Director: Dr. Stefan Wess

 
