Importing Word content [message #589543] Wed, 18 June 2008 07:33
Kristian Mandrup
Hi again,

I have been experimenting with ways to automate or improve the import of
content from Word into EPF. One issue is how to clean up the HTML that
Word produces. I have built a small converter that runs HTML Tidy in
batch mode with a configuration file:

tidy -config configWordClean.txt -f errors.txt -m [filename].htm

---
// sample config file for HTML tidy
indent: auto
indent-spaces: 2
wrap: 72
word-2000: yes
clean: yes
markup: yes
output-xml: yes
input-xml: no
doctype: omit
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: latin1
---
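
To run the command above over a whole folder of exported Word pages, a small wrapper can help. The following is only a sketch: the function names and the per-document error log naming are assumptions made for illustration, and it expects `tidy` to be on the PATH.

```python
import subprocess
from pathlib import Path


def tidy_command(html_file, config="configWordClean.txt", errors="errors.txt"):
    """Build the argument list for the tidy call shown above."""
    # -config: read options from a file, -f: write diagnostics to a file,
    # -m: modify the input file in place.
    return ["tidy", "-config", str(config), "-f", str(errors), "-m", str(html_file)]


def clean_folder(folder, config="configWordClean.txt"):
    """Run tidy on every .htm file in `folder`; return the files tidy rejected."""
    failed = []
    for html_file in sorted(Path(folder).glob("*.htm")):
        # One error log per document (assumed naming), so a later file
        # does not overwrite the diagnostics of an earlier one.
        errors = html_file.with_name(html_file.stem + ".errors.txt")
        result = subprocess.run(tidy_command(html_file, config, errors))
        # tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
        if result.returncode >= 2:
            failed.append(html_file)
    return failed
```

Calling `clean_folder("export")` would then tidy every `.htm` file in a (hypothetical) `export` folder in place and return the ones that still need manual attention.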

The result is only a starting point. In the second step, I use a
custom-made stylesheet, WordTidy.xslt, to filter the remainder so that it
suits EPF more specifically.

---
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="fn xsl xs">
  <xsl:output method="xhtml" encoding="ISO-8859-1" indent="yes"/>
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="html">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="head">
  </xsl:template>
  <xsl:template match="body">
    <body>
      <xsl:apply-templates/>
    </body>
  </xsl:template>
  <xsl:template match="div">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="table">
    <table width="{@width}" border="{@border}" cellspacing="{@cellspacing}"
        cellpadding="{@cellpadding}">
      <xsl:apply-templates/>
    </table>
  </xsl:template>
  <xsl:template match="tbody">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="*[text() = '&#160;' ]"/>
  <xsl:template match="span[@class='c1']"/>
  <xsl:template match="tr">
    <tr>
      <xsl:apply-templates/>
    </tr>
  </xsl:template>
  <xsl:template match="td | th">
    <td width="{@width}" valign="{@valign}">
      <xsl:apply-templates/>
    </td>
  </xsl:template>
  <xsl:template match="h1 | h2 | h3">
    <h3>
      <xsl:apply-templates/>
    </h3>
  </xsl:template>
  <xsl:template match="h4">
    <h4>
      <xsl:apply-templates/>
    </h4>
  </xsl:template>
  <xsl:template match="h5">
    <h5>
      <xsl:apply-templates/>
    </h5>
  </xsl:template>
  <xsl:template match="span">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <xsl:template match="img">
    <img width="{@width}" height="{@height}">
      <xsl:attribute name="src" select=" concat('resources/',
          substring-after(@src, '/')) "/>
      <xsl:apply-templates/>
    </img>
  </xsl:template>
  <xsl:template match="br"/>
  <xsl:template match="ul">
    <ul>
      <xsl:apply-templates/>
    </ul>
  </xsl:template>
  <xsl:template match="li">
    <li>
      <xsl:apply-templates/>
    </li>
  </xsl:template>
</xsl:stylesheet>
---
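
The stylesheet declares version 2.0, so it needs an XSLT 2.0 processor. The post does not say which one was used. As a sketch under that caveat, the second step could be scripted with Saxon's command line; the jar name `saxon-he.jar` and the function names are assumptions, not something taken from the post.

```python
import subprocess


def saxon_command(source, result, stylesheet="WordTidy.xslt", jar="saxon-he.jar"):
    """Build a Saxon command line: -s: source, -xsl: stylesheet, -o: output."""
    # The jar file name is a placeholder; use the name of the Saxon jar
    # that is actually installed.
    return ["java", "-jar", str(jar),
            f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{result}"]


def transform(source, result, stylesheet="WordTidy.xslt", jar="saxon-he.jar"):
    """Apply the stylesheet to one tidied file; raise if the processor fails."""
    subprocess.run(saxon_command(source, result, stylesheet, jar), check=True)
```

`transform("page.htm", "page_epf.html")` would write the filtered markup to a separate file, leaving the tidied input untouched (both file names are hypothetical).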

Notice how I enforce a rule whereby all images are located relative to the
HTML in a resources/ folder (within the EPF project structure).
A further step is then to copy the images into that folder so that the
references are valid!
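
That copying step could be scripted along these lines. This is a minimal sketch: the function names are made up for illustration, and it assumes the image paths in the Word HTML are plain relative paths with forward slashes (a hypothetical example being `doc_files/image001.png`).

```python
import shutil
from pathlib import Path


def rewritten_src(src):
    """Mirror the stylesheet's concat('resources/', substring-after(@src, '/'))."""
    # substring-after keeps everything after the FIRST '/',
    # or the empty string if there is no '/' at all.
    _, _, tail = src.partition("/")
    return "resources/" + tail


def copy_images(word_html_dir, image_srcs, epf_content_dir):
    """Copy each image to the place its rewritten src attribute points at."""
    copied = []
    for src in image_srcs:
        source = Path(word_html_dir) / src
        target = Path(epf_content_dir) / rewritten_src(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

Note that a `src` without any `/` is rewritten to just `resources/`, exactly as in the stylesheet, so images stored next to the HTML file would need separate handling.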
---

The above technique really helps a lot, but I would like to automate it
even more. Why should I have to manually insert the HTML into the RTE each
time? Why not use the EPF API to do this more directly?

Has any work been done towards this goal already?

Kristian