Eclipse Community Forums
Importing Word content [message #51108] Wed, 18 June 2008 11:33
Kristian Mandrup
Hi again,

I have been experimenting with ways to automate or improve importing
content from Word into EPF. One issue is how to clean up the HTML that Word
produces. I have successfully built a small converter, using HTML Tidy
(batch) with a configuration file:

tidy -config configWordClean.txt -f errors.txt -m [filename].htm

// sample config file for HTML tidy
indent: auto
indent-spaces: 2
wrap: 72
word-2000: yes
clean: yes
markup: yes
output-xml: yes
input-xml: no
doctype: omit
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: latin1
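
To clean a whole folder of Word exports in one go, the tidy call above could be wrapped in a small script. This is only a minimal Python sketch, not the converter itself: it assumes the tidy executable is on the PATH, and the function names and the per-file error log naming are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_tidy_command(html_file, config="configWordClean.txt", errors="errors.txt"):
    """Return the argument list for the tidy call shown above."""
    # -config: read options from the configuration file
    # -f:      write errors and warnings to a file
    # -m:      modify the input file in place
    return ["tidy", "-config", str(config), "-f", str(errors), "-m", str(html_file)]


def tidy_folder(folder):
    """Run tidy on every .htm file in folder (hypothetical helper)."""
    for html_file in sorted(Path(folder).glob("*.htm")):
        # One error log per input file, so later runs do not overwrite earlier logs.
        error_log = html_file.with_name(html_file.stem + "-errors.txt")
        result = subprocess.run(build_tidy_command(html_file, errors=error_log))
        # tidy exits with 1 for warnings only and 2 for errors,
        # so only a status of 2 or higher is reported here.
        if result.returncode >= 2:
            print(f"tidy reported errors for {html_file}, see {error_log}")
```

Calling tidy_folder("word_export") would then clean every .htm file in a folder named word_export (an assumed name) in place.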

The result is only a starting point. In the second step, I use a
custom-made WordTidy.xslt to filter what remains so that it suits EPF more
specifically.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
exclude-result-prefixes="fn xsl xs">
<xsl:output method="xhtml" encoding="ISO-8859-1" indent="yes"/>
<xsl:template match="/">
<xsl:template match="html">
<xsl:template match="head">
<xsl:template match="body">
<xsl:template match="div">
<xsl:template match="table">
<table width="{@width}" border="{@border}" cellspacing="{@cellspacing}"
<xsl:template match="tbody">
<xsl:template match="*[text() = '&#160;' ]"/>
<xsl:template match="span[@class='c1']"/>
<xsl:template match="tr">
<xsl:template match="td | th">
<td width="{@width}" valign="{@valign}">
<xsl:template match="h1 | h2 | h3">
<xsl:template match="h4">
<xsl:template match="h5">
<xsl:template match="span">
<xsl:template match="p">
<xsl:template match="img">
<img width="{@width}" height="{@height}">
<xsl:attribute name="src" select=" concat('resources/',
substring-after(@src, '/')) "/>
<xsl:template match="br"/>
<xsl:template match="ul">
<xsl:template match="li">

Notice how I enforce a rule whereby all images are located relative to the
HTML in a resources/ folder (within the EPF project structure).
A further step is then to copy these images into that folder so that the
references are valid!
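
The image rule and the copy step could be scripted along these lines. This is a rough Python sketch under stated assumptions: each src is a path relative to the folder of the Word HTML file, and the function names and the html_dir/target_dir parameters are hypothetical. The first function mirrors the concat/substring-after expression from the stylesheet, and the second copies the file to wherever the rewritten src points.

```python
import shutil
from pathlib import Path


def to_resources_src(src):
    """Mirror concat('resources/', substring-after(@src, '/'))."""
    # Like XPath substring-after, partition yields '' when src contains no '/'.
    _, _, after = src.partition("/")
    return "resources/" + after


def copy_image(src, html_dir, target_dir):
    """Copy the image named by the original src to the rewritten location.

    html_dir is the folder of the Word HTML file and target_dir the folder
    that will hold the cleaned HTML (both are assumptions of this sketch).
    """
    source = Path(html_dir) / src
    destination = Path(target_dir) / to_resources_src(src)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    return destination
```

Note that a src without any slash maps to just 'resources/', exactly as the XPath expression does, so images stored next to the HTML file would need separate handling.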

The above technique really helps a lot, but I would like to automate it
even more. Why should I have to manually insert the HTML into the RTE each
time? Why not use the EPF API to do this more directly?

Has any work been done towards this goal already?
