WordHtmForEPF - Preparing Word saved .htm pages for instant EPF import [message #53550] Thu, 11 September 2008 07:36
Kristian Mandrup

I was annoyed that I couldn't paste a .htm document saved from Word into an
EPF rich text editor without getting rather unusable results, so I wrote a
small Groovy script that greatly improves the process.

Usage guide:
Save the Word doc as a filtered .htm page. Word will also create a folder with
the external resources the page uses, such as images, numbered image001,
image002 and so on. These generic names really suck for importing into EPF.

Run WordHtmForEPF.groovy in this directory, optionally with a file pattern
argument selecting which .htm files to process, e.g.:
WordHtmForEPF.groovy Guide*.htm
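
One thing to watch: the script hands this argument to eachFileMatch as a regular expression, not as a shell glob, so a pattern like Guide*.htm may select fewer files than expected. The sketch below illustrates the difference. The file names and the `regexForm` and lookahead patterns are made up for illustration and are not part of the script.

```groovy
// Hypothetical file names, only for illustration
def names = ['Guide.htm', 'Guide-Intro.htm', 'Example.htm', 'EPF_Guide.htm']

// The pattern exactly as typed in the usage example, read as a regex:
// "Guid", any number of "e", any single character, then "htm"
def globLike = 'Guide*.htm'
// An assumed regex spelling of the glob "Guide*.htm"
def regexForm = 'Guide.*\\.htm'

// ==~ requires the whole name to match (eachFileMatch also tests the
// whole file name, as far as I can tell)
def byGlobLike = names.findAll { it ==~ globLike }
def byRegex = names.findAll { it ==~ regexForm }

// The script's default pattern starts with a character class, so it skips
// every name beginning with E, P, F or _; a negative lookahead would skip
// only names that begin with the EPF_ prefix
def byDefault = names.findAll { it ==~ '[^EPF_].*.htm' }
def byLookahead = names.findAll { it ==~ '(?!EPF_).*\\.htm' }

println byGlobLike
println byRegex
println byDefault
println byLookahead
```

If the shell expands wildcards before Groovy sees them, quoting the argument keeps the pattern intact.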

The script creates a new .htm file with an EPF_ prefix for each .htm file
matching the file pattern. In these new .htm files, the image src attributes
are redirected to images in a /resources folder.
The images from the Word-saved resource folders are renamed more
usefully (with a document name prefix) and copied to the new /resources folder.
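
As a rough sketch of that redirection (not the script itself), the snippet below rewrites one image reference the same way: the folder name and the file name are joined with a dash and pointed at resources/. The sample HTML line and the tighter regex are assumptions for illustration.

```groovy
// Hypothetical line of the kind Word writes into a filtered .htm page
def html = '<p><img width=100 src="My%20Guide-filer/image001.png"></p>'

// Capture the folder and the file name; neither may contain a quote or a
// slash, so the match cannot run past the end of the attribute
def rewritten = html.replaceAll(/src="([^"\/]+)\/([^"\/]+)"/) { full, folder, file ->
    // Same idea as the script's encode(): spaces (or %20) become dashes
    def prefix = folder.replaceAll(/(%20|\s)/, '-')
    return 'src="resources/' + prefix + '-' + file + '"'
}

println rewritten
```

The image file itself then has to exist under resources/ with the same prefixed name, which is what the renaming step takes care of.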

Now if you open an EPF_xxx.htm file in a browser, you can copy and paste it into
an EPF RTE and the result will be just beautiful. All images are
transferred and stored in the EPF /resources folder for that type of method
element, and each image is named to indicate the method element it is part of :)

It even handles special characters (only Danish characters as of now...)
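
Since only Danish characters are covered so far, one way to make other languages easier to add would be to keep the replacements in a table. This is a minimal sketch, assuming a hypothetical `rules` list and an `encodeName` closure; the pairs are the ones the script uses.

```groovy
// Each rule is [regex, replacement]; add rows here for other languages
def rules = [
    ['(%20|\\s)',    '-'],
    ['(æ|Æ|%C3%A6)', 'ae'],
    ['(ø|Ø)',        'oe'],
    ['(å|Å)',        'aa'],
    ['(é|É)',        'e'],
]

// Apply every rule in order to the given name
def encodeName = { String name ->
    rules.inject(name) { acc, rule -> acc.replaceAll(rule[0], rule[1]) }
}

def encoded = encodeName('Blåbær grød')
println encoded
```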


/**
 * @author
 */
//import commons
public class WordHtmForEPF {

private static encode(first) {
    def prefix = first.replaceAll(/(%20|\s)/) {fullM, space ->
        return '-'
    }
    prefix = prefix.replaceAll(/(æ|Æ|%C3%A6)/) {fullM, space ->
        return 'ae'
    }
    prefix = prefix.replaceAll(/(ø|Ø)/) {fullM, space ->
        return 'oe'
    }
    prefix = prefix.replaceAll(/(å|Å)/) {fullM, space ->
        return 'aa'
    }
    prefix = prefix.replaceAll(/(é|É)/) {fullM, space ->
        return 'e'
    }
    return prefix
}

/**
 * @param args
 */
public static void main(def args){
    println "WordHtmEPF v.1.0 - by Kristian Mandrup consulting"
    def filePattern = "[^EPF_].*.htm"
    if (args.length > 0) {
        filePattern = args[0]
    }
    def removePrefix = false
    if (args.length > 1) {
        if (args[1] == 'remove')
            removePrefix = true
    }

    def dir = new File(".")
    dir.eachFileMatch(~"${filePattern}") {File f ->
        println "Generating EPF image references for: ${}"
        def str = f.getText()
        def replaced = str.replaceAll(/src="(.*)\/(.*)">/) {fullMatch, first, second ->
            return 'src="resources/' + encode(first) + '-' + second + '">'
        }
        // println replaced
        def f2 = new File('EPF_' +
        f2 << replaced
        String dirName =[0..-5] + '-filer'
        File fdir = new File(dirName)
        renameResources(fdir, removePrefix)
        File resDir = new File('resources')
        if (!resDir.exists()) {
            // println "Renaming ${} to resources "
            fdir.renameTo(new File('resources'))
        }
    }
}

private static void renameResources(File directory, removePref) {
def renameClos = { dir, filePattern, prefix, removePrefix ->
println "Prefix ${prefix}"
println "filePattern ${filePattern}"
dir.eachFileMatch(~"${filePattern}") {File f ->
// is prefix already present in start of name!?
String newFileName = "${prefix}-${}"
def replaceName = encode(newFileName)
if (! {

if (removePrefix) {
int index = prefix.length()+1
replaceName =[index..-1]
} else {
// don't add prefix if present already!
replaceName =

// copy all files to resources dir
String newDirectoryName = 'resources/'
def d1 = new File(newDirectoryName)
File newResourceFile = new File(newDirectoryName + '/' + replaceName)
// copy file
println "Copy ${} to resources/${} "
new AntBuilder().copy(file: "${f.canonicalPath}",
def filePattern = ".*.(jpg|JPG|gif|GIF|png|PNG|bmp|BMP)"
String prefix = ""
def path = directory.canonicalPath
def index = path.lastIndexOf(File.separator)+1
prefix = path[index..-1]
println "Directory name used as default prefix: ${prefix}"
println "Renaming all pictures (jpg|JPG|gif|GIF|png|PNG|bmp|BMP)"
renameClos( directory, filePattern, prefix, removePref )
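
The copy step relies on AntBuilder. If Ant is not at hand, a plain java.nio copy can probably do the same job. This is only a sketch under assumptions: `copyWithPrefix` is a hypothetical helper, the throwaway directories stand in for a Word resource folder, and the prefix handling is simplified compared with the script.

```groovy
import java.nio.file.Files
import java.nio.file.StandardCopyOption

// Copy every image in sourceDir to targetDir as "<prefix>-<original name>"
def copyWithPrefix = { File sourceDir, File targetDir, String prefix ->
    targetDir.mkdirs()
    def names = []
    sourceDir.eachFileMatch(~/(?i).*\.(jpg|gif|png|bmp)/) { File f ->
        File target = new File(targetDir, prefix + '-' + f.name)
        Files.copy(f.toPath(), target.toPath(), StandardCopyOption.REPLACE_EXISTING)
        names << target.name
    }
    return names.sort()
}

// Throwaway directories standing in for a Word resource folder and resources/
def src = Files.createTempDirectory('Guide-filer').toFile()
def dst = Files.createTempDirectory('resources').toFile()
new File(src, 'image001.png').text = 'fake image data'
new File(src, 'notes.txt').text = 'not an image, should be skipped'

def copied = copyWithPrefix(src, dst, 'Guide-filer')
println copied
```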
