WordML2HTML with support for images stylesheet updated


Almost two years ago I published a post, "Transforming WordML to HTML: Support for Images", showing how to hack the Microsoft WordML2HTML stylesheet to support images. People kept telling me it doesn't support some weird image formats or header images. Moreover, I realized it had a bug and didn't work with .NET 2.0. So I finally updated that damn stylesheet. This time I took another Microsoft WordML2HTML stylesheet as a base: the one that comes with the Word 2003 XML Viewer tool. I think it's a better one. Anyway, I added a couple of templates to it, so images now get decoded and saved externally, and headers and footers are processed too (only the odd-page header/footer of each section, to be precise).

Note: this stylesheet uses an embedded C# script to decode images, so it only works with .NET XSLT processors such as XslTransform (.NET 1.1) or XslCompiledTransform (.NET 2.0). You can also run it with the nxslt/nxslt2 command-line tool. Here is a small demo.

Starting Word 2003 document with images in body and header:

Magic XSLT transformation:

nxslt2 test.xml wordml2html-.NET-script.xslt -o test.html
produces test.html and a directory containing decoded images:

Download the stylesheet at the XML Lab downloads page. Any comments are welcome.
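For those who would rather call the stylesheet from their own code than from nxslt2, here is a minimal sketch using XslCompiledTransform. It assumes the .NET Framework (as far as I know, msxsl:script blocks are not supported on .NET Core and later) and it assumes the stylesheet declares a global docName parameter; if it doesn't, the value is simply ignored. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class WordMlToHtml
{
    // Runs a stylesheet with msxsl:script support switched on.
    // Assumption: docName is a global <xsl:param> in the stylesheet;
    // parameters the stylesheet does not declare are ignored.
    public static void Run(string wordMlPath, string xsltPath, string htmlPath, string docName)
    {
        // XsltSettings(enableDocumentFunction, enableScript):
        // scripts are disabled by default and must be enabled explicitly.
        var settings = new XsltSettings(false, true);

        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath, settings, new XmlUrlResolver());

        var args = new XsltArgumentList();
        args.AddParam("docName", "", docName);

        // Reuse the stylesheet's own <xsl:output> settings for the writer.
        using (XmlWriter writer = XmlWriter.Create(htmlPath, xslt.OutputSettings))
        {
            xslt.Transform(wordMlPath, args, writer);
        }
    }

    public static void Main(string[] argv)
    {
        if (argv.Length < 3)
        {
            Console.Error.WriteLine("usage: WordMlToHtml <input.xml> <stylesheet.xslt> <output.html>");
            return;
        }
        Run(argv[0], argv[1], argv[2], Path.GetFileNameWithoutExtension(argv[2]));
    }
}
```

Since the embedded script hands a relative directory name to DirectoryInfo, the images folder should end up under the current working directory of the process, so run this from the folder where you want the output.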



2 TrackBacks

TrackBack URL: http://www.tkachenko.com/cgi-bin/mt-tb.cgi/558

Signs on the Sand: WordML2HTML with support for images stylesheet updated from XSLT:Blog[@author = 'M. David Peterson']/Code-of-the-Day on January 10, 2006 1:49 AM


23 Comments

Yes, I tried it and it works! You know what? I even inserted an AutoCAD drawing as an illustration in the text: I downloaded an AutoCAD drawing viewer, copied part of the drawing, and pasted it into the document as a WMF picture.

Hi Oleg,

The XSLT you mentioned is not available at the link you have given. Can you please let me know how I can access it? It would be a great help if you could respond.

Thanks and Regards
Gowri

Hi Oleg,

I have been working on generating WordML reports for a year. I have written C# code that generates a WordML document as a report, and it is generated perfectly. I am also able to convert it to HTML using the following code.

Here I have used the Microsoft-provided XSLT file for the conversion. I also use the XslCompiledTransform class (provided by .NET), which transforms the WordML file into an HTML file.

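// The stylesheet sits next to the executable; m_XMLfile and m_HTMLfile hold the input and output paths.
XsltFilePath = Application.StartupPath + @"\XSLT.xslt";
XmlFilePath = m_XMLfile;
HTMLFilePath = m_HTMLfile;
XslCompiledTransform xslt = new XslCompiledTransform();
// XsltSettings(enableDocumentFunction, enableScript): document() off, msxsl:script on.
xslt.Load(XsltFilePath, new XsltSettings(false, true), null);
// Writes the result into the current working directory.
xslt.Transform(XmlFilePath, @".\" + HTMLFilePath);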

The HTML file renders correctly in IE7, but it loses its formatting when I open the same file in IE8.

Could you please suggest a solution?

With Regards
Biju
biju.hsn@gmail.com

Hi,

Can you provide an alternate link for the XSLT? I am unable to download it.

Thanks in advance,
Anil

<msxsl:script language="c#" implements-prefix="ext">
  public string decodePicture(XPathNodeIterator bindata, string dirname, string filename) {
    if (bindata.MoveNext()) {
      System.IO.DirectoryInfo di = new System.IO.DirectoryInfo(dirname);
      if (!di.Exists)
        di.Create();
      using (System.IO.FileStream fs =
          System.IO.File.Create(System.IO.Path.Combine(di.FullName, filename))) {
        byte[] data = Convert.FromBase64String(bindata.Current.Value);
        fs.Write(data, 0, data.Length);
      }
      return dirname + "/" + filename;
    }
    else
      return "";
  }
</msxsl:script>
<xsl:template match="w:pict">
  <xsl:variable name="dir">
    <xsl:choose>
      <xsl:when test="$docName != ''">
        <xsl:value-of select="$docName"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- We need something unique instead of document name -->
        <!-- Let's take first 10 characters of title -->
        <xsl:value-of select="translate(substring($p.docInfo/o:Title, 1, 10), ' ', '')"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>_files</xsl:text>
  </xsl:variable>
  <img
    src="{ext:decodePicture(w:binData, $dir, substring-after(w:binData/@w:name, 'wordml://'))}"
    alt="{v:shape/v:imagedata/@o:title}" style="{v:shape/@style}"
    title="{v:shape/v:imagedata/@o:title}"/>
</xsl:template>

When I try to use the transform with the linked xml file, I get this exception:

"Attribute and namespace nodes cannot be added to the parent element after a text, comment, pi, or sub-element node has already been added."

Anyone have any idea as to why I would get this exception? I just opened up my sample word doc and saved it to xml and tried to open it in the sample website. The xml file is here:

http://www.sendspace.com/file/045o89

And the full exception is here:

http://pastebin.com/m7fac7065
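For context, XslCompiledTransform typically raises this message when a template tries to add an attribute to an element after it has already written text or child nodes into that element. The sketch below reproduces that situation with a made-up stylesheet; it is not taken from WordML2HTML, and it does not pinpoint which template fails for the document above. All names in it are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class AttributeAfterContentDemo
{
    // Hypothetical stylesheet, not taken from WordML2HTML: it writes text into <p>
    // and only afterwards tries to add an attribute to the same element.
    const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='/'>
    <p>
      <xsl:text>some text</xsl:text>
      <xsl:attribute name='class'>late</xsl:attribute>
    </p>
  </xsl:template>
</xsl:stylesheet>";

    // Returns the error message, or null if the processor accepted the stylesheet.
    public static string Run()
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader xsl = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(xsl);
        try
        {
            using (XmlReader input = XmlReader.Create(new StringReader("<doc/>")))
            using (var output = new StringWriter())
                xslt.Transform(input, null, output);
            return null;
        }
        catch (XsltException ex)
        {
            return ex.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run() ?? "no error raised");
    }
}
```

If this is what happens here, the usual fix is to move the xsl:attribute instructions so they run before any text or child elements are written to the same output element.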

It should be able to support WMF files...

Biju, can you send me sample WordML file?

Hi,
If my WordML document has blank lines, then after converting it with the following command
"nxslt2 test.xml wordml2html-.NET-script.xslt -o test.html"
those lines are suppressed, so the formatting is off. You can see this even in your example above.

The line between "Header text and image" and the image has been suppressed after the conversion. Any help or insight on this would be much appreciated.

Thanks for your great job!
I have a question.
Does the new script (WordML2HTML XSLT stylesheet, v1.3-.NET-script) you updated in 2006 support .wmf images?
I found that when I insert a Microsoft Math formula object in Word 2003, save the document as .xml, and successfully transform it to .html, a problem appears when I open it in IE: the formula isn't shown, only an error cross, while other images, including header images, display correctly.

I am wondering the same thing about text boxes. Could you please post a reply?

Sorry, busy. I'll look at this problem this weekend.

Hi, in my last post I gave you the link to download the sample code. Did you get a chance to look at it?
Is there any possibility of help from your side on this?
Please reply.
P.S. If you need any information from my side, please let me know.

Alok, can you provide a minimal sample?


Sorry, what are Word Templates?

The XSLT is awesome, thank you!! Do you plan to add support for Word Templates into it?

Christian, have you tried latest stylesheet?

The conversion to HTML is generally ok, but it doesn't save the pictures.

I have tried to debug nxslt2, but with no luck. I cannot run it in debug mode at all.

Vassilij, probably it doesn't.

Hi
Does this stylesheet support text boxes? If not, how can I retrieve information that is in a text box?

Hi
Could that xslt run in a java application?
Thanks
