The world is getting better. And Word too! Word 2003 Beta 2 now understands not only *.doc files but XML as well. It's all as it should be in the open XML world (which makes some people suspicious): there is a WordML vocabulary, and its schema (a well-documented one, btw) is available as part of the Microsoft Word XML Content Development Kit Beta 2. Given that, it's natural to assume that Word documents can now be queried using XPath or XQuery, as well as transformed and generated using XSLT. Isn't it fantastic?
So here is a "Hello Word!" XSLT stylesheet, which generates a minimal yet still valid Word 2003 document:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:processing-instruction name="mso-application">progid="Word.Document"</xsl:processing-instruction>
    <w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
      <w:body>
        <w:p>
          <w:r>
            <w:t>Hello Word!</w:t>
          </w:r>
        </w:p>
      </w:body>
    </w:wordDocument>
  </xsl:template>
</xsl:stylesheet>

That <?mso-application progid="Word.Document"?> processing instruction is important: that's how Windows recognizes an XML document as a Word document. It seems they parse only the XML document prolog looking for this PI. Good idea, I think.
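To poke at such a file without Word, here is a minimal sketch in Python (standard library only). It is not how Windows itself does the detection, which is only guessed at above. It assumes the stylesheet output looks like the HELLO_WORD string below, and the helper names prolog_instructions and run_texts are made up for this illustration. It collects the processing instructions that appear before the root element and pulls the text out of every w:t element.

```python
from xml.dom import minidom

# WordML namespace used by the stylesheet above (Word 2003 Beta 2).
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/2/wordml"

# Assumption: this is what the "Hello Word!" stylesheet produces.
HELLO_WORD = (
    '<?xml version="1.0"?>'
    '<?mso-application progid="Word.Document"?>'
    '<w:wordDocument xmlns:w="' + WORDML_NS + '">'
    '<w:body><w:p><w:r><w:t>Hello Word!</w:t></w:r></w:p></w:body>'
    '</w:wordDocument>'
)


def prolog_instructions(doc):
    """Map target -> data for each PI that precedes the root element."""
    found = {}
    for node in doc.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            break  # the prolog ends where the root element starts
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            found[node.target] = node.data
    return found


def run_texts(doc):
    """Return the text content of every w:t element, in document order."""
    texts = []
    for t in doc.getElementsByTagNameNS(WORDML_NS, "t"):
        texts.append("".join(
            child.data for child in t.childNodes
            if child.nodeType == child.TEXT_NODE
        ))
    return texts


doc = minidom.parseString(HELLO_WORD)
pis = prolog_instructions(doc)
texts = run_texts(doc)
print(pis.get("mso-application"))
print(texts)
```

The lookup goes through the namespace URI rather than the w: prefix, so it keeps working if a document binds the WordML namespace to a different prefix.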
Now let's try something more interesting: transforming an XML document into a formatted Word document containing a heading, italic text and a link. Consider the following source doc:
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<chapter title="XSLT Programming">
  <para>It's <i>very</i> simple. Just ask <link url="http://google.com">Google</link>.</para>
</chapter>

Then the XSLT stylesheet (quite a big one, due to the verbose element-based WordML syntax):
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
  <xsl:template match="/">
    <xsl:processing-instruction name="mso-application">progid="Word.Document"</xsl:processing-instruction>
    <w:wordDocument>
      <xsl:apply-templates/>
    </w:wordDocument>
  </xsl:template>
  <xsl:template match="chapter">
    <o:DocumentProperties>
      <o:Title>
        <xsl:value-of select="@title"/>
      </o:Title>
    </o:DocumentProperties>
    <w:styles>
      <w:style w:type="paragraph" w:styleId="Heading3">
        <w:name w:val="heading 3"/>
        <w:pPr>
          <w:pStyle w:val="Heading3"/>
          <w:keepNext/>
          <w:spacing w:before="240" w:after="60"/>
          <w:outlineLvl w:val="2"/>
        </w:pPr>
        <w:rPr>
          <w:rFonts w:ascii="Arial" w:h-ansi="Arial"/>
          <w:b/>
          <w:sz w:val="26"/>
        </w:rPr>
      </w:style>
      <w:style w:type="character" w:styleId="Hyperlink">
        <w:rPr>
          <w:color w:val="0000FF"/>
          <w:u w:val="single"/>
        </w:rPr>
      </w:style>
    </w:styles>
    <w:body>
      <w:p>
        <w:pPr>
          <w:pStyle w:val="Heading3"/>
        </w:pPr>
        <w:r>
          <w:t>
            <xsl:value-of select="@title"/>
          </w:t>
        </w:r>
      </w:p>
      <xsl:apply-templates/>
    </w:body>
  </xsl:template>
  <xsl:template match="para">
    <w:p>
      <xsl:apply-templates/>
    </w:p>
  </xsl:template>
  <xsl:template match="i">
    <w:r>
      <w:rPr>
        <w:i/>
      </w:rPr>
      <xsl:apply-templates/>
    </w:r>
  </xsl:template>
  <xsl:template match="text()">
    <w:r>
      <w:t xml:space="preserve"><xsl:value-of select="."/></w:t>
    </w:r>
  </xsl:template>
  <xsl:template match="link">
    <w:hlink w:dest="{@url}">
      <w:r>
        <w:rPr>
          <w:rStyle w:val="Hyperlink"/>
          <w:i/>
        </w:rPr>
        <xsl:apply-templates/>
      </w:r>
    </w:hlink>
  </xsl:template>
</xsl:stylesheet>

And the resulting WordML document, opened in Word 2003:
Not bad.
Ok, I'm closing comments on this page due to severe spamming.
Interesting to see Microsoft playing catch-up. The open source Office alternative OpenOffice.org (http://www.openoffice.org) is based on XML and has been around for years.
Thanks ! Good work :)
Nelson, you need something like /contract/sections/section[@number='section1']/sectionTerm[@termid='term1']/term
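For anyone who wants to try that path outside MSXML, here is a rough sketch using Python's standard xml.etree.ElementTree. It is not the selectSingleNode API from the question: the select_single_node helper is a made-up stand-in, the sample data is retyped from the question with real angle brackets, and contract is assumed to be the root element (the original file is elided before it). ElementTree's find() works relative to the element it is called on, so the leading /contract step is dropped.

```python
import xml.etree.ElementTree as ET

# Sample data retyped from the question; assumes <contract> is the root.
CONTRACT_XML = """
<contract>
  <sections>
    <section number="section1">
      <sectionTerm termid="term1"><term>Hello</term></sectionTerm>
      <sectionTerm termid="term2"><term>Goodbye</term></sectionTerm>
    </section>
    <section number="section2">
      <sectionTerm termid="term1"><term>Hello</term></sectionTerm>
      <sectionTerm termid="term2"><term>Goodbye</term></sectionTerm>
    </section>
  </sections>
</contract>
"""

# Relative to the root element, so there is no leading /contract step.
SECTION_TERM_PATH = (
    "sections/section[@number='section1']/sectionTerm[@termid='term1']"
)


def select_single_node(element, path):
    """Hypothetical stand-in: first match for path, or None if none."""
    return element.find(path)


root = ET.fromstring(CONTRACT_XML)
section_term = select_single_node(root, SECTION_TERM_PATH)
term = select_single_node(root, SECTION_TERM_PATH + "/term")
print(section_term.get("termid"), term.text)
```

The two predicates do the filtering: the first narrows to the section with number="section1", the second to the sectionTerm with termid="term1" inside it. Drop the trailing /term step if you want the sectionTerm node itself.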
med, see "Generating images in WordprocessingML" at http://www.tkachenko.com/blog/archives/000106.html
Hello,
Does anybody know how to get a child node that has a given attribute using the selectSingleNode method?
I'm trying to get the node "sectionTerm" with attribute termid="term1" under the section that has attribute number="section1" from the following
XML file (I have to use [ in place of < because the tag names will not show if I use <):
......
[contract][sections]
[section number="section1"]
[sectionTerm termid = "term1"]
[term]Hello[/term]
[/sectionTerm]
[sectionTerm termid = "term2"]
[term]Goodbye[/term]
[/sectionTerm]
[/section]
[section number="section2"]
[sectionTerm termid = "term1"]
[term]Hello[/term]
[/sectionTerm]
[sectionTerm termid = "term2"]
[term]Goodbye[/term]
[/sectionTerm]
[/section]
[/sections]
[/contract]
I'm a newbie in WordML, how do you handle images?
This is pretty interesting. I agree with the author.
Cris, afaik Word 2003 holds images embedded within the WordML document, obviously Base64-encoded. It's the w:pict element; take a look at the WordML schema. So it also seems to be quite feasible.
Yeah, sure, I've been thinking about XSL-FO2WordML and WordML2XSL-FO, but I'm still in the research phase. While I know XSL-FO well, I'm a newbie in WordML.
But that really sounds tempting...
Using XSL-FO as a unified formatting language for documents: can any WordML be transformed to FO, and can any FO be transformed to WordML? In other words, is there (semantic or functional, whatever that means in formal terms, I am not 100% sure) equivalence between the two formatting languages?
I don't know; did you think of that already? I think a definite answer requires some time-consuming research...
How do you handle images, and make them local images so users can edit them and see them when an Internet connection is not available?
Very cool,
I will wait until more tools are available!
Thanks for the info, Oleg.
Hans Braumüller
-- + --
Mail Art Networking Visual & Virtual Poet
http://braumueller.crosses.net
Oh, Goggle, funny typo, thanks, fixed. Btw, the goggle.com site does exist, but I don't advise browsing it due to nasty spam popup windows.
And as for Word, I'm impressed by these new possibilities too. Let's just wait for the release and for people to upgrade.
Wow. I wish I understood that! It seems to be one of the holy grails, producing a valid Word document *without* using Word :-)
You do know you wrote Goggle, right?