Being pedantic

Reading Aaron Skonnard's latest installment in MSDN Magazine, I cannot resist noting that his explanation of whitespace handling in XSLT is not actually correct. Or, to put it another way, it's true only for Microsoft's XSLT implementations in default mode.

Here is what Aaron says:

Before an XSLT processor executes a transformation against a given source document, it first strips all white space-only text nodes from both documents.

Well, it looks like Aaron a) works only with Microsoft XSLT processors in default mode and b) forgot what the W3C XSLT Recommendation says about whitespace stripping.

This might be news to some Microsoft-oriented XSLT users, but the XSLT spec explicitly says that whitespace should be preserved in the source XML tree by default. Yes, even insignificant whitespace, that is, whitespace-only text nodes. This is how all conforming XSLT processors should actually behave. MSXML and XslTransform are the only notable exceptions. The explanation for this spec violation is that whitespace stripping is done at the tree-building stage, and neither XSLT engine has any control over it. Indeed, by default both XmlDocument and XPathDocument strip all insignificant whitespace. And some of us seem to have gotten so used to it that we even claim this is how XSLT should work. That's not true.
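To make "whitespace-only text nodes" concrete, here is a minimal Python sketch. It uses the standard library's xml.dom.minidom, which is not an XSLT processor; it is assumed here only as a stand-in for a tree builder that keeps whitespace. The tiny sample document and the describe_children helper are made up for illustration.

```python
from xml.dom import minidom, Node

# The four characters XML treats as whitespace.
XML_WHITESPACE = " \t\r\n"

# Hypothetical sample: the line breaks and indentation around <b/>
# end up as whitespace-only text nodes in the parsed tree.
SAMPLE = "<a>\n  <b/>\n</a>"

def describe_children(xml_text):
    """Return a short label for every child node of the document element."""
    root = minidom.parseString(xml_text).documentElement
    labels = []
    for node in root.childNodes:
        if node.nodeType == Node.ELEMENT_NODE:
            labels.append("element:" + node.tagName)
        elif node.nodeType == Node.TEXT_NODE and node.data.strip(XML_WHITESPACE) == "":
            labels.append("whitespace-only text")
        else:
            labels.append("other")
    return labels

if __name__ == "__main__":
    for label in describe_children(SAMPLE):
        print(label)
```

A tree builder that strips insignificant whitespace would hand the XSLT engine only the element node here, which is exactly the difference the rest of this post is about.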

XSLT processors don't strip insignificant whitespace from the source XML; the input tree builders (MSXML's DOMDocument, XmlDocument and XPathDocument) do that by default. And if you happen to transform an XmlDocument that was loaded with the PreserveWhitespace property set to true, or an XPathDocument that was loaded with the XmlSpace.Preserve argument in the constructor call, you might be unpleasantly surprised. An XSLT stylesheet that disregards insignificant whitespace is not robust, because it depends in a very fragile way on the XSLT processor's environment. Not to mention what happens with other XSLT processors such as Saxon or Xalan.

A glaring example of this bad XSLT programming style usually becomes apparent when <xsl:apply-templates/> and the position() function are used together. Consider the following XML document:

<root>
    <item>Screwdriver</item>
    <item>Hammer</item>
</root>
Then the following stylesheet:
<stylesheet version="1.0" 
xmlns="http://www.w3.org/1999/XSL/Transform" >
  <template match="item">
    <value-of select="position()"/>:<value-of select="."/>
  </template>
</stylesheet>
will output
1:Screwdriver2:Hammer
in MSXML and .NET in default (whitespace stripping) mode and
    2:Screwdriver
    4:Hammer
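in all non-Microsoft processors and in Microsoft processors in whitespace-preserving mode. The item elements are the second and fourth children of root, because the whitespace-only text nodes around them are counted as well. Beware of that.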
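The two sets of numbers can be reproduced without any XSLT processor. The following Python sketch only emulates how position() is counted when all children of root form the current node list. The item_positions function and its strip_whitespace flag are hypothetical names for this illustration, and xml.dom.minidom is assumed as a whitespace-preserving tree builder.

```python
from xml.dom import minidom, Node

XML_WHITESPACE = " \t\r\n"

XML = "<root>\n    <item>Screwdriver</item>\n    <item>Hammer</item>\n</root>"

def is_whitespace_only(node):
    """True for text nodes that contain nothing but XML whitespace."""
    return node.nodeType == Node.TEXT_NODE and node.data.strip(XML_WHITESPACE) == ""

def item_positions(xml_text, strip_whitespace):
    """Emulate position() for each <item> among all children of the root element."""
    root = minidom.parseString(xml_text).documentElement
    nodes = list(root.childNodes)
    if strip_whitespace:
        # What a whitespace-stripping tree builder would hand to the XSLT engine.
        nodes = [n for n in nodes if not is_whitespace_only(n)]
    result = []
    for position, node in enumerate(nodes, start=1):
        if node.nodeType == Node.ELEMENT_NODE and node.tagName == "item":
            text = "".join(c.data for c in node.childNodes
                           if c.nodeType == Node.TEXT_NODE)
            result.append("%d:%s" % (position, text))
    return result

if __name__ == "__main__":
    print(item_positions(XML, strip_whitespace=True))
    print(item_positions(XML, strip_whitespace=False))
```

With stripping the items are numbered 1 and 2; without it they are numbered 2 and 4, matching the outputs shown above.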
