Calling document("") in .NET


There was recently an interesting thread in the microsoft.public.dotnet.xml newsgroup about calling the document("") function in .NET. A guy was porting an app from MSXML to .NET, and something didn't work... You know these common bitter (and usually completely lame) complaints:

It is strange, this all works just fine using MSXML4 objects instead of XML.NET I guess between the implementation of MSXML4 and XML.NET they forgot the purpose of the special case document('').
...
W3C spec or not, it is too bad that XML.NET is intrinsically tied to the file system. My program has access neither to write nor read from the file system. I guess I will use MSXML4.
So what's wrong with document("") in .NET compared to MSXML?

First, a little digression on what a document("") call actually means. As per the XSLT 1.0 spec:

The URI reference may be relative. The base URI (see [3.2 Base URI]) of the node in the second argument node-set that is first in document order is used as the base URI for resolving the relative URI into an absolute URI. If the second argument is omitted, then it defaults to the node in the stylesheet that contains the expression that includes the call to the document function. Note that a zero-length URI reference is a reference to the document relative to which the URI reference is being resolved; thus document("") refers to the root node of the stylesheet; the tree representation of the stylesheet is exactly the same as if the XML document containing the stylesheet was the initial source document.
So it's about introspection: calling the document() function with an empty string as the only argument lets an XSLT stylesheet get its own source as an XML document and process it like any other XML document - query, transform, anything. It's an extremely useful feature, which leverages the simple fact that XSLT stylesheets are merely XML documents. Static lookup tables stored within a stylesheet are one common usage.

So that guy has an MSXML-based application working in an I/O-restricted environment (no access to the file system in particular). The XSLT stylesheet is given as a string, gets loaded into an MSXML2.DOMDocument and calls document("") to access a lookup table within its own source. Works fine. Doesn't work in .NET. Why?

The difference between MSXML's and System.Xml's XSLT implementations here is that in MSXML, XSLT is a function of the DOM: an XSLT stylesheet must always be explicitly loaded into a DOM before calling transformNode()/transformNodeToObject() or working with Msxml2.XSLTemplate. So the XSLT implementation always has an in-memory DOM representation of the stylesheet at hand and returns it whenever document("") is called. Simple and effective, as almost everything in MSXML. In System.Xml, XSLT and DOM are completely decoupled. It's even officially recommended to avoid using DOM (the XmlDocument class) when performing XSL transformations in .NET. Instead, the XslTransform class can be loaded from a variety of sources, such as a Stream, TextReader, XmlReader, XPathNavigator or a URI.

XslTransform loads and compiles the XSLT stylesheet into an internal representation, ready for multithreaded transformations. See the difference? There is no in-memory XSLT stylesheet floating around explicitly, so XslTransform can't return it whenever document("") is called. Instead, XslTransform doesn't treat document("") as a special case: the usual URI resolving machinery resolves "" to the stylesheet's base URI (as per the XSLT spec above), and the document gets fetched from that URI like any other document.
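To see why an empty URI reference ends up pointing back at the stylesheet, the resolution step can be run in isolation. This is only a minimal sketch of that one step, not the actual XslTransform code path; the class name EmptyUriDemo and the http://example.com base URI are made-up placeholders.

```csharp
using System;
using System.Xml;

public class EmptyUriDemo {
    public static void Main() {
        // Hypothetical stylesheet location (placeholder, assumed for illustration)
        Uri baseUri = new Uri("http://example.com/styles/main.xsl");

        // A plain XmlUrlResolver; ResolveUri only computes a URI, it fetches nothing
        XmlUrlResolver resolver = new XmlUrlResolver();

        // document("") hands the resolver a zero-length relative URI reference
        Uri resolved = resolver.ResolveUri(baseUri, "");

        // Print the resolved URI and whether it equals the base URI
        Console.WriteLine(resolved);
        Console.WriteLine(resolved.Equals(baseUri));
    }
}
```

The resolved URI should come out equal to the base URI, so the next thing a processor does is ask the resolver to fetch the stylesheet from that location. That fetch is where the I/O comes from.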

In fact, many (if not all) DOM-decoupled XSLT processors behave exactly this way. At least Saxon and Xalan (bundled with Java 2 by default) do. Obviously nobody wants to hold XSLT sources in memory until run-time just in case there is a call to document(""). There is no magic here. XSLT is a regular compiled language. Can you imagine asking an .exe for its C++ sources? (Well, there is reflection, but that's a completely different matter and not usually the case.)

As a matter of interest, XslTransform's internal compiled XSLT structure does include the source XSLT stylesheet as an XPathNavigator. It looks like it's only used when the document() function resolves relative URIs. Hmm, I wonder if it could be made more thrifty? Oh well, anyway. It looks like they just decided not to expose it. Who can say that's an unreasonable decision - not to expose an internal structure?

So, what's the solution? How do you avoid I/O when using document("") in .NET? Switch back to MSXML? No way. Here is a simple trick for reusing the in-memory XSLT stylesheet so the XSLT source never has to be loaded by URI. The idea is to load the stylesheet into an XPathDocument, assign it some unique base URI and resolve that URI in a custom XmlResolver.
First, a small XmlResolver:

using System;
using System.Xml;
using System.Xml.XPath;

public class MyResolver : XmlUrlResolver {
     // Navigator over the in-memory stylesheet
     private XPathNavigator nav;

     public MyResolver(XPathNavigator nav) {
         this.nav = nav;
     }

     public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn) {
         // URIs in the made-up "my" scheme refer to the stylesheet itself
         if (absoluteUri.Scheme == "my")
             return nav.Clone();
         else
             return base.GetEntity(absoluteUri, role, ofObjectToReturn);
     }
}
And here is the main part:
string xml = "<foo/>";
string xsl = @"
<xsl:stylesheet version=""1.0"" 
xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
 <xsl:template match=""/"">
  <xsl:value-of select=""count(document('')//*)""/>
 </xsl:template>
</xsl:stylesheet>";

XPathDocument doc = new XPathDocument(new StringReader(xml));
XslTransform xslt = new XslTransform();
// Load the stylesheet from the string, giving it "my://uri" as its base URI
XPathDocument xslDoc = new XPathDocument(
     new XmlTextReader("my://uri", new StringReader(xsl)));
xslt.Load(xslDoc);

// Runtime - no I/O here
xslt.Transform(doc, null, Console.Out,
     new MyResolver(xslDoc.CreateNavigator()));
It's a bit tricky though and I wonder if there is any cleaner solution?
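For the lookup-table use case mentioned earlier, the same trick might be applied roughly as follows. This is a sketch under assumptions, not tested against every framework version: the StylesheetResolver and LookupDemo class names, the lk prefix with its urn:example:lookup namespace, and the order/status sample data are all invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Same idea as MyResolver above, repeated so the example is self-contained
public class StylesheetResolver : XmlUrlResolver {
    private readonly XPathNavigator nav;

    public StylesheetResolver(XPathNavigator nav) {
        this.nav = nav;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn) {
        // "my" is the made-up scheme assigned to the in-memory stylesheet
        if (absoluteUri.Scheme == "my")
            return nav.Clone();
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}

public class LookupDemo {
    public static void Main() {
        // Hypothetical input: an order carrying a status code
        string xml = "<order status=\"S\"/>";

        // Stylesheet with a lookup table stored as top-level elements
        // in a non-XSLT namespace (assumed: urn:example:lookup)
        string xsl = @"
<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform""
    xmlns:lk=""urn:example:lookup"" exclude-result-prefixes=""lk"">
  <xsl:output method=""text""/>
  <lk:statuses>
    <lk:status code=""N"">New</lk:status>
    <lk:status code=""S"">Shipped</lk:status>
  </lk:statuses>
  <xsl:template match=""/order"">
    <xsl:value-of select=""document('')/*/lk:statuses/lk:status[@code = current()/@status]""/>
  </xsl:template>
</xsl:stylesheet>";

        XPathDocument doc = new XPathDocument(new StringReader(xml));

        // Give the in-memory stylesheet the base URI "my://uri"
        XPathDocument xslDoc = new XPathDocument(
            new XmlTextReader("my://uri", new StringReader(xsl)));

        XslTransform xslt = new XslTransform();
        xslt.Load(xslDoc);

        // document('') resolves to the "my" URI, which the resolver answers from memory
        xslt.Transform(doc, null, Console.Out,
            new StylesheetResolver(xslDoc.CreateNavigator()));
    }
}
```

The template is meant to pick the lk:status entry whose code matches the order's status attribute and write its label. The lookup table is read from the XPathDocument already in memory, so nothing is fetched from disk or the network.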
