id() function and XML Schema

According to the XPath data model, an element node may have a unique identifier (ID), which can be used to select the node with XPath's id() function and to navigate to it with the XPathNavigator.MoveToId method. Querying by ID is extremely efficient because it doesn't require traversing the XML document: almost every XPath implementation I've ever seen just keeps an internal hashtable of IDs, so querying by ID is merely a matter of getting a value from a hashtable by its key.
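To illustrate the hashtable idea, here is a minimal sketch. It is not how System.Xml implements id() internally. IdIndex and its Build method are hypothetical names made up for this example, and the attribute holding the ID is passed in by hand instead of being taken from a DTD or schema.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of System.Xml: walks the document once
// and remembers every element that carries the given ID attribute.
public static class IdIndex {
    public static Dictionary<string, XmlElement> Build(XmlDocument doc, string idAttributeName) {
        var index = new Dictionary<string, XmlElement>();
        // Assumption: idAttributeName is a plain, unprefixed attribute name.
        foreach (XmlNode node in doc.SelectNodes("//*[@" + idAttributeName + "]")) {
            XmlElement element = (XmlElement)node;
            string id = element.GetAttribute(idAttributeName);
            // Keep the first element in document order if an ID is repeated.
            if (!index.ContainsKey(id)) {
                index.Add(id, element);
            }
        }
        return index;
    }
}

public class IdIndexDemo {
    public static void Main() {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<root>" +
            "<file id='F001' title='abc' size='123'/>" +
            "<notification id='PINK' title='Pink Flowers'/>" +
            "</root>");
        Dictionary<string, XmlElement> index = IdIndex.Build(doc, "id");
        // A lookup is now a single dictionary access, no tree traversal.
        Console.WriteLine(index["PINK"].GetAttribute("title")); // prints "Pink Flowers"
    }
}
```

The index is built in one pass over the document; after that each lookup costs the same no matter how large the document is, which is why id() is so cheap once the ID information has been collected.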

The XPath 1.0 Recommendation, published back in 1999, of course says nothing about XML Schema, which was published in 2001. Maybe that's the reason why the XmlDocument and XPathDocument (and therefore XslTransform) classes in .NET don't support the above tasty functionality when the XML document is defined using XML Schema. Unfortunately, only DTD is supported. Even if you have defined an xs:ID typed attribute in your schema and validated the document by reading it via XmlValidatingReader, it won't work. As a matter of fact, it does work in MSXML4 though.

Whether that's right or wrong, I have no idea; it's quite a debatable question. On the one hand, the XPath spec explicitly says "If a document does not have a DTD, then no element in the document will have a unique ID." On the other hand, XML Schema was published two years after XPath 1.0 and provides semantically the same functionality as DTD does, and XPath 2.0 is now deeply integrated with XML Schema. And it works in MSXML4... I wonder what people think about it.

Anyway, here is another act of hackery: how to force the XmlDocument and XPathDocument classes to turn on id() and XPathNavigator.MoveToId support when the document is validated against an XML Schema and not a DTD.
Apparently XmlValidatingReader collects the ID information anyway, but it is asked for this collection only when XmlDocument/XPathDocument encounters a DocumentType node in the XML. So let's give them this node; I mean, let's emulate it. Here is the code:

using System.Xml;

public class IdAssuredValidatingReader : XmlValidatingReader {
    private bool _exposeDummyDoctype;
    private bool _isInProlog = true;
       
    public IdAssuredValidatingReader(XmlReader r) : base (r) {}
    
    public override XmlNodeType NodeType {
        get { 
            return _exposeDummyDoctype ?
                XmlNodeType.DocumentType :
                base.NodeType; 
        }            
    }
    
    public override bool MoveToNextAttribute() {
        return _exposeDummyDoctype?
            false :
            base.MoveToNextAttribute();
    }
    
    public override bool Read() {
        if (_isInProlog) {
            if (!_exposeDummyDoctype) {
                //We are looking for the very first element
                bool baseRead = base.Read();
                if (base.NodeType == XmlNodeType.Element) {
                    _exposeDummyDoctype = true;  
                    return true;
                } else {
                    return baseRead;
                }
            } else {
                //Done, switch back to normal flow
                _exposeDummyDoctype = false;
                _isInProlog = false;
                return true;
            }
        } else
            return base.Read();
    }
}
And a proof of concept:
source.xml:
<root 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="D:\Untitled1.xsd">
    <file id="F001" title="abc" size="123"/>
    <file id="F002" title="xyz" size="789"/>
    <notification id="PINK" title="Pink Flowers"/>
</root>
In the Untitled1.xsd schema (elided for clarity), the id attributes are declared as xs:ID.
The usage with XmlDocument:
using System;
using System.Xml;

public class Test {
    static void Main(string[] args) {
        // Wrap the plain reader so a dummy DocumentType node is exposed
        XmlValidatingReader vr = 
            new IdAssuredValidatingReader(
            new XmlTextReader("source.xml"));
        vr.ValidationType = ValidationType.Schema;
        vr.EntityHandling = EntityHandling.ExpandEntities;
        XmlDocument doc = new XmlDocument();
        doc.Load(vr);
        Console.WriteLine(
            doc.SelectSingleNode("id('PINK')/@title").Value);
    }
}
Another one, this time with XPathDocument:
using System;
using System.Xml;
using System.Xml.XPath;

public class Test {
    static void Main(string[] args) {
        // Same reader setup as before, loaded into XPathDocument
        XmlValidatingReader vr = 
            new IdAssuredValidatingReader(
            new XmlTextReader("source.xml"));
        vr.ValidationType = ValidationType.Schema;
        vr.EntityHandling = EntityHandling.ExpandEntities;
        XPathDocument doc = new XPathDocument(vr);
        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator ni = nav.Select("id('PINK')/@title");
        if (ni.MoveNext())
            Console.WriteLine(ni.Current.Value);
    }
}
In both cases the result is "Pink Flowers".

I'm not sure which semantics this hack breaks. The only deficiency I see is that the emulated dummy DocumentType node actually becomes visible in the resulting XmlDocument (XPathDocument is not affected, because the XPath data model knows nothing about the DocumentType node type).

Any comments?

1 Comment

Oleg, thanks for your work on this, it is exactly what I need for my current project. I know it's been a long time since you wrote this. However, I cannot get this to work. Execution of the XPath statement using id() function always returns null. Any assistance you could offer would be greatly appreciated.

Thanks!

Jason