Breadth-first tree traversal in XSLT

| 9 Comments | No TrackBacks

As a matter of interest - how would you implement breadth-first tree traversal in XSLT? Traditional algorithm is based on using a queue and hence isn't particularly suitable here. Probably it's feasible to emulate a queue with temporary trees, but I think that's going to be quite ineffective. Being not procedural, but declarative language XSLT needs different approach. Here is what I came up with:

Let's say we've got the following XML:

<a>
  <b>
    <d/>
    <e/>
  </b>  
  <c>
    <f/>
    <g/>
  </c>
</a>
Easy to see that being traversed in a breadth-first way, the sequence of visited nodes would be a, b, c, d, e, f, g. How?

Strightforward declarative solution is to traverse nodes level by level - just select all nodes at a level i, then i+1 etc. till maximum depth level is reached:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">  
    <xsl:call-template name="BFT">
      <xsl:with-param name="tree" select="/descendant::*"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Breadth-first traversal template -->
  <xsl:template name="BFT">
    <xsl:param name="tree" select="/.."/>
    <xsl:param name="depth" select="0"/>
    <xsl:variable name="nodes-at-this-depth" 
       select="$tree[count(ancestor::*)=$depth]"/>
    <xsl:apply-templates select="$nodes-at-this-depth"/>
    <xsl:if test="count($nodes-at-this-depth)>0">
        <xsl:call-template name="BFT">
          <xsl:with-param name="tree" select="$tree"/>
          <xsl:with-param name="depth" select="$depth + 1"/>       
        </xsl:call-template>
    </xsl:if>    
  </xsl:template>
  <!-- Actual node processing -->
  <xsl:template match="*">    
    <xsl:value-of select="name()"/>    
  </xsl:template>
</xsl:stylesheet>
Not so effective, but anyway. For each depth level we need to count all ancestors of each node in the list. Looks like worst case running time of the above implementation is O(maxDepth*n*maxDepth) = O(maxDepth2*n), where n is the number of nodes.

Using keys it can be improved to O(maxDepth*n + maxDepth*O(1)) = O(maxDepth*(n+1)), where maxDepth*n is an indexing price and maxDepth*O(1) is running time of retrieving nodes from the index by depth level value (provided keys implementation is based on a hashtable):

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="depthMap" match="*" use="count(ancestor::*)"/>
  <xsl:template match="/">
    <xsl:call-template name="BFT"/>
  </xsl:template>
  <!-- Breadth-first traversal template -->
  <xsl:template name="BFT">
    <xsl:param name="depth" select="0"/>
    <xsl:variable name="nodes-at-this-depth" 
      select="key('depthMap', $depth)"/>
    <xsl:apply-templates select="$nodes-at-this-depth"/>
    <xsl:if test="count($nodes-at-this-depth)>0">
      <xsl:call-template name="BFT">
        <xsl:with-param name="depth" select="$depth+1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
  <!-- Actual node processing -->
  <xsl:template match="*">
    <xsl:value-of select="name()"/>
  </xsl:template>
</xsl:stylesheet>

O(maxDepth*n) is definitely better than O(maxDepth2*n), but still worse than procedural O(n). In the worst case (deep lean tree) it gets even O(n2). But on average XML trees, which are usually wide and rather shallow, it's close to procedural algorithm's running time.

More ideas?

Related Blog Posts

No TrackBacks

TrackBack URL: http://www.tkachenko.com/cgi-bin/mt-tb.cgi/270

9 Comments

Unfortunately, tail recursion is dependent on an per implementation basis and cannot be fully relied on.

As for DVC -- yes it works beautifully, but reqires extra coding and thinking.

Your question was about what people'd find "most natural" with XSLT.

In case memoisation was there, the most natural solution for me would be to use it for the simplest non-recursive (or pseudo-recursive) solution as described in my previous comments.

As for memoisation getting into the XSLT 2.0 spec, chances are extremely low at this late stage (just my opinion!).

Do not forget, however, that there is EXSLT -- I will propose memoisation there.

Cheers,

Dimitre.

I think stack depth can be reduced. At least twice with processing two levels at each call. And plus tail recursion. I think that algorithm is the best one for XSLT 1.0.
And talking about memoisation - it's really fascinating approach, how do you think is there any chance for memoisation to make it to XSLT 2.0 spec?

Yes, a nice one.

As we know, with very deep hierarchies recursion may lead to stack overflow.

This is why I'd prefer a non-recursive method. To avoid the quadratical complexity of calculating count(ancestor::*) I would use SAXON 8 (XSLT 2.0) and memoisation for a "depth" function. The "depth" function will seem to be recursive, however it will just use the pre-calculated value of depth(..)

So, there would be just iteration (xsl:for-each) on all nodes sorted by their depth.


Cheers,

Dimitre.

I like Alexey's solution. What do you think, Dimitre?

Ok, next try
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:call-template name="bft">
<xsl:with-param name="tree" select="*"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="bft">
<xsl:param name="tree"/>
<xsl:for-each select="$tree">
<xsl:value-of select="name(.)"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:if test="$tree/*">
<xsl:call-template name="bft">
<xsl:with-param name="tree" select="$tree/*"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
It seems to me, that running time of that solution is O(n), where n is the number of elements in the document.

Ufff...

Just one line that makes it all clear:

xsl:sort select="count(ancestor::*)"


Cheers,

Dimitre

Hi Oleg,

What about this:

<xsl:stylesheet version="1.0"
xmlns:xsl="">">http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="/">
<t>
<xsl:for-each select="//node()">
<xsl:sort select="count(ancestor::*)" data-type="number"
order="ascending"/>
<xsl:copy>
<xsl:copy-of select="@*"/>
</xsl:copy>
</xsl:for-each>
</t>
</xsl:template>
</xsl:stylesheet>


Optimisation not considered (yet).

Cheers,

Dimitre.

Hehe, nice try, but I doubt it's breadth-first traversal. Try it on a bit deeper tree:
<a>
<b>
<d>
<h/>
<i/>
</d>
<e>
<k/>
<l/>
</e>
</b>
<c>
<f>
<m/>
<n/>
</f>
<g>
<o/>
<p/>
</g>
</c>
</a>

The result is a b c d e h i k l f g m n o p

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:call-template name="bft">
<xsl:with-param name="tree" select="*"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="bft">
<xsl:param name="tree"/>
<xsl:for-each select="$tree">
<xsl:value-of select="name(.)"/><xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:for-each select="$tree">
<xsl:call-template name="bft">
<xsl:with-param name="tree" select="*"/>
</xsl:call-template>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Leave a comment