On grouping in XSLT and EXSLT


Everybody knows grouping in XSLT is a fairly advanced topic. The Muenchian method is a nightmare for XSLT newbies, and XSLT-related newsgroups are full of help-me-to-group-in-XSL postings. My fellows and I answer such questions day after day, and I have to admit it gets really boring. Now I wonder: why don't we use EXSLT to simplify the grouping technique so that even newbies can grasp it quickly? I'm talking about the set:distinct function, which can replace the dreadful and mysterious generate-id()=generate-id(key('theKey', foo)[1]) step in the Muenchian method.

Here is a common grouping sample along with both the classical solution (pure Muenchian method) and an improved one (EXSLT-based). Compare them and say which is easier to understand.

Source XML, a list of cities:

<doc>
    <city name="Paris" country="France"/>
    <city name="Madrid" country="Spain"/>
    <city name="Vienna" country="Austria"/>
    <city name="Barcelona" country="Spain"/>
    <city name="Salzburg" country="Austria"/>
    <city name="Bonn" country="Germany"/>
    <city name="Lyon" country="France"/>
    <city name="Hannover" country="Germany"/>
    <city name="Calais" country="France"/>
    <city name="Berlin" country="Germany"/>
</doc>
The task is to group them by country:
<doc>
    <country name="France">
        <city>Paris</city>
        <city>Lyon</city>
        <city>Calais</city>
    </country>
    <country name="Spain">
        <city>Madrid</city>
        <city>Barcelona</city>
    </country>
    <country name="Austria">
        <city>Vienna</city>
        <city>Salzburg</city>
    </country>
    <country name="Germany">
        <city>Bonn</city>
        <city>Hannover</city>
        <city>Berlin</city>
    </country>
</doc>

Solution #1, classical Muenchian method:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="kCountry" match="city" use="@country"/>
    <xsl:template match="doc">
        <doc>
            <xsl:for-each 
select="city[generate-id()=generate-id(key('kCountry', @country)[1])]">
                <country name="{@country}">
                    <xsl:apply-templates select="key('kCountry', @country)"/>
                </country>
            </xsl:for-each>
        </doc>
    </xsl:template>
    <xsl:template match="city">
        <city><xsl:value-of select="@name"/></city>
    </xsl:template>
</xsl:stylesheet>

Solution #2, EXSLT based one:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:set="http://exslt.org/sets" exclude-result-prefixes="set">
    <xsl:key name="kCountry" match="city" use="@country"/>
    <xsl:template match="doc">
        <doc>
            <xsl:for-each select="set:distinct(city/@country)">
                <country name="{.}">
                    <xsl:apply-templates select="key('kCountry', .)"/>
                </country>
            </xsl:for-each>
        </doc>
    </xsl:template>
    <xsl:template match="city">
        <city><xsl:value-of select="@name"/></city>
    </xsl:template>
</xsl:stylesheet>

Both stylesheets are almost the same; they differ only in the xsl:for-each select expression, the expressions inside the loop that depend on it, and the EXSLT namespace declaration. My measurements (using nxslt.exe with the -t option) say both stylesheets take the same time to execute, and frankly I don't see why it should be different. But set:distinct(city/@country) and city[generate-id()=generate-id(key('kCountry', @country)[1])] do differ in readability, don't they?
Well, the only obvious drawback is that the Muenchian method is portable as pure XSLT, while the EXSLT-based method relies on an optional EXSLT implementation.
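
One way to soften the portability concern might be to test for the function at run time and fall back to the Muenchian expression when it is missing. The following is only a sketch under assumptions: it relies on the standard XSLT 1.0 function-available() function, and the mode name "group" is my own invention for this example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:set="http://exslt.org/sets" exclude-result-prefixes="set">
    <xsl:key name="kCountry" match="city" use="@country"/>
    <xsl:template match="doc">
        <doc>
            <xsl:choose>
                <!-- Use set:distinct() when the processor implements it -->
                <xsl:when test="function-available('set:distinct')">
                    <xsl:apply-templates mode="group"
                        select="set:distinct(city/@country)"/>
                </xsl:when>
                <!-- Otherwise fall back to the pure XSLT Muenchian test -->
                <xsl:otherwise>
                    <xsl:apply-templates mode="group"
                        select="city[generate-id()=generate-id(key('kCountry', @country)[1])]/@country"/>
                </xsl:otherwise>
            </xsl:choose>
        </doc>
    </xsl:template>
    <!-- "group" is a hypothetical mode name; one @country node arrives per group -->
    <xsl:template match="@country" mode="group">
        <country name="{.}">
            <xsl:apply-templates select="key('kCountry', .)"/>
        </country>
    </xsl:template>
    <xsl:template match="city">
        <city><xsl:value-of select="@name"/></city>
    </xsl:template>
</xsl:stylesheet>
```

Both branches hand the same kind of node (one country attribute per group) to the shared template, so the grouping logic itself is written only once.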


3 Comments

So EXSLT has at least a year ahead of it. Still, I believe its functions can make programming in XSLT 1.0 much simpler and sometimes even more efficient. E.g. using the Muenchian method with non-toy documents in .NET is just a disaster, but using set:distinct instead of comparing generated IDs makes it at least reasonably fast. More precise measurements are needed to prove it, though.

I too believe EXSLT has run its course in informing the XSLT 2.0 effort, though potentially there is a wider role for EXSLT as a standard method of defining functions for programming languages in general... though I struggle to see this happening.

Simply put, the Muenchian method is not needed in XSLT 2.0, which I think it's safe to say will become a Recommendation sometime in 2004.
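
For illustration, here is a sketch of how the cities example from the post might look with the xsl:for-each-group instruction of XSLT 2.0. It assumes an XSLT 2.0 processor; no key and no generate-id() comparison are involved.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="doc">
        <doc>
            <!-- One group per distinct @country value, in order of first appearance -->
            <xsl:for-each-group select="city" group-by="@country">
                <country name="{current-grouping-key()}">
                    <!-- current-group() holds all cities of this country -->
                    <xsl:apply-templates select="current-group()"/>
                </country>
            </xsl:for-each-group>
        </doc>
    </xsl:template>
    <xsl:template match="city">
        <city><xsl:value-of select="@name"/></city>
    </xsl:template>
</xsl:stylesheet>
```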

> Both stylesheets are almost the same except
> bolded parts. My measurements (using nxslt.exe
> with -t option) say it takes the same time to
> execute both stylesheets and frankly I don't
> see why it could be different.

It is very different if grouping is done repeatedly (more than once) within the same transformation. In this case the Muenchian method wins, because the index for the xsl:key is constructed only once, while each invocation of set:distinct() is independent of the previous invocations and repeats the same total amount of work.
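
As a rough sketch of what "more than once" could mean, the stylesheet below groups the same cities twice: first for a summary with city counts, then for the full listing. The summary element and the cities attribute are invented for this example, and whether the key index is really built only once depends on the processor.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Both passes below look up groups through this single key -->
    <xsl:key name="kCountry" match="city" use="@country"/>
    <xsl:template match="doc">
        <doc>
            <!-- Pass 1: hypothetical summary, one entry per country with a city count -->
            <summary>
                <xsl:for-each
                    select="city[generate-id()=generate-id(key('kCountry', @country)[1])]">
                    <country name="{@country}"
                        cities="{count(key('kCountry', @country))}"/>
                </xsl:for-each>
            </summary>
            <!-- Pass 2: the grouped listing, as in Solution #1 -->
            <xsl:for-each
                select="city[generate-id()=generate-id(key('kCountry', @country)[1])]">
                <country name="{@country}">
                    <xsl:apply-templates select="key('kCountry', @country)"/>
                </country>
            </xsl:for-each>
        </doc>
    </xsl:template>
    <xsl:template match="city">
        <city><xsl:value-of select="@name"/></city>
    </xsl:template>
</xsl:stylesheet>
```

An EXSLT version of the same stylesheet would call set:distinct(city/@country) once per pass, which is the repeated work the comment refers to.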

Another reason people are not using EXSLT widely is that XSLT 2.0 will soon be a Recommendation, and there is *very* little use for EXSLT with XSLT 2.0.

Implementing the EXSLT functions is a good exercise, but it would not be realistic to expect that this function library will have any significance in the near future.

Just my 2c.

=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL
