Tokenizing in XSLT

Kirk Allen Evans has posted a recursive XSLT template that transforms CSV into XML. Being based on low-level substring functions, it is obviously quite verbose and convoluted, which was fairly pointed out by Dare. He has also provided a 10-line C# version.
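
To give an idea of what the substring-based approach involves, here is a minimal sketch of such a recursive tokenizer in plain XSLT 1.0. It is not Kirk's actual template: the template names split-rows and split-fields are made up for illustration, and I assume the CSV text sits inside a root element and should come out as row and elem elements.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="root">
        <root>
            <xsl:call-template name="split-rows">
                <xsl:with-param name="text" select="string(.)"/>
            </xsl:call-template>
        </root>
    </xsl:template>

    <!-- Hypothetical helper: peels off one line at a time and wraps it in <row>. -->
    <xsl:template name="split-rows">
        <xsl:param name="text"/>
        <xsl:choose>
            <xsl:when test="contains($text, '&#xA;')">
                <row>
                    <xsl:call-template name="split-fields">
                        <xsl:with-param name="text" select="substring-before($text, '&#xA;')"/>
                    </xsl:call-template>
                </row>
                <!-- Recurse on whatever follows the first newline. -->
                <xsl:call-template name="split-rows">
                    <xsl:with-param name="text" select="substring-after($text, '&#xA;')"/>
                </xsl:call-template>
            </xsl:when>
            <!-- Last line; skipped when empty (e.g. after a trailing newline). -->
            <xsl:when test="string-length($text) &gt; 0">
                <row>
                    <xsl:call-template name="split-fields">
                        <xsl:with-param name="text" select="$text"/>
                    </xsl:call-template>
                </row>
            </xsl:when>
        </xsl:choose>
    </xsl:template>

    <!-- Hypothetical helper: peels off one comma-separated field at a time. -->
    <xsl:template name="split-fields">
        <xsl:param name="text"/>
        <xsl:choose>
            <xsl:when test="contains($text, ',')">
                <elem><xsl:value-of select="substring-before($text, ',')"/></elem>
                <xsl:call-template name="split-fields">
                    <xsl:with-param name="text" select="substring-after($text, ',')"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <elem><xsl:value-of select="$text"/></elem>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Each level of splitting needs its own named template with a choose/recurse pattern, which is where the verbosity comes from.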

What I want to add to this subject is that this example perfectly illustrates how radically EXSLT extensions can improve XSLT 1.0 coding. (By the way, Dare is working on an implementation of the EXSLT functions for .NET, and I believe it would be a great addition to .NET XSLT programming practice.) See for yourself: here is an EXSLT version that makes use of the str:tokenize extension function (note that it is even smaller than Dare's C# one):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings" exclude-result-prefixes="str">
    <!-- Implementation of str:tokenize; the path is local to my machine -->
    <xsl:include href="d:/xsl/str.tokenize.msxsl.xsl"/>
    <xsl:template match="root">
        <root>
            <!-- Split the text into lines, one <row> per line -->
            <xsl:for-each select="str:tokenize(.,'&#xA;')">
                <row>
                    <!-- Split each line into comma-separated fields -->
                    <xsl:for-each select="str:tokenize(.,',')">
                        <elem><xsl:value-of select="."/></elem>
                    </xsl:for-each>
                </row>
            </xsl:for-each>
        </root>
    </xsl:template>
</xsl:stylesheet>
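
To make the behaviour concrete, here is a small made-up example. It assumes the CSV text is the only content of the root element, with lines separated by a single line feed and no empty fields:

```xml
<root>a,b,c&#xA;1,2,3</root>
```

For that input the stylesheet should produce the following (indentation added for readability; the actual whitespace depends on the serializer):

```xml
<root>
    <row>
        <elem>a</elem>
        <elem>b</elem>
        <elem>c</elem>
    </row>
    <row>
        <elem>1</elem>
        <elem>2</elem>
        <elem>3</elem>
    </row>
</root>
```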

So XSLT is perfectly able to handle this; it just needs a tokenizing facility like the one C# has. And as for producing XML, IMO XSLT is the best hammer on the market. I agree, though, that for pure CSV-to-XML conversion XSLT may not be the right tool. If it were my project, I'd use a SAX filter or something like Chris Lovett's XmlCsvReader.

3 Comments

Or alternatively you could use a native binary written in C++, which is much faster, especially for large files, like the open source one at http://csv2xml.sourceforge.net/

I know, Dimitre. FXSL rocks, keep doing your excellent work.

The "str-split-to-words" from FXSL is more flexible as it allows *a set* of possible delimiters to be specified -- e.g. " ,/\ "

A variety of problems have been solved using this functional tokenizer -- just have a look in xsl-list.

=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL
