September 2004 Archives

Well, I was talking about it a lot and finally decided to stop rambling and start doing. Here is my new toy: the XQP project. XQP stands for XML Query Processor, of course. It's going (if my karma is good enough) to be a free open-source XPath 2.0/XQuery 1.0/XSLT 2.0 engine for the .NET platform. The SourceForge team kindly approved the project and now we have everything we need to deliver a killer application!

The main idea behind XQP is to develop a single core runtime engine based on the XPath 2.0/XQuery 1.0 algebra and then to provide XPath 2.0, XQuery 1.0 and XSLT 2.0 compilers for that engine. It's nothing brand new, of course. Saxon implements XSLT 2.0, XPath 2.0 and XQuery 1.0 with a single engine. Microsoft went even further, inventing the Common Query Runtime (CQR) and a common intermediate format (QIL) and implementing bits of them in Whidbey as System.Xml.Query. I believe that's a mainstream design pattern, very obvious considering the XPath/XQuery intimacy and the XQuery/XSLT functional overlap.

There are some issues I'd like to make clear.

What for? Why not wait for .NET 2.0?
Well, it was announced that System.Xml v2.0 will support neither XSLT 2.0 nor XPath 2.0. Being an XSLT and .NET aficionado, I can't imagine a situation where Java has XSLT 2.0 and .NET doesn't. If Microsoft can't deliver it, we can.

What's wrong with Saxon.NET?
Well, nothing wrong, it's a cool project. I just don't believe in effective porting of any big system. Saxon is too tied to Java. And after all, porting is so boooooring, while I enjoy developing :)

Why a new project?
I considered starting XQP as part of the Mvp-Xml project or even Mono, but realized that due to the experimental nature of the project I want to run it myself.

Isn't it too huge a project?
Well, I'm not afraid of it. I took part in developing two XSLT 1.0 processors (a commercial one and an open-source one) and, being an Apache committer, I have watched closely how Xalan is cooked. There is nothing scary in implementing XPath or XSLT. All the standard techniques, like building optimizing compilers, apply. After all, it's just another programming language to implement, probably the most interesting task for a programmer.

Needless to say, everybody interested is invited to participate. We are currently at the team gathering and initial planning stage. And by the way, we are accepting donations. If you can't help us with development but want to support the project, you can donate some money to the XQP project.

XInclude.NET progress

Well, the XInclude.NET workspace at GotDotNet seems to be severely broken. I've sent a solid dozen requests to fix it and now they don't even answer. OK, I'm moving the project to sf.net, specifically to the Mvp-Xml project. I'm adding the XInclude.NET v1.2 sources to CVS right now. After some setup I will finally be able to deliver a new release, aligned with the April 2004 XInclude CR. Stay tuned. And after that I'm going to pack nxslt.exe with the latest EXSLT.NET and XInclude.NET and release it too (there are some minor new features as well).

Saxon 8.1 and grouping in XQuery

Cafe con Leche XML News:

Michael Kay has released Saxon 8.1, an implementation of XSLT 2.0, XPath 2.0, and XQuery in Java. Saxon 8.1 is published in two versions for both of which Java 1.4 is required. Saxon 8.1B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.1SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."
Hmmm, grouping for XQuery... Here is what it looks like in Saxon-SA:
declare namespace f="f.uri";

(: Test saxon:for-each-group extension function :)

declare function f:get-country ($c) { $c/@country };

declare function f:put-country ($group) {
    <country name="{$group[1]/@country}" 
        leading="{$group[1]/@name}" size="{count($group)}">
       {for $g in $group 
           order by $g/@name
           return <city>{ $g/@name }</city>
       }
    </country>
}; 

<out>
    {saxon:for-each-group(/*/city, 
         saxon:function('f:get-country', 1), 
         saxon:function('f:put-country', 1))}
</out>
Looks a bit convoluted to me. More info here.
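For comparison, here is roughly what that query computes, written as a plain Python sketch. This is only an illustration, not Saxon code: the sample city data and the `group_cities` name are made up, and I assume groups come out in order of first appearance, as they do for xsl:for-each-group.

```python
# Hypothetical sample data standing in for the /*/city elements.
cities = [
    {"name": "Milan", "country": "Italy"},
    {"name": "Paris", "country": "France"},
    {"name": "Venice", "country": "Italy"},
    {"name": "Lyon", "country": "France"},
]


def group_cities(cities):
    """Group cities by country and build one summary record per group."""
    groups = {}
    for city in cities:
        # Plays the role of f:get-country: compute the grouping key.
        # dicts keep insertion order, so groups stay in first-seen order.
        groups.setdefault(city["country"], []).append(city)

    result = []
    for country, members in groups.items():
        # Plays the role of f:put-country: called once per group.
        result.append({
            "name": country,
            "leading": members[0]["name"],  # first city in input order
            "size": len(members),
            "cities": sorted(c["name"] for c in members),
        })
    return result


for group in group_cities(cities):
    print(group)
```

The two steps (compute a key per item, then call something once per group) are exactly the two function arguments passed to saxon:for-each-group above.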

MSDN2

Tim Ewald shares some info on MSDN2. Now that's soooo coooool! I think that's the best thing that could happen to MSDN. And now I just can't believe they let Tim leave MSFT!

Beware of aggressive news aggregators

Dare writes on "News Aggregators As Denial of Service Clients":

Recently I upgraded my web server to using Windows 2003 Server due to having problems with a limitation on the number of outgoing connections using Windows XP. Recently I noticed that my web server was still getting overloaded with requests during hours of peak traffic. Checking my server logs I found out that another aggregator, Sauce Reader, has joined Newzcrawler in its extremely rude bandwidth hogging behavior. This is compounded by the fact that the weblog software I use, dasBlog, does not support HTTP Conditional GET for comments feeds so I'm serving dozens of XML files to each user of Newzcrawler and SauceReader subscribed to my RSS feed every hour.

I'm really irritated at this behavior and have considered banning Sauce Reader & Newzcrawler from fetching RSS feeds on my blog due to the fact that they significantly contribute to bringing down my site on weekday mornings when people first fire up their aggregators at work or at home. Instead, I'll probably end up patching my local install of dasBlog to support HTTP conditional GET for comments feeds when I get some free time. In the meantime I've tweaked some options in IIS that should reduce the amount of times the server is inaccessible due to being flooded with HTTP requests.

This doesn't mean I think this feature of the aforementioned aggregators is something that should be encouraged. I just don't want to punish readers of my blog because of decisions made by the authors of their news reading software.
Well, I share Dare's thoughts. Happily, my blog is already hosted on Windows Server 2003 and my favorite blog engine uses statically generated HTML/XML pages for everything including comments, so conditional GET saves me from rude aggressive news aggregators fetching the comments for every post I've made in the last month every 30 minutes. I'd avoid using the Newzcrawler and Sauce Reader news aggregators until they stop being evil.
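For those who haven't looked at how conditional GET works, here is a minimal Python sketch of the server-side decision. It is not dasBlog, IIS or MovableType code. The `conditional_get` function and its arguments are hypothetical, request headers are assumed to arrive as a plain dict with canonical names, and weak ETags are ignored for brevity.

```python
from email.utils import parsedate_to_datetime


def conditional_get(request_headers, etag, last_modified):
    """Return (status, response_headers) for a GET on an unchanged-or-not resource.

    etag and last_modified describe the current version of the resource;
    last_modified is an HTTP date string such as
    "Wed, 15 Sep 2004 10:00:00 GMT".
    """
    headers = {"ETag": etag, "Last-Modified": last_modified}

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When present, If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers  # client copy is current: send no body
        return 200, headers

    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since):
                return 304, headers
        except (TypeError, ValueError):
            pass  # unparseable date: fall through and send the full response

    return 200, headers
```

A well-behaved aggregator stores the ETag and Last-Modified values from its last fetch and sends them back, so polling an unchanged feed costs one tiny 304 response instead of the whole file.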

Wesner Moise on Enums and Performance

Wesner Moise (.NET Undocumented) writes on enum performance in .NET:

While enums are value types and are often recognized and treated like standard integral values by the runtime (in IL, enums and integers have almost no distinction), there are a few performance caveats to using them.

Enumerated types are derived from ValueType and Enum (as well as Object), which are, ironically, reference types. An explicit conversion of an enum value to ValueType, will actually perform boxing and generate an object reference.

Any calls to an inherited method from any of those classes will also actually invoke boxing, prior to calling the base method. This includes the following methods: GetType(), ToString(), GetHashCode() and Equals(). In addition to the costs of implicit boxing is the far larger cost of reflection used to actually complete the said methods.
That's obvious, but this is not really:
ToString uses reflection, the first time it is called, to dynamically retrieve enumeration constants from the enumerated type and stores those values into a hash table for future use. However, GetHashCode always uses reflection to retrieve the underlying value. While ValueType.Equals will attempt to do a fast bit check, when a valuetype with no reference methods, such as is the case for enumerated types, it won't be faster than a direct compare.

This is true for any value type, but normally the cost can be eliminated for ToString, GetHashCode, and Equals, by simply overriding those methods and avoiding calls to the base methods. However, those methods CANNOT be overridden for enumerated types.
And this is sad:
Another ironic conclusion is that creating your own version of an enumerated type, not derived from Enum, is going to be faster than the CLR versions, because you can ensure that GetHashCode, Equals, ToString, IComparable, and IComparable<T> are not inherited from any of base classes such as ValueType.
Now what? Back to Java "enums"?

As it turned out, I unfortunately introduced a nasty bug into the date:day-name(), date:day-abbreviation() and date:month-abbreviation() functions while testing EXSLT.NET 1.1 before the release. Saturday and December never appeared :( Thanks to Chris Bayes for the prompt bug report. Hence the EXSLT.NET 1.1.1 release. Please update.

Ken North:

Author Elliotte Rusty Harold talks about the significance of JDK 1.5 and whether Java should be open source and/or an international standard. He also discusses the state of XML, and we coaxed him into describing his recent books about XML (Effective XML, XML Bible 1.1).

Streaming video (running time 7:01)
http://www.webservicessummit.com/People/EHarold.htm

mvp-xml-help mail list created

I have just created the first public mailing list for the Mvp-Xml project: the mvp-xml-help mailing list.

The mvp-xml-help list is a general discussion list for all users of the Mvp-Xml project.

The allowed topics on this list are:

  • Asking for help or helping others on using Mvp-Xml libraries.
  • General announcements related to the Mvp-Xml project.
  • Suggestions, comments and other feedback related to the Mvp-Xml project.
No spam or off-topic posts are allowed.

Everybody interested is invited to subscribe.

SQLSummit.com published a 15-minute video interview with Michael Rys on "SQL Server 2005: Integrating SQL, XML, and XQuery" - http://www.sqlsummit.com/People/MRys.htm.

"Michael discusses SQL Server 2005 support for XQuery, SQL/XML and the SQL:2003 standard. He discusses b-tree, quadtree, and r-tree indexes and pluggable and selectable indexing techniques for XML documents. He also comments about the evolution of XQuery."
Speaking of the Microsoft dialect - guess what Michael's first word in the interview is? :)

[Via Ken North, the editor of SQLSummit.com]

Zoological Mythology and Cryptozoology

Just found - a collection of public domain ebooks on Zoological Mythology and Cryptozoology - http://www.herper.com/ebooks/titles.html. Free download, lots of old lithographs. Among them:
"Curious Creatures in Zoology", NY 1890;
"Mythical Monsters", London, 1886;
"Natural History Legend and Lore", London, 1895;
"Un-Natural History, or Myths of Ancient Science", Edinburgh, 1886.

Just interesting. And priceless for those looking for cool project names :)

GotDotNet woes

So I'm in a critical mood today... I recently found out that the XInclude.NET workspace has been down for at least a month. No wonder the feedback on the "Combining XML Documents with XInclude" article was so low - all the article's links to XInclude.NET (homepage, message board and bug tracker) are dead. Moreover, and what's worse, I haven't been able to get access to the XInclude.NET source code for at least a week! Holy cow! Looks like I was too optimistic about GotDotNet Workspaces.

Needless to say, I decided to move the XInclude.NET project out of there. It needs a more reliable home, sorry. It's official now - XInclude.NET will be incorporated into the Mvp-Xml project at SourceForge. It's a real pain to move a stable project, but I have no choice.

Docbook XSL stylesheets v1.66.0 released

Docbook XSL stylesheets v1.66.0 were released yesterday. It's a huge (9 MB) collection of XSLT stylesheets for transforming Docbook documents into HTML, XHTML, XSL-FO (PDF), HTML Help and Java Help. They are well designed by XSLT experts such as Norman Walsh and extremely well tested by the huge and diverse Docbook community. You know what I mean? I hope the Microsoft testers responsible for System.Xml will finally try to test the .NET XSLT implementation against the Docbook stylesheets before they ship - it's a shame that only after .NET 1.1 SP1 did XslTransform stop barfing on the Docbook HTML stylesheets (and it's still unable to compile the Docbook XSL-FO stylesheets... ouch, is it 2004 or 1999?).

Happy New Year! Shana tova umetuka!

Today is the Rosh HaShanah holiday in Israel - the Jewish New Year. The new year 5765 starts at sunset. As a matter of interest, in Hebrew years are written in letters, not digits, e.g. the new year 5765 is written as תשס״ה. It's not that Israel really lives according to this calendar nowadays, it's more a matter of tradition, but it's a national holiday (actually a solid couple of weeks of holidays), so happy new year 5765 to everybody! Shana tova umetuka!
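For the curious, the letter form is just additive arithmetic: each Hebrew letter has a numeric value, the thousands are customarily dropped (5765 becomes 765), and 765 = 400 + 300 + 60 + 5. Here is a small Python sketch of it. The `hebrew_year` name is made up, the letters are written as Unicode escapes to keep the listing free of right-to-left text, and years that are exact multiples of 1000 are not handled.

```python
# (value, letter) pairs from tav (400) down to alef (1).
HEBREW_VALUES = [
    (400, "\u05EA"), (300, "\u05E9"), (200, "\u05E8"), (100, "\u05E7"),
    (90, "\u05E6"), (80, "\u05E4"), (70, "\u05E2"), (60, "\u05E1"),
    (50, "\u05E0"), (40, "\u05DE"), (30, "\u05DC"), (20, "\u05DB"),
    (10, "\u05D9"), (9, "\u05D8"), (8, "\u05D7"), (7, "\u05D6"),
    (6, "\u05D5"), (5, "\u05D4"), (4, "\u05D3"), (3, "\u05D2"),
    (2, "\u05D1"), (1, "\u05D0"),
]
GERESH = "\u05F3"      # marks a single-letter number
GERSHAYIM = "\u05F4"   # goes before the last letter of a longer number


def hebrew_year(year):
    """Write a Hebrew calendar year in letters, dropping the thousands."""
    n = year % 1000
    letters = []
    for value, letter in HEBREW_VALUES:
        if n in (15, 16):
            # By convention 15 and 16 are written 9+6 and 9+7, not 10+5 and 10+6.
            letters.append("\u05D8")
            letters.append("\u05D5" if n == 15 else "\u05D6")
            n = 0
            break
        while n >= value:
            letters.append(letter)
            n -= value
    if not letters:
        return ""
    if len(letters) == 1:
        return letters[0] + GERESH
    return "".join(letters[:-1]) + GERSHAYIM + letters[-1]


print(hebrew_year(5765))
```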

Here is a nice comic picture I got (it says "Happy New Jewish Year" in Russian):

Well, GotDotNet seems to be down sometimes :). Just in case, here is an alternative download location for EXSLT.NET 1.1: http://www.xmland.net/exslt/EXSLT.NET-1.1.zip.

EXSLT.NET 1.1 released

Here we go again - I'm pleased to announce the EXSLT.NET 1.1 release. It's ready for download. The blurb goes here:

The EXSLT.NET library is a community-developed free open-source implementation of the EXSLT extensions to XSLT for the .NET platform. EXSLT.NET fully implements the following EXSLT modules: Dates and Times, Common, Math, Random, Regular Expressions, Sets and Strings. In addition, the EXSLT.NET library provides a proprietary set of useful extension functions.

Download EXSLT.NET 1.1 at the EXSLT.NET Workspace home - http://workspaces.gotdotnet.com/exslt
EXSLT.NET online documentation - http://www.xmland.net/exslt

EXSLT.NET Features:

  • 65 supported EXSLT extension functions
  • 13 proprietary extension functions
  • Support for XSLT multiple output via exsl:document extension element
  • Can be used not only in XSLT, but also in XPath-only environment
  • Implementation of set functions thoroughly optimized for speed

Here is what's new in this release:

  • New EXSLT extension functions have been implemented: str:encode-uri(), str:decode-uri(), random:random-sequence().
  • New EXSLT.NET extension functions have been implemented: dyn2:evaluate(), which evaluates a string as an XPath expression, and date2:day-name(), date2:day-abbreviation(), date2:month-name() and date2:month-abbreviation(), which are culture-aware versions of the corresponding EXSLT functions.
  • Support for time zones in date-time functions has been implemented.
  • A multithreading issue with the ExsltTransform class has been fixed. Now the ExsltTransform class is thread-safe for Transform() method calls, just like the System.Xml.Xsl.XslTransform class.
  • Lots of minor bugs have been fixed. See the EXSLT.NET bug tracker for more info.
  • We switched to Visual Studio .NET 2003, so building of the project has been greatly simplified.
  • Complete suite of NUnit tests for each extension function has been implemented (ExsltTest project).

Any comments and bug reports are welcome!

PS. Well, despite Dimitre's and my light side's objections I implemented dyn2:evaluate(). I know, I'm evil...

Nice one

From "Fallacies of Validation, version #3" by Roger L. Costello:

5. Fallacy of a Universal Validation Language

Dave Pawson identified this fallacy. He noted that the Atom specification
cannot be validated using a single technology:

> From [Atom, version] 0.3 onwards it's not been possible
> to validate an instance against a single schema, not
> even Relax NG. They need a mix of Schema and 'other'
> processing before being given a clean bill of health.

Aaron Skonnard on his The XML Files column in MSDN Magazine:

This pretty much says it all. In the beginning, my column focused almost exclusively on core XML topics such as XML namespaces, XPath, XSLT, MSXML, System.Xml, etc. Over the past few years, my focus has naturally shifted away from these topics towards emerging SO and Web services concepts. It's been a natural evolution, indicative of my work and interests. Hence, the new name is appropriate. Dare's XML Developer Center is where you should look for continued coverage on core XML topics and System.Xml.

Although I'm sad to let go of The XML Files, I'm excited about manning the Service Station.

EXSLT.NET progress

Lots of activity in the EXSLT.NET project recently. We implemented more functions such as random:random-sequence(), str:encode-uri() and str:decode-uri(). Lots of bugs have been fixed. Support for time zones in date-time functions has been implemented. We switched to Visual Studio .NET 2003, thus simplifying our custom build process. Currently I'm writing unit tests for each function (and we have about 80 functions already!). After I finish that I'm going to update the documentation and release EXSLT.NET 1.2, so stay tuned.

Btw, I was thinking about adding a simple function for dynamic XPath evaluation. Of course we have no chance of implementing dyn:evaluate() as an extension function in .NET, but we could provide a simplified proprietary version, e.g.

object dyn2:evaluate(node-set, string)
where the first argument is the context node and the second one is the XPath expression (as a string) to evaluate. This would allow building and evaluating XPath expressions on the fly, a feature XSLT 1.0 doesn't support and XSLT 2.0 won't. It's going to be limited of course - no variables, no keys etc., but anyway. Would you like to have such a function in EXSLT.NET?
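To show the idea outside of XSLT, here is a tiny Python sketch of "evaluate a string as a path expression against a context node". It is only an analogy, not the EXSLT.NET implementation: Python's xml.etree.ElementTree supports just a small subset of XPath, and the `evaluate` helper and the sample document are made up.

```python
import xml.etree.ElementTree as ET


def evaluate(context_node, expression):
    """Evaluate a path expression given as a string, relative to context_node."""
    # ElementTree understands only a limited XPath subset (child steps,
    # attribute predicates and a few more), which is enough for this sketch.
    return context_node.findall(expression)


# Hypothetical sample document.
doc = ET.fromstring(
    "<orders>"
    "<order status='open'><id>1</id></order>"
    "<order status='closed'><id>2</id></order>"
    "<order status='open'><id>3</id></order>"
    "</orders>"
)

# The expression is assembled at run time, which a static XPath cannot do.
wanted = "open"
expr = "order[@status='%s']/id" % wanted
ids = [e.text for e in evaluate(doc, expr)]
print(ids)
```

As with the proposed function, there are no variables or keys here: everything the expression needs has to be baked into the string.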

PS. I know, it smells provocative, but it should attract more users to the EXSLT.NET library.

Interesting post by Michael Kay on detecting cycles in graphs using XSLT and XQuery:

> I have XML data in the form of a graph (nodes, edges) and I
> need to check if
> any cycles exist in the data before I join the data together
> in one XML file.
>
> Can anyone point me to any resources to do this? Has anyone
> already done this in XQuery?
>

There is an example of how to do this in my book XSLT 2.0 Programmer's Reference, and the example translates directly into XQuery.

If you don't want to buy the book, the code (together with a "main program" that invokes it to look for cycles among the attribute sets in a stylesheet) is here:
Take a look at the stylesheet here. And now even more intriguing:
The book also shows how to generalize this so the code that looks for cycles is independent of the way that the nodes and arcs are implemented. Unfortunately this generalization relies on Dimitre Novatchev's technique for simulating higher-order functions, which is dependent on XSLT and won't work in XQuery.
Wow, I can't wait for the book to arrive. That's going to be the next one in my reading queue, regardless of priorities.
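While waiting, here is how I'd sketch the general idea in Python. To be clear, this is not Michael Kay's code from the book, just the standard depth-first search with a "currently being visited" mark. The `find_cycle` name is made up, and passing the `successors` function as an argument stands in for the higher-order generalization mentioned above: the search doesn't care how nodes and arcs are stored.

```python
def find_cycle(nodes, successors):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic.

    nodes      - iterable of all nodes in the graph
    successors - function mapping a node to the nodes it points to
    """
    in_progress = set()  # nodes on the current DFS path
    done = set()         # nodes fully explored, known to lead to no cycle
    path = []

    def visit(node):
        in_progress.add(node)
        path.append(node)
        for nxt in successors(node):
            if nxt in in_progress:
                # Back edge: the path from nxt to node, plus this edge, is a cycle.
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in nodes:
        if node not in done:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical example: attribute sets referring to each other.
uses = {"a": ["b"], "b": ["c"], "c": ["a"], "d": []}
print(find_cycle(uses, lambda n: uses.get(n, [])))
```

The recursion depth grows with the longest path, so for really big graphs an explicit stack would be safer.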

MovableType 3.1 is out

A new and long-awaited release of the MovableType blogging engine has been announced. New features of MT 3.1 include:

  • Dynamic pages - now it's possible to switch between generation of static pages and dynamic generation. Well, I'm going to stay with static pages anyway.
  • Subcategories
  • Post scheduling
  • Improved extensibility
  • Plugin pack, including of course MTBlacklist (a killer plugin, allowing to control comment spam easily)
The free version allows only 1 author and 3 weblogs.

Well, I'm not really sure I want to upgrade. I'm quite happy with my MT 2.66 + MTBlacklist installation.