Have you seen this thread in the microsoft.public.dotnet.xml newsgroup? A guy was trying to get a list of unique values from an XML document of 46000 records using the Muenchian grouping method. With MSXML4 it took 20 seconds, while in .NET 1.0 and 1.1 it effectively hung.
Well, as we all know, the Muenchian method is unfortunately dead slow in .NET. MSXML4 optimizes the generate-id($node1) = generate-id($node2) expression by comparing the nodes directly instead of generating and comparing ids. The .NET implementation isn't so sophisticated. The upcoming .NET 1.1 SP1 is going to make it faster, but what's today's solution?
Enter EXSLT.NET's set:distinct() extension function. Using it, the result was:
695 unique keys generated from about 46000 records in less than 2 seconds. Now that's really amazing. Ten times faster than MSXML4! And much more understandable - just compare these expressions:
set:distinct(atl_loads/atl_load/client_key)
and
atl_loads/atl_load/client_key[generate-id(.) = generate-id(key('client_key_lkp', .)[1])]
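To see the two expressions in context, here is a minimal stylesheet sketch that lists the unique keys both ways. The input structure (atl_loads/atl_load/client_key) comes from the expressions above, but the xsl:key definition, the output element names (unique_keys, exslt, muenchian, key) and the overall layout are my assumptions, not the stylesheet from the thread. The set prefix is bound to the standard EXSLT sets namespace, and the first variant only works on a processor that supports it, such as EXSLT.NET.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:set="http://exslt.org/sets"
    exclude-result-prefixes="set">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed key definition (not shown in the thread):
       index every client_key element by its string value. -->
  <xsl:key name="client_key_lkp" match="client_key" use="."/>

  <xsl:template match="/">
    <unique_keys>

      <!-- EXSLT variant: set:distinct() keeps one node per distinct string value. -->
      <exslt>
        <xsl:for-each select="set:distinct(atl_loads/atl_load/client_key)">
          <key><xsl:value-of select="."/></key>
        </xsl:for-each>
      </exslt>

      <!-- Muenchian variant: keep a client_key only if it is the first node
           the key returns for its own value. -->
      <muenchian>
        <xsl:for-each select="atl_loads/atl_load/client_key[generate-id(.) = generate-id(key('client_key_lkp', .)[1])]">
          <key><xsl:value-of select="."/></key>
        </xsl:for-each>
      </muenchian>

    </unique_keys>
  </xsl:template>

</xsl:stylesheet>
```

Both for-each loops should produce the same list of keys. Note that inside the Muenchian predicate the context node is already the client_key element, so the key lookup takes "." as its second argument.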
Special kudos to Dimitre Novatchev for optimizing EXSLT.NET set functions.