July 2004 Archives

I just got several instances of what I believe is another resourceful form of blog comment spam. It looked like ordinary spam that had somehow made it through the MT-Blacklist system I've got installed, and after seeing "Name: free government grants" I was already clicking on the "De-spam using MT-Blacklist" link when I realized that the domain name to be banned was "journalism.nyu.edu". Hmmm, free government grants on an nyu.edu site? Wait a minute!

And yes, that wasn't a joke. The linked page at journalism.nyu.edu is a very serious political blog rant with lots of comments and, sure enough, a "free government grants" comment among them! So here is how I think it works: they post an evil spam comment to a trustworthy blog B. If it doesn't get cleaned up soon, chances are high that it will stay in the archives for a long time, so they start spreading more spam comments that link to the infected page at blog B.

The bad thing is that to ban such spam you have to ban the (trustworthy) site B, which might even be your friend's site. Ergo: clean your blogs, guys, and don't keep spam comments in the archives.

This is a small trick for newbies looking for a way to get the URI of the source XML and of the stylesheet from within an XSLT stylesheet.

As a matter of interest - how would you implement breadth-first tree traversal in XSLT? The traditional algorithm is based on a queue and hence isn't particularly suitable here. It's probably feasible to emulate a queue with temporary trees, but I think that's going to be quite inefficient. Being a declarative rather than procedural language, XSLT needs a different approach. Here is what I came up with:
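
The solution itself is not included in this excerpt. As a rough illustration of one queue-free approach (an assumption, not necessarily the one meant above), the tree can be processed level by level: handle all nodes of the current level, then recurse on all of their children taken together. In XSLT terms this would roughly correspond to a recursive named template that receives a node-set and calls itself with `$nodes/*`. The sketch below shows the idea in Python; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def breadth_first(level):
    """Return element names in breadth-first order without using a queue.

    `level` is the list of all elements at the current depth. The function
    emits them in order, then recurses on the combined list of their children.
    """
    if not level:
        return []
    # Collect the children of every node on this level, in document order.
    next_level = [child for node in level for child in node]
    return [node.tag for node in level] + breadth_first(next_level)


# Hypothetical sample tree: a has children b and c; b has d and e; c has f.
doc = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")
order = breadth_first([doc])
print(order)
```

The recursion depth equals the depth of the tree, not the number of nodes, so this stays cheap for typical documents.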

SgmlReader and namespaces

It's obvious, but I didn't realize it till recently - Chris Lovett's SgmlReader doesn't support namespaces. Why? SgmlReader is an SGML reader in the first place and, you know, there are no namespaces in SGML. So whenever you want to cheat and process malformed XML with SgmlReader - beware of namespaces.

Sometimes we want to narrow the encoding of an output XML document while preserving data fidelity. E.g. you transform some XML source with arbitrary Unicode text into another format and you need the resulting XML document to be ASCII encoded (don't ask me why). Here is a fast and simple solution for that problem.
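
The solution is not included in this excerpt. One common technique, shown here as a hedged Python sketch rather than the approach referred to above, is to let the serializer write every character that does not fit the target encoding as a numeric character reference. Python's `xml.etree.ElementTree` does this when asked for `us-ascii` output; the sample text is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document with non-ASCII text (e with acute, Greek omega).
root = ET.fromstring("<word>caf\u00e9 \u03a9</word>")

# Serializing as us-ascii replaces each non-ASCII character with a
# numeric character reference such as &#233;, so no data is lost.
ascii_bytes = ET.tostring(root, encoding="us-ascii")
print(ascii_bytes)

# Parsing the ASCII-only output gives back the original text.
round_trip = ET.fromstring(ascii_bytes)
```

Character references only work in text and attribute values; element or attribute names with non-ASCII characters cannot be escaped this way.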

Justification of XHTML

W3C has published the "HTML and XHTML FAQ" document: "Why is XHTML needed? Isn't HTML good enough?", "What are the advantages of using XHTML rather than HTML?". A rather interesting refresher WRT the recent discussion on the xml-dev list.

Small but cool

Isn't it cool to have a small personal page at microsoft.com? :)

Every MVP got one recently. Here is mine (aka http://aspnet2.com/mvp.ashx?olegt). And here is the XML MVPs gang.

XML Schema 1.1, First Working Draft

Oh boy!

2004-07-19: The XML Schema Working Group has released the First Public Working Draft of XML Schema 1.1 in two parts: Part 1: Structures and Part 2: Datatypes. The drafts include change logs from the XML Schema 1.0 language and are based on version 1.1 requirements. XML schemas define shared markup vocabularies, the structure of XML documents which use those vocabularies, and provide hooks to associate semantics with them.

The main goals are to simplify the language and to add support for versioning. Read the comprehensive review by Elliotte Rusty Harold at cafeconleche.org.

Isn't it cool:

A visitor to your weblog Signs on the Sand has automatically been banned by posting more than the allowed number of comments in the last 200 seconds. This has been done to prevent a malicious script from overwhelming your weblog with comments. The banned IP address is

67.30.130.142

If this was a mistake, you can unblock the IP address and allow the visitor to post again by logging in to your Movable Type installation, going to Weblog Config - IP Banning, and deleting the IP address 67.30.130.142 from the list of banned addresses.
--
Powered by Movable Type Version 2.661
http://www.movabletype.org/

For sure it's a must for any blogging engine nowadays.
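
The rule described in the notification can be sketched as a sliding-window counter per IP address. This is only an illustration and not Movable Type's actual code: the 200-second window comes from the message above, while the class name, the comment limit and the in-memory storage are assumptions.

```python
from collections import defaultdict, deque


class CommentThrottle:
    """Ban an IP that posts more than `max_comments` within `window_seconds`."""

    def __init__(self, max_comments=8, window_seconds=200):
        # max_comments=8 is an assumed default; only the window is from the post.
        self.max_comments = max_comments
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # ip -> timestamps of recent comments
        self.banned = set()

    def allow(self, ip, now):
        """Return True if a comment from `ip` at time `now` (seconds) is accepted."""
        if ip in self.banned:
            return False
        timestamps = self.recent[ip]
        # Forget comments that have fallen out of the time window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        timestamps.append(now)
        if len(timestamps) > self.max_comments:
            self.banned.add(ip)
            return False
        return True


throttle = CommentThrottle(max_comments=3, window_seconds=200)
results = [throttle.allow("67.30.130.142", t) for t in (0, 10, 20, 30, 500)]
print(results)
```

Once banned, the address stays banned until it is removed from the set, which mirrors the manual unblocking step in the notification.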

The USPTO did it again. The fun goes on. Now Oracle has been granted a patent on CMS. Patent 6,745,238 says:

The web site system permits a site administrator to construct the overall structure, design and style of the web site. This allows for a comprehensive design as well as a common look and feel for the web site. The web site system permits content for the web site to originate from multiple content contributors. The publication of content is controlled by content owners. This permits assignment of content control to those persons familiar with the content.

Is it sane actually?

SchemaCOP is coming?

Gudge writes:

On my team we have a bunch of guidelines for writing XML Schema documents. For a while we've been checking schema against the guidelines. Unfortunately the implementation of the checker was in wetware, rather than software. Recently, I found an hour or two to put together a software implementation of a SchemaCOP which, given a schema will dump out a report telling you where you've stepped outside the guidelines.
That would be a very useful tool, really. I'm looking forward to seeing it.

And this is even more cool:

One of the satisfying pieces of writing the code was that I was able to do it all in XSLT. I love this language, it makes hard things easy ( and easy things hard :-) )
I tend to agree with the last assertion. I think knowing XSLT well means first of all having a gut feeling of these easy2hard spots and avoiding them at the design stage. As in any other language after all.

This is an interesting one:

The XML Schema Working Group has released a revised Working Draft of XML Schema: Component Designators. The document defines a scheme for identifying the XML Schema components specified by the XML Schema Recommendation Part 1 and Part 2.

The idea is to be able to address components of an XML Schema, just as we can address parts of an XML document with XPath or XPointer. Syntactically, an absolute schema component designator is a URI whose main part is the URI of a schema document and whose fragment identifier is an XPointer pointer conforming to the newly proposed xscd() XPointer scheme. The syntax is obviously XPath-like.

Potentially addressable XML Schema components are:
{type definitions}
{attribute declarations}
{element declarations}
{attribute group definitions}
{model group definitions}
{notation declarations}
{identity constraint definitions}
{facets}
{fundamental facets}
{member type definitions}
{attribute uses}
{particles}
{annotations}
etc.

Examples:
schema-URI#xscd(/type(purchaseOrderType))
schema-URI#xscd(/type(Items)/item/productName)
or even schema-URI#xscd(/type(Items)/item/quantity/type()/facet(maxExclusive)).
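
To make the structure of these designators concrete, here is a naive Python sketch that splits one into the schema document URI and the steps inside the xscd() pointer. It is not a conforming parser for the draft's grammar (it ignores namespace-qualified names and any escaping, for instance), and the example.com URI and function name are hypothetical.

```python
import re
from urllib.parse import urldefrag


def split_designator(designator):
    """Split 'schema-URI#xscd(/step/step...)' into (schema URI, list of steps)."""
    schema_uri, fragment = urldefrag(designator)
    match = re.fullmatch(r"xscd\((.*)\)", fragment)
    if match is None:
        raise ValueError("fragment does not use the xscd() scheme")
    # Naive split on '/'; good enough for the simple examples above.
    steps = [step for step in match.group(1).split("/") if step]
    return schema_uri, steps


uri, steps = split_designator(
    "http://example.com/po.xsd"
    "#xscd(/type(Items)/item/quantity/type()/facet(maxExclusive))"
)
print(uri)
print(steps)
```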

Good idea, isn't it? Obviously the core question is - why not just use XPath, since a schema is just an XML document after all? Actually, it looks like the two are not comparable. AFAIK it's also one of the first (after XInclude of course) real applications of XPointer.

Ok, this is not a new one, but just for those who somehow missed it (just like me).
A cool puzzle to solve: { First 10 digit prime in consecutive digits of e }.com

How much time does it take for you to crack it? My full time is about an hour (I'm not so good on sequences apparently).

PS. Try not to google for hints.
PPS. Please no spoilers in comments.

Antenna House has released the first lite version of their famous XSL Formatter (XSL-FO to PDF). It's much cheaper than the full version (only $300 for the Windows version), but has some limitations that are a bit annoying (at least for me):

Total page number of the formatted pages are limited to 300. The watermark that shows the limited version is displayed on the back ground and the URL of our Website is displayed at the bottom of the pages which exceed 300.
Arabic, Hebrew and Thai are not supported. The formatted result is not correct.
The auto layout of the table is not supported. table-layout="auto" is invalid.
Anyway, free evaluation version, support for .NET - not bad.

Tricky XSLT optimization

Rick Jelliffe writes:

Perhaps some tricky implementation of XSLT could figure out if a stylesheet is streamable and switch to a streaming strategy.
That would be a rather effective optimization indeed. But how could that be implemented in an XSLT/XQuery processor? Obviously full-blown stylesheet analysis would be feasible only with schema information available (that means XSLT 2.0/XQuery 1.0), but even without it it's still easy to detect some common streaming-friendly cases, such as:

1. Renaming elements or changing namespaces, e.g.:
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="foo">
    <bar>
      <xsl:apply-templates select="@*|node()"/>
    </bar>
  </xsl:template>
</xsl:stylesheet>
It's easy to see that the stylesheet consists of an identity transformation and a template for the "foo" element, which actually replaces "foo" with "bar". This is detectable and could be done more efficiently with XmlReader or an XmlReader/XmlWriter pipeline.
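
To illustrate what such a streaming replacement could look like, here is a sketch in Python that uses a SAX filter in place of the .NET XmlReader/XmlWriter pair. It is an analogy under stated assumptions, not how any XSLT processor actually does it: the class and function names are made up, and namespaces are ignored.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameFilter(XMLFilterBase):
    """Pass all events through unchanged, renaming one element on the fly."""

    def __init__(self, parent, old_name, new_name):
        super().__init__(parent)
        self._old_name = old_name
        self._new_name = new_name

    def _map(self, name):
        return self._new_name if name == self._old_name else name

    def startElement(self, name, attrs):
        super().startElement(self._map(name), attrs)

    def endElement(self, name):
        super().endElement(self._map(name))


def rename_stream(xml_text, old_name, new_name):
    """Stream the input through the filter; no tree is built in memory."""
    out = io.StringIO()
    reader = RenameFilter(xml.sax.make_parser(), old_name, new_name)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


renamed = rename_stream('<root><foo id="1">x</foo><baz/></root>', "foo", "bar")
print(renamed)
```

Every event is forwarded as soon as it arrives, so memory use does not grow with the size of the document.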

2. Translating attributes to elements or similar, e.g.
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="foo">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:element name="{name()}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
What the above stylesheet does is also detectable, and it could instead be implemented internally with only XmlReader or an XmlReader/XmlWriter pipeline.
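
The same kind of streaming sketch works for this case. Again this is Python SAX standing in for XmlReader/XmlWriter, with hypothetical names and no namespace handling: when the target element starts, it is written without attributes and each attribute is emitted as a child element before the original content.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class AttributesToElements(XMLFilterBase):
    """Turn the attributes of one element into child elements while streaming."""

    def __init__(self, parent, target):
        super().__init__(parent)
        self._target = target

    def startElement(self, name, attrs):
        if name != self._target:
            super().startElement(name, attrs)
            return
        no_attrs = AttributesImpl({})
        super().startElement(name, no_attrs)
        # Emit <attr-name>value</attr-name> for every attribute.
        for attr_name, value in attrs.items():
            super().startElement(attr_name, no_attrs)
            super().characters(value)
            super().endElement(attr_name)


def attributes_to_elements(xml_text, target):
    out = io.StringIO()
    reader = AttributesToElements(xml.sax.make_parser(), target)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


converted = attributes_to_elements('<root n="0"><foo a="1" b="2">text</foo></root>', "foo")
print(converted)
```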

3. Pretty-printing using XSLT - a frequent case, easily detectable - an ideal candidate for optimization. Just stream the input through XmlTextWriter internally.

4. Adding root element or adding header/footer - ditto.

5. Changing PIs in the prolog (<?xml-stylesheet>).

6. What else?

Obviously, to gain anything from implementing all of the above, the XSLT processor should be given a plain Stream/TextReader/XmlReader as input, not an already-in-memory XML store.

VSIP SDK 2005 Beta 1 released

Oh boy, what a month. Here is another juicy release I wish I had the free time to dig into: VSIP SDK 2005 Beta 1.

Visual Studio 2005 Beta 1 is available for MSDN subscribers. And as ordinary ISO CD images, not a 2.7 GB bundle. Let's make some good traffic today!

Tired of spam

I'm tired of comment spam... It has reached the level of 15-30 spam instances a day, and I finally decided to install the MT-Blacklist plugin for my blogging engine. Five minutes of installation, updating the blacklist, deep de-spamming and that's it, I'm clean and protected. Well done, Jay Allen! Hope it's gonna help. Anyway, if you are not a spammer and your comment has been refused, don't hesitate to mail me about it.

Cool news from the XML Editor Team (announced by Chris Lovett):


Announcing: New XML Editor in Visual Studio 2005 Beta 1

Visual Studio 2005 Beta 1 contains a completely new XML Editor, built on top of the core text editor provided by Visual Studio. It is entirely written in C# and leverages all the cool stuff provided by the System.Xml .NET assembly. The new XML editor provides support for editing XML and DTD content, including special support for XSD and XSL. It contains the following handy features:

* Full syntax coloring for all XML and DTD syntax.
* Well formedness checking while you type, with red squiggles and error list.
* Intellisense based on any DTD, XDR and XSD schemas.
* Validation-while-you-type with blue squiggles and error list.
* Auto-completion of namespace declarations, end tags and attribute value quotes.
* Support for xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes.
* Schema picker dialog for overriding schemas used for validation, which is then remembered as a document property in your solution.
* Schema cache for commonly used schemas with standard set provided out of the box. You can easily add your own schemas here or edit the existing ones to constantly improve your XML editing experience.
* Smart Formatter that is more than a pretty printer. It honors any formatting of attributes that you may have done by hand and it fixes up the most common mistakes people make in XML, like unquoted attribute values.
* Smart indenting based on XML element depth.
* Inline expand/collapse support.
* Easy navigation between start and end tags using brace matching command (Ctrl+]) .
* Brace highlighting so you see which tags are being closed as you type.
* Goto Definition command for navigating between elements and their associated DTD, XDR or XSD schema definitions. This command can also navigate from an entity reference to the entity definition in the DTD.
* Tool tips that popup showing xsd:annotations for the element or attribute under the mouse.
* XSL and XSD compilation errors while you type, providing even more error checking than can be represented in the schemas alone.
* Show XSLT Output command available on any XML or XSLT file.

XSD Schema Inference

The editor provides a handy command named "Create Schema" which does one of three things:

1. Convert associated DTD to XSD
2. Convert associated XDR schema to XSD
3. Infer a schema from the XML

This is by far the easiest way to get started with designing an XSD schema.

XSLT Debugging

In non-Express SKU's only, this feature gives you a powerful XSLT debugger, fully integrated into the overall Visual Studio debugging experience so you can step from C# code directly into the XSLT transform itself and back out, or from XSLT out to extension objects and back. It also provides a "Debug XSL" command on XML editor toolbar to start debugging directly from XML or XSL file.

Once debugging has started the standard Visual Studio debugging menu is available including special support for the following:

  Setting and clearing breakpoints, at the node level (as opposed to line level).

  Locals window that shows XSLT variables and parameters that are in scope.

  Call Stack window that shows XSLT template stack.

Deep VS Integration & Extensibility

All the advanced core text editor commands and configurability is available, for example:

o Fully configurable colors using standard Tools/Options/Environment/Fonts and Colors property page.
o Fully integrated text editor settings (Tools/Options/Text Editor/XML) for general, tabs and miscellaneous settings.
o Support for the new Visual Studio 2005 "Import/Export Settings" feature.

Support for multiple-views over the same buffer. In Visual Studio 2003, the XSD designer and grid views were only available from a tab at the bottom of the document window, which means you could not view both ways at the same time. This limitation has been removed, and each different view is now a full fledged document window.

Custom XML designers can also be registered per file extension and/or XML namespace URI, which is how the Visual Studio XSD designer, DataSet designer, and the Grid View are associated with the XML editor. Anyone can now register an XML designer for a given namespace and the XML editor will automatically provide a View Designer menu item for invoking that designer. In fact if you are planning a custom XML designer, I'd love to chat about integration with this new XML text editor.

All I can say is "finally!"