Architecture of the World Wide Web, Volume One


The W3C has finally published "Architecture of the World Wide Web, Volume One" as a W3C Recommendation. It was hammered out in long, heated discussions among Web heavyweights and geeks. Here is what it's about:

This document describes the properties we desire of the Web and the design choices that have been made to achieve them. It promotes the reuse of existing standards when suitable, and gives guidance on how to innovate in a manner consistent with Web architecture.
That's a must-read for all developers working with the Web, XML and URIs. We can make the Web a better place by following the principles, constraints and good practices defined in that document.

It's 47 printed pages and I haven't had time to read it thoroughly yet, but I have skimmed the XML-related parts. Finally, there are some normative answers to some long-debated questions.

Binary vs Text data formats:

The trade-offs between binary and textual data formats are complex and application-dependent. Binary formats can be substantially more compact, particularly for complex pointer-rich data structures. Also, they can be consumed more rapidly by agents in those cases where they can be loaded into memory and used with little or no conversion. Note, however, that such cases are relatively uncommon as such direct use may open the door to security issues that can only practically be addressed by examining every aspect of the data structure in detail.

Textual formats are usually more portable and interoperable. Textual formats also have the considerable advantage that they can be directly read by human beings (and understood, given sufficient documentation). This can simplify the tasks of creating and maintaining software, and allow the direct intervention of humans in the processing chain without recourse to tools more complex than the ubiquitous text editor. Finally, it simplifies the necessary human task of learning about new data formats; this is called the "view source" effect.

It is important to emphasize that intuition as to such matters as data size and processing speed is not a reliable guide in data format design; quantitative studies are essential to a correct understanding of the trade-offs. Therefore, designers of a data format specification should make a considered choice between binary and textual format design.
Oh yeah, well said.
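Since the Recommendation insists on measuring rather than guessing, here is a minimal Python sketch of such a measurement. It assumes a toy data set of (id, value) pairs, a made-up binary record layout (little-endian int32 plus float64) and made-up element names (points, p); none of that comes from the Recommendation. A real study would use real data and would time parsing as well, not just count bytes.

```python
import struct
import zlib
import xml.etree.ElementTree as ET

# Hypothetical binary layout: little-endian int32 id + float64 value = 12 bytes.
RECORD = struct.Struct("<id")

def encode_binary(records):
    return b"".join(RECORD.pack(i, v) for i, v in records)

def decode_binary(data):
    return list(RECORD.iter_unpack(data))

def encode_text(records):
    # Hypothetical XML vocabulary: <points><p id="..." v="..."/>...</points>
    root = ET.Element("points")
    for i, v in records:
        ET.SubElement(root, "p", id=str(i), v=repr(v))
    return ET.tostring(root, encoding="utf-8")

def decode_text(data):
    root = ET.fromstring(data)
    return [(int(p.get("id")), float(p.get("v"))) for p in root.iter("p")]

records = [(i, i * 0.5) for i in range(100)]
for name, blob in (("binary", encode_binary(records)), ("XML", encode_text(records))):
    # Report raw and zlib-compressed sizes side by side.
    print(name, len(blob), "raw bytes,", len(zlib.compress(blob)), "compressed bytes")
```

Both encodings round-trip the same data; the numbers printed are only meaningful for the data actually measured, which is exactly the Recommendation's point.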

When to use XML:

XML defines textual data formats that are naturally suited to describing data objects which are hierarchical and processed in a chosen sequence. It is widely, but not universally, applicable for data formats; an audio or video format, for example, is unlikely to be well suited to expression in XML. Design constraints that would suggest the use of XML include:

1. Requirement for a hierarchical structure.
2. Need for a wide range of tools on a variety of platforms.
3. Need for data that can outlive the applications that currently process it.
4. Ability to support internationalization in a self-describing way that makes confusion over coding options unlikely.
5. Early detection of encoding errors with no requirement to "work around" such errors.
6. A high proportion of human-readable textual content.
7. Potential composition of the data format with other XML-encoded formats.
8. Desire for data easily parsed by both humans and machines.
9. Desire for vocabularies that can be invented in a distributed manner and combined flexibly.
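Items 4 and 5 are easy to see with any conforming parser. A small sketch using Python's standard xml.etree.ElementTree (the sample documents are made up): the XML declaration tells the parser how the bytes are encoded, and malformed input is rejected instead of being "worked around".

```python
import xml.etree.ElementTree as ET

# Item 4: the document itself declares its encoding, so the parser decodes 0xE9 as "é".
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>Caf\xe9</name>'
print(ET.fromstring(latin1_doc).text)  # Café

def check(data):
    """Return None if data is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None

# Item 5: with no declaration the parser assumes UTF-8, so the lone 0xE9 byte is an error.
print(check(b"<name>Caf\xe9</name>"))
# Mismatched tags are reported as errors too, never silently repaired.
print(check(b"<a><b></a>"))
```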

On linking in XML:

Designers of XML-based formats may consider using XLink and, for defining fragment identifier syntax, using the XPointer framework and XPointer element() Schemes.
Note the "may". It means "we'd like to see somebody, anybody, using XLink, though we admit it's not that good." It's still an open issue.
XLink is not the only linking design that has been proposed for XML, nor is it universally accepted as a good design.
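To make the XLink plus XPointer combination concrete, here is a Python sketch that follows an XLink simple link whose xlink:href carries an element() child sequence such as element(/1/2/2), where each step is a 1-based index among element children. It is deliberately partial: it handles a single pure child-sequence pointer only (not the ID-based form or multiple pointer parts), and the in-memory documents map and the sample files are hypothetical stand-ins for real retrieval.

```python
import re
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
CHILD_SEQUENCE = re.compile(r"element\((/\d+(?:/\d+)*)\)")

def resolve_child_sequence(root, pointer):
    """Resolve a pure child-sequence pointer such as 'element(/1/2/2)'.

    The ID-based form, e.g. 'element(intro/3)', is not handled: it needs
    to know which attributes are IDs.
    """
    match = CHILD_SEQUENCE.fullmatch(pointer.strip())
    if match is None:
        raise ValueError("unsupported pointer: " + pointer)
    steps = [int(s) for s in match.group(1).strip("/").split("/")]
    # The first step picks the document element, and a document has exactly one.
    if steps[0] != 1:
        return None
    node = root
    for n in steps[1:]:
        # With the default parser, ElementTree children are elements only
        # (comments and PIs are dropped, text is not a child).
        children = list(node)
        if not 1 <= n <= len(children):
            return None
        node = children[n - 1]
    return node

def follow_simple_link(link, documents):
    """Follow a simple link of the form xlink:href="name#element(...)".

    `documents` is a hypothetical map from name to parsed root element.
    """
    href = link.get("{%s}href" % XLINK)
    if href is None:
        return None
    name, _, fragment = href.partition("#")
    target = documents[name]
    return resolve_child_sequence(target, fragment) if fragment else target

book = ET.fromstring("<book><title>T</title><chapter><p>a</p><p>b</p></chapter></book>")
ref = ET.fromstring(
    '<ref xmlns:xlink="http://www.w3.org/1999/xlink"'
    ' xlink:type="simple" xlink:href="book.xml#element(/1/2/2)"/>'
)
print(follow_simple_link(ref, {"book.xml": book}).text)  # b
```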

On our favorite nightmare, XML namespaces: it's a perennial issue and the section is too long to quote, so go read it. Here is a part related to the misunderstanding Dare was writing about:

Attributes are always scoped by the element on which they appear. An attribute that is "global," that is, one that might meaningfully appear on elements of many types, including elements in other namespaces, should be explicitly placed in a namespace. Local attributes, ones associated with only a particular element type, need not be included in a namespace since their meaning will always be clear from the context provided by that element.
The type attribute from the W3C XML Schema Instance namespace "http://www.w3.org/2001/XMLSchema-instance" ([XMLSCHEMA], section 4.3.2) is an example of a global attribute. It can be used by authors of any vocabulary to make an assertion in instance data about the type of the element on which it appears. As a global attribute, it must always be qualified. The frame attribute on an HTML table is an example of a local attribute. There is no value in placing that attribute in a namespace since the attribute is unlikely to be useful on an element other than an HTML table.
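A quick way to see the local vs. global distinction in practice is to parse a small sample with Python's xml.etree.ElementTree. The document is made up (an XHTML table carrying both attributes, with a hypothetical xsi:type value myTableType): the default namespace applies to the element name, while the unprefixed frame attribute stays in no namespace, and the qualified xsi:type keeps its namespace.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

table = ET.fromstring(
    '<table xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' frame="box" xsi:type="myTableType"/>'
)

# The default namespace applies to the element name...
print(table.tag)  # {http://www.w3.org/1999/xhtml}table
# ...but not to the unprefixed local attribute: it is in no namespace at all.
print(table.get("frame"))  # box
print(table.get("{%s}frame" % XHTML))  # None
# The global xsi:type attribute is qualified, so it carries its namespace.
print(table.get("{%s}type" % XSI))  # myTableType
```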

And here are some new definitions for a hotly debated topic:

Another benefit of using URIs to build XML namespaces is that the namespace URI can be used to identify an information resource that contains useful information, machine-usable and/or human-usable, about terms in the namespace. This type of information resource is called a namespace document. When a namespace URI owner provides a namespace document, it is authoritative for the namespace.

There are many reasons to provide a namespace document. A person might want to:

- understand the purpose of the namespace,
- learn how to use the markup vocabulary in the namespace,
- find out who controls it and associated policies,
- request authority to access schemas or collateral material about it, or
- report a bug or situation that could be considered an error in some collateral material.

A processor might want to:

- retrieve a schema, for validation,
- retrieve a style sheet, for presentation, or
- retrieve ontologies, for making inferences.

In general, there is no established best practice for creating representations of a namespace document; application expectations will influence what data format or formats are used. Application expectations will also influence whether relevant information appears directly in a representation or is referenced from it.
Well, I'm not sure I fully agree with this practice, but at least it sounds reasonable and clear.
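Since there is no established format, here is only a sketch of the "processor wants to retrieve a schema" case, under loud assumptions: the namespace-document vocabulary (http://example.org/nsdoc, with doc, p and resource elements) is hypothetical, loosely in the spirit of RDDL, and typing links with xlink:role set to the target language's namespace URI is my choice for the example. A real processor would first dereference the namespace URI (e.g. with urllib.request.urlopen); here the document is an inline string so the sketch runs offline.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
NSDOC = "http://example.org/nsdoc"  # hypothetical namespace-document vocabulary
XSD_NATURE = "http://www.w3.org/2001/XMLSchema"
XSLT_NATURE = "http://www.w3.org/1999/XSL/Transform"

# A hypothetical namespace document: prose for people, typed links for programs.
NAMESPACE_DOC = """\
<doc xmlns="http://example.org/nsdoc" xmlns:xlink="http://www.w3.org/1999/xlink">
  <p>The purpose of this vocabulary, who maintains it, and where to report bugs.</p>
  <resource xlink:type="simple" xlink:role="http://www.w3.org/2001/XMLSchema"
            xlink:href="schema.xsd"/>
  <resource xlink:type="simple" xlink:role="http://www.w3.org/1999/XSL/Transform"
            xlink:href="present.xsl"/>
</doc>"""

def find_resources(namespace_doc, nature):
    """Return the href of every resource entry whose xlink:role equals nature."""
    return [
        res.get("{%s}href" % XLINK)
        for res in namespace_doc.iter("{%s}resource" % NSDOC)
        if res.get("{%s}role" % XLINK) == nature
    ]

doc = ET.fromstring(NAMESPACE_DOC)
print(find_resources(doc, XSD_NATURE))   # ['schema.xsd'] -> for validation
print(find_resources(doc, XSLT_NATURE))  # ['present.xsl'] -> for presentation
```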

On the QNames-in-content problem:

Do not allow both QNames and URIs in attribute values or element content where they are indistinguishable.
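Why "indistinguishable"? Plenty of strings are lexically both a QName and an absolute URI. A Python sketch with deliberately simplified, ASCII-only approximations of the two grammars (NCName-based QName, and the scheme-colon start of an absolute URI), so treat the regular expressions as illustrations rather than the real productions:

```python
import re

# Simplified approximations, not the full grammars from the specs.
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
QNAME_RE = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")
ABSOLUTE_URI_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:\S*")

def readings(value):
    """Return the set of interpretations a lexical value admits."""
    result = set()
    if QNAME_RE.fullmatch(value):
        result.add("QName")
    if ABSOLUTE_URI_RE.fullmatch(value):
        result.add("URI")
    return result

for v in ("xsd:string", "mailto:joe", "http://example.org/", "isbn:0451450523"):
    print(v, sorted(readings(v)))
```

Here "xsd:string" and "mailto:joe" match both patterns, so an attribute that accepts either cannot tell them apart; "http://example.org/" and "isbn:0451450523" can only be URIs.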

The XML ID problem is still not solved.

Media types for XML:

In general, a representation provider SHOULD NOT assign Internet media types beginning with "text/" to XML representations.
Read that again. Use what RFC 3023 says: "application/xml" and all that jazz with the "+xml" suffix (e.g. "image/svg+xml"). Also:
In general, a representation provider SHOULD NOT specify the character encoding for XML data in protocol headers since the data is self-describing.
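Both SHOULD NOTs fit in a few lines on the server side. A minimal Python sketch (the function names are mine): it recognizes the common RFC 3023 XML media types (application/xml, text/xml, and the "+xml" suffix, leaving out rarer ones such as application/xml-dtd), refuses "text/" types when producing XML, and emits no charset parameter, leaving the encoding to the XML declaration or byte order mark.

```python
def is_xml_media_type(content_type):
    """True for application/xml, text/xml, or any type with a +xml suffix."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in ("application/xml", "text/xml") or mime.endswith("+xml")

def xml_response_headers(media_type="application/xml"):
    """Build headers for an XML representation, following the two SHOULD NOTs."""
    mime = media_type.split(";", 1)[0].strip()
    if mime.lower().startswith("text/"):
        raise ValueError("avoid text/* media types for XML: " + mime)
    if not is_xml_media_type(mime):
        raise ValueError("not an XML media type: " + mime)
    # Parameters (including charset) are dropped: the XML declares its own encoding.
    return {"Content-Type": mime}

print(xml_response_headers())                                # {'Content-Type': 'application/xml'}
print(xml_response_headers("image/svg+xml; charset=utf-8"))  # {'Content-Type': 'image/svg+xml'}
```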

So there's lots of cool stuff to read and follow.
