Why is XML case-sensitive?

Sriram Krishnan asks a strange question:

I see someone flaming someone else for not being XHTML compliant. Tim Bray - if you're reading this, I want to know something. Why is XML case-sensitive? No human-being ever thinks in case-sensitive terms. A is a. End of story. So now, I have a situation where writing <html> </HTML> wouldn't be XHTML compliant. And what do I get out of XHTML apart from geek-bragging rights and this strange idea of 'standards-compliance'? Does it give me more freedom? Does it help my viewers? My customers?

Well, this guy is definitely heavily contaminated by sloppy HTML. What? <html> </HTML> isn't XHTML compliant? Thank God! Anyway, Tim Bray does answer his question:

XML markup is case-sensitive because the cost of monocasing in Unicode is horrible, horrible, horrible. Go look at the source code in your local java or .Net library.

Also, not only is it expensive, it's just weird. The upper-case of é is different in France and Quebec, and the lower-case of 'I' is different here and in Turkey.

XML was monocase until quite late in its design, when we ran across this ugliness. I had a Java-language processor called Lark - the world's first - and when XML went case-sensitive, I got a factor of three performance improvement, it was all being spent in toLowerCase(). -Tim

Nice.
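
To get a feel for the weirdness Tim Bray describes, here is a small Python sketch. It only uses Python's built-in, locale-independent case mappings, so it is an illustration of why monocasing is not a simple one-to-one table lookup, not a reconstruction of what Lark or any XML parser actually did. The names `samples` and `same_ignoring_case` are made up for this example.

```python
# Illustrative only: Python's built-in Unicode case mappings (no locale tailoring).
samples = {
    "sharp_s_upper": "ß".upper(),        # one character becomes two
    "dotless_i_upper": "ı".upper(),      # Turkish dotless i maps to plain I
    "round_trip": "ı".upper().lower(),   # ...and does not come back as dotless i
    "dotted_I_lower": "İ".lower(),       # yields 'i' plus a combining dot above
}


def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison using full case folding, still without locale rules."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    for name, value in samples.items():
        # Show the code points so the length changes are visible.
        print(name, [hex(ord(c)) for c in value])
    print(same_ignoring_case("Straße", "STRASSE"))
```

Even this locale-free version changes string lengths and loses information on a round trip. A Turkish-aware mapping would have to behave differently again, which is the kind of cost a parser pays on every name comparison if markup is case-insensitive.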

9 Comments

Anonymous | September 29, 2009 10:34 AM | Reply

> At the turn of the 19th century, when ASCII was defined, after IBM's EBCDIC format...

Development on ASCII began in 1960. EBCDIC development began afterwards, in 1963. And both of those were in the latter half of the 20th century. As a point of reference, Unix development began in 1969 (same decade as ASCII and EBCDIC).

James replied to comment from Thomas | June 29, 2009 8:48 AM | Reply

Compare your incorrect sentence:
"Help my uncle Jack off a horse"
to the correct version:
"Help my uncle, Jack, off a horse"

and you'll realize you don't know what you're talking about.

john replied to comment from gstangler | June 23, 2009 9:07 AM | Reply

gstangler said:

> Unicode is incapable of making this mapping in a single clock.

This is actually not true. English text encoded in Unicode can still be converted between cases using bitwise operations.
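
A minimal sketch of the bitwise trick being discussed, assuming only the letters A-Z and a-z, which have the same code points in Unicode as in ASCII. The function name `ascii_toggle_case` is hypothetical; the point is that the single-bit flip still works for that range and simply does not apply outside it.

```python
def ascii_toggle_case(ch: str) -> str:
    """Flip the case of a single A-Z / a-z character by toggling bit 0x20.

    Any other character is returned unchanged, because the one-bit
    relationship only holds for the 52 basic Latin letters.
    """
    code = ord(ch)
    if 0x41 <= code <= 0x5A or 0x61 <= code <= 0x7A:
        return chr(code ^ 0x20)
    return ch


if __name__ == "__main__":
    print("".join(ascii_toggle_case(c) for c in "MyTag_1"))
```

The range check is what keeps the trick honest: without it, toggling the same bit on digits, punctuation, or accented letters produces unrelated characters.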

Thomas | August 19, 2008 6:59 AM | Reply

The article Dave linked states that "change in case does not change the meaning of a word in spoken language."

Compare:
I had to help my uncle Jack off a horse.
I had to help my uncle jack off a horse.

Dave | July 18, 2008 6:41 AM | Reply

> Separate from that, being case insensitive is simply a form of laziness and/or lack of discipline.

Absolutely wrong.
Being case sensitive is a form of laziness on the part of the parser's author, who doesn't want to write code to equate A to a.

For a really good article on why case sensitivity is not only stupid but dangerous, lazy and wrong, see here:

http://www.tonymarston.co.uk/php-mysql/case-sensitive-software-is-evil.html

gstangler | July 12, 2007 6:01 PM | Reply

XML tags, etc., have no requirement to be English. That means that tags such as MyTAG and Mytag are as different to any OS as MyTAG and My123 or My_1_.

A-Z and a-z are two sets of 26 completely distinct numerical values in the system.

At the turn of the 19th century, when ASCII was defined, after IBM's EBCDIC format, some genius realized it would be beneficial to map A-Z and a-z with just a single bit of difference, so that these 'special' characters could be converted from upper to lower case and vice versa. This feature was purely for performance, since bit masking is a single clock in all hardware.

The good people on the Unicode committee, although I'm sure they tried, failed to incorporate such an efficiency.

Since massive amounts of data processing now incorporate XML, performance is very important.

Unicode is incapable of making this mapping in a single clock.

Separate from that, being case insensitive is simply a form of laziness and/or lack of discipline.

Greg

Saurav | September 21, 2006 9:40 AM | Reply

Does XML case sensitivity have anything to do with W3C rules?

Random User | June 21, 2005 10:01 PM | Reply

Well, not *all* Ant users like it. Personally, I think case insensitivity is a plague that makes code more complicated everywhere it appears. (Instead of being able to simply compare strings, you now have to know to do a case-insensitive compare). I'd rather that Ant made a clear decision (e.g. lowercase only for everything) rather than allowing users this "flexibility" to have their Ant scripts look different from those in the next cubicle, for no good reason.

Steve Loughran | November 30, 2004 12:10 PM | Reply

Interesting point. Ant is case insensitive on all attributes, because users like it, but we remain case sensitive on element names because the XML parser refuses to match up element ends with element starts if they don't match properly.
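
The parser behaviour described here is easy to see for yourself. The following is a small sketch using Python's standard `xml.etree.ElementTree` rather than the Java parser Ant relies on, so treat it as an analogy; the helper name `is_well_formed` is made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        # Raised, among other things, when an end tag does not match its start tag.
        return False


if __name__ == "__main__":
    for doc in ("<html></html>", "<HTML></HTML>", "<html></HTML>"):
        print(doc, is_well_formed(doc))
```

Either casing is accepted as long as the start and end tags agree; it is only the mixed pair that the parser rejects.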
