Generating WordprocessingML using XSLT: Lists


I'm writing this entry to illustrate the basics of generating lists in WordprocessingML documents using XSLT. I also want to test how my Office-related rants are syndicated by the wonderful OfficeZealot.com site.

[Prerequisite: make sure you've read what "Overview of WordprocessingML" says about lists.]

Basically, a list in WordprocessingML consists of a list format definition (<w:listDef>), a list instance definition (<w:list>) and list items. A list item is just a specially attributed paragraph. More formally, any paragraph with a <w:listPr> element in its <w:pPr> element is considered a list item. It works this way: a list item refers to the list instance it belongs to, while the list instance definition refers to a list format definition. List formats and list instances are defined within the <w:lists> element, which is a child of the <w:wordDocument> element. Thus there are no structural list boundaries; instead, list items refer to the list they belong to by list ID (the w:ilfo value in the item's <w:listPr>).
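To make the chain of references concrete, here is a minimal sketch of a whole document, assuming the WordprocessingML 2003 namespace (http://schemas.microsoft.com/office/word/2003/wordml). The single-level decimal format is an illustrative assumption; a real document would normally define more levels.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:lists>
    <!-- List format definition #0: what each level looks like -->
    <w:listDef w:listDefId="0">
      <w:lvl w:ilvl="0">
        <w:start w:val="1"/>
        <w:nfc w:val="0"/>        <!-- decimal numbers -->
        <w:lvlText w:val="%1."/>
      </w:lvl>
    </w:listDef>
    <!-- List instance #1, bound to list format #0 -->
    <w:list w:ilfo="1">
      <w:ilst w:val="0"/>
    </w:list>
  </w:lists>
  <w:body>
    <!-- A list item: an ordinary paragraph pointing at list instance #1 -->
    <w:p>
      <w:pPr>
        <w:listPr>
          <w:ilvl w:val="0"/>
          <w:ilfo w:val="1"/>
        </w:listPr>
      </w:pPr>
      <w:r><w:t>List item text</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>
```

Nothing in <w:body> wraps the item; the only link between the paragraph and its list is the w:ilfo value.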

It seems reasonable once you grasp it. OK, list definitions. Here's a sample, which defines a single list format (#0) and a single list (#1):
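Definitions of that shape might look roughly like the following sketch, again assuming the WordprocessingML 2003 namespace. The number format codes (w:nfc 0 = decimal, 4 = lower-case letter, 2 = lower-case Roman), the level texts and the indents (in twips) are illustrative choices, not necessarily what Word itself would write.

```xml
<w:lists xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <!-- List format #0: three levels, numbered 1. / a. / i. -->
  <w:listDef w:listDefId="0">
    <w:lvl w:ilvl="0">
      <w:start w:val="1"/>
      <w:nfc w:val="0"/>          <!-- decimal -->
      <w:lvlText w:val="%1."/>    <!-- %1 = counter of the first level -->
      <w:lvlJc w:val="left"/>
      <w:pPr><w:ind w:left="720" w:hanging="360"/></w:pPr>
    </w:lvl>
    <w:lvl w:ilvl="1">
      <w:start w:val="1"/>
      <w:nfc w:val="4"/>          <!-- lower-case letter -->
      <w:lvlText w:val="%2."/>
      <w:lvlJc w:val="left"/>
      <w:pPr><w:ind w:left="1440" w:hanging="360"/></w:pPr>
    </w:lvl>
    <w:lvl w:ilvl="2">
      <w:start w:val="1"/>
      <w:nfc w:val="2"/>          <!-- lower-case Roman -->
      <w:lvlText w:val="%3."/>
      <w:lvlJc w:val="left"/>
      <w:pPr><w:ind w:left="2160" w:hanging="360"/></w:pPr>
    </w:lvl>
  </w:listDef>
  <!-- List instance #1, bound to list format #0 -->
  <w:list w:ilfo="1">
    <w:ilst w:val="0"/>
  </w:list>
</w:lists>
```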

As can be seen, <w:listDef> defines formatting properties for three levels. Beware: it's important to have definitions for all list levels your document might contain, otherwise Word won't display a list item as a list item. By default Word defines nine levels (w:ilvl 0 through 8) for each list format. Then the <w:list> element defines a list instance, binding it to a list format definition through its <w:ilst> element. Done with definitions; now here is a list item:

<w:p>
    <w:pPr>
      <w:listPr>
        <w:ilvl w:val="0"/>
        <w:ilfo w:val="4"/>
      </w:listPr>
    </w:pPr>
    <w:r>
      <w:t>List item text</w:t>
    </w:r>
</w:p>
It's an item that belongs to level 0 of list number 4, i.e. the <w:list> whose w:ilfo attribute is 4.
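Nesting works the same way: a paragraph is nested only because its w:ilvl is higher, not because it sits inside another element. A minimal fragment of <w:body>, assuming list instance #4 is bound to a format that defines at least two levels:

```xml
<!-- Two consecutive paragraphs in the same list instance (#4) -->
<w:p>
  <w:pPr><w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="4"/></w:listPr></w:pPr>
  <w:r><w:t>Top-level item</w:t></w:r>
</w:p>
<!-- Same list, one level deeper: only w:ilvl differs -->
<w:p>
  <w:pPr><w:listPr><w:ilvl w:val="1"/><w:ilfo w:val="4"/></w:listPr></w:pPr>
  <w:r><w:t>Nested item</w:t></w:r>
</w:p>
```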

Now, how can this stuff be generated in XSLT? First of all, you obviously need to generate format definitions for all the types of lists you're going to have in a document: ordered, unordered, etc. Then you need to generate a list instance definition for each list in your document, bound to the appropriate format definition. And finally, generate list items referring to the nesting level and the list instance they belong to. Sounds like a piece of cake, huh?

Let's say I have an article in my proprietary XML format (similar to XHTML, to be realistic):
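A hypothetical input of that kind might look like this. The element names (article, title, p, ul, ol, li) are assumptions borrowed from XHTML; the second bullet contains a nested numbered list.

```xml
<article>
  <title>Lists</title>
  <p>Some paragraph.</p>
  <ul>
    <li>First bullet</li>
    <li>Second bullet
      <ol>
        <li>Nested numbered item</li>
        <li>Another nested item</li>
      </ol>
    </li>
  </ul>
  <p>Another paragraph.</p>
  <ol>
    <li>First step</li>
    <li>Second step</li>
  </ol>
</article>
```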

And here is my stylesheet, which transforms the article into a WordprocessingML document:
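A sketch of such a stylesheet in XSLT 1.0, written against the hypothetical XHTML-like vocabulary (article, title, p, ul, ol, li) and assuming list items contain plain text only. The bullet and number formats, the indents and the way instance numbers are computed are this sketch's own choices.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <xsl:output method="xml" encoding="UTF-8" indent="yes" standalone="yes"/>

  <xsl:template match="/">
    <!-- Lets Windows associate the result file with Word -->
    <xsl:processing-instruction name="mso-application">progid="Word.Document"</xsl:processing-instruction>
    <w:wordDocument>
      <w:lists>
        <!-- List format #0: unordered -->
        <w:listDef w:listDefId="0">
          <xsl:call-template name="levels">
            <xsl:with-param name="ordered" select="false()"/>
          </xsl:call-template>
        </w:listDef>
        <!-- List format #1: ordered -->
        <w:listDef w:listDefId="1">
          <xsl:call-template name="levels">
            <xsl:with-param name="ordered" select="true()"/>
          </xsl:call-template>
        </w:listDef>
        <!-- One list instance per ul/ol, numbered 1, 2, 3... in document order -->
        <xsl:for-each select="//ul | //ol">
          <w:list w:ilfo="{count(preceding::ul | preceding::ol | ancestor-or-self::ul | ancestor-or-self::ol)}">
            <!-- count(self::ol) is 1 for ol (format #1) and 0 for ul (format #0) -->
            <w:ilst w:val="{count(self::ol)}"/>
          </w:list>
        </xsl:for-each>
      </w:lists>
      <w:body>
        <xsl:apply-templates select="article/*"/>
      </w:body>
    </w:wordDocument>
  </xsl:template>

  <!-- Emits w:lvl for levels 0..8; recursion stands in for a loop in XSLT 1.0 -->
  <xsl:template name="levels">
    <xsl:param name="ordered"/>
    <xsl:param name="i" select="0"/>
    <xsl:if test="$i &lt; 9">
      <w:lvl w:ilvl="{$i}">
        <w:start w:val="1"/>
        <xsl:choose>
          <xsl:when test="$ordered">
            <w:nfc w:val="0"/>              <!-- decimal -->
            <w:lvlText w:val="%{$i + 1}."/> <!-- "%1." at level 0, "%2." at level 1... -->
          </xsl:when>
          <xsl:otherwise>
            <w:nfc w:val="23"/>             <!-- bullet -->
            <w:lvlText w:val="&#8226;"/>
          </xsl:otherwise>
        </xsl:choose>
        <w:lvlJc w:val="left"/>
        <w:pPr>
          <!-- indent grows by 720 twips (half an inch) per level -->
          <w:ind w:left="{720 * ($i + 1)}" w:hanging="360"/>
        </w:pPr>
      </w:lvl>
      <xsl:call-template name="levels">
        <xsl:with-param name="ordered" select="$ordered"/>
        <xsl:with-param name="i" select="$i + 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="title | p">
    <w:p>
      <w:r><w:t><xsl:value-of select="."/></w:t></w:r>
    </w:p>
  </xsl:template>

  <!-- A list has no paragraph of its own; only its items do -->
  <xsl:template match="ul | ol">
    <xsl:apply-templates select="li"/>
  </xsl:template>

  <xsl:template match="li">
    <w:p>
      <w:pPr>
        <w:listPr>
          <!-- nesting level: 0 for items of a top-level list -->
          <w:ilvl w:val="{count(ancestor::ul | ancestor::ol) - 1}"/>
          <!-- instance number of the parent list: same formula as in w:lists -->
          <w:ilfo w:val="{count(../preceding::ul | ../preceding::ol | ../ancestor-or-self::ul | ../ancestor-or-self::ol)}"/>
        </w:listPr>
      </w:pPr>
      <!-- first text node of the item only; nested lists become their own paragraphs -->
      <w:r><w:t><xsl:value-of select="normalize-space(text())"/></w:t></w:r>
    </w:p>
    <xsl:apply-templates select="ul | ol"/>
  </xsl:template>
</xsl:stylesheet>
```

Using the same count() expression in <w:lists> and in each item is what keeps every item pointing at the <w:list> generated for its parent list. <xsl:number count="ul|ol" level="any"/> would be another way to get the same document-order index.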

OK, what's inside? You can see definitions of two list formats: the first for unordered lists and the second for ordered ones. Then I generate a list instance for each list in the source XML, numbering them uniquely. And finally, for each list item I generate a paragraph with a <w:listPr> property, where I define the nesting level (count(ancestor::ul|ancestor::ol)-1) and the ID of the list instance it belongs to. A bit non-trivial, but only a bit. Here is the result:
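As a condensed illustration, take an article with a two-item bulleted list whose second item contains a nested two-item numbered list, followed later by a separate numbered list. Assuming list format #0 is the bullet format, #1 the numbered one, and list instances numbered in document order, the list-related part of the output would look roughly like this (format definitions and ordinary paragraphs omitted):

```xml
<w:lists>
  <!-- w:listDef #0 (bullets) and #1 (numbers) omitted -->
  <w:list w:ilfo="1"><w:ilst w:val="0"/></w:list> <!-- outer bulleted list -->
  <w:list w:ilfo="2"><w:ilst w:val="1"/></w:list> <!-- nested numbered list -->
  <w:list w:ilfo="3"><w:ilst w:val="1"/></w:list> <!-- later numbered list -->
</w:lists>
<w:body>
  <w:p><w:pPr><w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="1"/></w:listPr></w:pPr>
    <w:r><w:t>First bullet</w:t></w:r></w:p>
  <w:p><w:pPr><w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="1"/></w:listPr></w:pPr>
    <w:r><w:t>Second bullet</w:t></w:r></w:p>
  <w:p><w:pPr><w:listPr><w:ilvl w:val="1"/><w:ilfo w:val="2"/></w:listPr></w:pPr>
    <w:r><w:t>Nested numbered item</w:t></w:r></w:p>
  <w:p><w:pPr><w:listPr><w:ilvl w:val="1"/><w:ilfo w:val="2"/></w:listPr></w:pPr>
    <w:r><w:t>Another nested item</w:t></w:r></w:p>
  <w:p><w:pPr><w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="3"/></w:listPr></w:pPr>
    <w:r><w:t>First step</w:t></w:r></w:p>
  <w:p><w:pPr><w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="3"/></w:listPr></w:pPr>
    <w:r><w:t>Second step</w:t></w:r></w:p>
</w:body>
```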

Well, lists in WordprocessingML are a bit tricky. First of all, it's quite unusual to have no structural list boundaries. Lists are defined in the document header, while list items live in the document body. Hence a lot of indirection: it enables a great deal of flexibility, but it's hard to grasp. Then, the naming of elements and attributes is confusing (can you say offhand what w:ilfo or w:ilst means?). But with a strong understanding of WordprocessingML you can easily generate lists using XSLT. At least I hope that's the feeling you've got after reading this text.


3 Comments

Yeah, I didn't want to complicate things in a "basic" intro. Restarting lists was a problem for me when writing the sample.
I found that even if list items refer to different w:list elements, they are still considered to be in a single list (and numbering continues) if both w:list elements refer to the same w:listDef. Funny enough.
But in the end I solved that by generating a w:startOverride in the w:list:
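A minimal sketch of that fix, assuming ordered list format #1 and hypothetical instance numbers 2 and 3. The w:startOverride sits inside a w:lvlOverride for the level whose counter should start over:

```xml
<!-- Two instances of the same ordered format #1 -->
<w:list w:ilfo="2">
  <w:ilst w:val="1"/>
</w:list>
<w:list w:ilfo="3">
  <w:ilst w:val="1"/>
  <!-- Restart level 0 at 1 instead of continuing from instance #2 -->
  <w:lvlOverride w:ilvl="0">
    <w:startOverride w:val="1"/>
  </w:lvlOverride>
</w:list>
```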

This way, different lists (w:list) behave as distinct lists with respect to numbering. Hmmm. Actually, I didn't try other numbering formats, though; maybe that doesn't solve the problem after all.

I must repeat the steps you have provided and inspect the resulting WordML.

And as for lists in paragraphs: that's a different problem. I intentionally omitted it by taking an XHTML-like input document, since in XHTML a list cannot be nested in <p>. But in DocBook and many other vocabularies it's allowed. OK, I've got a topic for my next WordML rant.

Hi Oleg,

Yes, you were "picked up" on OZ. I just thought I'd add one more layer of complexity to your list XSLT. One of the most important concepts of any ordered list is that when you start a new one, it restarts with 1 or a or i or whatever. In order to get this to happen, you need to build your lists so that they're associated with styles at each of the levels so restart can be enabled.

Take the following example:

This is a paragraph.
1. First item in first-level list
2. Second item in first-level list
    a. First item in second-level list
    b. Second item in second-level list
3. Third item in first-level list
    a. First item in second-level list

End of list and another paragraph.

1. First item in first-level list.


Here's how I go about getting lists all set up for a Smart Document:

1. build the styles in Word, without associating with list types

2. once all the styles are created, open the bullets/numbering window and select one of the eight outline-numbered panels, and select "customize" (if you don't select the outline-numbered tab, you have no way to automatically restart numbering unless you start using the Microsoft "hint" namespace).

3. set level1 with no numbering, select the "more" button, and then associate with a style name that will restart the numbering.

4. set level2 to actually be your first level of numbering, associate the style name for your first-level list format, and make sure restart numbering is set to the previous list level; i.e. level 2, number format "1)", link level to style NumberedListItemLevel1, restart numbering after level 1.

5. set up any additional nested list formats - the actual second-level list will be set to Word's level 3, etc.

6. Save as WordML and copy the styles definitions into your XSLT. Or examine the code so you're ready to write it by hand the next time ;-)
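Roughly, the list part of the WordML saved in step 6 might contain something like the following hypothetical sketch. NumberedListItemLevel1 comes from the steps above; ListStart and NumberedListItemLevel2 are made-up style names, and w:nfc 255 for "no number" is an assumption worth checking against what Word actually saves. Because a level restarts by default after any higher level, a ListStart paragraph (Word level 1, w:ilvl 0) resets the numbered levels that follow it.

```xml
<!-- Inside w:lists -->
<w:listDef w:listDefId="1">
  <!-- Word level 1 (w:ilvl 0): no visible number, linked to ListStart -->
  <w:lvl w:ilvl="0">
    <w:start w:val="1"/>
    <w:nfc w:val="255"/>   <!-- assumed code for "none" -->
    <w:pStyle w:val="ListStart"/>
    <w:lvlText w:val=""/>
    <w:lvlJc w:val="left"/>
  </w:lvl>
  <!-- Word level 2 (w:ilvl 1): the first visible level, "1)" -->
  <w:lvl w:ilvl="1">
    <w:start w:val="1"/>
    <w:nfc w:val="0"/>
    <w:pStyle w:val="NumberedListItemLevel1"/>
    <w:lvlText w:val="%2)"/>
    <w:lvlJc w:val="left"/>
    <w:pPr><w:ind w:left="720" w:hanging="360"/></w:pPr>
  </w:lvl>
  <!-- Word level 3 (w:ilvl 2): the nested list, "a." -->
  <w:lvl w:ilvl="2">
    <w:start w:val="1"/>
    <w:nfc w:val="4"/>
    <w:pStyle w:val="NumberedListItemLevel2"/>
    <w:lvlText w:val="%3."/>
    <w:lvlJc w:val="left"/>
    <w:pPr><w:ind w:left="1440" w:hanging="360"/></w:pPr>
  </w:lvl>
</w:listDef>
<w:list w:ilfo="1">
  <w:ilst w:val="1"/>
</w:list>

<!-- Inside w:styles: each linked style points back at its list level -->
<w:style w:type="paragraph" w:styleId="NumberedListItemLevel1">
  <w:name w:val="NumberedListItemLevel1"/>
  <w:pPr>
    <w:listPr>
      <w:ilvl w:val="1"/>
      <w:ilfo w:val="1"/>
    </w:listPr>
  </w:pPr>
</w:style>
```

With the numbering carried by the styles, the generated item paragraphs only need a w:pStyle reference instead of their own <w:listPr>.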

And we won't mention the difficulties encountered when your schema uses a correct English grammatical structure and nests lists inside of paras ...
