A fellow MVP asked if there is a way to dump XML content while reading it from a stream, without buffering the whole XML document. Here is the scenario: an XML document is read from an HttpWebResponse stream and needs to be passed as an XmlReader to an XmlSerializer, which deserializes it into objects. This works fine in a streaming way: just create an XmlReader over the stream and pass it to the XmlSerializer. But what if the incoming XML needs to be logged? Of course, one could go with a buffer-log-process architecture, but that effectively kills performance and scalability. Fortunately there is a better way: by extending XmlReader, one can make it dump each node as soon as the reader is positioned on it. Here is how.
    using System;
    using System.IO;
    using System.Xml;

    public class DumpingXmlTextReader : XmlTextReader
    {
        private XmlWriter dump;

        //Add more constructors as needed
        public DumpingXmlTextReader(string url, XmlWriter dump)
            : base(url)
        {
            this.dump = dump;
        }

        //Stream-based constructor, needed when reading from a
        //FileStream or a response stream
        public DumpingXmlTextReader(Stream input, XmlWriter dump)
            : base(input)
        {
            this.dump = dump;
        }

        /// <summary>
        /// Overridden XmlReader's Read() method
        /// </summary>
        public override bool Read()
        {
            bool baseRead = base.Read();
            if (baseRead)
            {
                //Dump the node the reader has just moved to
                WriteShallowNode(this, dump);
            }
            return baseRead;
        }

        /// <summary>
        /// Auxiliary method to dump the node the XmlReader is positioned at.
        /// Thanks to Mark Fussell,
        /// http://blogs.msdn.com/mfussell/archive/2005/02/12/371546.aspx
        /// </summary>
        static void WriteShallowNode(XmlReader reader, XmlWriter writer)
        {
            if (reader == null)
            {
                throw new ArgumentNullException("reader");
            }
            if (writer == null)
            {
                throw new ArgumentNullException("writer");
            }
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true);
                    if (reader.IsEmptyElement)
                    {
                        //No EndElement node follows an empty element, so close it here
                        writer.WriteEndElement();
                    }
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.EntityReference:
                    writer.WriteEntityRef(reader.Name);
                    break;
                case XmlNodeType.XmlDeclaration:
                case XmlNodeType.ProcessingInstruction:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;
                case XmlNodeType.DocumentType:
                    writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteFullEndElement();
                    break;
            }
        }
    }

No rocket science, as you can see; it's pretty straightforward.
The core method, WriteShallowNode, which dumps a single XML node, is borrowed from Mark Fussell's post "Combining the XmlReader and XmlWriter classes for simple streaming transformations".
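To see the node-by-node copying on its own, here is a minimal sketch that runs a read loop over an in-memory string. The class name ShallowCopyDemo, the Copy method and the sample XML are made up for illustration, and the WriteShallowNode here is a cut-down version that handles only elements, text, CDATA, comments and end elements.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of the original post.
public static class ShallowCopyDemo
{
    // Cut-down WriteShallowNode: covers only the node types used in this demo.
    static void WriteShallowNode(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (reader.IsEmptyElement)
                {
                    // Empty elements produce no EndElement node, so close them here.
                    writer.WriteEndElement();
                }
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
        }
    }

    // Copies the input one node at a time; the reader never holds more
    // than the current node. The output is collected in a string only
    // so that the demo can print it.
    public static string Copy(string xml)
    {
        StringWriter output = new StringWriter();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        using (XmlTextWriter writer = new XmlTextWriter(output))
        {
            while (reader.Read())
            {
                WriteShallowNode(reader, writer);
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Copy("<order id=\"1\"><item>Widget</item><note/></order>"));
    }
}
```

DumpingXmlTextReader does the same thing, except that the loop is driven by whoever consumes the reader (the XmlSerializer, for instance) rather than by an explicit while loop.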
And here is a usage sample. I'm reading XML from a file stream (imagine it's an HttpWebResponse stream instead), feeding it to an XmlSerializer and dumping its content at the same time. Note that the XML content never gets buffered as a whole; the processing is purely forward-only, non-caching and streaming.
    //Prepare dumping writer
    XmlTextWriter dumpWriter = new XmlTextWriter("dump.xml", Encoding.UTF8);
    dumpWriter.Formatting = Formatting.Indented;
    PurchaseOrder po = null;
    using (FileStream fs = File.OpenRead("PurchaseOrder.xml"))
    {
        //Reads and dumps XML content node-by-node to the dumpWriter
        XmlReader reader = new DumpingXmlTextReader(fs, dumpWriter);
        XmlSerializer serializer = new XmlSerializer(typeof(PurchaseOrder));
        po = (PurchaseOrder)serializer.Deserialize(reader);
    }
    //Close dumping writer, the XML dump is in dump.xml
    dumpWriter.Close();
    //Deserialization went ok
    Console.WriteLine(po.Account);
I wonder whether this is a rare use case, or whether we need such a class in a utilities library, e.g. in Mvp.Xml?
How about combining it with kzu's High Perf XML technique: http://weblogs.asp.net/cazzu/archive/2005/05/28/XmlMessagePerformance.aspx
Don