The "True" Checklists: XML Performance


While I knew that Microsoft publishes documents under the name "Patterns and Practices", I didn't know that they include a section on XML performance. Sadly, I was hardly satisfied with that document. The checklist is insufficient, some items miss the point, and some would even make performance worse. So here I put the "true" checklists to rescue misinformed .NET XML developers.

  • Design Considerations
    • Avoid XML wherever possible.
    • Avoid processing large documents.
    • Avoid validation. XmlValidatingReader is 2-3x slower than XmlTextReader.
    • Avoid DTD, especially IDs and entity references.
    • Use streaming interfaces such as XmlReader or SAXdotnet.
    • Consider hard-coded processing, including validation.
    • Shorten node name length.
    • Consider sharing a NameTable, but only when names are likely to be really common. As more and more irrelevant names accumulate in it, it becomes slower and slower.
  • Parsing XML
    • Use XmlTextReader and avoid validating readers.
    • When only a node is required, consider using XmlDocument.ReadNode() instead of loading the entire document with Load().
    • Set the XmlResolver property to null on the XmlReaders that have one, to avoid access to external resources.
    • Make full use of MoveToContent() and Skip(). They avoid extraneous name creation. However, the benefit all but disappears when you use XmlValidatingReader.
    • Avoid accessing Value for Text/CDATA nodes wherever possible.
  • Validating XML
    • Avoid extraneous validation.
    • Consider caching schemas.
    • Avoid identity constraints, not only because they store keys/fields for the entire document, but also because the keys are boxed.
    • Avoid extraneous strong typing. It results in calls to XmlSchemaDatatype.ParseValue(). Avoiding it can also spare access to the Value string.
  • Writing XML
    • Write output directly wherever possible.
    • To save documents, prefer XmlTextWriter without indentation over TextWriter/Stream/file output (all of which are indented), unless the output is meant for human reading.
  • DOM Processing
    • Avoid InnerXml. It internally creates XmlTextReader/XmlTextWriter. InnerText is fine.
    • Avoid PreviousSibling. XmlDocument is very inefficient for backward traverse.
    • Append nodes as soon as possible. Adding a big subtree results in a longer extraneous run to check ID attributes.
    • Prefer FirstChild/NextSibling and avoid accessing ChildNodes. It creates an XmlNodeList, which is not instantiated initially.
  • XPath Processing
    • Consider using XPathDocument, but only when you need the entire document. With XmlDocument you can use ReadNode(), but there is no equivalent for XPathDocument.
    • Avoid queries on the preceding-sibling and preceding axes, especially over XmlDocument. They would result in sorting, and for XmlDocument they need access to PreviousSibling.
    • Avoid // (descendant). The returned nodes are most likely to be irrelevant.
    • Avoid position(), last() and positional predicates (especially things like foo[last()-1]).
    • Compile XPath strings to XPathExpression and reuse them for frequent queries.
    • Don't run XPath queries frequently. They are costly, since they always have to Clone() XPathNavigators.
  • XSLT Processing
    • Reuse (cache) XslTransform objects.
    • Avoid key() in XSLT. It can return all kinds of nodes, which prevents node-type based optimization.
    • Avoid document(), especially with a nonstatic argument.
    • Pull style (e.g. xsl:for-each) is usually better than template match.
    • Minimize output size. More importantly, minimize input.
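
A few of the parsing and XPath items above can be combined into one short sketch. This is only an illustration under assumptions, not code from the Microsoft document: the sample XML, the class name ChecklistSketch and the method ReadOrderItems are all hypothetical, and the static XPathExpression.Compile() call assumes .NET 2.0 or later. It uses a non-validating XmlTextReader with XmlResolver set to null, skips an uninteresting subtree with Skip(), materializes only the needed fragments with XmlDocument.ReadNode(), and reuses one compiled XPathExpression.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class ChecklistSketch
{
    // Hypothetical sample input, not taken from the original checklist.
    public const string Sample =
        "<orders>" +
        "<audit><entry>a</entry><entry>b</entry></audit>" +
        "<order id='1'><item>pen</item></order>" +
        "<order id='2'><item>ink</item></order>" +
        "</orders>";

    // Compiled once and reused for every fragment.
    // Assumption: XPathExpression.Compile() is available (.NET 2.0 or later).
    static readonly XPathExpression ItemText =
        XPathExpression.Compile("string(item)");

    public static List<string> ReadOrderItems(string xml)
    {
        List<string> items = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        reader.XmlResolver = null;           // no access to external resources
        XmlDocument owner = new XmlDocument();
        try
        {
            reader.MoveToContent();          // jump to the document element
            reader.Read();                   // step to its first child
            while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.LocalName == "order")
                {
                    // ReadNode() builds only this subtree and leaves the
                    // reader positioned on the node that follows it.
                    XmlNode order = owner.ReadNode(reader);
                    XPathNavigator nav = order.CreateNavigator();
                    items.Add((string) nav.Evaluate(ItemText));
                }
                else
                {
                    // Skip() moves past the current node and its children
                    // without building any DOM nodes for them.
                    reader.Skip();
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return items;
    }

    public static void Main()
    {
        foreach (string item in ReadOrderItems(Sample))
            Console.WriteLine(item);
    }
}
```

With the sample input, Main() prints pen and then ink. The audit subtree is never turned into DOM nodes, which is the point of the Skip() and ReadNode() items.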

What struck me as odd was that they said users should use XmlValidatingReader. With that, they can never say that XmlReader is better than SAX on performance, since the validating reader creates value strings for every node (so the performance tips are mutually exclusive). This is not avoidable even if you call Skip() or MoveToContent(), since skipping validation on skipped nodes is not allowed (actually, in .NET 1.0 Microsoft developers did that, and it caused bugs in XmlValidatingReader).
