March 2005 Archives

XsltSettings.EnableDocumentFunction


This is a short follow-up on XslCompiledTransform. It accepts a new optional argument, XsltSettings. It is an enum that holds two switches:

  • EnableDocumentFunction
  • EnableScript

There is a reason why document("") could be optional. When the document() function receives an empty string argument, it returns a node-set that contains the stylesheet itself. That means, however, that the XSLT engine must keep a reference to the input stylesheet document (XPathNavigator). There was a bug report on MS feedback that document() could not be resolved (I once referred to it). So this option looks like Microsoft's response to developers' needs.
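As a minimal sketch of how the two switches are passed in, assuming the API as it later shipped in .NET 2.0 (there XsltSettings is a class with two Boolean properties and is handed to XslCompiledTransform.Load(); the CTP discussed here may differ), something like the following should work. The class name, the helper Run() and the stylesheet are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

class XsltSettingsSketch
{
    // Hypothetical helper: loads a stylesheet with document() enabled
    // and script blocks disabled, then transforms the given XML text.
    public static string Run (string xsl, string xml)
    {
        // Assumption: released .NET 2.0 signature,
        // XsltSettings (enableDocumentFunction, enableScript).
        XsltSettings settings = new XsltSettings (true, false);

        XslCompiledTransform t = new XslCompiledTransform ();
        using (XmlReader xr = XmlReader.Create (new StringReader (xsl)))
            t.Load (xr, settings, new XmlUrlResolver ());

        StringWriter sw = new StringWriter ();
        using (XmlReader input = XmlReader.Create (new StringReader (xml)))
            t.Transform (input, null, sw);
        return sw.ToString ();
    }

    public static void Main ()
    {
        // Illustrative stylesheet: writes the name of the document element.
        string xsl =
            "<xsl:transform xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>" +
            "<xsl:output method='text'/>" +
            "<xsl:template match='/'><xsl:value-of select='name(*)'/></xsl:template>" +
            "</xsl:transform>";
        Console.WriteLine (Run (xsl, "<root/>"));
    }
}
```

Leaving both switches off (XsltSettings.Default in the released API) is the conservative choice when the stylesheet comes from an untrusted source.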

I thought that not keeping the source document would be a good idea, so I tried to remove all references to XPathNavigator clones in our XslTransform (except for the document() function). After some hacking, I wrote a simple test which verifies that XslTransform does not hold a reference to the document:

    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.XPath;
    using System.Xml.Xsl;

    class Test {

        public static void Main ()
        {
            string xsl = "<xsl:transform "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
                + " version='1.0'/>";
            XslTransform t = new XslTransform ();
            WeakReference wr = StylesheetLoad (t, xsl);
            GC.Collect (2);
            Console.WriteLine ("load : " + wr.IsAlive);
            wr = StylesheetTransform (t, "<root/>");
            GC.Collect (2);
            Console.WriteLine ("transform : " + wr.IsAlive);
        }

        static WeakReference StylesheetLoad (
            XslTransform t, string xsl)
        {
            XPathDocument doc = new XPathDocument (
                new StringReader (xsl));
            WeakReference wr = new WeakReference (doc);
            t.Load (doc);
            return wr;
        }

        static WeakReference StylesheetTransform (
            XslTransform t, string xml)
        {
            XPathDocument doc = new XPathDocument (
                new StringReader (xml));
            WeakReference wr = new WeakReference (doc);
            t.Transform (doc, null, TextWriter.Null, null);
            return wr;
        }
    }

So I tried it under Windows:

$ csc test.cs /nologo /nowarn:618
$ mono test.exe
load : False
transform : False
$ ./test
load : True
transform : False

Hmm... I wonder if I missed something.

NvdlValidatingReader


Recently I put another XML validator into Mono: NvdlValidatingReader. It implements NVDL, that is, ISO DSDL Part 4, Namespace-based Validation Dispatching Language.

It is in our Commons.Xml.Relaxng.dll. I have temporarily put up autogenerated ndoc documents (there is no descriptive documentation there). It will be included in the next release of Mono. For now, I haven't prepared an independent archive, so get the sources from the mono SVN repository. A compiled binary should be in the latest monocharge.

Well, for those who want an instant dll, you can get it from here: Commons.Xml.Relaxng.dll.

There is also a set of example code which demonstrates validation: nvdltests.zip

I'm too lazy to write something new, so here I am mostly copying the description below from mcs/class/Commons.Xml.Relaxng/README.

NVDL

NvdlValidatingReader is an implementation of ISO DSDL Part 4 Namespace-based Validation Dispatching Language (NVDL). Note that the development is still ongoing, and the NVDL specification itself has not yet reached standard status.

NOTE: It is a "just started" implementation and may have limitations and problems.

By default, NvdlValidatingReader supports RELAX NG, RELAX NG Compact Syntax, W3C XML Schema and built-in NVDL validation, though without "PlanAtt" support.

Usage

Using built-in RELAX NG support.

    NvdlRules rules = NvdlReader.Read (
        new XmlTextReader ("xhtml2-xforms.nvdl"));
    XmlReader vr = new NvdlValidatingReader (
        new XmlTextReader ("index.html"), rules);

The static NvdlReader.Read() method reads the XmlReader given as its argument and returns an NvdlRules instance.

NvdlValidatingReader is instantiated from a) the XmlReader to be validated, and b) the NvdlRules that represent the validating NVDL script.

Custom validation support

    NvdlConfig config = new NvdlConfig ();
    config.AddProvider (myOwnSchematronProvider); // [*1]
    config.AddProvider (myOwnExamplotronProvider);
    NvdlRules rules = NvdlReader.Read (
        new XmlTextReader ("myscript.nvdl"));
    XmlReader vr = new NvdlValidatingReader (
        new XmlTextReader ("myinstance.xml"), rules, config);

NvdlConfig is used here to support a "custom validation provider". An NVDL script can reference any schema language. I'll describe what a validation provider is right below.

[*1] Of course Schematron should receive its input as XPathNavigator or IXPathNavigable, but we could still use ReadSubtree() in .NET 2.0.

NvdlValidationProvider

To support your own validation language, you have to write your own extension of the NvdlValidationProvider type.

A class derived from the abstract NvdlValidationProvider should override at least one of the virtual methods below:

  • CreateValidatorGenerator (NvdlValidate validate, string schemaType, NvdlConfig config)
  • CreateValidatorGenerator (XmlReader schema, NvdlConfig config)

Each of them returns an NvdlValidatorGenerator implementation (described later).

The first one receives the MIME type (schemaType) and the NVDL "validate" element. If you don't override it, it handles only "*/*-xml": it creates an XmlReader from either the schema attribute or the schema element and passes it to the other CreateValidatorGenerator() overload.

If this (possibly overridden) method returns null, then this validation provider does not support the MIME type or the schema document.

The second one is a shorthand method to handle "*/*-xml". By default it just returns null.

Most validation providers will only have to override the second overload. A few providers, such as the one for RELAX NG Compact Syntax, will have to override the first overload.

NvdlValidatorGenerator

The abstract NvdlValidatorGenerator.CreateValidator() method is designed to create a validating XmlReader from an input XmlReader.

For example, we have the NvdlXsdValidatorGenerator class. It internally uses XmlValidatingReader, which takes an XmlReader as its constructor parameter.

An instance of NvdlValidatorGenerator is created for each "validate" element in the NVDL script. When the validate element applies (for a PlanElem), it creates the validator XmlReader.

XslCompiledTransform


Happy new year. (I was totally hibernating this winter ;-)

Recently Microsoft pushed another CTP version of Whidbey, and I found there is a new XSLT implementation named XslCompiledTransform (BTW the MSDN documentation is so outdated that it still contains XsltCommand and XQueryCommand). It is in System.Data.SqlXml.dll, and (as far as I can see in Object Browser) there are no types other than XslCompiledTransform-related ones. (Given the assembly file name, that is somewhat funky.)

XslCompiledTransform looks like it comes from XsltCommand, which is based on executable stylesheet IL code like Apache XSLTC. I wonder how many existing types such as XPathExpression and XPathNodeIterator are used in this new implementation. They might exist just to support historical extensions.

I noticed that XslCompiledTransform is pretty complete. It looks like great work. I guess it will be the best improvement in System.Xml 2.0.

Here are two important behavioral differences from the existing XslTransform:

Error recovery

According to the XSLT 1.0 specification, it is an error if an attribute node is appended to an element after child elements or text nodes have already been appended. In such cases, XslCompiledTransform throws an exception, while XslTransform ignores such attributes. Both behaviors are allowed by the specification.

Similarly, it is an error if element nodes are added to an attribute as its content. Here XslCompiledTransform also rejects such output (XslTransform doesn't).

So which is better? I believe that XslCompiledTransform is, because if you bring your stylesheet to another platform, it might be rejected as a wrong stylesheet. With XslTransform, there is no way to check whether your stylesheet is sane.
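To check how a given processor treats the first case, a small probe like the one below could be used. It is only a sketch: the class name, the stylesheet and the "accepted"/"rejected" labels are mine, and which branch is taken depends on the processor and its version, so no output is shown here.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

class ErrorRecoverySketch
{
    // Illustrative stylesheet: appends an attribute after a child element,
    // which XSLT 1.0 (section 7.1.3) defines as an error that a processor
    // may either signal or recover from by ignoring the attribute.
    public const string Xsl =
        "<xsl:transform xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>" +
        "<xsl:template match='/'>" +
        "<out><child/><xsl:attribute name='late'>x</xsl:attribute></out>" +
        "</xsl:template>" +
        "</xsl:transform>";

    // Returns "accepted: <output>" if the processor recovered,
    // or "rejected: <exception type>" if it signaled the error.
    public static string Run ()
    {
        try {
            XslCompiledTransform t = new XslCompiledTransform ();
            using (XmlReader xr = XmlReader.Create (new StringReader (Xsl)))
                t.Load (xr);
            StringWriter sw = new StringWriter ();
            using (XmlReader input = XmlReader.Create (new StringReader ("<root/>")))
                t.Transform (input, null, sw);
            return "accepted: " + sw.ToString ();
        } catch (Exception ex) {
            return "rejected: " + ex.GetType ().Name;
        }
    }

    public static void Main ()
    {
        Console.WriteLine (Run ());
    }
}
```

Swapping XslCompiledTransform for XslTransform in the same probe gives a side-by-side comparison of the two engines.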

Space stripping

If there is xsl:strip-space in the stylesheet, XslCompiledTransform will reject IXPathNavigable as Transform() input, saying that:

System.Xml.Xsl.XslTransformException: Whitespace cannot be stripped from input documents that have already been loaded. Provide the input document as an XmlReader instead.

The reason is that it is much easier and more efficient for an XSL transformation engine if the whitespace nodes in the elements listed in xsl:strip-space are excluded from the input document in the first place (they are totally ignored in the transformation process). So there must be a filtering XmlReader that skips those whitespace nodes (or the IXPathNavigable implementation must do that itself).

If you can't change the input source from IXPathNavigable to XmlReader, you could still use the new XPathNavigator.ReadSubtree() method.
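A minimal sketch of that workaround, assuming the .NET 2.0 API as released: the already-loaded document is exposed as an XmlReader through ReadSubtree() and passed to Transform(). The class name, the helper Run() and the sample stylesheet are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

class ReadSubtreeSketch
{
    // Hypothetical helper: transforms an in-memory document by reading
    // it back through an XmlReader instead of passing IXPathNavigable.
    public static string Run (IXPathNavigable source, string xsl)
    {
        XslCompiledTransform t = new XslCompiledTransform ();
        using (XmlReader xr = XmlReader.Create (new StringReader (xsl)))
            t.Load (xr);

        StringWriter sw = new StringWriter ();
        // ReadSubtree() requires the navigator to be on the root or an
        // element node; CreateNavigator() starts at the root.
        using (XmlReader input = source.CreateNavigator ().ReadSubtree ())
            t.Transform (input, null, sw);
        return sw.ToString ();
    }

    public static void Main ()
    {
        // Illustrative stylesheet using xsl:strip-space.
        string xsl =
            "<xsl:transform xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>" +
            "<xsl:strip-space elements='*'/>" +
            "<xsl:output method='text'/>" +
            "<xsl:template match='/'><xsl:value-of select='/root/a'/></xsl:template>" +
            "</xsl:transform>";
        XPathDocument doc = new XPathDocument (
            new StringReader ("<root> <a>x</a> </root>"));
        Console.WriteLine (Run (doc, xsl));
    }
}
```

The cost is that the engine builds its own copy of the tree from the reader, so this trades memory for compatibility with xsl:strip-space.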

Other than these, there are some minor changes (such as treating Roman numbering as a usual number format, and avoiding run-time prefix evaluation on "name" attributes in xsl:attribute and xsl:element), but in general, it looks good.

As for performance, node iterators seem to have been changed to structs. That means the engine does not have to worry so much about extraneous object creation. I believe that must have resulted in significant performance improvements.

So now I am inclined to throw away our existing XslTransform and implement a new XSLTC-like transformation engine, but I am still not sure. With the stylesheet used in corcompare, XslCompiledTransform gave only about a 1.5x - 2x boost.