December 2004 Archives

(2nd) Mono meeting in Tokyo

Since Duncan is coming to Tokyo this weekend, we are going to have the (2nd) Mono hackers meeting in Tokyo (yes, we had the first meeting two years ago). It is planned for lunch time on the 19th, at Umegaoka (close to Shimokitazawa). We'd welcome a few more people who would like to join us (sorry, but only a few; we are not prepared for a large meeting). Please feel free to mail me (atsushi@ximian.com) if you are interested.

One of the reasons why XmlSchemaValidator does not rock

After hacking on the label collector functionality for RELAX NG, I noticed that .NET 2.0's XmlSchemaValidator is that kind of API (note that the MSDN documentation linked above is quite obsolete). So I decided to implement it nearly a week ago. It is now mostly implemented and checked in to Mono's svn (I think it is one of the hackiest xsd validators ;-).

Here is an example application of XmlSchemaValidator that I wrote to test my implementation (it compiled with the 2.0 csc).

Some of you might think that implementing XmlSchemaValidator sounds weird, because there is no API documentation for this newcomer (well, at least not in the VS 2005 October CTP, which is what I reference). But the functionality is mostly obvious (at least to me). For example, ValidateElement() is startTagOpenDeriv, ValidateAttribute() is attDeriv, and ValidateEndOfAttribute() is startTagCloseDeriv (btw I really don't like those method names), as described in James Clark's derivative algorithm for RELAX NG. One thing that mystified me was what the ValidateElement(string,string,XmlSchemaInfo,string,string,string,string) overload meant, but thanks to MS developers, that was resolved.
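
To show how those calls line up in practice, here is a minimal sketch of pushing one element into XmlSchemaValidator by hand. It assumes the method names from the final .NET 2.0 API (where the method is spelled ValidateEndOfAttributes), which may differ from the CTP I referenced; the schema and the class and method names PushValidationSketch and CountErrors are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class PushValidationSketch
{
    // Hypothetical schema: <greeting lang="..."> with string content.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='greeting'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='xs:string'>
          <xs:attribute name='lang' type='xs:string' use='required'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Pushes <greeting [lang="en"]>hello</greeting> into the validator
    // and returns the number of validation errors it reported.
    public static int CountErrors (bool withLangAttribute)
    {
        XmlSchemaSet schemas = new XmlSchemaSet ();
        schemas.Add (null, XmlReader.Create (new StringReader (Xsd)));
        schemas.Compile ();

        NameTable nameTable = new NameTable ();
        XmlSchemaValidator v = new XmlSchemaValidator (
            nameTable, schemas, new XmlNamespaceManager (nameTable),
            XmlSchemaValidationFlags.None);
        int errors = 0;
        // With a handler attached, errors are reported here instead of thrown.
        v.ValidationEventHandler += delegate (object sender, ValidationEventArgs e) {
            if (e.Severity == XmlSeverityType.Error)
                errors++;
        };

        v.Initialize ();
        v.ValidateElement ("greeting", "", null);          // roughly startTagOpenDeriv
        if (withLangAttribute)
            v.ValidateAttribute ("lang", "", "en", null);  // roughly attDeriv
        v.ValidateEndOfAttributes (null);                  // roughly startTagCloseDeriv
        v.ValidateText ("hello");
        v.ValidateEndElement (null);
        v.EndValidation ();
        return errors;
    }

    public static void Main ()
    {
        Console.WriteLine ("with lang: {0} error(s)", CountErrors (true));
        Console.WriteLine ("without lang: {0} error(s)", CountErrors (false));
    }
}
```

Note that the validator instance carries all of the state between calls; nothing is returned that you could keep around and resume from later.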

Actually the XML Schema stuff is much less useful than the RELAX NG stuff. XmlSchemaValidator is fully stateful, so you cannot easily go back to a previous state. A RELAX NG derivative implementation is stateless, so you can just attach those derivative instances to nodes in the editor. Oh, yes, XmlSchemaValidator could act as if it were stateless, if it supported cloning. But I don't think that could be lightweight.
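
To make the stateful/stateless difference concrete, here is a minimal sketch of the derivative idea. It assumes a toy pattern language (element names, choice, group) that is far smaller than RELAX NG, and it ignores attributes, text and namespaces. All of the class names are made up for illustration; this is not the Commons.Xml.Relaxng API.

```csharp
using System;

// Immutable patterns. Deriv() never modifies anything; it returns a new value.
public abstract class Pattern
{
    public static readonly Pattern Empty = new EmptyPattern ();
    public static readonly Pattern NotAllowed = new NotAllowedPattern ();

    // True if nothing more is required (i.e. an end tag is acceptable here).
    public abstract bool Nullable { get; }
    // The pattern that remains after a child element with this name.
    public abstract Pattern Deriv (string name);
}

sealed class EmptyPattern : Pattern
{
    public override bool Nullable { get { return true; } }
    public override Pattern Deriv (string name) { return NotAllowed; }
}

sealed class NotAllowedPattern : Pattern
{
    public override bool Nullable { get { return false; } }
    public override Pattern Deriv (string name) { return this; }
}

public sealed class Element : Pattern
{
    readonly string name;
    public Element (string name) { this.name = name; }
    public override bool Nullable { get { return false; } }
    public override Pattern Deriv (string n) { return n == name ? Empty : NotAllowed; }
}

public sealed class Choice : Pattern
{
    readonly Pattern l, r;
    public Choice (Pattern l, Pattern r) { this.l = l; this.r = r; }
    public override bool Nullable { get { return l.Nullable || r.Nullable; } }
    public override Pattern Deriv (string n) { return Make (l.Deriv (n), r.Deriv (n)); }

    // Simplifying constructor: a rejected branch just disappears.
    public static Pattern Make (Pattern a, Pattern b)
    {
        if (a == NotAllowed) return b;
        if (b == NotAllowed) return a;
        return new Choice (a, b);
    }
}

public sealed class Group : Pattern
{
    readonly Pattern l, r;
    public Group (Pattern l, Pattern r) { this.l = l; this.r = r; }
    public override bool Nullable { get { return l.Nullable && r.Nullable; } }
    public override Pattern Deriv (string n)
    {
        Pattern first = Make (l.Deriv (n), r);
        // If the left side may be skipped, the name may also start the right side.
        return l.Nullable ? Choice.Make (first, r.Deriv (n)) : first;
    }

    public static Pattern Make (Pattern a, Pattern b)
    {
        if (a == NotAllowed || b == NotAllowed) return NotAllowed;
        if (a == Empty) return b;
        if (b == Empty) return a;
        return new Group (a, b);
    }
}

public static class DerivDemo
{
    // Content model: title, para?
    public static Pattern Model ()
    {
        return new Group (new Element ("title"),
            new Choice (new Element ("para"), Pattern.Empty));
    }

    public static void Main ()
    {
        Pattern start = Model ();
        Pattern afterTitle = start.Deriv ("title");   // keep this at the <title> node
        Pattern afterPara = afterTitle.Deriv ("para"); // keep this at the <para> node

        // "Going back" is just reusing the older value; nothing was overwritten.
        Console.WriteLine ("end tag OK after title: {0}", afterTitle.Nullable);
        Console.WriteLine ("second title rejected: {0}",
            afterTitle.Deriv ("title") == Pattern.NotAllowed);
        Console.WriteLine ("end tag OK after para: {0}", afterPara.Nullable);
    }
}
```

Since every state is an immutable value, an editor can hang one on each node and resume validation from any of them, which is exactly what a single mutable validator object cannot offer without cloning.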

btw, if you want, you can generate an xsd from a DTD and use XmlSchemaValidator with it.

rethinking element/attribute label collector

19:51 (alp) eno: dude, am never getting any expected attributes

... So, I missed the point that after RelaxngValidatingReader.Read(), my validation engine, which is based on James Clark's derivative algorithm, keeps only the state "after it closed the start tag" (i.e. startTagCloseDeriv in that paper), and thus no attributes can be allowed at that point. Sigh. So, to implement attribute auto-completion, I had to expose a state transition object with which users can try "the state transition after an attribute occurred" (i.e. attDeriv). So now the code has become more complicated than yesterday (well, it is required complexity):

    using System;
    using System.IO;
    using System.Xml;
    using Commons.Xml.Relaxng;

    // Validate the RELAX NG grammar document against itself.
    XmlTextReader xtr = new XmlTextReader ("relaxng.rng");
    RelaxngPattern p = RelaxngPattern.Read (
        new XmlTextReader ("relaxng.rng"));
    RelaxngValidatingReader rvr = new RelaxngValidatingReader (xtr, p);
    TextWriter Out = Console.Out;

    for (; !rvr.EOF; rvr.Read ()) {
        // The state after the current node (start tag already closed).
        object state = rvr.GetCurrentState ();
        Out.WriteLine ("Current node: {0} ({1}) -> {2}",
            rvr.Name, rvr.NodeType,
            rvr.Emptiable (state) ? "Emptiable" : "not Emptiable");
        Out.WriteLine (" - expected elements -");
        foreach (XmlQualifiedName qn in rvr.GetElementLabels (state)) {
            Out.WriteLine (" " + qn);
            // Try the transition "this element's start tag was opened",
            // then ask which attributes would be allowed on it.
            object astate = rvr.AfterOpenStartTag (
                state, qn.Name, qn.Namespace);
            Out.WriteLine (" - expected attributes -");
            foreach (XmlQualifiedName aqn in rvr.GetAttributeLabels (astate))
                Out.WriteLine (" " + aqn);
        }
    }

I put up the code example (above) and the updated result.

So now RelaxngValidatingReader implicitly expects the validating editor not to call .Read() until it closes the start tag. Instead, each of the elements and attributes can now hold the state at the node itself. (Am not sure it really works fine; I should consider cut/paste, insertion, and so on.)

BTW, personally I don't want to expose such features and require "implementors" of the RelaxngValidatingReader functionality to implement highly derivative-dependent features like this (that is bad for standardizing the API). I won't recommend learning this feature as long-lived, good-to-know stuff. I am, on the other hand, expecting System.Xml 2.0 to have such functionality for XML Schema (IF Microsoft people can provide it), but I still don't think it's worthy of standardization.

At the next stage, I will have to implement some "error recovery" stuff so that users can enter invalid nodes and the implementation can still continue validating the rest.

Many thanks to Alp for trying it out with his experimental UI stuff and for letting me improve this library (I also got the chance to fix bugs and to optimize the Commons.Xml.Relaxng stuff).

(5:00am JST: Updated the API and the example so that they look better.)

00:18 (alp) eno: do you have any thoughts on how i could
            use a DTD to hack together xml completion?
00:19 (eno) alp: you want to develop such functionality
            in your app?
00:20 (eno) mhm, actually I have no idea that supports
            something like nxml-mode
00:20 (alp) yeah, perhaps for monodevelop

Actually that is sort of what I wanted. However, for DTD and XSD, the implementation is not extensible (the validation implementation is hidden in System.Xml.XmlValidatingReader). So I (kinda) implemented something like that, using my RelaxngValidatingReader:

    using System;
    using System.Xml;
    using Commons.Xml.Relaxng;

    // Validate the RELAX NG grammar document against itself.
    XmlTextReader xtr = new XmlTextReader ("relaxng.rng");
    RelaxngPattern p = RelaxngPattern.Read (
        new XmlTextReader ("relaxng.rng"));
    RelaxngValidatingReader rvr = new RelaxngValidatingReader (xtr, p);

    for (rvr.MoveToContent (); !rvr.EOF; rvr.Read ()) {
        Console.WriteLine ("Name: {0}, NodeType: {1} -> {2}",
            rvr.Name, rvr.NodeType,
            rvr.Emptiable () ? "Emptiable" : "not Emptiable");
        // What the grammar would accept next, after the current node.
        Console.WriteLine (" - expected attributes -");
        foreach (XmlQualifiedName qn in rvr.ExpectedAttributes)
            Console.WriteLine ("{0} in {1}", qn.Name, qn.Namespace);
        Console.WriteLine (" - expected elements -");
        foreach (XmlQualifiedName qn in rvr.ExpectedElements)
            Console.WriteLine ("{0} in {1}", qn.Name, qn.Namespace);
    }

Here I put the output of the example above. It is hacky (written mostly in 2 hours) and it does not check for rejection by notAllowed. It might be improved later. Also, it uses a Hashtable right now, but it does not have to be a dictionary.

I also added Emptiable() (of type bool), which determines whether an end tag is acceptable in the current state. Actually, to complete an end element, its name has to be available, but because of the difference between a QName and an end tag name, that should be (and could be) implemented without the RELAX NG validation stuff (to support such functionality, just keep the start tag names in a stack). Similarly, you should also keep track of the in-scope namespace declarations to fill in the proper prefix that is bound to the namespace of each QName contained in the results.
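
That bookkeeping can be sketched without any validator at all. The following is a minimal sketch, assuming the editor reports each start tag together with the namespace declarations written on it; the CompletionContext class and its members are hypothetical names. Only Stack<T> and XmlNamespaceManager are real library types.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public class CompletionContext
{
    readonly Stack<string> openTags = new Stack<string> ();
    readonly XmlNamespaceManager nsmgr =
        new XmlNamespaceManager (new NameTable ());

    // rawName is the tag name as typed (e.g. "a:doc"); nsDecls maps
    // prefix -> namespace URI ("" is the default namespace) and may be null.
    public void StartElement (string rawName, IDictionary<string, string> nsDecls)
    {
        nsmgr.PushScope ();
        if (nsDecls != null)
            foreach (KeyValuePair<string, string> d in nsDecls)
                nsmgr.AddNamespace (d.Key, d.Value);
        openTags.Push (rawName);
    }

    public void EndElement ()
    {
        openTags.Pop ();
        nsmgr.PopScope ();
    }

    // The only end tag name that can be completed here, or null at top level.
    public string ExpectedEndTag {
        get { return openTags.Count > 0 ? openTags.Peek () : null; }
    }

    // Turns a QName reported by the validator into the name to insert.
    // Returns null if no in-scope prefix is bound to its namespace.
    public string ToTagName (XmlQualifiedName qn)
    {
        string prefix = nsmgr.LookupPrefix (qn.Namespace);
        if (prefix == null)
            return null;
        return prefix.Length == 0 ? qn.Name : prefix + ":" + qn.Name;
    }
}

public static class CompletionDemo
{
    public const string RngNs = "http://relaxng.org/ns/structure/1.0";
    public const string DocNs = "http://example.org/annotations"; // made-up URI

    public static CompletionContext InsideGrammar ()
    {
        Dictionary<string, string> decls = new Dictionary<string, string> ();
        decls.Add ("", RngNs);
        decls.Add ("a", DocNs);
        CompletionContext ctx = new CompletionContext ();
        ctx.StartElement ("grammar", decls);
        return ctx;
    }

    public static void Main ()
    {
        CompletionContext ctx = InsideGrammar ();
        Console.WriteLine (ctx.ExpectedEndTag);
        Console.WriteLine (ctx.ToTagName (new XmlQualifiedName ("start", RngNs)));
        Console.WriteLine (ctx.ToTagName (new XmlQualifiedName ("doc", DocNs)));
    }
}
```

When ToTagName returns null, the editor would have to invent a prefix and insert a namespace declaration itself; that case is left out of the sketch.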

Oh, BTW don't ask Alp about that "dream": he has many other tasks and interests ;-)

mcs now supports /doc

Finally, I checked in the /doc support patches to mcs. I remember the first patch was written in a day, nearly 7 months ago, and that worked mostly fine.

While hacking on it, I found some problems around the /doc feature:

  • There is no assurance that documentation lines come in the expected order when we are using partial types. (We can control it by ordering the file names passed to csc.exe, but can you, especially when you use vs.net?)
  • There is no check whether documented prefixes such as 'T:' 'F:' 'P:' 'M:' are correct.
  • T:namespace_name is incorrectly allowed.
  • There is no invalid comment check on (and between) attribute tokens.
  • Parsing of comment lines looks hacky. Those '///' lines seem to be handled individually, and thus they are laid out line by line, but when we have split markup like "/// <see\ncref='F:Foo'" it will connect those two lines, which means that the whole markup might be parsed (i.e. checked for well-formedness) per line.
  • "cref" attributes are pretty nasty. There is no normalization for members in the type itself (e.g. if you have TestType.FooField, there will be cref="F:TestType.FooField" and cref="F:FooField" in the resulting document).

... and more (I cannot remember any more right now). Well, some of them are not actually problems. Some just look like bugs.
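
As an illustration of the split-markup item in the list above, here is a minimal sketch of checking well-formedness over the joined comment lines rather than per line. It is not how csc or mcs actually do it; DocCommentCheck and IsWellFormed are hypothetical names, and wrapping the text in a dummy <doc> root is just an assumption that makes a fragment parseable by XmlDocument.

```csharp
using System;
using System.Text;
using System.Xml;

public static class DocCommentCheck
{
    // Strips the leading "///" from each line, joins the lines with
    // newlines, wraps them in a dummy root element and tries to parse.
    public static bool IsWellFormed (params string [] docLines)
    {
        StringBuilder sb = new StringBuilder ("<doc>");
        foreach (string line in docLines) {
            string s = line.TrimStart ();
            if (s.StartsWith ("///", StringComparison.Ordinal))
                s = s.Substring (3);
            sb.Append (s).Append ('\n');
        }
        sb.Append ("</doc>");
        try {
            new XmlDocument ().LoadXml (sb.ToString ());
            return true;
        } catch (XmlException) {
            return false;
        }
    }

    public static void Main ()
    {
        string first = "/// <see";
        string second = "/// cref='F:Foo' />";
        // The first line alone is an unterminated start tag.
        Console.WriteLine ("first line alone: {0}", IsWellFormed (first));
        // Joined, the two lines form one complete empty element.
        Console.WriteLine ("both lines joined: {0}", IsWellFormed (first, second));
    }
}
```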

Well, actually csc must be doing a better job than my hacky cref interpretation. It seems to recursively tokenize the attribute value as a type name, just as it does the source itself, while I don't.

Anyways, it is the kind of job I did only because there are some users (originally it would have been used to examine our System.Xml implementation by using NDoc in practice). I think the monodoc format is much better, and I don't think the C# doc feature is good: as a translator, I keep track of changes in the original documents, usually from the documents themselves, not from source code files.

Now I am so glad that I can fully go back to sys.xml hacking.