monogatari

Google Street View hit Japan

During Novell's Hack Week I started a project around Moonlight that I was interested in, but I haven't really finished it. I'll revisit it here once I get one example running (so far, no example runs fine yet).

Google Street View launched in Japan this summer, and it caused a lot of flaming and bashing against Google. Details can be read at GlobalVoicesOnline ([*1] and [*2]). Similar arguments occurred in France, England, Canada etc., and the U.S. (I think, everywhere).

I'm not going to explain all of the arguments here, just update my recent status. Read the articles above if you are interested.

As one of the board members of MIAU, I was privately busy preparing a symposium on Google Street View, to discuss it publicly, with about 60 attendees. (Japanese news links: [*1], [*2] and [*3])

Apart from the organization, which stands in a neutral position, I keep blogging about it (in Japanese), mostly urging caution: people should calm down, point out problems precisely and in their precise context, and distinguish issues from non-issues for each subject raised, so that we do not have to shut down any kind of web service deployment unnecessarily. Even from such a prudent position, it has been very hard to correct those furious people. I have also been defamed by a lot of people, including Anonymous Cowards on slashdot.jp, for posting fair (in my belief) evaluations of those opinions. (Like: "If there's a risk to some property right then it should be evaluated in an evenhanded fashion" => "F you".)

It's not an easy bug to get resolved.

|

rejaw

Yesterday friends of mine released Rejaw, a microblogging-like, Comet-based threaded web chat infrastructure, with a full set of APIs. The introduction can be read here. I was one of the alpha testers.

There are already some articles introducing Rejaw:

|

interesting article on how MySpace and Facebook are failing in Japan

This article mostly explains, correctly, why MySpace and Facebook are not accepted by Japanese users.

joining SNS by real name regarded as dangerous

Yes, when Facebook people visited Japan and explained their strategy to expand into the Japanese market by advertising a "trusted by real name" network, we found it mostly funky. As the article explains, that is already achieved by mixi. And by that time, mixi was already regarded as dangerous for "exposing real names too widely".

One example incident happened in August 2005, at Comic Market. The Comic Market venue is usually flooded by terrible numbers of otaku people, who used to look bad in general. One of the part-time student workers at a hot dog stand there wrote an entry in her "mixi diary" like: "there were a lot of ugly otaku people there. eek!"

While it was pretty much straightforward, those otaku guys got hurt (at least some of them loudly claimed so), got upset, and started "profiling" who that student was. It was very easy on mixi, because mixi at that time encouraged putting real-life information under your real name. She was soon flooded with blaming voices, and she disappeared from mixi.

OK, she was too careless, or ideally she should not have written it (it is always easy to say something ideal). But she was not a geek and did not really understand how "open" the network (mixi) is to others (it is not really "open", thanks to invitation filters, but as mixi grew to millions of users it is of course not a "trusted network" anymore). She didn't blame any specific person, and hadn't felt guilty until the company forced her to apologize. This kind of "careless" accident has kept happening on mixi, and it became a social problem.

Nowadays we have the same issue around "zenryaku-prof", where quite a few children have faced trouble (for example, sexual advances) due to the fact that the network is "open" to the web by default, while they think it isn't.

Though there must have been similar incidents outside Japan too (for example, people fired over their blogs), the above is (I believe) the general understanding of the situation in Japan.

Mobile web madness

Another point that is obvious to Japanese people, but perhaps not to others, is that Japanese mobile web support is more important than anything for getting more people to join. Mixi is of course accessible from our cell phones. An even more funky example is "mobage-town", which used to limit access to cell phones only(!). (It is done by sending the "contract ID", which is terrible BTW.) Mobage-town is one of the mega-hit sites on the Japanese mobile web. It is mostly for games on cell phones, but it also has a huge SNS inside. It is also funky that the network used to be filled mostly with under-20 children. (Now those children have grown past 20, so the numbers are not obvious.)

Typically, Japanese people spend a lot of boring time between home and their offices or schools, on trains or buses, where they can only do some limited "interesting" stuff. It used to be reading, for example, and nowadays it is the mobile web.

Twitter was very successful, unlike those failing players, though I don't think the explanation in the TechCrunch article is right. Twitter spread through "movatwitter", which is designed as a mobile web UI (twitter is fully accessible via its API), with some additional value such as an on-the-fly photo uploader (like Gyazickr for iPhone). It also filled our need (microblogging is a very good way to fill the boring time of our daily commute). And it lived very well in the mobile web land: no JavaScript, no applets, no requirement for huge memory allocation.

When Facebook was advertised with its API, what came to my mind was: "Is it even possible to make it work for the Japanese mobile web? nah"

While we, as members of the "open" world wide web, do not really like this mobile-only web (probably we should read Jonathan Zittrain), it is not a trivial demand that a cell phone be able to access the mobile-only network. For example, the iPhone 3G does not support it (the iPhone BTW lacks a lot of features that typical Japanese people expect: for example, a camera shake adjuster, mobile TV capability, the mobile wallet, etc.). It is often referred to as the "Galapagos network", implying a failure to expand business abroad (one of the commenters on the TechCrunch entry mentions it). It is even funny that those iPhone enthusiasts try to claim that their web applications are "open" (as compared to the Japanese mobile-only network).

BTW, a commenter on the TechCrunch entry tries to refute the facts in the article by quoting Google Trends worldwide. But (like the graph at the top of the article) that is a typical failure in measuring Japanese web access statistics: it does not reflect mobile web access. It has already been explained (in Japanese) very well. The simple fact is that Japanese traffic is becoming less visible through Alexa, Google Trends, or any similar tool.

SNSes are often domain specific

We can see similar phenomena everywhere else. In China it is QQ. Orkut quickly became the SNS of Brazil. There is no universal best.

Where other SNSes find business chances outside their home countries is in specific purposes. For example, MySpace in Japan is good for promoting musicians, with its rich UI (though many of them also use mixi as well).

|

You Mono/.NET users do NOT use XML, because you don't really know XML

Are you using XML for general string storage? If so, you are most likely wrong. If you do not understand why the following XML is wrong and how you can avoid the problem, do NOT use XML for your purpose.

<?xml version="1.0" encoding="utf-8"?>
<root>I have to escape \u0007 as &#7;</root>

If you gave up answering by yourself, read what Daniel Veillard wrote 6 years ago.
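
In short: BEL (\u0007) and most other control characters are not allowed in XML 1.0 at all - not even as character references like &#7; - so you cannot store arbitrary strings in XML as-is. A minimal defensive sketch (the helper names and the base64 fallback are my own convention, not any standard API):

using System;
using System.Text;

public class XmlSafeString
{
  // The XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
  // | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  static bool IsXmlChar (char c)
  {
    return c == '\t' || c == '\n' || c == '\r'
      || (c >= '\u0020' && c <= '\uD7FF')
      || (c >= '\uE000' && c <= '\uFFFD')
      // rough: a strict check would also verify surrogate pairing
      || char.IsSurrogate (c);
  }

  // One possible workaround: base64-encode any string that is not
  // XML-safe (the consumer then has to know which fields are encoded).
  public static string ToXmlSafe (string s)
  {
    foreach (char c in s)
      if (!IsXmlChar (c))
        return Convert.ToBase64String (Encoding.UTF8.GetBytes (s));
    return s;
  }

  public static void Main ()
  {
    Console.WriteLine (ToXmlSafe ("I cannot escape \u0007"));
  }
}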

|

System.Json

I read through Scott Guthrie's entry on the Silverlight 2.0 beta2 release and noticed that it ships with "Linq to JSON" support. I found it in the client SDK. The documentation simply redirects to the System.Json namespace page on MSDN.

So, last night I was a bit frustrated and couldn't resist hacking it together. I'm not sure yet whether the MS implementation really works like this, though.

(You can only build it manually right now: "cd mcs/class/System.Json" and "make PROFILE=net_2_1". The assembly is under mcs/class/lib/net_2_1. You can't do "make PROFILE=net_2_1 install".)

I picked up a sample from another Linq to JSON project to try mine - that is, James Newton-King's JSON.NET. At first I simply replaced the type names, and it didn't work. The problematic lines are the ones with the casts, marked with comments below:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Json;

using JsonProperty = System.Collections.Generic.KeyValuePair<
    string, System.Json.JsonValue>;

public class Post
{
  public string Title;
  public string Description;
  public string Link;
  public List<string> Categories = new List<string> ();
}

public class Test
{
  public static void Main ()
  {

    List<Post> posts = new List<Post> ();
    posts.Add (new Post () { Title = "test", Description = "desc",
        Link = "urn:foo" });

    JsonObject rss =
      new JsonObject(
        new JsonProperty("channel",
          new JsonObject(
            new JsonProperty("title", "Atsushi Eno"),
            new JsonProperty("link", "http://veritas-vos-liberabit.com"),
            new JsonProperty("description", "Atsushi Eno's blog."),
            new JsonProperty("item",
              new JsonArray(
                from p in posts
                orderby p.Title
                // this explicit cast was the required fix:
                select (JsonValue) new JsonObject(
                  new JsonProperty("title", p.Title),
                  new JsonProperty("description", p.Description),
                  new JsonProperty("link", p.Link),
                  new JsonProperty("category",
                    new JsonArray(
                      from c in p.Categories
                      // ... and so was this one:
                      select (JsonValue) new JsonPrimitive(c)
                      ))))))));

    Console.WriteLine ("{0}", rss);
    JsonValue parsed = JsonValue.Parse (rss.ToString ());
    Console.WriteLine ("{0}", parsed);

  }
}

Unlike Linq to JSON in JSON.NET, JsonObject and JsonArray in System.Json do not accept System.Object arguments. Instead, they require a strictly-typed generic IEnumerable of JsonValue. Since IEnumerable<T> is not covariant, that means, for example, an IEnumerable<JsonPrimitive> is invalid. Thus I was forced to add explicit casts in my linq expressions.

It may be improved in the SL2 RTM, though. The Linq to XML classes accept System.Object arguments to keep usage simple (and, well, yes, somewhat error-prone).

|

no passionate SOAP supporter in Mono yet

Two years ago I worked on completing the System.Web.Services 2.0 API. That was the most shameful work I ever did (I hate SOAP). The work was finished two years ago, and there has been no maintainer since then. If there is anyone who loves SOAP, he or she can be a hero or heroine.

|

XIM support on Windows Forms

Recently Jonathan (Pobst) announced that our WinForms has become feature complete.

Besides feature completeness, there is another important change in the next Mono release. We were not ready for multi-byte character input. Since we use X11 for WinForms on Linux, we have to treat multi-byte input as a special case.

We had some code that handled XIM, but it was not for text input support. Since there was no one on the WinForms team who lives in multi-byte land, we hadn't taken care of it.

I was not familiar with this stuff, so it took a somewhat long time, but lately I have gotten it practically working. I use scim and ATOK X3 (one of the popular traditional commercial Japanese IM engines), which show different behaviors / uses of X.

The style is over-the-spot (like .NET), and hence it is not as decent as Gtk (i.e. on-the-spot). Also, it supports only XIM (it would be possible to support better IM frameworks like IIIMF). But for now I'm content with working international input support.

So if you are living in multi-byte land, try the next release or a daily build of Mono and feel happy :-)

|

Linq to DataSet

Since (I thought) I was done with the WinForms XIM support, I moved on and started working on System.Data.DataSetExtensions two days ago. Now it's feature complete.
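
If you haven't seen the API, here is a minimal usage sketch (nothing Mono-specific): AsEnumerable() bridges a DataTable into Linq, and Field<T>() gives typed access to columns.

using System;
using System.Data;
using System.Linq;

public class LinqToDataSetDemo
{
  public static void Main ()
  {
    DataTable table = new DataTable ("people");
    table.Columns.Add ("Name", typeof (string));
    table.Columns.Add ("Age", typeof (int));
    table.Rows.Add ("Alice", 31);
    table.Rows.Add ("Bob", 24);

    // AsEnumerable() and Field<T>() come from DataSetExtensions
    var names = from row in table.AsEnumerable ()
        where row.Field<int> ("Age") >= 30
        select row.Field<string> ("Name");

    foreach (string name in names)
      Console.WriteLine (name); // prints "Alice"
  }
}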

|

WebHttpBinding - my unexpected Hack Week outcome

Novell had its second "Hack Week". At the beginning I never thought that I would end up implementing WebHttpBinding from .NET 3.5 WCF (I was planning to play around with some XSLT stuff, but soon changed my mind). Actually, I have to say, it is not the outcome of just one week - judging from our patch mailing lists, I seem to have spent about two weeks. So, I'm cheating ;-)

Anyhow, Hack Week is over, and now I have a partial implementation of WebHttpBinding and its family, with a (simple) sample pair of a client and a server that sort of work.

Note that it is (and our WCF classes are) immature and not ready to fly yet.
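
Just to show what the binding is about, the sort of sample pair I mean looks roughly like this in .NET 3.5 terms (a self-hosted sketch with made-up service names and URIs, not our actual test code):

using System;
using System.ServiceModel;
using System.ServiceModel.Web;

[ServiceContract]
public interface IHello
{
  [OperationContract]
  [WebGet (UriTemplate = "hello?name={name}")]
  string Hello (string name);
}

public class HelloService : IHello
{
  public string Hello (string name)
  {
    return "hello, " + name;
  }
}

public class Host
{
  public static void Main ()
  {
    // WebServiceHost sets up the WebHttpBinding endpoint, so that
    // GET /hello?name=mono is dispatched to Hello ("mono")
    WebServiceHost host = new WebServiceHost (
      typeof (HelloService), new Uri ("http://localhost:8080/"));
    host.Open ();
    Console.WriteLine ("try http://localhost:8080/hello?name=mono");
    Console.ReadLine ();
    host.Close ();
  }
}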

For details

Want to know the details? Are you serious? OK ... so we have:

... and a couple of other bits (such as WebChannelFactory or WebServiceHost, which are really cosmetic).

We still don't have a couple of things (I am enthusiastic about explaining how incomplete we are :/):

I haven't mentioned the System.ServiceModel.Syndication namespace here, which is already done.

I also have the current class status of System.ServiceModel.Web.dll in Olive.

|

Stopping copyright law from being made worse than before

Congratulations, Canadian citizens!

We at MIAU are also fighting right now. We are very busy preparing for the upcoming symposium, which is to be held on the 26th.

|

Syndication API

Wow, I haven't written anything for 6 months. Actually I have been quite busy(?) with several things (namely, founding MIAU). In Mono land, I had been working on 2.0 API completion in those days.

Anyways. At the end of the Mono Summit last month, Miguel sort of asked me to hack on the Syndication API in .NET 3.5, which will likely be part of Silverlight 2.0. So, after a terrible experience on the trip back home (I lost my baggage on my return flight, and I had a bad stomachache for more than a week ...), I started working on it (while having sparse work days and a lot of days off this month). And now, System.ServiceModel.Syndication is almost done.

(... no, you don't have to waste your precious time learning WCF. You can use the Syndication API almost independently.)
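
For example, building and writing a feed takes nothing but the Syndication types themselves (a minimal sketch; the titles and URLs are placeholders, and Atom10FeedFormatter works the same way):

using System;
using System.Collections.Generic;
using System.ServiceModel.Syndication;
using System.Xml;

public class FeedDemo
{
  public static void Main ()
  {
    SyndicationFeed feed = new SyndicationFeed (
      "test feed", "some test items",
      new Uri ("http://veritas-vos-liberabit.com"));
    feed.Items = new List<SyndicationItem> () {
      new SyndicationItem ("first item", "hello syndication",
        new Uri ("urn:foo"))
    };

    // serialize the feed as RSS 2.0 to the console
    using (XmlWriter w = XmlWriter.Create (Console.Out))
      new Rss20FeedFormatter (feed).WriteTo (w);
  }
}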

|

Moved

This blog has moved. Well, only literally; the same content still appears, as far as the blog text goes. The rental hosting server I used to use, stepserver.jp, had a critical accident in which all files entirely vanished. This was the worst shutdown I have experienced, and it made me decide to move elsewhere.

|

The Firefox add-ons that cause performance degradation

Today I attended Firefox devcon Summer 2007 Tokyo. There were more than 150 attendees, which is pretty big for a conference about a single application. I heard that even the meetings in Mountain View are not like this. The Japanese Mozilla community certainly seems to be big.

It has nothing to do with the conference above, but this anonymous guy showed an interesting analysis of which Firefox add-ons degrade performance, with three test pages (6-level tables, 7-level tables, and a JS CPU benchmark). It is written in Japanese, so most of you won't be able to read it, but the culprit add-ons are listed in a table in the middle of the page (the table without "ok"). The biggest one was Adblock Plus. His add-on adjustments resulted in about a 4x performance boost.

According to this guy (I'm not sure if it is true), IPv6 support is often called the bad boy here, but it is not actually the culprit. Interesting.

|

olive XLinq updates

During the Japanese holiday week (aka Golden Week), I made significant updates to our XLinq stuff (System.Xml.Linq.dll) in the "olive" tree. It had been about a year and a half since the first checkin, and except for some cosmetic changes it had been kept as is.

It still lacks a couple of things, such as XSD support and XStreamingElement, but hopefully it has become somewhat functional. And judging from the public API surface via corcompare, it is largely done.

Miguel and Marek have been adding some of the C# 3.0 support to the compiler, and I heard that there was another hacker who implemented some type inference (var) stuff, so at some stage we will see Linq to XML in action (either in reality or in a test tube).
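
To make it concrete, this is the kind of code that should eventually work end to end (a trivial sketch; the query syntax is what depends on those C# 3.0 compiler bits, while the System.Xml.Linq API itself can already be exercised with plain method calls):

using System;
using System.Linq;
using System.Xml.Linq;

public class XLinqDemo
{
  public static void Main ()
  {
    XElement langs = new XElement ("langs",
      from name in new string [] {"C#", "Boo", "VB"}
      orderby name
      select new XElement ("lang", new XAttribute ("name", name)));
    // XElement.ToString() returns the indented XML text
    Console.WriteLine (langs);
  }
}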

Other than that, I did almost nothing this holiday week, except spending time on junk talk with Lingr fellows. The worst thing about Lingr (and this is agreed among us, including one of the four Lingr dev people) is the lowered productivity due to too much fun talking with them, especially when it is combined with twitter, which is somehow becoming bigger in Japan ...

Speaking of twitter, I was asked some Mono questions (mostly about bugs) by a Japanese hacker and ended up learning that he wrote TwitterIrcGateway and was trying to run it with Mono. It is a WinForms application (for task tray control) which requires some soon-to-be-released 1.2.4 features, but it just worked out of the box on my machine. It is so useful.

|

Bugs and standard conformance, when it comes from MS

My recent secondary concern is the lack of legal thinking and the English-only nationalism in the American House of Representatives on the "Ianfu" (comfort women) issue, as well as the Nanking massacre denialists' attack on the film 'Nanking'. But both are political topics that I usually don't dig into in depth...

My recent primary concern is the essential private rights in the Japanese private law system, which seem to have been correctly imported from the German Pandekten system but incorrectly adapted to Japanese law. My recent reading, a Japanese book "The Distance between Freedom and Privilege", which builds on Carl Schmitt and discusses the nature of limitations on (property) rights, was also nice. I hope those arguments will clear up some basic misunderstandings about the scope of copyright in principle.

Anyways.

As some Mono community dudes know, I always fight against those who only consider .NET compatibility rather than conformance to standards. They end up trying to justify an incorrect date function even in the proposal draft of a standard, just because of a bug in MS Excel (as well as Lotus), which doesn't really make sense.

Anyways, what I remembered on seeing Miguel's doubt about MS compiler (csc.exe) behavior was my own feedback to MS that csc has an ECMA 334 violation in comparisons between System.IntPtr and null (I believe System.IntPtr.Zero was introduced exactly for this issue). I reported it about a year ago, when I was trying to build YaneSDK.NET (an OpenGL-based game SDK) on Mono.
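
To illustrate (this is my own reconstruction, not the actual YaneSDK.NET code), the issue is code like this, which csc accepts with only a warning that the expression is always false:

using System;

public class IntPtrNullCheck
{
  public static void Main ()
  {
    IntPtr p = IntPtr.Zero;

    // csc compiles this comparison between a value type and null;
    // whether ECMA 334 allows it is the conformance question
    if (p == null)
      Console.WriteLine ("never reached");

    // the intended, portable idiom
    if (p == IntPtr.Zero)
      Console.WriteLine ("p is zero");
  }
}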

This bug was left open until Jan 31, and then it was closed without any comment. I quickly reopened it, and thus it is still active. Of course, I totally don't mind that .NET has bugs (having live / long-standing issues is very usual), including ECMA violations.

Since I have reported a couple of .NET bugs, I have sometimes seen "interesting" reactions. In this case, MS says that this is "a bug in the ECMA spec", so "they will come up with a new spec". They think they "own" control of the specification (I'm not talking about ownership of the copyright on the specification).

After all, it seems that I'm always concerned about rules and justifications.

|

Kind of back

Looks like six months have passed since my last entry. I was not dead, at least literally. Lately my interest has been in some legal stuff (especially criminal law), which is of course in the Japanese context. Anyways ...

So, I have been working on the WCF implementation now. The Olive plan was announced back in October, but the work started much earlier, like in Autumn 2005. Though, as you would guess, it has been flaky - I didn't choose WCF over helping WinForms last spring, or over ASP.NET 2.0 Web Services last winter (and the latter is still ongoing).

BasicHttpBinding was already working last spring, and ClientBase was working last summer. Ankit had been working on WSDL support and the ASP.NET .svc handler, until we decided on some work reassignment during the Mono Meeting days in October. (He is now working on MonoDevelop - I am so envious! ;-) MonoDevelop hacking is a lot more joy than the WS-* sack.)

After getting ClientBase working, my primary task in this area has been to get WS-Security (SecurityBindingElement) working. WS-Security and WS-SecurityPolicy are a nightmare. There is a lot of work in this area. I dug into several security classes in depth to make my sample code simpler, simpler and simpler. Now I have a very primitive pair of service and client. And after several bugfixes, primarily in System.ServiceModel.dll and System.Security.dll, I finally got my WS-Security enabled message successfully consumed by WCF :-)

Probably none of you would understand why I am so excited about it, since the code is still not practically working yet, but I am happy right now. I would need a lot more words to explain why, and I doubt it is worth it :|

|

Microsoft Permissive License

As I often write here, the legal violation I dislike most is false advertising.

Did you know that there is more than just one Microsoft Permissive License? I only knew this one, which is widely argued to be possibly conformant to the Open Source Definition.

I originally disliked the idea of this MS-PL being labelled as one of the "Shared Source Licenses", as it rather seemed an attempt to justify the existing restricted shared source licenses, while I like the MS-PL terms in general.

Now, read this Microsoft Permissive License. It contains this clause:

(B) Platform Limitation- The licenses granted in sections 2(A) & 2(B) extend only to the software or derivative works that you create that run on a Microsoft Windows operating system product.

This is an excerpt from Shared Source Initiative Frequently Asked Questions:

Q. What licenses are used for Shared Source releases?

A. [...] Microsoft is making it easier for developers all over the world to get its source code under licenses that are simple, predictable, and easy to understand. Microsoft has drafted three primary licenses for all Shared Source releases: the Microsoft Permissive License (Ms-PL), the Microsoft Community License (Ms-CL), and the Microsoft Reference License (Ms-RL). Each is designed to meet a specific set of developer, customer, or business requirements. To learn more about the licenses, please review the Shared Source Licenses overview.

(Emphasis mine.)

Now I wonder if this is the exact reason why Microsoft did not submit the MS-PL as an Open Source Definition conformant license.

Now I ponder the legal effect of this behavior (as I was a law student). If it is not regarded as illegal false advertising, the result is that any kind of license statement loses reliability. If it is regarded as illegal, the result is that authors of every kind are obliged to keep their licenses consistent every time they modify them. Oh, maybe not all; it could be limited to wherever SOX laws apply. Which is better?

|

CCHits

Recently I got to know CCHits, a digg-like music ranking website specialized in music available under Creative Commons licenses (as a normal ex-law student, I am a big fan of CC). This is what I really wanted to see or build. Since I just moved to my new room and am without a fast connection, I cannot practically use it right now, but I really want to try it asap. Music is one of the things that I dream of becoming free, like cooking recipes, in which no rights lie and yet there are still improvements. There are a lot of boring commercial songs that are still used just to make some sound in shops, wasting money.

What it lacks now is ranking counts; I'd like to see at least hundreds of votes. When I'm ready, I'll start by listening to the high-rated songs and adding more hits, to help this website grow up enough, like digg.

There are some great search engines that help find CC music: Yahoo! and Google. I love those search engines, not because they are innovative, but because they recognize how important the freedom represented by CC is. There are some other search engines which are rather interested in their own "innovation", but that is not something I am interested in.

It also reminded me of what Microsoft people said about Google Spreadsheets. I think arguments like "Google Spreadsheets is not innovative because it is just something Google acquired" are really ignorable. That made me feel that the Microsoft attitude is 10 years behind Google. I am rather interested in the company principle behind (say) Google Book Search.

|

OpenDocument validator

Someone in the OpenDocument Fellowship read my blog entry on RelaxngValidatingReader and OpenDocument, and asked me if I could create some example code. Sure :-) Here is my tiny "odfvalidate" that validates .odt files or their content XML files (any of content.xml, styles.xml, meta.xml, or settings.xml should be fine) using the RELAX NG schema.

I put up an example file to validate - actually, simply the OpenDocument specification document converted by OO.o 2.0. When you validate that document, it will report a validation error, which really is an error in its content.xml.

Mono users can just get that simple .cs file and the OpenDocument rng schema, and then compile it with "mcs -r:Commons.Xml.Relaxng -r:ICSharpCode.SharpZipLib". Microsoft.NET users can still try it by downloading the dll files listed there.
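
The core of the tool is tiny; from memory, it boils down to something like this (error handling and the .odt zip extraction are omitted):

using System;
using System.Xml;
using Commons.Xml.Relaxng;

public class OdfValidateCore
{
  public static void Main (string [] args)
  {
    // args [0]: content.xml (or styles.xml etc.)
    // args [1]: OpenDocument-schema-v1.0-os.rng
    RelaxngValidatingReader reader = new RelaxngValidatingReader (
      new XmlTextReader (args [0]), new XmlTextReader (args [1]));
    // reading through the document performs the validation; a
    // RelaxngException is thrown at the offending node
    while (reader.Read ())
      ;
    Console.WriteLine ("valid.");
  }
}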

Beyond my tiny tool, I think AODL would be able to use it. In the above directory I put a tiny diff file for the AODL code: add Commons.Xml.Relaxng.dll as a reference, add OpenDocument-schema-v1.0-os.rng as a resource, and then build. I haven't actually run it, so it might not work though.

If you have any further questions, please feel free to ask me.

|

We need WinForms bugfixes

Lately I have mostly been spending my time bugfixing core libraries such as corlib and System.dll. Yesterday I got to know that we need a lot of bugfixes in our MWF (Managed Windows Forms).

So I have started to look through the existing bug list and work on it. At first I thought it would be a highly difficult task: extracting simple reproducible code from problematic sources and, of course, fixing the bugs. It turned out not to be that difficult.

Actually, it is very easy to find MWF bugs. If you run your MWF program, it will immediately show you some problems. What I don't want you to do at this stage is to report it as is. We kinda know that your code does not work on Mono; that alone is almost not a bug report. Please try to create a good bug report. Here I want to share my ideas on how you can create good MWF bug reports and bugfixes.

(Dis)claimer: I'm far from Windows.Forms fanboy. Don't hit me.

the background

Mono's Windows Forms implementation is WndProc based and is implemented on top of the XplatUI framework. For drawing, we have a Theme engine implemented on top of System.Drawing, which uses libgdiplus on non-Windows platforms and simply GDI+ on Windows.

Thanks to this architecture, if you have both Windows and Linux (and OSX) boxes, you can try them all and see whether all of them fail or not. In general we have an XplatUI implementation and a Theme implementation for each environment (XplatUIWin32, XplatUIX11, XplatUIOSX and ThemeClearLooks, ThemeGtk, ThemeNice, ThemeWin32Classic), so the bug you are trying to dig into might be in there. (Also note that if you have incorrect code, such as Windows-path dependent code, it won't work; here I stick to winforms-related topics.) To find out the culprit layer:

See our WinForms page for more backgrounds.

find the culprit

The first stage is to find out the problematic component.

create a simple repro code

Once you think you have found the culprit control, try to create simpler reproducible code. It not only helps debugging (bugs without "readable" code are not likely to be fixed soon), but it also protects you from mistaken assumptions about the bug.

When you try to create simple repro code, you can reuse the existing simple code collection.

If you are not accustomed to non-VS.NET coding, here is my example code:


using System;
using System.Windows.Forms;

public class Test : Form
{
	public static void Main ()
	{
		Application.Run (new Test ());
	}

	public Test ()
	{
		#region You can just replace here.

		ComboBox cb = new ComboBox ();
		cb.DropDownStyle = ComboBoxStyle.DropDownList;
		cb.DataSource = new string [] {"A", "B1", "B2",
			"C", "D", "D", "Z", "T1", "B3", "\u3042\u3044"};
		Controls.Add (cb);

		#endregion
	}
}

Compile it with mcs -pkg:dotnet blah.cs.

bugfixing

After you create a good bug report (also note that we have a general explanation of this), it might not be that difficult to even fix the bug :-) Actually, we receive a lot of bug reports, while we have relatively few bugfixers.

As Ximian used to be a GNOME company, external skilled Windows developers may know much better than us how to implement MWF controls and fix their bugs. In my case, I even had to start bugfixing by learning what kinds of WindowStyles there are (WS_POPUP etc., which we define in XplatUIStructs.cs).

My way of debugging won't be that informative... Anyways. When I try to fix those MWF bugs, I just use a text editor and mono --trace. My most frequently used option is --trace=T:System.Windows.Forms.blahControl. I mostly spend my time on understanding the component I'm touching.

When you have created a patch (hooray!), adding a [PATCH] prefix to the bug summary would be nice, so that we can find possible fixes sooner.

|

Another NVDL implementation

Lately, Makoto Murata told me that another NVDL implementation has appeared. I am extraordinarily pleased. Before that, there was only my NVDL implementation, which meant that NVDL was not worthy of the name of a "standard".

It is also awesome that this implementation is written in Java, which means it reaches different fields of users. Now I wonder: when will a C implementation appear? :-)

BTW, Makoto Murata also gave me his NVDL tests, which are in preparation, and I ended up making several updates to my implementation. Some tests made me go "huh? Is NVDL such a specification?" - i.e. I had some misunderstandings about the specification. With them, I could also update a few parts to match the FDIS specification (I wrote my implementation when it was at FCD). As far as I know, the tests will be released under a certain open source license (unlike the BumbleBee XQuery test collection).

|

the Mono meeting in Tokyo

We have finished the latest Mono meeting in Tokyo, targeting Japanese users, with Alex Schonfeld of popjisyo.com. (We call it "the first meeting" in Japanese, but as you know, we used to have some non-Japanese ones.)

When we announced it [1] [2], I personally expected 5 or 6 people at best, so that we could have a somewhat in-depth discussion about Mono, depending on the attendees. What happened was pretty different. We ended up welcoming 16 external attendees (including akiramei), so we rented the entire room in the bar and mostly spent the time on our presentations and the Q&A between them and me. We were so surprised that such a tiny pair of announcements was read by so many people.

It was big fun - there was an academic guy who writes his own C#/VB compilers referencing mcs/gmcs. There was a guy who wanted to know some details about the JIT. I also met a person who was trying my NVDL validator. There were a number of people interested in running their applications under Mono (well, Alex has also long been interested in ASP.NET 2.0 support). Many were interested in how Mono is used in production. There were also many claiming that Mono needs an easier debugging environment. And some people were interested in how ready Mono is for commercial support.

As usual, it set back my own hacking progress a bit, but the meeting went very well. It is very likely that we will plan another Japanese Mono meeting.

|

opt. week

Happy new year (maybe in the Chinese context).

A week or more ago, Miguel told me that there is a summary page comparing the performance of Java, Mono and .NET, by Jeswin P.

It is always nice to see someone kind enough to provide such information :-) The numbers, however, are not good for Mono, so I kinda started an attempt to improve Mono's XML performance.

Miguel told me that I may not realize how impressive such performance improvement records are to some users (well, I totally agree; I was not surprised to see the perf results). So I started to record the XMLmark results, now using NPlot (for Gtk# (0.99)). You can see a sort-of-daily graph, like:

[figures: DOM performance plot, SAX performance plot]

As you can see, this measuring box produces some noise, but it doesn't matter much. Anyways, not bad. The most effective patch was from Paolo.

The next item I want(ed) to optimize is Encoding. I have already done some tiny optimizations in UTF8Encoding (the latest svn should be about 1.2x-1.3x faster than 1.1.13), but it needs more love. That is ongoing.

The best improvement I made was not any of the above. I have been optimizing my RelaxngValidatingReader by feeding it the OpenDocument grammar, which is about 530 KB, and some documents (which I converted to .odt using OO.o 2.0 and then extracted content.xml from).

At first I couldn't even start with that massive schema; I had to fix several bugs first. I spent most of my hacking time on that during the last winter vacation, and it became pretty solid. In the New Year days I then started to validate some OpenDocument content.

I started with one random pickup from the OASIS docs, maybe this one, which was about 70 KB, a 1750-line instance. At svn r54896 it almost stopped around line 30: too bad a start. I started to read James Clark's derivative algorithm again and made several attempts to optimize the implementation. After some step-by-step improvements, at r55497 the validator finally reached the end for the first time. I still needed further optimization, since it cost me about 80 seconds, but one hour later it went nearly 20x faster, at r55498.

After getting the 70 KB document validated within 5 seconds, I thought it was time to try the biggest one - the OpenDocument 1.0 specification. I added indentation to its 4 MB content.xml and it became 5 MB. After fixing an XML Schema datatype bug found at line 4626, I had to optimize further, since it was eating up memory in what looked like an almost infinite fashion. I dumped the total memory consumption for each Read(), but the fix (r56434) was stupidly simple. Anyways, finally I could validate the 5 MB instance in 6 seconds.

So now it is quite possible to validate OpenDocument XML using RelaxngValidatingReader, for example with AODL, an OpenDocument library.

Maybe it's time to fix the remaining 8 bugs found by James Clark's test suite - 7 are XLink 5.4 violations, and 1 is in unique name analysis.

(sorry, but no plot for the RELAX NG stuff; it is kind of a non-drawable line ;-)

|

Lame solution, better solution

During my in-camera vacation work, I ended up fixing some annoying bugs in Commons.Xml.Relaxng.dll. Well, they were not that annoying; I just needed to revisit sections 4.* and 7.* of the specification. Thanks to my own God, I was pretty productive yesterday and disposed of most of them. After r54882 in our svn, the implementation became pretty much better. Now I fail only 13 out of 373 testcases from James Clark's test suite. Half of them are an XLink URI matter, and most of the remaining bits are related to empty list processing (<list><empty/></list>, which is not likely to happen in practice).

Speaking of Commons.Xml.Relaxng.dll, I updated my (oh yeah, here I can use "the" instead of "my" for now) NVDL implementation to match the latest Final Draft International Standard (FDIS). The specification looks fairly stable, and there were few changes.

MURATA Makoto explains how you can avoid silly processor-dependent technology, such as System.Xml.Schema.XmlSchemaValidationFlags, for controlling whether to validate xml:* attributes. Oh, and you don't have to worry that there is no validator for Java - someone on the rng-users ML wrote that his group is going to implement one. It would be even nicer if someone provided another implementation for libxml (it won't make sense if I implement that one too).

I hope I have enough time to revisit this holiday project (yeah, it is mostly done on my weekends and holidays) rather than wasting my time on other silly stuff.

|

Holy shit

Wow, I haven't blogged for a while. It just tells you that I have done nothing for that long. And I am going to hibernate this year (I have summer vacation for the next two weeks).

I wanted to get a somewhat useful schema before writing some code for Monodoc. After getting a lengthy xsd, I decided to write dtd2rng based on dtd2xsd (this silly design is because I cannot make System.Xml depend on Commons.Xml.Relaxng) and got a smart boy. (Well, to just get a converted grammar like this, you could use trang.)

We are going to have a Mono meeting in Tokyo next Monday. The details are not set yet, but if any of you are interested, please feel free to ask me for them. (It is likely to be just having dinner and drinking, heh.)

Last month I went to Kyoto (Japan) with Dan (Mills, aka thunder) and found a nice holy shit:

[photo: holy shit]

Unfortunately, I lost this religious treasure last week :-( I was pretty disappointed. It must have happened somewhere between entering a Starbucks and getting on a train (that's when I noticed I had lost it). So I was wondering if I should ask the Starbucks partners something like "didn't you guys see my shit here?" All the friends I asked for advice said I should, so I ended up asking the partners.

"I might lost my cell accessory here."

"What colour was it?"

"Well, it was a golden cra..."

"Ok, please wait for a while"

... and then she went into the staff room, and came back saying there was no such thing. For a few seconds I wondered if I should have been more descriptive, but I thought better of it. Shame.

|

the Adventure of Linq

Chris just needed 6 hours, but I needed much more.

|

don't use DateTime.Parse(); use DateTime.ParseExact() instead

It seems that very few people know that DateTime.Parse() is the COM-dependent, evil one. Since we don't support COM, such COM-dependent functionality won't work fine on Mono.

Moreover, even on Microsoft.NET, there is no assurance that a string DateTime.Parse() accepts on your machine is acceptable on other machines. Basically, DateTime.Parse() is for newbies who don't mind unpredictable behavior.

Yes,

DateTime.ParseExact (datestr, CultureInfo.CurrentCulture.DateTimeFormat.GetAllDateTimePatterns (), CultureInfo.CurrentCulture, DateTimeStyles.None)

instead of

DateTime.Parse(datestr)

is quite annoying because of its lengthy parameters. If you think so, you can push Microsoft to provide an equivalent shortcut, like I did. But the first thing you should do right now is to prepare your own MyDateTime.Parse() that just returns the same result as the above. The same goes for several culture-dependent String methods (IndexOf, LastIndexOf, Compare, StartsWith, EndsWith - all of them are culture sensitive, even with InvariantCulture), to avoid unexpected matches/comparisons (remember that "eno" and "eno\u3007" are regarded as equivalent).
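
Such a wrapper is trivial to write; something like this (MyDateTime is of course just an illustrative name):

using System;
using System.Globalization;

public static class MyDateTime
{
  // a predictable replacement for DateTime.Parse (): accept only
  // the patterns the current culture actually defines
  public static DateTime Parse (string s)
  {
    CultureInfo ci = CultureInfo.CurrentCulture;
    return DateTime.ParseExact (s,
      ci.DateTimeFormat.GetAllDateTimePatterns (),
      ci, DateTimeStyles.None);
  }
}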

|

Default encoding for mcs

Today I checked in a patch for mcs that changes the default encoding for source files to System.Text.Encoding.Default instead of Latin1. Surprisingly, we had been using Latin1 as the default encoding regardless of the actual environment. For example, Microsoft csc uses Shift_JIS on my Japanese box, not Latin1 (Latin1 is not even listed as a candidate when we - Japanese - save files in text editors).

This change brought confusion, since we have several Latin1-dependent source files in our own class libraries, and on a modern Linux environment the default encoding is likely to be UTF-8 regardless of region and language (I was making this change on a Windows machine). Thus I ended up adding /codepage:28591 to some classlib Makefiles at first, and then reverting that change and starting a discussion.

I felt kinda depressed; on the other hand, I thought it was kinda interesting. I was making a change that makes sense, but it was likely to bring confusion to Euro-American people. Yes, that is what we (non-Euro-Americans) had experienced - we could not compile sources and used to wonder why mcs couldn't, until we tried e.g. /codepage:932. When I went to a Japanese Novell customer to help deploy an ASP.NET application on a SuSE server, I had to ask them to convert the sources to UTF-8 (well, now I think I could have asked them to set fileEncoding in web.config instead, but anyway I am not sure whether the CP932 encoding would have worked fine or not).

If mcs does not presume Latin1, your Latin1-dependent ASP.NET applications won't run fine in future versions of Mono on Linux. And even though mcs now compiles sources with the platform-dependent default encoding instead of Latin1, it does not mean that your ASP.NET applications written in your "native" encoding (on Windows) will run fine on Linux without modification. As I wrote above, the default encoding there is likely to be UTF-8, and thus you will have to explicitly specify "fileEncoding" in your web.config file.
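
For reference, that setting goes in the globalization section of web.config, something like:

<configuration>
  <system.web>
    <!-- the encoding your source files are actually saved in -->
    <globalization fileEncoding="utf-8" />
  </system.web>
</configuration>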

Sounds pretty bad? Yeah, maybe, for Latin1 people. However, it has long been happening in non-Latin1 environments, so almost no one gets saved either way. But going back to "Latin1 by default" does not sound like a good idea. At the least, it is not a culture-neutral decision.

There is a similar but totally different problem with DllImportAttribute.CharSet when the value is CharSet.Ansi. It means the locale-dependent encoding on Microsoft.NET, but in Mono it is always UTF-8. The situation is similar, but here we use the platform-neutral UTF-8 (well, I know that UTF-8 itself is still not culture-neutral, as it definitely prefers Latin over CJK: it lays non-ASCII Latin letters out in the 2-byte area, while most CJK letters are in the 3-byte area, including even the roughly 200 Hiragana/Katakana letters).

|

Managed collation support in Mono

Finally, I checked in the managed collation (CompareInfo) implementation in Mono. It was about a four-month task, and I mostly spent my time just finding out how the Windows collation table is composed. If Windows were conformant to the Unicode standards (UTS #10, the Unicode Collation Algorithm, and more importantly the Unicode Character Database), it would have been much shorter (maybe two months or so).

Even though I spent those extra months, it resulted in a small good side effect - we got the facts about Windows collation, which had been said to be good but in fact turned out to be not that good.

The worst concern I now have about the .NET API is that CompareInfo is used EVERYWHERE in corlib APIs such as String.IndexOf(), String.StartsWith(), ArrayList.Sort() etc. etc... They cause unexpected behavior in application code that does not always have to be culture sensitive. We want them only in certain situations.
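
A small sketch of my own to illustrate the point (both calls below go through CompareInfo; the IndexOf result relies on \u3007 being an ignorable character, the same effect as the "eno" vs. "eno\u3007" equivalence):

using System;
using System.Globalization;

public class CultureSensitiveSurprise
{
  public static void Main ()
  {
    // explicit use: IgnoreKanaType makes katakana and hiragana
    // "kana" compare as equal
    CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;
    Console.WriteLine (ci.Compare ("\u30AB\u30CA", "\u304B\u306A",
      CompareOptions.IgnoreKanaType)); // 0 (equal)

    // implicit use: String.IndexOf(string) is culture sensitive,
    // so an ignorable character like \u3007 does not break a match
    Console.WriteLine ("e\u3007no".IndexOf ("eno")); // 0, not -1
  }
}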

Anyways, managed collation is now checked in, though right now it is not activated unless you explicitly set an environment variable. The implementation is almost close to Windows, I believe. I actually started from the SortKey binary data structures (described in the book "Developing International Software, Second Edition"), sorting out how the invariant GetSortKey() computes sort keys for each character, to find out how characters are categorized and how they can be derived from the Unicode Character Database (UCD). Now I know most of the composition of the sortkey table, which also uncovered that Windows sorting is not always based on the UCD.

From my Unicode Normalization (UTR #15) experience (it was actually the first step of my collation implementation, since I believed Windows collation must be conformant to the Unicode standard), I made a quick codepoint comparison optimization in Compare(), which made Compare() for equivalent strings 125x faster (in other words, it used to be that slow). It became closer to MS.NET's CompareInfo, but it is still far behind ICU4C 3.4 (which is about 2x quicker). I'm pretty sure it could be faster if I spent more memory, but this is corlib, where we should mind memory usage.

I ran some performance tests. Note that these are just a few cases, and collation performance heavily depends on the target strings. Anyways, the results are below and the code is here. (The numbers are seconds.)

CORRECTION: I removed the "Ordinal" comparison from the ICU measurements, as it actually had nothing to do with ICU; it is Mono's internal (pretty straightforward) comparison icall.

Options          w/ ICU4C 2.8   w/ ICU4C 3.4   Managed
Ordinal          1.502          1.522          0.761
None             0.671          0.391          0.741
StringSort       0.671          0.390          0.731
IgnoreCase       0.681          0.391          0.731
IgnoreSymbols    0.671          0.410          0.741
IgnoreKanaType   0.671          0.381          0.741

MS.NET is about 1.2x faster than ICU 2.8 (i.e. it too is much slower than ICU 3.4). Compared to ICU 3.4 ours is much slower, but Compare() does not look so bad.

On the other hand, below are the index search related methods (I used IgnoreNonSpace here; the numbers are seconds again):

UPDATED: after some hacking, I could reduce the execution time by about 3/4 (in the rows below with two Managed numbers, the second one is the time after this update).

Operation         w/ ICU4C 2.8   w/ ICU4C 3.4   Managed
IsPrefix          1.171          0.230          0.231
IsSuffix          0              0              0
IndexOf (1)       0.08           0.08           5.708 -> 1.092
IndexOf (2)       0.08           0.08           0.270 -> 0.230
LastIndexOf (1)   0.992          0.991          4.536 -> 1.162
LastIndexOf (2)   0.170          0.161          4.730 -> 1.492

Sadly, the index search stuff looks pretty slow for now. Well, actually it depends on the compared strings (for example, changing the strings in IsSuffix() resulted in managed collation being the fastest; that example code is practically not runnable under MS.NET, which is extremely slow there). One remaining optimization that immediately comes to mind is IndexOf(), where none of the known algorithms such as BM or KMP are used... They might not be so straightforward for culture-sensitive search. It would be interesting to find out how (or whether) optimization could be done here.

|

the "true" Checklists: XML Performance

While I knew that there are some documents named "Patterns and Practices" from Microsoft, I didn't know that there is a section on XML Performance. Sadly, I was scarcely satisfied by that document. The checklist is insufficient, some items miss the point, and some are even bad for performance. So here I put the "true" checklist, to save misinformed .NET XML developers.

What I felt funky was that they say users should use XmlValidatingReader. With that, they can never claim that XmlReader is better than SAX on performance, since XmlValidatingReader creates value strings for every node (so those two performance tips are mutually exclusive). That is not avoidable even if you call Skip() or MoveToContent(), since skipping validation of skipped nodes is not allowed (actually, in .NET 1.0, Microsoft developers did exactly that, and it caused bugs in XmlValidatingReader).
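
For instance, the performance-friendly pattern looks like this with a plain, non-validating XmlTextReader (a sketch; input.xml and the element name are placeholders) - wrap the same loop around an XmlValidatingReader and Skip() can no longer skip the validation work:

using System;
using System.Xml;

public class SkipDemo
{
  public static void Main ()
  {
    XmlTextReader reader = new XmlTextReader ("input.xml");
    reader.MoveToContent (); // jump past the prolog to the root
    reader.Read ();
    while (!reader.EOF) {
      // without validation, Skip () can bypass whole subtrees
      // cheaply instead of creating value strings for them
      if (reader.NodeType == XmlNodeType.Element
          && reader.LocalName != "item")
        reader.Skip ();
      else
        reader.Read ();
    }
    reader.Close ();
  }
}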

|