Today sees the first release of a set of stylesheets for creating documentation and diagrams from RELAX-NG schemas.
The stylesheet rng2docbook.xsl does more or less what it says in the file name: it creates a DocBook XML instance from a RELAX-NG schema. The stylesheet will document the content model and attributes of each element and each define in a RELAX-NG schema. Any documentation strings included in the default RELAX-NG documentation namespace will also appear in the DocBook output.
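To give a feel for the kind of information such documentation is built from, here is a minimal Python sketch (not the stylesheet itself) that pulls element names and their documentation strings out of a small schema. It assumes that the "default RELAX-NG documentation namespace" is the RELAX NG compatibility annotations namespace; the sample schema and the function name `element_docs` are made up for illustration.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
# Assumption: documentation strings live in the compatibility annotations namespace.
ANNOT = "http://relaxng.org/ns/compatibility/annotations/1.0"

# Invented sample schema, used only for this illustration.
SCHEMA = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <start><ref name="book"/></start>
  <define name="book">
    <element name="book">
      <a:documentation>A book with a single title.</a:documentation>
      <element name="title">
        <a:documentation>The title of the book.</a:documentation>
        <text/>
      </element>
    </element>
  </define>
</grammar>
"""


def element_docs(schema_text):
    """Map each named element in the schema to its documentation string."""
    root = ET.fromstring(schema_text)
    docs = {}
    for el in root.iter(f"{{{RNG}}}element"):
        # Only look at a documentation element that is a direct child.
        doc = el.find(f"{{{ANNOT}}}documentation")
        has_text = doc is not None and doc.text
        docs[el.get("name")] = doc.text.strip() if has_text else ""
    return docs


docs = element_docs(SCHEMA)
for name, text in docs.items():
    print(f"{name}: {text}")
```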
The second set of stylesheets creates one or more SVG diagrams from a RELAX-NG schema. You can choose to have the whole schema in a single diagram or to have separate diagrams for specific elements and defines (or for all elements and/or defines).
Read the full details on the stylesheets here.
A new release of TMTab, version 0.4.3, is now available for download.
TMTab is a tab-widget plug-in for the Protege ontology editor which enables Protege ontologies to be exported in XTM syntax.
This release is a bug-fix release. The two principal errors fixed are the occasional generation of duplicate ID attribute values for associations and the generation of numeric ID attribute values when using IDs generated from the Protege Knowledge Base objects directly.
This will probably be the last version of TMTab to support the 1.x releases of Protege. The next planned release (0.5.0) will support Protege 2.0. Watch this space!
A new development-branch release of TM4J is available from today. The new release, 0.9.0 alpha 2, adds features including the optional static merging of topics in the in-memory back-end; a GUI tool for managing persistent topic map stores; performance enhancements and bugfixes.
Download from SourceForge here.
Take a read through this interesting post on XML Dev. From bitter experience, I can second the assertion that to be sure that you have a valid WXS schema instance you need to validate with at least two parsers and preferably more. Worse still, we have XML editors that, because they use one or the other of the common (broken) parsers or their own (usually broken) implementation, will reject valid instances or let you create invalid instances. How did we get here?
Not such a long time ago, XML was new. Lots of people wrote parsers, because it was an easy thing to do. Lots and lots of people used those parsers because many of them were open source and/or free. Bugs were found. Bugs were fixed. Now most people use one of a handful of XML parsers that, for DTD validation at least, are robust and reliable.
W3C XML Schema has been a recommendation now for 2 and a half years. Some implementations are older than the recommendation of course, but most are of the order of 2 years old. Why are there so many bugs? Could it be that not as many people are using the tools (and so reporting the bugs)? Possible, but not likely given that the W3C juggernaut is forcing even relatively sane people to use schemas in order to unbreak the mess that is Namespaces or because they need to use other standards that require schemas. So perhaps it's because the developers of these tools aren't fixing bugs? Have developers/software companies suddenly decided that parser bugs are "no big deal"? I find that hard to believe.
More likely is that W3C XML Schemas are just too complex to implement. My guess is that there are no "little bugs" left in the parsers, just nasty, hard-to-fix, deep-in-the-code bugs. The sort of bugs you get from trying to implement a specification as impenetrable as the 3-part monster from the W3C.
There is another way - RelaxNG and Schematron - now both under the wing of ISO in the DSDL work. Both have features that WXS does not. Both are much easier to understand and easier to write. The tools for these schema languages are, in my experience, robust and reliable. Of course, the reality is that WXS is here to stay and we have to deal with it - like it or not. But if you do have the luxury of a choice of schema languages, the practical programmer should take a good look at the tool sets available for these non-W3C languages and think hard before following the crowd.
Call me jaded, but it's been a long time since I've seen anything on the web that made me laff this much.
A new wiki has been created for the discussion of topic map design and design patterns for topic maps. The wiki is hosted on Topic Map Central - a new resource created by Techquila for the collaborative development of topic maps and topic map patterns.
Today saw the release of a new version of TM4J on the "stable" branch of the project. This release fixes a few bugs reported with TM4J 0.8.2, but the primary focus is two new features.
The first is catalog-based resolution of locators which has been contributed by Murray Altheim. The resolver can be configured to read any number of XML catalog files and, when retrieving content from a URL specified by a Locator, the catalog will be consulted first. This allows local copies of remote files to be made and then requests for the remote files are redirected to the local copy through the catalog.
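The idea of consulting catalogs before fetching a remote file can be sketched in a few lines of Python. This is not TM4J's API: the class `CatalogResolver`, the URLs and the local paths are hypothetical, and the sketch assumes the catalog files use OASIS XML Catalog `uri` entries, which the release notes do not confirm.

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalogs (assumed catalog format for this sketch).
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Invented catalog content standing in for a catalog file on disk.
SAMPLE_CATALOG = """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://example.org/maps/opera.xtm"
       uri="file:///data/cache/opera.xtm"/>
</catalog>
"""


class CatalogResolver:
    """Hypothetical resolver that redirects remote URLs to local copies."""

    def __init__(self, catalog_texts):
        # Any number of catalogs may be supplied; earlier ones take priority.
        self._mappings = []
        for text in catalog_texts:
            root = ET.fromstring(text)
            mapping = {
                entry.get("name"): entry.get("uri")
                for entry in root.iter(f"{{{CATALOG_NS}}}uri")
            }
            self._mappings.append(mapping)

    def resolve(self, url):
        # Consult the catalogs first; the first matching entry wins.
        for mapping in self._mappings:
            if url in mapping:
                return mapping[url]
        # No entry found: fall back to the original remote address.
        return url


resolver = CatalogResolver([SAMPLE_CATALOG])
print(resolver.resolve("http://example.org/maps/opera.xtm"))
print(resolver.resolve("http://example.org/maps/unlisted.xtm"))
```

A locator with a catalog entry is redirected to the local copy, while any other locator is returned unchanged and would be fetched remotely as before.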
The second feature enhancement is the Administrator's Tool, a GUI tool for managing topic maps in a persistent store. Currently the tool supports importing and exporting topic maps from a persistent store, and deleting topic maps. In future versions of this tool, more operations will be added.
As usual, this release is available in source and precompiled binary packages in both gzipped tar and zip formats from SourceForge.
Just posted in the Publications section of the site is an update to the Topic Map Patterns For Information Architecture paper and a paper presented at XML Europe 2003 on Peer-To-Peer exchange of topic map information.
Also updated today was the thesaurus PSI document - adding new subjects and bringing the subject indicators in line with the patterns discussed in the new version of the IA paper.
Feeling inspired by tonight's ByteNight IT charity sleep-out, I was casting around for a way to help my favourite charity, Shelter. Then today I stumbled across JustGiving and so am now pleased to announce that TMTab and MDF are to be made "Charityware".
The basic idea is pretty simple. If you use TMTab or MDF and you find them useful, then please consider making a donation to Shelter through my sponsorship page. It's pretty easy to do and the money you donate goes towards helping Shelter to help the homeless and those with poor housing. Every penny helps!
A common modelling decision in creating a topic map is when to use an association with 3 or more roles (an n-ary association) and when to represent it as n-1 binary associations. Herewith a discussion on the relative merits of the two forms and some pointers (ok, opinions) on the Right Thing To Do.
In many cases in creating topic maps we are presented with the issue of how to represent n-way associations. Some examples could be:
- The members of a department (an association between one department and n-1 people)
- The books written by an author (an association between 1 author and n-1 books)
- The parts of a machine (an association between 1 whole and n-1 parts)
- A vote taken by a committee (an association between a decision and n-1 committee members)
- A murder depicted in an opera (an association between a victim, a murderer and a method of death)
The issue that comes up is whether to code these relationships in a single multi-legged association (an n-ary association) or several two-way associations (binary associations). There are trade-offs to be made, but in my opinion the first rule of thumb is:
Smaller is Better
Or more specifically, "More granular is better" - the smaller the statements we make, the more control we have over them. Breaking statements up without creating new topics gives us the ability to apply metadata to those statements individually and to query, traverse and modify one statement without any impact on or concern for the others.
Of course, there is a point of diminishing returns, and this is when you need to start adding new classes of entity to your model to be able to split up n-way associations. In general, if you can break up an n-way association without creating new topics, do it. If you need to create a new topic to break up an association, it is likely that you are creating a topic that represents the fact of the association. If you end up having a need for that, then all well and good, but in most cases it is something to be avoided: once you start down this reification route, it's hard to know when to stop.
The second rule of thumb I follow is to ask:
"Is the association divisible without creating another topic?"
In other words, would it make sense to divide up the association into smaller (typically binary) associations?
A third useful rule of thumb is:
"Does the presence of one player of a given role have any bearing on the presence of the other players?"
In other words, if one player were removed, would the statement being made suddenly become untrue (rather than just incomplete)?
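The three rules of thumb can be read as a small decision procedure. The Python sketch below is only one way of writing them down: the function name `suggest_arity` and its two boolean parameters are invented here, and reducing each rule to a yes/no answer hides the judgement involved in answering the questions.

```python
def suggest_arity(divisible_without_new_topic, players_interdependent):
    """Suggest 'binary' or 'n-ary' for an n-way association.

    divisible_without_new_topic: can the association be split into smaller
        associations without creating a topic for the association itself?
    players_interdependent: would removing one player make the statement
        untrue, rather than just incomplete?
    """
    # Rule 3: interdependent players must be kept in one association.
    if players_interdependent:
        return "n-ary"
    # Rule 2: do not split if that needs a new topic for the association.
    if not divisible_without_new_topic:
        return "n-ary"
    # Rule 1: smaller is better, so prefer binary associations.
    return "binary"


# Members of a department: divisible, and members are independent.
print(suggest_arity(True, False))
# A collective committee vote: the members are interdependent.
print(suggest_arity(True, True))
```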
So, with those three rules of thumb in hand... let's play the "Binary or N-Ary Game"!
The members of a department (an association between one department and n-1 people)
BINARY! - If Fred, Joe and Barney are members of the Finance Department, Fred and Joe will still be members after Barney retires. There is no dependency between the players of the 'member' role, so we can model this association as 3 binary associations rather than one four-way association.
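As a rough illustration in Python (a toy model, not any particular topic map engine), the department example could be held as three independent binary associations. The `Association` tuple and the helper `member_of` are names invented for this sketch.

```python
from typing import NamedTuple, Tuple


class Association(NamedTuple):
    """Toy association: a type plus (role, player) pairs."""
    assoc_type: str
    roles: Tuple[Tuple[str, str], ...]


def member_of(person, department):
    # One binary association per member.
    return Association(
        "membership",
        (("member", person), ("department", department)),
    )


associations = {member_of(p, "Finance") for p in ("Fred", "Joe", "Barney")}

# Barney retires: exactly one statement is removed, the others are untouched.
associations.discard(member_of("Barney", "Finance"))

remaining = sorted(
    player
    for assoc in associations
    for role, player in assoc.roles
    if role == "member"
)
print(remaining)
```

Because each membership is its own statement, removing one needs no knowledge of the others, and metadata could be attached to a single membership in the same way.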
The books written by an author (an association between 1 author and n-1 books)
BINARY! - 'Hunter S. Thompson wrote "Fear and Loathing in Las Vegas" and "Hell's Angels"'. These are independent facts and the statement as it stands is incomplete anyway (Thompson wrote more than those two books). In both English and Topic Maps, I can break this statement up into 'Hunter S. Thompson wrote "Fear and Loathing in Las Vegas"' and 'Hunter S. Thompson wrote "Hell's Angels"'. So I would model this as two binary associations rather than a single 3-way association.
The parts of a machine (an association between 1 whole and n-1 parts)
BINARY! or N-ARY! - If the meaning of the association is that it is a closed and complete list of all the components which make up the machine, it is reasonable to argue that without one of the components the machine is not complete, so in this case we should use an N-ary association that explicitly groups together all the components. On the other hand, such part-whole relationships are often not complete (e.g. "The engine contains a fuel pump, spark plugs and a carburettor"), in which case the individual parts are independently related to the whole and so should be represented with binary associations.
A vote taken by a committee (an association between a decision and n-1 committee members)
N-ARY! - There is an example in the RDF Model And Syntax Specification (see section 3.5) which goes "The committee of Fred, Wilma, and Dino approved the resolution". The association between the resolution and Fred, Wilma and Dino cannot be subdivided as the decision was made collectively - we do not want to assert that one of the three made the decision, but instead that all three came to the decision (by some undocumented means). So in this case, an N-Ary association provides us with the necessary dependency between the committee members and the decision made.
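In a toy Python model (not a real topic map API; the `Association` tuple, the role names and the `players` helper are invented for illustration), the committee decision stays as one association with four role players, so the approvers cannot be separated from one another or from the resolution.

```python
from typing import NamedTuple, Tuple


class Association(NamedTuple):
    """Toy association: a type plus (role, player) pairs."""
    assoc_type: str
    roles: Tuple[Tuple[str, str], ...]


# A single 4-way association: the decision was made collectively.
approval = Association(
    "approval",
    (
        ("resolution", "the resolution"),
        ("approver", "Fred"),
        ("approver", "Wilma"),
        ("approver", "Dino"),
    ),
)


def players(assoc, role):
    """Return all players of the given role, in declaration order."""
    return [player for r, player in assoc.roles if r == role]


print(players(approval, "approver"))
```

The association is immutable here on purpose: changing the committee means replacing the whole statement, which mirrors the dependency between the members and the decision.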
A murder depicted in an opera (an association between a victim, a murderer and a method of death)
N-ARY! - No article on topic maps is complete without a reference to Italian Opera and this is no exception. The classic Ontopia topic map contains 4-way associations such as "Baron Scarpia was killed by stabbing by Tosca in the opera Tosca" - the role players are Baron Scarpia (playing the role of victim), Tosca the character (playing the role of perpetrator), stabbing (playing the role of cause of death) and Tosca the opera (playing the role of opera). To break this down into binary associations we would need to create a new topic of type murder, then we could say:
- The victim of the murder was Baron Scarpia
- The perpetrator of the murder was Tosca
- The method of the murder was stabbing
- The murder is depicted in Tosca (the opera)
Modelling associations is best done with a bit of thought. Although the temptation is to just stuff as much as possible into a single association (especially when writing XTM syntax by hand), using small associations where possible gives you more flexibility in the long run as it allows greater control over attaching metadata to specific statements.
More granular associations also enable a great deal more clarity. Allowing the author to be explicit about whether role players are interdependent or not is important, and making use of standard topic map machinery to do that means that you need not be dependent on an ontology description to make clear what the topic map model is already capable of expressing.
Thinking about the arity of associations at the time you are constructing your topic map ontology will reap benefits in the long run.
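For the Tosca example, breaking the 4-way association into binary ones around a new murder topic might look like the following Python sketch. It is a toy model only: the topic id `murder-of-scarpia`, the association type names and the `describe` helper are all invented for illustration.

```python
from typing import NamedTuple


class BinaryAssociation(NamedTuple):
    """Toy binary association between a subject topic and one player."""
    assoc_type: str
    subject: str
    player: str


# Hypothetical id for the new topic that stands for the murder itself.
MURDER = "murder-of-scarpia"

# topic id -> topic type; this is the extra topic the split forces on us.
topics = {MURDER: "murder"}

statements = [
    BinaryAssociation("victim", MURDER, "Baron Scarpia"),
    BinaryAssociation("perpetrator", MURDER, "Tosca (character)"),
    BinaryAssociation("method", MURDER, "stabbing"),
    BinaryAssociation("depicted-in", MURDER, "Tosca (opera)"),
]


def describe(topic_id):
    """Collect every binary statement made about the given topic."""
    return {s.assoc_type: s.player for s in statements if s.subject == topic_id}


print(describe(MURDER))
```

The four statements only hang together through the extra murder topic, which is exactly the reification cost the second rule of thumb warns about.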