Measuring Out The Semantic Web

Introduction

In his closing keynote at this year’s TMRA conference (you weren’t there? you should have been!), Steve Newcomb made reference to the wonders of ISO/IEC 10744, or HyTime to its friends.

HyTime is a monster standard – it is so complex and so difficult to implement in its entirety that I believe only one person has ever tried. That said, the standard contains so much that is useful and generally required for a functioning Web that many of its pieces got cannibalized, stripped down and turned into hacker-friendly W3C “standards” such as XLink. Just as XML owes its very existence to SGML, so XLink and SMIL both need to look back to HyTime as an ancestor (albeit one that never gets invited to the family parties). Anyway, Steve talked about how there is still much left in HyTime that could be useful, and in particular picked out Clause 9 – Scheduling as one such piece. IMHO he is right on the money.

The Problem Statement

One of the biggest problems on the semantic web or linked data web is that we have no way to communicate measurements or positions using a grounded algorithm. There is no way that a fully conformant Topic Maps processor, RDF processor or OWL reasoner can tell that a given property is actually a point on some axis. And there is no way that these processors could convert from one axis type to another (say feet to metres). To do all this with current technology you need to bake into your application some ontology-specific knowledge – something that tells you “Values of property X are always expressed in millimetres, and values of property Y are always expressed in seconds since midnight on 1st January 1970”.
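To make that concrete, here is a minimal Python sketch of what such baked-in knowledge tends to look like. The property names and the lookup table are hypothetical, made up purely for illustration; the point is that the units live in application code, not in the data.

```python
# Hypothetical, hand-maintained table: the data only carries bare numbers,
# so the application has to "just know" what each property is measured in.
PROPERTY_UNITS = {
    "ex:width": "millimetres",                        # assumed property name
    "ex:lastModified": "seconds since 1970-01-01",    # assumed property name
}


def describe(prop, value):
    """Attach units to a bare value, or admit that we cannot interpret it."""
    unit = PROPERTY_UNITS.get(prop)
    if unit is None:
        return f"{value} (units unknown)"
    return f"{value} {unit}"
```

Any property that is not in the table comes back as "units unknown", which is exactly the position a generic processor is in for every property it meets.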

It is staggering, when you think about it, to realise that we can’t do this yet on the Semantic Web. It’s even more staggering to see bold statements being made for the “Web of Linked Data” without addressing the basic problem of “How do I know what units this data is expressed in?”. I believe that this is where the carcass of HyTime can be picked over once more :-)

What I’m going to attempt

I have decided to go back to the HyTime standard and see what can be taken away from it for the benefit of those currently struggling with merging, comparing and meaningfully transforming Linked Data. HyTime is a massive spec, and as the Readers Guide To HyTime recommends, I’m not going to even attempt to read all 450 pages but will instead focus just on Clause 9. This part of the HyTime specification deals with the issue of describing the positioning and extent of an object in N-dimensional space and provides a mechanism for defining the units used for measuring along each axis.

My hope is that this facility can be used not only for defining the size and position of objects but also as a general purpose facility for expressing measurements of all kinds, and that is going to be my closed set of problems to address:

  1. Specifying a measurement with a value and units in a way that allows an application to automatically compare and convert measurements that use different scales or units.
  2. Specifying a location and/or extent of an object in N dimensions where each dimension has its own associated measurement domain.
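As a rough illustration of the first problem, here is a Python sketch of a measurement that carries its own units, so that comparison and conversion need no outside knowledge. This is not taken from HyTime or from any existing ontology: the `Unit` and `Measurement` classes and their fields are my own assumptions, and conversion factors are held as exact ratios of whole numbers.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Unit:
    """A hypothetical unit definition: a name, a dimension, and an exact
    ratio saying how many base units make up one of this unit."""
    name: str
    dimension: str
    to_base: Fraction


METRE = Unit("metre", "length", Fraction(1))
FOOT = Unit("foot", "length", Fraction(3048, 10000))  # 1 ft = 0.3048 m exactly
SECOND = Unit("second", "time", Fraction(1))


@dataclass(frozen=True)
class Measurement:
    value: Fraction
    unit: Unit

    def convert(self, target):
        """Re-express this measurement in another unit of the same dimension."""
        if target.dimension != self.unit.dimension:
            raise ValueError(
                f"cannot convert {self.unit.dimension} to {target.dimension}"
            )
        return Measurement(self.value * self.unit.to_base / target.to_base, target)

    def same_quantity(self, other):
        """True if both measurements denote the same amount."""
        return self.convert(other.unit).value == other.value


ten_feet = Measurement(Fraction(10), FOOT)
in_metres = ten_feet.convert(METRE)  # value is Fraction(381, 125), i.e. 3.048
```

The second problem could then be approached as a tuple of such measurements, one per axis, though that is not shown here.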

Because this will probably take some time, and because I know you are too busy to read a three-screen-long thesis, and because I would actually value feedback as I go along, I’m going to break this exploration up into a number of separate posts. I’ll create a Category to group all the posts together as I go. Please feel free to hit up the links above and dive in there, and stick around for the next post where I’ll start to actually read this stuff.

What Do You Think?

I would be interested in what others think about this. Do you know of some other existing ontology for measurements? Have you tried to do something similar yourself (and if so what were your experiences)? What chapter did you get to in the HyTime spec? All comments, suggestions and peanuts from the gallery are welcome in the comments!

  • http://www.garshol.priv.no/blog/ Lars Marius Garshol

    First of all, I think you are right that this is a major issue, and particularly in a linked data context. Some people try to handle this by putting the unit into the occurrence type name or in documentation on the occurrence type (or property in RDF), but it’s difficult to argue that this is a good solution.

    I’m not convinced that HyTime has much to offer here, though. It may have some text about N-dimensional spaces and so on, but does it have an actual syntax and actual algorithms for handling this stuff? Does it have any way of detecting that your and my independent definitions of ISO meters are the same? If so, what does all of this look like? (A blog posting answering this would be nice. :)

    However, let’s say that HyTime does let you express this, and that it does so in a way that’s better than simply referencing known units like celsius, kelvin, fahrenheit, reaumur, etc. Then what? If your nice HyTime SGML document sits over there, and your nice topic map is here, what good is that? What I’m getting at is that there must be some way to connect the actual number in a data set with the definition of some unit. That may well be harder than coming up with definitions for the main units in use.

    There are some ways one could do this:
    * Topics for units, then scope for attachment. (Meh.)
    * Topics for units, then reification for attachment. (Double-meh.)
    * Topics for units, then standardized association from occurrence type for attachment. (Well…)
    * Use subtypes of datatypes. That is, unit:length-in-meters subtypes xsd:float.

    The last one in many ways seems like the best, but it does require you to have some machinery for expressing subtyping of datatypes. Plus you probably need to make a topic for your data type and associate with your unit of choice. Given all that it might be workable.

  • http://www.snee.com/bobdc.blog Bob DuCharme

    Hi Kal –

    I don’t know if you heard that I joined TopQuadrant last August, but around that time some co-workers convinced NASA to let them publish the measurement units ontology that they developed with NASA. See http://composing-the-semantic-web.blogspot.com/2009/08/units-ontology-with-spin-support.html. I’m sure they’d be very happy to get some feedback on its use.

    Bob

  • http://www.durusau.net Patrick Durusau

    Kal,

    Excellent! Just excellent!

    Err, there is a charter pending for a TC in OASIS that is going to be working on measurement ontologies (although not the sort of mapping you are talking about). Still, it may be of interest: OASIS Quantities and Units of Measure Ontology Standard (QUOMOS) Technical Committee, http://lists.oasis-open.org/archives/tc-announce/200911/msg00013.html.

    Where to start? That’s a hard question, particularly given my longtime co-editorship with Newcomb. Not sure I read it the same way I would have without that. I am willing to look at it again, this month in fact, to see if I can suggest some likely chunks that avoid some of the added complexity. If nothing else it would be more interesting than some of the other reading that occupies a good bit of my time.

    This is both interesting and important, not too often that is seen in any area of endeavor.

    Hope you are having a great day!

    Patrick

  • Steve Ray

    You might be interested to know that the ontology community is just now organizing itself to synthesize a “Units of Measure” ontology as an OASIS standard, using the various existing measurement ontologies. See http://ontolog.cim3.net/cgi-bin/wiki.pl?UoM_Ontology_Standard. Having said that, we ought to discuss HyTime – I’ll bring your blog article to the group’s attention.

    Cheers,

    - Steve

  • Kal

    @Lars Marius:

    The next post will look at what I think HyTime has got. Then hopefully the one after that will look at how we could make use of what HyTime has got in a Topic Map. I actually think that scope for assignment might be the easiest way to make this work, but doing something with decorating the occurrence type in the ontology to define a default measurement unit for a specific occurrence type might be an interesting way forwards.

    @Bob:

    QUDT looks like exactly the kind of thing I am thinking about. Probably the only place I would have a quibble (and it is a minor one) is the use of decimal values to express the conversion factors rather than expressing conversion as a ratio of whole numbers. Maybe I don’t have to carry on wading through HyTime :)

    @Patrick and Steve

    The charter for the OASIS TC looks very promising. Of course, it says nothing about Topic Maps, so perhaps the way forward for this blog is to think about what Lars Marius calls “attachment” (good term) of units to measurements expressed in a topic map. However, I’ve got far enough down the road of reading the right bits of HyTime that I might as well write up my own findings too…stay tuned :-)

    Plenty to mull over…