Recent discussion on the topicmapmail mailing list has focused on the creation and maintenance of Published Subject Indicators (PSIs). A PSI is a resource which describes a vocabulary (or part of a vocabulary) and provides URIs for terms in the vocabulary (these URIs are called Published Subject Identifiers, which confusingly share the same acronym, PSI). The discussion was provoked by a proposal to create a registry of PSIs - a task which I personally welcome.

It seems to me that there are at least two different issues here. One is about the creation of PSIs and the other is about raising the profile of a particular set of PSIs.

First of all, let's understand that there is nothing magic about a PSI in its technical aspects. It's just a URI that points to a resource that describes a vocabulary. The "magic" (if there is any) is in the processes that surround the maintenance of the PSI. The publisher that makes a PSI available is supposed to make a commitment to the stability of that PSI.

So what does stability mean? I think that it means two things:

1) Stability of presence - the PSI's URI is not going to go away within some meaningful time frame (although I hear discussions of stability over hundreds of years, my feeling is that in this business aiming for stability over a period of 5-10 years is a sufficiently Herculean task to gain the status of PSI)

2) Stability of meaning - the PSI's URI will always dereference to a description of a term that remains consistent throughout the lifetime of the PSI (not necessarily identical at all times - e.g. a PSI for a person might be continually updated to reflect their changing status: marriage, promotion, publications and so on)

Now, neither of these commitments requires a large investment in resources for those folks from the typical sem web community (it does leave out a large chunk of the world, but that is an issue that the IT industry as a whole must address). Nor does either of these commitments impose any constraints on users of the PSI. As a user of a PSI I am free to make my own value judgements about the stability of a PSI, and balance them against my judgement of its usefulness to me and the community that I am addressing with my applications. I may be uncomfortable using a PSI created by an individual whom I do not know, I may be uncomfortable using a PSI created by any individual, I may be unwilling to use a PSI created by a particular standards body or by a group I perceive as being unreliable (for whatever reason). The fact that ISO, OASIS, or the Spanish Knitting Association have put their imprimatur on a PSI is simply a factor in my judgement about the usefulness of this PSI to me.

There are good examples on the Web of vocabularies created by committee and by community. MARC is a committee-led vocabulary, as are HL7 and any number of XML vocabularies - each created by a formal group (perhaps a public and inclusive group, perhaps a private and closed group) following a formal process.

Community-led vocabularies grow more organically from a user base - for example the Friend Of A Friend vocabulary (FOAF) has grown both in terms of its use and indeed its size as users get interested in applying it. The same could be said for the many faces of RSS.

In general, it seems to me that successful community-led vocabularies are smaller in size and more tightly focussed in scope than committee-led vocabularies. In addition, with no organisational imprimatur to fall back on, community-led vocabularies survive or die on their uptake. That's not to say that the same dynamics do not also apply to committee-led vocabularies, but the organisation can provide some stability against the tide of user opinion.

So in terms of stability, a community-led vocabulary can be as stable as a committee-led vocabulary. When one considers the other factors in the choice of vocabulary, the lighter weight, the tighter focus and the ability to participate as a member of the user community may make a community-led vocabulary more attractive to some users.

Next we come to the issue of publicising a PSI. PSIs could be gathered together in a number of ways using existing web technology:

1) A centralised repository of PSIs - all subject descriptors are placed in a repository under a single common base URI. Some management process determines which PSIs are published and which are rejected.

2) A centralised registry of PSIs - PSI metadata is stored in a repository with a known address and a search interface which enables PSIs of interest to be located (either by human or machine users). A management process may be used to determine which PSIs are published, but it is not necessary in this case.

3) Informal publication - PSIs are announced on mailing lists and in weblogs or through other informal publication channels. Perhaps the author of a set of PSIs writes some articles on them, or publicises them through their use in a project with public visibility.

4) Search - PSI resources are flagged in some way (perhaps a specific META tag in the HTML representation of the resource) which enables an aware search engine to determine that a page is a resource containing Published Subject Indicators.
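
As a rough illustration of option (4), here is a minimal Python sketch of how an "aware" crawler might spot a flagged page. The META tag name `published-subject-indicator` and the helper `find_psi_markers` are hypothetical, invented purely for this example; no such convention has been agreed anywhere that I know of.

```python
from html.parser import HTMLParser

# Hypothetical marker name: an assumption for illustration only.
PSI_META_NAME = "published-subject-indicator"


class PsiMetaFinder(HTMLParser):
    """Collects the content of every META tag that carries the PSI marker name."""

    def __init__(self):
        super().__init__()
        self.psi_values = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr_map.get("name") or "").lower() == PSI_META_NAME:
            self.psi_values.append(attr_map.get("content") or "")


def find_psi_markers(html_text):
    """Return the content values of all PSI marker META tags in a page."""
    finder = PsiMetaFinder()
    finder.feed(html_text)
    return finder.psi_values
```

A crawler could call `find_psi_markers` on each fetched page and treat a non-empty result as a hint that the page describes published subjects. Whether the content value should hold the base URI of the PSI set, or something else entirely, is exactly the kind of detail a community would need to settle.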

There are probably some other ways too. It is true that some of these forms are more restrictive than others for the creators of PSIs - particularly (1), which involves a process which could be open to abuse or to the perception of abuse. But what about the users? Again I believe we come to the issue of choice. Some users will only be comfortable with PSIs from a centralised repository - some may even be required to use those PSIs because of their toolset. But without choice in the matter, the Semantic Web will be a poorer place. Imagine the Web if Yahoo were the only search engine (or if Google were the only search engine, if Yahoo is your preference...). Diversity causes difficulties for some - and this is an opportunity for an enabling organisation such as OASIS to define the management structures for a centralised repository, or for an enterprising vendor to create such a repository to fit their toolset. But with the SemWeb in its current nascent state, diversity is to be welcomed, and the opportunity for all to participate as both publishers and users of PSIs is vital to its success. That is why I welcome recent proposals to create an open-source registry of PSIs with minimal management processes, and look forward to participating in its development as a contributor and as a user.

