A Practical Introduction to Topic Maps

Note

This paper was written back in 2002. Since that time the XTM standard has been updated to version 2.0, which has some syntactic differences. However, all the key concepts described here still apply.

Introduction

Topic maps are an ISO standard for the representation and interchange of structured information models. The first version of the standard, ISO 13250:2000, was released just two years ago and has since been adopted for use on the Web by the TopicMaps.Org consortium. The XML Topic Maps (XTM) syntax created by the TopicMaps.Org consortium is now also included as a normative appendix to the Second Edition of the ISO standard (ISO 13250:2002).

The original motivation for the development of the topic map paradigm was to assist in the process of merging traditional back-of-the-book indexes. It is no surprise, therefore, that the paradigm easily accommodates the representation of finding-aids associated with books and other forms of paper documentation such as indexes, tables of contents, thesauri and so on. However, the generalised nature of the topic map paradigm has enabled its application to other problems and, in its current incarnations, we find topic maps being used to represent ontologies; to provide a framework for the development of web applications; to integrate diverse electronic information sources under a single portal; as well as for improving access to published information of all sorts.

In this article I will introduce the basic concepts of topic maps and show how these concepts can be represented in XTM syntax. Due to space constraints, this article will not cover the practical applications of topic maps, but the interested reader can find case studies published on the Techquila web-site.

Topic Map Basics

A topic map is a collection of structured mark-up external to any resources it may describe. It is common to think of the topic map as a layer 'floating' above resources, which may be retrievable electronic resources such as web pages or files, or other forms of information (perhaps information which is retrieved manually or retrieved as the result of performing a database query). There is a clean division between the topic map and the resources referenced from the topic map that is only crossed by the topic map occurrence construct (described later). This division between the topics and associations in 'topic space' and the resources documented by the topic map is shown in Figure 1.

/images/papers/tm1.png

Figure 1 - Topic space and resource space

Topics

A topic map consists of a collection of topics. A topic is a proxy for anything that the topic map author wishes to document in his or her topic map. As already indicated, the thing that a topic represents can be:

  • An electronic resource that can be retrieved and processed by a computer.
  • An electronic resource that is not accessible to the computer.
  • A "real-world thing" such as a person or an object which cannot be retrieved or processed by the computer (although the computer may have information about how an actor in the real world could retrieve the thing).
  • A concept with no physical manifestation such as an emotion or a technical concept. A business entity such as a company or a department falls into this category too.
  • A construct in a topic map - we will see later how this is useful for being able to add successive levels of detail to a topic map.

Figure 2, below, shows the distinction between a topic and the object or concept that the topic represents in the real world. An author has in his or her mind the subject to be documented and creates a topic. The topic is a machine representation of the subject which can now be stored, queried and manipulated by the computer.

/images/papers/tmworld.png

Figure 2 - Representing the real world with topic maps

Topic Characteristics

"Characteristics" is the collective term for the three kinds of property that a topic may have. Together they form the collection of assertions that the topic map author has made about the subject that the topic represents.

  • Names are labels for the topic that serve to identify it to the user of the topic map in some way. These labels can also be used by an application for sorting and display of topics. Names are string labels but each string label can have any number of alternate representations, including non-string forms such as graphics. For example a company may have a full name and a ticker symbol on the market on which it is listed. In addition, some names may be differentiated by their purpose such as sorting, iconic display and so on.

  • Occurrences are identifiable resources that are in some way related to the topic. Note that an occurrence resource needs only to be identifiable to a topic map processor. It does not need to be retrievable, although typically a non-retrievable resource is not of much use to a topic map application. For a topic representing a company, a retrievable occurrence might be its home page or a stock quote for the company. A non-retrievable occurrence might be the registration papers for the company.

    As well as resources external to the topic map, occurrences can also be used to specify additional information about the topic which is kept in the topic map itself.

    In Figure 1, occurrences are shown as the dashed lines connecting the topics to resources in the resource pool.

  • Roles played in associations form the final class of characteristic of a topic. Associations are used to relate two or more topics to each other. In Figure 1 associations are shown as solid lines connecting topics together. We will look at associations in a moment; all that is important to know right now is that in any given association a topic will play some identifiable "role". For example an individual might own some shares in a company. In this case there is an association between individual and company with the individual playing the role of "shareholder" and the company playing the role of "share-holding". Note that the roles would be the same in the case of an organisation such as a financial institution holding shares in the company. In XTM syntax, the mark-up that declares what roles a topic plays in an association is part of the mark-up of the association, not of the topic, but conceptually the roles that a topic plays are considered to be a characteristic of the topic.

Basic Topic Syntax

The following example shows the representation of a simple topic map consisting of a single topic. Note that the <topicMap> element declares the XTM 1.0 namespace as the default namespace and binds the xlink prefix to the XLink namespace. In later examples I will omit the XML declaration and the namespace declarations to save space. For the same reason I have not included the full text of the XTM DTD in this article, but it is available online on the TopicMaps.Org site at http://www.topicmaps.org/xtm/1.0/

<?xml version="1.0"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
    <topic id="xzyyz">
        <baseName>
            <baseNameString>Redmond Computers Inc.</baseNameString>
        </baseName>
        <occurrence>
            <resourceRef xlink:href="http://www.redmondcomputers.com/"/>
        </occurrence>
        <occurrence>
            <resourceData>1977</resourceData>
        </occurrence>
    </topic>
</topicMap>

Some points to note about the XTM syntax introduced in this example:

  • The <topic> element has an id attribute. I have deliberately used a value bearing no relation to the topic itself. The id attribute is simply a syntactical construct that will allow us to make references to the topic later; beyond this it has absolutely no meaning to a topic map processor. All elements defined by the XTM 1.0 DTD have an id attribute, enabling any part of the topic map document to be referenced if need be, but the id attribute value is only required for the <topic> element.
  • The name label is contained within a <baseNameString> element, which is in turn contained within a <baseName> element. As you might expect, this indicates that there are further properties of a base name that we have not yet discussed.
  • Similarly for the <occurrence> the reference to the occurrence resource itself is contained in a <resourceRef> element. The <resourceRef> element is declared in the XTM DTD as an XLink simple link. It is this use of XLink (and its use in all the other reference constructs in the XTM DTD) which requires the declaration of the XLink namespace. The second of the occurrences shows the use of the <resourceData> element to provide the occurrence data "in-line" with the topic map. This makes it somewhat easier to add meta-data to topics.

Types

If you look at the simple topic map shown in the sample code above, you will see that the topic does not really convey any meaning. There is nothing to indicate that the topic represents a company, nor is there anything to indicate how the occurrences are related to the topic. What we need is some way to indicate the type of 'thing' represented by the topic and the type of relationship between the topic and the resources identified or specified by the occurrences.

In topic maps, types are defined, like almost everything else, by topics. We create a topic for the topic type or for the type of relationship between the topic and the occurrence resource and then use the <instanceOf> element with a nested <topicRef> element to reference the type-specifying topic. So to do this with our simple sample, we need to create some new topics as shown in the sample topic map below.

<topicMap>

<topic id="company">
    <baseName><baseNameString>Company</baseNameString></baseName>
</topic>

<topic id="homepage">
    <baseName><baseNameString>Home Page</baseNameString></baseName>
</topic>

<topic id="year-established">
    <baseName>
        <baseNameString>Year Established</baseNameString>
    </baseName>
</topic>

<topic id="xzyyz">
    <instanceOf>
        <topicRef xlink:href="#company"/>
    </instanceOf>
    <baseName>
        <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
    <occurrence>
        <instanceOf>
            <topicRef xlink:href="#homepage"/>
        </instanceOf>
        <resourceRef xlink:href="http://www.redmondcomputers.com/"/>
    </occurrence>
    <occurrence>
        <instanceOf>
            <topicRef xlink:href="#year-established"/>
        </instanceOf>
        <resourceData>1977</resourceData>
    </occurrence>
</topic>

</topicMap>

You can also see the principle of typing conceptually in Figure 3, below, where the different topic types are rendered as different shapes for the topics. In the diagram, the topics defining the types are shown as hollow shapes; the instances of those types are shown as solid shapes and the class-instance relationship is shown as a dashed line. However there are no restrictions placed on a topic used to define a type. A topic that is used to define a type can still be used like any other topic in the topic map and it is perfectly legal for a topic to be used as a type and to have one or more types of its own. This feature makes it possible to document both the ontology used by the topic map and the instances of that ontology, all with the same basic topic map mechanics.

/images/papers/tm2.png

Figure 3 - A topic map with topic types

Each occurrence can only be of a single type; however, a topic may have multiple types, which are represented using multiple <instanceOf> elements. The <instanceOf> element asserts a class-instance relationship between the object containing the <instanceOf> element (be it a topic, occurrence or association) and the topic which is referenced from the <instanceOf> element.
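
As a minimal sketch of a topic with more than one type, the fragment below gives the company topic from the earlier sample a second type. The "computer-manufacturer" topic is a hypothetical addition made only for this illustration; as in the other examples, the XML declaration and the namespace declarations are omitted.

<topicMap>

<topic id="company">
    <baseName><baseNameString>Company</baseNameString></baseName>
</topic>

<!-- Hypothetical second type, introduced only for this sketch -->
<topic id="computer-manufacturer">
    <baseName>
        <baseNameString>Computer Manufacturer</baseNameString>
    </baseName>
</topic>

<topic id="xzyyz">
    <!-- One <instanceOf> element per type -->
    <instanceOf>
        <topicRef xlink:href="#company"/>
    </instanceOf>
    <instanceOf>
        <topicRef xlink:href="#computer-manufacturer"/>
    </instanceOf>
    <baseName>
        <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
</topic>

</topicMap>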

It is important to understand that an instance-of relationship is not the same as a superclass-subclass relationship. It is a common mistake to think that mark-up like the fragment shown below represents a class hierarchy: "'mycorp' is a 'company', a 'company' is an 'organisation', therefore 'mycorp' is an 'organisation'". Unfortunately, that statement is not what is represented in the fragment. In fact the fragment asserts that "'mycorp' is a 'company' and 'company' is an 'organisation'." Not only is that English statement grammatically incorrect, the topic map is also semantically incorrect! What we really need to do is to create a subclass-superclass relationship between organisation and company. Such an association would make the assertion that "any company is an organisation". We will see how to do this later.

<topic id="organisation"/>
<topic id="company">
    <instanceOf><topicRef xlink:href="#organisation"/></instanceOf>
</topic>
<topic id="mycorp">
    <instanceOf><topicRef xlink:href="#company"/></instanceOf>
</topic>

Name Variants

The <baseNameString> element contains only PCDATA, but you will recall that non-string resources such as graphics can be used to label topics. This facility is provided by the <variant> construct, which holds the alternate form of the name in a <variantName> element. A single base name may have any number of variant names associated with it. The structure of a <variantName> element is very similar to that of an <occurrence> - it may contain either an inline string resource or a reference to an external resource.

Unlike occurrences, variant names are not typed; however, they do have a mechanism for indicating the kind of variant that they are. A variant may have any number of parameters which indicate the nature of the variant. As with types, each parameter is defined by a topic, and the topic map paradigm allows an author to use any topic he or she likes as a parameter. The example below shows our company topic with a variant name which is the normalised sort string for the topic. This sort of variant name might be used by an application to order a list of companies (although the base name string itself would be the one used for display purposes).

<topicMap>
   <topic id="sort-string"/>
   <topic id="xyzzy">
      <baseName>
         <baseNameString>Redmond Computers Inc.</baseNameString>
         <variant>
            <parameters>
               <topicRef xlink:href="#sort-string"/>
            </parameters>
            <variantName>
               <resourceData>redmond computers incorporated</resourceData>
            </variantName>
         </variant>
      </baseName>
   </topic>
</topicMap>

Variants can also be nested within each other. A variant nested inside another variant inherits all of the parameters of its parent. So if we have, for example, icons of different sizes, we can use nesting to group them all together. This is shown in the example below - the two variants with <resourceRef> elements pointing to icons are identified with a single parameter indicating their size. They inherit the parameter indicating that they are icons from the containing <variant> element. It should be noted that the topics with id attribute values of "icon", "_16x16" and "_32x32" are the minimal mark-up possible for a topic. This is not a recommended approach to topic creation, but is used here to save space.

<topicMap>
   <topic id="icon"/>
   <topic id="_16x16"/>
   <topic id="_32x32"/>
   <topic id="xyzzy">
      <baseName>
         <baseNameString>Redmond Computers Inc.</baseNameString>
         <variant>
            <parameters>
               <topicRef xlink:href="#icon"/>
            </parameters>
            <variant>
               <parameters>
                  <topicRef xlink:href="#_16x16"/>
               </parameters>
               <variantName>
                  <resourceRef xlink:href="http://www.myserver.com/icons/16x16/redmond.png"/>
               </variantName>
            </variant>
            <variant>
               <parameters>
                  <topicRef xlink:href="#_32x32"/>
               </parameters>
               <variantName>
                  <resourceRef xlink:href="http://www.myserver.com/icons/32x32/redmond.png"/>
               </variantName>
            </variant>
         </variant>
      </baseName>
   </topic>
</topicMap>

Scope

A scope indicates the context within which a characteristic of a topic (a name, an occurrence or a role played in an association) may be considered to be true. One common use of scope is to provide localised names for topics. For example "company" in English could be "société" in French or "Firma" in German. They all describe the same concept and so should be represented by the same topic. Rather than creating separate topics one should instead create a single topic to represent the concept of a company and add to it three separate base names, using scope to indicate that a particular base name should be used for a specific language environment. The example below shows a simple use of scope.

<topicMap>

<!-- Some topics for the languages -->
<topic id="en"/>
<topic id="fr"/>
<topic id="de"/>

<topic id="company">
   <baseName>
      <scope>
         <topicRef xlink:href="#en"/>
      </scope>
      <baseNameString>company</baseNameString>
   </baseName>

   <baseName>
      <scope>
         <topicRef xlink:href="#fr"/>
      </scope>
      <baseNameString>société</baseNameString>
   </baseName>

   <baseName>
      <scope>
         <topicRef xlink:href="#de"/>
      </scope>
      <baseNameString>Firma</baseNameString>
   </baseName>
</topic>

</topicMap>

If a characteristic has no scope defined for it, then it is said to be in the unconstrained scope. Any characteristic defined in the unconstrained scope is always considered to be valid. Beyond this, the XTM specification leaves the precise effect of scope on processing up to individual applications. In practice, the scope of a characteristic is usually compared to a collection of topics representing the current context of the user or the application. The results of this comparison are then used to determine whether or not a given characteristic is valid. When comparing sets of topics in this way, standard mathematical set operations can be applied. For example a characteristic could be considered "valid" if the topics in its scope form a subset of the topics that define the user context. Alternatively, a characteristic could be regarded as "valid" if the intersection of the topics in the scope of the characteristic and the topics defining the user context is not the empty set, that is, if the scope and the context have at least one topic in common. These two possible approaches are shown in Figure 4, below.

/images/papers/tm3.png

Figure 4 - Two approaches to determining characteristic validity
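
To make the difference between the two approaches concrete, the sketch below scopes a base name with two topics. The "informal" topic, the name string "firm" and the user context described in the comment are assumptions made purely for illustration; XTM itself does not prescribe either test.

<topicMap>

<topic id="en"/>
<!-- Hypothetical topic representing an informal register -->
<topic id="informal"/>

<topic id="company">
   <baseName>
      <!-- The scope of this name is the set { en, informal } -->
      <scope>
         <topicRef xlink:href="#en"/>
         <topicRef xlink:href="#informal"/>
      </scope>
      <baseNameString>firm</baseNameString>
   </baseName>
</topic>

<!--
  Assume a user context consisting of the single topic "en".
  Subset test:  { en, informal } is not a subset of { en },
                so the name is not considered valid.
  Overlap test: the scope and the context share the topic "en",
                so the name is considered valid.
-->

</topicMap>

An application that wants names to appear only when every scoping topic applies would choose the subset test; one that prefers to show a name whenever any scoping topic applies would choose the overlap test.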

Identity

So far the topics that we have created have little in the way of machine-processable identity. For example, the topic with id "sort-string" is used to indicate that a variant name is a normalised sort string. But if we were to try the topic map in a topic map application, how would the application know that it is this particular topic that indicates that the nature of the variant name is suitable for sorting? What is needed is some commonly agreed identity for the topic that represents the concept of "suitable for sorting".

This problem is not limited to the "structural" topics, which might be used to express processing options to a topic map application, but applies also to all other topics we create. For example, if we have created a topic for the concept of a 'company', ideally we would like to apply some commonly agreed identity for the concept to that topic so that the topic map could be more easily interchanged.

In addition to application-specific use, a topic map processor also takes note of the identity assigned to topics. When the processor determines that two topics are 'about' the same thing, then those topics will be merged. How a processor determines that two topics are 'about' the same thing may be application-specific; however, XTM does define some basic principles, based on the different forms of identity described below.

The subject that a topic represents can be identified either by reference to the resource that the topic represents; or else by reference to a resource that in some way describes the subject in a way that is meaningful to a human being. These resources are known as subject-constituting and subject-indicating resources respectively. In addition to these formal indicators of identity, the topic map paradigm also includes a mapping of the names of topics to an identity. This name-to-identity mapping is defined by a rule called the topic naming constraint.

Subject-Constituting Resources

If we want to make some assertions about a Web page, or some other retrievable resource with a unique address, we can use the address of the resource as the identifier for the topic we create to represent it. In this case, the identifier is said to reference a subject-constituting resource. A topic may only reference a single subject-constituting resource - this makes sense because a topic can only ever be about a single thing. It is considered an error if an attempt is made to merge two topics with different subject-constituting resources. When two topics have the same subject-constituting resource, a topic map processor will regard them as being about the same thing and will merge them.

The mark-up for a subject-constituting resource is the <resourceRef> child element inside the <subjectIdentity> element. The resource pointed to by the XLink href attribute of the <resourceRef> element is the subject-constituting resource for the topic.

<topicMap>
  <topic id="redmond-home-page">
    <subjectIdentity>
      <resourceRef xlink:href="http://www.redmondcomputers.com/" />
    </subjectIdentity>
    <baseName>
      <baseNameString>
        The Redmond Computers Inc. Home Page
      </baseNameString>
    </baseName>
  </topic>
</topicMap>

Subject-Indicating Resources

In the example above, we use the address of the company's web-site in order to make some assertions about the site itself. If we want to make some assertions about a company itself, we could use the address of the company homepage as an identifier for the topic. In using the address of the company web-site in this way, we are assuming that a reader who reads the home-page of the web-site will understand that it is the company we are describing in our topic map. In this case, the identifier is said to reference a subject-indicating resource. A topic may have any number of subject-indicating resources because each of these resources describes the thing that the topic represents and is not the represented thing itself. For the same reason, a topic may have both a subject-constituting resource and one or more subject-indicating resources. When two topics have one or more subject-indicating resources in common, a topic map processor will consider them to be about the same subject and will merge them. In addition, if one of the subject-indicating resources of a topic is the address of another topic, the topic map processor will consider those topics to be about the same subject and will merge them.

At first, this last constraint may seem a little strange and it is worth describing in a little more detail. Consider two topics, A and B. Topic A represents the concept of a company. Topic B has the address of topic A as a subject-indicating resource. This means that topic B is about the subject described by topic A. This situation is shown in Figure 5, below. As we already know that topic A represents the concept of a company, it must therefore be obvious that the one subject that A describes best is that concept. Therefore B must also represent the concept of a company because A is the descriptor for that subject. This means that A and B are 'about' the same subject and should be merged.

/images/papers/tm5.png

Figure 5 - One way in which two topics can represent the same subject

The example below shows the syntax for treating a URI as a subject-indicating resource. The <subjectIndicatorRef> element is a child of the <subjectIdentity> element and uses an XLink simple link to point to the resource that describes the subject of the topic.

<topicMap>
  <topic id="xyzzy">
    <subjectIdentity>
       <subjectIndicatorRef
           xlink:href="http://www.redmondcomputers.com/"
       />
    </subjectIdentity>
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
  </topic>
</topicMap>
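
The case of topics A and B described earlier in this section can be sketched in the same syntax. The ids "topic-a" and "topic-b" and the name strings are placeholders chosen for this illustration.

<topicMap>
  <!-- Topic A: represents the concept of a company -->
  <topic id="topic-a">
    <baseName>
      <baseNameString>Company</baseNameString>
    </baseName>
  </topic>

  <!-- Topic B: uses the address of topic A as a subject-indicating
       resource, so a processor will treat A and B as being about
       the same subject and merge them -->
  <topic id="topic-b">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="#topic-a"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>Firma</baseNameString>
    </baseName>
  </topic>
</topicMap>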

The Topic Naming Constraint

The topic naming constraint states that, in the words of the XTM specification, "any topics having the same base name in the same scope implicitly refer to the same subject". This rule essentially makes the label assigned to a topic into a form of identity for the topic. It is important when creating the labels for topics that the author be aware of this rule. When creating a new base name, an author should be sure to qualify the name either within the label string itself, or else to scope it appropriately.

The example below shows how this rule can lead to some unexpected results. The topics with id "rci-sales" and "abc-sales" are intended to represent the sales departments of Redmond Computers Inc. and ABC Software respectively, but because each has the name "Sales" in the unconstrained scope, a topic map processor will assume that those topics refer to the same subject and will merge them. Obviously, in this case such a merge would be incorrect. In order to prevent the merge from happening, the author of this topic map must apply more qualified names to these topics.

<topicMap>
    <topic id="xyzzy">
        <baseName>
            <baseNameString>Redmond Computers Inc.</baseNameString>
        </baseName>
    </topic>

    <!-- Departments in Redmond Computers Inc. -->

    <topic id="rci-sales">
        <baseName>
            <baseNameString>Sales</baseNameString>
        </baseName>
    </topic>
        ...

    <topic id="abc">
        <baseName>
            <baseNameString>ABC Software</baseNameString>
        </baseName>
    </topic>

    <!-- Departments in ABC Software -->

    <topic id="abc-sales">
        <baseName>
            <baseNameString>Sales</baseNameString>
        </baseName>
    </topic>
      ...
</topicMap>

There are two different ways in which more qualified names can be created. One approach is to keep the same name string and add a differentiating scope; the other way is to modify the name string to include some differentiating information. In the example below, we use the company itself as a differentiator. The assumption made here is that any given company will have only one sales department (for a multinational company, of course, both company and geographic region might be required for complete differentiation).

To differentiate using scope, we simply add a <scope> element to the <baseName>, containing a <topicRef> pointing to the topic which represents the appropriate company. To differentiate using a modified name string, we include the company name in the department name string. If it is intended that the topics representing the departments should be accessible outside the context of the company, a combination of these two approaches is most appropriate, as the more qualified name string will be useful for display in cases where the two departments might occur in the same set of search results.

<topicMap>
    <topic id="xyzzy">
        <baseName>
            <baseNameString>Redmond Computers Inc.</baseNameString>
        </baseName>
    </topic>

    <!-- Departments in Redmond Computers Inc. -->

    <topic id="rci-sales">
        <baseName>
            <scope><topicRef xlink:href="#xyzzy"/></scope>
            <baseNameString>Sales</baseNameString>
        </baseName>
        <baseName>
            <baseNameString>Redmond Computers Inc., Sales</baseNameString>
        </baseName>
    </topic>
    ...

    <topic id="abc">
        <baseName>
            <baseNameString>ABC Software</baseNameString>
        </baseName>
    </topic>

    <!-- Departments in ABC Software -->

    <topic id="abc-sales">
        <baseName>
            <scope><topicRef xlink:href="#abc"/></scope>
            <baseNameString>Sales</baseNameString>
        </baseName>
        <baseName>
            <baseNameString>ABC Software, Sales</baseNameString>
        </baseName>
    </topic>
    ...
</topicMap>

Topic Merging

When two topics are merged, the result is a single topic representing the aggregate of the information of the two merged topics. In practice this means that the types of the new topic are the union of the types of the two source topics; likewise the names and occurrences of the new topic are the union of the names and occurrences of the two source topics; and finally wherever either of the two topics plays a role in an association, or provides the type for another topic map construct, they will be replaced by the new topic.

Although simple, these merging rules give a great deal of power and flexibility to topic maps, enabling the development of modular systems of topic maps each providing a different view of the same basic concepts; or the development of topic maps by automated processes which can then be further developed manually without the need to edit the automatically generated topic map directly.
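
As an illustration of these rules, consider the two topics below, which might come from two separately authored topic maps. Both reference the same subject-indicating resource, so a processor would merge them. The topic ids, the "supplier" type and the short name "RCI" are hypothetical values used only for this sketch.

<topic id="company"/>
<topic id="supplier"/>

<topic id="rci-1">
    <instanceOf><topicRef xlink:href="#company"/></instanceOf>
    <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://www.redmondcomputers.com/"/>
    </subjectIdentity>
    <baseName>
        <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
</topic>

<topic id="rci-2">
    <instanceOf><topicRef xlink:href="#supplier"/></instanceOf>
    <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://www.redmondcomputers.com/"/>
    </subjectIdentity>
    <baseName>
        <baseNameString>RCI</baseNameString>
    </baseName>
    <occurrence>
        <resourceData>1977</resourceData>
    </occurrence>
</topic>

Merging is a conceptual operation carried out inside the processor, so the following is only one way in which the resulting topic might be written back out as XTM. The id "rci-merged" is an assumption; a real processor is free to choose its own.

<topic id="rci-merged">
    <!-- Union of the types of the two source topics -->
    <instanceOf><topicRef xlink:href="#company"/></instanceOf>
    <instanceOf><topicRef xlink:href="#supplier"/></instanceOf>
    <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://www.redmondcomputers.com/"/>
    </subjectIdentity>
    <!-- Union of the names and occurrences of the two source topics -->
    <baseName>
        <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
    <baseName>
        <baseNameString>RCI</baseNameString>
    </baseName>
    <occurrence>
        <resourceData>1977</resourceData>
    </occurrence>
</topic>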

Associations

Associations represent the relationship between two or more topics. An association consists of two parts. Firstly there is the association itself: this defines the nature of the relationship between all of the associated topics. As with occurrences, associations can be typed by a single type-specifying topic. It is this type that defines the nature of the relationship indicated by the association. Secondly, the association consists of a number of players, each of which is a topic and which plays a role in the association that is in turn described by another topic.

Let us build up an example association showing a relationship between Redmond Computers Inc. and an employee, John Smith. We can start by creating a simple association between the topic representing the company and the topic representing John Smith. This is shown in the following example:

<topicMap>

  <topic id="rci">
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
  </topic>

  <topic id="john-smith">
    <baseName>
      <baseNameString>John Smith</baseNameString>
    </baseName>
  </topic>

  <association>
    <member>
      <topicRef xlink:href="#rci"/>
    </member>
    <member>
      <topicRef xlink:href="#john-smith"/>
    </member>
  </association>
</topicMap>

As with our first topic example, this sample barely conveys any information at all. It states that there is some relationship between something called "Redmond Computers Inc." and something called "John Smith" but does not say anything about the nature of the relationship nor about what roles each partner plays in the relationship. Once again, this information is conveyed by topics. It is the type of the association that defines the nature of the relationship.

The association type is defined using an <instanceOf> element. Each member of the relationship can be given a specific role using a <roleSpec> element. This is shown in the example below:

<topicMap>

    <topic id="employs">
        <baseName>
            <baseNameString>Employs</baseNameString>
        </baseName>
    </topic>

    <topic id="employer">
        <baseName>
            <baseNameString>Employer</baseNameString>
        </baseName>
    </topic>

    <topic id="employee">
        <baseName>
            <baseNameString>Employee</baseNameString>
        </baseName>
    </topic>

    <topic id="rci">
        <baseName>
            <baseNameString>Redmond Computers Inc.</baseNameString>
        </baseName>
    </topic>

    <topic id="john-smith">
        <baseName>
            <baseNameString>John Smith</baseNameString>
        </baseName>
    </topic>

    <association>
        <instanceOf>
            <topicRef xlink:href="#employs"/>
        </instanceOf>
        <member>
            <roleSpec>
                <topicRef xlink:href="#employer"/>
            </roleSpec>
            <topicRef xlink:href="#rci"/>
        </member>
        <member>
            <roleSpec>
                <topicRef xlink:href="#employee"/>
            </roleSpec>
            <topicRef xlink:href="#john-smith"/>
            <!-- additional topicRefs to other employees can go in here -->
        </member>
    </association>

</topicMap>

Now we can see far more information. The type of the relationship is labelled as "Employs", and the contributions made by the topics of "Redmond Computers Inc." and "John Smith" are characterised as "employer" and "employee" respectively. However, the naming of these topics can be misleading: it is easy to assume from this association that the relationship is a one-way relationship from "Redmond Computers Inc." to "John Smith", whereas in fact the association simply groups together the topics which play roles in it, without implying any ordering or direction between them.

This aggregation property of the association is part of what gives the topic map paradigm its extraordinary power. From the topic of "Redmond Computers Inc." it might be possible to list all of the associations of the type "Employs" in which it plays the role of "Employer", and so get a company directory listing. Equally by following associations of type "Employs" from "John Smith" in which that topic plays the role of "Employee", we might get a complete employment history for this person.

XTM syntax also allows a number of employees to be listed in the same association construct. I can either add another <member> element with its own child <roleSpec> and <topicRef> elements, or simply add another <topicRef> to the existing <member> element for the role of "employee". The latter option is a syntactic short cut for specifying multiple players of the same role in the same association.
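As a minimal sketch of these two options, the fragment below assumes a second employee topic with the id "jane-doe". This topic is hypothetical and is introduced only for illustration. The namespace declarations are written out so that the fragment parses on its own. The two associations are alternatives that express the same information, so a real topic map would use one form or the other, not both.

```xml
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <topic id="employs"/>
  <topic id="employer"/>
  <topic id="employee"/>
  <topic id="rci"/>
  <topic id="john-smith"/>

  <!-- Hypothetical second employee, not part of the original samples -->
  <topic id="jane-doe">
    <baseName>
      <baseNameString>Jane Doe</baseNameString>
    </baseName>
  </topic>

  <!-- Alternative 1: the short cut, one <member> with two players -->
  <association>
    <instanceOf><topicRef xlink:href="#employs"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#employer"/></roleSpec>
      <topicRef xlink:href="#rci"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#employee"/></roleSpec>
      <topicRef xlink:href="#john-smith"/>
      <topicRef xlink:href="#jane-doe"/>
    </member>
  </association>

  <!-- Alternative 2: a separate <member> for each employee -->
  <association>
    <instanceOf><topicRef xlink:href="#employs"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#employer"/></roleSpec>
      <topicRef xlink:href="#rci"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#employee"/></roleSpec>
      <topicRef xlink:href="#john-smith"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#employee"/></roleSpec>
      <topicRef xlink:href="#jane-doe"/>
    </member>
  </association>

</topicMap>
```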

Labelling Associations

One common issue when creating topic maps is how to label associations. Unlike topics, associations do not have any mark-up for specifying a label for each instance. Instead, many topic map applications will use the label of the topic that defines the type of the association. Very often the labels for associations in a topic map will be verbs, for example "Redmond Computers Inc. employs John Smith", but these verbs imply a direction to the association. Many topic map practitioners give the topic that types the association a label for each role that the association supports and then scope those labels by the topic that defines the role (the topic referred to from the <roleSpec> element). The logical conclusion of this approach with a simple binary association such as the one in our sample is to assign three separate names to the topic which defines the association type. In the unconstrained scope, the association type should be named with a noun such as "Employment". The use of a noun frees the name from the context of one or other of the roles in the association. The two other names should be verbs using the roles as the context for the name and the role types to define the scope of each name.

This is shown in the sample below. In this sample, the label "Employment" is created in the unconstrained scope to be treated as the default name for the topic; the label "Employs" is to be used in the context of the employer and so is scoped by the topic "Employer"; similarly the label "Employed By" is scoped by the topic "Employee". An application may then use the role played by the topic currently in focus in the application as part of the user context when determining which is the best name to be applied, so in the context of "John Smith" playing the role of "Employee", the application would select the label "Employed By".

<topicMap>

  <topic id="employs">
    <baseName>
      <baseNameString>Employment</baseNameString>
    </baseName>
    <baseName>
      <scope>
        <topicRef xlink:href="#employer"/>
      </scope>
      <baseNameString>Employs</baseNameString>
    </baseName>
    <baseName>
      <scope>
        <topicRef xlink:href="#employee"/>
      </scope>
      <baseNameString>Employed By</baseNameString>
    </baseName>
  </topic>

  <topic id="employer">
    <baseName>
      <baseNameString>Employer</baseNameString>
    </baseName>
  </topic>

  <topic id="employee">
    <baseName>
      <baseNameString>Employee</baseNameString>
    </baseName>
  </topic>

  <topic id="rci">
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
  </topic>

  <topic id="john-smith">
    <baseName>
      <baseNameString>John Smith</baseNameString>
    </baseName>
  </topic>

  <association>
    <instanceOf>
      <topicRef xlink:href="#employs"/>
    </instanceOf>
    <member>
      <roleSpec>
        <topicRef xlink:href="#employer"/>
      </roleSpec>
      <topicRef xlink:href="#rci"/>
    </member>
    <member>
      <roleSpec>
        <topicRef xlink:href="#employee"/>
      </roleSpec>
      <topicRef xlink:href="#john-smith"/>
    </member>
  </association>

</topicMap>

Of course, it is sometimes useful or necessary to attach other information to a specific association. We will look at how to do that later.

Scope and Associations

Just as we use scope to express the context within which a name or occurrence of a topic is valid, so we can also use scope to express the context within which an association is valid.

The <scope> mark-up itself is exactly the same as that used for <baseName> and <occurrence> elements, and the mark-up appears as an optional child of the <association> element.

As an example, let us suppose that a rumour of merger talks between two companies is to be represented in the topic map. One way to do this would be to create a distinct association type, but we would then need to create a distinct type for every rumoured association. An alternative method would be to scope the association by a topic that indicates that the context for the association is that of "rumour". This is shown below:

<topicMap>

  <topic id="rumour"/>
  <topic id="merger"/>
  <topic id="merge-partner"/>

  <topic id="companyA"/>
  <topic id="companyB"/>

  <association>
    <instanceOf><topicRef xlink:href="#merger"/></instanceOf>
    <scope><topicRef xlink:href="#rumour"/></scope>
    <member>
      <roleSpec><topicRef xlink:href="#merge-partner"/></roleSpec>
      <topicRef xlink:href="#companyA"/>
      <topicRef xlink:href="#companyB"/>
    </member>
  </association>

</topicMap>

Reification

Of all the constructs in a topic map, only the topic is allowed to have names and occurrences and to play roles in associations. In other words, one can only make assertions about a subject which is represented by a topic. Those assertions themselves are not topics and so we cannot make assertions about assertions. Reification is the process by which a topic may be constructed to represent the assertion made by some other construct in the topic map. This process enables a name to be given to a particular occurrence of a topic, or documentation of an association to be "attached" to the association itself.

The mechanics of reification are quite simple. To create a topic that reifies another construct in the topic map, give that construct an id attribute and then create a topic whose subject identity contains a subject indicator reference (a <subjectIndicatorRef> element) pointing to the construct in question. For example, consider a partnership between two companies. Such a relationship may be publicly announced in a press release, or it may be reported by analysts in the trade press. By reifying the association, the information resources that gave rise to the creation of the association can be documented, allowing users of the topic map to get more information about the partnership. The mark-up for this reification is shown in the example below and a conceptual overview of the reification is shown in Figure 6.

<topicMap>

  <topic id="xyzzy">
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
  </topic>

  <topic id="abc">
    <baseName>
      <baseNameString>ABC Software</baseNameString>
    </baseName>
  </topic>

  <topic id="partnership"/>
  <topic id="partner"/>
  <topic id="press-release"/>

  <association id="rci-abc-partners">
    <instanceOf><topicRef xlink:href="#partnership"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#partner"/></roleSpec>
      <topicRef xlink:href="#xyzzy"/>
      <topicRef xlink:href="#abc"/>
    </member>
  </association>

  <!-- This topic "reifies" the partnership association -->
  <!-- This enables us to attach the press release
       announcing the partnership to the association itself -->
  <topic id="foo">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="#rci-abc-partners"/>
    </subjectIdentity>
    <occurrence>
      <instanceOf>
        <topicRef xlink:href="#press-release"/>
      </instanceOf>
      <resourceRef xlink:href="http://www.redmondcomputers.com/pressrel/01042002_001.html"/>
    </occurrence>
  </topic>
</topicMap>

Figure 6 - Reification
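The same mechanism can be used to give a name to an occurrence, as mentioned at the start of this section. The following is only a sketch: the ids "rci-pr-001" and "rci-pr-001-reified" and the name "Partnership announcement" are assumptions made for illustration, and the namespace declarations are written out so that the fragment parses on its own. The occurrence carries an id attribute so that the reifying topic can point to it.

```xml
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <topic id="press-release"/>

  <topic id="rci">
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
    <!-- The id makes this occurrence addressable (hypothetical id) -->
    <occurrence id="rci-pr-001">
      <instanceOf>
        <topicRef xlink:href="#press-release"/>
      </instanceOf>
      <resourceRef xlink:href="http://www.redmondcomputers.com/pressrel/01042002_001.html"/>
    </occurrence>
  </topic>

  <!-- This topic reifies the occurrence above, so the name
       below acts as a label for that occurrence -->
  <topic id="rci-pr-001-reified">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="#rci-pr-001"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>Partnership announcement</baseNameString>
    </baseName>
  </topic>

</topicMap>
```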

Topic Map Merging

Merging is a cornerstone of the topic map paradigm. The merge process enables distributed and modular creation of topic map "knowledge-bases". There are three ways in which two topic maps can be merged.

  • Under explicit application control. A topic map application may provide the user with the ability to selectively merge topic maps. There is no restriction in the XTM specification to prevent an application from merging topic maps at runtime as needed or as directed by the application user.
  • Processing a <mergeMap> element. The <mergeMap> element allows an author to explicitly request the merge of another topic map with the map that he or she creates. The mark-up for the <mergeMap> element also allows the author to define a set of topics to be added to the scope of every characteristic in the external map. This additional, externally defined, scope can be useful for preventing unwanted topic merges from occurring, or else to indicate the source of a characteristic in the final merged map.
  • Processing a <topicRef> element. A <topicRef> element is not limited to referring only to topics contained within the same topic map document. A reference can be made to a topic in another topic map document. If a reference is made to a topic in an external topic map, then a topic map processor is required to retrieve the entire topic map containing that topic and to merge it with the topic map containing the reference.
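
The two mark-up driven forms of merging might look something like the sketch below. The example.org URLs and the topic id "from-partner-map" are placeholders invented for illustration, and the namespace declarations are written out so that the fragment parses on its own. The <topicRef> inside <mergeMap> names a topic to be added to the scope of every characteristic that comes from the external map. The last <topicRef> in the association points into another topic map document.

```xml
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- Hypothetical topic used to mark where merged characteristics came from -->
  <topic id="from-partner-map"/>

  <!-- Explicit merge request; the URL is a placeholder -->
  <mergeMap xlink:href="http://www.example.org/maps/partners.xtm">
    <topicRef xlink:href="#from-partner-map"/>
  </mergeMap>

  <topic id="partnership"/>
  <topic id="partner"/>
  <topic id="rci"/>

  <association>
    <instanceOf><topicRef xlink:href="#partnership"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#partner"/></roleSpec>
      <topicRef xlink:href="#rci"/>
      <!-- Reference to a topic in an external topic map document;
           the URL is a placeholder -->
      <topicRef xlink:href="http://www.example.org/maps/companies.xtm#abc"/>
    </member>
  </association>

</topicMap>
```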

Summary

This article has described the basic principles of topic maps and introduced the XTM 1.0 interchange syntax for topic map information. We have seen how topic maps are constructed from the basic elements of topics and associations and how more advanced features such as scope, identity and reification can be applied to make detailed, context-sensitive information available from the topic map.

Resources

There is already a wide range of resources related to topic maps available on the Web, including specifications, papers and tools. Here are a few highlights:

Standards

Toolkits

Open source tool-kits are available in a couple of different flavours.