Developing a Topic Map Programming Model

Abstract

This paper presents an argument for the development of a standard programming interface for accessing and manipulating topic map information. The paper also presents a set of criteria for the evaluation of such an interface and goes on to describe four different proposed architectures which are then evaluated against the criteria.

This paper was first presented at Knowledge Technologies 2001, Austin, Texas.

Justification for Developing a Topic Map Programming Model

The topic map programming model embodies two major pieces of work: the development of a topic map data model, and the development of an object-oriented API for mainstream object-oriented languages such as Java, C++ and Python. In our opinion, it is the latter which is the important end-product of this process, and it is the API which we focus on in this paper. Our justification for focusing on the API is that it should be possible for any data model which supports a complete implementation of the topic map concepts to be mapped to a standard API which provides access to topic map constructs.

Standard APIs are in general a Good Thing. The benefits of a standard API are reduced developer learning curve, application portability and the enabling of widespread library development.

Reduced Developer Learning Curve

With a standard API for topic maps, a developer can take her knowledge from one implementation of a topic map system to another without having to spend significant amounts of time learning a new API. Additionally, by standardising on a single API, training material and developer support communities need not be restricted to the vendors. This would make it possible for a much larger body of training material to be made available to the topic map "newbie". An extant example of this is the number of DOM API training courses which are available (a quick Google search lists 6,260 hits for the search string "DOM API Training").

Application Portability

For organisations investing in the development of topic map applications, the existence of a standard API provides a degree of protection for that investment. At present, moving a topic map application between systems would require a complete rewrite of the customised code. So despite having invested in a portable format for information organisation and exchange, customers are still locked in to a particular vendor's implementation by the APIs used to develop their bespoke applications or application extensions.

Enable Library Development

A standard API for topic map access and manipulation would also allow developers to create higher-level application development libraries and tools which are portable across all systems implementing that API. This could enable the development of high-level toolkits such as standard indexing and querying APIs, toolkits for topic map generation from metadata sources, and topic map visualisation and navigation applications. Again, XML's SAX and DOM show the way, with applications ranging from transformation (XSLT) to content management (e.g. Cocoon) all being created on top of these standard access and manipulation APIs.

Standard APIs also make it easier to integrate the technology into widespread consumer applications such as web browsers and operating systems.

Relationship to TMQL

Topic Map Query Language (TMQL) is a proposed work item for both ISO and XTM. TMQL will provide a standardised language for topic map query and update, similar in scope to SQL for relational database systems. The purposes of TMQL and of a standard topic map API overlap, in that both attempt to define a standard means of data access and manipulation. At least one proposal for a topic map query language defines operations on topic maps which return topic maps as their result sets. Such result sets would still need a data model, and APIs for accessing that data, in order to be useful to a client application. In this way a standard topic map API would be a natural adjunct to TMQL, providing a JDBC-like API for manipulating the results of a TMQL query.

The TMQL effort will also produce a rigorous formal model of the topic map abstract data type. This model will be based on the topic maps Conceptual Model presented in the XTM 1.0 Specification and on a set of common operations which the topic map user community finds useful. At the time of writing this is ongoing work for the TMQL group. Although some proposals have been made, no initial version was available on which to base a proposal for a programming model. It will nevertheless be important that any community effort to develop a standard programming model pays close attention to this formal model as it becomes available.

Requirements of a Topic Map Programming Model

The requirements for the programming model arise from the need to bring a potentially very wide audience of developers up to speed with topic map technology as quickly as possible. For this reason, our requirements focus on two areas: Simplicity and Practicality.

Simplicity

The topic map programming model should be simple to explain and easy to learn. This means that the number of classes and operations should ideally be kept low, and that the API should focus only on covering the topic map model itself. Complex extensions such as query languages and inference rules are definitely out of scope. It was more difficult to draw the line in the grey area of other, more common, extensions such as transitive association types and type hierarchies. However, we believe that these common extensions can easily be implemented on top of a core API which is focused only on the representation of the topic map model. They therefore have no place in the API we propose to develop.

To further simplify the programming model, it was felt to be important that the constructs in the programming model should be closely parallel to those constructs found in the syntax. By making parallel structures in the programming model to those which exist in the topic map syntax, we enable programmers who have looked through the XTM specification to quickly get a 'feel' for the API.

Practicality

The API could easily support common topic map operations such as import and export of a serialised form of the topic map, association traversal, and direct manipulation of topic map constructs. Providing additional practicality such as merging, indexing and filtering operations within this core API would sacrifice the simplicity principle. With a solid foundation in place, the common extensions should be easy to implement in a layered manner. The precedent for this can be clearly seen in the XML family of standards. Linking (XLink), path expressions (XPath) and more complex operations such as transformation (XSLT) are all typically implemented upon the DOM abstraction of the XML document.

The programming model must support all those topic map operations which are the most fundamental parts of topic map applications. This means that the programming model must provide the means to achieve at least the following:

  • Topic map parsing - APIs for reading from a serialized syntax into some internal data structure.
  • Topic map manipulation - APIs for manipulating the constructs of topic maps (topics, associations, occurrences etc.)
  • Topic map serialization - the reverse of parsing, these are APIs for creating a serialized version of the internal data structures used to represent the topic map.

The parsing and serialization of topic maps can be considered "utility" functions which are useful additions to a core topic map API. Serialization is merely the process of walking a topic map data structure using the API and generating the appropriate XML syntax to represent the constructs found there. Parsing is the reverse, although the parsing process must also include the processing of the raw parsed data into a topic map in accordance with the requirements of the XTM Specification.
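Serialization is described here as a walk over the topic map data structure, so a small example may help. The following minimal sketch is written in Python, one of the languages named above, and uses the standard library's xml.etree.ElementTree. The list-of-dicts input and the serialize() function are hypothetical stand-ins for calls on a real topic map API. Only topics, <instanceOf> references and base names are covered.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def q(ns, tag):
    """Build an ElementTree qualified name such as '{uri}topic'."""
    return "{%s}%s" % (ns, tag)


def serialize(topics):
    """Walk a minimal in-memory topic map and emit XTM 1.0 markup.

    `topics` is a list of dicts with keys "id", "types" (IDs of type topics)
    and "names"; this hypothetical input stands in for a real topic map API.
    """
    # Note: register_namespace changes module-global prefix mappings.
    ET.register_namespace("", XTM_NS)
    ET.register_namespace("xlink", XLINK_NS)
    root = ET.Element(q(XTM_NS, "topicMap"))
    for t in topics:
        topic = ET.SubElement(root, q(XTM_NS, "topic"), id=t["id"])
        # XTM places <instanceOf> before base names and occurrences.
        for type_id in t["types"]:
            inst = ET.SubElement(topic, q(XTM_NS, "instanceOf"))
            ET.SubElement(inst, q(XTM_NS, "topicRef"),
                          {q(XLINK_NS, "href"): "#" + type_id})
        for name in t["names"]:
            base = ET.SubElement(topic, q(XTM_NS, "baseName"))
            ET.SubElement(base, q(XTM_NS, "baseNameString")).text = name
    return ET.tostring(root, encoding="unicode")


print(serialize([
    {"id": "composer", "types": [], "names": ["Composer"]},
    {"id": "puccini", "types": ["composer"], "names": ["Giacomo Puccini"]},
]))
```

Parsing would run the same walk in reverse, with the additional XTM processing steps noted above.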

Such a programming model should also provide a solid foundation on which ancillary standards and systems may be implemented. The precedent for this is clearly set by the XML family of standards, where systems which implement XSLT and XPath are typically built upon a DOM implementation. The ancillary standards for topic maps are not yet fully defined, but work has already started on a query language (TMQL) and a schema/constraint language (TMCL).

Additional Constraints

To set some boundaries on what should be included in the core topic map programming model, it was decided to constrain the level of functionality that the programming model would be expected to provide. For this reason, the models developed here do not directly address:

  • Representation of the mergeMap construct and the topic map merging process
  • Representation of templating mechanisms.
  • Indexing of topic map constructs.
  • Support for transitive association types or type hierarchies.
  • Maintenance of lexical information about the source XTM document

In many respects, these excluded functions are common requirements for a topic map programming model, regardless of its form. This paper concentrates on the potential differences between various forms of programming model rather than on the commonalities between them. However, the constraints do serve as a useful guide in determining the practicality of a given model, as these are all higher-level functions which it must be possible to build on the core programming model.

The development and analysis work presented here also does not delve into the extremely important API issues relating to object deletion, referential integrity, transaction support, duplicate suppression and the handling of merges which take place as a result of manipulating API objects directly. Again, these are all issues which must be addressed regardless of the API used.

Finally, the APIs developed here all take the same approach to the question of multi-valued properties. Every property which may have zero or more items as its value gives rise to three API methods:
- a get() method which returns an array of the value type;
- an add() method which takes a single parameter of the value type and has no return value;
- a remove() method which takes a single parameter of the value type and has no return value.

A practical API may also include a set() method and/or a clear() method to further manipulate the property. Additionally, the get() method could be implemented to return a collection object or an iterator object. If it returns a collection object, a design decision must be made as to whether that collection is read-only or read-write.
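To illustrate this convention, the following Python sketch implements one multi-valued property, a topic's subject indicators, with get/add/remove methods. The Topic class and the method names are hypothetical. The get method returns an immutable tuple, which corresponds to the read-only collection choice. The add method silently ignores duplicates; this is an assumption of the sketch, not a requirement stated in the text.

```python
class Topic:
    """Hypothetical topic class showing the get/add/remove convention for
    a multi-valued property (here, subject indicator locators)."""

    def __init__(self):
        # Order carries no meaning in a topic map; a list is used only to
        # keep results deterministic.
        self._subject_indicators = []

    def get_subject_indicators(self):
        # An immutable snapshot: the "read-only collection" design choice.
        return tuple(self._subject_indicators)

    def add_subject_indicator(self, locator):
        # No return value; duplicates are suppressed (an assumption here).
        if locator not in self._subject_indicators:
            self._subject_indicators.append(locator)

    def remove_subject_indicator(self, locator):
        # No return value; removing an absent value is a silent no-op.
        if locator in self._subject_indicators:
            self._subject_indicators.remove(locator)


t = Topic()
t.add_subject_indicator("http://psi.example.org/puccini")
t.add_subject_indicator("http://psi.example.org/puccini")
t.add_subject_indicator("http://psi.example.org/composer")
t.remove_subject_indicator("http://psi.example.org/composer")
print(t.get_subject_indicators())  # ('http://psi.example.org/puccini',)
```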

Architecture Proposals

In this section we present each of the architectures considered as candidates for a topic map programming model.

API-1: DOM Extension Architecture

The XML Document Object Model (DOM) provides a simple abstraction of an XML document as a tree (or collection of trees) consisting of nodes which represent the XML document markup and content. It is popular with developers because of its simplicity - especially for a developer already familiar with the concepts of XML markup - and because of its functionality - for example, being able to locate the DOM node which has a specific ID attribute value, or locating the set of nodes representing elements with a specific tag name.

API-1 is a topic map programming API developed as an extension of the DOM, similar to the HTML extension which is part of the DOM specification. The DOM Node class provides basic node hierarchy operations, such as insertion and deletion of nodes, management of a node's child list, and support for both depth-first and breadth-first traversal. The DOM Element class is derived from the DOM Node class. The topic map extension provides a set of additional classes, all derived from the DOM Element class.

Most of these classes provide no functionality beyond a labelling function, which returns a distinct value for the nodeType property. There are two exceptions:
- TopicMap, AddressableSubject and NonAddressableSubject return the URI of the base address of the topic map, the addressable subject or the subject indicator respectively.
- TopicReference returns the locator used in the reference and the type of reference. This type takes distinct values for references made directly to the topic and references made via a subject indicator reference.

The UML diagram in Figure 1 shows the class structure of this architecture. The DOM classes of Node, Document and Element are shown in this diagram along with some of their public functions. These give a feel for the range of operations such an implementation would make available to the programmer.

Developing The API

The principal design issue in developing this architecture is the handling of topic references within the constraints of a tree-based architecture. The data model which we are attempting to represent is not a tree, but a graph of interconnected topics. A tree cannot be used to represent a graph (in the general case) without a construct for cross-linking between tree nodes which are not in a direct parent-child relationship. We provide this construct in the form of a TopicReference Node which is a surrogate for a Topic Node. The TopicReference Node must provide a function to resolve the reference to a TopicNode (which will be a direct child of the TopicMap Node).
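The run-time resolution of a TopicReference can be sketched with Python's standard xml.dom.minidom module, which stands in here for a DOM extension. Plain <topicRef> elements play the role of TopicReference nodes. The helper resolve_topic_ref() is hypothetical, and it handles only same-document references of the form "#id".

```python
from xml.dom import minidom

XLINK = "http://www.w3.org/1999/xlink"

XTM = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="puccini">
    <instanceOf><topicRef xlink:href="#composer"/></instanceOf>
  </topic>
  <topic id="composer"/>
</topicMap>"""


def resolve_topic_ref(topic_ref):
    """Resolve a topicRef element to the topic element it points at.

    The DOM has no native node-to-node reference, so the reference is
    resolved at run time by comparing attribute values.
    """
    href = topic_ref.getAttributeNS(XLINK, "href")
    if not href.startswith("#"):
        raise ValueError("external reference not supported: " + href)
    target_id = href[1:]
    topic_map = topic_ref.ownerDocument.documentElement
    for topic in topic_map.getElementsByTagName("topic"):
        # Only direct children of the TopicMap node are Topic nodes.
        if topic.parentNode is topic_map and topic.getAttribute("id") == target_id:
            return topic
    return None  # dangling reference


doc = minidom.parseString(XTM)
ref = doc.getElementsByTagName("topicRef")[0]
print(resolve_topic_ref(ref).getAttribute("id"))  # composer
```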

Another decision, common to development of all the architectures is the representation of syntactic short-cut constructs such as the <instanceOf> element (which is a short-cut for creating a type-instance association between two topics) and names (which are privileged forms of occurrence). For this representation, we choose to match the form of the DOM extensions such as the HTML DOM and directly represent the syntax. This means that type/association equivalencies and other syntactic short-cuts are directly reflected in the model.

The third issue regards the representation of subjects which are not directly represented by topics in the topic map. Such subjects may be referenced from <subjectIndicatorRef> elements in parent elements such as <subjectIdentity>, <instanceOf> and <member>. For this model, we may either simply represent the parsed XTM syntax or else 'normalize' the syntax in some way. One proposed method of normalization is to reify all subjects referenced in the topic map.

Under this method, a <subjectIndicatorRef> whose parent is any element other than <subjectIdentity> is imported as follows:
- A new Topic Node is created as a child of the TopicMap node. Its only child is a SubjectIdentity node.
- That SubjectIdentity node contains a single NonAddressableSubject node. This node has an href Attribute with the same value as the href attribute of the <subjectIndicatorRef> element.
- The <subjectIndicatorRef> element itself is represented by a TopicReference node which points to the newly created Topic Node.

The advantage of this normalization mechanism is that it somewhat simplifies the handling of references to non-addressable subjects, as the application need only ever deal with a reference to a Topic Node.
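The normalization step can be sketched as a single pass over a minidom document. The function normalize() and the "reified-N" ID scheme are hypothetical. The sketch creates one new topic per distinct indicator URI. It does not check for existing topics that already carry the same subject indicator, nor for ID collisions.

```python
from xml.dom import minidom

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"

SOURCE = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="tosca">
    <instanceOf><subjectIndicatorRef xlink:href="http://psi.example.org/opera"/></instanceOf>
  </topic>
  <topic id="boheme">
    <instanceOf><subjectIndicatorRef xlink:href="http://psi.example.org/opera"/></instanceOf>
  </topic>
</topicMap>"""


def normalize(doc):
    """Reify every subjectIndicatorRef whose parent is not subjectIdentity."""
    topic_map = doc.documentElement
    # minidom returns a static list, so elements created below are not revisited.
    refs = [r for r in doc.getElementsByTagName("subjectIndicatorRef")
            if r.parentNode.tagName != "subjectIdentity"]
    reifiers = {}  # indicator URI -> ID of the topic created for it
    for ref in refs:
        href = ref.getAttributeNS(XLINK, "href")
        if href not in reifiers:
            new_id = "reified-%d" % (len(reifiers) + 1)  # hypothetical ID scheme
            topic = doc.createElementNS(XTM_NS, "topic")
            topic.setAttribute("id", new_id)
            identity = doc.createElementNS(XTM_NS, "subjectIdentity")
            indicator = doc.createElementNS(XTM_NS, "subjectIndicatorRef")
            indicator.setAttributeNS(XLINK, "xlink:href", href)
            identity.appendChild(indicator)
            topic.appendChild(identity)
            topic_map.appendChild(topic)
            reifiers[href] = new_id
        # Replace the original reference with a topicRef to the new topic.
        topic_ref = doc.createElementNS(XTM_NS, "topicRef")
        topic_ref.setAttributeNS(XLINK, "xlink:href", "#" + reifiers[href])
        ref.parentNode.replaceChild(topic_ref, ref)
    return doc


doc = normalize(minidom.parseString(SOURCE))
print([t.getAttribute("id") for t in doc.getElementsByTagName("topic")])
```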

The following table and UML diagram illustrate the form of the proposed DOM extension programming model. The table shows the proposed node types for the DOM extension. For each type it gives the expected containment hierarchy (the expected parent of an instance of the node type) and a mapping to the XTM element that the node type represents [1]. The UML diagram shows that the classes representing topic map constructs contain no methods or attributes. This is because all properties of these topic map constructs can be accessed using the DOM Level 1 operations defined by their base classes. In practice, however, the methods of the base classes would almost certainly be supplemented by convenience functions if the API were developed further.

| Node type (DOM base type) | Expected parent | Represents (XTM element) |
|---|---|---|
| TopicMap (Element) | None | topicMap |
| Type (Element) | Topic, Occurrence, Association, Member | instanceOf and roleSpec |
| Topic (Element) | TopicMap | topic |
| SubjectIdentity (Element) | Topic | subjectIdentity |
| AddressableSubject | SubjectIdentity | subjectIdentity/resourceRef |
| NonAddressableSubject | SubjectIdentity | subjectIdentity/subjectIndicatorRef |
| BaseName (Element) | Topic | baseName |
| Occurrence (Element) | Topic | occurrence |
| Scope (Element) | Association, BaseName, Occurrence | scope |
| Name (Element) | BaseName, VariantName | baseNameString, variantName/resourceData |
| Variant (Element) | BaseName, Variant | variant |
| Parameters (Element) | Variant | parameters |
| Reference (Element) | Occurrence, Variant | resourceRef |
| Association (Element) | TopicMap | association |
| Member (Element) | Association | member |
| TopicReference (Element) | Member, Scope, Parameters | topicRef, subjectIndicatorRef |

Figure 1 - UML for API-1

API Analysis

API-1 offers complete coverage of the XTM syntax, with a class for each of the elements defined in the XTM DTD. As the API is node-based, most of the classes are provided for tagging requirements only. While it would be possible to remove many of the classes shown in Figure 1, these classes do provide the essential hook for extensibility and the development of more complete APIs on a consistent base. Including the DOM classes of Document, Node and Element required to represent a topic map, the API consists of 19 classes and at least 13 class methods (more methods are defined for the DOM classes than are shown in the diagram, but these 13 are the minimum needed to traverse and manipulate the topic map).

Figure 2 shows a simple topic map represented in the programming model of API-1. The associations shown in red between TopicRef objects and Topic objects are generated as a result of evaluating each TopicRef to the Topic it references. The topic map represented by this data structure consists of a single association (assoc) between two topics (topic1 and topic2). Both topics have a single base name in the unconstrained scope, and one of them has an occurrence. The association and its roles are typed by published subjects, which are indicated by the reifying topics (rt1, rt2 and at1). This API requires a total of 27 objects to represent the topic map. The large number of programming constructs is due to the closeness of API-1 to the somewhat verbose XTM syntax: every <topicRef> and <instanceOf> element in the XTM syntax of the topic map must have a matching construct in the programming model.

Figure 2 - API-1 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

A more serious criticism of this model is that the ordered-tree model of the DOM is not a suitable model for representing a topic map. The DOM regards node order as important, which is not the case for a topic map [2]. DOM NodeLists are ordered lists, not sets, and so do not directly support operations such as duplicate suppression, which are required for a complete implementation of the topic map model.

Most seriously of all, the DOM provides no explicit support for making references between nodes. Either the extension API must define such support, which could be regarded as breaking the DOM model to support topic maps, or else references can only be supported by manipulating DOM Attribute Node values. The API presented here does not create explicit references between nodes. Instead, it relies on the run-time resolution of attribute values to resolve TopicRef nodes to their referenced Topic node. This form of syntax-based reference is awkward for the developer to create. Maintaining the integrity of such references would also be more difficult to implement than in a system which uses direct object-to-object references.
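The following small Python comparison makes the integrity problem concrete. It assumes minidom as the DOM implementation and uses a hypothetical find_topic() helper. Renaming the target topic's id leaves the attribute-based reference dangling. A direct object reference in a plain, equally hypothetical Topic class is unaffected by the same rename.

```python
from xml.dom import minidom

XLINK = "http://www.w3.org/1999/xlink"

doc = minidom.parseString(
    '<topicMap xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<topic id="composer"/>'
    '<topic id="puccini"><instanceOf><topicRef xlink:href="#composer"/>'
    '</instanceOf></topic></topicMap>')


def find_topic(doc, href):
    """Syntax-based resolution: scan topic id attributes for '#id'."""
    for topic in doc.getElementsByTagName("topic"):
        if "#" + topic.getAttribute("id") == href:
            return topic
    return None


ref = doc.getElementsByTagName("topicRef")[0]
doc.getElementsByTagName("topic")[0].setAttribute("id", "musician")  # rename target
print(find_topic(doc, ref.getAttributeNS(XLINK, "href")))  # None: reference now dangles


class Topic:
    """Hypothetical object model using direct object-to-object references."""

    def __init__(self, ident):
        self.ident = ident
        self.types = []


composer_obj = Topic("composer")
puccini_obj = Topic("puccini")
puccini_obj.types.append(composer_obj)
composer_obj.ident = "musician"        # the same rename
print(puccini_obj.types[0].ident)      # musician: the reference still holds
```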

API-2: Graph-based Architecture

The graph-based topic map API architecture is developed from the 4th December 2000 draft of the XTM Processing (XTMP) model. This document is no longer available on the Web, but the model is substantially similar to the Topic Map Processing Model proposed by Newcomb and Biezunski. The XTMP model document and the Topic Map Processing Model both define the processing of a topic map document into a graph structure. This graph consists of just three types of node and four types of connecting arc between nodes. The nodes represent the basic elements of the abstract data model of topic maps: topics, associations and scopes. The connecting arcs are:

  • Scope Participant Arcs which connect a scope node to a topic node which defines one of the subjects in that scope.
  • Association Scope Arcs which connect an association node to the scope node which defines the context within which the association is considered to be valid.
  • Association Member Arcs which connect an association node to a topic node which plays a role in the association. These arcs are optionally labelled by a further topic node which characterises the role being played in the association.
  • Association Template Arcs which connect an association node to a topic node which defines a template for the association, that constrains the roles and the role players which may be used in the association.
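The node and arc types can be made concrete with a minimal Python sketch of such a graph, which checks that each arc joins the node kinds listed above. The Graph class, the string node IDs and the arc-kind names are assumptions made for illustration. They are not taken from the XTMP draft.

```python
from dataclasses import dataclass, field

# Node kinds, and for each arc kind the (source, target) node kinds it joins.
NODE_KINDS = {"topic", "association", "scope"}
ARC_KINDS = {
    "scope_participant": ("scope", "topic"),
    "association_scope": ("association", "scope"),
    "association_member": ("association", "topic"),
    "association_template": ("association", "topic"),
}


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> node kind
    arcs: list = field(default_factory=list)   # (kind, source, target, label)

    def add_node(self, node_id, kind):
        if kind not in NODE_KINDS:
            raise ValueError("unknown node kind: %s" % kind)
        self.nodes[node_id] = kind

    def add_arc(self, kind, source, target, label=None):
        src_kind, tgt_kind = ARC_KINDS[kind]
        if self.nodes.get(source) != src_kind or self.nodes.get(target) != tgt_kind:
            raise ValueError("%s arc must join a %s node to a %s node"
                             % (kind, src_kind, tgt_kind))
        # Only association member arcs may carry a label, and it is a topic.
        if label is not None and (kind != "association_member"
                                  or self.nodes.get(label) != "topic"):
            raise ValueError("only member arcs take a topic-node label")
        self.arcs.append((kind, source, target, label))


g = Graph()
for t in ("puccini", "tosca", "composer", "work", "composed-by"):
    g.add_node(t, "topic")
g.add_node("a1", "association")
g.add_node("s1", "scope")
g.add_arc("association_member", "a1", "puccini", label="composer")
g.add_arc("association_member", "a1", "tosca", label="work")
g.add_arc("association_template", "a1", "composed-by")
g.add_arc("association_scope", "a1", "s1")
```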

The processing model also defines the concept of the Subject Identity Point, which is a binding point where all topics with the same subject are merged. The concept of merging is central to topic maps, and subject and subject identity are pivotal to this concept. A subject may be represented in two distinct ways:
- by reference to the addressable object which is the subject (the subject constituting resource); or
- by reference to an addressable object which describes the subject (the subject indicator resource).

A subject identity point is shared by all subject indicator resources which describe the same subject, together with the subject constituting resource which is the subject, if such a resource exists.
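Grouping topics into Subject Identity Points can be sketched with a union-find structure. This technique is chosen here for illustration; the processing model does not prescribe it. The input format and the function name are hypothetical. Subject constituting resources and subject indicator resources are kept in separate key spaces, since a resource which is a subject is not the same as one which describes a subject.

```python
def subject_identity_points(topics):
    """Group topic IDs that share a subject identity.

    `topics` maps a topic ID to (constituting, indicators): the subject
    constituting resource (or None) and a set of subject indicator
    resources. Sharing either kind of resource binds two topics to the same
    Subject Identity Point, and the relation is closed transitively.
    """
    parent = {t: t for t in topics}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    owner = {}  # (kind, locator) -> first topic seen with it
    for t, (constituting, indicators) in topics.items():
        keys = [("constituting", constituting)] if constituting else []
        keys += [("indicator", loc) for loc in indicators]
        for key in keys:
            if key in owner:
                parent[find(t)] = find(owner[key])
            else:
                owner[key] = t

    groups = {}
    for t in topics:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())


topics = {
    "t1": (None, {"http://psi.example.org/puccini"}),
    "t2": (None, {"http://psi.example.org/puccini", "http://psi.example.org/giacomo"}),
    "t3": (None, {"http://psi.example.org/giacomo"}),
    "t4": ("http://example.org/tosca.html", set()),
}
print(subject_identity_points(topics))  # [['t1', 't2', 't3'], ['t4']]
```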

Developing the API

In developing the XTMP model into an API, we have elected to create classes representing each of the node types, with class associations representing the arcs. However, this approach makes it impossible to capture the 'label' property of an Association Member Arc. The Association Member Arc therefore has to be promoted to a first-class object and given a property to represent this label.

The second decision to be made regards the representation of the concept of the Subject Identity Point. A Subject Identity Point consists of zero or one subject constituting resource and zero or more subject indicator resources. The XTM Processing Model defines rules which require that a consistent topic map has only one topic node for each Subject Identity Point. This one-to-one relationship means that the properties of a Subject Identity Point (the subject constituting and subject indicating resources) may be expressed as properties of the TNode class. Doing this does not prevent the API from representing topic maps which are not consistent. Any such topic map would simply include more than one TNode with the same value for either subjectIndicatingResource or subjectConstitutingResource.

The third issue regards the representation of the class-instance relationship. The XTMP model uses a templating mechanism to define the core association types required to express class-instance relationships, topic-occurrence relationships and other fundamental relationships of the topic map data model. For this reason, we need to break with our previously imposed constraint against including templating mechanisms in the programming model. We therefore include a template property for an ANode, which references the TNode that defines the association template.

Finally an object is required to represent the entire graph with all of its nodes and arcs. This is provided by the TopicMap class which simply serves as a container of topic nodes, scope nodes and association nodes.
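A minimal Python sketch of the resulting API-2 classes follows. The class names TNode, ANode, SNode and TopicMap follow the text, and Member stands in for the promoted Association Member Arc. The snake_case property names and the inconsistencies() helper are assumptions. Bidirectional traversal is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TNode:
    # Subject Identity Point properties folded into the topic node.
    subject_constituting_resource: Optional[str] = None
    subject_indicating_resources: List[str] = field(default_factory=list)


@dataclass(eq=False)
class SNode:
    participants: List[TNode] = field(default_factory=list)


@dataclass(eq=False)
class Member:
    """Association Member Arc promoted to an object so it can carry a role label."""
    player: TNode
    role: Optional[TNode] = None


@dataclass(eq=False)
class ANode:
    members: List[Member] = field(default_factory=list)
    scope: Optional[SNode] = None
    template: Optional[TNode] = None  # TNode defining the association template


@dataclass(eq=False)
class TopicMap:
    topics: List[TNode] = field(default_factory=list)
    associations: List[ANode] = field(default_factory=list)
    scopes: List[SNode] = field(default_factory=list)

    def inconsistencies(self):
        """Return (kind, locator) pairs claimed by more than one TNode."""
        owners, clashes = {}, set()
        for t in self.topics:
            keys = [("indicator", loc) for loc in t.subject_indicating_resources]
            if t.subject_constituting_resource is not None:
                keys.append(("constituting", t.subject_constituting_resource))
            for key in keys:
                if key in owners and owners[key] is not t:
                    clashes.add(key)
                owners.setdefault(key, t)
        return clashes


tm = TopicMap()
a = TNode(subject_indicating_resources=["http://psi.example.org/puccini"])
b = TNode(subject_indicating_resources=["http://psi.example.org/puccini"])
tm.topics += [a, b]
print(tm.inconsistencies())  # {('indicator', 'http://psi.example.org/puccini')}
```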

The UML diagram of this API is shown in Figure 3. It should be noted that this proposal is very liberal in allowing almost all references between classes to be traversed bidirectionally. It is arguable that bidirectional traversal of properties should be left out of the core programming model, with these reverse look-ups delegated to an indexing subsystem built on top of the core model. However, the bidirectional nature of these relationships is part of the essence of the topic map, especially when that topic map is viewed as a graph.

Figure 3 - UML Diagram of API-2

Model Analysis

The programming model developed for API-2 is extremely minimal. It certainly has the desired property of being small in size, just 6 classes (although a total of 40 methods are required to provide complete access to all of the properties of the topic map), but this simplicity is achieved at a cost to practicality as shown by the collaboration diagram in Figure 4. The diagram shows a similar simple topic map to that shown in Figure 2 for API-1, with the exception that to maintain some clarity in the diagram, the occurrence of one of the topics is not shown. Without this occurrence, 31 API objects are required to express the topic map (this total includes the TopicMap object which is not shown in the diagram). With the addition of the occurrence, an extra 6 objects would be required to express the topic-occurrence association template and an extra 5 to represent the topic-occurrence association, bringing the total number of objects required to 42. In practical use, the API complicates the job of the programmer who must be familiar with the XTM Processing Model as well as the XTM Syntax specification to be able to create and manipulate topic maps.

On the positive side, API-2 treats all syntactic constructs as TNodes, with the exception of the <scope>, <member> and <subjectIdentity> constructs. Reification of topic map constructs other than <topic>s is therefore easily implemented. API-2 also includes full support for the templating mechanism described by the XTM Processing Model, a feature which is not an integral part of any of the other APIs developed here.

Figure 4 - API-2 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

API-3: XTM Conceptual Model Based Architecture

The XTM 1.0 Specification includes an annex which describes the conceptual model implemented by the specification. As a record of 'what was in the minds' of the group which produced the XTM Specification, this document provides important input into the process of developing a programming model. To examine whether this model is sufficient for a programming model, we present here a programming model developed upon the XTM Conceptual Model.

The programming model presented as a pair of UML diagrams below is derived directly from the UML diagrams presented in the XTM 1.0 Specification annex. The first diagram shows the top-level class hierarchy of the API. The class Subject provides the explicit representation of the reification function of topics. Its relationship with the class Class is used to represent the various type-instance relationships which exist in the XTM model and are represented syntactically by the <instanceOf> and <roleSpec> elements.

Figure 5 - UML Class Diagram of API-3 - Upper Hierarchy

Figure 6 - UML Class Diagram of API-3 - Main Classes

Model Analysis

The Conceptual Model clearly defines the relationship between topics and real world objects by using the Subject, NonAddressableSubject and Resource classes. However, the additional constructs required to do so add three extra classes, which complicates the API for developers and causes API-3 to diverge from the XTM syntax. In fact, despite consisting of some 13 classes and 40 methods, API-3 does not completely cover the syntax, because the <variantName> syntactic construct is not represented.

To represent the <variantName> construct, a <variantName> must be considered as an occurrence of a topic with a fixed role type. Its scope is defined as the union of two sets of subjects: those referenced from the <parameter> elements of its ancestor <variant> elements, and those referenced from the <scope> element of its ancestor <baseName> element. From a conceptual perspective this is clean, as the two forms (a <variantName> and an <occurrence> of a specific type) are equivalent and it is redundant to include both forms in the model. From a programmer's perspective, however, this approach has two weaknesses. The programmer must iterate or search through all of the occurrences of a topic to locate and manipulate its variant names. The model also cannot express the nested hierarchy of variant names which the XTM syntax provides.
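The equivalence between a <variantName> and a typed occurrence can be sketched as a flattening step which computes each variant name's scope in the way described above. The classes and the variants_as_occurrences() function are hypothetical, and scopes are represented as sets of strings. The output also shows the weakness just noted: once the variants are flattened, their nesting is lost.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Variant:
    parameters: FrozenSet[str]
    name: Optional[str] = None      # content of the variantName, if present
    children: List["Variant"] = field(default_factory=list)


@dataclass
class BaseName:
    string: str
    scope: FrozenSet[str] = frozenset()
    variants: List[Variant] = field(default_factory=list)


def variants_as_occurrences(base_name):
    """Flatten nested variants into (scope, name) pairs.

    Each pair stands in for an occurrence of a fixed variant-name type. Its
    scope is the base name's scope united with the parameters of the variant
    and of all its ancestor variants.
    """
    result = []

    def walk(variant, inherited):
        scope = inherited | variant.parameters
        if variant.name is not None:
            result.append((scope, variant.name))
        for child in variant.children:
            walk(child, scope)

    for variant in base_name.variants:
        walk(variant, base_name.scope)
    return result


bn = BaseName("Giacomo Puccini", frozenset({"en"}), [
    Variant(frozenset({"sort"}), "puccini, giacomo",
            [Variant(frozenset({"display"}), "PUCCINI, GIACOMO")]),
])
for scope, name in variants_as_occurrences(bn):
    print(sorted(scope), name)
```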

The way in which API-3 expresses the class-instance relationship also diverges from the XTM syntax. API-3 allows any Subject instance to be in a class-instance relationship with zero or more Class instances (each of which is a NonAddressableResource). This is an accurate reflection of the underlying model of topic maps. However, the XTM syntax defines class-instance relationships by reifying the Subject and the Classes as Topics and then defining a class-instance relationship between the reifying topics. A programming model should support this syntactic mechanism more directly for two reasons. It would enable simpler import and export of XTM syntax data to and from the programming model. It would also improve the mapping between the syntax and the programming model for developers already familiar with the XTM syntax and the mechanism of reification.

That said, Figure 7 shows how much simpler the API-3 approach makes the collection of objects required to express a sub-type/super-type relationship. The core concepts of sub-type, super-type and the sub-type/super-type association are represented as three Class objects, without the need to create topics to reify the subjects. Because Class is derived from Subject, each of these objects may have zero or more SubjectIndicators. As a result, only 16 objects are required to express the topic map, including the TopicMap object, which is not shown in the diagram for clarity.
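The following Python sketch shows the direct class-instance link of API-3 and what export to XTM syntax then requires. It uses hypothetical Subject and Class classes and an instance_of_xtm() helper. Because no reifying topic exists, each class must be referred to through one of its subject indicators. The sketch assumes that every class has at least one subject indicator.

```python
from xml.sax.saxutils import quoteattr


class Subject:
    """Hypothetical API-3-style subject, identified by subject indicators."""

    def __init__(self, *subject_indicators):
        self.subject_indicators = list(subject_indicators)
        self.classes = []  # zero or more Class instances


class Class(Subject):
    """A class is itself a subject, so it can carry subject indicators."""


def instance_of_xtm(subject):
    """Serialise the class-instance links of `subject` as XTM <instanceOf>
    elements, referring to each class by its first subject indicator."""
    parts = []
    for cls in subject.classes:
        href = cls.subject_indicators[0]  # assumes at least one indicator
        parts.append('<instanceOf><subjectIndicatorRef xlink:href=%s/>'
                     '</instanceOf>' % quoteattr(href))
    return "".join(parts)


composer = Class("http://psi.example.org/composer")
puccini = Subject("http://psi.example.org/puccini")
puccini.classes.append(composer)
print(instance_of_xtm(puccini))
```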

Figure 7 - API-3 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

API-4: Modified Conceptual Model-Based Architecture

API-3 shows great promise as a base from which to develop a practical topic map programming model. To refine the model, we need to reduce the complexity of the representation of the relationship between a topic and a subject and we need to add the necessary classes to enable the syntactic constructs of <variant> and <variantName> to be represented more explicitly in the programming model.

Developing The API

To simplify the upper hierarchy of API-3, we choose to instead implement the Topic/Locator relationship from API-2. We remove Class, Subject, Resource and NonAddressableResource, and add the Locator class, making it the type for the properties of subject and subjectIndicator (changed from subjectConstitutingResource and subjectIndicatingResource, to more closely match the current state of the XTM Specification).

Having removed Subject and Class from the class hierarchy of the API, we must replace the now-removed class-instance association for at least Topic, Association and TopicCharacteristic. Any class-instance relationship in a topic map may be represented as an association between a topic reifying the typed node and the topic or topics which reify the class of the node. However, the most common syntactic form of representation is the use of the <instanceOf> child element for the typed node, which is equivalent to defining such an association. The mechanism for representing type-instance associations in the programming model is therefore directly related to the mechanism used for representing the reification of topic map constructs such as associations, members and occurrences.

The means of representing reification of topic map constructs within a programming model may be broadly divided into "implicit reification" and "explicit reification" of those constructs. Implicit reification requires that the programming model's class hierarchy acknowledges that all topic map constructs may be reified, and so may exhibit any of the properties of a Topic. Typically, this would be done by making the Topic class a super-class of all other classes representing topic map constructs. Implicit reification is already a part of API-2. Explicit reification requires that the programmer control reification by creating a Topic object which regards another topic map construct as its subject indicator. Typically, an explicit reification programming model provides no direct support for reification beyond a means to uniquely and persistently identify any topic map construct. API-1 and API-3 are examples of explicit reification programming models.
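Explicit reification can be sketched in Python as follows. All names are hypothetical. Every construct receives an identifier from which a locator is built; here this is a process-local counter, so it is not actually persistent. The reify() function creates a Topic whose subject indicator is that locator.

```python
import itertools

BASE = "http://example.org/map.xtm"  # hypothetical base locator of the map
_next_id = itertools.count(1)


class Construct:
    """Any topic map construct; it carries a unique identifier from which a
    locator is built (persistence is out of scope for this sketch)."""

    def __init__(self):
        self.id = "id%d" % next(_next_id)

    @property
    def locator(self):
        return "%s#%s" % (BASE, self.id)


class Topic(Construct):
    def __init__(self):
        super().__init__()
        self.subject_indicators = []
        self.base_names = []


class Association(Construct):
    pass


def reify(topics, construct):
    """Create a Topic whose subject indicator is the construct's locator."""
    reifier = Topic()
    reifier.subject_indicators.append(construct.locator)
    topics.append(reifier)
    return reifier


def reifier_of(topics, construct):
    """Find the Topic, if any, that reifies `construct`."""
    for t in topics:
        if construct.locator in t.subject_indicators:
            return t
    return None


topics = []
assoc = Association()
print(reifier_of(topics, assoc))        # None: not yet reified
r = reify(topics, assoc)
r.base_names.append("composed by")      # names go on the reifying topic
print(reifier_of(topics, assoc) is r)   # True
```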


Figure 8 - Modified API-3 Class hierarchy with Implicit Reification


Figure 9 - Modified API-3 Class hierarchy with Explicit Reification

The implicit reification programming model (shown in Figure 8) has the advantage that a programmer is not required to perform any operations to establish the reification of a topic map construct: if she wants to give an Occurrence object a BaseName, this may be achieved simply through the inherited functions of the Topic class. The explicit reification model (shown in Figure 9) has the advantage of being more closely related to the XTM syntax. In this case, however, maintaining a close relationship with the syntax exposes the developer to one of the weaknesses of the serialized syntactic form, so a divergence from the syntax may be justified: it delivers far greater functionality, and there remains a very clear relationship between the syntactic form of reification and its representation in the programming model. To support implicit reification of all topic map constructs, we choose to make Topic a super-type of Association and Member. The class TopicCharacteristic has no methods or properties which are common to its subclasses, so it is removed from the class hierarchy.
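A minimal sketch of the implicit reification choice, under the same caveat: the Python classes below are hypothetical stand-ins for the API-4 classes, reduced to a couple of properties each. Because Association and Member derive from Topic, an association can be named directly through the inherited add_base_name() method, with no separate reifying topic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Topic:
    """Root of the hierarchy: any subclass can carry topic characteristics directly."""
    base_names: List[str] = field(default_factory=list)
    classes: List["Topic"] = field(default_factory=list)

    def add_base_name(self, name: str) -> None:
        self.base_names.append(name)


@dataclass(eq=False)
class Member(Topic):
    role_spec: Optional[Topic] = None
    players: List[Topic] = field(default_factory=list)


@dataclass(eq=False)
class Association(Topic):
    members: List[Member] = field(default_factory=list)


acme = Topic(base_names=["Acme"])
employment = Association(members=[Member(players=[acme])])
# Inherited from Topic: no reifying topic has to be created or looked up.
employment.add_base_name("Employment at Acme")
print(employment.base_names)  # ['Employment at Acme']
```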

In order to support the class-instance association in its commonly-used syntactic short-cut form (using <instanceOf> to represent a class-instance association in the unconstrained scope), we define the classes property for a Topic as a collection of Topic objects. The classes property represents only the class-instance associations in the unconstrained scope. To get all types in a particular scope, we must define a helper operation getTypes(Scope s) which returns all of the Topic objects which play the role of class in a class-instance association in which the current topic plays the instance role, where the association member characteristics of each role are in the scope s. It is not necessary to provide an equivalent setTypes() function as this operation may be implemented by creating an Association object which links the Topic object and its type. As we have already solved reification by deriving all other topic map constructs from Topic, the solution for Topic applies equally to all other topic map constructs represented in the API.
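The following sketch illustrates how the classes property and the getTypes(Scope s) helper could be derived from class-instance associations, and why no setTypes() is needed. It is a hypothetical Python rendering: get_types() and add_type(), the CLASS_INSTANCE, CLASS_ROLE and INSTANCE_ROLE topics (standing in for the XTM class-instance, class and instance PSIs), and the use of an empty frozenset for the unconstrained scope are all assumptions. It interprets "in the scope s" as "the association's scope is exactly s", and for brevity the lookups live on a TopicMap object holding the associations rather than on Topic itself.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: an empty scope stands for the unconstrained scope.
UNCONSTRAINED: frozenset = frozenset()


@dataclass(eq=False)
class Topic:
    name: str = ""


@dataclass(eq=False)
class Member:
    role_spec: Topic
    player: Topic


@dataclass(eq=False)
class Association:
    assoc_type: Topic
    members: List[Member]
    scope: frozenset = UNCONSTRAINED


# Hypothetical stand-ins for the XTM class-instance, class and instance PSIs.
CLASS_INSTANCE, CLASS_ROLE, INSTANCE_ROLE = Topic("class-instance"), Topic("class"), Topic("instance")


class TopicMap:
    def __init__(self) -> None:
        self.associations: List[Association] = []

    def add_type(self, instance: Topic, cls: Topic, scope: Iterable[Topic] = ()) -> Association:
        """Typing a topic is just creating a class-instance association."""
        assoc = Association(CLASS_INSTANCE,
                            [Member(CLASS_ROLE, cls), Member(INSTANCE_ROLE, instance)],
                            frozenset(scope))
        self.associations.append(assoc)
        return assoc

    def get_types(self, instance: Topic, scope: Iterable[Topic]) -> List[Topic]:
        """Topics playing 'class' opposite `instance` in class-instance
        associations whose scope is exactly `scope`."""
        wanted = frozenset(scope)
        types: List[Topic] = []
        for assoc in self.associations:
            if assoc.assoc_type is not CLASS_INSTANCE or assoc.scope != wanted:
                continue
            if any(m.role_spec is INSTANCE_ROLE and m.player is instance for m in assoc.members):
                types.extend(m.player for m in assoc.members if m.role_spec is CLASS_ROLE)
        return types

    def classes(self, instance: Topic) -> List[Topic]:
        """The 'classes' property: unconstrained scope only."""
        return self.get_types(instance, UNCONSTRAINED)


tm = TopicMap()
puccini, composer, person, italian = (Topic(n) for n in ("Puccini", "Composer", "Person", "Italian"))
tm.add_type(puccini, composer)
tm.add_type(puccini, person, scope=[italian])
print([t.name for t in tm.classes(puccini)])               # ['Composer']
print([t.name for t in tm.get_types(puccini, [italian])])  # ['Person']
```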

To complete the coverage of the XTM syntax in API-4, two new classes are added: Variant and VariantName. Initially, both Variant and VariantName are derived from TopicNode. However, as both BaseName and Variant share the property of a list of child Variants, the API is extended to introduce a common super-class, VariantContainer. The resulting API class diagram is shown in Figure 10.


Figure 10 - API-4 final class diagram
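A minimal sketch of the VariantContainer arrangement, assuming hypothetical Python classes that keep only the properties mentioned above: TopicNode is omitted, a VariantName is reduced to a single string, and variant parameters (topic references in XTM) are simplified to strings. Both BaseName and Variant inherit the list of child variants, so variants can nest to any depth.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class VariantName:
    # Simplification: stands in for either a resourceRef or resourceData.
    value: str


@dataclass(eq=False)
class VariantContainer:
    """Common super-class: anything that can own child variants."""
    variants: List["Variant"] = field(default_factory=list)

    def add_variant(self, variant: "Variant") -> "Variant":
        self.variants.append(variant)
        return variant


@dataclass(eq=False)
class Variant(VariantContainer):
    parameters: List[str] = field(default_factory=list)
    variant_name: Optional[VariantName] = None


@dataclass(eq=False)
class BaseName(VariantContainer):
    base_name_string: str = ""


name = BaseName(base_name_string="Giacomo Puccini")
sort_form = name.add_variant(Variant(parameters=["sort"], variant_name=VariantName("puccini, giacomo")))
sort_form.add_variant(Variant(parameters=["sort", "display"], variant_name=VariantName("PUCCINI")))
print(len(name.variants), len(sort_form.variants))  # 1 1
```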

API Analysis

API-4 maintains a very close mapping to the XTM syntax. All of the syntactic constructs can be mapped to a class or property in the API which in most cases shares the name of the syntactic construct. The only construct without a direct mapping is the <subjectIdentity>, the content of which is represented by the subjectIndicators and subject properties of the Topic class. This complete coverage is costly in terms of additional classes and functions, bringing the total size of this API to 11 classes and 48 methods. Much of the complexity of the API is contained in the Topic class (with 13 class methods) which is the super-type for most of the other classes. Figure 11 shows that for our simple example, API-4 proves no more complex than API-3, requiring 16 objects (including the TopicMap object which is not shown) to represent the topic map.


Figure 11 - API-4 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

[1] Where a node type represents an element in context, the context is shown using XPath slash-separated path syntax.
[2] In fact, it could be argued that for certain applications, it is desirable for the ordering of objects in the source topic map to be preserved. This is especially the case for editing applications where the ability to 'round-trip' a file without significant alteration to its content is often seen as a desirable feature.

Beyond The Data API

So far, the APIs presented have focused on one task only: the representation and manipulation of topic map data. However, an effective topic map application will require additional services, so the representational APIs presented here should form the core of a modularized topic map API. Modularization helps in a number of ways. For implementors, it makes it possible to manage the trade-off between support for a complete API and the complexity involved in implementing all of that API. Additionally, modularization makes it possible to create "standard" implementations of higher-level modules which use only the API of lower-level modules. Just as an XSLT processor might use any underlying DOM implementation, so a module of topic map indexing services could be written to use any underlying core data representation and event representation modules. For a developer, clearly defined modules make it possible to choose an implementation based on the modules she requires for the task, or to create applications which use an easily specified subset of the functionality of a "complete" topic map API.
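As an illustration of a higher-level module that depends only on a lower-level one, the sketch below writes a type index against a hypothetical abstract core interface (CoreTopic, CoreTopicMap, get_classes() and get_topics() are assumptions, not names defined in this paper). Any core implementation that satisfies the interface, here a trivial in-memory one, can be indexed without changes to TypeIndex.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, Iterable, List


class CoreTopic(ABC):
    """Hypothetical slice of a 'core' module interface."""

    @abstractmethod
    def get_classes(self) -> Iterable["CoreTopic"]:
        """Topics this topic is an instance of."""


class CoreTopicMap(ABC):
    @abstractmethod
    def get_topics(self) -> Iterable[CoreTopic]:
        """All topics in the map."""


class TypeIndex:
    """An 'index' module built purely on the abstract core interface."""

    def __init__(self, tm: CoreTopicMap) -> None:
        self._by_type: Dict[CoreTopic, List[CoreTopic]] = defaultdict(list)
        for topic in tm.get_topics():
            for cls in topic.get_classes():
                self._by_type[cls].append(topic)

    def instances_of(self, cls: CoreTopic) -> List[CoreTopic]:
        # .get() avoids creating empty entries for unknown classes.
        return list(self._by_type.get(cls, []))


# A trivial in-memory core implementation; any other would do.
class MemTopic(CoreTopic):
    def __init__(self, name: str, classes: Iterable[CoreTopic] = ()) -> None:
        self.name = name
        self._classes = list(classes)

    def get_classes(self) -> List[CoreTopic]:
        return self._classes


class MemTopicMap(CoreTopicMap):
    def __init__(self, topics: Iterable[CoreTopic]) -> None:
        self._topics = list(topics)

    def get_topics(self) -> List[CoreTopic]:
        return self._topics


composer = MemTopic("Composer")
tm = MemTopicMap([composer, MemTopic("Puccini", [composer]), MemTopic("Verdi", [composer])])
print([t.name for t in TypeIndex(tm).instances_of(composer)])  # ['Puccini', 'Verdi']
```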

Figure 12 shows a proposed set of packages for a complete topic map API. The package core contains the data representation classes, such as those of the APIs presented in this paper. The events package is used to provide event/action-based handling of updates to the topic map data. The index package provides standard indexing services for the topic map data (e.g. an index of objects by their type, or an index of topics by their subject or subject indicators). utils is a catch-all package for additional convenience functions and indexes and may require further refinement. In addition to these basic modules, we show modules handling the XTM 1.0 model and the streaming of topic maps, either for serialization to and deserialization from interchange syntaxes such as XTM 1.0, or simply for streaming the topic map objects found during a walk of the data set.


Figure 12 - Modules of a Topic Map API
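The events package is described only at the level of event/action-based handling of updates. One possible shape, sketched here as an assumption with hypothetical event names ("type-added", "type-removed") and topics reduced to strings, is a simple publish/subscribe hub that lets an index keep itself current instead of rescanning the topic map after every change.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TopicMapEvents:
    """Hypothetical 'events' module: a minimal publish/subscribe hub."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def subscribe(self, event: str, listener: Callable[..., None]) -> None:
        self._listeners[event].append(listener)

    def publish(self, event: str, **payload) -> None:
        # Notify every listener registered for this event name.
        for listener in self._listeners[event]:
            listener(**payload)


class LiveTypeIndex:
    """An index kept current by listening to update events."""

    def __init__(self, events: TopicMapEvents) -> None:
        self.by_type: DefaultDict[str, List[str]] = defaultdict(list)
        events.subscribe("type-added", self._on_type_added)
        events.subscribe("type-removed", self._on_type_removed)

    def _on_type_added(self, instance: str, cls: str) -> None:
        self.by_type[cls].append(instance)

    def _on_type_removed(self, instance: str, cls: str) -> None:
        self.by_type[cls].remove(instance)


events = TopicMapEvents()
index = LiveTypeIndex(events)
events.publish("type-added", instance="Puccini", cls="Composer")
events.publish("type-added", instance="Verdi", cls="Composer")
events.publish("type-removed", instance="Verdi", cls="Composer")
print(index.by_type["Composer"])  # ['Puccini']
```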

Conclusions

What is striking about the topic map paradigm is the number of different ways in which the same underlying conceptual model can be represented, both syntactically and programmatically. These differences do not (we hope) reflect differing understandings of the topic map model, but rather stem from the desire to provide "short-cuts" for various topic map model constructs (reification is a good example, as is the choice to represent a class-instance relationship with a short-cut element syntax rather than the 'pure' form of an association between two topics). A topic map programming model is another area where the same trade-off between simplicity and practicality must be made, but its requirements are subtly different from those of a syntax, and for this reason the programming model and the syntax model will probably never fully converge.

Comparison of the APIs

The following table shows a side-by-side comparison of the four APIs developed here. The API size index is computed as (number of methods + (number of classes x 2)); constructors and destructors are not included in the method count. The representation complexity index is the number of distinct objects required to represent our sample topic map of two named topics, one with a single occurrence, in a sub-type/super-type relationship. The syntax convergence index is a subjective rating on a scale from 0 (poor convergence) to 3 (perfect convergence).

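| API   | Classes | Methods | Size Index | Representation Complexity | Syntax Convergence |
|-------|---------|---------|------------|---------------------------|--------------------|
| API-1 | 19      | 14      | 52         | 27                        | 2.5                |
| API-2 | 6       | 40      | 52         | 42                        | 0                  |
| API-3 | 13      | 40      | 66         | 16                        | 1                  |
| API-4 | 11      | 48      | 70         | 16                        | 2                  |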
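As a quick check, the following Python snippet recomputes the Size Index column from the class and method counts in the table, using the formula given above (number of methods + (number of classes x 2)). The function name size_index is illustrative only.

```python
# (classes, methods) for each API, copied from the comparison table.
API_COUNTS = {
    "API-1": (19, 14),
    "API-2": (6, 40),
    "API-3": (13, 40),
    "API-4": (11, 48),
}


def size_index(classes: int, methods: int) -> int:
    """API size index = number of methods + (number of classes x 2)."""
    return methods + classes * 2


for api, (classes, methods) in API_COUNTS.items():
    print(api, size_index(classes, methods))
# API-1 52
# API-2 52
# API-3 66
# API-4 70
```

Each recomputed value matches the Size Index column, so the column is consistent with the stated formula.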

A purely statistical comparison such as this is somewhat inconclusive, especially with such a limited set of use cases to determine the Representation Complexity index, and with no historical data to weight the method and class counts more accurately. In practice, we believe that the real criteria for choosing one API over another are less its size and representational complexity than whether it has a basic data model suitable for representing a topic map, and whether it converges with the XTM syntax so as to flatten the learning curve.

As explained in the analysis of that model, API-1 is not really suitable for representing a topic map, as it is based on a programming model originally developed for representing an ordered tree data structure rather than the graph structure of a topic map. The additional mechanisms which would be required to support the set operations and node reference operations are beyond the scope of the DOM specification and would make a DOM extension for topic maps considerably more heavy-weight than the HTML extension. For these reasons, API-1 is felt to be a weak candidate.

API-2 captures the graph nature of the topic map and encapsulates the complete model in a very small API. However, the lack of constructs for the direct representation and manipulation of common topic map objects such as occurrences and baseNames is a serious drawback. Requiring that a programmer be familiar with both the syntax and a processing model makes the learning curve for this API steeper than for any of the others presented here. This issue alone makes API-2 unattractive as a standard programming model, although its simplicity makes it an interesting candidate for a data model for topic map storage. It was felt that if API-2 were implemented, a higher-level API would need to be layered on top of it to provide direct access to and manipulation of topic map constructs other than topic, association and scope; given that, the final, higher-level API might well look more like API-3 or API-4.

The difference between API-3 and API-4 is relatively minimal, the treatment of class-instance associations being the major departure. The treatment in API-3, making Class a first-class object, is a divergence from the XTM syntax; given that a class can only be represented by a topic in the syntax, Class would seem unnecessary in a programming model, so in this respect API-4 is to be preferred to API-3. API-4 is also more complete in its coverage of the syntax, providing the Variant and VariantName classes to represent the <variant> and <variantName> constructs.

It is acknowledged that the comparisons presented here are limited in scope, and it would be hard, on the basis of these results alone, to defend the selection of any one of the presented programming models over the others. However, from this preliminary work it would appear that the modified conceptual model with implicit reification, as presented in API-4, may be the most effective of the models.

A further practical consideration in the evaluation of these APIs for the development of a common topic map API is that a number of topic map APIs have already been developed and deployed. It is likely, therefore, that any common topic map API will also have to take into account the design and implementation decisions made in those APIs in order to achieve broad recognition and support.

Future Work

The work presented here is acknowledged to be light on analysis. Much more rigorous assessment of the practicality of the different programming models needs to be undertaken, both in terms of more complex representation examples and in source code measurements for implementation of topic map processing functions under the different models. In addition, other formal models of the topic map paradigm are under development: Topicmaps.Org is in the process of developing a standard topic map processing model; and the TMQL effort will also involve the development of a formal model of the topic map abstract data-type based on the conceptual model and on a set of common operations which the topic map user community finds useful. Both of these efforts must be considered in any further work towards the development of a standard topic map API.