Table of Contents
This paper discusses TMShare, a peer-to-peer information-sharing application based on topic map technology.
The paper begins by discussing the objectives of the application. Primarily, this application has been developed as a test-bed for an innovative approach to topic map information discovery and integration based on decentralised administration and pools of information, which use controlled vocabularies in the form of subject indicators to enrich each other.
The main focus of this paper is on the role of topic maps in the data model for the system and specifically the approach taken to exchange of topic map data between peers. The system as currently implemented supports three different types of request which are presented in detail. Regardless of the type of request issued, the response received is always expressed as a topic map fragment. This fragment is a complete topic map consisting of only those topic map objects which match the query and sufficient context objects to make the topic map meaningful.
Finally, some future plans for the application are discussed and the possibility of integration with more traditional server-based topic map applications are explored.
TMShare was created in order to achieve a number of goals:
to test the use of topic maps in a highly distributed information sharing environment;
to explore new ways in which peer-to-peer networks can be used to share information rather than content;
as a practical application of the TM4J topic map engine .
This paper focuses principally on the first of these goals and in particular on how the exchange of subsets of topic map information between peers can be used as the basis for information sharing in a distributed environment. The author has chosen to call these exchanged subsets topic map fragments. The paper describes the techniques used by both the information provider and the information receiver in an exchange to select the subset to be encoded as a topic map fragment and to incorporate multiple topic map fragments to present a coherent view of the information available from a collection of interacting peers. Topic map fragments are combined with each other and with locally held topic map information using the topic map merge process defined in the XML Topic Maps specification . The principle feature of that process is that when two topics are determined to represent the same subject, all of the information provided by those topics are combined into a single topic in the merged result. Although the context described here is that of a peer-to-peer application, these concepts of selecting and interchanging topic map fragments can be applied to more traditional client-server applications.
The exchange of small subsets of topic map information is the principle form of communication between TMShare peers. A topic map fragment is a self-contained topic map, expressed in the XTM 1.0 interchange syntax. The fragment is created as a copy of constructs from one or more source topic maps. The constructs which are copied are based on a simple algorithm which is described in more detail below.
Requests are issued by TMShare peers either as a result of the user browsing to a particular topic in a local topic map or in the cache of topic map information received from previous requests; or as a result of a direct invocation of the query by user action. Requests fall into one of three categories:
A request for a specific topic by the topic's object ID.
A request for all topics which represent a specific subject by specifying a subject indicator or subject address URI.
A request for all topics which satisfy a tolog query.
Request Topic By ID
This type of request requires that the sending peer be already aware of a particular topic map object that it requires more information on. This knowledge may have been acquired by some process external to the operation of TMShare (e.g. the object ID was provided as a link in an email from a colleague); or it may make use of information about topic map objects received on previous requests.
The ability to specify one particular topic map object depends upon a scheme for globally unique identifiers for topic map objects. TMShare achieves this by use of the globally unique peer identity strings provided by JXTA — each topic map object ID is constructed from a concatenation of the JXTA peer ID; the topic map identifier assigned to the topic map by the peer which manages the topic map; and the ID assigned to the object within the topic map. As JXTA peer identifiers are universally unique; and topic map identifiers assigned by a peer are required to be unique across all topic maps managed by that peer, the individual identifiers assigned to topic map objects can be shown to also be universally unique.
A useful side-effect of the object identification strategy is that by making use of the JXTA peer identifier as the root of the object identifier, a peer can quickly identify the source of a particular topic map object and if necessary, use that identifier to establish a direct connection to the source.
Request Subject By Subject Indicator or Subject Address
This type of request allows a requesting peer to specify one or more subject indicator or subject address URIs. In a topic map, any given subject may have one or more subject indicators, which are identifiers for the subject encoded as URIs. The subject indicator URI is independent of the address of the topic or topics that describe the subject. As a result, multiple peers may all possess different information regarding a subject identified using the same subject indicator URI. By broadcasting a request for information pertaining to a subject by using a subject indicator URI a peer may expect multiple responses, which may support or conflict with each other. It is the responsibility of the requesting peer to determine how to handle any conflicts that may arise.
Due to the location-independent nature of subject indicators, these are the preferred means of requesting additional topic map information from remote peers when browsing a local topic map. By default, a TMShare application will send a request using this mechanism to all members of the TMShare network whenever a topic is selected in the browser interface that has one or more locally specified or cached subject indicators associated with it. To minimize traffic between nodes, a TMShare peer keeps track of the most recently sent requests and will not rebroadcast requests for those subject identifiers which were last requested within a configurable time-out period.
Request By Tolog Query
The final type of request allows a peer to retrieve a more restricted set of information pertaining to a given subject. The tolog language allows queries to be expressed in terms of the association relationships between subjects. For example the tolog query:
is a request to peers to return a topic map fragment in which the principle subjects represented are all performers of the song "I Fought The Law" as identified by the (fictional) subject indicator URI " http://www.songz.com/I/I_Fought_The_Law"
Tolog queries result in a two-dimensional table of results, much like that created by an SQL query of a relational database. The standard fragment construction algorithm loses the information about which topics in the fragment are actually part of the tolog query result set and which topics are merely added to the fragment as part of the fragment-building algorithm. In addition, a tolog query may actually request bindings for multiple variables. For example, the query:
select $A, $B from recorded-by($A:song, $B:artist), written-by($A:song, i"http://www.songz.com/perry_lee")
requests all songs written by Lee Perry and the artists who recorded those songs. The algorithm for fragment construction will lose the information which indicates which topics are bound to the variable $A and which are bound to the variable $B, and which combinations of the variable bindings match the query criteria.
The topics that are bound to variable in the result set can be easily incorporated into the topic map fragment sent back as the result. However, the incorporation of the information about which combinations of variable bindings are valid is more complex. This is dealt with later.
When a peer receives a request for a topic map fragment, the request query is executed against the unified view of both locally held topic map information and cached topic map information received from other peers. In this way it is possible for topic map information to be propagated around a network of peers and to be stored on multiple peers, thus increasing the chance that it will be found when requested.
Each of the requests resolves to a set of topics which meet the request criteria. This set is then processed to construct a topic map fragment to be returned. A topic map fragment is a subset of the topic map information maintained by the peer which is centred on the topic or topics in the result set of the request. The basic topic map fragment construction process is governed by the following inputs:
The set of topics that match the query criteria.
The breadth of the fragment requested by the querying peer.
The maximum breadth of fragment that the responding peer is prepared to send.
The responding peer first determines the breadth of fragment that will be sent. This is the lesser of the breadth of fragment requested by the querying peer and the maximum breadth of fragment that the responding peer is prepared to send.
Fragment construction copies topic map objects from the source topic map or topic maps to the result fragment. The process uses the concept of a shallow copy of a topic and an association, and the concept of stub topics. A stub topic is a copy of the source topic which excludes any typing information for the topic, any occurrences of the topic, all roles played by the topic, and all names which are not in the unconstrained scope. A stub topic must always have at least one subject indicator URI which may be later used to retrieve all the information which is not copied to the stub. If the topic does not have a subject indicator URI assigned to it, then the address of the source topic is used as the stub's subject indicator URI.
A shallow copy of a topic results in a copy of all of the characteristics of the topic, except for the roles played by the topic in any associations. In a shallow copy of a topic, only the topic's subject identifier; subject indicators and unscoped base names are copied. In addition, all topics used to define the type or types of the shallow-copied topic are copied as stub topics.
A shallow copy of an association results in a copy being made of all roles of the association, but the association type, the themes in the scope of the association and all association role players are copied as stub topics.
The algorithm for fragment construction is best described in pseudo code.
let deepCopies be the empty set let shallowCopies be the empty set let maxRadius be the width of the fragment required let resultsSet be the set of topics which matches the query criteria for each r in resultsSet: addTopic(r, 1) addTopic(t, depth): if (t in deepCopies): return if there is a copy of t in shallowCopies: let cp be the shallow copy of t that was already made else: let cp be a new shallow copy of t if (depth < maxRadius): for each x which is a typing topic of t: addTopic(x, depth + 1) for each c which is a scoped name or variant name of t: add a copy of c to cp for each x which is a theme in the scope of c: addTopic(x, depth + 1) for each o which is an occurrence of t: add a copy of o to cp for each x which is a theme in the scope of o: addTopic(x, depth + 1) let ot be the topic defining the type of o addTopic(ot, depth + 1) for each a which is an association in which t plays some role: addAssociation(a, depth) if cp is in shallowCopies: remove cp from shallowCopies add cp to deepCopies else: add cp to shallowCopies addAssociation(a, depth) let cp be a shallow copy of a if (depth + 1 < maxRadius): let at be the topic which types a addTopic(at, depth + 1) for each role r in association a: let mt be the topic which defines the type of role r addTopic(mt, depth + 1) for each p which is a topic playing role r in association a: addTopic(p, depth + 1)
Because stub topics are always created with at least one subject indicator, this algorithm ensures that in the case where a topic is first copied as a stub and is then later in the fragment building process found to actually be within the breadth of the fragment then the stub and the full copy of the topic will be merged. In addition, this property of stub topics ensures that a peer is able to combine the fragments received from several sequential queries, again using the subject-based merging principle of the topic map paradigm.
The diagram below shows the effects of this fragment construction algorithm. It shows a schematic view of a topic map as a collection of topics connected by associations. For the purposes of this explanation, this view does not show topics used for scope or types. The upper half of the diagram shows the topics and associations that are copied to a fragment with a breadth of (1); the lower half shows the topics and associations that are copied to a fragment with a breadth of (2). The numbers inside the circles represent the number of hops from the root topic of the fragment.
As already discussed, a tolog result set consists of a collection of topics structured into what can be thought of as a two-dimensional table of results. Simply adding the topics in the result set to the result fragment would not preserve the structure of the result set, which is important information in being able to present the results of the query meaningfully.
To overcome this problem, two different approaches to enriching the result set were considered. One approach embeds the structure of the result set into the topic map fragment generated by the peer. The second approach separates the description of the result set structure from the topic map fragment generated from the results.
Enrichment By Adding Topic Map Constructs
This approach enriches the topic map fragment created from the result set by adding topics that represent the variables and associations that aggregate the topics on each row of the tolog result set.
For each variable in the 'select...from' part of the tolog query, one topic is created to represent the variable. For each combination of variable bindings that match the query criteria, one association is created to represent that particular combination. The association created has one role for each variable binding in the select clause where the role player is the topic bound to that variable.
For example, if the tolog query given above resulted in finding the song "I Fought The Law", performed by The Clash, the tolog result set could be represented as the table shown below (the URIs in parenthesis are fictional subject indicators for the result topics).
I Fought The Law (http://www.songz.com/songs/I/I_Fought_The_Law)
The Clash (http://www.songz.com/artists/C/Clash_The)
In XTM syntax, this tolog result set would be represented as:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE topicMap SYSTEM "xtm1.dtd" > <topicMap> <!-- A topic to type the variable-binding association(s) --> <topic id="row"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://tm4j.org/tolog/result-set-row"/> </subjectIdentity> </topic> <!-- Topics representing the variables in the select clause of the tolog query --> <topic id="var_A"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://tm4j.org/tolog/variable/A"/> </subjectIdentity> </topic> <topic id="var_B"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://tm4j.org/tolog/variable/B"/> </subjectIdentity> </topic> <!-- Topics in the tolog result set --> <topic id="i_fought_the_law"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://www.songz.com/songs/I/I_Fought_The_Law"/> </subjectIdentity> <!-- Other topic characteristics --> </topic> <topic id="the_clash"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://www.songz.com/artists/C/Clash_The"/> </subjectIdentity> </topic> <!-- This association represents one row in the result set --> <association id="row1"> <instanceOf> <topicRef xlink:href="#row"/> </instanceOf> <member> <roleSpec><topicRef xlink:href="#var_A"/></roleSpec> <topicRef xlink:href="#i_fought_the_law"/> </member> <member> <roleSpec><topicRef xlink:href="#var_B"/></roleSpec> <topicRef xlink:href="#the_clash"/> </member> </association> <!-- Other topic map constructs in the result set fragment --> </topicMap>
This approach integrates the structure of the tolog result set directly into the topic map that is generated by the query processing. This has the advantage that the structure information can be extracted without the need to implement any new form of parsing. However, this approach also complicates the issue of caching topic map fragments received in response to a tolog query as it is unlikely that the information regarding the structure of a particular result set is required by the client once the tabular display of the results for a particular query is closed. For this reason, it was decided to examine a different approach which separates the result set topic map fragment from the information regarding the structure of the result set.
Enrichment By A Parallel XML Document
This approach to representing the results of a tolog query separates the topic map fragment created from the result set from an XML document which describes the result set structure. The topic map and the result set structure document are both then included in the response message generated by the peer after processing the request.
Using this approach, the response message generated by a TMShare peer on processing a tolog query request has the structure shown below:
<?xml version="1.0" encoding="UTF-8"?> <tmshare:TologResult xmlns:tmshare="http://jxta.org"> <query> <line>select $A, $B from recorded-by($A:song, $B:artist), written-by($A:song, i"http://www.songz.com/lee_perry") ?</line> </query> <results> <numCols>2</numCols> <vars> <var>A</var> <var>B</var> </vars> <row> <value>http://www.songz.com/songs/I/I_Fought_The_Law</value> <value>http://www.songz.com/artists/C/Clash_The</value> </row> </results> <resultsMap> <xtm:topicMap xmlns:xtm="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink"/> ... </topicMap> </resultsMap> </tmshare:TologResult>
It should be noted that the JXTA APIs used by TMShare do not allow the message to be constructed in such a way that it would parse as a well-formed name-spaced XML document. For this reason, the XTM syntax topic map is escaped in the response message, enabling it to be extracted as a single string by the receiving peer for processing through TM4J's XTM parser, while the XML description of the tolog result set structure is accessed through the JXTA APIs.
This approach enables those subsystems of the TMShare application which are interested in presenting the structure of the tolog result set (such as the tabular display of results) to access that information while not importing that information into the general cache — thus preventing it from cluttering the browser view of the unified topic map.
When a peer receives a fragment in response to a request, the topic map fragment is imported into the local cache. This cache is itself a single topic map, so the import process is a merge of the received fragment with the cache topic map using the standard rules defined by the XTM specification.
Before being imported, however, the received topic map fragment is enriched by adding to it topic map constructs which enable the provenance of the topic map information to be determined. This provenance information consists of a topic that represents the peer from which the fragment was received and a topic that represents the received message itself. This latter is called the message provenance topic.
The message provenance topic is used as an added theme when the topic map fragment is merged with the cache topic map. This results in the provenance topic being used as a theme in the scope of every topic characteristic provided by the received topic map fragment. The topic that represents the peer is associated with the message provenance topic making it easy to locate all of the topic map information provided by a specific peer. In addition, a message provenance topic has a single occurrence that specifies the time at which the message was received. In combination, the peer and message provenance topics can be used to:
filter out of the UI all information received from a specific peer;
expunge information from the cache after a given period;
identify to other peers what information provided in a response is information received from other parties.
The advantage of having this provenance information as part of the topic map used to cache received information is that the standard topic map filtering and extraction tools provided by the TM4J library can be used to implement all of these features and it also makes it possible for later extensions such as peer trust metrics to be implemented without requiring a change in the format of the request and response messages used by peers.
The TMShare application shows that the interchange of topic maps need not be restricted to the transfer of large XTM syntax files, but that the application of a simple algorithm can be used to enable the incremental discovery of information by a process of repeated queries. Although presented here in the form of a peer-to-peer application, the process is essentially that of request and response and so would be equally suited to a more traditional client-server architecture, for example by implementation as a web service.
Refinements to the Fragment Construction Algorithm
The fragment construction algorithm is currently based purely upon the number of hops between a topic and the root topic or topics of the fragment. It has been proposed by Barnell that other criteria can be applied to the construction of subsets of semantic data. Barnell's proposal of filtering against vocabularies could also be applied to the fragment construction algorithm presented here.
Integrating Other Data Sources
The peer-to-peer approach used by TMShare leads to some interesting possible application areas. As peers discover each other through the JXTA protocols, a new peer providing new information can be added to the network without the need for existing peers to be updated in any way to take advantage of the new information source. If information resources such as databases, content management systems and personal information management systems could be mapped to topics and associations in a topic map and if such a mapping could be performed on the fly by an agent acting as a peer in a TMShare network, then that information could be made available to all other peers on the same network, enabling integration of diverse information resources without the administrative overhead of integrating those resources on each client.
Peer Trust Metrics
It is realised that the current TMShare application is far from perfect. Principle failings include the current lack of any form of trust metrics, however the principle of topic map fragment enrichment by peers may provide a solution at least to the interchange of trust metrics.