rdfQuery + OpenCalais + Cloud Storage = Personal Knowledge Base ?

Last night’s Oxford SWiG meeting was interesting and sociable as usual. There were three great presentations: Jeni Tennison on rdfQuery, a jQuery-like JavaScript library for parsing, querying and generating RDFa markup; Iain Emsley on a WP plug-in that creates RDF graphs for blog posts, showing a nice use of multiple ontologies; and Laurian Gridinoc on the plans for PowerMagpie, with lots of ideas for navigation/presentation of large taxonomies and complex ontologies.

As usual, though, the real action was in the pub. One of the things we got to discussing was whether rdfQuery could be used to create stand-off markup on someone else’s content. Inigo Surguy pointed out that, using tools such as Greasemonkey, it should be pretty easy to get rdfQuery to scrape a page for the RDFa it contains and to add custom scripts to do something cool with that data.

The problem comes when trying to persist any new RDF statements you might create. RDFa is a syntax for embedding RDF within HTML, so if you are in control of the page that you are adding the markup to, it is trivial to persist that markup simply by saving the modified file. If you are not in control of the page, then you have some problems. The easy case is when the publisher of the page has already identified things that you might want to talk about and wrapped them in some RDFa. In this case you can simply add some more statements about those entities. The harder case is when the publisher of the page hasn’t marked up anything with RDFa. What is needed then is a “bootstrap” mechanism to locate entities that you might want to talk about.

That is where OpenCalais comes in. The OpenCalais service takes content and locates entities within it, returning the content with added markup that identifies those entities. Using some custom code interfacing to rdfQuery, it should be possible to turn the results from OpenCalais into RDFa. Then you can do all the funky stuff you want with the RDF and serialize it to some persistent store (either on another web service such as the Talis platform, or maybe to a local persistence mechanism such as Gears). Now, when you return to the page, your script again goes to OpenCalais to get the entities identified within and again turns this into RDF, but this time you can smoosh in the RDF from your persistent store to retrieve all that cool markup you added.
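To make the “smoosh” step a bit more concrete, here is a rough sketch of the merge in Python. It is only an illustration of the data flow, not browser code: triples are plain tuples, and every name in it (`smoosh`, `subjects`, the `example.org` URIs, the `ex:` predicates) is made up for the example rather than taken from rdfQuery, OpenCalais or any real store.

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) tuples.
# None of these names come from rdfQuery, OpenCalais, or a real store API.

def subjects(triples):
    """Return the set of subject URIs mentioned in a collection of triples."""
    return {s for s, _, _ in triples}


def smoosh(page_triples, stored_triples):
    """Merge previously saved statements into the graph built for this page.

    Only stored statements whose subject was found on the page are pulled in,
    so annotations about unrelated entities stay out of the merged graph.
    """
    on_page = subjects(page_triples)
    relevant = {t for t in stored_triples if t[0] in on_page}
    return set(page_triples) | relevant


# Statements as they might look after converting an entity-extraction result
# (assumed shape, for illustration only).
page = {
    ("http://example.org/entity/ibm", "rdf:type", "ex:Company"),
}

# Statements saved on an earlier visit.
store = {
    ("http://example.org/entity/ibm", "ex:note", "Check their RDF tooling"),
    ("http://example.org/entity/other", "ex:note", "Unrelated annotation"),
}

merged = smoosh(page, store)
```

Here `merged` holds the statement extracted from the page plus the saved note about the same entity, while the note about the entity that does not appear on the page is left out.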

What’s even better is that because OpenCalais has unique identifiers for the entities it recognizes, if you then visit another page that contains a reference to the same entity you should be able to pull in your extra markup automatically. I’m pretty sure that with this approach it should be possible to build up a personal knowledge store that can be merged into web pages as you view them. Combine that with some clever JavaScript to present the information and to let you extend the set of statements in the store, and you have something really rather cool.
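A minimal sketch of that cross-page idea, again in Python and again with invented names: a store keyed by entity identifier, so a note made while reading one page shows up on any other page where the same identifier turns up. The `PersonalStore` class, the identifiers and the `ex:note` predicate are all assumptions for the sake of the example.

```python
from collections import defaultdict


class PersonalStore:
    """Hypothetical personal knowledge store, keyed by entity identifier."""

    def __init__(self):
        # entity identifier -> set of (predicate, value) pairs
        self._by_entity = defaultdict(set)

    def add(self, entity_id, predicate, value):
        """Record a statement about an entity."""
        self._by_entity[entity_id].add((predicate, value))

    def annotations_for(self, entity_ids):
        """Return saved statements for the entities found on a page."""
        return {
            e: set(self._by_entity[e])
            for e in entity_ids
            if e in self._by_entity
        }


# Made-up identifiers standing in for the ones an extraction service returns.
IBM = "http://example.org/entity/ibm"
ACME = "http://example.org/entity/acme"

store = PersonalStore()

# While reading page A, which mentions IBM, the reader adds a note.
store.add(IBM, "ex:note", "Follow up on this")

# Later, page B mentions both IBM and ACME; only IBM has saved statements.
found_on_page_b = [IBM, ACME]
annotations = store.annotations_for(found_on_page_b)
```

The lookup never needs to know which page a statement was made on, which is the whole point of hanging everything off the entity identifier rather than the page URL.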

Just need to code it now ;-)

  • http://opencalais.com Tom Tague

    Kal:

    Tom Tague from Calais here.

    Some very interesting ideas in here. A couple of additional points to think about:

    First, yes, Calais does provide consistent identifiers. But one thing to keep in mind is that those identifiers are only disambiguated for a subset of the entity types we recognize (Company, geography, etc). So “John Doe” ≠ “John Doe” with any degree of certainty across sites. On the other hand “IBM” = “Taligent” = “International Business Machines” across all sites.

    Second, don’t forget about facts and events! While entities are great – being able to create community / organize information around events (e.g. XYZ Corporation announces Bankruptcy, a Tsunami occurred in Tibet) could be equally interesting.

    Can we take it beyond a personal knowledge store to a shared store with personal spaces? The collaboration opportunities would be pretty great.

    When can we expect a demo?

  • Kal

    Hi Tom,

    Thanks for the comments. You are right on the money about facts and events – they are definitely at least as interesting as entities, maybe more so because they form a nexus for entities and this mechanism could be used to capture opinions/assertions about the interactions between entities and events.

    As for a demo… well, there is a BarCamp coming up in Oxford fairly soon and I’m toying with the idea of pitching for some help with this there. But maybe it would be worth my while getting some basic crufty demo together before that and then letting the JavaScript/design gurus make it work better/look prettier. If only there were 24 more hours per day…