As usual, though, the real action was in the pub. One of the things we got to discussing was whether rdfQuery could be used to create stand-off markup on someone else’s content. Inigo Surguy pointed out that, using a tool such as Greasemonkey, it should be pretty easy to get rdfQuery to scrape a page for the RDFa it contains and to add custom scripts that do something cool with that data.

The problem comes when you try to persist any new RDF statements you create. RDFa is a syntax for embedding RDF within HTML, so if you are in control of the page you are adding the markup to, persisting that markup is trivial: you simply save the modified file. If you are not in control of the page, you have some problems. The easy case is when the publisher of the page has already identified the things you might want to talk about and wrapped them in some RDFa. In that case you can simply add more statements about those entities. The harder case is when the publisher of the page hasn’t marked up anything with RDFa. What is needed then is a “bootstrap” mechanism to locate the entities you might want to talk about.
That is where OpenCalais comes in. The OpenCalais service takes content and locates entities within it, returning the content with added markup that identifies those entities. Using some custom code interfacing with rdfQuery, it should be possible to turn the results from OpenCalais into RDFa. You can then do all the funky stuff you want with the RDF and serialize it to some persistent store (either another web service such as the Talis Platform, or maybe a local persistence mechanism such as Gears). Now, when you return to the page, your script again goes to OpenCalais to get the entities identified within and again turns the result into RDF, but this time you can smoosh in the RDF from your persistent store to retrieve all that cool markup you added.
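The round trip described above could be sketched roughly as follows. This is Python rather than browser JavaScript, and everything in it is a stand-in: `locate_entities` is a hypothetical stub and not the OpenCalais API, triples are plain tuples rather than rdfQuery objects, and a JSON file plays the part of the persistent store. All function names, URLs, and prefixes are made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def locate_entities(page_url: str, text: str, known: Dict[str, str]) -> Set[Triple]:
    """Hypothetical stand-in for an entity-extraction service.

    Does a naive substring search for each known name. The subject URI is
    derived only from the page URL and the name, so it is the same on
    every visit to the same page.
    """
    triples: Set[Triple] = set()
    for name, rdf_type in known.items():
        if name in text:
            subject = f"{page_url}#{name.replace(' ', '_')}"
            triples.add((subject, "rdf:type", rdf_type))
            triples.add((subject, "rdfs:label", name))
    return triples


def load_annotations(store: Path, page_url: str) -> Set[Triple]:
    """Read previously saved stand-off statements for one page."""
    if not store.exists():
        return set()
    saved = json.loads(store.read_text())
    return {(s, p, o) for s, p, o in saved.get(page_url, [])}


def save_annotations(store: Path, page_url: str, triples: Set[Triple]) -> None:
    """Persist stand-off statements, keyed by page URL (a JSON file is the store)."""
    saved = json.loads(store.read_text()) if store.exists() else {}
    saved[page_url] = sorted(triples)
    store.write_text(json.dumps(saved, indent=2))


def merged_graph(page_url: str, text: str, known: Dict[str, str], store: Path) -> Set[Triple]:
    """Re-run entity location, then merge in whatever was saved earlier."""
    return locate_entities(page_url, text, known) | load_annotations(store, page_url)


if __name__ == "__main__":
    url = "http://example.org/some-post"  # made-up page
    text = "The Talis platform was discussed in the pub."
    known = {"Talis": "ex:Organization"}  # made-up entity list and type
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "annotations.json"
        first_visit = merged_graph(url, text, known, store)
        note = (f"{url}#Talis", "ex:comment", "worth a closer look")
        save_annotations(store, url, {note})
        second_visit = merged_graph(url, text, known, store)
        print(note in first_visit, note in second_visit)
```

The sketch only works because the stub hands back the same subject identifier for the same entity on every visit. Whether a real extraction service does that is an assumption here; if it does not, the stored statements would need some other way to reattach to the entities on the page.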
Just need to code it now.