I want to share a few notes about viewing the embedded data from RDFa pages, as a sort of mini-guide for anyone interested.
The thing to get out of the way upfront is that the easiest thing to extract looks the ugliest and is often hard to follow. It's worth taking a few precautions to avoid the horror of machine-generated RDF/XML. So, install the Tabulator Firefox extension from MIT and find the button labelled “N3”; it looks like a dense network icon. Hit that for a compact text-based view, and un-toggle the default tabular view as screen space requires. The default is the loose network icon.
To actually extract the data, use the RDFa Distiller service. Put in a URL and this service gives you ugly RDF/XML by default, but the Tabulator extension comes to the rescue. With Tabulator installed, hitting “Go” gives you, unsurprisingly, a table, and switching completely to N3 is just two clicks.
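If you would rather build the Distiller request yourself than type into the form, something like the following sketch may help. It only constructs the URL; it does not fetch anything. The endpoint and the `uri` and `format` parameter names are my assumptions about the W3C-hosted distiller, not something taken from its documentation, so check the service's own form before relying on them. `distiller_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint of the W3C-hosted RDFa distiller; verify before use.
DISTILLER_ENDPOINT = "http://www.w3.org/2007/08/pyRdfa/extract"


def distiller_url(page_url, output_format="n3", endpoint=DISTILLER_ENDPOINT):
    """Build a distiller request URL for the given RDFa page.

    The `uri` and `format` query parameter names are assumptions.
    """
    # urlencode percent-escapes the page URL so it survives as a query value.
    query = urlencode({"uri": page_url, "format": output_format})
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    print(distiller_url("http://feelitlive.com/events/2009/7/3/W2/2UH/Hyde+Park/Blur"))
```

Paste the printed URL into Firefox and, with Tabulator installed, you should be able to switch views as described above.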
In table mode, Tabulator will pick up and cache labels for things as it goes along, and will use the last bit of the URL if it doesn’t have a label yet. If your URLs look ugly, then the view in Tabulator will look ugly; hopefully your URLs are pretty.
Pretty can also be bad, especially if the unique part of a URL comes early and the last bit is the same everywhere. For example:
- http://feelitlive.com/events/2009/7/3/W2/2UH/Hyde+Park/Blur#event
- http://feelitlive.com/events/2009/7/4/HA9/0WS/Wembley+Stadium/Take+That#event
- http://feelitlive.com/events/2009/7/5/SW7/2AP/Royal+Albert+Hall/The+Killers#event
will all be rendered as “event”, which is very confusing! If this happens, switching to N3 may be the way forward. Part of the problem seems to be that Tabulator does not read RDFa on its own, which makes it harder to access the RDF and harder for Tabulator to calculate good labels. Apparently the next version will read RDFa, which is great.
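To see why those three URLs collapse to the same label, here is a rough sketch of a "last bit of the URL" fallback. This is my guess at the behaviour, not Tabulator's actual code: I am assuming the fragment wins when there is one, and the last path segment is used otherwise. Both function names are hypothetical.

```python
from urllib.parse import urlsplit, unquote_plus


def fallback_label(url):
    """Guess a label from the 'last bit' of a URL (assumed behaviour)."""
    parts = urlsplit(url)
    if parts.fragment:
        # Assumption: a fragment such as '#event' counts as the last bit.
        return unquote_plus(parts.fragment)
    segments = [s for s in parts.path.split("/") if s]
    return unquote_plus(segments[-1]) if segments else parts.netloc


def path_label(url):
    """Hypothetical alternative: ignore the fragment, use the last path segment."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return unquote_plus(segments[-1]) if segments else urlsplit(url).netloc


if __name__ == "__main__":
    urls = [
        "http://feelitlive.com/events/2009/7/3/W2/2UH/Hyde+Park/Blur#event",
        "http://feelitlive.com/events/2009/7/4/HA9/0WS/Wembley+Stadium/Take+That#event",
        "http://feelitlive.com/events/2009/7/5/SW7/2AP/Royal+Albert+Hall/The+Killers#event",
    ]
    for url in urls:
        print(fallback_label(url), "|", path_label(url))
```

Under these assumptions the fragment-first rule gives “event” every time, while the path-based variant would tell the three events apart. A real label from the RDF itself would be better than either.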
Those links again: