On Thursday last week I flew to Perth, in Western Australia, to speak at an event at Curtin University on visualisation of cultural heritage. Erik Champion, Professor of Cultural Visualisation, who organised the event, had asked me to talk about digital heritage collections and Linked Open Data (“LOD”).
The one-day event was entitled “GLAM VR: talks on Digital heritage, scholarly making & experiential media”, and combined presentations and workshops on cultural heritage data (GLAM = Galleries, Libraries, Archives, and Museums) with advanced visualisation technology (VR = Virtual Reality).
The venue was the Curtin HIVE (Hub for Immersive Visualisation and eResearch), a really impressive visualisation facility at Curtin University, with huge screens and panoramic and 3D displays.
There were about 50 people in attendance, and probably over a dozen different presenters, covering a lot of different topics, though with common threads linking them together. I really enjoyed the experience, and learned a lot. I won’t go into the detail of the other presentations here, but quite a few people were live-tweeting, and I’ve collected most of the Twitter stream from the day into a Storify story, which is well worth reading and following up.
For my part, I had 40 minutes to cover my topic. I’d been a bit concerned that my talk was more data-focused and contained nothing specifically about VR, but I think on the day the relevance was actually apparent.
The presentation slides are available here as a PDF: Linked Open Data Visualisation
My aims were:
- At a tactical level, to explain the basics of Linked Data from a technical point of view (i.e. to answer the question “what is it?”); to show that it’s not as hard as it’s usually made out to be; and to inspire people to get started with generating it, consuming it, and visualising it.
- At a strategic level, to make the case for using Linked Data as a basis for visualisation: that the discipline of adopting Linked Data technology is not at all a distraction from visualisation, but rather a powerful generic framework on top of which visualisations of various kinds can be more easily constructed, and given the kind of robustness that real scholarly work deserves.
Linked Data basics
I spent the first part of my talk explaining what Linked Open Data means, starting with “what is a graph?” and introducing RDF triples and Linked Data. Finally I showed a few simple SPARQL queries, not to explain SPARQL in any detail, but just to show the kinds of questions you can ask with a few lines of SPARQL code.
While I was explaining graph data models, I saw attendees nodding, which I took as a sign of understanding rather than a sign that they were nodding off to sleep; it was still pretty early in the day for that.
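As a rough illustration of the graph idea (this is not taken from the slides), a graph can be held as a set of subject-predicate-object triples, and a query is a pattern with some positions left open. The sketch below uses plain Python rather than a real RDF library; the `http://example.org/` URIs, the property names, and the `match` helper are all made up for the example.

```python
# A toy triple store: each triple is (subject, predicate, object).
# All URIs here are hypothetical, under example.org.
EX = "http://example.org/"

triples = {
    (EX + "vase1", EX + "type", EX + "Vase"),
    (EX + "vase1", EX + "heldBy", EX + "museumA"),
    (EX + "vase2", EX + "type", EX + "Vase"),
    (EX + "vase2", EX + "heldBy", EX + "museumB"),
    (EX + "museumA", EX + "locatedIn", EX + "Perth"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# "Which things are vases?" (a single triple pattern)
vases = [s for s, _, _ in match(triples, p=EX + "type", o=EX + "Vase")]

# "Which vases are held by an institution located in Perth?"
# (two patterns joined on the shared 'holder' value)
in_perth = [
    v for v in vases
    for _, _, holder in match(triples, s=v, p=EX + "heldBy")
    if match(triples, s=holder, p=EX + "locatedIn", o=EX + "Perth")
]
```

In SPARQL, the second question would be written as two triple patterns sharing a variable; the nested loop here is just a hand-rolled version of that join.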
One thing I hoped to get across in this part of the presentation was just that Linked Data is not all that hard to get into. Sure, it’s not a trivial technology, but the barriers to entry are not that high; the basics really are simple, so you can make a start and do plenty of useful things without having to know all the advanced stuff. For instance, there are a whole bunch of RDF serializations, but in fact you can get by with knowing only one. There are a zillion different ontologies, but again you only need to know the ontology you want to use, and you can do plenty of things without worrying about a formal ontology at all. I’d make the case for university eResearch agencies, software carpentry, and similar efforts, to be offering classes and basic support in this technology, especially in library and information science, and the humanities generally.
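To give a sense of how little is needed to get started with a single serialization, here is a minimal sketch that writes triples out as N-Triples, one of the simplest RDF syntaxes (one statement per line, ending in a full stop). The `to_ntriples` function and the example.org URIs are hypothetical, and the rule "anything starting with http is a URI, everything else is a plain literal" is a simplifying assumption for the example, not part of the standard.

```python
def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples as N-Triples lines.

    Assumption for this sketch: subjects and predicates are URI strings,
    and an object is a URI if it starts with "http", otherwise a literal.
    """
    def escape(text):
        # Escape the characters that may not appear raw inside a literal.
        return (text.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))

    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else f'"{escape(o)}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

doc = to_ntriples([
    ("http://example.org/vase1",
     "http://www.w3.org/2000/01/rdf-schema#label",
     'Red-figure "krater"'),
    ("http://example.org/vase1",
     "http://example.org/heldBy",
     "http://example.org/museumA"),
])
print(doc)
```

In practice you would more likely use an existing RDF library to read and write these files, but the format itself is simple enough to produce from a spreadsheet export or a short script.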
Linked Data as architecture
People often use the analogy of building when talking about making software. We talk about a “build process”, “platforms”, “architecture”, and so on. It’s not an exact analogy, but it is useful. Using that analogy, Linked Data provides a foundation that you can build a solid edifice on top of. If you skimp on the foundation, you may get started more quickly, but you will encounter problems later. If your project is small, and if it’s a temporary structure (a shack or bivouac), then architecture is not so important, and you can get away with skimping on foundations (and you probably should!), but the larger the project is (an office building), and the longer you want it to persist (a cathedral), the more valuable a good architecture will be. In the case of digital scholarly works, the common situation in academia is that weakly-architected works are being cranked out and published but, being hard to maintain, they tend to crumble away within a few years.
Crucially, a Linked Data dataset can capture the essence of what needs to be visualised, without being inextricably bound up with any particular genre of visualisation, or any particular visualisation software tool. This relative independence from specific tools is important because a dataset which is tied to a particular software platform needs to rely on the continued existence of that software, and experience shows that individual software packages come and go depressingly quickly. Often only a few years are enough for a software program to be “orphaned”, unavailable, obsolete, incompatible with the current software environment (e.g. requires Windows 95 or IE6), or even, in the case of software available online as a service, for it to completely disappear into thin air, if the service provider goes bust or shuts down the service for reasons of their own. In these cases you can suddenly realise you’ve been building your “scholarly output” on sand.
By contrast, a Linked Data dataset is standardised, and it’s readable with a variety of tools that support that standard. That provides you with a lot of options for how you could go on to visualise the data; that generic foundation gives you the possibility of building (and rebuilding) all kinds of different things on top of it.
Because of its generic nature and its openness to the Web, Linked Data technology has become a broad software ecosystem which already has a lot of people’s data riding on it; that kind of mass investment (a “bandwagon”, if you like) is insurance against it being wiped out by the whims or vicissitudes of individual businesses. That’s the major reason why a Linked Data dataset can be archived and stored long term with confidence.
Linked Open Data is about sharing your data for reuse
Finally, by publishing your dataset as Linked Open Data (independently of any visualisations you may have made of it), you are opening it up to reuse not only by yourself, but by others.
The graph model allows you to describe the meaning of the terms you’ve used (i.e. the analytical categories used in your data can themselves be described and categorised, because everything is a node in a graph). This means that other people can work out what your dataset actually means.
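A small sketch of what "everything is a node in a graph" can look like in practice: the property used in the data is itself the subject of further statements that label and explain it. The example.org names and the `describe` helper are hypothetical; `rdfs:label` and `rdfs:comment` are the standard RDF Schema properties for this purpose.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/"  # hypothetical namespace

graph = {
    # Data: a statement that uses the term EX + "heldBy".
    (EX + "vase1", EX + "heldBy", EX + "museumA"),
    # Description of the term itself, stored in the same graph.
    (EX + "heldBy", RDFS + "label", "held by"),
    (EX + "heldBy", RDFS + "comment",
     "Relates an object to the institution that currently holds it."),
}

def describe(term):
    """Collect the RDFS annotations (label, comment, ...) recorded for a term."""
    return {p[len(RDFS):]: o for s, p, o in graph
            if s == term and p.startswith(RDFS)}

info = describe(EX + "heldBy")
```

Someone encountering `heldBy` in the data can look the term up in the same way they would look up any other node.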
The use of URIs for identifiers means that others can easily cite your work and effectively contribute to your work by creating their own annotations on it. They don’t need to impinge on your work; their annotations can live somewhere else altogether and merely refer to nodes in your graph by those nodes’ identifiers (URIs). They can comment; they can add cross-references; they can assert equivalences to nodes in other graphs, elsewhere. Your scholarly work can break out of its box, to become part of an open web of knowledge that grows and ramifies and enriches us all.
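As a sketch of how third-party annotation can work without touching the original: because nodes are identified by URIs, someone else's statements can sit in a separate graph and simply refer to those URIs. Both namespaces and the object identifier below are invented for the example; `owl:sameAs` and `rdfs:comment` are standard terms. Treating a merge as a plain set union is a simplification that holds here because the toy graphs contain no blank nodes.

```python
EX = "http://example.org/"                  # hypothetical original publisher
OTHER = "http://annotations.example.net/"   # hypothetical third party
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

# The original dataset, published by its author.
original = {
    (EX + "vase1", EX + "heldBy", EX + "museumA"),
}

# Someone else's annotations, stored elsewhere; they only refer to
# the node EX + "vase1" by its URI.
annotations = {
    (EX + "vase1", RDFS_COMMENT, "Attribution disputed; see catalogue notes."),
    (EX + "vase1", OWL_SAME_AS, OTHER + "objects/4711"),
}

# A reader can combine the two graphs; the original is left untouched.
merged = original | annotations

# Everything now known about vase1, from both sources.
about_vase1 = sorted(p for s, p, _ in merged if s == EX + "vase1")
```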