Names in the Museum

My last blog post described an experimental Linked Open Data service I created, underpinned by Museum Victoria’s collection API. Mainly, I described the LOD service’s general framework, and explained how it worked in terms of data flow.

To recap briefly, the LOD service receives a request from a browser and in turn translates that request into one or more requests to the Museum Victoria API, interprets the result in terms of the CIDOC CRM, and returns the result to the browser. The LOD service does not have any data storage of its own; it’s purely an intermediary or proxy, like one of those real-time interpreters at the United Nations. I call this technique a “Linked Data proxy”.

I have a couple more blog posts to write about the experience. In this post, I’m going to write about how the Linked Data proxy deals with the issue of naming the various things which the Museum’s database contains.

Using Uniform Resource Identifiers (URIs) as names

Names are a central issue in any Linked Data system. Anything of interest must be named with an HTTP URI, and every piece of information recorded about a thing is attached to this name. Crucially, because these names are HTTP URIs, they can (in fact, in a Linked Data system, they must) also serve as a means to obtain information about the thing.

In a nutshell there are three main tasks the Linked Data proxy has to be able to perform:

  1. When it receives an HTTP request, it has to recognise the HTTP URI as an identifier that identifies a particular individual belonging to some general type: an artefact; a species; a manufacturing technique; etc.
  2. Having recognised the URI as some sort of name, it has to be able to look up and retrieve information about the particular individual which it identifies.
  3. Having found some information about the named thing, it has to convert that information into RDF (the language of Linked Data), in the process converting any identifiers it has found into the kind of HTTP URIs it can recognise in future. A Linked Open Data client is going to want to use those identifiers to make further requests, so they have to match the kind of identifiers the LOD service can recognise (in step 1 above).

Recognising various HTTP URIs as identifiers for things in Museum Victoria’s collection

Let’s look at the task of recognising URIs as names first.

The Linked Data proxy distinguishes between URIs that name different types of things by recognising different prefixes in the URIs. For instance, a URI beginning with http://conaltuohy.com/xproc-z/museum-victoria/resource/items/ will identify a particular item in the collection, whereas a URI beginning with http://conaltuohy.com/xproc-z/museum-victoria/resource/technique/ will identify some particular technique used in the manufacture of an item.
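As a rough illustration of prefix-based recognition, here is a minimal Python sketch. It is not the proxy's actual code; the function name recognise and the KNOWN_TYPES table are hypothetical, and the type segments listed are assumptions based on the URIs shown in this post.

```python
from typing import Optional, Tuple

# Base URI of the LOD service's "resource" namespace, as used in this post.
BASE = "http://conaltuohy.com/xproc-z/museum-victoria/resource/"

# Hypothetical table of recognised type segments; the real service may differ.
KNOWN_TYPES = ("items", "specimens", "species", "articles", "technique")


def recognise(uri: str) -> Optional[Tuple[str, str]]:
    """Return (entity type, local name) for a recognised URI, else None."""
    if not uri.startswith(BASE):
        return None
    # The first path segment after the base names the type of thing;
    # everything after it is the name of the individual.
    entity_type, _, local_name = uri[len(BASE):].partition("/")
    if entity_type in KNOWN_TYPES and local_name:
        return entity_type, local_name
    return None
```

A request for a URI outside the service's namespace, or with an unknown type segment, simply yields None, which a proxy could translate into a 404 response.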

The four central entities of Museum Victoria’s API

The Museum Victoria API is organised around four main types of entity:

  • items
  • specimens
  • species
  • articles

The LOD service handles all four very similarly: since the MV API provides an identifier for every item, specimen, species, or article, the LOD service can generate a linked data identifier for each one just by sticking a prefix on the front. For example, the item which Museum Victoria identifies with the identifier items/1221868 can be identified with the Linked Data identifier http://conaltuohy.com/xproc-z/museum-victoria/resource/items/1221868 just by sticking http://conaltuohy.com/xproc-z/museum-victoria/resource/ in front of it, and a document about that item can be identified by http://conaltuohy.com/xproc-z/museum-victoria/data/items/1221868.
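The prefixing described above amounts to plain string concatenation. A minimal sketch in Python, in which the function names resource_uri and data_uri are my own, chosen for illustration:

```python
# Base URI of the LOD service, as used in this post.
SERVICE_BASE = "http://conaltuohy.com/xproc-z/museum-victoria/"


def resource_uri(mv_id: str) -> str:
    """Linked Data identifier for the thing itself, e.g. an item."""
    return SERVICE_BASE + "resource/" + mv_id


def data_uri(mv_id: str) -> str:
    """Identifier for a document about the thing."""
    return SERVICE_BASE + "data/" + mv_id


print(resource_uri("items/1221868"))
print(data_uri("items/1221868"))
```

Because the Museum's own identifier survives intact inside the URI, the reverse mapping is just a matter of removing the prefix again.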

Secondary entities

So far so straightforward, but apart from these four main entity types, there are a number of things of interest which the Museum Victoria API deals with in a secondary way.

For example, for many of the artefacts in the collection, the MV database includes information on how they were manufactured, in a field called technique. For instance, many ceramic items (e.g. teacups) in their collection were created from a mould, and have the value moulded in their technique field. The tricky thing here is that the techniques are not “first-class” entities like items. Instead, a technique is just a textual attribute of an item. This is a common enough situation in legacy data systems: the focus of the system is on what it sees as “core” entities (museum items in this case), which have their own identifiers and a bunch of properties hanging off them. Those properties are very much second-class citizens in the data model, and are often just textual labels. A number of items might share a common value for their technique field, but that common value is not stored anywhere except in the technique field of those items; it has no existence independent of those items.

In Linked Data systems, by contrast, such common values should be treated as first-class citizens, with their own identifiers, and with links that connect each technique to the items which were manufactured using that technique.

What is the LOD service to do? When expressing a technique as a URI, it can simply use the technique’s name itself (“moulded”) as part of the identifier, like so:

http://conaltuohy.com/xproc-z/museum-victoria/resource/technique/moulded

Then when the LOD service is responding to a request for a URI like the above, it can pull that prefix off and have the original word “moulded” back.
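Here is a small Python sketch of that round trip. The percent-encoding step is my assumption, added so that a technique name containing spaces or other reserved characters would still produce a valid URI; the function names are hypothetical.

```python
from urllib.parse import quote, unquote

TECHNIQUE_PREFIX = (
    "http://conaltuohy.com/xproc-z/museum-victoria/resource/technique/"
)


def technique_uri(label: str) -> str:
    """Mint a URI for a technique from its textual label."""
    # Assumption: labels are percent-encoded so any text is URI-safe.
    return TECHNIQUE_PREFIX + quote(label, safe="")


def technique_label(uri: str) -> str:
    """Recover the original label by pulling the prefix off the URI."""
    if not uri.startswith(TECHNIQUE_PREFIX):
        raise ValueError("not a technique URI: " + uri)
    return unquote(uri[len(TECHNIQUE_PREFIX):])


print(technique_uri("moulded"))
print(technique_label(technique_uri("moulded")))
```

For a simple label like “moulded” the encoding changes nothing, so the URI is exactly the one shown above.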

At this point the LOD service needs to be able to provide some information about the moulded technique. Because the technique is not a first-class object in the underlying collection database, there’s not much that can be said about it, apart from its name, obviously, which is “moulded”. All that the LOD service really knows about a given technique is that a certain set of items were produced using that technique, and it can retrieve that list using the Museum Victoria search API. The search API allows for searching by a number of different fields, including technique, so the Linked Data service can take the last component of the request URI it has received (“moulded”) and pass that to the search API, like so:

http://collections.museumvictoria.com.au/api/search/search?limit=100&technique=moulded
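Building that search request could look something like the following Python sketch. The endpoint and the limit and technique parameters are taken from the URL above; the function name technique_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Search endpoint of the Museum Victoria API, as shown in this post.
SEARCH_ENDPOINT = "http://collections.museumvictoria.com.au/api/search/search"


def technique_search_url(technique: str, limit: int = 100) -> str:
    """Build a search URL for items produced with the given technique."""
    # urlencode takes care of escaping the technique name.
    query = urlencode({"limit": limit, "technique": technique})
    return SEARCH_ENDPOINT + "?" + query


print(technique_search_url("moulded"))
```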

The result of the search is a list of items produced with the given technique, which the LOD service simply reformats into an RDF representation. As part of that conversion, the identifiers of the various moulded items in the results list (e.g. items/1280928) are converted into HTTP URIs simply by sticking the LOD service’s base URI on the front of them, e.g.

http://conaltuohy.com/xproc-z/museum-victoria/resource/items/1280928
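To give a feel for the conversion, here is a minimal Python sketch which turns a list of item identifiers into N-Triples. It is only an illustration: the predicate URI is a made-up placeholder rather than the CIDOC CRM terms the service actually uses, and the function name to_ntriples is hypothetical.

```python
from typing import Iterable

RESOURCE_BASE = "http://conaltuohy.com/xproc-z/museum-victoria/resource/"

# Placeholder predicate for illustration only; the real service
# expresses this relationship in terms of the CIDOC CRM.
PRODUCED_USING = "http://example.org/vocab/producedUsing"


def to_ntriples(technique_uri: str, item_ids: Iterable[str]) -> str:
    """Link each item found by the search to the technique's URI."""
    lines = [
        # The item identifier becomes a URI by prefixing the base URI.
        f"<{RESOURCE_BASE}{item_id}> <{PRODUCED_USING}> <{technique_uri}> ."
        for item_id in item_ids
    ]
    return "\n".join(lines)


print(to_ntriples(RESOURCE_BASE + "technique/moulded", ["items/1280928"]))
```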

External links

Tim Berners-Lee, the inventor of the World Wide Web, in an addendum to his “philosophical” post about Linked Data, suggested a “5-star” rating scheme for Linked Open Data, in which the fifth star requires that a dataset “link … to other people’s data to provide context”. Since the Museum Victoria data doesn’t include external links, it is tricky to earn this final star, but there is a way, based on MV’s use of the standard taxonomic naming system used in biology. Since many of MV’s items are biological specimens, we can use their taxonomic names to establish links to external sources which use the same taxonomic names. For this example, I chose to link to biological data from Wikipedia, which, unknown to many people, is also published as DBpedia, a large dataset of Linked Open Data derived from the Wikipedia pages, including a lot of biological taxa. To establish a link to DBpedia, the LOD service takes the Museum’s taxonName field and inserts it into a SPARQL query, which it sends to DBpedia, essentially asking “do you have anything on file which has this binomial name?”

select distinct ?species where {
?species dbp:binomial "{MV's taxon name goes here}"@en}

The result of the query is either a “no” or a link to the species in DBpedia, which the LOD service can then republish.
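A sketch of this step in Python might look like the following. It only builds the query text and reads a result that has already been parsed from the standard SPARQL JSON results format; it does not contact DBpedia. The function names are hypothetical, and both the explicit PREFIX declaration and the escaping of the taxon name are my additions.

```python
from typing import List


def binomial_query(taxon_name: str) -> str:
    """Insert a taxon name into the SPARQL query sent to DBpedia."""
    # Escape backslashes and double quotes so that the name cannot
    # break out of the SPARQL string literal.
    escaped = taxon_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dbp: <http://dbpedia.org/property/>\n"
        "select distinct ?species where {\n"
        f'  ?species dbp:binomial "{escaped}"@en\n'
        "}"
    )


def species_links(results: dict) -> List[str]:
    """Pull the ?species URIs out of a SPARQL JSON results document."""
    bindings = results["results"]["bindings"]
    return [b["species"]["value"] for b in bindings if "species" in b]


print(binomial_query("Macropus giganteus"))
```

An empty list from species_links corresponds to the “no” answer; otherwise each entry is a DBpedia URI that can be republished as an external link.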

Coming up…

My next post in the series will look at how the Museum’s data relates to the CIDOC CRM model: where it matches neatly, and where it is either more or less specific than the CRM.
