Proxying: a trick to easily add features to existing websites and applications

At the start of last month I attended the LODLAM (Linked Open Data in Libraries, Archives and Museums) Summit in Sydney, in the lovely Mitchell Library of the State Library of New South Wales.

The Summit is run as an “un-conference”: there is no pre-defined agenda, and the participants themselves set the programme at the start of the day. That makes for a very participatory event; your brain is in top gear the whole time, and everything is so interesting that you end up feeling a bit stunned by the end of the day.

One of the features of the Summit was a series of very brief talks (“speedos”) on a variety of topics. At the last minute I decided I’d contribute a quick rant on a particular hobby-horse of mine: the value of using proxies to build web applications, Linked Open Data, and so on. It’s a bit of a technical point, and was perhaps lost on some of the attendees, but I got good feedback from a few people, which I think may translate into some paid work, so it was worth the effort for that at least.

One of the problems many institutions have with Linked Open Data is that they are stuck with legacy software systems for collection management and web publication: systems that don’t know about Linked Open Data, can’t publish LOD, and couldn’t make use of LOD even if they could publish it. Generally these institutions aren’t in a position to modify their software systems. Either the systems are commercial and closed-source, or, if they are open source, the institutions don’t have the technical expertise in house to modify them, and don’t want to bear the cost of maintaining a customised version of the software or of leading an open source community through a major development. All these issues make for a serious road-block in the way of institutions taking their first steps into practical LODLAM, and this is a shame, because there is a much simpler way to implement LODLAM, through a special magic trick known only to software developers: the use of a proxy.

A “proxy” is a piece of software that lives in the cracks between other pieces of software, and acts as an intermediary between them. When a web browser makes a request to a web server, that request, and the response the web server makes, can and often do pass through one or more web proxies, without the user being any the wiser.
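
To make that concrete, here is a minimal sketch of a pass-through web proxy, using only Python's standard library. It is an illustration rather than anything production-ready: it only handles GET requests, and the name `make_proxy_handler` and the origin address in the usage note are my own assumptions, not part of any particular system.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler

# Talk to the origin server directly, ignoring any proxy settings
# configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def make_proxy_handler(origin):
    """Build a request handler that relays GET requests to `origin`.

    `origin` is the base URL of the real web server, e.g. a
    hypothetical "http://localhost:8000".
    """

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                # Ask the real server for the same path the browser asked for.
                with _opener.open(origin + self.path) as upstream:
                    body = upstream.read()
                    content_type = upstream.headers.get(
                        "Content-Type", "application/octet-stream"
                    )
            except urllib.error.HTTPError as error:
                # Relay the origin's error status (404, 500, ...) to the browser.
                self.send_error(error.code)
                return
            # Hand the origin's response back to the browser.
            self.send_response(200)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return ProxyHandler
```

To try it, you could run `ThreadingHTTPServer(("127.0.0.1", 8080), make_proxy_handler("http://localhost:8000")).serve_forever()` (with `ThreadingHTTPServer` imported from `http.server`, and both addresses being placeholders), then point a browser at port 8080. The browser sees the same pages as before, and has no way of telling that a proxy is in the middle.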

Why would you do this? There are a lot of reasons. For example, many websites position a proxy in front of their real website to improve performance. The proxy keeps a cache of popular web pages, images, etc., that it has received from the web server, and when it receives a request from a web browser for a resource it already has in its cache, it can return that resource directly, without asking the web server again. This takes some of the load off the web server, and makes the website appear more responsive to the end user.
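
The caching idea can be sketched in a few lines of Python. This is a deliberately simplified illustration: the class name `CachingProxy` and the `fetch_from_origin` function are hypothetical, and a real caching proxy would also honour expiry times and HTTP caching headers such as `Cache-Control`, which this sketch ignores.

```python
from typing import Callable, Dict


class CachingProxy:
    """Sits in front of an origin server and remembers responses it has seen."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # `fetch_from_origin` stands in for a real request to the web server.
        self.fetch_from_origin = fetch_from_origin
        self.cache: Dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        # Only bother the origin server the first time a path is requested.
        if path not in self.cache:
            self.cache[path] = self.fetch_from_origin(path)
        # Every later request for the same path is answered from the cache.
        return self.cache[path]
```

If the same page is requested a hundred times, `fetch_from_origin` is called once and the other ninety-nine requests never reach the web server at all.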

But proxies can do many other things. In my speedo talk I mentioned a project of mine in which a proxy converts a couple of web APIs into a metadata harvesting protocol, but this is just one application of a general idea, which is to transform the data received from a web server into a different format. In particular, one transformation would be to take a web page produced by a CMS, and transform the content into a form with embedded Linked Data; another would be to transform the page into one with embedded JavaScript code, which in turn requests Linked Data to enhance the page in arbitrary ways: adding timelines, graphics, links, and so on.
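
As a rough illustration of the first kind of transformation, the Python sketch below takes the HTML of a page and inserts a block of Linked Data, encoded as JSON-LD, just before the closing `</head>` tag. The function name `embed_json_ld` is my own invention, and working out what data to embed for a given page (the hard part in practice) is left out entirely.

```python
import json
import re


def embed_json_ld(html: str, data: dict) -> str:
    """Return `html` with `data` embedded as a JSON-LD script in the head."""
    # Escape "</" so the data can never close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    script = '<script type="application/ld+json">\n' + payload + "\n</script>\n"
    # Find the end of the page's head, whatever its capitalisation.
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        # No head element to attach to: pass the page through unchanged.
        return html
    return html[: match.start()] + script + html[match.start():]
```

A proxy would call something like this on each page as it passes through, for example `embed_json_ld(page, {"@context": "https://schema.org", "@type": "CreativeWork", "name": "Vase"})`, where the record shown is a made-up example. The CMS that produced the page never needs to know it happened.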

What makes the proxy pattern such a powerful technique is that it doesn’t require making changes to your underlying system; you can develop a proxy entirely independently of the system it sits in front of. You can experiment and develop without worrying that you are breaking anything. You don’t even need to own or control that system, or to understand how it works; you only need to be able to interpret the web pages it produces.

When your proxy is working to your satisfaction, you can move it so that it sits between your website and the rest of the internet, so that all requests to your website go through the proxy. But if you ever decide to return your website to its old behaviour, you just take the proxy away, and browsers will once again be communicating directly with your web server.

If you’re interested in trying out this technique, feel free to contact me; you may be surprised at how easily you can add LOD-based functionality to your legacy website.
