One of the most interesting things I’ve been doing recently has been working with Nick Thieberger, a linguist at Melbourne University, on “Digital Daisy Bates”, a Digital Humanities project based on the pioneering lexicographical research of Daisy Bates. In the early 20th century, Bates spent many years living in outback Australia, researching the languages and cultures of Indigenous Australians, and produced dozens of lexicons of Indigenous languages, all using a common questionnaire. In our project, we are using modern digital methods to analyze all these lexicons. The first step of the project was to digitize and transcribe the questionnaires, so that we can crunch the digital data and extract knowledge from it.
I will post more on the project later, but in this post I will restrict myself to describing some of the mechanics of the first stage of that data crunching.