Comments on: How to download bulk newspaper articles from Papers Past
http://conaltuohy.com/blog/how-to-download-bulk-newspaper-articles-from-papers-past/
The blog of a digital humanities software developer
Fri, 10 Feb 2017 14:41:44 +0000

By: Emerson Vandy
http://conaltuohy.com/blog/how-to-download-bulk-newspaper-articles-from-papers-past/#comment-4537
Mon, 22 Sep 2014 21:58:02 +0000

Hi Paul + Conal. We have tested the OAI capability of our instance of Veridian. We’re keen to expose data, but we need to do it well because there’s interplay between a few things across different domains. These range from fundamental partnership restrictions on how we can make some content available, to answering system overhead questions, to getting organisational buy-in to the premise that this is a thing we need to do. Clearly, none of these are insurmountable. However, there’s also the fact that we’re a relatively small group and we need to be a bit mindful of how we resource this. For now, we’re happy with having data services delivered at the current level by DigitalNZ, but that’s not to suggest that we don’t want to extend our data services in the future. We think XML APIs are good things, and I can tell you that we are also investigating options for making datasets available for download.

By: Conal
http://conaltuohy.com/blog/how-to-download-bulk-newspaper-articles-from-papers-past/#comment-4450
Sun, 21 Sep 2014 12:46:49 +0000

Good questions, Paul! It obviously would be great to be able to harvest the native data format (METS/ALTO, I believe) via an official OAI-PMH service! What we have at the moment is only the Papers Past text after it has been digested by DigitalNZ. The text made available through DigitalNZ is normalised to remove punctuation, capitalisation, etc., because it is intended to serve as the input to a search index. Similarly, the terms and conditions are generic DigitalNZ terms and conditions – not specific to Papers Past.
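
To illustrate what "normalised to remove punctuation and capitalisation" can mean in practice, here is a minimal Python sketch. It is not DigitalNZ's actual pipeline, which is not described in this thread; the function name `normalise_for_index` and the exact rules (lowercase, drop ASCII punctuation, collapse whitespace) are assumptions made purely for illustration.

```python
import re
import string


def normalise_for_index(text: str) -> str:
    """Hypothetical search-index normalisation (illustrative only).

    Lowercases the text, removes ASCII punctuation, and collapses runs
    of whitespace. Non-ASCII punctuation such as curly quotes is left
    untouched by this simple version.
    """
    lowered = text.lower()
    # Delete every character listed in string.punctuation.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Collapse any run of whitespace into a single space.
    return re.sub(r"\s+", " ", no_punct).strip()


if __name__ == "__main__":
    print(normalise_for_index("The Mayor's Speech,   at Wellington!"))
```

Text flattened this way works well as search input but loses detail (sentence boundaries, proper-noun capitalisation) that the native format would preserve, which is the limitation this comment is pointing at.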

By: Paul
http://conaltuohy.com/blog/how-to-download-bulk-newspaper-articles-from-papers-past/#comment-4448
Sun, 21 Sep 2014 11:25:14 +0000

Great work – but why does the Papers Past service not have a native OAI-PMH service enabled? I understand it is a native part of the Veridian platform, and no doubt it will be a key feature of any new Papers Past service, along with text correction. And why is the Papers Past data not available for open download instead of under restrictive DNZ rules? It's almost all out of copyright, so why restrict?

By: Andy Neale
http://conaltuohy.com/blog/how-to-download-bulk-newspaper-articles-from-papers-past/#comment-4351
Fri, 19 Sep 2014 00:45:42 +0000

It’s great to see this being explored! I thought it also important to note that the DigitalNZ API was not designed to be a front-end to an OAI-PMH service, so apologies that we don’t have all the features to make this work as well as you would have liked. We do hope to provide download dump services in the future. In the meantime, it is also important to understand that the terms of use for the DigitalNZ API do not allow users to keep the data permanently. The data must be refreshed every 30 days so as to support our license agreement with partners. Together with the query limit of 10,000 queries a day, this means that you will be able to use the API to maintain a subset of the data. If you have any questions about using the DigitalNZ API, take a look at http://www.digitalnz.org/developers
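
The two limits in this comment (10,000 queries a day, refresh every 30 days) put a ceiling on how much data a harvester can keep current. The Python sketch below works through that arithmetic. The number of records returned per query is not stated in the thread, so `records_per_query` is a hypothetical parameter, and the function name is illustrative.

```python
QUERIES_PER_DAY = 10_000   # daily query limit stated in the comment
REFRESH_WINDOW_DAYS = 30   # data must be refreshed at least this often


def max_maintainable_records(records_per_query: int,
                             queries_per_day: int = QUERIES_PER_DAY,
                             refresh_window_days: int = REFRESH_WINDOW_DAYS) -> int:
    """Upper bound on records that can be re-fetched within one refresh window.

    records_per_query is an assumption: how many records a single API
    query returns. The result is a ceiling only; it ignores failed
    requests and any queries spent on discovery rather than refreshing.
    """
    if records_per_query < 1:
        raise ValueError("records_per_query must be at least 1")
    return queries_per_day * refresh_window_days * records_per_query


if __name__ == "__main__":
    # One record per query: 10,000 * 30 = 300,000 records per window.
    print(max_maintainable_records(1))
    # A hypothetical page size of 100 records per query.
    print(max_maintainable_records(100))
```

At one record per query the ceiling is 300,000 records. A larger page size raises the ceiling proportionally, but any collection bigger than it cannot be kept fully refreshed under these terms, hence "a subset of the data".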
