I have deployed a publicly available service to provide access in bulk to newspaper articles from Papers Past — the National Library of New Zealand’s online collection of historical newspapers — via the DigitalNZ API.
The service provides bulk access to newspaper articles (up to a maximum of 5000 articles) through standard OAI-PMH harvesting software. To access the collection, point your OAI-PMH harvester at the repository with this URI:
If you’re looking for a good harvester, let me recommend jOAI.
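jOAI or any other harvester takes care of the protocol details, but it may help to see what a harvest involves: a series of ListRecords requests, each continuing from the resumptionToken in the previous response until the token comes back empty. The Python sketch below uses only the standard library. BASE_URL is a hypothetical placeholder for the repository URI, and the function names are illustrative rather than part of the service.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical placeholder: substitute the repository URI for this service.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix, set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH 2.0, resumptionToken is an exclusive argument:
        # it is sent on its own, without metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch(url):
    """Download one OAI-PMH response as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix, set_spec=None, fetch_fn=fetch):
    """Yield every <record> element, following resumption tokens."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, set_spec, token)
        root = ET.fromstring(fetch_fn(url))
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:
            # No ListRecords element, e.g. an <error code="noRecordsMatch"> response.
            return
        yield from list_records.findall(OAI_NS + "record")
        token_el = list_records.find(OAI_NS + "resumptionToken")
        token = token_el.text if token_el is not None else None
        if not token:
            # An absent or empty resumptionToken marks the last page.
            return
```

For example, `for record in harvest(BASE_URL, "html"): ...` would walk through every page of results. The `fetch_fn` parameter exists only so the paging logic can be exercised without a network connection.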
You can harvest records that match a search. Provide your search query as an OAI-PMH set: for example, to search for “titokowaru”, specify search:titokowaru as the value of the OAI-PMH set parameter.
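As a minimal illustration of where the search goes in a request, the Python sketch below builds a ListRecords URL whose set parameter carries the search: prefix. The base URL is a hypothetical placeholder, the function names are made up for this example, and defaulting to the html format is just a choice for the sketch.

```python
import urllib.parse


def search_set(query):
    """Turn a search query into the set value this service expects."""
    return "search:" + query


def search_request_url(base_url, query, metadata_prefix="html"):
    """Build a ListRecords URL restricted to articles matching `query`."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": search_set(query),
    }
    # urlencode percent-encodes the colon, so the value arrives intact.
    return base_url + "?" + urllib.parse.urlencode(params)


# Hypothetical base URL; substitute the repository URI for this service.
print(search_request_url("https://example.org/oai", "titokowaru"))
# https://example.org/oai?verb=ListRecords&metadataPrefix=html&set=search%3Atitokowaru
```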
You can harvest records (i.e. articles) in one of three formats, selected with the OAI-PMH metadataPrefix parameter:
html: this format returns the full text of the articles, and is likely to be the most useful format. Note that the text available through DigitalNZ has had punctuation and capitalization removed.
oai_dc: a simple metadata record.
digitalnz: a format based directly on DigitalNZ’s own metadata format.
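With the oai_dc format, each record's metadata is an oai_dc:dc element holding Dublin Core fields. The Python sketch below, which assumes a record has already been fetched and parsed with ElementTree, collects those fields into a dictionary. Which Dublin Core fields DigitalNZ actually populates is not stated here, so the code keeps whatever it finds. The html and digitalnz formats use different structures that this sketch does not attempt to parse, and `dc_fields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def dc_fields(record):
    """Collect Dublin Core fields from one oai_dc <record> as {name: [values]}."""
    fields = {}
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    if dc is None:
        # A deleted record has a header but no metadata.
        return fields
    for element in dc:
        # Strip the "{namespace}" prefix to get e.g. "title" or "date".
        name = element.tag.rsplit("}", 1)[-1]
        # Repeatable fields (e.g. several dc:subject values) accumulate in a list.
        fields.setdefault(name, []).append((element.text or "").strip())
    return fields
```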