Public OAI-PMH repository for Papers Past

Conal Tuohy's blog, Mon, 25 May 2015
http://conaltuohy.com/blog/public-oai-pmh-repository-for-papers-past/

I have deployed a publicly available service that provides bulk access to newspaper articles from Papers Past (the National Library of New Zealand’s online collection of historical newspapers) via the DigitalNZ API.

The service lets you harvest articles in bulk (up to a maximum of 5000 articles) using OAI-PMH harvesting software. To gain access to the collection, point your OAI-PMH harvester at the repository, using this base URI:

https://papers-past-oai-pmh.herokuapp.com/

If you’re looking for a good harvester, let me recommend jOAI.

Searching

You can harvest records that match a search. Provide your search query as an OAI-PMH set. For example, to search for “titokowaru”, specify search:titokowaru as the value of the OAI-PMH set parameter:

https://papers-past-oai-pmh.herokuapp.com/?verb=ListRecords&metadataPrefix=oai_dc&set=search:titokowaru
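
If you want to generate such request URLs from a script, the following is a minimal sketch in Python. The function name list_records_url is made up for illustration. How the service treats multi-word queries is not described here, so the encoding of spaces as "+" is an assumption.

```python
from urllib.parse import urlencode

# Base URI of the repository, as given in this post.
BASE_URL = "https://papers-past-oai-pmh.herokuapp.com/"


def list_records_url(query, metadata_prefix="oai_dc"):
    """Build a ListRecords URL whose OAI-PMH set carries a search query.

    list_records_url is a hypothetical helper, not part of the service.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": "search:" + query,
    }
    # safe=":" leaves the colon in "search:..." unescaped, matching the
    # example URL above. Spaces are encoded as "+" (an assumption about
    # what the service accepts).
    return BASE_URL + "?" + urlencode(params, safe=":")


print(list_records_url("titokowaru"))
```

Passing metadata_prefix="html" should request the same search results in the full-text format described below.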

Formats available

You can harvest records (i.e. articles) in one of three different formats:

  • html — this format returns the full text of the articles, and is likely to be the most useful format. Note that the text available through DigitalNZ has had punctuation and capitalization removed.
  • oai_dc — a simple metadata record.
  • digitalnz — straightforwardly based on DigitalNZ’s own metadata format.

To see the list of formats the repository offers, use the OAI-PMH ListMetadataFormats verb:

https://papers-past-oai-pmh.herokuapp.com/?verb=ListMetadataFormats
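
A script can read the format names out of a ListMetadataFormats response. Here is a rough sketch using only the Python standard library. The function names are hypothetical, and SAMPLE is a trimmed, hand-written response in the shape the OAI-PMH 2.0 specification describes, not output captured from the service.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
FORMATS_URL = "https://papers-past-oai-pmh.herokuapp.com/?verb=ListMetadataFormats"

# Hand-written illustration of a response (trimmed), NOT real server output.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>html</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>digitalnz</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def parse_metadata_prefixes(xml_text):
    """Return the metadataPrefix values found in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [element.text for element in root.findall(path, OAI_NS)]


def fetch_metadata_prefixes(url=FORMATS_URL):
    """Download the response from the repository and parse it (needs network access)."""
    with urlopen(url) as response:
        return parse_metadata_prefixes(response.read())


print(parse_metadata_prefixes(SAMPLE))
```

Calling fetch_metadata_prefixes() runs the same parsing against the live repository, assuming it is reachable.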

Happy harvesting!
