Trove Newspaper & Gazette Harvester updated to version 0.3.3

At the request of a user, I’ve updated the Newspaper & Gazette Harvester to include the snippet field in the results (where available). The ‘snippet’ is the little text preview you see in search results. Note, however, that some categories of article, such as ‘Advertising’, don’t have snippets.

If you haven’t used the Harvester before, it makes it easy to download large quantities of digitised articles from Trove’s newspapers and gazettes. Just give it a search from the Trove web interface, and the harvester will save the metadata of all the articles in a CSV (spreadsheet) file for further analysis. You can also save the full text of every article, as well as copies of the articles as JPG images, and even PDFs. While the web interface will only show you the first 2,000 results matching your search, the Newspaper & Gazette Harvester will get everything.
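Once you have a CSV file, you might want to check how many harvested articles came back without a snippet. Here's a minimal sketch using only Python's standard library. The column names (`article_id`, `category`, `snippet`) are my assumptions for the purposes of illustration, so check them against the header row of your own file, and note that the sample rows are made up rather than real harvest results.

```python
import csv
import io
from collections import Counter


def count_missing_snippets(csv_file):
    """Count, per category, the rows whose 'snippet' column is empty.

    Assumes the CSV has 'category' and 'snippet' columns; adjust the
    names to match the header row of your own harvest.
    """
    missing = Counter()
    for row in csv.DictReader(csv_file):
        # Treat a missing column or a whitespace-only value as "no snippet".
        if not (row.get("snippet") or "").strip():
            missing[row.get("category", "unknown")] += 1
    return missing


# Made-up sample rows standing in for a harvested CSV file.
sample = io.StringIO(
    "article_id,category,snippet\n"
    "1,Article,A short preview of the text...\n"
    "2,Advertising,\n"
    "3,Advertising,\n"
)
missing = count_missing_snippets(sample)
print(missing)
```

To run this against a real harvest, replace `sample` with an open file, for example `open("results.csv", newline="", encoding="utf-8")`, where the filename is a placeholder for wherever your CSV was saved.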

To give it a spin, head to the Trove Harvester section of the GLAM Workbench where you can use it in your browser without installing any software.

Screen recording of Trove Newspaper Harvester

The harvester code is available on GitHub and PyPI.
