NSW State Archive Record Series without Index

I want to download the metadata for record series:
NRS-15318 | Files relating to licences for theatres and public halls

There is a link to the 7222 records, which are displayed as a list of up to 50 per page.

I would really like to download the complete list as a CSV of all available metadata. Is it possible to generate an index?

I am a newbie to all this, so any advice is appreciated.



It’s not an easy one. The Primo system that NSWSA now uses is not very friendly to scraping data. You can get data for individual items as XML once you know their ID, but I can’t figure out how to get a list of results. The interface uses a JavaScript framework to load the search results, so the results aren’t ‘in’ the HTML of the page. It’s so annoying (the new Trove interface does this too…).

I think the only way around this would be to use something like Selenium, which mimics a web browser and would allow the JavaScript-rendered results to load, or to leverage Zotero’s Primo translator – that might at least get 50 results at a time…
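
For what it’s worth, here is a rough sketch of what the Selenium route might look like in Python. It is untested against the NSWSA site and rests on a few assumptions: that the search URL accepts an `offset` query parameter for paging, and that each result sits in an element matching the placeholder CSS selector `.list-item`. Both would need to be checked by inspecting the live page in a browser. The function names `page_offsets` and `scrape_series` are made up for illustration.

```python
import csv
import math


def page_offsets(total, per_page=50):
    """Return the starting offset of every results page."""
    return [page * per_page for page in range(math.ceil(total / per_page))]


def scrape_series(search_url, total, out_path, per_page=50):
    """Load each results page in a real browser and save the visible text."""
    # Selenium is imported here so page_offsets() works without it installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # ASSUMPTION: placeholder selector, not taken from the real site.
    item_selector = ".list-item"

    driver = webdriver.Firefox()
    try:
        with open(out_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["offset", "text"])
            for offset in page_offsets(total, per_page):
                # ASSUMPTION: paging is controlled by an `offset` parameter.
                driver.get(f"{search_url}&offset={offset}")
                # Wait until the JavaScript has put results into the page.
                WebDriverWait(driver, 30).until(
                    EC.presence_of_all_elements_located(
                        (By.CSS_SELECTOR, item_selector)
                    )
                )
                for item in driver.find_elements(By.CSS_SELECTOR, item_selector):
                    writer.writerow([offset, item.text])
    finally:
        driver.quit()
```

For 7222 records at 50 per page, `page_offsets(7222)` gives 145 pages to visit. The output is only the raw text of each result, so it would still need cleaning up afterwards.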

I’ll keep thinking about it!

Thanks Tim, glad to know it’s a hard one for you too. If it had been easy I would have kicked myself that I didn’t work it out. Still frustrating. Oh well, I’ll just try downloading 50 at a time and clean as I go.
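
If the 50-at-a-time batches end up saved as separate CSV files, stitching them together afterwards can be scripted. A minimal sketch, assuming the batches share a column that identifies each record (called `id` here, which is a guess at the column name) and follow a file naming pattern such as `batch_*.csv`. The function name `merge_batches` is hypothetical.

```python
import csv
import glob


def merge_batches(pattern, out_path, key="id"):
    """Combine CSV batches into one file, skipping rows whose key repeats."""
    seen = set()
    fieldnames = []
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Collect column names in the order they first appear.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                # Trim stray whitespace; ignore cells beyond the header row.
                cleaned = {
                    k: (v or "").strip() for k, v in row.items() if k is not None
                }
                if cleaned.get(key) in seen:
                    continue
                seen.add(cleaned.get(key))
                rows.append(cleaned)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `merge_batches("batches/batch_*.csv", "NRS-15318.csv")` would write one combined file and return the number of unique rows. Keep the output file outside the folder matched by the pattern so it is not picked up as a batch on a later run.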