HoR and Senate debates

I found historic Hansard resources going up to 1980 in the GLAM Workbench. I’m trying to gather parliamentary debates of a more recent vintage (2002-2003), and am wondering whether anyone has already done the work of processing those.

I assume I can adapt the scraper script to gather the XML files and then parse those, but if I can avoid that step, I’d like to.

Any pointers appreciated!

I checked up on this again recently (because I couldn’t quite remember why I stopped at 1980 in the original harvest!). My memory was that the XML format changed, but when I checked, I found that the raw XML files for 1981 to 1998 are just not available from ParlInfo. I don’t know why this is.

I’ve had a quick look and it seems that there is XML for 2002-3 – yay! I’ve spun up the harvesting notebook and adjusted a few things so that it should work with dates after 1980. If you have a look you’ll see there’s now a cell where you can set the START_YEAR and END_YEAR. That’s the only thing you should need to change. I successfully harvested 2002. Let me know if you have any problems.
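For anyone reading along, here is roughly what setting the range amounts to. This is only a minimal sketch, not the notebook's actual code: START_YEAR and END_YEAR are the settings mentioned above, while the `years_to_harvest` helper and the print statement are hypothetical stand-ins for the real harvesting step.

```python
# Illustrative sketch only. START_YEAR and END_YEAR come from the notebook;
# everything else here is a hypothetical stand-in for the real harvesting code.
START_YEAR = 2002
END_YEAR = 2003


def years_to_harvest(start_year, end_year):
    """Return the inclusive list of years from start_year to end_year."""
    if start_year > end_year:
        raise ValueError("START_YEAR must not be later than END_YEAR")
    return list(range(start_year, end_year + 1))


for year in years_to_harvest(START_YEAR, END_YEAR):
    # The notebook would download that year's XML files at this point.
    print(f"Would harvest Hansard XML for {year}")
```

The range is inclusive at both ends, so 2002 and 2003 covers both years.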

Also, it’s worth noting that Open Australia have harvested Commonwealth Hansard XML from 2006 onwards, and it’s available through their data repository.


Or, if you don’t want to harvest it yourself, give me a few days and I’ll see if I can add 1998-2005 to the repository… :wink:

I’ve now harvested Hansard XML for 1998 to 2005 and added it to the repository. I’ve also confirmed that there’s no XML for 1981 to 1997 available from ParlInfo at present. I’ve asked the folks at the Parliamentary Library why this might be.

I’ve also combined the list of sitting days from 1901 to 2005 into one CSV file.
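If you ever need to do a similar merge yourself, something like the following would work. It is a rough sketch under assumptions, not the code I used: it assumes each input CSV has the same header row, and the `combine_sitting_days` function name and any file names you pass in are hypothetical.

```python
import csv


def combine_sitting_days(input_paths, output_path):
    """Concatenate CSV files that share one header; return the rows written."""
    fieldnames = None
    writer = None
    total = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.DictReader(in_file)
                if reader.fieldnames is None:
                    # Completely empty file: nothing to copy.
                    continue
                if writer is None:
                    # The first non-empty file defines the output columns.
                    fieldnames = reader.fieldnames
                    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
                    writer.writeheader()
                elif reader.fieldnames != fieldnames:
                    raise ValueError(f"{path} has different columns")
                for row in reader:
                    writer.writerow(row)
                    total += 1
    return total
```

Rows are written in the order the input paths are given, so pass the files in chronological order if you want the combined list sorted by date.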

More details of the harvesting process are available from the GLAM Workbench.

Enjoy! :crazy_face:

