Explore Trove's digitised newspapers by place - Tim Sherratt

I’ve updated my map displaying places where Trove digitised newspapers were published or distributed. You can view all the places on a single map – zoom in for more markers, and click on a marker for title details and a link back to Trove.

If you want to find newspapers from a particular area, just click on a location using this map to view the 10 closest titles.

You can view or download the dataset used to construct the map. Place names were extracted from the newspaper titles using the Geoscience Gazetteer.
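
For anyone working with the downloaded dataset rather than the map, a "closest titles" lookup can be sketched roughly as below. This is only an illustration, not the code behind the map: the field names (`title`, `latitude`, `longitude`) and the sample rows are made up, and the great-circle (haversine) distance is just one reasonable choice of distance measure.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest_titles(titles, lat, lon, n=10):
    """Return the n records nearest to (lat, lon), closest first."""
    return sorted(
        titles,
        key=lambda t: haversine_km(lat, lon, t["latitude"], t["longitude"]),
    )[:n]


# Made-up sample rows; the real dataset's columns may be named differently.
titles = [
    {"title": "Example Sydney paper", "latitude": -33.87, "longitude": 151.21},
    {"title": "Example Melbourne paper", "latitude": -37.81, "longitude": 144.96},
    {"title": "Example Canberra paper", "latitude": -35.28, "longitude": 149.13},
]

nearest = closest_titles(titles, -33.9, 151.2, n=2)
```

Sorting the whole list is fine for a dataset of this size; a spatial index would only matter for much larger collections.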


This is a companion discussion topic for the original entry at https://updates.timsherratt.org/2022/09/05/explore-troves-digitised.html

Hi,

I'm trying to run the Trove Newspaper Harvester in Python and getting this error:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\pjthm\AppData\Local\Programs\Python\Python311\Scripts\troveharvester.exe\__main__.py", line 7, in <module>
  File "C:\Users\pjthm\AppData\Local\Programs\Python\Python311\Lib\site-packages\trove_newspaper_harvester\cli.py", line 211, in main
    start_harvest(
  File "C:\Users\pjthm\AppData\Local\Programs\Python\Python311\Lib\site-packages\trove_newspaper_harvester\cli.py", line 80, in start_harvest
    harvester.harvest()
  File "C:\Users\pjthm\AppData\Local\Programs\Python\Python311\Lib\site-packages\trove_newspaper_harvester\core.py", line 193, in harvest
    self.delete_cache()
  File "C:\Users\pjthm\AppData\Local\Programs\Python\Python311\Lib\site-packages\trove_newspaper_harvester\core.py", line 134, in delete_cache
    Path(cache_name).unlink(missing_ok=True)
  File "C:\Users\pjthm\AppData\Local\Programs\Python\Python311\Lib\pathlib.py", line 1147, in unlink
    os.unlink(self)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'data-20231114044135.sqlite'

It's as if the Python script is not relinquishing the file - any ideas?

Hi Peter, not sure if you have solved this or not, but I just had the same issue. I’m guessing the caching process is holding onto the temporary SQLite db, but I have not dug into it to find a proper fix. A quick and dirty hack that got it running for me is to disable the body of the delete_cache(self) method in the core.py file.
It means you will have a stray SQLite db file hanging around after the run, but you can delete that yourself.

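def delete_cache(self):
    # Original body commented out so the locked cache file is simply left behind
    # cache_name = f"{'-'.join(self.harvest_dir.parts)}.sqlite"
    # Path(cache_name).unlink(missing_ok=True)
    pass  # a body is still required, otherwise Python raises an IndentationError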
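
A slightly gentler variant of the same workaround, as a sketch only: rather than skipping the delete entirely, attempt it and carry on if Windows still has the file locked. The helper name `unlink_if_possible` is made up for illustration and is not part of the harvester; it only wraps the same `Path(...).unlink(missing_ok=True)` call shown in the traceback.

```python
from pathlib import Path


def unlink_if_possible(path):
    """Try to delete a file.

    Returns True if the file is gone afterwards (deleted, or never existed),
    and False if the delete was refused with a PermissionError.
    """
    try:
        Path(path).unlink(missing_ok=True)
        return True
    except PermissionError:
        # On Windows this is raised (WinError 32) while another handle
        # still has the file open; leave the file for manual clean-up.
        return False
```

Assuming delete_cache builds cache_name as in the commented-out lines, its last line could then call unlink_if_possible(cache_name) instead of unlinking directly. The harvest would finish either way, and the stray file only remains when it really is locked.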

Hi Peter, sorry, I just noticed this. I’ve added the details to a GitHub issue in the Trove Newspaper Harvester repository: "Permission error when deleting cache" (Issue #7, wragge/trove-newspaper-harvester). It seems to be a Windows thing, as I can’t replicate it myself, but I’ll investigate further.