There are a growing number of non-English newspapers digitised in Trove. However, if you’re only searching using English keywords, you might never know that they’re there. I thought it would be useful to generate a list of non-English newspapers, but it wasn’t quite as straightforward as I expected.
I started by trying to make use of existing language metadata, but this was inconsistent and incomplete. In the end I ran some language detection code over a sample of articles from each digitised newspaper in Trove. After sorting through the results, I ended up with a list of 48 newspapers with non-English content.
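To give a rough idea of the approach, here is a minimal sketch of tallying detected languages across a sample of article texts from one newspaper. It is not the code I actually ran: `detect_language` is a toy stopword-based stand-in for a proper language detection library, and the names `STOPWORDS`, `summarise_newspaper`, and `non_english_share` are made up for illustration. Fetching the article samples from Trove is left out entirely.

```python
from collections import Counter

# Tiny illustrative stopword lists (an assumption for this sketch).
# A real run would swap in a proper language detection library.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "for"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "von", "mit"},
    "it": {"il", "di", "che", "la", "per", "non", "una", "con"},
}


def detect_language(text):
    """Toy detector: return the language whose stopwords occur most often, or None."""
    words = text.lower().split()
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def summarise_newspaper(articles, detect=detect_language):
    """Count the detected language of each article text in a newspaper's sample."""
    counts = Counter()
    for text in articles:
        lang = detect(text)
        if lang is not None:
            counts[lang] += 1
    return counts


def non_english_share(counts):
    """Proportion of sampled articles whose detected language is not English."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - counts.get("en", 0) / total


# Hypothetical sample of article texts from a single newspaper
sample = [
    "Der Hund ist nicht mit der Katze",
    "The cat is in the garden",
    "Die Zeitung und das Buch",
]
counts = summarise_newspaper(sample)
print(counts, non_english_share(counts))
```

Because the detector is passed in as a parameter, the same tallying code works with any function that maps a text to a language code. The results still need sorting through by hand, since OCR errors and short articles can easily confuse a detector.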
The full code and documentation are available in the Trove newspapers section of the GLAM Workbench.