But we are all hampered by the quality of the OCR process. This is not always the fault of the OCR software – the quality of the original image (paper or microfilm), coupled with the quality of the scan itself, can lead to less-than-desirable results.
But there is hope. An emerging capability is being added to the online arsenal: text correction by registered users.
This “crowdsourcing” of fixes to the OCR output can be quite valuable in improving the indexes that newspaper research software searches. I first became aware of it several years ago when I started using the California Digital Newspaper Collection, housed at the University of California, Riverside. Now when I use it, if I see an obvious error in the OCR output, I correct it, “paying it forward” for the next user. The software behind it was created by Veridian Software of New Zealand and is used by several large collections as well as many smaller sites. FYI – Veridian Software is also the creator of Elephind, a site where you can search multiple newspaper collections from around the world at once.
A growing number of sites now offer this crowdsourced text-correction feature, and not all of them are Veridian customers. This promises to help us even more in our newspaper research. I hope more sites begin to offer this capability.
After writing this article, I came across one in which the author, Rose Holley, lists the sites she is aware of that offer this exciting new capability. Rather than copy her list, I offer the link:
Her list is from March 2013, so I suspect that other sites offer this capability by now. Thanks to Rose, a digital library specialist from Australia, for all she has done to promote this great addition to our research toolkit. She is a pioneer in using crowdsourcing for libraries and archives.