If you’ve read any prior articles or tips on online newspaper research, then you know that the quality of the scanned newspaper, together with the OCR (optical character recognition) process, determines the quality of the search index. Remember that you are searching for combinations of letters, not words. When you use those letter combinations as your search criteria, you are essentially trying to match them against that index.
Here’s a news flash: the search index, especially for older newspapers, may not be very good.
Here’s an example from a 115-year-old newspaper, showing the original article first, followed by its search index:
Some words are missing, and several are “misspelled.” It would be charitable to say that the search index represents even 50% of the original correctly.
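To see why a garbled index defeats a simple search, here is a minimal Python sketch. The OCR line is invented for illustration (it is not the example above), and the “fuzzy” comparison uses Python’s standard `difflib` module; it is not how any particular newspaper site works, only a way to show that an exact letter-for-letter match misses a misread word that a looser comparison would catch.

```python
import difflib

# Hypothetical OCR output: "Tbompson", "Jobn", and "rnarried" are invented
# misreadings that imitate common OCR errors (h read as b, m read as rn).
ocr_text = "Mr. Jobn Tbompson of this city was rnarried on Tuesday"


def exact_hits(text, term):
    """Return words in text that match term exactly (ignoring case and . ,)."""
    return [w for w in text.split() if w.lower().strip(".,") == term.lower()]


def fuzzy_hits(text, term, cutoff=0.75):
    """Return words whose letter sequence is similar enough to term."""
    words = [w.strip(".,") for w in text.split()]
    return difflib.get_close_matches(term, words, n=5, cutoff=cutoff)


print(exact_hits(ocr_text, "Thompson"))  # []
print(fuzzy_hits(ocr_text, "Thompson"))  # ['Tbompson']
```

The exact search finds nothing even though the name is in the article, because the index holds “Tbompson,” not “Thompson.”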
What many newspaper researchers forget, and the reason they get frustrated, is that they assume they are searching an index that is an EXACT (or near-exact) replica of the original newspaper article. Many researchers quit or get discouraged searching newspapers for three main reasons:
They believe the search index is an exact or near-exact replica of the original (say, with 90% of the words represented correctly). This belief is so fixed in their minds that when the index contains only 50% or less of the original’s exact words, they just can’t handle the difference. It becomes “too hard for them to deal with.”
Their search criteria are too simple, such as just a person’s surname. A common surname may return too many results, while a more complex surname may return few or none. They don’t add a date range, a first name, or other distinguishing words that would narrow the search. In this case, a lack of training, or a lack of desire to learn, leads to failure.
They can’t find what they are looking for because the newspaper collection doesn’t cover the dates of the event or the person’s life. For example, if you are looking for an event, or an event in a person’s life, that happened between 1915 and 1935, and the online collection has no newspapers from that date range, guess what the search results will be? Few, if any.
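The second reason, search criteria that are too simple, can be illustrated with a small Python sketch. The search results below are entirely hypothetical, and the `narrow` function is an illustrative helper, not a feature of any newspaper website; it simply shows how adding a first name and a date range to a surname cuts a list of hits down to the ones that matter.

```python
from datetime import date

# Hypothetical search results for the surname "Smith"; every record is invented.
results = [
    {"text": "John Smith elected alderman", "date": date(1902, 4, 8)},
    {"text": "Mary Smith visits relatives", "date": date(1908, 7, 15)},
    {"text": "John Smith and family move west", "date": date(1931, 3, 2)},
    {"text": "Smith & Sons hardware sale", "date": date(1905, 11, 20)},
]


def narrow(records, surname, first_name=None, start=None, end=None):
    """Keep records that mention surname and, if given, first_name,
    and whose date falls within [start, end]."""
    hits = []
    for r in records:
        text = r["text"].lower()
        if surname.lower() not in text:
            continue
        if first_name and first_name.lower() not in text:
            continue
        if start and r["date"] < start:
            continue
        if end and r["date"] > end:
            continue
        hits.append(r)
    return hits


print(len(narrow(results, "Smith")))  # 4
print([r["text"] for r in narrow(results, "Smith", "John",
                                 date(1900, 1, 1), date(1910, 12, 31))])
# ['John Smith elected alderman']
```

A surname alone returns every “Smith” in the collection; the added first name and date range leave only the one article about the right person.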
Let’s look at this from an emotional or attitudinal perspective. Some of these emotions or attitudes are as follows:
- This stuff is in a database, so it should be easy to find.
- Why should I spend time learning how to search newspapers? It can’t be that hard.
- Everything is online, isn’t it? So how come I can’t find anything?
- Why can’t I just put the name of the person or event in the box and have the system give me the results I want?