If you’ve read any prior articles or tips on online newspaper research, then you know that the quality of the scanned newspaper and of the OCR (optical character recognition) process dictates the quality of the search index. You need to realize that you are searching for combinations of letters, not words. When you use those letter combinations as your search criteria, you are essentially trying to match them against that index.
Here’s a news flash – the search index for older newspapers, especially, may not be very good.
Here’s an example of a 115-year-old newspaper, with the original first, followed by the search index:
There are words missing, and several of the words are “misspelled.” It would be charitable to rate the search index as a “50% correct” representation of the original.
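To make the “50% correct” idea concrete, here is a minimal sketch that measures how many words of an original sentence survive, spelled identically, in its OCR text. The sample sentence and its garbled version are invented for illustration (they are not the newspaper shown above), and `word_accuracy` is a hypothetical helper, not part of any newspaper site’s software.

```python
def word_accuracy(original: str, ocr_text: str) -> float:
    """Share of the original's words that appear, spelled identically, in the OCR text."""
    original_words = original.lower().split()
    ocr_words = set(ocr_text.lower().split())
    if not original_words:
        return 0.0
    hits = sum(1 for word in original_words if word in ocr_words)
    return hits / len(original_words)


# Hypothetical sample: what the page says versus what the OCR "read".
original = "mr john schneider died at his home on tuesday"
ocr_text = "mr jobn scbneider dicd at his borne on tuesday"

# 5 of the 9 original words survive intact, so roughly 0.56.
print(round(word_accuracy(original, ocr_text), 2))

# An exact search for the surname finds nothing, even though the article is there.
print("schneider" in ocr_text)
```

In this made-up case the surname is one of the damaged words, so a search on the surname alone would miss the article entirely.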
What many newspaper researchers forget, and why they get frustrated, is that they assume they are searching an index that is an EXACT (or near-exact) replica of the original newspaper article. Many researchers quit or get discouraged searching newspapers for three main reasons:
- The belief that the search index is an exact or near-exact replica (say, 90% of words represented correctly) is so firmly fixed in their minds that when the index captures only 50% or less of the words in the original, they can’t handle the difference. It just becomes “too hard for them to deal with.”
- The search criteria they create are too simple, such as just a person’s surname. They may get too many results for a common surname, or too few or none for a more complex one. And they don’t add a date range, a first name, or other distinguishing words that would help. In this case, a lack of training or a lack of desire to learn results in failure.
- What they are looking for is not there, because the newspaper collection does not cover the dates of the event or the person’s life. For example, if you are looking for an event that happened between, say, 1915 and 1935, and the online collection has no newspapers for that date range, guess what the search results will be? Few, if any.
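The second and third reasons can be sketched in a few lines of code. This is only an illustration under stated assumptions: the `Article` records, the `search` function, and the coverage years are all hypothetical, and real newspaper sites work against a far larger index with their own search forms.

```python
from dataclasses import dataclass


@dataclass
class Article:
    year: int
    text: str


def collection_covers(coverage: tuple[int, int], wanted: tuple[int, int]) -> bool:
    """True if the collection's year range overlaps the years you care about."""
    return coverage[0] <= wanted[1] and wanted[0] <= coverage[1]


def search(articles: list[Article], terms: list[str], years: tuple[int, int]) -> list[Article]:
    """Return articles inside the year range that contain every search term."""
    start, end = years
    return [
        article
        for article in articles
        if start <= article.year <= end
        and all(term.lower() in article.text.lower() for term in terms)
    ]


# Hypothetical mini-collection, invented for illustration.
articles = [
    Article(1921, "john smith elected mayor"),
    Article(1921, "smith and sons hardware sale"),
    Article(1948, "mary smith wins prize"),
]

# Surname only, wide date range: every "smith" comes back.
print(len(search(articles, ["smith"], (1900, 1950))))

# Surname plus first name plus a date range: only the relevant article.
print(len(search(articles, ["smith", "john"], (1915, 1935))))

# A collection covering 1940-1960 cannot contain a 1915-1935 event.
print(collection_covers((1940, 1960), (1915, 1935)))
```

Checking the collection’s coverage before searching saves time: if the years do not overlap, no search criteria, however clever, will return the article.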
Let’s look at this from an emotional or attitudinal perspective. Some of these emotions or attitudes are as follows:
- This stuff is in a database so it should be easy to find.
- Why should I spend time learning how to search newspapers – it can’t be that hard.
- Everything is online, isn’t it? So how come I can’t find anything?
- Why can’t I just put the name of the person or event in the box and have the system give me the results I want?
Here’s the deal. You may not like what I am going to say, but here goes:
If you can’t find what you are looking for, then ask yourself these questions: “Have I really tried to learn about successful search techniques, or am I just winging it? Do I have unrealistic expectations of the software vendor? Am I searching really old newspapers where the quality is most assuredly sub-optimal? Have I really tried to overcome the likely less-than-optimal search index by trying the many search tips that are available for me to learn about?”
You see, the problem is more than likely ATTITUDE. To be successful as a newspaper researcher, you must be DETERMINED. You must LEARN and apply what you have learned to create better search criteria.
You know why this ATTITUDE is important? Because more than likely the SEARCH INDEX is sub-optimal, meaning it likely does not match your expectations. And that, more than likely, is NOT the software vendor’s fault.
So, what do you do? Get determined to OUTSMART the search index. Read about different search techniques and tips and view tutorials. Apply what you have learned with vigor. Then and only then will you have as much success as you can get when searching old newspapers.
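One way to outsmart the index is to search for the misspellings that OCR is likely to have produced, not just the correct spelling. The sketch below generates such variants from a small table of look-alike letters. The table is an assumption chosen for illustration (pairs such as “h”/“b” and “m”/“rn” are commonly confused in scanned print), and `ocr_variants` is a hypothetical helper, not a feature of any particular newspaper site.

```python
from itertools import product

# Assumed look-alike substitutions; illustrative, not exhaustive.
CONFUSIONS = {
    "h": ["h", "b"],
    "e": ["e", "c"],
    "m": ["m", "rn"],
    "l": ["l", "1"],
}


def ocr_variants(word: str, limit: int = 50) -> list[str]:
    """Return the word plus plausible OCR misreadings, capped at `limit` entries."""
    options = [CONFUSIONS.get(ch, [ch]) for ch in word.lower()]
    variants = []
    for combo in product(*options):
        variants.append("".join(combo))
        if len(variants) >= limit:
            break
    return variants


# The correct spelling comes first, followed by the misreadings to try.
for variant in ocr_variants("home"):
    print(variant)
```

Each variant is one more set of search criteria to try by hand when the correct spelling returns nothing. The `limit` parameter keeps longer words from producing an unmanageable list.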
Researching old newspapers online is a battle of wits. Positive results are found when you outwit the index.