Site Search Strategies
If you have spent much time using search engines on the Internet, you have probably run searches that return thousands, maybe tens or even hundreds of thousands, of results. You may have heard the Internet called the world's largest library, but when a single search returns thousands and thousands of results, you know you're not at the library!
And perhaps you've thought there must be a better way to look for information. Well, the good news is there is a better way, at least some of the time.
In these pages and continuing in the coming weeks, I'll talk about how to find information on the Internet. A major part of this discussion will revolve around search engines: how they work, what's wrong with them and some tips and pointers for using them more effectively. First, a brief introduction to search engines.
More than you ever wanted to know about search engines: Going to the library was never like this
Why is it important to understand search engines? Because unless you already know where to look for information on a certain topic, you have to rely on search engines.
What's wrong with search engines? How much time do you have? Just kidding! Actually I'm not kidding. We don't have enough time or space here to really answer that question. To say that search engines are flawed is to state the obvious. It's like saying that Chicago has the best pizza in the world; it is simply stating an indisputable fact! (Of course, the fact that I live in Chicago is totally irrelevant to this conclusion.)
How do search engines work? Here is an incredibly brief overview.
First, some definitions. The term ``search engines" has been used to describe a large group of search tools that fall into two main categories: (1) keyword search engines, which use special computer software programs to find web pages, and (2) subject directories, which are created by real live human beings. While most people lump search engines such as Alta Vista, HotBot, Lycos and Yahoo all together, there are differences. Do you know which one of these doesn't belong with the others and why? The answer is Yahoo, because it is compiled by human beings, whereas the others are computer-based tools. We'll go into more detail about this later. For the rest of this section, when we refer to search engines, we are talking about the computer-based programs.
Computer-based search engines are made up of three interrelated components. The first component is a proprietary software program, called a spider or robot, which ``crawls" the Internet looking for websites. These robots go from hyperlink to hyperlink identifying, reading and analyzing the text of the millions of pages they find.
These pages are indexed in a giant database, which is the second component. No web search engine indexes all of the Internet, and no two search engines have exactly the same database. Some search engines index more pages than others. Also, since the Internet changes so rapidly, search engines continually ``crawl," looking for new and updated sites. Some search engines update their databases more frequently than others.
The third component is the actual search engine software that is used to match, as best as it can, your search words against its database. The results of a search usually are listed in order of what is called relevancy ranking, with higher rankings listed first.
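To make the three components a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny in-memory ``web" (TOY_WEB), the example addresses, and the function names crawl, build_index and search are all hypothetical, and real search engines are vastly more complicated and keep their methods proprietary.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("pizza in Chicago", ["b.example", "c.example"]),
    "b.example": ("deep dish pizza recipes", ["a.example"]),
    "c.example": ("library search tips", []),
    "d.example": ("unlinked page about pizza", []),  # no link leads here
}

def crawl(start_url, web):
    """Component 1: follow hyperlinks from a start page, collecting page text."""
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Component 2: map each word to the set of URLs that contain it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Component 3: return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

pages = crawl("a.example", TOY_WEB)
index = build_index(pages)
print(sorted(search(index, "pizza")))  # prints ['a.example', 'b.example']
```

Notice that d.example never shows up, even though it mentions pizza: no hyperlink leads to it, so the toy spider never finds it. That is one small reason why no search engine's database covers everything.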
How a specific search engine calculates relevance is considered proprietary, but relevance is usually determined by where the term or phrase is found in a document: words appearing in titles, headlines and summaries are typically given more weight in deciding relevancy. Also, words repeated a number of times are weighted higher. Different search engines calculate relevance differently, which means that the same search in different search engines would generate different results even if the search engine databases were identical (which they are not).
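Since the real formulas are proprietary, the following Python sketch is only a guess at the general idea: count how often each search word appears, and give title hits more weight than body hits. The function name relevance and the weights 3.0 and 1.0 are assumptions chosen for illustration, not any engine's actual values.

```python
def relevance(query, title, body, title_weight=3.0, body_weight=1.0):
    """Score a page by counting query-word occurrences, weighting title hits more."""
    title_words = title.lower().split()
    body_words = body.lower().split()
    score = 0.0
    for word in query.lower().split():
        score += title_weight * title_words.count(word)
        score += body_weight * body_words.count(word)
    return score

# Two made-up pages, each given as (title, body).
docs = [
    ("Weekend notes", "We visited a garden"),
    ("Garden tips", "Water the garden early and weed the garden often"),
]

# Higher scores are listed first, as in a relevancy ranking.
ranked = sorted(docs, key=lambda d: relevance("garden", d[0], d[1]), reverse=True)
for title, body in ranked:
    print(title, relevance("garden", title, body))
# prints:
# Garden tips 5.0
# Weekend notes 1.0
```

Change the weights and the ranking can change too, which is a small-scale version of why two search engines can rank the same pages differently.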
This brief description, while obviously oversimplified, gives us the essentials for now. Earlier we compared the Internet to a huge library. While it's many times larger than even the largest libraries, in many ways it's not nearly as good. Here's why.
In a library, staff professionals review and evaluate books, periodicals and other media before deciding whether to purchase them for their library. In other words, you can't just walk into a library with a book you've written and expect the library staff to buy it and add it to their collection without a review process.
But that's exactly what happens on the Internet. The Internet has given anyone and everyone a forum to present their information to the world. Anybody can say or write anything about any topic. It doesn't matter whether it is good quality or not, whether it's fair or not, or whether it's true or not. Much of the content available on the Internet would never get past a library professional. And the search engines simply index everything they find without distinguishing between good and poor quality.
And to make it worse, many websites have found ways to trick the search engines into paying more attention to their sites, even if they are totally irrelevant to your information needs. Library professionals are not fooled this way.
There's still another problem with search engines, and this may be the most important one. They can't ``see" everything that's available on the Internet. Content that is (1) stored in web-based databases, (2) password-protected, or (3) created in certain formats, such as Adobe Acrobat .pdf documents, is hidden from conventional search engines. There isn't any deception by the websites; the search engines simply aren't sophisticated enough to see websites that have been created in these ways. Even though these hidden sites have been referred to as the ``Invisible Web," that's really incorrect because there are other tools that can find many of these hidden sites.
Not an encouraging picture, is it? To review, search engines combine the worst of two extremes: (1) they find too many useless and irrelevant sites, and, at the same time, (2) they miss some of the best, most valuable sites. There are hundreds of search engines available (yes, hundreds), and if that isn't bad enough, they all work somewhat differently. A study last year by the NEC Research Institute, published in Nature in July 1999, estimated that, at best, only 16% of the Internet could be found using any one search engine. And a new study, released in late July 2000 by a company called BrightPlanet (see: http://news.cnet.com/news/0-1005-200-2356979.html)*, says that the Web is 500 times larger than the databases created by conventional search engines, such as Alta Vista, Google, etc. *Note: This link works as of August 14, 2000, but depending upon when you click on it, the page may have been moved or deleted.
What does all of this mean to you, an information searcher or seeker? Well, it helps to explain why you may have trouble finding what you're looking for. You find a lot of useless sites because (1) there are a lot of useless sites, and (2) the search engines are not sophisticated enough to match their databases with your search request.
Fortunately, search engines do retrieve good information at least some of the time. And there are ways to use them more effectively. That's the focus of these tips as well as others that will follow.
The rest of this document provides a brief step-by-step overview of the searching process, with more specific pointers to follow later. Learning and using these tips may seem time-consuming, but in the long run they will save you time and aggravation. No guarantees here, but try them. A warning here: even with these pointers and any others you may come across, you will still retrieve irrelevant results, but hopefully there will be fewer of them. And even with these pointers, you may still not find what you are looking for, but you are guaranteed frustration without them! So, let's get started.
I. Identify your information goal
When you're looking for information on a topic, it's tempting to go to one of the search engines, type in a word or two, and keep your fingers crossed that you will find what you're looking for. Unfortunately, the result all too often is that you end up with thousands and thousands of unusable, irrelevant hits. Before your next search, spend some time thinking about your topic; it could make a tremendous difference in the relevance of your results.
First, ask yourself: what type of information are you looking for?
- Do you need an overview of a subject?
- Do you want to know detailed information on a certain topic?
- Do you need some specific facts?
- How important is it that the information be current? Sometimes recency is important, sometimes not.
For example, let's say you have a website and you want information on attracting visitors to it. You can state your question as follows:
``What strategies can I use to promote my website?"
II. Identify keywords
Next, determine the main concepts in the question above and identify synonyms, alternate spellings or other forms of the search words.
Try to use unique or unusual keywords (nouns are better than verbs, adverbs or adjectives) to help narrow your search. A good guideline is to determine which of the ``5W's and H" (who, what, where, why, when and how) apply to your search and use them. Try to think of as many words as you can. You stand a better chance of finding relevant results if you use more, rather than fewer, words, but avoid so-called ``stop words" (very common words such as: a, but, on, the, etc.) since search engines typically ignore them. Here are some possible keywords for the above topic:
KEYWORD 1 | KEYWORD 2 | KEYWORD 3
website | promotion | tips
Internet | promoting | strategies
online | marketing | tactics
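If you like to see ideas in code, here is a rough Python sketch of the first pass at keyword selection: take the question from step I and throw away the stop words. The stop-word list below is my own small, illustrative assumption (every search engine has its own list), and the function name candidate_keywords is hypothetical.

```python
import string

# A small illustrative stop-word list; real engines use their own lists.
STOP_WORDS = {"a", "an", "and", "but", "can", "i", "my", "on", "the", "to", "use", "what"}

def candidate_keywords(question):
    """Lowercase each word, strip surrounding punctuation, and drop stop words."""
    words = (w.strip(string.punctuation).lower() for w in question.split())
    return [w for w in words if w and w not in STOP_WORDS]

print(candidate_keywords("What strategies can I use to promote my website?"))
# prints ['strategies', 'promote', 'website']
```

The words that survive are only a starting point. Coming up with synonyms and other word forms (promotion, promoting, marketing and so on) is still a job for a human being.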
III. Create search expression
Create a search expression using what are known as boolean operators (such as AND, OR and NOT). We'll cover boolean terms in more detail later, including how various search engines use them, but for now, some quick basics. Using the term AND between two or more words means that all of the terms must be found in each result, whereas OR means that at least one of the terms must be found. Using AND will produce fewer search results than OR, so make sure you use the correct boolean terms for your specific search. OR is useful when you want to include synonyms for a certain term or expression.
One more thing: because some search engines require boolean terms to be capitalized, it's a good idea to always capitalize them: use AND and OR, not and and or. Instead of AND and NOT, most search engines also allow you to put a + in front of a word to indicate that the word must be present and a - in front of a word to indicate that it must not be present.
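One way to picture these boolean terms is as operations on sets of pages. The Python sketch below assumes a made-up index (INDEX) in which each word maps to the ID numbers of the pages that contain it; the page numbers and the helper name pages_with are hypothetical, and a real search engine's database looks nothing like this small.

```python
# Hypothetical index: each word maps to the set of page IDs containing it.
INDEX = {
    "website": {1, 2, 3},
    "promotion": {2, 3, 4},
    "free": {3, 5},
}

def pages_with(word):
    """Return the set of page IDs that contain the word (empty if unknown)."""
    return INDEX.get(word, set())

both = pages_with("website") & pages_with("promotion")    # website AND promotion
either = pages_with("website") | pages_with("promotion")  # website OR promotion
without = both - pages_with("free")                       # +website +promotion -free

print(sorted(both))     # prints [2, 3]
print(sorted(either))   # prints [1, 2, 3, 4]
print(sorted(without))  # prints [2]
```

The AND result can never be larger than the OR result, which is why AND narrows a search and OR widens it.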
Put quotation marks around phrases, such as: ``venture capital." Otherwise, if you type venture capital in a search engine with no quotes, you will get results like this: (1) in some search engines, both words will be present, but not necessarily next to each other and therefore irrelevant and (2) in other search engines, either word, but not necessarily both words, will be present and thus also useless.
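The difference between a quoted phrase and a pair of loose words can be sketched in a few lines of Python. The function names and the sample sentences are hypothetical, and real engines handle punctuation and word forms far more carefully than this.

```python
def contains_all_words(text, query):
    """True if every query word appears somewhere in the text."""
    words = text.lower().split()
    return all(w in words for w in query.lower().split())

def contains_phrase(text, phrase):
    """True if the phrase's words appear next to each other, in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

page = "the capital city hosted a joint venture"
print(contains_all_words(page, "venture capital"))  # prints True
print(contains_phrase(page, "venture capital"))     # prints False
```

The page above mentions both words, so an unquoted search would match it, but it has nothing to do with venture capital. Only the phrase test rejects it.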
Check your spelling, since computer-based systems don't tolerate spelling errors. Use parentheses around common keyword groupings, since they help search engines determine the order in which the search will be processed. Usually, items in parentheses are processed first, and it is usually best to use your subject as the first grouping. Continuing with the above topic, here is a possible search expression using the keywords above:
(website OR Internet OR online) AND (promotion OR promoting OR marketing) AND (tips OR strategies OR tactics)
As a general rule, don't make your search expressions too complicated. Computerized search engines get confused easily!
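If you build expressions like this often, the pattern is regular enough to sketch in Python: join the synonyms in each group with OR, wrap each group in parentheses, and join the groups with AND. The function name build_expression is hypothetical, and whether a given search engine accepts this exact syntax is something to confirm in its Help pages.

```python
def build_expression(groups):
    """Join synonyms with OR inside parentheses, then join the groups with AND."""
    def quote(term):
        # Multi-word terms are phrases, so they get quotation marks.
        return f'"{term}"' if " " in term else term
    return " AND ".join(
        "(" + " OR ".join(quote(term) for term in group) + ")" for group in groups
    )

groups = [
    ["website", "Internet", "online"],
    ["promotion", "promoting", "marketing"],
    ["tips", "strategies", "tactics"],
]
print(build_expression(groups))
# prints (website OR Internet OR online) AND (promotion OR promoting OR marketing) AND (tips OR strategies OR tactics)
```

Note that the operators come out capitalized every time, which takes care of the engines that insist on it.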
IV. Choose Search Engine and enter your search terms
It pays to learn the idiosyncrasies of at least two to three search engines because, as I've said, all of them work somewhat differently. Locate and use the Help selection at the various search engines for assistance on creating effective search expressions. (I'll cover specific search engines in more detail in later weeks.) Again, learning these tools will take more time initially, but you should get better results in a shorter time.
Search engines can be classified a number of different ways, but probably the most common distinction is between keyword search engines, such as Alta Vista, HotBot and Google, and subject directories, such as Yahoo. Subject directories are arranged in a hierarchy, starting with a general term and getting increasingly more specific as you drill down into a subject. For example, if you were looking for information on software for learning French, this is how your search might look in a subject directory.
First, you would start with a general heading of ``Computers." Once in that category, you would select: ``Software." Within the software category, you would choose ``Educational," and once there, you would select ``Foreign Languages." Finally, within the foreign languages category, you would choose ``French." So the hierarchy would look like this:
Computers: Software--->Educational--->Foreign Languages--->French
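For the programmers in the audience, a subject directory behaves like a set of nested folders. The Python sketch below models a small slice of one as nested dictionaries; the ``Spanish" and ``Typing" categories are invented to fill out the example, and the names DIRECTORY and drill_down are hypothetical.

```python
# Hypothetical slice of a subject directory, modeled as nested dictionaries.
DIRECTORY = {
    "Computers": {
        "Software": {
            "Educational": {
                "Foreign Languages": {"French": {}, "Spanish": {}},
                "Typing": {},
            },
        },
    },
}

def drill_down(directory, path):
    """Follow category names one level at a time; return the subcategories found there."""
    node = directory
    for name in path:
        node = node[name]  # raises KeyError if the category does not exist
    return sorted(node)

print(drill_down(DIRECTORY, ["Computers", "Software", "Educational", "Foreign Languages"]))
# prints ['French', 'Spanish']
```

Each step narrows the field, which is why directories suit general browsing: you never have to guess the right keyword, only pick the closest category.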
I will go into more specifics later, but again, an overview. Keyword search engines are compiled by computer programs, sometimes called robots, whereas subject directories are compiled by humans. Since computers can work faster than humans, keyword search engines are typically much larger than subject directories and are updated more frequently.
On the other hand, the human factor in subject directories generally means more high quality sites are included. Yahoo, the largest subject directory, is an exception. Unlike many other directories, Yahoo's staff of editors does not critically evaluate or review a website's content before deciding whether to include it in its directory.
Finally, as a broad rule, general information goals are best handled with a subject directory, whereas keyword search engines are better for locating rather specific information on a topic. This is not a hard and fast rule and greatly depends on the specific topic you are researching.
The distinction between keyword engines and subject directories is increasingly blurring, as many subject directories now include keyword searching tools and vice versa.
There is also a third category of search engines, called meta search engines. They search a number of search engines at once. This may sound like a great time-saving solution, but meta search engines have problems of their own. More details about these meta tools will come later. An example of a meta search engine is MetaCrawler.
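Here is a bare-bones Python sketch of the meta search idea: send the same query to several engines and merge what comes back, dropping duplicates. The two ``engines" are hypothetical stand-ins that return fixed lists, and I am not claiming that MetaCrawler or any other real tool works this way internally.

```python
def meta_search(query, engines):
    """Send one query to several engines and merge the results, dropping duplicates."""
    merged, seen = [], set()
    for name, engine in engines.items():
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged

# Hypothetical stand-ins for real engines: each returns a fixed ranked list.
def engine_a(query):
    return ["site1.example", "site2.example"]

def engine_b(query):
    return ["site2.example", "site3.example"]

for url, source in meta_search("website promotion", {"A": engine_a, "B": engine_b}):
    print(url, "first found by", source)
# prints:
# site1.example first found by A
# site2.example first found by A
# site3.example first found by B
```

Even this toy version hints at one of the problems: each engine expects its own search syntax and ranks results its own way, so a single merged list hides a lot of differences.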
Here are some of the best-known search engines:
Search Engine | URL
Alta Vista | http://www.altavista.com/
Excite | http://www.excite.com/
Fast Search | http://www.ussc.alltheweb.com/
Google | http://www.google.com/
HotBot | http://www.hotbot.com/
Northern Light | http://www.northernlight.com/search.html
Yahoo! | http://www.yahoo.com/
MetaCrawler | http://www.metacrawler.com
V. View Results
As it turns out, the specific topic in our search example obtains good results from most keyword search engines and subject directories as well. We were lucky. Not all topics are this easily researched on the Internet, but then you probably already know that! Here's how to work with your results. If there are too many results, use more precise, specific words to help narrow and refine your search. If you find too few hits, drop the least important word or phrase to include more sites. Also, by looking at the descriptive summaries of the results you've found, you may find some words or phrases to help you refine your search expression. You may need to resubmit your search to increase your odds of finding the information you need.
VI. Critically Evaluate Results: Judge Information Quality
OK, let's say you have found some information from a web search that specifically addresses your information goal. Now you have to evaluate it for quality.
``Information" is everywhere on the Internet, but how good is most of it? Information is power only if it is accurate, reliable, and helps you take some sort of action. Differentiating between good and poor quality data is especially difficult on the Internet because unlike traditional media, anyone can publish anything on any topic. Properly evaluating information will be covered in more depth later, but some of the areas to consider include:
- Relevance: Does this site give me the type of information that I need?
- Identity: Can the author or web site source be identified? Is there a phone number? Physical address?
- Purpose: What are the motives for publishing this information: to inform, to sell, etc.?
- Date: Can you determine when the information was written? How recent is it? How recent does it need to be?
- Objectivity: Is the information biased or slanted or does it appear to be objective?
- Consistency: Is the information consistent with other available information on this subject?
VII. Try a different search engine
If the results of your search did not produce high-quality and authoritative information that addressed your information goal back in step I, then you need to keep searching. Remember, as we said before, no two search engines have the same exact database, and even if they did, you would still need to watch out for differences in using search phrases. As a result, the same search in different search engines will produce different results. For the best, most relevant information, learn about and use more than one search engine.
The end result
There you have it! A step-by-step review of the searching process. Did you find what you were looking for? Is it of high quality and likely to be accurate? Yes? Congratulations! That's great. Now try to remember what you did, so you can apply some of these tips to your next search. If you weren't successful, stay tuned. More specific pointers and tips are coming in the following weeks. Each of the main search steps above will be fleshed out in greater detail.
But even with these tips and the ones that follow, you won't always be able to find high quality information on a topic relevant to your information needs. Why not? One of the most important reasons is something not often mentioned: some of the best information sources are not even available on the so-called ``free Internet." For that you have to use the ``full strength" commercial information databases (including Dialog, Lexis-Nexis and Dow Jones), which aggregate high quality information from a number of high quality sources. These databases contain (1) more information than you will find on the Internet and (2) information that in many cases is NOT available on the free Internet at all.
As you can see from the above tips, locating information is a skill which takes time to master. If it's important to your business that you find high-quality information for better decision-making, and you don't want to take the time to do it yourself, you may want to consider using the services of MJS Information Services, a firm specializing in locating critical company/industry/marketing information.

