How search engines work

Search engines are tools that let you search the Internet using keywords or phrases. They may all look like they work the same way, but they actually "search" the Internet quite differently. If we first place the search tools in a hierarchy, and then learn how each one searches the Internet, we can make informed decisions about which search tool to use. Take a look at the categories below and read through the descriptions. They will help explain why you get different results from different search engines, and they will also help you search the web more effectively.

You can put search tools into five hierarchical categories:

Directories

Search tools like Yahoo, Magellan and LookSmart are actually web directories. When you search a directory like Yahoo, you are searching a database of selected web sites, not the entire World Wide Web (WWW). Yahoo builds its index either by selecting sites to include or by reviewing web sites that are submitted to it for inclusion. Every site is assigned a category (subject) in the directory. You can use Yahoo's search function to do a keyword search, or you can click on the appropriate category or subject and move through the directory's hierarchy. You'll notice that when a search in Yahoo comes up with no hits, it passes you along to the AltaVista search engine as the next resource. Yahoo and LookSmart are great places to start if you think that what you are looking for is out there on the WWW. Examples would be the Budget of the United States or President Clinton's Inaugural Address. Yahoo is a great resource for government and education web sites.
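
To make the two ways of using a directory concrete, here is a minimal sketch in Python. It is not how Yahoo or any real directory is built. The category names, the URLs and the function names `browse` and `keyword_search` are all made up for illustration. The point is only that both browsing and keyword searching look at the hand-picked list of sites and nothing else.

```python
# Hypothetical, hand-curated directory: every listed site is filed under
# exactly one category path. Categories and URLs are invented for this sketch.
DIRECTORY = {
    ("Government", "Budget"): [
        {"title": "Budget of the United States",
         "url": "http://www.example.com/budget"},
    ],
    ("Government", "Speeches"): [
        {"title": "President Clinton's Inaugural Address",
         "url": "http://www.example.com/inaugural"},
    ],
    ("Education", "Universities"): [
        {"title": "Example State University",
         "url": "http://www.example.com/university"},
    ],
}


def browse(path):
    """Return the sites filed under a category path, including subcategories."""
    path = tuple(path)
    return [
        site
        for category, sites in DIRECTORY.items()
        if category[: len(path)] == path
        for site in sites
    ]


def keyword_search(query):
    """Return listed sites whose title contains every query word."""
    words = query.lower().split()
    return [
        site
        for sites in DIRECTORY.values()
        for site in sites
        if all(word in site["title"].lower().split() for word in words)
    ]


if __name__ == "__main__":
    print([site["title"] for site in browse(["Government"])])
    print([site["title"] for site in keyword_search("inaugural address")])
    print(keyword_search("fence"))  # nothing listed, so no hits
```

A search for a site that nobody has added to the directory returns nothing, which is the situation in which a directory hands you off to a full search engine.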

Search Engines

Search engines like Infoseek, Webcrawler and Lycos build their indexes using software "robots" or "spiders" that crawl around the web, indexing and cataloging web site content. Each robot is designed differently and behaves differently, which is why a search on Infoseek and a search on Lycos will yield different results. For the most part, these robots look for words in the titles, descriptions and "meta tags" (keywords) that a web site producer assigns to a web page. The number of times a keyword appears, and where it appears, determine the relevancy score that we see (the percentage that appears in your search results). One robot may give more weight to keywords in the description, whereas another may give more weight to the number of times a word appears. This explains why the same page may show a relevancy of 100% on Lycos but only 90% on Infoseek.
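
The effect of different weightings can be sketched in a few lines of Python. This is only an illustration under assumed numbers. The two sets of weights, the sample pages and the function names are hypothetical, and no real search engine is known to use this exact formula. Each page is described only by its title, description and meta keywords, and the best-scoring page is shown as 100%.

```python
# Two hypothetical pages, described only by the fields a robot of this
# kind reads. All text here is invented for the sketch.
PAGES = {
    "page_a": {
        "title": "Garden supplies",
        "description": "Fences fences fences for every yard",
        "meta": "garden fences",
    },
    "page_b": {
        "title": "Fences",
        "description": "Wooden panels built to order",
        "meta": "fences wood",
    },
}

# Assumed weightings: robot one favours the title, robot two the description.
ROBOT_ONE = {"title": 5, "description": 1, "meta": 2}
ROBOT_TWO = {"title": 1, "description": 3, "meta": 1}


def relevancy(page, keyword, weights):
    """Weighted count of how often the keyword appears in each field."""
    keyword = keyword.lower()
    return sum(
        weights.get(field, 0) * text.lower().split().count(keyword)
        for field, text in page.items()
    )


def rank(pages, keyword, weights):
    """Return (page, percentage) pairs, best first; the top page is 100%."""
    scores = {name: relevancy(page, keyword, weights)
              for name, page in pages.items()}
    best = max(scores.values())
    if best == 0:
        return []
    results = [(name, round(100 * score / best))
               for name, score in scores.items() if score > 0]
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(rank(PAGES, "fences", ROBOT_ONE))  # [('page_b', 100), ('page_a', 71)]
    print(rank(PAGES, "fences", ROBOT_TWO))  # [('page_a', 100), ('page_b', 20)]
```

The same two pages and the same keyword produce opposite orderings, purely because the two robots weigh the fields differently.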

Super Engines

This group of search engines uses robots and spiders as well, but there is a big difference. In addition to indexing keywords from the titles, descriptions and meta tags, these robots index keywords from the text of the pages themselves. These search engines, like HotBot, AltaVista, Excite and Open Text, give you "hits" on keywords that lie much deeper in the content of the web page. Relevancy scoring again differs for each search engine. As a result, a search for information on Bill Gates returns a much larger number of hits in these search engines. When your search results include the WWW Online Fence Company, you'll see that its page contains many references to "gates" and "bill". In fact, there may be so many that this page scores a higher relevancy than the "Official Bill Gates Information Page". Commercial producers of web pages are extremely savvy in their positioning tactics. Their job is to get their pages seen, and they use their knowledge of robots to make their web sites pop up in the top five or ten results of a keyword search.
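
The Bill Gates example can be reproduced with a toy full-text index. The Python sketch below is a simplification under stated assumptions. The page text is invented, and scoring by raw word count is just one plausible choice, not the method of any particular engine. It indexes every word in the body of each page and ranks pages by how many times the query words occur.

```python
import re
from collections import Counter, defaultdict

# Invented body text for the two pages named in the example above.
PAGES = {
    "WWW Online Fence Company": (
        "We bill you only after your gates are installed. "
        "Garden gates, driveway gates, farm gates. "
        "Ask about our bill payment plans."
    ),
    "Official Bill Gates Information Page": (
        "Official information about Bill Gates."
    ),
}


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to a count of its occurrences on every page."""
    index = defaultdict(Counter)
    for name, body in pages.items():
        for word in tokenize(body):
            index[word][name] += 1
    return index


def search(index, query):
    """Score each page by total occurrences of the query words, best first."""
    scores = Counter()
    for word in tokenize(query):
        for name, count in index.get(word, {}).items():
            scores[name] += count
    return scores.most_common()


if __name__ == "__main__":
    index = build_index(PAGES)
    for name, score in search(index, "Bill Gates"):
        print(score, name)
```

With this sample text the fence company scores 6 (two occurrences of "bill" and four of "gates") and the official page scores 2, so the fence company is listed first.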

Meta Search Engines or Multiple Engine Search

Another class of search engine is the meta search or multiple engine search. These are search engines that let you perform a keyword search in many search engines at the same time. For example, you can use Dogpile, Cyber411, Savvy Search or Metacrawler to search Yahoo, Infoseek, AltaVista and many others at once. The speed, number and presentation of results vary. Most will give you the top ten hits from each of the search engines (or allow you to set the number and presentation of your search results). This can be a quick way to survey the web for content on a specific topic, or to save you the time of figuring out which search engine might be best for a particular search.
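
The basic idea of a meta search can be sketched as follows. This Python example is a rough illustration, not the design of Dogpile or any other real service. The engines are stand-in functions that return canned, made-up results, and the names `make_engine` and `meta_search` are hypothetical. A real meta search engine would send the query to each engine over the network, probably all at once rather than one after another.

```python
# Canned, made-up results standing in for two real search engines.
CANNED_RESULTS = {
    "engine_a": [
        "http://www.example.com/a1",
        "http://www.example.com/shared",
        "http://www.example.com/a3",
    ],
    "engine_b": [
        "http://www.example.com/shared",
        "http://www.example.com/b2",
    ],
}


def make_engine(name):
    """Build a stand-in engine that ignores the query and returns canned hits."""
    def engine(query):
        return list(CANNED_RESULTS[name])
    return engine


def meta_search(query, engines, per_engine=10):
    """Collect the top hits from every engine, merging duplicate URLs.

    Returns a dict mapping each URL to a list of (engine name, rank) pairs.
    """
    merged = {}
    for name, engine in engines.items():
        top_hits = engine(query)[:per_engine]
        for rank, url in enumerate(top_hits, start=1):
            merged.setdefault(url, []).append((name, rank))
    return merged


if __name__ == "__main__":
    engines = {name: make_engine(name) for name in CANNED_RESULTS}
    for url, sources in meta_search("fences", engines, per_engine=2).items():
        print(url, sources)
```

The `per_engine` parameter plays the role of the setting that lets you choose how many hits to take from each engine. A page reported by more than one engine appears once, with a note of where each engine ranked it.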

Special Search Engines

Many search engines on the web search for all kinds of content by keyword. There is also a growing number of specialized search engines that are subject specific. One example is DejaNews, a search engine that indexes the content of the Usenet newsgroups. Another, Infospace, has an index that includes addresses and phone numbers for anyone who is listed in a US phone book. FTP Search searches the contents of FTP archive sites. The number of specialty search engines is growing rapidly, and you'll notice that some of the meta search engines (like Dogpile) have responded by including them in their lists of engines to search.

