Answers Search Help
Boston University home page
Google: Optimizing Your Site
 
 

How Search Engines Work

In a nutshell, search engines use spiders to crawl the World Wide Web and collect information. Then the search engine builds the index, encodes the index data, and stores the data.

Step 1: Crawl the Web

Before a search engine can tell you where to find a page, it has to find the page itself. Search engines use special software robots called spiders that travel the Web to find pages. A spider typically starts at a popular site, records the links it finds there, and follows them to other sites. It then follows the links on those sites, and so on.
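
The link-following behavior described above can be sketched in a few lines of Python. This is only an illustration: the TOY_WEB dictionary and the crawl function are made-up names, the ".example" sites are hypothetical, and a real spider would fetch pages over the network rather than read a dictionary.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
TOY_WEB = {
    "popular.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["popular.example"],
    "c.example": [],
}

def crawl(start, web):
    """Visit pages breadth-first from start, following each link once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in web.get(page, []):
            if link not in seen:  # skip pages already found
                seen.add(link)
                queue.append(link)
    return order

visited = crawl("popular.example", TOY_WEB)
print(visited)
```

The seen set is what keeps the spider from looping forever when two sites link to each other.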

Step 2: Collect Information

The spider records the words found on each page and where those words were found on the page. Some search engines use spiders that record every word on a page; other engines record only the important words on a page, ignoring common words such as "a," "and," and "the." The spider may pay more attention to words stored in certain locations on a page, such as the page's title.
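
As a rough sketch of this step, the Python below records each word together with where it was found, and optionally skips the common words named above. The function name collect_words, the "title" and "body" labels, and the sample text are assumptions made for illustration, not any particular engine's method.

```python
import re

# The common words the text mentions; a real list would be longer.
STOP_WORDS = {"a", "and", "the"}

def collect_words(title, body, skip_common=True):
    """Return (word, location, position) records for one page."""
    records = []
    for location, text in (("title", title), ("body", body)):
        words = re.findall(r"[a-z0-9']+", text.lower())
        for position, word in enumerate(words):
            if skip_common and word in STOP_WORDS:
                continue  # ignore common words
            records.append((word, location, position))
    return records

records = collect_words("The Spider", "A spider crawls the Web and records words")
for record in records:
    print(record)
```

Keeping the location with each word is what later lets an engine give a word in the title more weight than the same word in the body.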

Step 3: Build Index

Once the robots find information, the search engine must store the information in such a way that it can be found easily. A database index is similar to an index at the back of a book: A book index contains information taken from pages and pointers (page numbers) to the original sources of the information. If an index has been built well, users will be able to find pages quickly.
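
The book-index analogy maps onto what is usually called an inverted index: each word points to the pages that contain it. The Python below is a minimal sketch under that assumption; PAGES, build_index, and the page names are hypothetical.

```python
from collections import defaultdict

# Hypothetical pages collected by a spider.
PAGES = {
    "page1": "spiders crawl the web",
    "page2": "an index helps users find pages",
    "page3": "spiders build the index",
}

def build_index(pages):
    """Map each word to the set of pages on which it appears."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

index = build_index(PAGES)
hits = sorted(index["index"])  # pages that contain the word "index"
print(hits)
```

Looking up a word is then a single dictionary access instead of a scan through every page.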

Different search engines build their indexes differently. These differences in indexing are one reason why the same search may yield different results on different engines. Some possible considerations for building an index include:

  • The number of times an important word appears on a page.
  • Where on a page a word appears.
  • Whether a word is capitalized or not.
  • The number of times a page is linked to from other pages.
  • The importance of the other pages that link to a page.
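
To make a few of these considerations concrete, here is a toy scoring function in Python. The weights (5 for a title match, 2 per inbound link) are arbitrary assumptions, as are the names score and INBOUND and the sample pages. The sketch ignores capitalization and the importance of the linking pages.

```python
# Hypothetical record of which pages link to which.
INBOUND = {"a.example": ["b.example", "c.example"]}

def score(page, word, inbound_links):
    """Combine word count, word location, and inbound link count."""
    word = word.lower()
    body_count = page["body"].lower().split().count(word)
    title_bonus = 5 if word in page["title"].lower().split() else 0
    link_count = len(inbound_links.get(page["url"], []))
    return body_count + title_bonus + 2 * link_count

page_a = {
    "url": "a.example",
    "title": "Spider Basics",
    "body": "a spider follows links and a spider records words",
}
page_b = {"url": "b.example", "title": "Links", "body": "spider"}

print(score(page_a, "spider", INBOUND), score(page_b, "spider", INBOUND))
```

Two engines that chose different weights here would rank the same pages differently, which is the point the text makes about differing results.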

Step 4: Encode Data

Before the index information is stored in a database, the search engine encodes the data to reduce the size of the database and to speed up the search engine's response time.
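
The page does not say which encoding is used. One common space-saving idea, shown here purely as an illustration, is to store the gaps between sorted page numbers instead of the numbers themselves, since small numbers take fewer bits to store. The names encode_gaps and decode_gaps and the sample page numbers are assumptions.

```python
def encode_gaps(doc_ids):
    """Replace sorted page numbers with the differences between them."""
    gaps = []
    previous = 0
    for doc_id in sorted(doc_ids):
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps

def decode_gaps(gaps):
    """Rebuild the original sorted page numbers from the gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids

postings = [1000, 1003, 1010, 1011]  # hypothetical page numbers for one word
gaps = encode_gaps(postings)
print(gaps)
print(decode_gaps(gaps) == postings)
```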

Step 5: Store Data

The final step in the process is to store the search index in a database.
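
As a small-scale sketch of this step, the Python below writes a word-to-page index into an in-memory SQLite database and reads it back. The table name postings, the helper names store_index and lookup, and the sample index are assumptions; production search engines use storage systems built for far larger indexes.

```python
import sqlite3

def store_index(index, conn):
    """Write each (word, page) pair of the index into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings "
        "(word TEXT, page TEXT, PRIMARY KEY (word, page))"
    )
    rows = [(word, page) for word, pages in index.items() for page in pages]
    conn.executemany("INSERT OR IGNORE INTO postings VALUES (?, ?)", rows)
    conn.commit()

def lookup(word, conn):
    """Return the pages stored for a word, in alphabetical order."""
    cursor = conn.execute(
        "SELECT page FROM postings WHERE word = ? ORDER BY page", (word,)
    )
    return [row[0] for row in cursor]

conn = sqlite3.connect(":memory:")  # temporary database for the example
store_index({"spider": {"page1", "page3"}, "index": {"page2"}}, conn)
found = lookup("spider", conn)
print(found)
```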

 

NIS  |  OIT  |  Boston University  |   October 24, 2002