
An overview of what search engines do, how they work, and what site owners can do to get listed.

Website owners are often interested in having their site "rank" higher in the search engines: that is, when someone searches for keywords that relate to their business, they want their site to be listed at the top of the search results. With careful page design and coding, and enrollment in pay-for-performance services, this can be achieved. As more and more businesses go online, all of which want to be at the top of the search engine listings, this is becoming increasingly challenging and expensive.

[Graph: Nielsen/NetRatings. Key: YH=Yahoo, MSN=MSN, GG=Google, AOL=AOL, AJ=Ask Jeeves, IS=InfoSpace, AV=AltaVista, OVR=Overture, LS=LookSmart, NS=Netscape, DH=Direct Hit, DP=Dogpile]

What is a Search Engine?

Strictly speaking, a search engine is a searchable index of all available web pages. However, since the number of web pages is in the hundreds of millions (and growing by the minute), the reality is that no search engine can index them all, and the same search in different engines can yield quite different results.

Furthermore, some search engines are really "directories", lists of website categories as opposed to a searchable index. Yahoo and the Open Directory are both "directories" that rely on people to compile and organize their listings. To further complicate matters, most of the search engines contain both search engine and directory information, though they tend to emphasize one type of result over the other.

There are hundreds of search engines and directories, but only the 17 shown below really matter.

Of those, seven are responsible for more than 90% of all searches (indicated with an asterisk).

AllTheWeb (FAST Search)    LookSmart
AltaVista *                Lycos *
AOL Search *               MSN Search *
Ask Jeeves                 Netscape Search
Direct Hit                 Northern Light
Google *                   Open Directory
HotBot                     Overture *
iWon                       Yahoo *

Crawler-Based Search Engines

Crawler-based search engines, such as AltaVista, create their listings using software (sometimes termed a "spider") that automatically indexes web pages by following links on homepages or links between sites, and then creating an index of URLs and the text at each URL (only HTML text is indexed, not text shown as graphics). That means that when a page's URL changes, the search engine won't know it until its software returns to the site and re-indexes it. That can take weeks or months.
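
As a rough illustration of the crawling process described above, the sketch below walks a tiny in-memory "web" by following links and records the text found at each URL. The page data, the PAGES dictionary, and the crawl function are all hypothetical; a real spider would fetch pages over HTTP and parse the HTML it receives.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (visible HTML text, outgoing links).
PAGES = {
    "http://example.com/": (
        "Welcome to our widget shop",
        ["http://example.com/products", "http://example.com/about"],
    ),
    "http://example.com/products": (
        "Blue widgets and red widgets",
        ["http://example.com/"],
    ),
    "http://example.com/about": ("About our company", []),
    "http://example.com/orphan": ("No page links here", []),
}


def crawl(start_url, pages):
    """Follow links breadth-first from start_url and return {url: text}."""
    index = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in index or url not in pages:
            continue  # already indexed, or the link points nowhere
        text, links = pages[url]
        index[url] = text
        queue.extend(links)
    return index


index = crawl("http://example.com/", PAGES)
```

In this toy example the "orphan" page never enters the index, because no other page links to it and the spider has no other way to discover it.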

Human-Powered Directories

A human-powered directory, such as Yahoo, depends on people to create its listings. You submit a brief site description to the directory for your entire site, and the editors review it, decide whether or not it deserves a listing, and then place it in a category in the directory. When a Yahoo user runs a search, the directory looks for matches found in the submitted descriptions. Results are listed alphabetically by site title, with the most popular sites shown first, separately, and a few sponsored listings at the very top.
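
The directory lookup described above can be sketched as follows. This is a simplified illustration, not how any particular directory is implemented: the listings, the field names, and the search_directory function are hypothetical, and the "most popular" and sponsored placements are left out.

```python
# Hypothetical directory entries, as submitted by site owners and approved by editors.
LISTINGS = [
    {"title": "Zeta Widgets", "description": "Retailer of blue widgets and widget parts"},
    {"title": "Acme Widget Co", "description": "Maker of hand-built widgets"},
    {"title": "Garden World", "description": "Garden furniture and tools"},
]


def search_directory(query, listings):
    """Match query words against submitted descriptions; list hits by title."""
    words = query.lower().split()
    hits = [
        entry for entry in listings
        if all(word in entry["description"].lower() for word in words)
    ]
    # Alphabetical by site title, as described in the text.
    return sorted((entry["title"] for entry in hits), key=str.lower)


titles = search_directory("widgets", LISTINGS)
```

Note that the search never looks at the pages themselves, only at the submitted descriptions.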

Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.

Hybrid Search Engines

At one time, a search engine either presented crawler-based results or human-powered listings. It is now common for both types of results to be presented. Much of the time a hybrid search engine will favor one type of listing over the other. Yahoo first presents its own directory listings, and then may give crawler-based results from Google.


When you enter a word or phrase into a search engine and request a listing of web pages, what you get back is a list of URLs that contain some or all of those words. A good search engine will give a higher place on the list ("ranking") to those pages that contain all the words, or that mention the words more than once.
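
A minimal sketch of that idea follows, assuming a deliberately naive scoring rule: pages that contain more of the query words come first, and ties are broken by how often the words are mentioned. The rank function and the sample pages are hypothetical, and no real search engine is this simple.

```python
def rank(query, pages):
    """Return URLs containing at least one query word, best match first."""
    words = query.lower().split()
    scored = []
    for url, text in pages.items():
        tokens = text.lower().split()
        counts = [tokens.count(word) for word in words]
        matched = sum(1 for count in counts if count > 0)
        if matched == 0:
            continue  # page contains none of the words
        scored.append((matched, sum(counts), url))
    # More distinct words matched first, then more total mentions, then URL.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [url for _, _, url in scored]


pages = {
    "a.html": "blue widgets for sale blue widgets shipped daily",
    "b.html": "blue paint and brushes",
    "c.html": "widgets in blue and green",
    "d.html": "garden furniture",
}
results = rank("blue widgets", pages)
```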

This does not mean that the more times the words are present on the page the higher it will rank. Search engines are smarter than that. Programmers "tune" the software to reject such attempts at "spamming" the search engines to get a higher ranking.
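
One way such tuning might work is to ignore a keyword entirely once it makes up too large a share of the page text. The sketch below assumes a hypothetical density threshold of 30%; the actual rules used by search engines are not public, so both the function and the threshold are illustrative only.

```python
def keyword_score(word, text, max_density=0.3):
    """Count occurrences of word, but score zero if the page looks stuffed."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    count = tokens.count(word.lower())
    density = count / len(tokens)
    if density > max_density:
        return 0.0  # assumed rule: treat excessive repetition as spam
    return float(count)


normal = "cheap flights to paris and rome with cheap hotels"
stuffed = "cheap cheap cheap cheap flights"
normal_score = keyword_score("cheap", normal)
stuffed_score = keyword_score("cheap", stuffed)
```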

Search engines use proprietary and jealously guarded formulas, or "algorithms", to create their indexes in an attempt to produce good quality results for their users. Each search engine uses a different algorithm, which can change at any time. This is one reason why the same search on different engines produces quite different results.

It's In The Text

One of the main rules in a ranking algorithm involves the location and frequency of the words on a web page. Search engines take into account:

  • Words in the page title (the words shown at the very top of the browser window),
  • Description and keyword "meta" tags (not visible to the user),
  • The visible HTML text on the page,
  • How high up in the page the words are located,
  • And, in some cases, even the "alt" tags (hidden text descriptions of images).

What is important to keep in mind is that each search engine weights these different web page text areas differently, and the weightings may change over time.
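
To make the idea of weighting concrete, here is a small sketch in which each text area of a page contributes to the score according to its own weight. The weights, the field names, and the "early in the page" bonus are invented for illustration; real engines keep their values secret and change them over time.

```python
# Hypothetical weights for one imaginary search engine.
FIELD_WEIGHTS = {"title": 3.0, "meta": 2.0, "body": 1.0, "alt": 0.5}


def weighted_score(word, page, weights=FIELD_WEIGHTS):
    """Score a page for one keyword, weighting each text area differently."""
    word = word.lower()
    score = 0.0
    for field, weight in weights.items():
        tokens = page.get(field, "").lower().split()
        score += weight * tokens.count(word)
        # Assumed bonus when the word appears early in the visible text.
        if field == "body" and word in tokens[:10]:
            score += 0.5
    return score


page = {
    "title": "Acme Widgets",
    "meta": "widgets gadgets acme",
    "body": "We sell widgets of every size",
    "alt": "photo of a widget",
}
score = weighted_score("widgets", page)
other_engine = weighted_score("widgets", page, {"title": 1.0, "body": 1.0})
```

The same page scores differently under the second set of weights, which is one way the same search can produce different rankings on different engines.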

It's Not All In The Text

Crawler-based search engines know that many web developers constantly rewrite their web pages in an attempt to improve their rankings. Because of this, all major search engines now also make use of "off the page" ranking criteria that developers cannot readily control.

The principal criterion used is link analysis, which was pioneered by Google. By analyzing how web pages link to each other, a search engine can get a sense of what a page is about as well as whether that page is deemed to be "important" and thus deserving of a high ranking. In addition, sophisticated techniques are used to block attempts by developers to build "artificial" links designed to increase their rankings.
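
The sketch below shows one simplified, PageRank-style way to turn a link graph into "importance" scores: each page repeatedly passes a share of its score to the pages it links to. It is an illustration under stated assumptions, not any engine's actual formula; the link_scores function, the damping value, and the sample graph are hypothetical, and the spam-blocking techniques mentioned above are not modeled.

```python
def link_scores(links, damping=0.85, iterations=20):
    """links: {page: [pages it links to]}. Returns {page: score}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new[other] += share
        scores = new
    return scores


links = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
    "blog": ["home", "products"],
}
scores = link_scores(links)
best = max(scores, key=scores.get)
```

Here "home" comes out on top because every other page links to it, while "blog" scores lowest because nothing links to it.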

Another off-the-page factor is click-through measurement. Here a search engine measures which results its users select for a particular search, then demotes high-ranking pages that aren't attracting clicks while promoting lower-ranking pages that do generate clicks.
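
A toy version of click-through re-ranking might look like the following. The function name, the data, and the rule of sorting purely by click-through rate are assumptions made for illustration; a real engine would presumably blend this signal with its other ranking criteria.

```python
def rerank_by_clicks(results, shown, clicked):
    """Reorder a ranked result list by click-through rate (clicks / times shown)."""
    def rate(url):
        views = shown.get(url, 0)
        return clicked.get(url, 0) / views if views else 0.0

    # sorted() is stable, so pages with equal rates keep their original order.
    return sorted(results, key=rate, reverse=True)


results = ["a.html", "b.html", "c.html"]           # original ranking
shown = {"a.html": 1000, "b.html": 1000, "c.html": 1000}
clicked = {"a.html": 10, "b.html": 150, "c.html": 60}
reranked = rerank_by_clicks(results, shown, clicked)
```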


"Search engine submission" refers to the act of getting your web site listed with search engines. Another term for this is search engine registration. Getting listed does not mean that you will necessarily rank well for particular terms, however. It simply means that the search engine knows your pages exist.

Increasingly, search engines are charging money for a site to get listed in a reasonable time frame. The web has grown so huge that it takes months for the search engines to index new submissions. Paying a small fee means that your site will be listed in a more reasonable time frame. However, it does not guarantee any sort of ranking or position in the search results!

Search Engine Positioning

Everyone wants their site to rank at the top of the listings for particular keywords. But remember: no one can "guarantee" a particular ranking on any search engine unless that engine allows you to pay for specific keywords and their ranking position. And even then the costs will vary over time as other businesses bid for those keywords.
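
The effect of competing bids can be sketched with a very simple model in which the highest bid for a keyword takes the top position. The advertiser names, bid amounts, and the paid_positions function are hypothetical; actual pay-for-placement services have their own rules.

```python
def paid_positions(bids):
    """bids: {advertiser: bid per click}. Highest bid gets position 1."""
    ordered = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered]


bids = {"acme": 0.25, "zeta": 0.40, "omega": 0.10}
before = paid_positions(bids)

# A competitor raises its bid, and everyone else moves down a position.
bids["omega"] = 0.45
after = paid_positions(bids)
```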

"Optimization" vs. Positioning

Terms such as "search engine placement," "search engine positioning" and "search engine ranking" refer to a site actually doing well for particular terms or for a range of terms at search engines. "Search engine optimization" means designing a web page so that when it is indexed, certain keywords are noticed by the search engine.

Optimizing a web page for the search engines means careful consideration of the text on the page, which keywords are contained in the text and how many times they are repeated, the text position on the page relative to the graphical elements of the design, the page title, and the "hidden" page text: the meta description, meta keywords, and even the alt tags.
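
When reviewing a page, it can help to see these text areas the way indexing software might. The sketch below uses Python's standard html.parser module to pull out the title, the meta description and keywords, and the image alt text. The PageTextAreas class and the sample HTML are hypothetical, and this is only an approximation of what a real crawler extracts.

```python
from html.parser import HTMLParser


class PageTextAreas(HTMLParser):
    """Collect the title, meta description/keywords, and image alt text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.alt_texts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") in ("description", "keywords"):
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


html = """<html><head>
<title>Acme Widgets</title>
<meta name="description" content="Hand-made widgets">
<meta name="keywords" content="widgets, gadgets">
</head><body><img src="w.gif" alt="A blue widget"><p>We sell widgets.</p></body></html>"""

parser = PageTextAreas()
parser.feed(html)
```

After feeding the page, parser.title, parser.meta, and parser.alt_texts hold the "hidden" text that a visitor never sees but an indexer may read.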

Doing well with search engines is not just about submitting properly, optimizing well or getting a good rank for a particular term. It's about the overall job of improving how your site interacts with search engines, so that the audience you seek can find you.

It is important to keep in mind that achieving and maintaining high search engine rankings is an ongoing process. Your competitors are making the same effort in their attempts to be number one. Regular monitoring will alert you to changes in your rankings, and corrective action can be initiated.
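
Regular monitoring can be as simple as comparing this month's positions with last month's. The sketch below assumes the positions have already been collected by hand or by some other tool; the keywords, the numbers, and the ranking_changes function are hypothetical.

```python
def ranking_changes(previous, current):
    """Compare two {keyword: position} snapshots; position 1 is the top result."""
    changes = {}
    for keyword, old_pos in previous.items():
        new_pos = current.get(keyword)  # None means the site no longer appears
        if new_pos != old_pos:
            changes[keyword] = (old_pos, new_pos)
    return changes


last_month = {"blue widgets": 3, "widget repair": 8, "cheap widgets": 12}
this_month = {"blue widgets": 3, "widget repair": 15}
changes = ranking_changes(last_month, this_month)
```

Any keyword that appears in the result has moved or dropped out of the listings and is a candidate for corrective action.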


AllTheWeb
  How It Works: FAST Technology - proprietary indexing technology
  Notes: Large index of the Web; has good wireless-compatible indexes and good multimedia.

AltaVista
  How It Works:
    AltaVista Crawler
    Overture Index - Pay Listings
  Notes: One of the oldest on the Web; has news, shopping and media search. Google gives a referral link to AltaVista. Was owned by Digital, now owned by CMGI.

AOL Search
  How It Works:
    Open Directory - Main Results
    Google Crawler - replacing Inktomi Summer 02
  Notes: Huge member base who search this engine for AOL content as well as Web content. The external version does not list AOL content.

Ask Jeeves
  How It Works:
    ASK Editors - Ask links from own editors; front end accepts "natural language" questions
    Teoma Index - Numbered listings from Teoma (formerly Direct Hit)
  Notes: Has referral link at Teoma, Netscape and Yahoo.

Direct Hit
  How It Works: (Now Teoma)

DogPile
  How It Works: Meta Crawler - Searches Multiple Search Engines
  Notes: This engine searches search engines. Sometimes this is called an all-in-one search engine.

Excite
  How It Works:
    Excite Crawler - Main Results
    Overture Directory - Directory Listings
  Notes: Main results at WebCrawler; has referral link at Google.

FAST
  How It Works: FAST Technology - FAST proprietary technology. Resells to other engines.
  Notes: Main results at AllTheWeb as well as some of the results at Lycos. Some investment relationships between these three companies. Launched May 1999.

HotBot (owned by Lycos)
  How It Works:
    Teoma Index - Main Results
    Inktomi Crawler - Secondary Results
    Open Directory - Directory Listings
  Notes: Has referral links at Netscape and Yahoo. Launched in May 1996, purchased by Lycos in October 1998.

Inktomi
  How It Works: Inktomi Crawler - Inktomi technology
  Notes: Came from a UC Berkeley internal tool. Powers other search engines and is tuned to each one. There is no way to access or tune to Inktomi except through its partner search engines. Responsible for main results at iWon; secondary results at AOL, HotBot, GoTo, Overture, MSN. Formerly was the engine behind Yahoo.

iWon
  How It Works:
    Inktomi Crawler - Main Results
    Overture Directory - Directory Listings (pay for clicks)
    Teoma Index - Top 10 Results

Google
  How It Works:
    Google Crawler - Main Results, proprietary
    Open Directory - Directory Listings
  Notes: Google has crawled the most pages. Its crawler looks for other sites that are linking to your site to weight its search results. Secondary results at Netscape, Yahoo. Has a good image search capability.

GoTo
  How It Works: (See Overture)

LookSmart
  How It Works:
    LookSmart Editors - Main Results
    LookSmart Pay for Clicks - Main Results from advertisers
    Inktomi Crawler - Secondary Results
  Notes: Used to have a single registration fee that would register you with many popular sites. Now is moving to a Pay for Clicks model similar to Overture. Main results at MSN Search. Directory listings at AltaVista, Excite, iWon. Referral link at Netscape.

Lycos
  How It Works:
    Open Directory - Some Main Results
    FAST Technology - Some Main Results
    Teoma Index - Some "Popular" Results
  Notes: One of the early engines; started out with a crawler, then moved to a directory model like Yahoo. Has referral links at Teoma, HotBot, Google and Netscape. Owns HotBot.

MSN Search (owned by Microsoft)
  How It Works:
    MSN Editors - Main Results
    Overture Pay for Clicks - Also Main Results
    Inktomi Crawler - Secondary Results
    Teoma Technology - Top 10 Results
  Notes: Internet Explorer uses MSN as its default search engine.

Netscape Search (owned by AOL)
  How It Works:
    Open Directory - Main Results
    Netscape Technology - Also Main Results from its database
    Google Crawler - Secondary Results
  Notes: Netscape uses Netscape Search as its default search engine. Good at listing "official" Web sites.

Northern Light
  How It Works: Northern Light Crawler - Main Results

Open Directory (owned by AOL)
  How It Works: Open Directory Editors - Main Results
  Notes: This is a free service. Uses volunteer editors to catalog the Web. Main results at AOL Search, Netscape Search, Lycos. Directory listings at Direct Hit, HotBot, Google.

Overture (formerly GoTo)
  How It Works:
    GoTo Pay for Clicks - Main Results from Advertisers
    Inktomi Crawler - Secondary Results
  Notes: Many major search engines list advertisers who pay Overture to be listed at a high ranking. Has referral link on Netscape.

Teoma
  How It Works:
    Teoma Index - has own database
    Open Directory - Directory Listings
  Notes: Formerly called Direct Hit. Owned by Ask Jeeves. Numbered listings at Ask Jeeves; some "Popular" results at Lycos; Top 10 results at iWon, MSN Search.

Yahoo
  How It Works:
    Yahoo Editors - Main Results
    Google Crawler - Secondary Results
  Notes: Most popular Web search service. Uses humans to create a directory of the Web. Has lots of portal services a la AOL. Google referral link.

WebCrawler
  How It Works: Excite Crawler - Main Results
  Notes: Owned by Excite.


Search Engine Watch:



OMIX, Inc.