Can search engines reach all of the pages on the web?
The content of the Internet is constantly changing. Search engines continually crawl the web, indexing publicly available pages. New pages are added and old pages are deleted or updated every day. Exactly when a new or revised webpage shows up in a search engine varies from system to system; the gap may be anywhere from a few minutes to several months. While the commercial search engines update their indexes regularly to reflect the new pages they find, no search engine claims to visit all of the pages available on the web. (See "Search Engine Sizes" by Danny Sullivan, http://searchenginewatch.com/reports/article.php/2156481#current.)
Figure: Billions of Textual Documents Indexed (as of Sept 2, 2003). © SearchEngineWatch.com 2003
KEY: GG=Google, ATW=AllTheWeb, INK=Inktomi, TMA=Teoma, AV=AltaVista.
Many pages are also hidden from search engines. These pages, collectively called the 'hidden' or 'invisible' web, may be generated on demand (assembled by a database query) or published on password-protected systems. Some pages are intentionally tagged for robot exclusion by their authors: the author enters special HTML robot exclusion codes that tell the 'crawlers' of search engines to skip the page and leave it out of the search engine index. In addition, not all search engines index PDF and multimedia files.
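The robot exclusion codes mentioned above are commonly written as a `<meta name="robots">` tag in a page's HTML head. As a rough illustration only, the Python sketch below checks whether a page carries such a tag. The class name `RobotsMetaParser`, the function `is_excluded_from_index`, and the sample page are all hypothetical, and real crawlers also consult other signals (such as a site's robots.txt file) that this sketch ignores.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def is_excluded_from_index(html_text):
    """Return True if the page asks crawlers not to index it.

    Assumption: "noindex" (or "none", shorthand for noindex plus
    nofollow) is treated as the exclusion signal.
    """
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return bool(parser.directives & {"noindex", "none"})


# Hypothetical sample pages, for illustration only.
HIDDEN_PAGE = """<html><head>
<title>Private notes</title>
<meta name="robots" content="noindex, nofollow">
</head><body>Not for search engines.</body></html>"""

PUBLIC_PAGE = """<html><head>
<title>Welcome</title>
</head><body>Index me.</body></html>"""

if __name__ == "__main__":
    print(is_excluded_from_index(HIDDEN_PAGE))
    print(is_excluded_from_index(PUBLIC_PAGE))
```

A page tagged this way is still publicly reachable by anyone who has the address; the tag only asks well-behaved crawlers to leave it out of their index.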
How many pages are beyond my reach when using one of the popular search engines?
Estimates of hidden web content vary widely. Bright Planet estimated the hidden web to be up to 550 times the size of the public web. A more conservative guess puts the hidden web at anywhere from 50 million to 100 billion pages. To learn more about this topic, see the IMSA Micro Module: The Invisible Web.
Why would anyone want to search all of the pages on the web?
Consider the importance of a comprehensive search if you are checking for plagiarism, verifying citations, or researching uncommon or unusual topics. The more comprehensive your sources of data, the better your research. The more pages of information you search, the more likely you are to find crucial information about your topic. There are no guarantees, but when it comes to searching, the larger the database of relevant pages, the more likely it is that you'll get solid responses to your queries.
Authored by Dennis O'Connor 2003-2004