Crawl frontier

From Wikipedia, the free encyclopedia

A crawl frontier is a data structure that stores URLs eligible for crawling and supports operations such as adding URLs and selecting the next URL to crawl. It can sometimes be viewed as a priority queue.^[1]
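The two operations named above (adding URLs and selecting one for crawl) can be illustrated with a minimal Python sketch that treats the frontier as a priority queue. The class name `CrawlFrontier`, the method names `add` and `next_url`, the "lower value is crawled first" convention, and the de-duplication set are all assumptions made for illustration, not part of any particular crawler.

```python
import heapq
import itertools


class CrawlFrontier:
    """Hypothetical minimal frontier: a priority queue of URLs with de-duplication."""

    def __init__(self):
        self._heap = []                    # entries are (priority, sequence, url)
        self._seen = set()                 # every URL ever added, to avoid re-queuing
        self._counter = itertools.count()  # tie-breaker that preserves insertion order

    def add(self, url, priority=1.0):
        """Add a URL; lower priority values are selected first.

        Returns False if the URL was already known, True otherwise.
        """
        if url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (priority, next(self._counter), url))
        return True

    def next_url(self):
        """Remove and return the best-ranked URL, or None if the frontier is empty."""
        if not self._heap:
            return None
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)
```

A crawler built on this sketch would call `add` for each link it discovers and `next_url` whenever it is ready to fetch another page. Real frontiers typically layer further concerns on top, such as per-host politeness and persistence to disk.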

Figure: Architecture of a web crawler

A crawl frontier is one of the components that make up the architecture of a web crawler. It contains the logic and policies that a crawler follows when visiting websites, an activity known as crawling.

The policies can include which pages should be visited next, the priority of each page, and how often each page should be revisited.^[citation needed] The efficiency of the crawl frontier is especially important because the Web contains a very large volume of constantly changing data, which is one of the characteristics that make web crawling a challenge.^[2]
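As a rough illustration of how such policies might be encoded, the following Python sketch orders pages by the time they are next due, then by priority, and re-queues a page after each visit when a revisit interval is set. The class `RevisitFrontier`, its methods, and the use of plain numbers for time are hypothetical choices made for this example.

```python
import heapq
import itertools


class RevisitFrontier:
    """Hypothetical frontier ordered by due time, then by priority (lower first)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # unique tie-breaker for equal keys

    def schedule(self, url, due_at, priority=1.0, revisit_after=None):
        """Queue a URL to be crawled at or after due_at.

        revisit_after is the interval between visits; None means visit once.
        """
        entry = (due_at, priority, next(self._counter), url, revisit_after)
        heapq.heappush(self._heap, entry)

    def pop_due(self, now):
        """Return the next URL that is due at time now, or None if nothing is due."""
        if not self._heap or self._heap[0][0] > now:
            return None
        _, priority, _, url, revisit_after = heapq.heappop(self._heap)
        if revisit_after is not None:
            # Revisit policy: put the page back with its next due time.
            self.schedule(url, now + revisit_after, priority, revisit_after)
        return url
```

Passing the current time in as `now` rather than reading a clock inside the class keeps the sketch deterministic. A frequently changing page can be given a short `revisit_after` value, while a static page can be scheduled once.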

