A Novel Focused Crawler

A.C. Tsoi and D. Forsali and M. Gori and M. Hagenbuchner and F. Scarselli

A focused crawler is a crawler that returns web pages relevant to a given topic as it traverses the web. Existing focused crawlers have a number of limitations, in particular a limited ability to ``tunnel'' through lowly ranked pages to reach highly ranked pages on the same topic that may occur further along the search path. In this paper, we introduce a novel focused crawler described by two parameters: degree of relatedness and depth. The degree of relatedness allows the crawler to consider pages that are not necessarily highly ranked, and thus gives it the opportunity to ``tunnel'' through lowly ranked pages. Depth is the number of hops from the seed page to the current page. We show experimentally that, in the proposed crawler, the precision and recall of pages relevant to a particular topic are governed by these two parameters, and that high precision is associated with small values of depth and of degree of relatedness.
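
To make the roles of the two parameters concrete, the following is a minimal sketch of a depth-limited crawl with tunnelling over a toy in-memory link graph. It is not the crawler proposed in the paper. In particular, the abstract does not say how the degree of relatedness is computed or applied, so the sketch assumes one plausible reading: `relatedness` is the number of consecutive low-scoring pages a path may pass through before it is abandoned. The names `focused_crawl`, `links`, `scores`, and `threshold`, as well as the toy data, are hypothetical.

```python
from collections import deque


def focused_crawl(links, scores, seed, max_depth, relatedness, threshold=0.5):
    """Breadth-first crawl over an in-memory link graph (illustrative only).

    links:       dict mapping page -> list of linked pages (toy web)
    scores:      dict mapping page -> topic relevance in [0, 1] (assumed given)
    max_depth:   maximum number of hops from the seed page
    relatedness: ASSUMED meaning: how many consecutive low-scoring pages
                 a path may tunnel through before it is dropped
    threshold:   ASSUMED cut-off separating relevant from low-scoring pages
    """
    relevant = []
    best_low_run = {}  # page -> shortest run of low-scoring pages seen so far
    queue = deque([(seed, 0, 0)])  # (page, depth, low-scoring run before page)

    while queue:
        page, depth, low_run = queue.popleft()

        # A relevant page resets the run; a low-scoring page extends it.
        if scores.get(page, 0.0) >= threshold:
            low_run = 0
        else:
            low_run += 1

        # Abandon paths that have tunnelled too far without a relevant page.
        if low_run > relatedness:
            continue

        # Skip a page already reached by a path that was at least as good.
        if best_low_run.get(page, relatedness + 1) <= low_run:
            continue
        best_low_run[page] = low_run

        if low_run == 0:
            relevant.append(page)

        # Depth bounds how far from the seed the crawler may expand.
        if depth < max_depth:
            for target in links.get(page, []):
                queue.append((target, depth + 1, low_run))

    return relevant


# Toy web: the relevant page "d" is only reachable through the
# low-scoring pages "a" and "c".
links = {"seed": ["a", "b"], "a": ["c"], "b": [], "c": ["d"], "d": []}
scores = {"seed": 0.9, "a": 0.1, "b": 0.8, "c": 0.2, "d": 0.9}
```

On this toy graph, `focused_crawl(links, scores, "seed", max_depth=4, relatedness=2)` reaches `d` by tunnelling through `a` and `c`, whereas setting `relatedness=0` or `max_depth=2` stops the crawl before `d` and returns only `seed` and `b`. Under the assumed reading, larger parameter values widen the crawl, which can surface more relevant pages at the cost of visiting more low-scoring ones.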