Searching Relationships Between Enterprise Websites Using Graph Based Web Crawling
Price
Free (open access)
Volume
42
Pages
9
Page Range
61 - 69
Published
2009
Paper DOI
10.2495/DATA090071
Copyright
WIT Press
Author(s)
R. C. F. De Souza, G. M. Caputo & N. F. F. Ebecken
Abstract
The objective of this paper is to find explicit web relationships using enterprise websites as seeds. Starting from a given seed, we apply a web crawler that follows external links to uncover these relationships hierarchically and to construct a tree weighted by the Jaccard score. The proposed methodology searches outward from the root node, following links, for related enterprises such as potential partners, suppliers, and clients. We crawl each whole site with the Breadth First Search (BFS) algorithm to collect its external links, and we build a tree containing only the external links of interest. The algorithms are implemented with very simple computational components, yet they may produce useful results for analyzing the domain of the sites, their structure, and how they link to one another within their range of activity.
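To make the crawling step concrete, the following is a minimal sketch, not the authors' implementation. It runs BFS over a site's internal pages to collect the hosts it links to externally, then builds a tree of linked enterprises whose edges carry a Jaccard score. Several details are assumptions, because the abstract does not specify them. A hypothetical in-memory link map `WEB` stands in for real HTTP fetching. Each host is treated as one enterprise, with its home page at `http://<host>/`. The Jaccard score is taken to compare the sets of external hosts linked from the parent and child sites. The names `external_domains`, `jaccard`, `build_tree`, and `max_depth` are illustrative.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory web (assumption): page URL -> hyperlinks on that page.
# A real crawler would fetch and parse each page's HTML instead.
WEB = {
    "http://acme.com/": ["http://acme.com/about", "http://supplier.com/"],
    "http://acme.com/about": ["http://client.com/", "http://acme.com/"],
    "http://supplier.com/": ["http://acme.com/", "http://client.com/"],
    "http://client.com/": ["http://supplier.com/"],
}


def host(url):
    """Treat the URL's host as the identity of an enterprise (assumption)."""
    return urlparse(url).netloc


def external_domains(seed_url, web):
    """BFS over one site's internal pages; return the set of external hosts it links to."""
    site = host(seed_url)
    seen = {seed_url}
    queue = deque([seed_url])
    external = set()
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if host(link) == site:
                if link not in seen:  # enqueue each internal page only once
                    seen.add(link)
                    queue.append(link)
            else:
                external.add(host(link))  # keep only links leaving the site
    return external


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_tree(seed_url, web, max_depth=2):
    """Return edges (parent, child, weight) of a tree rooted at the seed's host."""
    root = host(seed_url)
    links = {root: external_domains(seed_url, web)}
    visited = {root}
    edges = []
    queue = deque([(root, 0)])
    while queue:
        parent, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child in sorted(links[parent]):
            if child in visited:  # each host joins the tree at most once
                continue
            visited.add(child)
            # Assumed home page location for the linked enterprise.
            links[child] = external_domains(f"http://{child}/", web)
            edges.append((parent, child, jaccard(links[parent], links[child])))
            queue.append((child, depth + 1))
    return edges


if __name__ == "__main__":
    for parent, child, weight in build_tree("http://acme.com/", WEB):
        print(f"{parent} -> {child}: {weight:.3f}")
```

Sharing a single `visited` set across the whole run ensures that each host appears only once. This is what turns the cyclic link graph into a tree. A production crawler would also need request throttling, `robots.txt` handling, and URL normalization, none of which are modeled here.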
Keywords
link analysis, BFS algorithm, web crawling.