intelligent internet agents

Breadth First or Depth First.
August 10, 2009, 8:53 pm
Filed under: crawl, strategy

Unless you can cache pages, a breadth-first crawl is more expensive than a depth-first one. In a pure HTML environment you would normally crawl breadth first, but in a dynamic page environment that costs more, because you must always be in the context of the page that contains a link before you can follow it.

A breadth-first crawl requires a constant rollback to the page that holds the current position in the FIFO list of the breadth-first search tree.

Root -> Link a - current page Root
Root -> Link b - current page Root
Root -> Link c - current page Root
Click Link a
Link a -> Link d - current page a
Link a -> Link e - current page a
Link a -> Link f - current page a
Go to Root
Click Link b

The deeper the tree gets, the more rollback needs to be done: each time, the crawler has to go from the root to wherever the current tree position is.
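
To put rough numbers on this, here is a minimal sketch that counts page loads for both strategies on the example tree above. It rests on assumptions that are mine, not measurements: there is no cache and no working "back" step, following a link costs one load and is only possible from the page that contains it, and getting back to that page means loading the root again and replaying the clicks down to it. The names crawl_cost and path_from_root are hypothetical helpers, and pages b and c are assumed to have no links of their own.

```python
from collections import deque


def path_from_root(parent, page):
    """Return the chain of pages from the root down to `page`."""
    path = []
    while page is not None:
        path.append(page)
        page = parent[page]
    return path[::-1]


def crawl_cost(tree, root, breadth_first):
    """Count the page loads needed to visit every page once, uncached."""
    # Map each page to the page that contains the link to it.
    parent = {root: None}
    for page, links in tree.items():
        for link in links:
            parent[link] = page

    pending = deque([root])
    current = None  # page the crawler is on right now
    loads = 0
    while pending:
        # FIFO queue for breadth first, LIFO stack for depth first.
        page = pending.popleft() if breadth_first else pending.pop()
        if current is not None and parent[page] == current:
            # Already on the page that holds the link: one click.
            loads += 1
        else:
            # Rollback: load the root, then click down the whole path.
            loads += len(path_from_root(parent, page))
        current = page
        links = tree.get(page, [])
        # Reverse for the stack so links are still followed in page order.
        pending.extend(links if breadth_first else reversed(links))
    return loads


# The tree from the trace above; b and c are assumed to be leaves.
tree = {"root": ["a", "b", "c"], "a": ["d", "e", "f"]}

print("breadth first:", crawl_cost(tree, "root", breadth_first=True))
print("depth first:", crawl_cost(tree, "root", breadth_first=False))
```

Under this cost model the breadth-first crawl needs 15 page loads and the depth-first crawl 13. The gap comes from how often the crawler is already on the right page: depth first follows the first link of every page with a single click, while breadth first has to roll back for almost every link it takes from the queue.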

