Class Crawler
In: lib/analyzer_tools/crawl.rb
Parent: Object

A fast web crawler that stays on the site it started from. Crawler randomly picks a URL from the retrieved page and follows it. If it can't find a URL for the next page, Crawler starts over from the start URL.

Crawler is multi-threaded and can run as many threads as you choose.
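To make the walk concrete, here is a minimal, single-threaded Ruby sketch. It assumes the site has already been reduced to an in-memory hash that maps each URL to the links on that page. The names random_walk and site and the rng parameter are hypothetical and not part of Crawler, which fetches and parses real pages.

```ruby
# Hypothetical sketch of Crawler's random walk over an in-memory "site".
# `site` maps each URL to the URLs linked from that page; a real crawler
# would fetch and parse each page instead.
def random_walk(site, start_url, steps, rng: Random.new)
  visited = []
  url = start_url
  steps.times do
    visited << url
    links = site.fetch(url, [])
    # Follow a random link; with no links, start over from the start URL.
    url = links.empty? ? start_url : links.sample(random: rng)
  end
  visited
end

site = {
  "http://example.com/"  => ["http://example.com/a", "http://example.com/b"],
  "http://example.com/a" => ["http://example.com/b"],
  "http://example.com/b" => [] # dead end: the walk restarts from here
}

p random_walk(site, "http://example.com/", 5, rng: Random.new(42))
```

Passing a seeded Random makes a walk reproducible, which is convenient when experimenting with the idea.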

Methods

do_request   extract_url_from   new   run   stop   time   timed_request  

Attributes

times  [R]  Array of response times in seconds.

Public Class methods

new

Creates a new Crawler that will start at start_url and crawl with the number of concurrent threads given by threads.

Public Instance methods

do_request

Requests url and returns the response body.
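A possible implementation of do_request, sketched with Ruby's standard Net::HTTP client. This is an assumption: the documentation does not say which HTTP library Crawler uses. The small TCPServer-based server exists only so the example runs offline.

```ruby
require "net/http"
require "socket"
require "uri"

# Hypothetical sketch of do_request using Ruby's standard Net::HTTP client;
# the real Crawler may use a different HTTP library.
def do_request(url)
  Net::HTTP.get(URI.parse(url)) # returns the response body as a String
end

# One-shot local server so the example runs without network access.
server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  # Skip the request line and headers, which end with a blank line.
  while (line = client.gets) && line != "\r\n"
  end
  page = %(<a href="/next">next</a>)
  client.write("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n" \
               "Content-Length: #{page.bytesize}\r\nConnection: close\r\n\r\n#{page}")
  client.close
end

result = do_request("http://127.0.0.1:#{port}/")
server_thread.join
server.close
puts result
```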

extract_url_from

Returns a random URL from body that is on the same site as original_url, using original_url to resolve relative paths. If no valid URL is found, the start URL is returned.
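A minimal sketch of the link-selection logic behind extract_url_from. Several details are assumptions: links are found with a simple href regular expression rather than an HTML parser, "same site" is taken to mean the same scheme, host, and port, and the start URL is passed in as a hypothetical start_url argument instead of being read from the crawler's own state.

```ruby
require "uri"

# Hypothetical sketch of extract_url_from. Assumptions: hrefs are found with a
# regex, "same site" means same scheme/host/port, and start_url is passed in.
def extract_url_from(body, original_url, start_url, rng: Random.new)
  base = URI.parse(original_url)
  candidates = body.scan(/href\s*=\s*["']([^"'#]+)/i).flatten.map do |href|
    begin
      url = URI.join(original_url, href) # resolve relative paths
    rescue URI::Error
      next nil # skip malformed links
    end
    next nil unless url.is_a?(URI::HTTP) # HTTP or HTTPS only (skips mailto: etc.)
    next nil unless [url.scheme, url.host, url.port] == [base.scheme, base.host, base.port]
    url.to_s
  end.compact
  # Fall back to the start URL when no usable link was found.
  candidates.empty? ? start_url : candidates.sample(random: rng)
end

html = <<~HTML
  <a href="/about">About</a>
  <a href="contact.html">Contact</a>
  <a href="http://other.example.org/">Elsewhere</a>
  <a href="mailto:me@example.com">Mail</a>
HTML

url = extract_url_from(html, "http://example.com/docs/index.html", "http://example.com/")
puts url # one of http://example.com/about or http://example.com/docs/contact.html
```

The off-site and mailto: links are discarded, and the relative contact.html is resolved against the directory of the page it appeared on.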

run

Begins crawling.

stop

Stops crawling.
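One way run and stop could coordinate worker threads, as a hypothetical sketch. ThreadedLoop and its step block are illustrative names; the block stands in for one request-and-follow iteration. This sketch uses a shared flag for cooperative shutdown, so workers exit after finishing their current step; the real stop may instead kill its threads.

```ruby
# Hypothetical sketch of run/stop thread management; not Crawler's actual code.
class ThreadedLoop
  def initialize(thread_count, &step)
    raise ArgumentError, "thread_count must be at least 1" if thread_count < 1
    @thread_count = thread_count
    @step = step
    @running = false
  end

  # Starts thread_count workers, each calling step repeatedly, and blocks
  # until every worker has exited.
  def run
    @running = true
    threads = Array.new(@thread_count) do
      Thread.new { @step.call while @running }
    end
    threads.each(&:join)
  end

  # Asks the workers to exit after their current step; run then returns.
  def stop
    @running = false
  end
end

count = 0
lock = Mutex.new
worker = ThreadedLoop.new(3) do
  lock.synchronize { count += 1 } # stand-in for one request
  sleep 0.01
end

runner = Thread.new { worker.run }
sleep 0.01 until lock.synchronize { count } >= 10
worker.stop
runner.join
puts "steps performed: #{lock.synchronize { count }}"
```

Waiting until some steps have run before calling stop guarantees that run has already raised the flag, so the stop request cannot be overwritten.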

time

Returns the amount of time taken to execute the given block.
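A minimal sketch of a block-timing helper like time. Using the monotonic clock is a choice made here because it is unaffected by wall-clock adjustments; the original may measure with Time.now instead.

```ruby
# Hypothetical sketch of the time helper: returns how long the given block
# took, in seconds, measured with the monotonic clock. The block's own
# return value is discarded.
def time
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

elapsed = time { sleep 0.05 }
puts format("%.3f s", elapsed)
```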

timed_request

Requests url, appends the time taken to times, and returns the response body.
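Putting the pieces together, here is a hypothetical sketch of timed_request: it times the request, appends the duration to times, and returns the body. TimedFetcher is an illustrative name, do_request is stubbed so no network is needed, and the mutex around times is a precaution of this sketch because several crawler threads may record times concurrently.

```ruby
# Hypothetical sketch of timed_request; do_request is a stub, not a real fetch.
class TimedFetcher
  attr_reader :times # response times in seconds

  def initialize
    @times = []
    @lock = Mutex.new
  end

  def timed_request(url)
    body = nil
    elapsed = time { body = do_request(url) }
    @lock.synchronize { @times << elapsed } # safe when threads share times
    body
  end

  private

  # Seconds taken by the block, measured with the monotonic clock.
  def time
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end

  # Stand-in for a real HTTP request.
  def do_request(url)
    sleep 0.01
    "<html>#{url}</html>"
  end
end

fetcher = TimedFetcher.new
body = fetcher.timed_request("http://example.com/")
puts body
p fetcher.times
```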
