Tarantula is a Java Web crawler. It is multithreaded, scalable, high-performance, extensible, and polite; it can be used to crawl and index any Web or enterprise domain and is configurable thro...
If you're running a Web site, then the last thing you want is to
have a broken link. Broken links look bad, frustrate users, and
confuse search engines. Even when links aren't broken, you can have
pages that contain bad HTML, or server-side programs that fail when
you enter data into them. If these problems matter to you, take a
look at Tarantula,
a Rails plugin that executes a number of simple tasks against your
Web site, producing a detailed report (in HTML, of course)
describing the URLs that it crawled, and the responses it received
from each URL.
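
To make the crawl-and-report idea concrete, here is a minimal Ruby sketch. It is not Tarantula's own code or API. It assumes a hypothetical in-memory site (the `SITE` hash) instead of a running Rails application, and the helper names `fetch`, `crawl`, and `report` are invented for illustration. It follows links breadth-first, records the status returned for each URL, and renders the results as a small HTML table.

```ruby
# Hypothetical in-memory "site": path => [HTTP status, HTML body].
# A real crawler would send requests to the running application instead.
SITE = {
  "/"       => [200, '<a href="/about">About</a> <a href="/missing">Gone</a>'],
  "/about"  => [200, '<a href="/">Home</a> <a href="/signup">Sign up</a>'],
  "/signup" => [500, "Internal Server Error"]
}.freeze

# Stand-in for an HTTP GET (an assumption for this sketch).
# Paths that are not in SITE answer with a 404.
def fetch(path)
  SITE.fetch(path) { [404, "Not Found"] }
end

# Breadth-first crawl starting at `start`, following site-relative links
# found in href attributes. Returns [path, status, referrer] triples in
# the order the pages were visited.
def crawl(start = "/")
  queue   = [[start, nil]]
  seen    = { start => true }
  results = []

  until queue.empty?
    path, referrer = queue.shift
    status, body = fetch(path)
    results << [path, status, referrer]

    # Only successful pages are scanned for further links.
    next unless status == 200

    body.scan(/href="(\/[^"]*)"/).flatten.each do |link|
      next if seen[link]
      seen[link] = true
      queue << [link, path]
    end
  end

  results
end

# Render the crawl results as a minimal HTML table.
def report(results)
  rows = results.map do |path, status, referrer|
    "<tr><td>#{path}</td><td>#{status}</td><td>#{referrer || '-'}</td></tr>"
  end
  header = "<tr><th>URL</th><th>Status</th><th>Linked from</th></tr>"
  "<table>\n#{header}\n#{rows.join("\n")}\n</table>"
end
```

Calling `puts report(crawl)` prints one table row per URL visited. In this toy site, the broken link to `/missing` shows up with a 404 and the failing `/signup` page with a 500, each alongside the page that linked to it.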