TestLinks

TestLinks is a very simple Python program that checks whether the hyperlinks in an (X)HTML document point to resources that can actually be accessed. It can test the links in a locally stored document, or in a remote document accessible via http, ftp or gopher (using Python's urllib). It tests links to other documents, whether their addresses are absolute or relative, but not mailto links or links to tags in the same document which have been given a name or id. It may support these in future. TestLinks caches the results of its tests, so if one document contains many links to the same URL, only one test of that URL will be run per instance of TestLinks. It is easy to get TestLinks to be completely silent until broken links are found, making it ideal to run as a cron job.
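
The link handling and caching described above can be sketched roughly as follows. This is a minimal illustration and not TestLinks' actual source: it uses Python 3's html.parser and urllib.parse rather than htmllib, and the names LinkCollector, is_testable, check_links and probe are hypothetical. The probe callback stands in for whatever code really fetches a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (line_number, href) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                line, _column = self.getpos()
                self.links.append((line, value))


def is_testable(href):
    # Same-document anchors and mailto links are skipped, not tested.
    return not href.startswith("#") and not href.lower().startswith("mailto:")


def check_links(html_text, base_url, probe, cache=None):
    """Return a list of (line, absolute_url, ok) tuples.

    probe(url) -> bool is an assumed callback; it is called at most
    once per distinct URL because its results are kept in `cache`.
    """
    cache = {} if cache is None else cache
    collector = LinkCollector()
    collector.feed(html_text)
    results = []
    for line, href in collector.links:
        if not is_testable(href):
            continue
        url = urljoin(base_url, href)  # resolve relative addresses
        if url not in cache:
            cache[url] = probe(url)
        results.append((line, url, cache[url]))
    return results
```

Passing the same cache dictionary to several check_links calls would share results between pages within one run.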

Requirements

TestLinks depends only upon Python. It is known to work with Python 2.4 and should work with any version that provides the htmllib and urllib modules. In practice this means Python 2, since htmllib was removed in Python 3. It should work on any operating system that Python runs on.

License

TestLinks is distributed under a standard 3-clause BSD license. It's as free as software gets.

Download

Download the latest version of TestLinks, TestLinks 0.1 (released September 8, 2007).

Complete Instructions

Note: All the examples here assume that you have renamed the file you've downloaded to testlinks, put it somewhere in your $PATH and made it executable with chmod +x, and that /usr/bin/env python will start a Python interpreter on your system (or that you've done the equivalent things on Windows).

Using TestLinks is exceedingly easy. The simplest way you can use it is like this:

$ testlinks http://www.luke.maurits.id.au./links.html

This will download my links page and check all the links in it. It should produce output similar to this:

links.html:31 Testing "http://www.luke.maurits.id.au/"... OK!
links.html:32 Testing "http://www.luke.maurits.id.au/software/"... OK!
links.html:33 Testing "http://www.luke.maurits.id.au/programming/"... OK!
links.html:34 Testing "http://www.luke.maurits.id.au/unix/"... OK!
links.html:35 Testing "http://www.luke.maurits.id.au/maths/"... OK!
links.html:36 Testing "http://www.luke.maurits.id.au/blog/"... OK!
links.html:37 Testing "http://www.luke.maurits.id.au/feeds.html"... OK!
links.html:38 Testing "http://www.luke.maurits.id.au/links.html"... OK!

...and so on, until all the links are done. If any bad links are found, you'll see output like this:

links.html:42 Testing "http://www.not.a.real.link.com"... BROKEN! <==---

All this output is useful for confirming that TestLinks is actually working, but once you're sure that it is, you will probably want to run it in "silent mode" by using the -s option, i.e. run:

$ testlinks -s http://www.luke.maurits.id.au./links.html

With this option, there will be no output unless broken links are found. This setting was designed with cron in mind. You can set a cron job to test the links on your page every hour or so and, since cron typically mails a job's output to its owner, you'll get an email only when broken links are found.
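
One way to picture silent mode is as a filter on the report lines. The sketch below is an assumption about the behaviour, not TestLinks' own code: it supposes that silent mode prints only the lines for broken links, and report_lines is a hypothetical name. The line format copies the sample output shown earlier.

```python
def report_lines(results, page_name, silent=False):
    """Build report lines from an iterable of (line, url, ok) tuples."""
    lines = []
    for line, url, ok in results:
        if ok and silent:
            continue  # assumed: working links produce no output in silent mode
        status = "OK!" if ok else "BROKEN! <==---"
        lines.append('%s:%d Testing "%s"... %s' % (page_name, line, url, status))
    return lines
```

If every link works, the list is empty in silent mode, so nothing is printed and cron has nothing to mail.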

That's pretty much it. The only other thing to note is that you can supply a list of several pages to check, like this:

$ testlinks page1.html page2.html page3.html

and TestLinks will test each page in turn. If it can't test one page for any reason, it will move to the next one on the list until each page has been attempted.
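
The "move to the next one" behaviour amounts to catching the failure for one page and carrying on with the rest. A minimal sketch, assuming a hypothetical check_pages function and a test_page callback that raises an exception when a page cannot be read (the error message is invented for illustration):

```python
import sys


def check_pages(pages, test_page, err=None):
    """Call test_page on every page; return the pages that could not be tested."""
    err = sys.stderr if err is None else err
    skipped = []
    for page in pages:
        try:
            test_page(page)
        except Exception as exc:
            # Deliberately broad: any failure on one page must not stop the others.
            print("Could not test %s: %s" % (page, exc), file=err)
            skipped.append(page)
    return skipped
```

Returning the skipped pages lets the caller decide whether an untestable page should count as a failure.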

You can get TestLinks to output a brief summary of the above with the -h option:

$ testlinks -h
TestLinks 0.1
Copyright Luke Maurits, 2007
See http://www.luke.maurits/software/testlinks/ for latest version

usage: testlinks [-s] [-h] file1 file2 file3...
-h prints this usage message.
-s specifies 'silent mode': no output if no broken links are found.