Introducing “SiteTraverser”

SiteTraverser is a JavaScript class¹ that you can use to create bookmarklets which crawl websites looking for the presence (or absence) of certain features.
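To make the idea of "crawling" more concrete, here is a minimal, hypothetical sketch of a same-site crawl loop. It is not SiteTraverser’s actual code. The names crawl, getSource and maxPages are invented for illustration, and fetching is abstracted into an assumed synchronous getSource(url) function (returning the page’s HTML, or null) so the loop can be shown without network code.

```javascript
// Hypothetical sketch of a same-site crawl loop; NOT SiteTraverser's code.
// Assumptions: getSource(url) synchronously returns a page's HTML, or null
// if it can't be fetched; check(source, url) is a user-supplied test like the
// one in the bookmarklet example; maxPages caps how many pages are checked.
function crawl(startUrl, getSource, check, maxPages) {
  var queue = [startUrl],          // pages waiting to be visited (breadth-first)
      seen = Object.create(null),  // every URL ever queued, to avoid revisits
      results = [];

  seen[startUrl] = true;

  while (queue.length > 0 && results.length < maxPages) {
    var url = queue.shift(),
        source = getSource(url);

    if (source === null) {
      continue; // unreachable page: skip it
    }

    results.push({ url: url, result: check(source, url) });

    // Follow only root-relative links (href="/..." but not "//host/...") so the
    // crawl stays on the current site. Anything after "#" is dropped, and
    // relative links such as "page.html" are ignored in this sketch.
    var linkPattern = /<a\b[^>]*\shref\s*=\s*["'](\/(?!\/)[^"'#]*)/gi,
        match;

    while ((match = linkPattern.exec(source)) !== null) {
      var link = match[1];
      if (!seen[link]) {
        seen[link] = true;
        queue.push(link);
      }
    }
  }

  return results;
}
```

In a real bookmarklet, fetching pages would be asynchronous (XMLHttpRequest or fetch), so the loop would be driven by callbacks or promises rather than a plain while loop; the queue-plus-visited-set structure stays the same.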

Here’s an example bookmarklet that uses SiteTraverser:

All it does is list the titles of some pages on this site. The code behind the bookmarklet is quite simple:

var s = document.createElement('script'),
    t = setInterval(function(){
 
        if ( window.SiteTraverser ) {
 
            // Script is loaded
 
            clearInterval(t);
            s.parentNode.removeChild(s);
 
            // Let's start traversing
            new SiteTraverser({
 
                // Just for this demo, we're specifying the URLs.
                // Normally, SiteTraverser would just crawl in all directions.
                urls: ['/foo/1.html','/foo/2.html','/foo/3.html'],
 
                check: function(source, url){
 
                    var titleMatch;
 
                    // Capture the text between <title ...> and </title>, even across line breaks
                    if ( (titleMatch = source.match(/<title[^>]*>([\s\S]+?)<\/title>/i)) ) {
                        return this.success(
                            "Title found: " + '"' + titleMatch[1] + '"',
                            "URL: " + url
                        );
                    }
 
                    return this.failure("No title :(");
 
                }
            }).go();
 
        }
 
    }, 100);
 
s.src = 'http://qd9.co.uk/projects/SiteTraverser/sitetraverser.js';
document.body.appendChild(s);

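First, we load sitetraverser.js by appending a script element to the page, then poll every 100ms until window.SiteTraverser exists. Once it does, we remove the script element, instantiate the traverser (new SiteTraverser()) and run it (.go()).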
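One side effect of the polling loop above is that it runs forever if sitetraverser.js never loads (for example, if the server is unreachable). Below is a sketch of the same polling idea wrapped in a hypothetical waitFor helper that also gives up after a timeout. The helper, its name and its parameters are illustrative assumptions rather than part of SiteTraverser, and it relies on Promise, which very old browsers lack.

```javascript
// Hypothetical helper, not part of SiteTraverser: calls predicate() every
// intervalMs milliseconds (like the bookmarklet's setInterval loop) and
// resolves once it returns a truthy value, or rejects after timeoutMs.
function waitFor(predicate, intervalMs, timeoutMs) {
  return new Promise(function (resolve, reject) {
    var started = Date.now();
    var timer = setInterval(function () {
      if (predicate()) {
        clearInterval(timer);
        resolve();
      } else if (Date.now() - started >= timeoutMs) {
        clearInterval(timer);
        reject(new Error('Timed out after ' + timeoutMs + 'ms'));
      }
    }, intervalMs);
  });
}

// Browser-only usage inside the bookmarklet (s is the script element):
// waitFor(function () { return !!window.SiteTraverser; }, 100, 10000)
//   .then(function () {
//     s.parentNode.removeChild(s);
//     new SiteTraverser({ /* options as before */ }).go();
//   })
//   .catch(function (err) { alert('Could not load SiteTraverser: ' + err.message); });
```

Listening for the script element’s load event is another common way to avoid polling altogether; the timeout-based version shown here simply keeps the original structure of the bookmarklet.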

More complex checks are possible, too. For example, you could crawl a site looking for empty image sources (an empty src can make some browsers request the page a second time), unclosed tags, or instances of inline JavaScript or CSS. You aren’t limited to string operations on the source, either: if you wanted, you could build a DOM structure from the source and run wild!
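As an illustration, here is a hypothetical check function covering three of the checks mentioned above: empty image sources, inline event handlers and script blocks, and inline CSS. It is modelled on the title check in the bookmarklet and assumes, as that example implies, that SiteTraverser calls check with this bound to the traverser so that this.success() and this.failure() are available and accept message strings. Being regex-based, it is a rough heuristic rather than a real HTML parser.

```javascript
// Hypothetical check function for SiteTraverser's `check` option, modelled on
// the bookmarklet example. Assumes (as that example does) that this.success()
// and this.failure() exist and accept message strings.
// Regex-based HTML inspection is approximate: comments, scripts containing
// markup, or unusual attribute layouts can produce false hits or misses.
function checkForProblems(source, url) {
  var problems = [],
      found;

  // <img> tags with a src attribute that is empty or whitespace-only
  found = source.match(/<img\b[^>]*\ssrc\s*=\s*(["'])\s*\1/gi);
  if (found) problems.push(found.length + ' <img> tag(s) with an empty src');

  // Opening tags carrying inline event handlers such as onclick="..."
  found = source.match(/<[a-z][^>]*\son[a-z]+\s*=/gi);
  if (found) problems.push(found.length + ' tag(s) with inline event handlers');

  // <script> elements without a src attribute, i.e. inline JavaScript
  found = source.match(/<script\b(?![^>]*\ssrc\s*=)[^>]*>/gi);
  if (found) problems.push(found.length + ' inline <script> block(s)');

  // <style> blocks and style="..." attributes, i.e. inline CSS
  found = source.match(/<style\b|\sstyle\s*=/gi);
  if (found) problems.push(found.length + ' inline style(s)');

  if (problems.length > 0) {
    return this.failure(problems.join('; ') + ' (URL: ' + url + ')');
  }
  return this.success('No empty image sources or inline JS/CSS found', 'URL: ' + url);
}
```

In the bookmarklet, this would be passed as check: checkForProblems in place of the title check. For anything more precise (such as detecting unclosed tags), parsing the source into a DOM is the sturdier route.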

More information about SiteTraverser is available on GitHub.

¹ I was quite reluctant to call it a “class”, since JavaScript doesn’t support classes as they’re commonly known. However, it appears to be the best-fit term in this situation.

Thanks for reading! Please share your thoughts with me on Twitter. Have a great day!