SiteTraverser is a JavaScript class[1] which you can use to create bookmarklets that crawl websites looking for the presence or lack of certain features.
Here’s an example that utilises SiteTraverser (give it a click!):
All it does is list the titles of some pages on this site. The code behind the bookmarklet is quite simple:
var s = document.createElement('script'),
    t = setInterval(function(){
        if ( window.SiteTraverser ) {
            // Script is loaded
            clearInterval(t);
            s.parentNode.removeChild(s);
            // Let's start traversing
            new SiteTraverser({
                // Just for this demo, we're specifying the URLs.
                // Normally, SiteTraverser would just crawl in all directions.
                urls: ['/foo/1.html','/foo/2.html','/foo/3.html'],
                check: function(source, url){
                    var titleMatch;
                    // Capture the text between <title> and the next closing tag
                    if ( titleMatch = source.match(/<title>(.+?)<\//) ) {
                        return this.success(
                            "Title found: " + '"' + titleMatch[1] + '"',
                            "URL: " + url
                        );
                    }
                    return this.failure("No title :(");
                }
            }).go();
        }
    }, 100);
s.src = 'http://qd9.co.uk/projects/SiteTraverser/sitetraverser.js';
document.body.appendChild(s);
First, we load sitetraverser.js, and then we continue to instantiate (new SiteTraverser()) and run it (.go()).
More complex checks can be performed. For example, you could crawl a site looking for empty image sources (why you’d want to do this), or perhaps look for unclosed tags or instances of inline JavaScript or CSS. You could do a whole bunch of things, actually, and you’re not limited to string operations on the source; if you wanted, you could create a DOM structure from the source and run wild!
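As a rough sketch of one such check, here is what looking for empty image sources might look like. The function name checkEmptyImageSources is made up for illustration, and it assumes the same (source, url) arguments and this.success / this.failure helpers used in the demo; how SiteTraverser treats their return values is also an assumption.

```javascript
// Hypothetical check function: reports <img> tags whose src attribute is empty.
// It is meant to be passed as the `check` option, so `this` is assumed to
// provide success() and failure() as in the title demo.
function checkEmptyImageSources(source, url) {
    // Matches <img ... src=""> or <img ... src=''> (whitespace around "=" allowed).
    // This is a plain string test, so it will miss unquoted or unusual markup.
    var emptySrc = /<img\b[^>]*\ssrc\s*=\s*(?:""|'')[^>]*>/gi,
        matches = source.match(emptySrc);

    if ( matches ) {
        return this.failure(matches.length + " empty image source(s) found at " + url);
    }
    return this.success("No empty image sources", "URL: " + url);
}
```

You would then pass it in place of the title check, i.e. check: checkEmptyImageSources.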
More information about SiteTraverser can be viewed on GitHub:
[1] I was quite reluctant to call it a “class”, since JavaScript doesn’t support classes as they’re commonly known. However, it appears to be the best-fit term in this situation.
It’s perfect!! But why isn’t the “go” method run in the constructor? Maybe it would look a little better.
Most excellent.
My common use case for something like this is to help identify all of the A tags with only a # href.
It’s very common for developers to use those with the faulty expectation of fixing them later. 🙂
Very nice!
Drag & drop doesn’t work on Mac OS X, since CTRL + click is just like a right click on a Mac.
@Supersha, thanks. The go() method is there to give you control over when the crawling begins. In some situations you’ll want to instantiate early but not start until a bit later. Plus, it allows you to go instance.go().stop().go() 😀
@Paul, let me know if you end up using it. 🙂
@mat, Ahh, forgot about Macs. I’ve changed it a bit, so it no longer uses CTRL to initiate dragging – you can now click anywhere on the top or edge of the box to drag it, without needing to press any keys.