A while ago I posted a ‘highlight’ script that could be used to highlight certain matches within a document. It uses a regular expression to replace the innerHTML property of the specified container. Since then, because of this comment and various other things I’ve read, I’ve come to realize that it’s just not a solid solution. A regex run over raw HTML can match inside tags and attributes, and rewriting innerHTML throws away the existing nodes along with any event handlers attached to them, so it doesn’t cut it for realistically complex websites.
The only viable solution is to progressively walk the DOM tree, stopping only at text nodes (nodeType = 3), and then apply the conventional ‘replace’ to each of those nodes.
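The walk described above can be sketched on its own, separate from the replacing step. This is a minimal sketch that assumes only that nodes expose `nodeType` and `childNodes`. `collectTextNodes` is a hypothetical helper, not part of the original script. Collecting the nodes into an array first, and modifying them only afterwards, means a live `childNodes` list can't change underneath the loop. In modern browsers, `document.createTreeWalker(root, NodeFilter.SHOW_TEXT)` is a built-in way to do the same walk.

```javascript
// Hypothetical helper: depth-first walk that collects text nodes (nodeType 3).
// It only relies on nodeType and childNodes, so it works on real DOM nodes
// and on the plain stand-in objects used in the usage example below.
var TEXT_NODE = 3;
var ELEMENT_NODE = 1;

function collectTextNodes(node, out) {
    out = out || [];
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        if (child.nodeType === TEXT_NODE) {
            out.push(child);
        } else if (child.nodeType === ELEMENT_NODE) {
            // Recurse into elements; comments (nodeType 8) etc. are ignored.
            collectTextNodes(child, out);
        }
    }
    return out;
}

// Usage with stand-in objects shaped like <p>We ate <b>mango</b>!</p>:
var p = {
    nodeType: ELEMENT_NODE,
    childNodes: [
        { nodeType: TEXT_NODE, data: 'We ate ' },
        { nodeType: ELEMENT_NODE, childNodes: [{ nodeType: TEXT_NODE, data: 'mango' }] },
        { nodeType: TEXT_NODE, data: '!' }
    ]
};
var texts = collectTextNodes(p).map(function (n) { return n.data; });
// texts → ['We ate ', 'mango', '!']
```

In a browser, the equivalent call would be `collectTextNodes(document.body)`.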
UPDATE! This post is quite old (2009!) and I’ve since blogged a bit more definitively on the topic. Please read: “Replacing text in the DOM… solved?”.
The process is as follows:
- Loop through child nodes of target node (container).
- On each iteration, check whether it’s a text node. If it’s an element (and its name isn’t in the ‘excludes’ list), call the function again with that element as the new ‘searchNode’, so the process begins again one level down. If it is a text node, continue.
- Test the node’s text against ‘searchText’. If there’s a match, replace every occurrence using ‘replacement’ (either a string or a callback). If there’s no match, move on to the next node.
- The resulting string, with HTML in it, is injected, via innerHTML, into a newly created DIV element.
- Each child of the DIV element is then added, one by one, to a document fragment.
- The document fragment is inserted before the current node, which is then removed.
- The loop continues until all child nodes of the ‘searchNode’ have been searched.
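Steps 4 to 6 send the text back through innerHTML. Any `<` or `&` that was already in the text node (its `data` is decoded text) would therefore be parsed as markup. Below is a sketch of one alternative that avoids re-parsing. It is an assumption, not the approach the function below uses: split the text into plain and matched segments, then build nodes with `createTextNode`. `splitByMatches`, `segmentsToFragment` and the `tagName` parameter are hypothetical names.

```javascript
// Hypothetical helper: split text into { text, match } segments.
function splitByMatches(text, regex) {
    // Force the global flag so exec() visits every match; copying the RegExp
    // also leaves the caller's lastIndex untouched.
    var flags = regex.flags.indexOf('g') === -1 ? regex.flags + 'g' : regex.flags;
    var re = new RegExp(regex.source, flags);
    var segments = [];
    var last = 0;
    var m;
    while ((m = re.exec(text)) !== null) {
        if (m[0] === '') { re.lastIndex++; continue; } // skip empty matches
        if (m.index > last) {
            segments.push({ text: text.slice(last, m.index), match: false });
        }
        segments.push({ text: m[0], match: true });
        last = m.index + m[0].length;
    }
    if (last < text.length) {
        segments.push({ text: text.slice(last), match: false });
    }
    return segments;
}

// Browser-only part: turn segments into nodes without parsing any HTML.
// Matched segments are wrapped in a new element named by tagName.
function segmentsToFragment(doc, segments, tagName) {
    var frag = doc.createDocumentFragment();
    segments.forEach(function (seg) {
        var textNode = doc.createTextNode(seg.text);
        if (seg.match) {
            var wrap = doc.createElement(tagName);
            wrap.appendChild(textNode);
            frag.appendChild(wrap);
        } else {
            frag.appendChild(textNode);
        }
    });
    return frag;
}
```

For example, `splitByMatches('a <b> and hand', /\band\b/)` yields three segments, and the `<b>` stays literal text once it goes through `createTextNode`. The trade-off is that a replacement can no longer be an arbitrary HTML string.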
Here’s the function itself:
```javascript
function findAndReplace(searchText, replacement, searchNode) {
    if (!searchText || typeof replacement === 'undefined') {
        // Throw error here if you want...
        return;
    }
    var regex = typeof searchText === 'string' ?
                new RegExp(searchText, 'g') : searchText,
        childNodes = (searchNode || document.body).childNodes,
        cnLength = childNodes.length,
        excludes = 'html,head,style,title,link,meta,script,object,iframe';
    // Walk backwards so replacing a node doesn't shift the indexes
    // of the nodes still to be visited.
    while (cnLength--) {
        var currentNode = childNodes[cnLength];
        // Recurse into elements that aren't excluded. Commas on both sides
        // stop 'a' from matching the tail of 'meta'.
        if (currentNode.nodeType === 1 &&
            (',' + excludes + ',').indexOf(',' + currentNode.nodeName.toLowerCase() + ',') === -1) {
            findAndReplace(searchText, replacement, currentNode);
        }
        if (currentNode.nodeType !== 3) {
            continue;
        }
        // test() on a global RegExp is stateful, so start from the beginning.
        regex.lastIndex = 0;
        if (!regex.test(currentNode.data)) {
            continue;
        }
        var parent = currentNode.parentNode,
            frag = (function(){
                var html = currentNode.data.replace(regex, replacement),
                    wrap = document.createElement('div'),
                    frag = document.createDocumentFragment();
                wrap.innerHTML = html;
                while (wrap.firstChild) {
                    frag.appendChild(wrap.firstChild);
                }
                return frag;
            })();
        parent.insertBefore(frag, currentNode);
        parent.removeChild(currentNode);
    }
}
```
No library or framework is required to use this function; it’s entirely stand-alone. The function takes two required parameters and one optional one:
- searchText – This can be either a string or a regular expression; either way, it eventually becomes a RegExp object. If you want to search for the word “and”, that alone isn’t enough, because every word containing “and” would be matched. Instead, use either the string '\\band\\b' (note the doubled backslashes in a JavaScript string literal) or the regular expression /\band\b/g to test for word boundaries. A string is always compiled with the global flag. If you pass a RegExp, remember to add the global flag yourself, otherwise only the first match in each text node is replaced.
- replacement – This parameter is passed directly to the String.replace function, so it can be either a string (using $1, $2, $3 etc. for backreferences) or a function.
- searchNode – This parameter is mainly for internal use, but you can, if you want, specify the node under which the search will take place. By default it’s set to document.body.
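The word-boundary advice is easy to get wrong in a string. The check below shows why: in a JavaScript string literal, `'\b'` is a backspace character (U+0008), not the regex escape. The variable names are only for illustration.

```javascript
// '\b' inside a string literal is a backspace, so the pattern below looks
// for backspace + "and" + backspace and never matches normal text.
var wrong = new RegExp('\band\b', 'g');
// Doubling the backslash passes a real \b (word boundary) to RegExp.
var right = new RegExp('\\band\\b', 'g');
var literal = /\band\b/g; // same pattern as `right`

var sample = 'Sand and handy';
var a = sample.match(wrong);   // null
var b = sample.match(right);   // ['and']
var c = sample.match(literal); // ['and']

// The "i" flag makes the search case-insensitive:
var d = 'And then and again'.match(/\band\b/gi); // ['And', 'and']
```

Because findAndReplace compiles a string with only the "g" flag, a case-insensitive search has to be passed in as a RegExp such as /\band\b/gi.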
A typical use is highlighting search keywords. Here’s how that would work:
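```javascript
// Just an example:
var searchMatch = document.referrer.match(/[?&]q=([^&]+)/),
    // Query strings are URL-encoded, with "+" standing in for spaces.
    searchTerm = searchMatch && decodeURIComponent(searchMatch[1].replace(/\+/g, ' '));
if (searchTerm) {
    // '\\b' in a string literal becomes the regex word boundary \b.
    findAndReplace('\\b' + searchTerm + '\\b', function(term){
        return '<span class="keyword">' + term + '</span>';
    });
}
```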
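In the keyword example, the search term goes straight into a RegExp. A query containing characters like `+`, `(` or `?` would change the pattern or make `new RegExp` throw. Below is a sketch that escapes the term first. `escapeRegExp` and `searchTermFromReferrer` are hypothetical helpers. The second one swaps the hand-written referrer regex for the standard `URL` and `URLSearchParams` APIs, which handle `%XX` decoding and `+` as a space.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash
// so the string is matched literally.
function escapeRegExp(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: read the "q" parameter from a referrer URL.
function searchTermFromReferrer(referrer) {
    try {
        // searchParams.get() returns the decoded value, or null if absent.
        return new URL(referrer).searchParams.get('q');
    } catch (e) {
        return null; // empty or malformed referrer
    }
}

// Usage (in a page, pass document.referrer instead of a fixed URL):
var term = searchTermFromReferrer('https://www.example.com/search?q=c%2B%2B+tips');
// term → 'c++ tips'
var keywordRegex = new RegExp('\\b' + escapeRegExp(term) + '\\b', 'gi');
```

Note that `\b` only fits terms that start and end with a word character. For a term like "c++", the trailing `\b` would need a word character right after the second `+`.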
As I said, a string can be passed as the second parameter and you can use ‘$1, $2 etc.’ for backreferences:
```javascript
findAndReplace('(microsoft|apple|sony)', '<a href="http://$1.com">$1</a>');
```
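The check below shows what String.replace produces for that replacement string on plain text, without touching the DOM. `$1` is filled in with whatever the first capture group matched. A callback receives the whole match first and then each capture group, so the function form here produces the same result.

```javascript
var linkRegex = /(microsoft|apple|sony)/g;
var input = 'Both sony and apple make phones.';

// String form: $1 is the first capture group.
var out = input.replace(linkRegex, '<a href="http://$1.com">$1</a>');
// out → 'Both <a href="http://sony.com">sony</a> and <a href="http://apple.com">apple</a> make phones.'

// Function form: (wholeMatch, group1, ...) → replacement string.
var out2 = input.replace(linkRegex, function (match, name) {
    return '<a href="http://' + name + '.com">' + name + '</a>';
});
```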
You’ll notice that within the function there’s an ‘excludes’ string that contains a comma-separated list of node names to exclude from all searches. You can add to or remove from this list as needed.
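A plain substring search on a comma-joined list can give false positives. For example, `'a,'` occurs inside `'meta,'`, so `<a>` elements would be treated as excluded. Below is a sketch of a stricter check using a `Set`. `EXCLUDED` and `isExcluded` are hypothetical names. With a Set, changing the list is just `EXCLUDED.add('textarea')` or `EXCLUDED.delete('object')`.

```javascript
// Hypothetical exclusion list: whole node names, compared exactly.
var EXCLUDED = new Set(['html', 'head', 'style', 'title', 'link', 'meta',
                        'script', 'object', 'iframe']);

function isExcluded(nodeName) {
    // nodeName is upper-case for HTML elements, so normalise it first.
    return EXCLUDED.has(nodeName.toLowerCase());
}

// The substring approach, for comparison:
var excludes = 'html,head,style,title,link,meta,script,object,iframe';
var substringHit = (excludes + ',').indexOf('a,') !== -1; // true: false positive
var setHit = isExcluded('A');                              // false
```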
Porting over to MooTools or jQuery is quite pointless because neither library offers anything in the way of text node traversal, but feel free to wrap it all up in the respective namespace.
One notable limitation is that the function cannot find text that is split across separate nodes. For example, searching for “pineapple” in the following HTML would not work:
```html
We ate mango, pine<strong>apple</strong> and passion fruit!!
```
I’ve tried to find ways around this but it seems a lost cause.
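Below is a sketch of one possible starting point, not a complete solution. It joins the text of consecutive text nodes, searches the combined string, and maps each match back to the node slices it covers. `findAcrossNodes` is a hypothetical helper, and it handles only the search side. Wrapping each slice, for example with `Text.splitText`, and deciding which element boundaries (such as separate paragraphs) should break a match are left open.

```javascript
// Hypothetical helper: texts is the data of consecutive text nodes.
// Returns each match with the (node index, from, to) slices it spans.
function findAcrossNodes(texts, regex) {
    var combined = texts.join('');
    var flags = regex.flags.indexOf('g') === -1 ? regex.flags + 'g' : regex.flags;
    var re = new RegExp(regex.source, flags);
    var results = [];
    var m;
    while ((m = re.exec(combined)) !== null) {
        if (m[0] === '') { re.lastIndex++; continue; } // skip empty matches
        var start = m.index, end = m.index + m[0].length;
        var parts = [], offset = 0;
        for (var i = 0; i < texts.length; i++) {
            var nodeStart = offset, nodeEnd = offset + texts[i].length;
            // Record the slice of node i that overlaps the match, if any.
            if (nodeEnd > start && nodeStart < end) {
                parts.push({
                    node: i,
                    from: Math.max(start, nodeStart) - nodeStart,
                    to: Math.min(end, nodeEnd) - nodeStart
                });
            }
            offset = nodeEnd;
        }
        results.push({ match: m[0], parts: parts });
    }
    return results;
}

// Text nodes of: We ate mango, pine<strong>apple</strong> and passion fruit!!
var found = findAcrossNodes(['We ate mango, pine', 'apple', ' and passion fruit!!'], /pineapple/);
// found[0].parts → [{ node: 0, from: 14, to: 18 }, { node: 1, from: 0, to: 5 }]
```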
Thanks for reading! Please share your thoughts with me on Twitter. Have a great day!
I’m not sure why (and it may only be Chrome), but occasionally it misses some matches if you search for two terms. Try searching for ‘text’, then search for ‘in’. Most instances of ‘text’ and ‘in’ are highlighted. But try hitting Go! again… now the ‘in’ inside ‘within’ is highlighted where it wasn’t before.
Very nice and useful script though, good work, thanks for sharing.
If you search for “high” it highlights ‘high’ in ‘highlight’
Then if you search for “highlighted” nothing is highlighted.
(I’m using Firefox.)
And then I just read the second to last sentence in your article… Good Work!
Doesn’t work in IE7, unfortunately 🙁
Nice work.
One limitation I found is that it is case-sensitive. This is an easy regex fix, unless it was your intention.
Nice script, but it frequently destroys whitespace in the text block it acts on.