Parsing URLs with the DOM!

19 Feb 2009

I hate the DOM! The API sucks! Don’t you agree?

Regardless, we should definitely take advantage of what we’ve been given. So, if something is built into the DOM it would be silly not to use it, right?

Well, that’s what I believe and that’s why I think it’s okay to parse URLs via this API instead of trying to accomplish it in a language-agnostic manner (using a tonne of expensive string operations).

This short function returns an object containing all possible information you would want to retrieve from a URL:

parseURL

// This function creates a new anchor element and uses location
// properties (inherent) to get the desired URL data. Some String
// operations are used (to normalize results across browsers).
 
function parseURL(url) {
    var a =  document.createElement('a');
    a.href = url;
    return {
        source: url,
        protocol: a.protocol.replace(':',''),
        host: a.hostname,
        port: a.port,
        query: a.search,
        params: (function(){
            var ret = {},
                seg = a.search.replace(/^\?/,'').split('&'),
                len = seg.length, i = 0, s;
            for (;i<len;i++) {
                if (!seg[i]) { continue; }
                s = seg[i].split('=');
                ret[s[0]] = s[1];
            }
            return ret;
        })(),
        file: (a.pathname.match(/\/([^\/?#]+)$/i) || [,''])[1],
        hash: a.hash.replace('#',''),
        path: a.pathname.replace(/^([^\/])/,'/$1'),
        relative: (a.href.match(/tps?:\/\/[^\/]+(.+)/) || [,''])[1],
        segments: a.pathname.replace(/^\//,'').split('/')
    };
}

Usage

var myURL = parseURL('http://abc.com:8080/dir/index.html?id=255&m=hello#top');
 
myURL.file;     // = 'index.html'
myURL.hash;     // = 'top'
myURL.host;     // = 'abc.com'
myURL.query;    // = '?id=255&m=hello'
myURL.params;   // = Object = { id: '255', m: 'hello' }
myURL.path;     // = '/dir/index.html'
myURL.segments; // = Array = ['dir', 'index.html']
myURL.port;     // = '8080'
myURL.protocol; // = 'http'
myURL.source;   // = 'http://abc.com:8080/dir/index.html?id=255&m=hello#top'
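
Note that the values in params are raw strings taken straight from the query string, so nothing is percent-decoded. Here is a minimal sketch of a decoding variant. The decodeParams helper is a hypothetical name, not part of parseURL, and it assumes that '+' should be read as a space:

```javascript
// Hypothetical helper (not part of parseURL): decodes the keys and values
// of a query string such as the one returned in myURL.query.
function decodeParams(query) {
    var ret = {},
        seg = query.replace(/^\?/, '').split('&'),
        i, eq, key, value;
    for (i = 0; i < seg.length; i++) {
        if (!seg[i]) { continue; }
        eq = seg[i].indexOf('=');
        // Split on the first '=' only, so values may themselves contain '='.
        key = eq === -1 ? seg[i] : seg[i].slice(0, eq);
        value = eq === -1 ? '' : seg[i].slice(eq + 1);
        // Assumption: '+' encodes a space, as in form submissions.
        // decodeURIComponent throws a URIError on malformed escapes.
        ret[decodeURIComponent(key.replace(/\+/g, ' '))] =
            decodeURIComponent(value.replace(/\+/g, ' '));
    }
    return ret;
}

var decoded = decodeParams('?id=255&m=hello%20world&eq=a=b');
// decoded.id is '255', decoded.m is 'hello world', decoded.eq is 'a=b'
```

Numeric-looking values such as id stay strings here too, so convert them yourself if you need numbers.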

I’ve tested this solution in all modern browsers (including IE6) and it seems to work perfectly. If you spot any inconsistencies please let me know.

If you don’t feel comfortable using something that relies on the DOM, have a look at Mark Perkins’s jQuery URL Parser plugin, although please note that in my testing it was about 12 times slower than the above solution…
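
For a route that avoids creating an anchor element altogether, here is a minimal sketch built on the standard URL constructor. This swaps in a different technique from the one above: it assumes your environment provides URL, and parseURLNoDOM is a hypothetical name. It aims to return the same shape as parseURL, but I have only checked it against the usage example.

```javascript
// Sketch only: parseURLNoDOM is a hypothetical name, and it assumes the
// environment provides the standard URL constructor.
function parseURLNoDOM(url) {
    // With no base argument, new URL() throws a TypeError for relative URLs
    // instead of resolving them against the current page.
    var u = new URL(url);
    var params = {};
    u.searchParams.forEach(function (value, key) {
        params[key] = value; // decoded strings; the last duplicate key wins
    });
    return {
        source: url,
        protocol: u.protocol.replace(':', ''),
        host: u.hostname,
        port: u.port, // empty string when the scheme's default port is used
        query: u.search,
        params: params,
        file: (u.pathname.match(/\/([^\/?#]+)$/i) || [, ''])[1],
        hash: u.hash.replace('#', ''),
        path: u.pathname,
        relative: u.pathname + u.search + u.hash,
        segments: u.pathname.replace(/^\//, '').split('/')
    };
}

var noDom = parseURLNoDOM('http://abc.com:8080/dir/index.html?id=255&m=hello#top');
// noDom.file is 'index.html', noDom.port is '8080', noDom.params.id is '255'
```

One difference worth knowing: a bare '?q=v' throws here, whereas the anchor-based function fills in the missing parts from the page running the script.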

Thanks for reading! Please share your thoughts with me on Twitter. Have a great day!

So far there's been 16 Responses to
“Parsing URLs with the DOM!”

Vasili February 19th, 2009 at 10:35 pm

12 times slower!? Um yeah, I’ll be using yours if I need a URL parser. 🙂

Shane February 19th, 2009 at 10:47 pm

That is very nice indeed.

Paul Irish February 19th, 2009 at 11:25 pm

I’d be curious to see the speed difference between your clever approach and the regex-fu of parseUri: http://blog.stevenlevithan.com/archives/parseuri

Bill Beckelman February 20th, 2009 at 2:13 am

It looks nice and clean. Looking forward to trying it out.

Graham Bradley February 20th, 2009 at 10:57 am

This is one of those “why didn’t I think of that?” examples. Nicely done.

Joe McCann February 21st, 2009 at 7:02 am

Well done once again.

zhys9 February 22nd, 2009 at 11:57 am

cool
reprint url http://zhys9.com/blog/?p=104

James February 22nd, 2009 at 9:38 pm

Hey, now that is a really cool use of the DOM API… awesome work.

Mark Perkins February 24th, 2009 at 11:39 am

James –

Great work. It never occurred to me that you could use the DOM to parse a URL in this way, and I have to say it is a much neater implementation (and faster too, as you say!) than my plugin. Good stuff! Looks like it’s back to using someone else’s code for my URL parsing 🙂

@Paul Irish – I would imagine that the speed difference would be about 12 times, as Steven’s parseUri function forms the core of my jQuery URL Parser plugin that James’ comparison references.

Elijah Grey February 25th, 2009 at 3:25 am

location.host !== location.hostname (location.host includes any non-default port). You should replace line 11 with this:

host: a.host,
hostname: a.hostname,

Steven L. February 25th, 2009 at 6:01 am

@Mark Perkins, re: @Paul Irish’s question, I suspect “jQuery URL Parser” adds performance overhead on top of parseUri, but in any case, I’m interested in the actual test behind James’s performance assertion. E.g., if the test is done in parseUri’s loose parsing mode, that’s apples to oranges because this function offers no “loose” mode, nor does it do any processing to extract user info from a URL, etc. I expect using the DOM like this to be faster, regardless, but we’re talking about the difference between two fast functions that are unlikely to ever present a real bottleneck.

I played around with this DOM-based approach a couple years ago based on a comment on my parseUri post, but I don’t think this approach can offer all the same functionality (strict vs. loose parsing, all parts being optional, etc.) in comparable code length. Note that this function will incorrectly (IMO) return the protocol, host, port, and path of the page running the script if those segments are not included in the URI passed to the function (e.g., with parseURL("?q=v")).

@James Padolsey, the regex used for the “relative” property currently only works with protocols that end with “tp” (e.g., not “https”).

Sanbor February 25th, 2009 at 7:22 pm

You’re a javascript rockstar ! 😛

James February 26th, 2009 at 8:53 am

Thanks for the comments. 🙂

Like almost everything I write about, this was just a bit of an experiment. Feature-wise it certainly does not match up to Steven’s work.

@Steven, thanks for the correction on the tp/tps thing. For my test I used a typical scenario – wanting to retrieve all the parts of a URL. It’s worth mentioning that I didn’t test against your original version; I tested against Mark’s jQuery port which was very slow in comparison to my above attempt. Although, like you said, the results are invalidated because of the inherent differences of the returned results.

@Mark, Oddly Steven’s original version is faster than yours because it returns all the information as properties while yours requires each bit of info to be initiated as a method in order to retrieve results. Doing this is probably adding overhead.

Paul Irish April 22nd, 2009 at 11:01 pm

I finally got around to benchmarking this against Steven’s parseUri.

Test code:

var start = +new Date, x = 2000;
while(--x){
  parseURL('http://abc.com:8080/dir/index.html?id=255&m=hello#top');
}
console.log('parseURL: ',+new Date - start);
 
 
var start = +new Date, x = 2000;
while(--x){
  parseUri('http://abc.com:8080/dir/index.html?id=255&m=hello#top');
}
console.log('parseUri: ',+new Date - start);

Results
Firefox 3:
parseURL: 541
parseUri: 157

IE 6:
parseURL: 781
parseUri: 312

Chrome 1:
parseURL: 498
parseUri: 242

So the regex is faster, but I’m still quite fond of the DOM approach. :-]

James April 22nd, 2009 at 11:22 pm

Interesting! Thanks for the tests Paul! 🙂

Supersha November 3rd, 2009 at 1:44 pm

Hi James, I found an error in your parseURL method: in IE, the “file” property returns an empty string for a URL like "http://www.jsparadise.cn:8080/index.html?id=123&pages=4".
So I changed the RegExp to this:
"file: (a.pathname.match(/\/?([^\/?#]+)$/i) || [,''])[1]", which solves the problem. ^_^
