I hate the DOM! The API sucks! Don’t you agree?
Regardless, we should definitely take advantage of what we’ve been given. So, if something is built into the DOM it would be silly not to use it, right?
Well, that’s what I believe and that’s why I think it’s okay to parse URLs via this API instead of trying to accomplish it in a language-agnostic manner (using a tonne of expensive string operations).
This short function returns an object containing all possible information you would want to retrieve from a URL:
parseURL
// This function creates a new anchor element and uses location
// properties (inherent) to get the desired URL data. Some String
// operations are used (to normalize results across browsers).

function parseURL(url) {
    var a = document.createElement('a');
    a.href = url; // the browser parses the URL when href is assigned
    return {
        source: url,
        protocol: a.protocol.replace(':',''),
        host: a.hostname,
        port: a.port,
        query: a.search,
        params: (function(){
            // Split '?a=1&b=2' into { a: '1', b: '2' }
            var ret = {},
                seg = a.search.replace(/^\?/,'').split('&'),
                len = seg.length, i = 0, s;
            for (;i<len;i++) {
                if (!seg[i]) { continue; }
                s = seg[i].split('=');
                ret[s[0]] = s[1];
            }
            return ret;
        })(),
        file: (a.pathname.match(/\/([^\/?#]+)$/i) || [,''])[1],
        hash: a.hash.replace('#',''),
        path: a.pathname.replace(/^([^\/])/,'/$1'), // ensure a leading slash
        relative: (a.href.match(/tps?:\/\/[^\/]+(.+)/) || [,''])[1],
        segments: a.pathname.replace(/^\//,'').split('/')
    };
}
Usage
var myURL = parseURL('http://abc.com:8080/dir/index.html?id=255&m=hello#top');

myURL.file;     // = 'index.html'
myURL.hash;     // = 'top'
myURL.host;     // = 'abc.com'
myURL.query;    // = '?id=255&m=hello'
myURL.params;   // = Object = { id: '255', m: 'hello' }
myURL.path;     // = '/dir/index.html'
myURL.segments; // = Array = ['dir', 'index.html']
myURL.port;     // = '8080'
myURL.protocol; // = 'http'
myURL.source;   // = 'http://abc.com:8080/dir/index.html?id=255&m=hello#top'
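The params step is plain string work, so it can be tried without a DOM. Here is a minimal sketch of that step on its own; `parseQuery` is a hypothetical name introduced for illustration, and the logic mirrors the params property of the function above rather than adding anything to it.

```javascript
// Hypothetical helper: the same split-on-'&'-then-'=' logic as the params
// property of parseURL, lifted out so it runs without a DOM.
function parseQuery(search) {
    var ret = {},
        seg = search.replace(/^\?/, '').split('&'),
        i, s;
    for (i = 0; i < seg.length; i++) {
        if (!seg[i]) { continue; }   // skip empty segments, e.g. from '&&'
        s = seg[i].split('=');
        ret[s[0]] = s[1];            // values stay strings; nothing is decoded
    }
    return ret;
}

var params = parseQuery('?id=255&m=hello');
console.log(params.id, typeof params.id); // 255 string
console.log(params.m);                    // hello
```

Two things worth knowing about this approach: values are never URL-decoded, and a value that itself contains `=` is cut at the second `=` because only `s[1]` is kept.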
I’ve tested this solution in all modern browsers (including IE6) and it seems to work perfectly. If you spot any inconsistencies please let me know.
If you don’t feel comfortable using something which relies on the DOM then have a look at this, although please note it’s about 12 times slower than the above solution…
12 times slower!? Um yeah, I’ll be using yours if I need a URL parser. 🙂
That is very nice indeed.
I’d be curious to see the speed difference between your clever approach and the regex-fu of parseUri: http://blog.stevenlevithan.com/archives/parseuri
It looks nice and clean. Looking forward to trying it out.
This is one of those “why didn’t I think of that?” examples. Nicely done.
Well done once again.
cool
reprint url http://zhys9.com/blog/?p=104
Hey, now that is a really cool use of the DOM API… awesome work.
James –
Great work. It never occurred to me that you could use the DOM to parse a URL in this way, and I have to say it is a much neater implementation (and faster too, as you say!) than my plugin. Good stuff! Looks like it’s back to using someone else’s code for my URL parsing 🙂
@Paul Irish – I would imagine that the speed difference would be about 12 times, as Steven’s parseUri function forms the core of my jQuery URL Parser plugin that James’ comparison references.
location.host !== location.hostname (location.host includes any non-default port). You should replace line 11 with this:
@Mark Perkins, re: @Paul Irish’s question, I suspect “jQuery URL Parser” adds performance overhead on top of parseUri, but in any case, I’m interested in the actual test behind James’s performance assertion. E.g., if the test is done in parseUri’s loose parsing mode, that’s apples to oranges because this function offers no “loose” mode, nor does it do any processing to extract user info from a URL, etc. I expect using the DOM like this to be faster, regardless, but we’re talking about the difference between two fast functions that are unlikely to ever present a real bottleneck.
I played around with this DOM-based approach a couple years ago based on a comment on my parseUri post, but I don’t think this approach can offer all the same functionality (strict vs. loose parsing, all parts being optional, etc.) in comparable code length. Note that this function will incorrectly (IMO) return the protocol, host, port, and path of the page running the script if those segments are not included in the URI passed to the function (e.g., with parseURL(“?q=v”)).
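The behaviour described here is ordinary relative-URL resolution. A rough sketch of it, using the standard `URL` class instead of an anchor element so that it also runs outside a browser; `base` is a made-up address standing in for the page running the script.

```javascript
// Assumption: a runtime with the standard URL class (modern browsers, Node.js).
// `base` is a hypothetical stand-in for the address of the page running the script.
const base = 'http://example.com:8080/dir/page.html';
const resolved = new URL('?q=v', base);

// Everything the input did not supply is inherited from the base.
console.log(resolved.protocol); // http:
console.log(resolved.host);     // example.com:8080
console.log(resolved.hostname); // example.com
console.log(resolved.pathname); // /dir/page.html
console.log(resolved.search);   // ?q=v

// With no base at all, the URL class rejects the relative input instead of guessing.
let rejected = false;
try {
    new URL('?q=v');
} catch (e) {
    rejected = e instanceof TypeError;
}
console.log(rejected); // true
```

An anchor element always has a base available (the current document), which is why parseURL("?q=v") reports the page's own protocol, host, port, and path.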
@James Padolsey, the regex used for the “relative” property currently only works with protocols that end with “tp” (e.g., not “https”).
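The point can be checked in isolation. The sketch below compares a pattern that requires "tp://" with the `tps?` form that appears in the function as listed above; the "tp"-only pattern is an assumption about what the earlier version looked like, and `relativePart` is a hypothetical wrapper.

```javascript
// Hypothetical wrapper around the match used for the `relative` property.
function relativePart(href, pattern) {
    return (href.match(pattern) || [, ''])[1];
}

var tpOnly  = /tp:\/\/[^\/]+(.+)/;    // assumed earlier form: scheme must end in "tp"
var tpOrTps = /tps?:\/\/[^\/]+(.+)/;  // form shown in the function above

var httpOld   = relativePart('http://abc.com/dir/a.html?x=1', tpOnly);
var httpsOld  = relativePart('https://abc.com/dir/a.html?x=1', tpOnly);
var httpsNew  = relativePart('https://abc.com/dir/a.html?x=1', tpOrTps);

console.log(httpOld);          // /dir/a.html?x=1
console.log(httpsOld === '');  // true
console.log(httpsNew);         // /dir/a.html?x=1
```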
You’re a javascript rockstar ! 😛
Thanks for the comments. 🙂
Like almost everything I write about, this was just a bit of an experiment. Feature-wise it certainly does not match up to Steven’s work.
@Steven, thanks for the correction on the tp/tps thing. For my test I used a typical scenario – wanting to retrieve all the parts of a URL. It’s worth mentioning that I didn’t test against your original version; I tested against Mark’s jQuery port, which was very slow in comparison to my above attempt. Although, like you said, the results are invalidated because of the inherent differences of the returned results.
@Mark, Oddly Steven’s original version is faster than yours because it returns all the information as properties while yours requires each bit of info to be initiated as a method in order to retrieve results. Doing this is probably adding overhead.
I finally got around to benchmarking this against Steven’s parseUri.
Test code:
Results
Firefox 3:
parseURL: 541
parseUri: 157
IE 6:
parseURL: 781
parseUri: 312
Chrome 1:
parseURL: 498
parseUri: 242
So the regex is faster, but I’m still quite fond of the DOM approach. :-]
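For anyone wanting to run a comparison of their own, a timing loop along these lines is one way to do it. This is only a sketch and not the test code used for the figures above: `timeIt` and `splitParser` are hypothetical names, the iteration count is arbitrary, and `splitParser` is a stand-in so the example runs without a DOM.

```javascript
// Hypothetical harness: calls fn(arg) `iterations` times and returns elapsed milliseconds.
function timeIt(fn, arg, iterations) {
    var start = Date.now(), i;
    for (i = 0; i < iterations; i++) {
        fn(arg);
    }
    return Date.now() - start;
}

// Stand-in parser so the sketch runs anywhere; swap in parseURL or parseUri in a browser.
function splitParser(url) {
    return url.split(/[\/?#]/);
}

var testUrl = 'http://abc.com:8080/dir/index.html?id=255&m=hello#top';
var elapsed = timeIt(splitParser, testUrl, 10000);
console.log('splitParser: ' + elapsed);
```

Timings from a loop like this vary with the browser, the machine, and the iteration count, so only the ratio between two functions measured in the same run is meaningful.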
Interesting! Thanks for the tests Paul! 🙂
Hi James, I found an error in your parseURL method: in IE, the “file” property returns an empty string for a URL like “http://www.jsparadise.cn:8080/index.html?id=123&pages=4”.
So I changed the RegExp like this:
file: (a.pathname.match(/\/?([^\/?#]+)$/i) || [,''])[1]
That solves the problem. ^_^
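The difference between the two patterns can be checked without a browser. This is a minimal sketch, assuming (as the report implies) that IE hands back a pathname without its leading slash; `fileStrict` and `fileLenient` are hypothetical names wrapping the original and the suggested regex.

```javascript
// Original pattern: requires a slash before the file name.
function fileStrict(pathname) {
    return (pathname.match(/\/([^\/?#]+)$/i) || [, ''])[1];
}

// Suggested pattern: the leading slash is optional.
function fileLenient(pathname) {
    return (pathname.match(/\/?([^\/?#]+)$/i) || [, ''])[1];
}

console.log(fileStrict('/index.html'));       // index.html
console.log(fileStrict('index.html') === ''); // true (the reported symptom)
console.log(fileLenient('index.html'));       // index.html
console.log(fileLenient('/dir/index.html'));  // index.html
```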