Regular Expressions in JavaScript

About a month ago I decided to begin on the long and tiresome journey of learning regular expressions. I even bought the book! So, in this post I’m going to share some of the awesome things I’ve learnt so far on my "journey".

The first thing to note is that I’m nowhere near the end of my journey and I’m still very much a novice in this area, so if an expert happens to stumble across this post I would very much appreciate some light-hearted critique! If you’re a novice like me I hope you can gain something from my ramblings.

Unrelated: I’d like to attribute the top speech bubble in the image to the right to XKCD. It’s from this comic!

Defining Regular Expressions in JavaScript

Most modern programming languages have support for regex (regular expressions) but I’ll be focusing on JavaScript’s implementation because that’s what I’m best at! It doesn’t really matter though, because the typical regex notation varies little between implementations (as far as I know).

As with most built-in types in JavaScript, you can create a regex pattern either by using literal notation or by calling the constructor function of the ‘RegExp’ object:

// Using a regex literal:
var myRegexPattern = /Regular expression goes here.../;
// Calling upon the object:
var myRegexPattern = new RegExp('Regular expression goes here...');

More info on the two methods can be found here: developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Regular_Expressions

The first method (using a regex literal) compiles your pattern once, when the script is loaded, which makes it the better choice for a pattern that never changes. The second method (calling the object constructor) builds the pattern from a string and compiles it at runtime. So, if you want to include some variable data (perhaps from user input) in the pattern then using the object constructor is your only option. For example:

// Concatenating a string within a regex pattern is only
// possible if you use the RegExp object constructor:
var whatever = 'expression';
var myRegexPattern = new RegExp('Regular ' + whatever + ' goes here...');
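
One thing to watch when building a pattern from variable data: any character in the string that has a special meaning in regex (a plus, a dot, a bracket) will be treated as syntax, not as a literal. Here is a minimal sketch of one way around that. It assumes a helper I’ve called escapeRegExp (my own name, not something built into the language), and it uses console.log instead of alert so that it also runs outside a browser:

```javascript
// Hypothetical helper (not part of JavaScript itself): puts a backslash
// in front of every character that has a special meaning in a regex,
// so the text is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var userInput = '+44'; // imagine this came from a form field
var prefixPattern = new RegExp('^' + escapeRegExp(userInput));

console.log( prefixPattern.test('+447738273772') ); // true
console.log( prefixPattern.test('07738273772') );   // false

// Inside a string, a backslash has to be doubled to reach the regex:
var digits = new RegExp('\\d{3}'); // same as the literal /\d{3}/
console.log( digits.test('abc123') ); // true
```

Without the escaping step, new RegExp('^+44') would throw a syntax error, because the plus would have nothing to repeat.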

The Regular Expression notation

As with most things, there is always more than one way to do things with regular expressions. As a simple example, let’s assume we need to test a string to confirm it is in the format of a phone number. Phone numbers vary greatly across continents so let’s use the typical UK mobile number as an example. Mobile phone numbers in the UK will usually be in the following format (d = digit): ddddddddddd. If we were writing a regular expression pattern for this format it would be like this: [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]. We can test if it matches a typical mobile phone number by using the regex test() method in JavaScript:

var testPhoneNumber = '07738273772'; // Typical UK mobile phone number
var testPattern = /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/;
 
// Now we alert out the result of the test:
alert( testPattern.test(testPhoneNumber) ); // This will alert 'true'!

Don’t worry, there are shortcuts available in regex, and in this instance they’re called "shorthand character classes". Instead of writing ‘[0-9]’ we can just use ‘\d’ (again, ‘d’ stands for digit). So, now our pattern looks like this: \d\d\d\d\d\d\d\d\d\d\d. One thing to consider before we go any further: mobile phone numbers in the UK always start with 0 and the second digit is always a 7. We can replace the first two shorthand character classes in our pattern with literal characters. So, now our pattern is: 07\d\d\d\d\d\d\d\d\d. You can repeat the above test with the new pattern and it will still alert ‘true’:

var testPhoneNumber = '07738273772'; // Typical UK mobile phone number
var testPattern = /07\d\d\d\d\d\d\d\d\d/;
 
// Now we alert out the result of the test:
alert( testPattern.test(testPhoneNumber) ); // This will alert 'true'!

Putting all those shorthand character classes in a row seems a bit bloated! Fortunately there’s a way to shorten the pattern by specifying the number of times to repeat a particular character or character class: 07\d{9}. As you can see we’ve only included one ‘\d’ character class within the pattern, but we’ve followed it with a 9 wrapped in curly braces. This means the preceding item (in this case, a shorthand character class) will be repeated 9 times, which together with the leading 07 gives us our 11 digits. Let’s try it all in one quick test:

alert( /07\d{9}/.test('07738273772') ); // This will alert 'true'!
 
// It will work with other numbers too:
alert( /07\d{9}/.test('07456566544') ); // This will alert 'true'!
alert( /07\d{9}/.test('07000011112') ); // This will alert 'true'!
alert( /07\d{9}/.test('07777333432') ); // This will alert 'true'!
alert( /07\d{9}/.test('07029300022') ); // This will alert 'true'!
 
// These will all fail:
alert( /07\d{9}/.test('0 7 4 5 6 5 66 54 4') ); // This will alert 'false'!
alert( /07\d{9}/.test('zero... 7 4756 54 4') ); // This will alert 'false'!
alert( /07\d{9}/.test('blah blah blah blah') ); // This will alert 'false'!

If the phone number is being retrieved from user input then there’s a chance they’ll use the national code (+44) instead of the leading zero. To accommodate this possibility we can include a basic ‘this-or-that’ sub-pattern within our regular expression:

// Regular expressions use a single pipe (|) as an OR operator.
// NOTE: we need to escape the plus symbol before 44, this is
// done using a leading backslash.
alert( /(\+44|0)7\d{9}/.test('+447738273772') ); // This will alert 'true'!

Like I said, there is more than one way to test for a format, even one as simple as a phone number. The different methods will vary in efficiency and flexibility. If we hadn’t accommodated that ‘+44’ prefix then our regular expression might’ve been considered incomplete, but it really depends on where it will be used. Even now our pattern is not complete; there are many more variations which we would have to accommodate. At the time of writing this is how far I’ve gotten:

// Define pattern:
var pattern = /^((00|\+)44|0)7\d{9}$/;
 
// Explanation:
// --------------------------------------------------------------
// ^ - the symbol used to indicate the beginning of the string.
// --------------------------------------------------------------
// 00 OR + - Some people will dial 00 instead of plus (+)
// Pattern: ( 0044 OR +44 ) OR 0 THEN 7 THEN 9 DIGITS
// --------------------------------------------------------------
// $ - the symbol used to indicate the end of the string.
// --------------------------------------------------------------
 
// Define phone number AND get rid of all spaces:
var phonenumber = ('+44 77 83 7 3   728   8').replace(/\s/g,'');
 
// Notice the "/\s/g" in the line above? Explanation:
// --------------------------------------------------------------
// \s is a shorthand character class for a whitespace character.
// --------------------------------------------------------------
// The 'g' added after the closing '/' is a flag which
// indicates that it's a global search and replace.
// --------------------------------------------------------------
 
// Let's test it:
alert( pattern.test(phonenumber) ); // This will alert 'true'!

In the final pattern you’ll notice I used a caret (^) and a dollar symbol ($) to mark the beginning and end of the string. If I hadn’t done this then the following string would also alert ‘true’ when tested: ‘blah blah 07792883884 blah blah’. There is a valid phone number within the string but it contains other stuff, so we don’t want it to be a positive test; we want it to fail. I also got rid of all the whitespace characters within the string before testing it with the regular expression (replace(/\s/g,'')). The position of spaces doesn’t really matter with phone numbers so there’s little point in retaining them, plus it makes testing a little easier, and probably quicker.
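
To see the difference the anchors make, here is a small sketch that runs the same string through the pattern with and without them. The isUkMobile function is just a name I’ve made up to bundle the two steps (strip the whitespace, then test); I’m using console.log instead of alert so it also runs outside a browser:

```javascript
var anchored   = /^((00|\+)44|0)7\d{9}$/;
var unanchored = /((00|\+)44|0)7\d{9}/;
var noisy = 'blah blah 07792883884 blah blah';

console.log( unanchored.test(noisy) ); // true  - a number is in there somewhere
console.log( anchored.test(noisy) );   // false - the whole string must be a number

// Hypothetical helper: removes all whitespace, then applies the anchored test.
function isUkMobile(input) {
    return anchored.test(String(input).replace(/\s/g, ''));
}

console.log( isUkMobile('+44 7783 737 288') ); // true
console.log( isUkMobile('0044 7783 737288') ); // true
console.log( isUkMobile('07783 73728') );      // false - one digit short
```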

More syntax

Covering every single syntax character of regular expressions would take ages and that wasn’t really the purpose of this post. If you want to find out more then definitely check out this website: www.regular-expressions.info, and particularly this page.

Here are a couple of other regular expressions which can be used for validation:

// Valid date in dd/mm/yyyy format:
/^(0[1-9]|[12]\d|3[01])\/(0[1-9]|1[012])\/\d{4}$/
 
// Valid email (not as solid as it could be):
/^[\w.\-_]{1,}@[\w.\-]{6,}/
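
A quick sketch of the date pattern in action, written out with its backslashes and with console.log standing in for alert. The sample dates are ones I’ve picked for illustration. Note the last one: the pattern only checks the format, so it has no idea that February never has 31 days.

```javascript
var datePattern = /^(0[1-9]|[12]\d|3[01])\/(0[1-9]|1[012])\/\d{4}$/;

console.log( datePattern.test('25/12/2009') ); // true
console.log( datePattern.test('32/01/2009') ); // false - there is no day 32
console.log( datePattern.test('25/13/2009') ); // false - there is no month 13
console.log( datePattern.test('5/1/2009') );   // false - leading zeros are required
console.log( datePattern.test('31/02/2009') ); // true  - right format, impossible date
```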

Other JavaScript methods

So far we’ve only covered how you can use the test() method within JavaScript to test if a pattern exists within a string. Many JavaScript String methods will also accept a regex. For example, you can split a string into an array with a regular expression using the String.split() method:

var theString = 'Blah-Blah_Blah-Blah.Blah,Blah_Blah';
var theArray = theString.split(/[-_,.]/);
// theArray: ['Blah','Blah','Blah','Blah','Blah','Blah','Blah']

Another common method used with regular expressions is the String.match() method which will test for a pattern and return any portions of the string which match that pattern. This can be really useful when you need to extract information from a complicated string. An example:

var complicatedString = '<a href="http://www.google.com" title="google">Google!</a>';
 
// We want to extract the URL
var theMatch = complicatedString.match(/href="(https?[^"]+)"/);
 
// The above match will return an array (or null if nothing matches).
// The first value [0] of the array is always a match of the entire
// pattern; values [1], [2], [3] etc. hold whatever was captured by the
// 1st, 2nd, 3rd etc. pair of brackets (groups) within the pattern.
 
alert( theMatch[1] ); // Will alert 'http://www.google.com'
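
One behaviour of match() that caught me out: add the ‘g’ flag and it returns every full match in the string, but no longer gives you the groups. A small sketch, using a made-up string with two links and console.log in place of alert:

```javascript
var html = '<a href="http://www.google.com">Google</a> ' +
           '<a href="https://example.org/page">Example</a>';

// Without the g flag: first match only, groups included.
var first = html.match(/href="(https?[^"]+)"/);
console.log( first[0] ); // href="http://www.google.com"
console.log( first[1] ); // http://www.google.com

// With the g flag: all full matches, groups dropped.
var all = html.match(/href="(https?[^"]+)"/g);
console.log( all.length ); // 2
console.log( all[1] );     // href="https://example.org/page"

// No match at all gives null, so check before indexing into the result.
var noMatch = 'no links here'.match(/href="(https?[^"]+)"/);
console.log( noMatch ); // null
```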

You can also use regular expressions with exec, search and replace. Here’s an example of using one with the replace() String method:

var needsReplacing = 'This is a phone number: 07798836774, hmmm, so is this - 07933674661....';
 
// We want to wrap all the phone numbers in the above string with SPANs,
// each with a class of "phone-number":
 
alert( needsReplacing.replace(/(((00|\+)44|0)7\d{9})/g, '<span class="phone-number">$1</span>') );

You’ll notice that to retain the value of the match (the phone number) we used ‘$1’. This is a replacement pattern which inserts whatever was matched by the first group within the pattern (here, the outer brackets around the whole number).
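
For completeness, here is a rough sketch of the other two methods, search() and exec(), using a sample sentence I’ve made up. search() gives you the index of the first match (or -1), and exec() on a pattern with the ‘g’ flag can be called in a loop to walk through every match along with its position:

```javascript
var text = 'Call 07798836774 or 07933674661 today';
var phonePattern = /((00|\+)44|0)7\d{9}/g;

// search() returns the index of the first match, or -1 if there is none.
console.log( text.search(/07\d{9}/) ); // 5
console.log( text.search(/xyz/) );     // -1

// exec() remembers where it got to (via lastIndex) when the g flag is set,
// so each call returns the next match until it finally returns null.
var found = [];
var result;
while ((result = phonePattern.exec(text)) !== null) {
    found.push(result[0] + ' at index ' + result.index);
}
console.log( found );
// ['07798836774 at index 5', '07933674661 at index 20']
```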

Last words

Thanks for bearing with me on this one. I’m only a beginner so I’m bound to have made a couple of mistakes. Anyway, hopefully this post was helpful to some people or at least encouraged some of you to learn regular expressions for yourselves.

Thanks for reading! Please share your thoughts with me on Twitter. Have a great day!