A ux.stackexchange question prompted me to consider how one might implement a more permissive type of input validation. It’s not rare for a form to punish the user for adding an extra space before a date, or for accidentally typing a comma instead of a period in an IP address. After all, we employ strict validation to keep the data correct.
Garbage In, Garbage Out. It rings true, but taken too literally it may push us toward strict validation and a no-exceptions policy for rebels. We punish a user for typing ‘12’ instead of the fully-qualified ‘2012’… why? Either it’s thoughtlessness on our part, or it’s the very unlikely (depending on context) possibility that the user did in fact mean the year ‘1912’ or ‘1812’ or ‘1012’…
If we start down the road of permissive input validation, then we also need to explore input correction. We can’t let a rogue comma slip through uncorrected. It’s probably best to correct it straight away (but not too soon; on blur, perhaps) so that the data actually stored conforms to the correct format.
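To make the comma example concrete, here’s a minimal sketch of permissive validation plus correction for an IPv4 address. The function name correctIpAddress is hypothetical, and accepting periods, commas and whitespace as separators is an assumption; the idea is that the corrected value gets written back into the field on blur.

```javascript
// Hypothetical helper: accept an IPv4 address with sloppy separators
// (period, comma or whitespace) and surrounding spaces, then return the
// corrected dotted form, or null if the important data is invalid.
function correctIpAddress(raw) {
  // Lenient part: four groups of 1-3 digits, any run of . , or whitespace between them.
  const match = /^\s*(\d{1,3})[.,\s]+(\d{1,3})[.,\s]+(\d{1,3})[.,\s]+(\d{1,3})\s*$/.exec(raw);
  if (!match) return null;

  // Rigid part: every octet must be in the range 0-255.
  const octets = match.slice(1).map(Number);
  if (octets.some((n) => n > 255)) return null;

  // Number() also drops leading zeros, so '010' becomes 10.
  return octets.join('.');
}

// Possible wiring (assumes an <input id="ip"> exists in the page):
// const field = document.getElementById('ip');
// field.addEventListener('blur', () => {
//   const fixed = correctIpAddress(field.value);
//   if (fixed !== null) field.value = fixed;
// });

console.log(correctIpAddress(' 192,168.0.1 ')); // "192.168.0.1"
console.log(correctIpAddress('10.0.0.256'));    // null
```

Only a value that passes the rigid octet check is rewritten, so a genuinely wrong address is left in place for the user to see.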
William Hudson conducted a date survey in 2009 to discover the various ways American users like to enter dates. The results show that users rely on a wide variety of formats. It makes perfect sense to accept all these variants and let the computer figure out which is which.
For the specific problem of entering dates, I would recommend Date.js, because it can successfully parse most of those variants. However, there is a big caveat when it comes to dates, especially on international forms. The American style of entering a date, MM/DD/YY, is impossible to distinguish from the other common standard, DD/MM/YY, unless the DD portion happens to be above 12. For this reason, it’s probably best to cater to your users’ locale as closely as possible.
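The ambiguity is easy to demonstrate. Here’s a minimal sketch (possibleReadings is a hypothetical name, and it is deliberately simplified: it ignores years and month lengths) that lists both readings of a short numeric date, showing that only a day value above 12 settles the question:

```javascript
// Hypothetical helper: list every plausible {month, day} reading of a
// numeric date like '03/04/12', trying both MM/DD and DD/MM orders.
// Simplification: any day from 1 to 31 is accepted, regardless of month.
function possibleReadings(input) {
  const match = /^\s*(\d{1,2})\s*[\/.\-]\s*(\d{1,2})/.exec(input);
  if (!match) return [];
  const a = Number(match[1]);
  const b = Number(match[2]);
  const isValid = (month, day) => month >= 1 && month <= 12 && day >= 1 && day <= 31;

  const readings = [];
  if (isValid(a, b)) readings.push({ month: a, day: b, order: 'MM/DD' });
  // Only add the DD/MM reading when it differs from the MM/DD one.
  if (a !== b && isValid(b, a)) readings.push({ month: b, day: a, order: 'DD/MM' });
  return readings;
}

possibleReadings('03/04/12'); // two readings: ambiguous
possibleReadings('13/04/12'); // one reading: day 13, month 4
possibleReadings('05/05/12'); // one reading: both orders agree
```

Whenever more than one reading comes back, the form has to fall back on the user’s locale (or ask), which is exactly the caveat above.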
An alternative is to retain rigidity in your validation but allow for some minor mistakes. For example, insist upon the ISO format of YYYY-MM-DD but don’t make a fuss if the user separates with a slash or a space (or heck, anything) instead of a dash.
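Here’s a minimal sketch of that compromise in plain JavaScript, without any library. The function name parseIsoLenient is hypothetical; the field order and the four-digit year are rigid, while the separator is literally anything that isn’t a digit:

```javascript
// Hypothetical helper: the order (year, month, day) and the four-digit year are
// rigid, but any run of non-digit characters is accepted as a separator.
// Returns a canonical 'YYYY-MM-DD' string, or null if the date is invalid.
function parseIsoLenient(input) {
  const match = /^\s*(\d{4})\D+(\d{1,2})\D+(\d{1,2})\s*$/.exec(input);
  if (!match) return null;
  const [year, month, day] = match.slice(1).map(Number);

  // Month lengths computed by hand using the Gregorian leap-year rule.
  const isLeap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  const monthLengths = [31, isLeap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  if (month < 1 || month > 12 || day < 1 || day > monthLengths[month - 1]) return null;

  const twoDigits = (n) => String(n).padStart(2, '0');
  // match[1] keeps the four year digits exactly as typed.
  return `${match[1]}-${twoDigits(month)}-${twoDigits(day)}`;
}

parseIsoLenient('2012/3/4');   // '2012-03-04'
parseIsoLenient(' 2012 3 4 '); // '2012-03-04'
parseIsoLenient('1999-2-29');  // null (1999 is not a leap year)
```

The stored value is always the canonical form, so the leniency never leaks into the data.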
My point is: Maybe formal validation with permissive aspects mixed in gives us the best of both worlds. We don’t punish the user for minor mistakes, and we don’t end up with ambiguous data.
In an attempt to practice this technique of mixing rigidity with leniency, I created vic.js.
Currently, validation in JavaScript can be quite an ugly affair, plagued by remnants of DHTML and overly invasive input masks. It’s not uncommon to see code like this:
```javascript
someInput.onkeyup = function() {
  if (!this.value.match(/some rigid regex/)) {
    alert('Enter the right value, you fool');
  }
};
```
Typically the rules are strict, the characters non-negotiable, the regular expression unyielding, and the presented invalidation UI annoying.
vic.js (a.k.a. Vic, VIC) lets you define a lenient regular expression and expects you to extract your important data from its captured groups.
Vic’s signature goes something like this:
```javascript
vic( LENIENT_PATTERN_WITH_CAPTURED_GROUPS, PER_GROUP_PROCESSOR, POST_PROCESSOR );
```
A simple example is a ‘year’ field:
```javascript
var yearVic = vic(
  /^\s*(\d{1,4})\s*$/,
  function(year) {
    // Let's assume anything between 14 and 99 is from the 1900s:
    return vic.pad(year > 13 && year <= 99 ? '1900' : '2000')(year);
  },
  Number // cast full output to a Number
);

yearVic('2012');   // => 2012
yearVic('01');     // => 2001
yearVic('hd2kd9'); // => false
yearVic('20021');  // => false
yearVic('96');     // => 1996
yearVic(' 4');     // => 2004
yearVic('113');    // => 2113
```
The regex used for the year example, /^\s*(\d{1,4})\s*$/, is lenient in that it allows whitespace at the beginning and end, and doesn’t mind whether the user enters one, two, three or four digits for the year. For years greater than 13 and less than 100 we assume the user is referring to the previous century, so we apply ‘1900’ as padding; otherwise we pad with ‘2000’.
Vic offers a couple of helpers for basic tasks like padding and applying lower/upper case. I’ll probably add to these as I think of more common use cases for Vic.
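Helpers like these are one-liners once they’re curried, which is what the vic.pad('1900')(year) call style suggests. The sketch below is hypothetical: pad, upper and lower are assumed names, not vic.js’s actual implementations. Here pad(template) keeps the template’s leading characters and lets the value overwrite its tail, which is exactly what String.prototype.padStart does when the filler is as long as the target length:

```javascript
// Hypothetical curried helpers in the spirit of vic.pad; NOT the vic.js source.
// pad(template)(value): the value overwrites the tail of the template,
// e.g. pad('1900')('96') gives '1996' and pad('00')('7') gives '07'.
const pad = (template) => (value) => String(value).padStart(template.length, template);

// Case helpers, usable directly as per-group processors.
const upper = (value) => String(value).toUpperCase();
const lower = (value) => String(value).toLowerCase();

console.log(pad('2000')('113')); // "2113"
console.log(upper('sw1a'));      // "SW1A"
```

If the value is already longer than the template, padStart returns it unchanged; how vic.js itself handles that case isn’t stated here.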
Vic allows more fine-grained, per-group processing too. In this example we’ll validate a date in the form YYYY-MM-DD, but we’ll allow any one of ./,:- (plus spaces) as separators, and we’ll validate the component numbers and pad them too:
```javascript
var vicDate = vic(
  /^\s*(\d{1,4})[./,: -](\d{1,2})[./,: -](\d{1,2})\s*$/,
  {
    1: function(year) {
      // Years between 50 and 99 are assumed to be '19YY'; anything else is presumed to be after 2000
      return vic.pad(year >= 50 && year <= 99 ? '1900' : '2000')(year);
    },
    2: function(month) {
      return month >= 1 && month <= 12 && vic.pad('00')(month);
    },
    3: function(day, i, all) {
      // Check that the entered month actually has {day} days:
      return day > 0 && day <= new Date(all[1], all[2], 0).getDate() && vic.pad('00')(day);
    }
  },
  function(v) {
    return v.join('-');
  }
);

vicDate('111');         // => false
vicDate('2/3/4/5');     // => false
vicDate('16.332.2');    // => false
vicDate('20 1 20');     // => false
vicDate(' 1999.7.0');   // => false
vicDate('1999.0.1');    // => false
vicDate('1999.9.32');   // => false (no 32 in Sept)
vicDate('1999.2.28');   // => '1999-02-28'
vicDate('1999.2.31');   // => false (no 31 in Feb)
vicDate('1.1.1');       // => '2001-01-01'
vicDate('1956.3.2');    // => '1956-03-02'
vicDate('16.03-2');     // => '2016-03-02'
vicDate(' 20 1 20 ');   // => '2020-01-20'
vicDate('1999.7.31');   // => '1999-07-31'
```
What we’ve done above is execute a rigid validation of the data that’s important to us (YYYY, MM and DD) while letting the user mess with the non-important stuff to their heart’s content (whitespace & separators).
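The day check in the date example leans on a handy Date trick: day 0 of a month is the last day of the previous month, and because the entered month is 1-based while Date months are 0-based, new Date(year, month, 0) lands on the last day of the month the user typed. One caveat worth testing: the multi-argument Date constructor reads years 0 to 99 as 1900 to 1999, so if a processor sees the raw two-digit year rather than the padded one (which of the two vic.js passes in that third argument isn’t stated here), leap years can come out wrong. A small sketch, with the hypothetical name daysInMonth:

```javascript
// Days in a 1-based month: "day 0" of the following 0-based month is the
// last day of the month the user typed.
function daysInMonth(year, month) {
  return new Date(year, month, 0).getDate();
}

console.log(daysInMonth(1999, 2)); // 28
console.log(daysInMonth(2000, 2)); // 29
console.log(daysInMonth(1999, 9)); // 30 (September)

// Pitfall: years 0-99 are read as 1900-1999, and 1900 was not a leap year.
console.log(daysInMonth(0, 2));    // 28, although the padded year 2000 has 29
```

So the safer choice in a per-group processor is to compare the day against the padded four-digit year.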
Vic is simple. It’s not a high-level abstraction, and it isn’t complex: it’s only a few lines of code.
The fact is: you could easily integrate this methodology into your own validation utilities. The basic principle is to extract the important data and validate it, while allowing the user some flexibility in how they provide it.
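As a starting point for your own utilities, here’s a minimal sketch of the idea. It is not the vic.js source; lenientValidator is a hypothetical reimplementation written only from the signature and examples in this post. Assumptions: the per-group processor is either one function or an object keyed by 1-based group index, a processor rejects its group by returning false, processors receive (value, groupIndex, match) where match is the raw RegExp exec() result, and the post-processor receives the array of processed groups.

```javascript
// Hypothetical re-implementation of the vic() idea, NOT the real vic.js code.
// pattern:        a lenient RegExp whose capture groups hold the important data
// groupProcessor: one function for every group, or an object keyed by 1-based group index
// postProcessor:  receives the array of processed groups (optional)
function lenientValidator(pattern, groupProcessor, postProcessor) {
  return function validate(input) {
    const match = pattern.exec(String(input));
    if (!match) return false; // the lenient pattern itself didn't match

    const processed = [];
    for (let i = 1; i < match.length; i++) {
      const processor = typeof groupProcessor === 'function'
        ? groupProcessor
        : groupProcessor && groupProcessor[i];
      const value = processor ? processor(match[i], i, match) : match[i];
      if (value === false) return false; // a processor rejected its group
      processed.push(value);
    }
    return postProcessor ? postProcessor(processed) : processed;
  };
}

// The year example from this post, with String.prototype.padStart standing in for vic.pad:
const yearValidator = lenientValidator(
  /^\s*(\d{1,4})\s*$/,
  (year) => String(year).padStart(4, year > 13 && year <= 99 ? '1900' : '2000'),
  (groups) => Number(groups[0])
);

yearValidator('96');    // 1996
yearValidator(' 4');    // 2004
yearValidator('20021'); // false
```

Treating false as the only rejection signal matches the processors above, which return expressions like month >= 1 && month <= 12 && vic.pad('00')(month); the real library may treat other falsy values differently.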
Check out vic.js on GitHub.
Thanks for reading! Please share your thoughts with me on Twitter. Have a great day!
Thanks for this. Have you seen Ward Cunningham’s paper on the CHECKS Pattern Language of Information Integrity? It relates and you might find it interesting.
http://link.jbrains.ca/UJl7q0
@J.B. Thanks for the link! There’s always useful stuff hidden away on c2.com
I very much like the idea of a domain model attempting to extract the correct information from an input and then deferring validation. It’s a bit of a wonder that we’re so used to the status quo of immediately punishing the user whenever they get something wrong.
Thank you for vic.js. It’s a great idea, and I’ll definitely check it out.
Excellent post and great idea on the code. I have similar issues with most libraries that support validation: they end up being too strict and interfere with the user’s flow. Entering IP addresses is often a pain, as mentioned above.
My other pet hate is fields that limit the number of characters but include spaces in the count, for example phone numbers and credit cards. I should be able to paste or type with spaces between arbitrary character groups (e.g. ‘xxx xx xxxx’) and have only the actual characters count towards the limit. I’d expect the app to remove the whitespace on blur or submit, and to show an error only after that has been done.
I’d love to see something like this integrated into common frameworks, e.g. AngularJS, which I’m playing with at the moment.
I think most users are sometimes annoyed when trying to enter this type of data into forms, although in corporate environments a rigid system is preferable because of the problems incorrect data can cause.
But for us normal users, this idea can be not only a smoother way to enter data but also a time saver, since it skips the several attempts it takes to hit the correct format.