7 March, 2019
TSH Coding Challenge
At TSH, we have a recurring event called Coding Challenge – you write a piece of code in the language you choose and solve a problem specified by the judges. What are the criteria for winning? You literally write anything that fits into requirements, but also convince the jury that your code is simply the best. (Tina Turner playing in the background). So, if you want to get more votes, do something clever, surprising or elegant in your work (or better, everything at once!).
This edition, the goal was to write a simple function that validates postal codes. The function should take a string to test and return a boolean answer if the provided code is OK: a simple TRUE or FALSE. Every postal code should have five digits with an optional separator between second and third character. Space, dash and underline are the only allowed separators here. In other words, those postal codes are correct:
- 00 111
And those aren’t:
The first solution that came up to everyone’s mind were regular expressions. This is the best practice when it comes to format validation, especially when we have established rules about what’s good and what’s not.
Of course, I didn’t use a regular expression.
Not because it would be the easiest method. I simply expected that most people at the Coding Challenge will use them (I was totally right by the way). So, I took a different approach. I wanted to go against the tide and make my code as complicated as possible. However, making a mess in the code seemed too trivial so I decided to be more specific and… play some golf. Code golf.
Do developers play golf?
Rules of golf are pretty simple: players try to put a small white ball in multiple holes of a grass field, using as few golf club strokes as possible. Code golf is kinda similar. You have to write code with the lowest number of characters. You can say that instead of counting the ball strokes we count the keyboard taps. The shorter the code is, the better is your result, but it still has to work according to rules.
Going with a regular expression is still a good solution here because the code will be super short. That’s true, but also far too easy (and it’s a Coding Challenge, for Pete’s sake!). I decided that a regular expression is a no-no, as it would kill the fun. I also hoped that this would impress the judges.
Enough theory, now let’s write some messy code!
I started working on my code by writing some tests in the Jest framework. I made 18 small unit tests for a single function to cover all edge cases. A perfect Test-Driven Development approach (actually not so much – I’m joking around).
All the tests failed as expected. And I haven’t even started writing mine postal code validator yet. To do it, I had to create the function first.
I used an anonymous arrow function because its signature is shorter compared to a normal one with the function and return keywords. Don’t mind the long variable names. I am keeping them only to make the code more readable for now and I will remove them later.
The digits problem
Now, we need to check if the code has only digits in the appropriate places. Notice that postal code is really just a number with an optional separator between some digits. In other words, if we ignore that separator, we can just treat the postal code as a five digit number! So, let’s use that in our validation. I did it by picking the first two characters, last three characters and checking if they make a number together.
For example, for the code 12-345 we are slicing the “12” from the start, “345” from the end and concatenating it into “12345”. Note that this code is insensitive to how many separator characters you have in between or how long the provided code is – we always use the first two and last three characters. Now, we have to validate the result. I cast it to a number type with the Number function. It returns a proper number value from the string when possible and NaN (not a number type) otherwise.
Side note: we can already make this code a little bit shorter by using a + sign operator instead of the Number function. This code will work exactly the same as the one above:
We’re not done yet because our function is currently returning the number, but it has to return a TRUE or FALSE value. We can easily fix it by using a double exclamation sign which works as a double negation: it changes everything into a boolean value.
Great! But there’s a problem here which I need to address. If the code is, for example, “1a-234” after converting it to a number we will get a NaN, which is FALSE after casting to boolean. Good! 12-345 will be properly interpreted as a number (12345), and we will get TRUE and the end. Fantastic! How about 00-000? The “00000” will be nicely interpreted as a number (zero) but casting it to bool will give as FALSE because zero is false!
That’s why we have to do a trick here with adding 1 to the result. It will take only two more characters in our code, but it will make our number always positive and true. No more problem with the 00-000 edge case.
Now, postal code like 0b-101 will be concatenated into 10b101 which is not a binary number anymore because of the wrong prefix. Neat, we are slowly moving forward! There’s only one more edge case here to solve, for numbers with a dot: “0.123”. They are not a valid postal code, but currently, our validator will return TRUE for them because they are a proper number. So, what can we do? The same thing as last time: somehow break those numbers with a full stop. And I decided to do it by… introducing a second full stop!
Now, whenever a postal code has only digits, the validator will be internally making one full stop-something number, e.g.: 12-345 into 1.12345 which is a proper number. This will return TRUE. However, when the function will be provided with a problematic postal code like “1.-234” then the validator will be making a 1.1.234 which is nowhere proper and will return FALSE. The only downside of this solution is that we lose three more characters in our code golf. This is something I can live with for now.
Phew, the hardest part is over. Now let’s talk about separators.
Meet the separators
In the examples above, we were using mostly dashes but there are three separators allowed: dash, underline, space and an empty string (no separator). If we’d like to define an array with them, we’d get this:
A lot of apostrophes. We are playing code golf here so no way we’re leaving it like that. Do you know the spread operator from ES6? It allows putting elements from one array into one another without manually rewriting all elements. Like this:
Ok, but why am I talking about it? Because strings can also be treated as arrays so – thanks to it, and the spread operator – we can save some characters by spreading the string of allowed “visible” separators.
This array is the same as the one above. Here, however, we use only 12 characters instead of 15. Do you remember three lost characters from fixing the dot problem before? We’ve just found them and made up for them in the code golf.
Of course, we not only need to check if the separator is of an allowed type but, also, if it’s in the proper place. Now, we’ll repeat the trick with splitting the string: we’ll take all the characters from the provided code starting after the second character and ending before the third character from the end. If the separator is correct, it should be included in the array of available separators.
Thanks to it, if someone provides, for example, “12___345” we will be testing “___”. Also, a code without the separator, like 12345, will be tested with an empty string which is in our array of allowed separators.
By gluing everything together, I got this:
The final stroke
Currently, almost everything is done and done. The only way the user can bypass our validator is by passing a string of three or four digits. Let’s analyze this example”: “1234”. The function will concatenate the fix “1.”, the first two characters “12”, and the last three “234”. We will get “1.12234” which is a fine number and our validator will not find the problem. So, we have to additionally check if the string has at least 5 characters. How to do it? By just checking if the fifth character of the string exists. I added the checking at the beginning of the function body.
That was the last thing we had to check. Other incorrect postal codes, such as a code with too many characters, will be rejected by mechanisms I’ve added earlier. Knowing that we can jump straight to removing all unnecessary spaces and characters from the code, gives us more points in the code golf.
The final result? Only 86 characters. And the validator passes each of the 18 unit tests. You can check it out yourself by using the repl.it service.
And the trophy goes to…
Did TSH judges like my crazy solution? Yes. I got most of the votes and I won the Coding Challenge. My award? A bottle of great Belgian beer! You may ask yourself, is this the best solution I could come up with? I am not sure. Maybe someone can rewrite this validator using even fewer characters?
PS. Exponentially growing problems
Almost two weeks passed from publishing this article – it was shared in many placed and sparked some comments on Reddit, Facebook, Twitter and our social e-mail address. The most recurring topic was exponential numbers. Yes, I admit, I forgot about them.
The so-called exponential or scientific notation is made out of two normal numbers separated by the letter “e”. To get the “full value” you have to take the first number (significand) and multiply it by the 10 raised to the power of the second number (exponent). So “12e3” becomes 12 * 10^3 = 12 * 1000 = 12000. This comes in handy when you want to put in a code the distance between planets. If however, you want to describe how small the human body cells are, you can put negation in front of the second number. For example “34e-5” is 34 * 10^-5 = 34 * 0.00001 = 0.00034.
So now we know what’s the problem here, and we can finally solve it. Let’s start with adding a new unit test.
How we will make it pass? We don’t need to do a revolution in our validator code, we only need to use the same trick we’ve used for the full stop problem. We will add an extra “e” character to the number made internally in our validator. Now when someone will use it the postal code, the validator will return FALSE because no number can have two “e” characters. Someone proposed to put it between sliced parts of the postal code but I will add it to the existing prefix to save some characters.
I had to add an extra zero before the character so we wouldn’t get an invalid ‘1.e’ in the number. This solution will make our tests pass but we must be aware of some consequences. At this moment, the validator will always internally create a number written in exponential notation. It means we can have problems with the limitation of numbers range or with the negative exponent. For that, I added four more tests.
The second test is similar, however, it checks what will happen if someone tries to sneak a minus to force the validator to calculate some very, very small number. By accident, this test will also work because internally made “1.0e-9999” will generate a number so small the browser will change it to zero. And zero is FALSE when cast to bool.
Unfortunately, the third extra test will fail. It also has a minus in the first place but the number will not be too small. From an invalid postal code “-0001” the validator will concatenate a string “1.0e-0001” which equals 0.1 and it is a valid number. That’s a problem because our validator would accept this value and return TRUE. How to fix it? Note that minus in the exponent is only valid in the first place. If we somehow disallow the user to put the minus in the first place in exponent then we will make the test green. The easiest way to do it is, of course, adding an extra zero after the “e” character.
It makes the code look even more like black magic! But hey – the test is green now. This trick will also fix the fourth test. Numbers like “1e+23” are also valid and we don’t want any pluses in our postal code. So now all tests pass again and we can call it day.
Currently the full solution takes 89 characters but hopefully, it covers all the edge cases. Test it yourself here!
Thanks to everyone who mentioned this problem on the discussion thread or contacted me with direct messages. I also want to say that many people send me their great ideas on how to improve the code and make it even shorter! Because the contest is still on, I won’t spoil other people’s ideas. But man, I could have saved so many characters… Who knows, maybe even someday I will publish and review those solutions. 🙂