Once upon a time, there was a man named Mr. Null. He was a very nice man, but computers didn’t seem to like him very much.
“Most will accept ‘Null’ without complaint,” writes Christopher Null, the man in question, in Wired. “Some will loop back to the input screen and tell the user to try again, that the last name field can’t be blank (But it’s not blank! That’s just my name!) Some will tell the user that ‘null’ is a reserved term that can’t be used. And some will just crash.”
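How does an ordinary string get mistaken for an empty field? One plausible mechanism is a form handler that treats certain placeholder words as meaning "no value." The Python sketch below assumes that setup. The `PLACEHOLDERS` set and both functions are hypothetical illustrations, not code from any system Null describes.

```python
from typing import Optional

# Hypothetical placeholder strings that a careless form handler might
# treat as "no value"; any real system's list (if it has one) may differ.
PLACEHOLDERS = {"", "null", "none", "undefined", "n/a"}


def is_blank_naive(value: Optional[str]) -> bool:
    """Buggy check: conflates the *string* "Null" with a missing value."""
    if value is None:
        return True
    return value.strip().lower() in PLACEHOLDERS


def is_blank(value: Optional[str]) -> bool:
    """Safer check: only a real None or an all-whitespace string is blank."""
    return value is None or value.strip() == ""


for surname in ["Null", "Smith", "", None]:
    print(repr(surname), is_blank_naive(surname), is_blank(surname))
```

The naive version reports Mr. Null's surname as blank. The safer version keeps missing data as a distinct value (`None`) instead of encoding it as a magic string that a real person might share.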
Stories like that of poor Mr. Null aren’t unusual. With the world becoming increasingly computerized, people with atypical names run into problems. Programmers love to trade stories about such situations and how they’ve dealt with them. “I work with a guy who has the last name Null,” writes one. “He constantly gets e-mails from random people and auto-generated e-mails when a system fails to deliver something properly. He spends 30-45 minutes a day cleaning out his inbox.”
Special characters such as hyphens and apostrophes can also trip up data-entry and email systems, which don’t always handle the failure gracefully. “Don’t blame me for having a last name that your system doesn’t like, whose fault is that?” fumes John Graham-Cumming. “Saying ‘Your last name contains invalid characters’ is plain offensive.”
Even spaces can be a problem. “Motor Vehicle Commission computers can’t handle certain names on driver’s licenses,” writes Karin Price Mueller in the Star-Ledger. “That means New Jerseyans with two-word first names (Mary Ann) or last names (Price Mueller), or those who use an apostrophe (D’Egidio) or a hyphen (Smith-Jones), can’t have driver’s licenses that match their other legal documentation, such as passports and birth certificates.” That’s when things can get complicated. Technically, someone who writes their name differently from what’s depicted on a legal document can be seen as violating the law.
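Here is an illustration that assumes a validator accepting only ASCII letters. The regex and both function names are hypothetical and are not taken from the Motor Vehicle Commission or any other real system. Every name quoted above fails the strict check. A more lenient check that rejects only empty input and non-printable characters accepts them all.

```python
import re

# Hypothetical over-strict validator: one unbroken run of A-Z / a-z.
ASCII_LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def is_valid_naive(name: str) -> bool:
    """Rejects anything that is not a single run of ASCII letters."""
    return ASCII_LETTERS_ONLY.fullmatch(name) is not None


def is_valid_lenient(name: str) -> bool:
    """Rejects only empty input and non-printable characters (e.g. newlines);
    spaces, hyphens, apostrophes and accents count as part of a name."""
    stripped = name.strip()
    return bool(stripped) and all(ch.isprintable() for ch in stripped)


names = ["Mary Ann", "Price Mueller", "D'Egidio", "Smith-Jones", "Graham-Cumming"]
for n in names:
    print(f"{n!r}: naive={is_valid_naive(n)}, lenient={is_valid_lenient(n)}")
```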
And some names are just too long—a problem that dates back decades. “Many computer systems can only link accounts if the names match exactly,” writes one person with a 26-letter name. “Since my name is truncated differently in different places, and is formatted differently in different systems, in many cases it’s just impossible. For example, right now I’m charged a fee for transferring money between two bank accounts, since the free transfers only apply if the names match exactly. Another example is my cell phone bill. To pay it by CC my name must match exactly, but all my cards have slightly different spellings.”
Programs also stumble when a real person’s name happens to be a word that software reserves for some other purpose. For example, there’s NaN, short for “not a number.” “We had a customer with the last name ‘Echo’ who couldn’t make a credit card payment,” reminisces one programmer. “Turns out that the card processor was looking for strings which were common Unix commands and not allowing them.” Flickr co-founder Caterina Fake reportedly has trouble buying airline tickets and signing up for Facebook.
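We don't know the card processor's actual logic. This sketch assumes a simple word blocklist to show why screening input for "dangerous" words ends up rejecting real people. The word list and the function name are hypothetical.

```python
# Hypothetical blocklist of Unix command names, as in the "Echo" anecdote.
BLOCKED_WORDS = {"echo", "cat", "rm", "kill", "sudo"}


def passes_blocklist(name: str) -> bool:
    """Naive filter: reject a name if any word in it is a blocked command."""
    return not any(word.lower() in BLOCKED_WORDS for word in name.split())


print(passes_blocklist("Jane Echo"))   # False: a real surname is rejected
print(passes_blocklist("Jane Smith"))  # True
```

A list like this can only grow, and each new entry can lock out someone else. Treating names strictly as data, and passing them as parameters wherever they reach a shell or a database, is generally a safer route than guessing which words are dangerous.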
Similarly, there are a number of license plate stories in which someone signs up for a personalized plate such as NOPLATE or NO-TAGS and then starts receiving the tickets written for cars whose plates the officer didn’t record.
The guy who used to own the domain donotreply.com apparently got all sorts of interesting email from people who didn’t pay attention when they were responding to corporate mail. (Incidentally, the domain name is now up for sale, if you want to experience the fun yourself.) Then there’s Adam Croot, who holds the Twitter handle “Undefined,” which meant his Twitter feed showed up in unexpected places in response to errors. And in another “null” example, a guy who held the username “null” on one messaging system got a lot of misdirected texts as well.
There’s also the sample text that programmers use to test systems. “I have joked that I might change my name to Sample User, develop a piece of land in the country, and name my road Example Avenue, taking address 123,” notes one programmer. “This would make me impervious to datamining, because my results would always be thrown out.”
Any discussion about this subject, of course, means people start talking about Little Bobby Tables. (In fact, it’s so well known that there’s an SQL blog named after him.) It’s a reference to the geek comic XKCD, which in one of its earlier strips had a mom getting a call from the school asking if she had really named her son “Robert'); DROP TABLE Students;--”. “Oh, yes. Little Bobby Tables, we call him.” It was, she explained to the school (which had just lost its entire student database), a lesson in input sanitization.
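For programmers, the usual fix is less about scrubbing the name than about never letting it be parsed as SQL in the first place. This minimal sketch uses Python's built-in `sqlite3` module and a parameterized query. The table mirrors the comic, and everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT)")

name = "Robert'); DROP TABLE Students;--"

# Unsafe pattern (shown only as a comment): building SQL by string
# formatting puts the name's quote and semicolon into the SQL text itself.
# unsafe_sql = f"INSERT INTO Students (name) VALUES ('{name}')"

# Safer: the "?" placeholder passes the name to SQLite as a value, not as SQL.
conn.execute("INSERT INTO Students (name) VALUES (?)", (name,))
conn.commit()

rows = conn.execute("SELECT name FROM Students").fetchall()
print(rows)  # [("Robert'); DROP TABLE Students;--",)]
```

The table survives, and the stored name is exactly what was entered. The placeholder keeps the data intact rather than stripping characters out of it.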
The problem is, programmers say, input sanitization isn’t all that easy. Once you start changing data, all sorts of strange things can happen. E, for example, is not the same as É. Letters with accents and other diacritical marks also typically are alphabetized differently from the same letter without an accent, which means that a “sanitized” name won’t end up in the right place in the database.
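A short Python sketch of both effects, using the standard `unicodedata` module. It shows three things:

- "É" can be stored in two different ways.
- Stripping accents, one common (and here hypothetical) "sanitizing" step, throws information away.
- A naive code-point sort files "É" after "Z".

Locale-aware alphabetization is a separate problem and isn't attempted here.

```python
import unicodedata

composed = "\u00c9"     # "É" as a single code point
decomposed = "E\u0301"  # "E" followed by a combining acute accent

print(composed == decomposed)  # False: same letter on screen, different code points
print(unicodedata.normalize("NFC", decomposed) == composed)  # True


def strip_accents(text: str) -> str:
    """Lossy 'sanitization': decompose, then drop the combining marks."""
    decomposed_text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed_text if not unicodedata.combining(ch))


print(strip_accents("Émile"))  # Emile: the accent, and the original spelling, are gone

# Python's default sort compares code points, so "É" (U+00C9) lands after "Z".
print(sorted(["Zola", "Émile", "Eve"]))  # ['Eve', 'Zola', 'Émile']
```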
“You CAN’T ‘sanitize’ for every possible use,” summarizes one programmer. “You can not correctly figure out in advance how to represent an input, because the different possibilities are numerous and actively self-contradictory. To ‘sanitize’ for ‘every possible use’ is pretty much to remove everything that isn’t an ASCII letter.”
“I have never seen a computer system which handles names properly and doubt one exists, anywhere,” writes Patrick McKenzie in the canonical piece “Falsehoods Programmers Believe About Names.” “I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them.” He goes on to cite 40 examples, such as:
- People have exactly one canonical full name.
- People have exactly one full name which they go by.
- People’s names are not written in ALL CAPS.
- People’s first names and last names are, by necessity, different.
- People have names.
- People have exactly N names, for any value of N.
“This list is by no means exhaustive,” McKenzie writes. His advice to programmers? Get to know these assumptions, and make fewer of them when writing code that involves names.
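One way to act on that advice is sketched here as a hypothetical Python record, not as a recommended schema. It stores the name as a single free-text field instead of splitting it into a fixed set of parts. It also allows the name to be missing and keeps any alternate names alongside it. The class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonName:
    """Hypothetical name record that sidesteps several of the falsehoods above.

    - full_name is free text: no fixed number of parts, no required
      first/last split, no assumption about case or character set.
    - It may be None, since not everyone has (or shares) a name.
    - alternates holds other names the person goes by.
    """
    full_name: Optional[str]
    preferred_name: Optional[str] = None
    alternates: List[str] = field(default_factory=list)

    def display(self) -> str:
        """Show the preferred name if given, else the full name, else a placeholder."""
        if self.preferred_name:
            return self.preferred_name
        if self.full_name:
            return self.full_name
        return "(no name given)"


print(PersonName("Mary Ann Smith-Jones").display())
print(PersonName("Robert Tables", preferred_name="Bobby").display())
print(PersonName(None).display())
```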
Simplicity 2.0 is where we examine the intricate and transitory world of technology, through a Laserfiche lens. By keeping an eye on larger trends, we aim to make software that’s relevant to modern-day workers, rather than build technology for technology’s sake.