Falsehoods Programmers Believe About Names – With Examples

In 2010, Patrick McKenzie wrote the now-famous blog post “Falsehoods Programmers Believe About Names”, in which he listed 40 things that were not universally true about names.

Did programmers sit up, take notice and change their attitudes to names? Sadly, not really. We still get asked to fill our names out in online forms which assume we have a first name and a last name (in that order) and which refuse to allow us to continue unless we have filled out both. They assume our names can be entered in alphabetic characters, often only ASCII.

I fear that part of the reason that this blog post had less impact than I hoped was that Patrick did not give examples of how each assumption can be false. But having worked in a previous life on IBM’s Global Name Management product, I can assure you that it’s all true.

Still not convinced? In this post I’m going to list all 40 of Patrick’s original falsehoods, but give you an example (or two) drawn from my experiences working in this space. Ready? Let’s go!

  1. People have exactly one canonical full name.
    It seems some people believe that you get a name and it never changes. Not so, even in Western countries, where a person may change their name when they marry. In Catholic tradition a person may get a middle name at the time of confirmation.
  2. People have exactly one full name which they go by.
    The author known most often as John Wyndham (author of The Day of the Triffids) bore the name John Wyndham Parkes Lucas Beynon Harris, and published books under the names John Beynon and Lucas Parkes, as well as John Wyndham.
  3. People have, at this point in time, exactly one canonical full name.
    A performer may have a stage name, completely separate from the name on their birth and marriage certificates – they may even have a passport in their stage name.
  4. People have, at this point in time, one full name which they go by.
    Not so, even in Western countries, where a woman may choose to retain her unmarried name at work (where she is already known by that name), and use her husband’s surname on social occasions, and even on legal documents such as mortgages and loans.
  5. People have exactly N names, for any value of N.
    An English name may traditionally consist of two given names (often called a first name and a middle name) and a surname, but that’s not required. A person may have no middle name, or may have several. A Portuguese name, for example, may have one or two given names, and up to four surnames (up to six in the case of a married woman), and those surnames may be phrases, such as da Silva or dos Santos, or even Costa e Silva.
  6. People’s names fit within a certain defined amount of space.
    The renowned painter, best known simply as Picasso, had the full name “Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso”. Try fitting that name into a form which allows 30 characters for a name…
  7. People’s names do not change.
    Given that we have already mentioned a person changing his or her name at the time they marry, this is clearly false. Moreover, Catholics may adopt an extra middle name at the time of their confirmation. It’s also common for a person to add a name, or change their name entirely, when converting to another religion – consider Cat Stevens becoming Yusuf Islam or Cassius Clay becoming Muhammad Ali when converting to Islam.
  8. People’s names change, but only at a certain enumerated set of events.
    It was common for some people in Thailand to change names to avert bad luck. That might happen without a recognised event. Sometimes a person will change their name when someone else with the same name becomes famous, or infamous – a notable example was people changing their surname from Hitler.
  9. People’s names are written in ASCII.
    Patently false, if we consider that ASCII does not include the accented characters which appear in French and Portuguese names. Nor does it include the Greek alphabet used in Greek names, or the Cyrillic characters used in Russian names. Then there are scripts like Devanagari for Indian names, Chinese characters (hanzi) and Japanese characters (Kanji), and many more.
  10. People’s names can be written in any single character set.
    People have names that mix, for example, Kanji and Latin, or Hanzi and Latin, or Hangul and Latin characters. In many cases this is because they have a “Western given name” to cater for those of us unable to pronounce the given name in their native language.
  11. People’s names are all mapped in Unicode code points.
    The Unicode standards team continue to add code points to the standard to accommodate rarer and rarer characters, and the vast majority of names are already covered, but there are still exceptions, such as the symbol that “the artist formerly known as Prince” adopted. Even if we eliminate such curiosities, there are (a few) alphabets which are not yet covered by Unicode (perhaps the most realistic example is Aymara, a script for a language spoken by well over a million people in South America; less realistic is Klingon, or the character sets invented by J R R Tolkien for his Middle Earth). Moreover, Unicode only includes a subset of Chinese and Japanese characters, and some of the omitted characters are used in names.
    To complicate matters further, there are languages which do not have associated scripts – they cannot be written down. There are no Unicode code points for such languages. Names in those languages might be captured in phonetic symbols, but that’s not particularly helpful, because the majority of people are unfamiliar with the phonetic alphabet.
  12. People’s names are case sensitive.
    Many scripts have no concept of case. In Chinese and Japanese, for example, uppercase / lowercase is an idea that is simply not applicable.
  13. People’s names are case insensitive.
    Some character sets are case sensitive – Latin, for example. More importantly, there are character sets where characters may be accented in lower case, but not in upper case, so it is not possible to provide a “round trip” from lower case to upper case and back to lower case.
    Correct capitalization can be very important to some people, such as the owners of the surnames Mackenzie and MacKenzie.
    The correct use of case is also important with surnames such as van Gogh, du Barry, da Costa, O’Brien, and D’Agostino, and given names like Jean-Pierre.
  14. People’s names sometimes have prefixes and suffixes, but you can safely ignore those.
    Nothing could be further from the truth. The Dutch name Pieter van der Meer is not the same as Pieter Meer, even though “van der” is a prefix.
    You might consider Junior to be a suffix in Robert Downey Junior, but if you omit it, you are referring to his father, not to him.
    In Arabic names, the suffix al-Din means “of the faith” or “of the religion” – names such as Taj al-Din (“crown of the faith”) or Saif al-Din (“sword of the religion”) are not the same name when the suffix is suppressed. An Italian name such as di Stefano is not the same as Stefano.
    A Spanish woman with the surname “viuda de de la Cruz” is the widow of a man with the patronym “de la Cruz”. Omitting those prefixes changes the meaning of the name.
  15. People’s names do not contain numbers.
    Even if we ignore the cases where the number is a generation (Thurston Howell III, for example), there are cases where a number is part of someone’s legal name. Jennifer 8 Lee chose to give herself the middle name of 8 because 8 is associated with good fortune.
  16. People’s names are not written in ALL CAPS.
    In some countries (notably French speaking) it is the convention to write a person’s surname in all caps to make it clear which part of the name is the surname. This convention has solidified to the point that rendering their surname not in all caps may be regarded as impolite.
  17. People’s names are not written in all lower case letters.
    e e cummings preferred his name written in all lower case. So does k d lang. It’s polite to follow the pattern used by the name’s owner.
    There is an Irish / British surname ffrench which is conventionally written in all lower case, although that tradition is suffering from poorly designed software which insists on capitalizing it.
  18. People’s names have an order to them.  Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
    In the Netherlands, Vincent van Gogh would be indexed and sorted under G, for Gogh; in Belgium the same name would be indexed under V, for van Gogh. It’s not possible to adopt a single ordering for names which will yield a universally accepted order. The system used by many libraries is to apply the rule appropriate to the place of birth of the person in question (not a rule I’d want to try to apply in software).
  19. People’s first names and last names are, by necessity, different.
    An Australian businessman and politician called Benjamin Benjamin died in 1905. Jerome K Jerome was an English humorous writer best known for Three Men in a Boat. Owen Owen was a Welshman who founded Owen Owen Ltd, operating a chain of department stores. And let’s not even get started on the wrestlers and actors who adopted repeated names for stage purposes.
  20. People have last names, family names, or anything else which is shared by folks recognised as their relatives.
    In Java it was common for a person to be given a single name, and not to have a surname. The Indonesian presidents Suharto and Sukarno both had no surname, for example.
  21. People’s names are globally unique.
    Tell that to anyone named John Smith! I have a somewhat less common name, yet I discovered a person with the same name working in the same industry in the same country (Australia).
  22. People’s names are almost globally unique.
    Even with the tendency to use unusual spellings of names, it’s extremely common to find people who share a full name – try Googling your own name.
  23. Alright alright but surely people’s names are diverse enough such that no million people share the same name.
    The Chinese name Zhang Wei is reported to be shared by over a quarter of a million people.
    If we limit the question to surnames, about 20% of the population of South Korea have the surname Kim. About 10% of the population of northern China share the surname Wang, while more than 10% of the population of southern China share the surname Chen. Li comes next in both northern and southern China, making it the most common surname across the country. And nearly 40% of Vietnamese have the surname Nguyen.
    Names are far from unique.
  24. My system will never have to deal with names from China.
    Migration has spread names from every culture to (almost) every country. The days when immigrants were renamed on entry to a country have passed, mostly (for example, Vietnam still requires that an applicant for citizenship takes a Vietnamese name). It is unrealistic to expect to avoid names from other countries, although you may see them in a transliterated form.
    So a Chinese name like 周潤發 may appear in your system as Chow Yun-fat, or Chow Yun Fat, or even Yun Fat Chow (Chow is his surname).
  25. Or Japan.
    see above.
  26. Or Korea.
    see above.
  27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
    see above.
  28. That Klingon Empire thing was a joke, right?
    It’s difficult to find examples of people using Klingon names as their official names, but should we stop someone doing so? Once we can handle the things required for other cultures (such as the embedded apostrophe for “O’Brien”), we can support Klingon names without extra work.
  29. Confound your cultural relativism!  People in my society, at least, agree on one commonly accepted standard for names.
    And will your software only be dealing with people named by your society?
  30. There exists an algorithm which transforms names and can be reversed losslessly.  (Yes, yes, you can do it if your algorithm returns the input.  You get a gold star.)
    There is no algorithm (short of remembering the original format) which can transform a name in a guaranteed reversible way.
  31. I can safely assume that this dictionary of bad words contains no people’s names in it.
    This is a common mistake – many “bad words” are not bad words in other languages, and some are used in names. Moreover, not every society restricts what words may be used in a name; it’s perfectly possible that someone’s name may have been established in such a jurisdiction.
  32. People’s names are assigned at birth.
    Births are recorded in most countries, but the effectiveness of the system varies.
    The exact rules vary by jurisdiction, but all allow for some delay in registering a birth. The length of the allowed delay varies from at least as short as 3 weeks (Scotland) to at least 2 months (Australia), and there is provision for registering births later.
    The child’s name may be recorded at the time that the birth is registered, but it doesn’t always happen (birth registrations with a name like “Baby Boy” or “Baby Girl” still happen, when the parents have trouble choosing a name, or the child is a foundling, for example).
  33. OK, maybe not at birth, but at least pretty close to birth.
  34. Alright, alright, within a year or so of birth.
  35. Five years?
  36. You’re kidding me, right?
    There are cultures where a person’s adult name is not chosen until puberty. Prior to this the child may have a “milk name”, or a temporary name.
  37. Two different systems containing data about the same person will use the same name for that person.
    If this were true, then there would be no market for software which reconciles different databases.
    In my own case, some systems contain my formal name, including my middle name. Others have just my first given name and surname, or my nickname and surname. And I’m a simple case. My wife was in some systems with her maiden name, in others with her married name, with or without her middle name, with her full first given name or with either of two spellings of her nickname.
  38. Two different data entry operators, given a person’s name, will by necessity enter bitwise equivalent strings on any single system, if the system is well-designed.
    Imagine what happens when a name is entered by a person hearing it over the telephone. Consider cases like Thomson and Thompson; or Johnson, Johnston, Johnstone, and Jonsson.
  39. People whose names break my system are weird outliers.  They should have had solid, acceptable names, like 田中太郎.
    No, your system is badly designed.
    This particular example name is perhaps best known as the name of an alien in an anime series (and a manga). There have also been real people with this name.
  40. People have names.
    This one is perhaps the most difficult for which to give solid examples. There was an isolated culture in which no one had names – they referred to everyone in relative terms, such as “my mother’s eldest sister”.
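
Several of the falsehoods above can be checked in a few lines of code. What follows is a minimal Python sketch, not a recommended implementation: the function names are my own invention, and the surname Weiß is an extra example that does not appear in the list above. It checks whether a name fits in ASCII (falsehood 9), whether its lower case form survives a trip to upper case and back (falsehood 13), and whether two strings that look identical on screen are stored as the same code points (falsehood 38).

```python
import unicodedata


def fits_in_ascii(name: str) -> bool:
    """Return True only if every character of the name is plain ASCII."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def survives_case_round_trip(name: str) -> bool:
    """Start from the lower case form, go to upper case and back, and compare."""
    lowered = name.lower()
    return lowered.upper().lower() == lowered


def same_code_points(a: str, b: str) -> bool:
    """Compare two strings exactly as stored, with no normalization."""
    return a == b


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "José" entered two ways: a precomposed é, or e followed by a combining accent.
composed = "Jos\u00e9"
decomposed = "Jose\u0301"

for name in ["Smith", "José", "田中太郎", "Weiß"]:
    print(name, fits_in_ascii(name), survives_case_round_trip(name))

print(same_code_points(composed, decomposed))  # the raw strings differ
print(same_after_nfc(composed, decomposed))    # normalization reconciles them
```

Normalization removes only one source of mismatch. It will not reconcile Thomson with Thompson, and nothing here can restore the capital K in MacKenzie once the name has been stored in lower case.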

Let’s Wrap This Up

So there you have it: examples for (almost) all forty of Patrick McKenzie’s “Falsehoods Programmers Believe About Names”. If you’re feeling a little overwhelmed, here are what I think are the most important facts to consider the next time you’re designing a system that processes names:

  • Do not use terms like “first name” or “Christian name” – “given name” is the most commonly accepted term in English.
  • Keep in mind that half of the world orders names with the family name first.
  • Many cultures use something other than a single surname inherited by all members of the family – some use a patronym or matronym (or even more than one); while some do not have a surname at all.
  • Punctuation can be a vital part of a name – the Irish surname O’Hara is not the same as the Japanese surname Ohara. Jean-Pierre is not the same as Jeanpierre, nor is it the same as Jean Pierre – Jean-Pierre is a single given name, while Jean Pierre is two separate given names.
  • Spaces do not necessarily separate parts of a name – de la Cruz is a single surname, not three separate names; Chinese names written in hanzi are written without any spaces.
  • Capitalization is not as simple as making the first letter of each word uppercase – van der Meer may have a capital V when used without a given name, but has a lowercase v when the given name is present.
  • Use the name as a whole, rather than trying to break it into parts. For example, do not try to address a man using Mr last-word-of-name – this can fail in many different ways:
    • Where the surname is written first (eg: Chinese)
    • Where the correct usage is of the patronym, and it is not last (eg: Spanish, Russian)
    • Where the surname is more than one word, eg: Spanish, such as de la Torre
    • Where the name contains a suffix, such as Junior
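
To make that last recommendation concrete, here is a small Python sketch. It is an illustration under my own assumptions rather than a definitive design: the class PersonName, its two fields, and the function naive_surname are all hypothetical names invented for this example. The function shows the last-word-of-name approach going wrong on names used earlier in this post; the class shows the alternative of storing the name whole and using it whole.

```python
from dataclasses import dataclass
from typing import Optional


def naive_surname(full_name: str) -> str:
    """The anti-pattern: assume the last space-separated word is the surname."""
    return full_name.split()[-1]


@dataclass(frozen=True)
class PersonName:
    """Hypothetical record that keeps the name exactly as the person entered it."""

    full_name: str
    # Optional answer to "what should we call you?"; never derived from full_name.
    preferred_name: Optional[str] = None

    def form_of_address(self) -> str:
        """Use the preferred name if one was given, otherwise the whole name."""
        return self.preferred_name or self.full_name


for name in ["Chow Yun-fat", "Robert Downey Junior", "Pieter van der Meer"]:
    # Each result is wrong: a given name, a suffix, and only part of a surname.
    print(f"Mr {naive_surname(name)}")

print(PersonName("Pieter van der Meer").form_of_address())
# The preferred name below is made up purely for illustration.
print(PersonName("Robert Downey Junior", preferred_name="Robert").form_of_address())
```

The point of the design is that the code never guesses: if the person has not said what they want to be called, the full name is used as entered.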

And finally, I highly recommend the guidance in this short article published by the W3C: https://www.w3.org/International/questions/qa-personal-names

  • Koriit
    Posted at 22:29h, 30 November

    Thanks for bringing it up again. This time I’m going to bookmark it and keep it in my collection of important links!

    As with the original article, I laughed again at point 28. 😀

    • Marten
      Posted at 08:34h, 11 December

      Another point that could be made is that the same name can be written in different ways.

      One of the examples used the Chinese family name 周 and wrote it as “Chow”. This is, however, a bastardized romanization of it. My MIL has the same name and writes it as Chou (Wade-Giles romanization); my son and I also have it and write it as Zhou (Hanyu Pinyin romanization).
      Same name, just different ways of writing it in systems that don’t support Chinese characters.

      • tony rogers
        Posted at 09:57h, 04 January

        A very good point which I did not address. That single character is someone’s family name, and ideally the system should be able to capture it that way. Unfortunately, there are far too many systems which are not able to do so, so people are forced to transliterate it into a character set which the system can handle. Chinese characters can be transliterated in multiple ways, as you say, resulting in strings which can be difficult to match. As an aside: Wikipedia suggests that “Chow” as a romanization of the character originates from the Cantonese pronunciation of the name.

        The problem is exacerbated when we consider that the same character may also be part of a Japanese name written in Kanji, with multiple different romanizations based on different “readings” of the character: I gather that particular character may be read as Shuu, Amane, Kane, Susaki, Suzaki, and several others. This is why systems handling Japanese names may require fields to hold the name in written form (using Kanji characters) and in spoken form (using Hiragana characters). Without something like that it can be difficult to telephone someone and ask a question as simple as “Am I speaking to …?”.

        The problem of not knowing how to pronounce someone’s name is not restricted to Japanese names, of course. One of my favourite examples is the English name “Featherstonhaugh” pronounced “Fan-shaw”.

      • nuria
        Posted at 08:17h, 23 February

        Even within the same system. I get a fair amount of Russian clients at work and in their *official* documents you see the same names transliterated differently.

  • HaHa You are Kidding, Right
    Posted at 22:44h, 30 November

    Don’t forget “John Null”

    A valid name that some systems can’t handle when exchanging data

  • danrooti7
    Posted at 01:09h, 01 December

    Interesting – I hadn’t thought of some of these. One issue that comes to mind immediately given the guidelines above – how on earth does one properly _sort_ in such a system? Imagine a system that has multiple names for each person, each with a “given name” and “family name” and “preferred name”. It contains people of all cultures. In US culture “given name” is “first name” and “family name” is “last name”. In hanzi the whole name goes in the “given name” field? Now, the user needs an ordered list of all people. I can see Susie Q’s eyes glazing over trying to explain the above when “all she asked for was a report sorted by last name” 😉

    • Bernard Peek
      Posted at 06:38h, 24 February

      The answer is that all data should appear in the first place that a searcher looks for it. That may well mean that a list contains multiple entries for one person. Number 14 wasn’t in the list I saw a few years ago until I mentioned seeing a bunch of Americans trying to find their names in a list generated from a Dutch database.

      • kjw
        Posted at 02:47h, 06 April

        Similar but different: how do you sort musical albums? sort by title? by artist? how do you sort by artist, when it starts with “The Band” or “A Group”. What happens when it’s not ‘A’ the indefinite article, but ‘A’ as a designation? i.e. ‘A B C’. Or “The The”. Good luck!

  • Natalie M. Amery
    Posted at 03:04h, 02 December

    At one point I knew both a Mr Van Den Bos and a Mr van den Bos.

    I knew someone who used a middle initial when writing their name as Ms S P Example but didn’t actually have a middle name so was officially just Ms Susan Example (and used that form if asked for First and Last names).

  • Name
    Posted at 04:53h, 02 December

    Please fill in actual examples for the “there are cultures where…” ones.

  • Eileen Quintero
    Posted at 05:52h, 06 March


    Do you have any recommendations for what this implies for how “search” would work? For example, I think that if a person types Renee it should show results for this name with and without accents; however I have seen some systems that allow the INPUT of accents but then those are excluded from results (to get an accent in the result you need to type an exact match, which sort of defies the search button function).

    Thoughts? We are trying to design an inclusive name database and need to think about search also. Thank you!

  • J
    Posted at 04:21h, 22 March

    “People have names.”

    If you expand this to something like “you know a person’s name(s)”, I think it becomes easier to find examples. Unconscious people brought to hospitals, people who are unable or unwilling to give their name to medical staff or law enforcement, genealogical trees and databases using records in which some names are unavailable or unrecorded, etc.

    • tony rogers
      Posted at 07:23h, 22 March

      True, someone may have a name, but be unwilling to provide it. Or they simply may not have provided it to your application. Not the same as someone not having a name, but it raises similar problems.

      With the growing tendency to use an email address as someone’s “identity” in many applications we can find ourselves with that email address as a user’s “proxy name”. So we can find ourselves with strings like “wombat76359@gmail.com”, for example, as our only means of identifying a user. Still, an email address has a couple of advantages – it’s unique (making a good key in a database), and we can confirm that a user has some claim to it.

      • azurelunatic
        Posted at 06:44h, 12 November

        Only unique if you assume that the email address is used by only one person.

        Two examples I encountered when doing data entry last month, suitably redacted:

        hansonfamily@example.com – used for two adult memberships, presumably after the landline household communication point model

        bcrusher@example.com – adult membership for Beverly, kid-in-tow membership for Wesley, who wasn’t yet old enough to hold his own email address in the US but for planning purposes required his own membership and name label

      • azurelunatic
        Posted at 06:53h, 12 November

        Additionally, there is user error when the email is self-reported but not confirmed. My friend Nadyne was an early arrival to her email provider. She has an unreliable narrator list of some of the other Nadynes in the world who seem to believe that if they say their email address is nadyne@example.com often enough, it will become true.

      • tony rogers
        Posted at 07:46h, 12 November

        By unique, I mean that there is exactly one email account (mailbox if you like) which corresponds to that email address. Sure, you can have multiple people using the same email address, but that’s not the same as having a hundred people all called “Tom Jones”.

      • Hennes
        Posted at 23:50h, 17 November

        Email is sadly not unique: when people switch providers they get a new address, and the old address may be handed out to a new client of the old ISP. Granted, that was more common in the 1990s, when resources were limited, and there are few reasons for an ISP not to mark the old one as reserved, or even forward mail. But you cannot guarantee that.

        That means we may end up with:
        (unique user 1)
        login user1.isp1.tld
        Old email user1.isp1.tld
        Mail needs to be sent to user1.differentISP.tld

        Same for phone numbers. I even have a practical example for that.
        I got a new phone and migrated my old number to that. The telecom provider gave me an account where the login name was chosen by them, and it is the temporary number which I had for 5 days. It is not the phone number on which I am reachable.

        So basically, any contact details should not be ID details.

      • tony rogers
        Posted at 07:45h, 18 November

        True – I have a similar experience, in that I am getting spam for the previous (5 years before I got it) holder of a business email address, but at a given time, a given email address is unique.

        You are right in pointing out that an email address is not permanently unique, but there is no identifier which is permanently unique.

      • Jens
        Posted at 21:22h, 09 June

        About uniqueness of email: I was a freelancer and had 2 accounts at a vendor: a company one and a private one, to separate them for tax purposes. Both had the same contact email. After a while the vendor changed from using customer numbers to email addresses for identifying customers. Chaos.

      • tony rogers
        Posted at 09:38h, 22 June

        I’ve encountered another system that assumed they could use email addresses as user names, ignoring the fact that some customers might be using the same email address on multiple accounts. It’s short-sighted, and a mistake, but it’s not really about uniqueness of emails – it’s about failing to understand customer identity (and that is a topic that could justify another blog!).

  • Peter Crabb
    Posted at 21:52h, 02 April

    Things seem to have gone backwards since I first went to work as a clerk in the early 70s. Then, with a clunky IBM mainframe, we identified and recorded a customer’s name in several formats so that we had the full formal name, with appropriate titles, and the format by which correspondence was to be addressed.

    A simple example of poor system design was the hospital that my mother attended in the last years of her life. Nurses insisted on calling her by the first of her two given names. Two of her four sisters shared the same name and all were known exclusively by their second names. It was all rather confusing for her in the early stages of dementia to suddenly be given a new name.

    • tony rogers
      Posted at 08:56h, 05 April

      There are cultures in which it is common for children to share a first given name – Maria is a common case. So you might have Maria Clare and Maria Beatrice, with the children normally known as Clare and Beatrice (and the force of their full names saved for when they were misbehaving 🙂 ).

      The sensitive way of dealing with this is to have a “known as” field, which can also be used in cases such as a man whose official name is John, but who is always called Jack.

  • gorn
    Posted at 22:04h, 23 May

    Very good article!

    However, I disagree with the recommendation to replace “first name” with “given name”, which goes against several of the observations made. My suggestion would be to use only one field, “name” (or “full name”), and preserve it as is.

    If you really need to do some sorting, then the only hope I see is to introduce a whole new field, “sorting name” (“surname” in most cases).

    • tony rogers
      Posted at 06:42h, 24 May

      I agree with the idea of using a single “name” or “full name” field, but where someone wants / needs to refer to a person’s given name, I strongly recommend the term “given name” over alternatives. The term “first name” presumes the order of the parts of the name, for example.

      Sorting names is problematic. To establish a complete order on names requires more than the surname. Asking a person to enter their name in a way which will produce a good sort is difficult (few of the people entering their names will have librarian training!). Moreover, if people are able to enter their name in Unicode, we will be faced with some entering a Chinese name in Hanzi, and others entering it in Roman characters, using Pinyin or other transliterations. Sorting these together is challenging.

  • nerosnm
    Posted at 20:50h, 20 July

    Really well written article. One addition that could be made to the last point: rather than trying to break someone’s name into parts in order to address them by their given name only, just ask them if they have a preferred name! If they don’t enter anything, use their full name.

  • Chakat Firepaw
    Posted at 12:38h, 14 August

    Here’s an example from the other side for #37: There is an entire cottage industry in the US based around comparing voter databases with other official records looking for discrepancies¹. Said discrepancies are then used to try and justify changes aimed at gaining electoral advantage.

    Of course, I also have an example of “more than one name known by,” and an atypical name structure. I’m Firepaw some places online, (it’s a ‘what-who’ structure, Firepaw being a Chakat), and Rick {Surname}² others and offline.

    1: Such as the voter rolls having “Bob Q Smith” and the DMV having “Robert Quincy Smith, Jr”.

    2: No, my surname isn’t {Surname}. That’s just a placeholder instead of the real one.

  • W. Wesley Groleau (伟思礼)
    Posted at 09:07h, 08 October

    #40 is a myth. But here’s one to replace it: A name can’t be only a single character. Counters:
    Harry S Truman
    U Thant
    W. Wesley Groleau (the accursed forms won’t allow the dot, so, …)
    But I can’t be too hard on my fellow programmers—I’ve seen print forms where the zip code field is bigger than the name field.

  • CJ Dennis
    Posted at 15:43h, 20 October

    Points 29 and 32 need to be fleshed out properly.

    29. Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
    And will your software only be dealing with people named by your society?

    That’s a poor response and misses the point. For example, is there a single standard for English names? If not, point 29 is disproved. E.g. my eldest brother was deliberately named so that his middle name is what he’s called. His first name is only ever used on official documents, and by telemarketers which is useful for him to identify phone calls he doesn’t want to continue! The rest of us use our first names as our common names.

    32. People’s names are assigned at birth.
    Births are recorded in most countries, but the effectiveness of the system varies.

    Traditionally, Jewish children are not named for the first 8 days of their lives. This is different from being given a name and having it not immediately recorded.

  • Some Name
    Posted at 20:03h, 05 December Reply

    In Norway, it is possible to change the full name without any reason or charge to any other name. I could go by Tor Jonson today and by Birk Nilsen tomorrow.
    Also, especially in certain European countries, it becomes more and more common for the husband to adopt his wife’s family name.

  • simpleduckman
    Posted at 02:29h, 10 April Reply

    That E. E. Cummings preferred his name decapitalized is very much in dispute. It’s probably not a great example here.


  • Erica Ginter
    Posted at 03:56h, 30 December Reply

    There’s also the problem of insufficient “room” for a long name. In high school I was Erica Van Dommelen, which was one character too long for the school’s system. And names were entered last name first, so I became Van Dommelen Eric, a problem when gym and health classes were concerned.

    • tony rogers
      Posted at 09:15h, 04 January Reply

      That’s an example of a poor form design resulting in an unfortunate outcome for you. Sorry to hear that, but it does make a good example of the impact of poor name handling.

      Entering names “last name first” makes an assumption about the order of the elements of the name (surname vs given name). At least they handled the surname “Van Dommelen” as a unit.

      Also an assumption of gender based on name, but that’s a separate issue, and one which has become much more sensitive now.

  • Jerzy
    Posted at 13:06h, 05 January Reply

    For #21 through 23: A few cultures use birth order names. For example, AFAIK, (parts of) Balinese names literally mean “eldest” and so on. And they do not generally use family names (though it’s more complicated than that) — so if you only ask for given and family name, you’d get a *lot* of people who appear to be named “eldest.”

  • Pingback:Exploring How To Build Non-Discriminatory Web Applications | Matthew Robbins Kirby
    Posted at 09:13h, 24 March Reply

    […] I’ll never do justice explaining just how wrong you probably think about names. You need to read Patrick McKenzie’s famous 2010 article “Falsehoods Programmers Believe About Names” and then follow that up by taking a look at Tony Roger’s 2018 article “Falsehoods Programmers Believe About Names – With Examples.” […]

  • Aaron T
    Posted at 09:23h, 08 November Reply

    First, I fully agree! Sadly, there are so many conflicts between what is technically correct and what consumers expect, especially if you’re dealing with the United States. For years I’ve tried to use `given name` and `family name` but have gotten so much push back from Amurikans on what those mean. I’m now just using a single field of `name` which allows for unicode, vanity case, et al. but that has its own set of problems when customer service people see characters they cannot pronounce or are unsure how to address someone. I’ve been considering if there is a graceful way to ask for a person’s `name` and `what should we call you?` but I have yet to find something that meets with a marketer’s perspective on targeting 98% of the United States.

  • Pingback:Gnarly Learnings from November - The Gnar Company
    Posted at 05:26h, 23 December Reply

    […] What's in a name? Falsehoods That Programmers Believe About Names […]

  • Alyssa
    Posted at 05:52h, 23 February Reply

    More examples specifically surrounding transgender people:

    1., 2., 3., 4.
    Many transgender people, when transitioning socially, will have a period where they go by one name with those who are aware of their transition and their previous name with those who aren’t. This may mean multiple accounts.

    It is extremely common for transgender people to change their names. Systems which do not allow for changing names or permanent linking to other systems that do not allow for changing names can be actively hostile to the experience of a trans individual with a significant history on that system, as association of themselves with the former name can be painful. This also extends to concepts like immutable usernames or email-only login.

    Gender transition isn’t a specific event, but is a long-term period spread across years. Usage of a new name may be sudden across all social groups, or gradual, with some groups using the new name before others. You may be able to specify “gender transition” as an enumerated event for your system, but it will only be from the perspective of your system and not the individual.

    Trans people usually have a name assigned at birth. Many trans people call this their “dead name”, as it can carry a painful history and should not be used to refer to them anymore.

    My personal experience with US government systems violates this – I had to send name change info to every agency separately.

    As an aside, the article would benefit from changing “he or she”-like constructs to the singular “they”. Not only is it cleaner, it is more inclusive.

  • lee
    Posted at 17:58h, 23 February Reply

    you know, with a brazilian first name followed by a german surname and a korean surname, i do feel mostly assured that my name is unique 😜

    (jokes aside, this is a great list. i had never considered that there are people who do not have names)

  • Mark Johansen
    Posted at 04:35h, 16 April Reply

    I don’t dispute any of this, but the next question is, “What do we do about it?” Like I’ve seen several people above say that, as not all cultures have “first” and “last” names and that family name is first in some and last in others, etc, that therefore we should just create one long field for “full name”. Yes, that’s more general. But it also means that the routine requirement of producing a list sorted by last name cannot be met, and that searching for people by last name is problematic. If your system is only used in the United States, and only a fraction of 1% of the people listed in your system will have names that don’t follow American English conventions, you’re giving up a lot of convenience and flexibility to handle a tiny number of outliers. If I was building a system for use in China I might well work on the assumption that the majority of names will meet Chinese conventions. Etc. It is perfectly rational to say, Let’s handle the 99.9% as cleanly and efficiently as possible and accept that the remaining 0.1% will be exceptions.
    In some cases I don’t know what we can do besides say, Yes, that’s a possible case, but there’s just no way to handle it. Like the article mentions that not all names can be represented in Unicode. True, of course. So what do you propose doing about it? How would a user even enter a name into the system that cannot be expressed in Unicode? Well, I suppose we could scan it in as a graphic, but that would be very difficult to manipulate. There are times when you just have to say, Sorry, that’s out of scope. We give up.

    • tony rogers
      Posted at 09:23h, 19 April Reply

      I believe you would find that the number of people in the United States with names that do not follow “American English” conventions is much higher than you think. If nothing else, consider the large proportion of the population for whom Spanish is their first language (or it was their parents’ first language). The Spanish name format uses a pair of surnames, and the one you should use if you wish to address them with an honorific and a surname is not the last, but the second last. So if you give them a “last name field”, do they enter just one surname, or both?

      Why should a list of names be sorted by “last name”? In some countries (Iceland, for example), the telephone directory is sorted by given name (because surnames vary within a family). I know it has been conventional to sort by surname, but what function does it serve? Librarians sort book catalogues by author’s surname, and they strike an interesting problem – in some countries the surname “Van Gogh” is sorted under V, while in others it is sorted under G – so there is no way to produce a list which is “correctly” sorted by last name for every country (the solution adopted by many librarians is to sort by the convention applicable to the country in which the author was born – so you might find Pieter van Gogh sorted under V and Wilhelm van Gogh sorted under G if they were born in different countries).

      So you ask “What do we do about it?”. The best I have seen (I make no claim to having solved this problem) is to have one field for the full name (perhaps labelled “What is your legal name?” or “Full name”), and additional fields: “How do you want to be addressed?” (allowing me to request to be called Tony, despite my legal name being Antony), and if you must sort by surname, “What is your surname?”.

      Some other points I’d like to make are:

      • do not restrict name length to a small number – there are full names that are over a hundred bytes
      • do not expect to assemble a name from parts by separating them with spaces – Chinese names contain no spaces
      • do not use terms like “first name”, “last name”, “christian name”, “family name” – use “given name” and “surname”, and beware of names that contain multiple surnames (and multiple given names, for that matter)

      Name handling is hard. Pretending that everyone in your country has a name that follows the same pattern does not make it simpler, it just makes your system inflexible.
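One way the three-field form described above could be modelled is sketched below. This is only an illustration: the class `NameEntry`, the function `parse_form`, and the form keys are hypothetical names, and only the full name is treated as required.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NameEntry:
    full_name: str                      # "What is your legal name?", stored as typed
    addressed_as: Optional[str] = None  # "How do you want to be addressed?"
    surname: Optional[str] = None       # "What is your surname?", only if sorting needs it


def parse_form(form: Dict[str, str]) -> NameEntry:
    """Build a NameEntry from submitted form fields (hypothetical keys)."""
    full_name = form.get("full_name", "").strip()
    if not full_name:
        raise ValueError("full name is required")

    def optional(key: str) -> Optional[str]:
        # Blank or missing optional fields become None rather than "".
        value = form.get(key, "").strip()
        return value or None

    return NameEntry(full_name, optional("addressed_as"), optional("surname"))
```

The full name is kept as a single string and never reassembled from parts, so a name written without spaces is stored unchanged.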

    • Yuri de Groot
      Posted at 11:37h, 07 January Reply

      Have two fields – one for their full name and one for their sorting name.
      For example, my full name is Yuri Benjamin de Groot. My family name is de Groot and my sorting name is just Groot.
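A rough sketch of sorting with a separate sorting name, assuming each record is a dictionary with a required `full_name` and an optional `sort_name` (both key names are hypothetical). Records without a sorting name fall back to the full name. Comparing with `str.casefold()` is a simplification; real collation rules depend on the locale.

```python
from typing import Dict, List


def sort_key(person: Dict[str, str]) -> str:
    """Sort by the person's own sorting name, else by the full name as entered."""
    name = person.get("sort_name") or person["full_name"]
    return name.casefold()


people: List[Dict[str, str]] = [
    {"full_name": "Erica Van Dommelen", "sort_name": "Van Dommelen"},
    {"full_name": "U Thant"},
    {"full_name": "Yuri Benjamin de Groot", "sort_name": "Groot"},
]

ordered = sorted(people, key=sort_key)
```

Here the sorting name is supplied by the person, so "de Groot" can file under G while "Van Dommelen" files under V.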

  • jay
    Posted at 01:05h, 20 April Reply

    Well, I disagree. “Why should a list of names be sorted by “last name”?” Because our users expect it. There are times when I have to tell the client that it’s just not possible to do what he wants. But to tell a client, “I’m sorry, but it’s not possible to sort a list of customers by last name” … obviously they know it is possible, because people do it all the time. I’m not going to refuse to meet a client requirement because IF their store in Kansas had a bunch of Kuwaiti and Lesothoan customers then it wouldn’t work.
    I’ve built systems that were primarily for use in the US for decades, and having fields for first and last name has worked very well. Do some number of immigrants struggle with it? Yes. If someone had a solution that worked for the 1% without being awkward for the 99%, I’d be happy to hear it.
    I see someone on here suggested having fields for “Full name” and “Sort name”. Sorry, no. Are you really going to ask your customers to enter their “sort name”? Most people would wonder what that meant. What happens if a customer misunderstands and enters something inconsistent with the full name, like they say their full name is “Sally Jones” and their sort name is “Jnes”? Or “Yes”? Or “George”? You’d have to have instructions on the screen to explain what you mean, and then hope the user actually reads and understands the instructions. At best it would make the seemingly simple task of typing in your name suddenly become complicated. And please don’t tell me that any reasonable person would instantly understand what you mean and enter it correctly. I’ve seen some pretty crazy data entry errors with far more conventional fields. (Like the guy who typed his entire address into the zip code field …)
    I certainly agree we should be as flexible as possible. There’s a system I interface with that has edit rules like “a name cannot include any digits” and “name must be at least two characters”. Most names would meet those rules, but I don’t see any point in rejecting a name that doesn’t. MAYBE it’s a data entry error … or maybe the person has an unusual name. And yeah, I’m always cautious about making fields too short. But what is “too short”? 5 characters is surely too short for a full name. Is 40 enough? Probably not. 100? 200? It might be nice if you could make it unlimited. But then what if someone pastes 20k of text into the name field? I am suddenly reminded of a system I worked on years ago for an insurance company where they had blocks for each child in a family. They allowed for 8, because someone said, “no one would ever have more than 8 children”. Then we had a family with 9 children sign up. Was that so hard to see coming?
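A lenient check along these lines might look like the sketch below. The limit of 200 is an assumption picked from the numbers floated above, not a recommendation, and `name_problems` is a hypothetical helper. It rejects only empty input and absurdly long input, and deliberately accepts digits, punctuation, and single-character names.

```python
from typing import List

# Assumed cap: generous enough for long names, small enough to stop a 20k paste.
MAX_NAME_CHARS = 200


def name_problems(raw: str) -> List[str]:
    """Return a list of problems with a submitted name; empty means accepted."""
    name = raw.strip()
    problems: List[str] = []
    if not name:
        problems.append("name is empty")
    elif len(name) > MAX_NAME_CHARS:
        # len() counts Unicode code points, not bytes, so the storage column
        # needs to allow more than MAX_NAME_CHARS bytes.
        problems.append("name is longer than %d characters" % MAX_NAME_CHARS)
    return problems
```

Anything that looks unusual but passes these checks is stored as typed, on the view that an odd name is more likely than a data entry error.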

    • tony rogers
      Posted at 08:20h, 20 April Reply

      Making the name fields longer (at least 100 characters, and consider 150) is a start, and will address some of the problems. Yes, avoid excluding characters (digits, punctuation, special characters) from names, just as you say.

      I strongly recommend not using the labels “first name” and “last name” – someone from a culture where their surname is listed first may (correctly) enter their surname in the “first name” field, and their given name in the “last name” field. Then your “sort by last name” will result in sorting them by their given name.

      One suggestion I have seen, and liked, was an optional field labelled “How do you prefer to be addressed?” – a user can skip it, or enter a variation of their name (someone whose given name is “John” may prefer to be called “Jack”; someone whose given names are “Mary Josephine” may prefer to be called “Jo”). This makes addressing a letter or email to them more personal, and makes it easier to get around some name issues.

      My objective in writing the article was to provide a greater understanding of how complicated name handling can be. I have learned even more from reading some of the comments – they raise further problems to consider.

      If we know some of the possible problems that can arise in name handling, then we can try to design systems that handle names better, even if we choose not to make the system handle everything. Names are personal, and people can be deeply offended if their name is not handled correctly.

  • Pingback:Vom Rechner in die Welt: Notizen zum ersten großen Import in Factgrid – Katharina Brunner
    Posted at 23:40h, 20 May Reply

    […] In principle, working with names, together with our implicit assumptions and habits around them, is predestined to generate automated misclassifications. The sources of error are extremely varied and take the fun out of automated processing. Anyone who wants to know more about the fascinating world of false assumptions about names: Falsehoods Programmers Believe About Names – With Examples […]

  • Alara Rogers
    Posted at 07:39h, 21 May Reply

    I knew a guy from Canada whose name was, as an example, Joseph Matthew Steven Lastname. His father was Joseph Matthew Brian Lastname, his brother was Joseph Matthew Evan Lastname, you get the picture. He, of course, went by Steven, because that was the only part of his name that differentiated him from his family members.

    But on his passport his name was Joseph Matthew, and that’s what all of his American employers called him, because the passport field couldn’t support *two* middle names.

  • Julie Meridian
    Posted at 08:45h, 22 June Reply

    This is why we need best practices. There are plenty of problems to point out — time to start sharing solutions. Contributions welcome! http://makeitlegit.github.io/ProperName/

  • Mark
    Posted at 00:55h, 23 June Reply

    I agree that the labels “first name” and “last name” are problematic when some cultures (like America) put given name first and then family name, while other cultures (like China) put family name first and then given name. But then that leads us to the question, How do we want to sort? (It occurs to me that, assuming we want to sort by family then given name, the Chinese system is easier to work with. Like I prefer expressing dates as year-month-day because that’s easier to sort. But sadly, most cultures are not willing to change their cultural conventions to make life easier on programmers.)
    I think people add a lot of unnecessary complexity to computer systems by trying to handle rare edge cases. That applies to names and many other things. If I’m building a system for use in the United States where 99+% of names will be “American style” with first, middle, and last names, then it makes sense to primarily code for that and just try to make some provision for the rare cases that don’t fit — accept that they may not work quite right but at least they won’t blow up. If I was building a system for use in China I would have very different requirements. But if I was building a system for use in the US, it would be unproductive to spend a lot of time coding for Hatusi names, when the odds are that we’ll never see one.
    My company develops systems that are used all over the world and systems like that have to be more flexible and comprehensive. I’m thinking of one in particular that I was just working on yesterday that has versions in English, Spanish, German, Chinese, Korean, and a bunch of other languages. I don’t think we’re comprehensive enough on that one. Indeed most of our users are from China but the system is really geared to American names. Because of the client system that we interface with, we can’t even store letters that have accent marks. At one point I wrote code to accept the accented letters and convert them to “plain Latin letters”. I was proud of the code but I think it was a counterproductive requirement.

    • tony rogers
      Posted at 07:04h, 23 June Reply

      You are not the first person to assert that 99.9% of names in the United States are “American style”. Permit me to doubt that. We hear about large numbers of people from Spanish-speaking and Portuguese-speaking countries entering the US, bringing with them names that have two surnames, for example. I have worked with teams from the US with members who had names from China, India, Vietnam, Russia, and many more countries. Many of their names can be coerced into systems like those you describe, but that’s forcing the people to fit the system, not the other way around.

      There was a time when people entered the US with surnames like “van der Bilt” and “van den Berg”, and had their names changed to Vanderbilt and Vandenberg. Some of these would be accustomed to having their names filed under B, not V.

      But I think you understand the problems.

    • lordmogul
      Posted at 02:05h, 01 October Reply

      Plain latin letters is a good example of how easy and obvious are not the same as correct.
      My family name contains umlauts, and for those just removing the diacritics into “plain latin” letters would be changing the pronunciation. The correct way would be to turn them into a pair of vowels (ö -> oe, ü -> ue, ä -> ae)
      That can be especially chaotic when there is a name with umlaut and one without. Just turning them into ASCII letters would make two different names the same.
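The difference between the two approaches can be sketched as follows. The function names are hypothetical, and the digraph table covers only the three umlauts mentioned above. It follows the German convention, so applying it to names from other languages would be an assumption in its own right.

```python
import unicodedata

# German convention only: umlaut -> vowel pair.
GERMAN_DIGRAPHS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def strip_diacritics(name: str) -> str:
    """The 'easy and obvious' approach: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def german_ascii(name: str) -> str:
    """Replace each umlaut with its vowel pair instead of dropping the mark."""
    composed = unicodedata.normalize("NFC", name)
    return "".join(GERMAN_DIGRAPHS.get(ch, ch) for ch in composed)
```

With stripping, "Müller" and "Muller" both become "Muller" and can no longer be told apart. With the vowel pairs, "Müller" becomes "Mueller" while "Muller" stays as it is.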

      • tony rogers
        Posted at 05:50h, 03 October

        Agreed. Transliteration ignoring accents is not ideal.

        However, bear in mind that names are entered into the system by people other than their owners, and it’s entirely possible that your name might be typed by someone unable (or unwilling) to add the umlaut. So we’d want to match the correct version of your name with the incorrect version (as a less than perfect match). Whether we do that by transliterating the correct and incorrect versions to the same thing (not the best option, because they would match with a higher score), or by transliterating to a diphthong, is a detail that may be culturally dependent (just as sorting methods depend on collating sequences that can be culturally significant).
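As a rough illustration of matching the correct and incorrect spellings as a less than perfect match, the sketch below gives an exact match the top score and a match on either ASCII fallback a lower one. The scores 1.0 and 0.8, the function names, and the German-only digraph table are all assumptions made for the example.

```python
import unicodedata
from typing import Set

GERMAN_DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue"}  # assumed German convention


def fallback_keys(name: str) -> Set[str]:
    """Both plausible ASCII spellings of a name: vowel pairs and stripped marks."""
    composed = unicodedata.normalize("NFC", name).casefold()
    with_digraphs = "".join(GERMAN_DIGRAPHS.get(ch, ch) for ch in composed)
    decomposed = unicodedata.normalize("NFD", composed)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return {with_digraphs, stripped}


def match_score(a: str, b: str) -> float:
    """1.0 for identical names, 0.8 if only a fallback spelling agrees, else 0.0."""
    if unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b):
        return 1.0
    if fallback_keys(a) & fallback_keys(b):
        return 0.8
    return 0.0
```

Under these assumptions "Müller" matches both "Mueller" and "Muller" at the lower score, while "Mueller" and "Muller" do not match each other.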

    • RJ
      Posted at 14:34h, 17 September Reply

      > It occurs to me that, assuming we want to sort by family then given name, the Chinese system is easier to work with… easier to sort.

      If only it were so! There’s no single canonical way of sorting Chinese characters.

      Chinese dictionaries typically offer multiple indices for looking up characters: by stroke count, by stroke count + stroke order, by radical (components of characters), and by pronunciation – Pinyin in mainland China, Zhuyin/Bopomofo in Taiwan, and some Hong Kong dictionaries provide a Jyutping (Cantonese transliteration) index as well. Mandarin Chinese is written in Traditional form in Taiwan, Hong Kong and most of the Chinese diaspora, while it’s written in Simplified form in mainland China, and the way a character is written can be very different between these forms, leading to different ordering by stroke count/order and radical.

      (Incidentally, Jyutping is another falsification of “names don’t contain numerals”: the actor Jackie Chan’s stage name 成龍 is correctly transliterated from Cantonese as Sing4 Lung4, and would be recorded as such in any system that uses Jyutping.)

      To further complicate matters, other languages use Chinese characters and have their own sorting system. Japanese uses the gojūon system of sorting words by their kana (Japanese syllabary) representation. Similarly in Korean, hanja characters are ordered by their hangul (Korean alphabet) representation. The way to write a personal name depends on where the person is from: in South Korea, most personal names are formally written in hanja, while North Korean personal names are formally written in hangul.

      All three languages write the surname first, but in western contexts, it’s customary to transliterate Japanese names with given name first. 小野 洋子 in Japan is Ono Yoko, surname first; in the US it’s Yoko Ono, given name first. The kanji are never swapped though, even if the transliteration is!

      Singapore is its own thing: many people have both a Chinese and an English given name, and in some contexts Singaporean names are written in a way that respects both orderings, with the surname in the middle. The 7th president of Singapore’s full name is written Tony Tan in English, 陈庆炎 (Tan Keng Yam) in Chinese, and Tony Tan Keng Yam in bilingual contexts. Singapore is multi-ethnic, and the dual naming only applies for Chinese people – his successor Halimah Yacob may have her name transliterated into Chinese characters, but her name is not Chinese.

      So if you’re given some names that are written in Chinese characters, and asked to sort them… yeah, good luck.

  • Mark
    Posted at 01:03h, 23 June Reply

    RE uniqueness of email addresses: Sure. Something that regularly concerns me is when developers assume some field or combination of fields will be unique. Like one system I worked on many years ago, they generated the ID for a user record by concatenating first 7 letters of last name (I think it was 7, whatever), zip code, and birth date. The lead developer said that the probability that two people would have the same last name, zip code, and birth date was remote, so this should be unique. I pointed out that not only could this happen by chance, but in fact there were cases where it would happen not by chance: What about twins? They’d have the same last name, the same birth date, and if they both still lived in the town where they were born or nearby, the same zip code. Of course I was ignored. I left that company soon after so I don’t know if we ever got such a case, but, etc.
    Likewise I’ve seen many cases where someone cobbles together some collection of fields and then adds a sequence number to ensure that it’s unique. Except … if you’re going to add the sequence number to ensure it’s unique, why not just forget about the rest and just use a sequence number?
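The twins case can be reproduced in a few lines. This is a sketch of the scheme as described above, not the original system's code: the function names, the ID layout, and the sample data are made up for illustration.

```python
import itertools
from datetime import date


def composite_id(last_name: str, zip_code: str, birth_date: date) -> str:
    """First 7 letters of the last name + zip code + birth date, as described."""
    return f"{last_name[:7].upper()}{zip_code}{birth_date:%Y%m%d}"


# A plain sequence number: unique by construction and free of personal data.
_sequence = itertools.count(1)


def surrogate_id() -> int:
    return next(_sequence)


# Hypothetical twins living in the same zip code.
twin_a = composite_id("Johansen", "66002", date(1990, 5, 17))
twin_b = composite_id("Johansen", "66002", date(1990, 5, 17))
collision = twin_a == twin_b  # True: two people, one ID
```

The sequence number never collides, and it also keeps the birth date and name fragments out of the identifier.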

    • tony rogers
      Posted at 06:46h, 23 June Reply

      And sometimes such a scheme produces an identifier that the person finds offensive – I will not give examples, because they are mostly rude words.

      Birth date is quite sensitive PII, too – including it in an identifier is a bad idea (although I’ve seen it done in a doctor’s surgery).

  • Boudah Talenka
    Posted at 23:43h, 08 January Reply

    Amazing read!

    I have also seen prefixes *included* in names, such as gender and titles (nobiliary, academic, religious, etc.), birth weekday (Tibetan), punctuation (commas, official aliases in quotes), mixed alphabets inside one element of the name, and parentheses for a future name (when you get your name at nubile age, for example).

    In some countries where slavery has not been terminated, your surname is the one of your owner, or the figure of the case you sleep in.

    Not to mention those who were born in concentration camps or asylums in Nazi Germany, who had nothing but a number as a name. Some survivors insisted on keeping it on their ID, so that no one would forget or forgive.

  • chadrum
    Posted at 07:40h, 08 February Reply

    My guess for the Klingon example would be an online forum where people role-play as Klingons and the forum account system needs to work for those names.

  • kjw
    Posted at 02:54h, 06 April Reply

    #20 I had a friend who legally changed their name from “first last” to just “name”. (i.e. a single word). They managed to get their California driver’s license changed to that, so at least one system handled that ok.
