Things Aren't Nothing (By Default)
This topic has surely been beaten to death and back, but recently I’ve noticed a strange phenomenon: people arguing against languages that introduce optional types and banish the default-null concept, on the grounds that these languages ignore the fundamental fact that you sometimes need a “nothing.”
Addressing the elephant
Let’s start by explicitly acknowledging that these individuals are right about one thing: the concept of nothing is required to accomplish many useful things in programming. Let’s also remind everyone that no one (that I’m aware of) is actually suggesting we disallow that concept from existing. If such individuals exist, they’re on the most extreme fringe of this subject. All I’m aware of is that people want to make the implicit run-time error of failing to check a null reference a thing of the past (implicit run-time errors are a Bad Thing™).
Consider the obvious: the proponents of banishing nullability by default are always showing off and highlighting things like Option<T> monads or wrapper types. This isn’t really about removing the concept from the base language (although that is certainly said in favor of keeping the base language simpler); ultimately those alternatives are simply a different vocabulary for the same underlying concept. You could just as well call it Nullable<T> and only allow null to be assigned to variables of that type. In actuality, that’s the direction a lot of languages are heading (the non-nullable reference proposals for C#, with their T! and T? syntax, for instance … Note: as of C# 7.0, the T? syntax only works with value types).
Any time you have a null in your code now, the idea is that you would just use the new vocabulary to describe it (even if it’s reading null fields from a database, or mallocing low-level memory and learning whether the pointer is 0 or not) and reap the rewards of the compiler giving you compile-time errors instead of your program giving you run-time ones in corner cases or places where your test coverage is lacking.
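To make the vocabulary shift concrete, here’s a minimal sketch in Rust, whose Option<T> is exactly one of the wrapper types mentioned above. The User struct and its nullable email field are invented for illustration:

```rust
// A field that may legitimately be absent says so in its type;
// a plain String, by contrast, can never be "null".
struct User {
    name: String,
    email: Option<String>, // e.g. a nullable column read from a database
}

fn main() {
    let user = User {
        name: "Ada".to_string(),
        email: None,
    };

    // The compiler refuses to let us treat `email` as a plain String;
    // we are forced to spell out what happens in the "nothing" case.
    match &user.email {
        Some(address) => println!("mailing {}", address),
        None => println!("{} has no email on file", user.name),
    }
}
```

Forgetting the None arm is a compile-time error, not a surprise at 3 a.m. in production.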
The idea that individuals rallying against default-null are trying to eliminate a useful and necessary concept is simply a straw man. We want it too; we just don’t want it by default, and we really want to prove a subset of our code correct so our brainpower can be better spent figuring out the things the compiler couldn’t easily prove on its own.
The Inherently Conceptually Incorrect Default Nullability
Let’s explore one of the main reasons why null by default causes so many issues. The numerous issues themselves have been covered quite enough at this point (it would surprise me to find many who aren’t aware of Tony Hoare’s Billion Dollar Mistake, or the various online discussions of it). I often see discussions centering on the fact that checking for null everywhere (one of many attributes of Defensive Programming) is verbose, obfuscates the intentions of the programmer, and incurs a performance cost due to the (occasional) constant rechecking of particular properties of some variables. But another important aspect of why it’s problematic isn’t touched on as much.
Let’s establish one important fact about programming: writing good code is primarily about communicating with the person who follows you. This isn’t some sort of revelation at this point, but let’s restate it anyway.
What kind of communication do you feel type systems give you? For me, it’s a communication that says “this variable has a T in it.” An interface definition is nothing more than a declaration that “if you give me an X and a Y, I’ll give you a Z.” Let’s continue down this path by imagining a simple exchange between two human beings in a “more natural” setting.
How about an apartment tenant and the landlord?
The landlord can be said to have an interface of PayWith(Check check) (and the return type is unimportant for our discussion).
In the normal case
An apartment tenant writes a check and gives this check to their landlord. The landlord takes this check, cashes it, and everyone in the entire world is happy.
For evil
Consider the case where a tenant gives the landlord a fake check with an invalid (perhaps all 0) check number. The landlord has a check that references an invalid bank account. The landlord attempts to cash the check, the landlord goes to jail, and everyone in the apartment complex becomes homeless including the guy who wrote the bad check. Maybe one of those tenants would have cured cancer one day, but now his life is off-track and millions will die that could have been saved. One of those millions was the very person that would eventually save the world from being hit by a world-destroying meteor. Despite the fact that the jerk that gave out an invalid reference still became homeless, millions of people die from cancer and eventually all life on Earth is extinguished.
Okay, that’s a bit extreme and one-sided. Let’s forgive that and move on.
The landlord may or may not perform sufficient verifications to catch such an issue before they get to the bank, but I believe that it’s clear that the tenant has violated a communicated requirement of paying their landlord.
The violation: everyone implicitly expects that the references you hand over are valid
Sure, you could claim that the landlord should verify checks more closely before they get to the bank and have to throw an exception, but that’s only because you expect bad/illegal behavior on the part of people in general. Code that you write and expect others to consume should avoid bad/illegal behavior in order to be an effective communication tool. Defensive programming is the chosen method to combat this when you have no other choice, but a programming language that expresses intent and communicates better should do a better job of not requiring such activities.
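The contrast can be sketched in Rust (function names are invented for illustration): the defensive version must re-check a property the type system could have carried for it, while the communicative version states the requirement once, in the signature, and its body never needs a null test.

```rust
// Defensive style: the "reference" might be absent, so every consumer
// must remember to re-check it before use.
fn cash_defensively(check: Option<&str>) -> String {
    match check {
        Some(number) => format!("cashed check #{}", number),
        None => "refused: no check given".to_string(),
    }
}

// Communicative style: the signature itself promises a real check,
// so the body contains no null-handling at all.
fn cash(check: &str) -> String {
    format!("cashed check #{}", check)
}

fn main() {
    println!("{}", cash_defensively(None)); // refused: no check given
    println!("{}", cash("1042"));           // cashed check #1042
}
```

In the second version, the burden of producing a valid check moves to the caller, where the compiler can enforce it.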
With the right language constructs
What if we said the landlord had an interface of PayWith(Check? check) instead? It’d be like the tenant coming up and saying “Hey, I maybe have a valid check here, winky face, nudge nudge.” The landlord then has enough of a hint that before cashing that check at the bank’s CashCheck(Check check) (in fact, in a well-designed language, it should be impossible to even make that attempt, but we’re only looking at the communication right now), they’ll need to verify the check’s reference with the appropriate CheckCheck(Check? check) -> CheckCheckResult (or, alternatively, YoDawg) method at the bank, assuming they can’t obviously verify it themselves.
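The exchange above might look like this in Rust (all names, including check_check, are invented to mirror the example): pay_with accepts a maybe-check, the bank’s cash_check only accepts a definite one, and the only path between them is the verification step.

```rust
struct Check {
    number: u32,
}

// The bank only accepts a definite check; handing it "maybe a check"
// is not even expressible here.
fn cash_check(check: &Check) -> String {
    format!("cashed #{}", check.number)
}

// "Check the check": turn a maybe-check into a definite one,
// or report exactly why it failed.
fn check_check(check: Option<Check>) -> Result<Check, String> {
    match check {
        Some(c) if c.number != 0 => Ok(c),
        Some(_) => Err("invalid (all-zero) check number".to_string()),
        None => Err("no check was given".to_string()),
    }
}

// The landlord's interface: PayWith(Check? check).
fn pay_with(check: Option<Check>) -> String {
    match check_check(check) {
        Ok(c) => cash_check(&c),
        Err(reason) => format!("not cashed: {}", reason),
    }
}

fn main() {
    println!("{}", pay_with(Some(Check { number: 1042 }))); // cashed #1042
    println!("{}", pay_with(Some(Check { number: 0 })));
    println!("{}", pay_with(None));
}
```

Note that the landlord can’t forget to verify: there is no way to call cash_check with an Option<Check> without first going through check_check (or an equivalent match).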
Still not making sense?
Here’s a simpler example that you can try at work: Walk up to a coworker at the office and ask “Critique my business card?” When they say sure, walk away.
async Critique(BusinessCard card) -> Task<BusinessCardReview>
Are they confused as to why you gave them nothing? Would you be in the same situation? Are you using your code to say the same sort of things to your fellow programmers? Maybe you should be trying to make your code explicit about its requirements. But then again, maybe it’s difficult to communicate that idea because of the default-null problem.
Summary
Indeed, a programming language that makes errors like those described above impossible makes natural communication between programmers much easier and much clearer; it is simply intrinsically better for writing good code. You don’t lose the ability to declare something as potentially holding nothing, as some have claimed. If you intend for an interface to handle nullable references and the “nothing” case, you can annotate it as such (I don’t intend to make syntax recommendations, but a lighter-weight syntax for the concept is certainly fine) and the communication is established.
Otherwise, you can communicate that the caller is expected to give a valid reference (which is technically impossible to do in a default-null language, if you expect some reasonable level of compiler assistance). This isn’t to say that this is all you need to cover. For instance, it might be nice to have a compile-time proof that the tenant verified the balance shown on the check, that they could only write it if their bank account had the correct running balance at the time, and that the balance is what they owe the landlord.
That said, it is an important component of a larger whole. Many solutions try to go further and cover all such cases in general when getting rid of default-null, hence the discussions about monads and wrapper types.
P.S. One last thing
“But why not just use a wrapping type of NotNull<T> in a language with nullable by default?” (This has been covered a thousand times as well, but it’s here just for reference)
Because:
- It goes against the conventions established in that language by the existence of that default, so you’ll only use it in your own code and must write interfacing code between the standard library and yours. You still suffer the cluttering of your intention, which means you don’t realize as much benefit as you could from it
- Some languages, like Java, are implementing their own standard library version of Optional<T> that helps with this (the case where the Optional<T> contains a value is required to be NotNull), but I find it too little, too late. The general consensus is that they went with this approach to maintain backwards compatibility, but backwards compatibility implies that you’re still going to have to write the interfacing code with the code you’re backwards-compatible with. Also, this approach really doesn’t handle the fact that you often want to express the existence of the value, though there are @NotNull annotations for that, I think
- Everyone has reinvented the concept a thousand times in that language, and all the implementations are incompatible, requiring glue code everywhere. That further clutters your original intention, yet again meaning you don’t get the full benefit
- Again, Java’s standardization of Optional<T> years after the fact does nothing for existing code, so it’s a valid problem that will still exist for years
- More often than not, you’re handling references to something, and it doesn’t make sense for the more verbose syntax to be the more common case, because it clutters up your original intention
- It’s rather useless to have a NotNull<T> that can itself be null. It doesn’t give you the compile-time guarantee that someone hasn’t passed you a null NotNull<T> (exactly the problem being solved)
  - Yes, there exist tools to do such static analysis (and static analysis tools should be used in general), but improving code quality across the libraries you use necessitates compiler intervention, because someone else might not be using the same static analysis tools as you (or any, for that matter). Covering a really common issue in the compiler/language definition just makes good sense
You can successfully emulate a default-null language in a language that has made the correct choice of not having null by default. The other way around just does not work, and it’s practically impossible to correct the mistake in the language later.
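As a sketch of that emulation in Rust (the Nullable alias and length function are invented for illustration): you can opt any particular reference into the “maybe nothing” world with a one-line type alias, but even then the compiler still forces the check, which is precisely the part a default-null language gives up.

```rust
// A default-null language is one where every reference implicitly has
// this type; in a non-null-by-default language you opt in explicitly.
type Nullable<T> = Option<T>;

fn length(s: Nullable<String>) -> usize {
    // Unlike real default-null, "forgetting" this check is a compile error.
    match s {
        Some(s) => s.len(),
        None => 0,
    }
}

fn main() {
    println!("{}", length(Some("hello".to_string()))); // 5
    println!("{}", length(None)); // 0
}
```

The reverse emulation fails because a wrapper like NotNull<T> in a default-null language is itself a reference that can be null, as the P.S. above describes.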