Productive Rage

Dan's techie ramblings

A static type system is a wonderful message to the present and future - Supplementary

This is an extension of my post "A static type system is a wonderful message to the present and future. I initially rolled this all together into a single article but then decided to break it into two to make the first part easier to consume.

So, what else did I want to say? Rather than just saying "static typing is better", I want to express some more detailed "for" and "against" arguments. Spoiler alert: despite the negatives, I still believe that static typing is worth the effort.

FTW

I find that the more that I take advantage of the type system, the more reliable that my code becomes - not only in terms of how well it lasts over the years, but how likely that it is to work the first time that it compiles. Going back to some code that I wrote a few years ago, there are various parts of a particular project that deal with internationalisation - some parts want to know what language that particular content is in while some parts of more specific and want to know what language culture it's in; the difference between "English" (the language) and "English UK" / "en-GB" (the language culture). I wish now that, for that project, I'd created a type (in C#, a struct would have been the natural choice) to represent a LanguageKey and another for a LanguageCultureKey as I encountered several places where it was confusing which was required - some parts of the code had method arguments named "language" that wanted a language key while others had arguments named "language" that wanted a language culture key. The two parts of the project were written by different people at different times and, in both cases, it seemed natural to them to presume that "language" could mean a language key (since nothing more specific was required) or could mean a language culture (since they presumed that nothing less specific would ever be required). This is an example of a place where better argument naming would have helped because it would have been easier to spot if a language culture key was being passed where a language key was required. However, it would have been better again if the compiler would spot the wrong key type being passed - a human might miss it if a naming convention is relied upon, but the compiler will never miss an invalid type.

Another example that I've used in the past is that of React "props" validation - when creating React components (which are used to render DOM elements.. or OS components, if you're using React Native), you must provide specific information for the component; if it's a Label, for example, then you must provide a text string and maybe a class name string. If you're using JavaScript with React then you will probably be providing the props reference using simple object notation, so you will have to be careful that you remember that the text string is named "text" and not "labelText". The React library includes support for a "propTypes" object to be defined for a component - this performs validation at runtime, ensuring that required properties have values and that they are of the correct type. If a strongly-typed language (such as C#) was used to create and consume React components, then this additional runtime validation would not be required as the component's "props" class would be declared as a class and all properties would have the appropriate types specified there. These would be validated at compile time, rather than having to wait until runtime. Returning to the "Sharp Knives" quote, this may be construed as being validation written for "other programmers" - as in, "I don't want other programmers to try to use my component incorrectly" - but, again, I'm very happy to be the one of the "other programmers" in this case, it allows the type system to work as very-welcome documentation.

While we're talking about React and component props, the React library always treats the props reference for a component as being immutable. If the props data needs to change then the component needs to be re-rendered with a new props reference. If you are writing your application in JavaScript then you need to respect this convention. However, if you choose to write your React application in a strongly-typed language then you may have your props classes represented by immutable types. This enforces this convention through the type system - you (and anyone reviewing your code) don't have to keep a constant vigil against accidental mutations, the compiler will tell you if this is attempted (by refusing to build and pointing out where the mistake made).

The common thread, for me, in all of the reasons why static typing is a good thing is that it enforces things that I want (or that I require) to be enforced, while providing invaluable information and documentation through the types. This makes code easier to reason about and code that is easier to reason about is easier to maintain and extend.

What static typing can't solve

It's not a silver bullet. But, then, nothing is. Static typing is a valuable tool that should be used with automated test in order to create a reliable product.

To take a simple example that will illustrate a variety of principles, the following is a LINQ call made in C# to take a set of EmployeeDetails instances and determine the average age (we'll assume that EmployeeDetails is a class with an integer Age property) -

var averageAge = employees.Average(employee => employee.Age);

If we were implementing the "Average" function ourselves, then we would need to populate the following -

public static int Average<T>(this IEnumerable<T> source, Func<T, int> selector)
{
}

Static typing gives us a lot of clues here. It ensures that anyone calling "Average" has to provide a set of values that may be enumerated and they have to provide a lambda that extracts an integer from each of those values. If the caller tried to provide a lambda that extracted a string (eg. the employee's name) from the values then it wouldn't compile. The type signature documents many of the requirements of the method.

However, the type system does not ensure that the implementation of "Average" is correct. It would be entirely possible to write an "Average" function that returned the highest value, rather than the mean value.

This is what unit tests are for. Unit tests will ensure that the logic within a method is correct. It will ensure that 30 is returned from "Average" if a set of employees with ages 20, 30 and 40 are provided.

The type system ensures that the code is not called with inappropriate data. If you didn't have a static type system then it would still be possible to write more unit tests around the code that called "Average" to ensure that it was always dealing with appropriate data - but this is an entire class of tests that are not required if you leverage static analysis*.

Unfortunately, there are limitations to what may be expressed in the type system. In the "Average" example above, there is no way (in C#) to express the fact that it's invalid for a null "source" or "selector" reference to be passed or a "source" reference that has zero elements (since there is no such thing as an average value if there are zero values) or a set of items where one of more of the values is null. Any of these cases of bad data will result in a runtime error. However, I believe that the solution to this is not to run away screaming from static typing because it's not perfect - in fact, I think that the answer is more static analysis. Code Contracts is a way to include these additional requirements in the type system; to say that "source and selector may not be null" and "source may not be empty" and "source may not contain any null references". Again, this will be a way for someone consuming the code to have greater visibility of its requirements and for the compiler to enforce them. I will be able to write stricter code to stop other people from making mistakes with it, and other people will be able to write stricter code to make it clearer to me how it should be used and prevent me from making mistakes or trying to use it in ways that is not supported. I don't want the power to try to use code incorrectly.

I think that there are two other obvious ways that static typing can't help and protect you..

Firstly, when dealing with an external system there may be additional rules that you can not (and would not want to, for the sake of preventing duplication) describe in your code. Perhaps you have a data store that you pass updates to in order to persist changes made by a user - say the user wants to change the name of an employee in the application, so an UpdateEmployeeName action must be sent to the data service. This action will have an integer "Key" property and a string "Name" property. This class structure ensures that data of the appropriate form is provided but it can not ensure that the Key itself is valid - only the data store will know that. The type system is not an all-seeing-all-knowing magician, so it will allow some invalid states to be represented (such as an update action for an entity key that doesn't exist). But the more invalid states that may not be represented (such as not letting the key, which the data service requires to be an integer, be the string "abc" - for example) means that there are less possible errors states to test against and the code is more reliable (making it harder to write incorrect code will make the code more correct overall and hence more reliable).

Secondly, if the type system is not taken advantage to the fullest extent then it can't help you to the fullest extent. I have worked on code in the past where a single class was used in many places to represent variations on the same data. Sometimes a "Hotel" instance would describe the entity key, the name, the description. Sometimes the "Hotel" instance would contain detailed information about the rooms in the hotel, sometimes the "Rooms" property would be null. Sometimes it would have its "Address" value populated, other times it would be null. It would depend upon the type of request that the "Hotel" instance was returned for. This is a poor use of the type system - different response types should have been used, it should have been clear from the returned type what data would be present. The more often we're in a "sometimes this, sometimes that" situation, the less reliable that the code will be as it becomes easier to forget one of "sometimes" cases (again, I'm talking from personal experience and not just worrying about how this may or may not affect "other programmers"). Unfortunately, not even the potential for a strong type system can make shitty code good.

* (It's probably worth stating that a static type system is one way that tooling can automatically identify mistakes for you but it's not the only way - code contracts are a way to go beyond what C# can support "out of the box" but there are other approaches, such as what John Carmack has written about static analysis of C++ or how Facebook is analysing JavaScript without even requiring types to be explicitly declared)

Code Reviews

Another quote that stuck out for me in the "Sharp Knives" post was that

We enforce such good senses by convention, by nudges, and through education

This is very sensible advice. I think that one of the best ways for code quality to remain high is through developers working together - learning from each other and supporting each other. This is something that I've found code reviews to be very effective for. If all code is reviewed, then all code is guaranteed to have been read by at least two people; the author and the reviewer. If the code is perfect, then that's where the review ends - on a high note. If the code needs work then any mistakes or improvements can be highlighted and addressed. As the recipient of a review that identifies a mistake that I've made, I'm happy! Well.. I'm generally a bit annoyed with myself for making the mistake but I'm glad that a colleague has identified it rather than it getting to an end user.

As a code reviewer, I will be happy with code that I think requires no changes or if code needs to be rejected only once. I've found that code that is rejected and then fixed up is much harder to re-review and that bugs more often slip through the re-review process. It's similar to the way in which you can more easily become blind to bugs in code that you've just written than you are to someone else's code - you have a familarity with the code that you are reviewing for a second time and someone has just told you that they have fixed it; I've found that there is something psychological about that that makes it just that little bit harder to pick up on any subsequent mistakes. Thusly, I would prefer to limit the number of times that reviews bounce back and forth.

I have found that a static type system encourages a stricter structure on the code and that conventions are clearer, not to mention the fact that the compiler can identify more issues - meaning that there should be fewer types of mistake that can get through to a review. There is, of course, a limit to what the type system can contribute on this front but any mechanical checks that a computer could perform leave the reviewer more time (and mental capacity) to provide deeper insight; to offer guidance to a more junior developer or to suggest implementation tweaks to a peer.

A "wonderful message"

It's a theme that has picked up more and more weight for me over the years, that the computer should be able to help me tell it what to do - I should be able to leverage its strengths in order to multiply mine. As a developer, there is a lot of creativity required but also a huge quantity of seemingly banal details. The strength of a good abstraction comes from being able to "hide away" details that don't matter, leaving you with larger and more useful shapes to deal with, allowing you to think closer to the big picture. The more details that may be automatically verified, the less that you need to worry about them; freeing up more valuable mental space. Leaning on static analysis aids this, it allows the computer to do what it's good at and concentrate on the simple-and-boring rules, allowing you to become more effective. It's an incredibly powerful tool, the ability to actually limit certain things from being done allows you to do more of what you should be doing.

It can also be an invaluable form of documentation for people using your code (including you, in six months, when you've forgotten the finer details). Good use of the type system allows for the requirements and the intent of code to be clearer. It's not just a way of communicating with the compiler, it's also a very helpful way to communicate with human consumers of your code.

On a personal note, this marks my 100th post on this blog. The first (I love Immutable Data) was written about five years ago and was also (basically) about leveraging the type system - by defining immutable types and the benefits that they could have. I find it reassuring that, with all that I've seen since then (and thinking back over the time since I first started writing code.. over 25 years ago) that this still feels like a good thing. In a time where it seems like everyone's crying about JavaScript fatigue (and the frequent off-the-cuff comments about React being "so hot right now"*), I'm glad that there are still plenty of principles that stand the test of time.

* (Since I'm feeling so brave and self-assured, I'm going to say that I think that React *will* still be important five years from now - maybe I'll look back in 2021 and see how this statement has fared!)

Posted at 21:34