Productive Rage

Dan's techie ramblings

Private / local C# analysers (without NuGet)

(Note: The information here depends upon the "new" .csproj format being used.. but it's not that new any more, so hopefully that's not a limitation for too many people)

I'm a big fan of writing analysers to catch common mistakes at compile time rather than run time. For example, the DanSerialiser, Bridge.Immutable and ProductiveRage.SealedClassVerification libraries that I've published all include some. The way that they're traditionally distributed is as a NuGet package that installs the analyser into the desired project, which is great if you're publishing a public package that you expect to be installed via nuget.org. But what if you wanted to create a non-public analyser for something that you were working on, can you do so without creating a NuGet package? Yes.. but with some caveats.

If you're still interested then read on for the details!

(For anyone who finds themselves in the "too lazy; didn't read" category, hopefully this gives you enough information as to whether to continue or not)

What I wish existed

Actually, before I talk about what I wish already existed (but which, unfortunately, does not exist), I'll get one option out of the way first; nuget.org is not the only place that NuGet packages can be published to. If you decided that you wanted to write an analyser for some conventions internal to your company then you could create a NuGet package and publish it on an internal NuGet feed. It's pretty easy and you have a range of options such as a private NuGet feed service within your network, a private hosted service (possible with MyGet, I believe) or you can even chuck all of your private NuGet .nupkg files into a folder (on your local computer or, I presume, on a network - though I've not tested that option) and then add that as a NuGet feed in Visual Studio. This is straight forward but, still, occasionally I wish that it was possible to include an analyser project as part of a solution and have that analyser added to one of the other projects. Which brings me to..

What I've really wanted, from time to time, is to be able to have one project (say, "MyLibrary") in a solution and another project (say, "MyAnalyser") where the second project is added an analyser reference to the first project.

I'd like it to be as simple as clicking on References on the "MyLibrary" project, then "Add an Analyzer" and then choosing the "MyAnalyser" project. This, however, is not currently possible.

It seems that I'm not the only one that thinks that this would be nice, there is an issue on the .NET Compiler Platform ("Roslyn") repo relating to this: Adding Analyzers Via a Project Reference. The first reply is from a Senior Software Engineer at Microsoft who says:

This would be one of the coolest features ever

.. which sounds like a great and promising start!

However, the issue was raised in March 2017 and I don't think that any progress has been made on it, so I don't know when / if it will be tackled*.

(Having said that, just last month it was recategorised from "Backlog" to "IDE: InternalPriority" and even assigned Priority 1 - so maybe this will change in the near future! We'll have to wait and see)

What does exist

So the bad news is that there is no way in the UI to do what I want. But the good news is that there is a way to move towards it with some manual .csproj editing.

If I opened the MyLibrary.csproj from the example earlier then I could add the following section:

<ItemGroup>
  <ProjectReference Include="..\MyAnalyser\MyAnalyser.csproj">
    <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
    <OutputItemType>Analyzer</OutputItemType>
  </ProjectReference>
</ItemGroup>

.. and the MyAnalyser would now be added to MyLibrary and it would check over the code that I'd written in MyLibrary project - reporting any resulting messages, warnings or error in the VS Error List. Hurrah!

It seems like a pity that something seemingly so simple needs to be done by hand-editing the .csproj file instead of there being something in the VS GUI to do this but there are other features where you have to do the same. For example, if you want a project to target multiple frameworks when it's built then you have to manually edit the .csproj file and rename the "targetframework" node to "targetframeworks" and then type in a semi-colon-delimited list of IDs of frameworks that you're interested in - eg. from this:

<TargetFramework>netcoreapp2.1</TargetFramework>

.. to this:

<TargetFrameworks>netcoreapp2.1;net461</TargetFrameworks>

(It's quite common to do this in BenchmarkDotNet projects so that you can see how the results vary when your library is imported into different frameworks)

The good news is that hand-editing the .csproj file is much easier with the file format that we have now than the old one! So having to do this is not the end of the world.

It's not all rainbows and unicorns, though..

What are the downsides?

The biggest (and only, so far as I can tell) downside is that it seem like Visual Studio will somehow cache the analyser assembly after it loads it. This means that when you first open the solution, the analyser(s) in the MyAnalyser project will be run against the MyLibrary code and any messages, warnings and errors displayed.. but, if you then change the MyAnalyser code and rebuild then those changes won't affect the checks performed against MyLibrary.

Even if you rebuild the entire solution (rebuilding MyAnalyser first and then rebuilding MyLibrary, to try to force the new analyser assembly to be loaded).

Even if you rebuild it and then unload the solution and then reload the solution and build again.

It seems like the only way to get it to reliably load the new analyser assembly is to close the Visual Studio instance entirely and start it again.

A cryptic note in the GitHub issue that I referenced earlier made me wonder if changing the assembly version of the analyser project would help.. but it didn't.

Now, hopefully, in real world usage this isn't as bad as it sounds. The process of writing analysers lends itself very nicely to a test driven development style because you can set up a test suite where every test is of the format "for code snippet, will I get the analyser messages that I expect?" and you can build up a nice suite of tests for middle-of-the-road cases and edge cases and have them all run quickly. I actually find this to be the easiest way for me to debug things when I get myself into a situation where I don't understand why the analyser code isn't doing what I expect; I write a test with a snippet of code and then debug the test to step through the code. So you should be to get your analyser working nicely without having to test it against your "MyLibrary" code over and over.

Of course, sometimes you'll want to run it against your entire code base (otherwise, what was the point of writing it!) and then you will have to close VS and restart it. And this is inconvenient and I wish that it wasn't the case.

I think, though, that you would be in the same situation if you decided to go down the NuGet distribution route (whether from a private or public feed) - in the past, I've found that if a new version of a NuGet package includes a new version of an analyser then Visual Studio won't load the new version of the analyser without me restarting VS. Which is just as frustrating. Maybe this is part of what's delaying the work on Microsoft's side; they know that if they make adding analysers easier then they'll have to fix the cached-analyser-doesn't-get-updated problem at the same time.

To conclude

I'm going to keep my eye on that GitHub issue. It would be great to see some movement on it but I have no idea how much weight "IDE: InternalPriority" cases have, even if they are listed as Priority 1 within that category.. to be honest, I'm presuming that Priority 1 means top priority but it's just as feasible that it means lowest priority. There's a nice view of the "IDE: Internal Priority" category in GitHub here in case you want to join in on the guessing game!

At the end of the day, though, I still think that this is a powerful technology to have access to and I'd still rather have it with these caveats than not have it at all. I really believe that analysers provide a way to improve code quality and I encourage everyone to have a play around with them!

Posted at 21:42

Comments

Type aliases in Bridge.NET (C#)

Back in 2016, I wrote Writing React apps using Bridge.NET - The Dan Way (Part Three) and I talked about trying to tighten up the representation of values in the type system. One of my pet peeves that I talked about was how "no value" is represented in reference types and, in particular, with strings.

As a reminder, I was having a rant about how I hate the uncertainty of wondering "should I expect to get null passed in here / returned from here" and I decided to draw a hard line and say that no, in my code I would never expect a reference type to have a null value - instead I would always use the Optional<T> struct that I included in my NuGet package ProductiveRage.Immutable. This allows me to make it clear when a method may return a null value (because its return type would be something like Optional<PersonDetails>) and it would allow me to make it clear when a method will and won't accept null arguments (it will if the parameter type is Optional<T> and it won't if it's not).

Strings, however, have another "no value" state - when they are blank. If I want to have a method argument whose type indicates "this argument must be a string that is not null AND that is not blank" then we can't communicate that. To address that, my blog post introduced another type; the NonBlankTrimmedString -

public class NonBlankTrimmedString
{
    public NonBlankTrimmedString(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            throw new ArgumentException("Null, blank or whitespace-only value specified");
        Value = value.Trim();
    }

    /// <summary>
    /// This will never be null, blank or have any leading or trailing whitespace
    /// </summary>
    public string Value { get; }

    /// <summary>
    /// It's convenient to be able to pass a NonBlankTrimmedString instance as any argument
    /// that requires a string
    /// </summary>
    public static implicit operator string(NonBlankTrimmedString value)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        return value.Value;
    }
}

This would allow me to have a method that clearly indicates that it needs a string with a real value - eg.

void DoSomething(NonBlankTrimmedString value);

.. and it could be combined with Optional<T> to define a method whose type signature indicates that it will take a string with a real value OR it will accept a "no value" - eg.

void DoSomething(Optional<NonBlankTrimmedString> value);

This method will not accept a blank string because that's just another state that is not necessary; either you give me a real (non-blank) value or you don't. There is no half-way house of "non-null but still with no value".

As another example, I might want to write a Bridge.React component whose Props type can optionally take an additional class name to render as part of the component - in which case, I might write the class a bit like this:

public sealed class Props
{
    public Props(
        /* .. other property values, */
        Optional<NonBlankTrimmedString> className = new Optional<NonBlankTrimmedString>())
    {
        // .. other properties set here
        ClassName = className;
    }

    // .. other public properties exposed here

    public Optional<NonBlankTrimmedString> ClassName { get; }
}

This is all fine and dandy and, pretty much, it just works. If I want to expand this richer type system so that it's used in API requests / responses as well then I can have Optional<T> and NonBlankTrimmedString types defined in the .NET code that runs on the server as well as in my Bridge project. And if I want to avoid code duplication then I can define the types in a Shared Project that is referenced by both the Bridge project and the server API project.

One downside to this approach, though, is that JSON payloads from API calls are going to be larger if I wrap all of my strings in NonBlankTrimmedString instances. And there will be more work for Bridge's version of Newtonsoft Json.NET to do because it has to parse more data and it has to deserialise more instances of types; for every string, instead of just deserialising a value into a string, it needs to deserialise that string value and then create an instance of a NonBlankTrimmedString to wrap it. If you have any API calls that return 100s or 1000s of strings then this can become a non-negligible cost.

The full .NET version of Newtonsoft Json.NET has some flexibility with how types are serialised to/from JSON. For example, if I wanted to tell the serialiser that NonBlankTrimmedString instances should appear in the JSON as plain strings then I could do so using a JsonConverter (there is sample code in the Newtonsoft website that demonstrates how to do it for the Version type and the principle would be exactly the same for NonBlankTrimmedString - see Custom JsonConverter<T>).

The Bridge version of the library has no support for custom JsonConverters, though, so we may appear to be a bit stuck.. if it weren't for the fact that Bridge has some low-level tricks that we can use to our advantage.

In order to allow C# code to be written that interacts with JavaScript libraries, Bridge has a few escape hatches for the type system that we can use in a careful manner. For example, I could rewrite the Bridge version of NonBlankTrimmedString to look like this:

public class NonBlankTrimmedString
{
    protected NonBlankTrimmedString() { }

    /// <summary>
    /// This will never be null, blank or have any leading or trailing whitespace
    /// </summary>
    public extern string Value { [Template("{this}")] get; }

    /// <summary>
    /// Create a NonBlankTrimmedString instance by explicitly casting a string
    /// </summary>
    public static explicit operator NonBlankTrimmedString(string value)
    {
        if (value == null)
            return null;
        value = value.Trim();
        if (value == "")
            throw new ArgumentException("Can not cast from a blank or whitespace-only string");
        return Script.Write<NonBlankTrimmedString>("value");
    }

    /// <summary>
    /// It's convenient to be able to pass a NonBlankTrimmedString instance as any argument
    /// that requires a string
    /// </summary>
    [Template("{value}")]
    public extern static implicit operator string(NonBlankTrimmedString value);
}

This changes things up a bit. Now there is no public constructor and the only way to get a NonBlankTrimmedString instance from a plain string is to explicitly cast to it - eg.

var x = (NonBlankTrimmedString)"hi!";

If the source string is blank or whitespace-only then attempting to cast it to a NonBlankTrimmedString will result in an exception being thrown.

What's interesting about this class is that it exists only to provide type information to the C# compiler - there will never be an instance of NonBlankTrimmedString alive runtime in JavaScript. The reason for this is that the explicit cast performs some validation but then, at runtime, returns the string instance directly back; it doesn't wrap it in an instance of a NonBlankTrimmedString class. Similarly, when the "Value" property is requested in C# code, this is translated into JS as a direct reference to "this" (which we know is a plain string). This is sounding complicated as I write this, so let me try to make it clear with an example!

The following C# code:

// Start with a plain string
var source = "Hi!";

// Create a NonBlankTrimmed by explicitly casting the string
var x = (NonBlankTrimmedString)source;

// Write the value of the NonBlankTrimmedString to the console
Console.WriteLine(x.Value);

.. is translated into this JS:

// Start with a plain string
var source = "Hi!";

// Create a NonBlankTrimmed by explicitly casting the string
var x = Demo.NonBlankTrimmedString.op_Explicit(source);

// Write the value of the NonBlankTrimmedString to the console
System.Console.WriteLine(x);

The reference "x" in the JS runtime is actually just a string (and so the C# "x.Value" is translated into simply "x") and the explicit operator (the method call "Demo.NonBlankTrimmedString.op_Explicit") performs some validation but then (if the validation passes) returns the string right back but claims (for the benefit of the C# compiler and type system) that it is now a NonBlankTrimmedString.

This has a couple of benefits - now, plain string values that appear in JSON can be deserialised into NonBlankTrimmedString instances by Bridge (while the Bridge version of Json.NET doesn't support type converters, it does support deserialising types using implicit or explicit operators - so, here, it would see a string in the JSON and see that the target type was a NonBlankTrimmedString and it would use NonBlankTrimmedString's explicit operator to instantiate the target type), so the JSON returned from the server can be cleaner. And it means that the JS runtime doesn't have to actually create instances of NonBlankTrimmedString to wrap those strings up in, which makes the life of the garbage collector easier (again, may be important if you have API responses that need to return 1000s of NonBlankTrimmedString).

This is an interesting concept that I'm referring to as a "type alias" - a type that exists only for the compiler and that doesn't affect the runtime. The phrase "type alias" exists in TypeScript and in F# (and in other languages, I'm sure) but I think that it means something slightly different there.. which may mean that I've chosen a confusing name for this C# / Bridge.NET concept! In TypeScript and F#, I don't believe that they allow the level of compiler validation that I'm talking about - certainly in TypeScript, type aliases are more of a convenience that allow you say something like:

type Vector = number[];
type Vectors = Vector[];

.. so that you can then write a method signature that looks like this:

function process(data: Vectors) {
    // ..
}

.. instead of:

function process(data: number[][]) {
    // ..
}

.. but the two are identical. TypeScript "type aliases" make things more flexible, not more constrained. To make that clearer, if you wrote:

type CustomerID = number;

function process(id: CustomerID) {
    // ..
}

.. then you could still call:

process(1); // Passing a plain number into a method whose signature specifies type CustomerID

In other words, the TypeScript alias means "anywhere that you see CustomerID, you can pass in a 'number'". This is the opposite of what I want, I want to be able to have methods that specify that they want a NonBlankTrimmedString and not just any old string.

I go into this in a little more detail in the section "Type aliases in other languages" at the end of this blog post. My point here was that maybe "type alias" is not the best phrase to use and maybe I'll revisit this in the future.

For now, though, let's get back to the NonBlankTrimmedString definition that I've proposed because it has some downsides, as well. As the type only exists at compile time and not at runtime, if I try to query the type of a NonBlankTrimmedString instance at runtime then it will report that it is a "System.String" - this is to be expected, since part of the benefit of this approach is that no additional instances are required other than the plain string itself - but if you were wanted to do some crazy reflection for some reason then it might catch you off guard.

Another downside is that if I wanted to create specialised versions of NonBlankTrimmedString then I have to duplicate some code. For example, I might want to strongly type my entity IDs and define them as classes derived from NonBlankTrimmedString. With the version of NonBlankTrimmedString from my 2016 blog post, this would be as simple as this:

// If NonBlankTrimmedString is a regular class then creating derived types is easy as this
public class OrderID : NonBlankTrimmedString
{
    public OrderID(string value) : base(value) { }
}

.. but with this "type alias" approach, it becomes more verbose -

// The explicit operator needs to be reimplemented for each derived type with the type alias
// alias approach shown earlier :(
public class ClassName : NonBlankTrimmedString
{
    protected ClassName() { }

    public static explicit operator ClassName(string value)
    {
        if (value == null)
            return null;
        value = value.Trim();
        if (value == "")
            throw new ArgumentException("Can not cast from a blank or whitespace-only string");
        return Script.Write<ClassName>("value");
    }
}

However, we could make this a little simpler by changing the NonBlankTrimmedString type definition to this:

public class NonBlankTrimmedString
{
    protected NonBlankTrimmedString() { }

    /// <summary>
    /// This will never be null, blank or have any leading or trailing whitespace
    /// </summary>
    public extern string Value { [Template("{this}")] get; }

    /// <summary>
    /// Create a NonBlankTrimmedString instance by explicitly casting a string
    /// </summary>
    public static explicit operator NonBlankTrimmedString(string value)
        => Wrap<NonBlankTrimmedString>(value);

    /// <summary>
    /// It's convenient to be able to pass a NonBlankTrimmedString instance as any argument
    /// that requires a string
    /// </summary>
    [Template("{value}")]
    public extern static implicit operator string(NonBlankTrimmedString value);

    protected static T Wrap<T>(string value) where T : NonBlankTrimmedString
    {
        if (value == null)
            return null;
        value = value.Trim();
        if (value == "")
            throw new ArgumentException("Can not cast from a blank or whitespace-only string");
        return Script.Write<T>("value");
    }
}

.. and then derived types would look like this:

public class OrderID : NonBlankTrimmedString
{
    protected OrderID() { }
    public static explicit operator OrderID(string value) => Wrap<OrderID>(value);
}

(Sort-of-)immutability for "free" through type aliases

Another use case where this sort of approach seemed interesting was when I was writing some client-side code that received data in the form of arrays and then did some clever calculations and drew some pretty graphs. The API response data was 10s of 1000s of arrays, where each array was 100 floating point numbers. The calculation logic took those arrays and passed them through a bunch of methods to come up with the results but I got myself in a bit of a muddle when there were one or two places that had to manipulate a subset of the data and I realised that I was confusing myself as to whether the data should be altered in place or whether local copies of those parts of the data should be taken and then changed. To make the code easier to follow, I wanted those methods to take local copies to make the changes, rather than mutating them in-place and risking messing up calculations performed on the data later in the pipeline.

What I really wanted was for those methods to have type signatures that would either take an immutable data type or a readonly data type. Immutable is the ideal because it means that not only can the receiving methods not change the data but nothing can change the data. Having readonly types on the method signatures means that the methods can't change the data but it's still technically possible for the caller to change the data. To try to illustrate this, I'll use the ReadOnlyCollection<T> type from .NET in an example:

public static void Main()
{
    var items = new List<int> { 0, 1, 2, 3 };
    var readOnlyItems = items.AsReadOnly();
    DoSomething(
        readOnlyItems,
        halfwayPointCallback: () => items.RemoveAt(0)
    );
}

static void DoSomething(ReadOnlyCollection<int> readOnlyItems, Action halfwayPointCallback)
{
    Console.WriteLine("Number of readonlyItems: " + readOnlyItems.Count);
    halfwayPointCallback();
    Console.WriteLine("Number of readonlyItems: " + readOnlyItems.Count);
}

Here, the "Main" method declares a mutable list and then it create a readonly wrapper around it. The readonly wrapper is passed into the "DoSomething" method and this means "DoSomething" can not directly alter that list. However, it's still possible for the "Main" method to change the underlying list while "DoSomething" is running.

In practice, this is not something that I find commonly happens. As such, while I would prefer immutable structures at all times (because then "Main" couldn't change the contents of the list while "DoSomething" is working on it), being able to wrap the data in a readonly structure is still a significant improvement.

So, some of the more obvious options available to me were:

  1. Stick with using arrays and be careful not to write code that performs any alteration "in place" (I don't like this situation - C#'s type system has great potential and I want it to help me and save me from myself where possible!)
  2. Pass the arrays into the methods as IEnumerable<float> (this isn't a terrible idea in general - it quite clearly communicates that the provided data should be considered read only - but the calculations that I was doing wanted to get the length of the array and to read particular indexed values from the array in unpredictable orders and this isn't very efficient with enumerable types)
  3. Create an "immutable list" class that takes an array into the constructor, copies the data and then allows access to the copy only through tightly-controlled members; ie. Length and an indexed property (This is the most type-safe way but it felt expensive doing this for the 10s of 1000s of arrays that I had)
  4. Convert each array into a List<float> and then call ".AsReadOnly()" on them (this is very little code but it also felt expensive with the amount of data that I had)
  5. Create a "ReadOnlyArray<T>" type that would be very similar in nature to the ReadOnlyCollection<T> in that it would take an array into its constructor and then provide a read only interface for it, without copying the array (This is a reasonable option and I might have gone this way were it not for liking the idea of option six)
  6. Create a "ReadOnlyArray<T>" type alias that I could use to instruct the type system that the data should not be altered but without having to introduce any new types or instances at runtime

I went with the last one because I was all excited about experimenting with "Bridge.NET type aliases" and I wanted to see how well they could work! (In reality, the fifth option was also a good one and some of the others would also be perfectly fine for smaller data sets.. to be honest, there is a chance that they wouldn't have made too much difference even with the data that I was looking at but, again, sometimes you need to make opportunity to experiment! :)

public sealed class ReadOnlyArray<T> : IEnumerable<T>
{
    [Template("{data}")]
    public extern ReadOnlyArray(T[] data);

    [External] // Required due to https://github.com/bridgedotnet/Bridge/issues/4015
    public extern T this[int index] { [Template("{this}[{index}]")] get; }

    public extern int Length { [Template("length")] get; }

    [External]
    public extern IEnumerator<T> GetEnumerator();

    [External]
    extern IEnumerator IEnumerable.GetEnumerator();

    [Template("{value}")]
    public extern static implicit operator ReadOnlyArray<T>(T[] value);
}

The structure of this class is similar in some ways to that of the NonBlankTrimmedString. Unlike that class, there is no validation that is required - I only want to provide access to an array in a limited manner and so it's fine to expose a public constructor (as opposed to the NonBlankTrimmedString, where it's important to check that the value is neither null nor blank nor whitespace-only and the [Template] attribute on the constructor doesn't easily allow for any validation).

Even though the constructor may be used on this class, there is still an operator to change an array into a ReadOnlyArray so that the deserialisation process is able to read an array of items into a ReadOnlyArray instance. I've chosen to use an implicit operator (rather than en explicit operator) here because there is no validation to perform - the NonBlankTrimmedString has an explicit operator because thatdoes perform some validation and so it's a casting action that could fail and so I want it to be explicit in code.

As with the NonBlankTrimmedString, this type will exist only at compile time and the compiled JavaScript will always be operating directly against the original array. As far as the JS code is aware, there is no wrapper class involved at all. The following C# -

var values = new[] { 1, 2, 3 };
Console.WriteLine(values.Length);

var readOnlyValuesCtor = new ReadOnlyArray<int>(values);
Console.WriteLine(readOnlyValuesCtor.Length);

ReadOnlyArray<int> readOnlyValuesCast = values;
Console.WriteLine(readOnlyValuesCast.Length);

.. is translated into this JS:

var values = System.Array.init([1, 2, 3], System.Int32);
System.Console.WriteLine(values.length);

var readOnlyValuesCtor = values;
System.Console.WriteLine(readOnlyValuesCtor.length);

var readOnlyValuesCast = values;
System.Console.WriteLine(readOnlyValuesCast.length);

Whether the ReadOnlyArray<int> is created by calling its constructor or by an implicit cast in the C# code, the JS is unaware of any change required of the reference and continues to operate on the original array. This is the "free" part of this approach - there is no runtime cost in terms of type conversions or additional references.

The other members of the class need a little more explanation, though. The indexer should be implemented just like the "Length" property, by having an extern property that has a getter with a [Template] attribute on it. However, there is a bug in the Bridge compiler that necessitate an additional [External] attribute be added to the property. Not the end of the world and I'm sure that the Bridge Team will fix it in a future version of the compiler.

The "GetEnumerator" methods require a tiny bit more explanation. In order for the class to implement IEnumerable<T>, these methods must be present. But we don't actually have to implement them ourselves. Whenever Bridge encounters a "foreach" in the source C# code, it translates it into JS that calls "GetEnumerator" and then steps through each value. For example, this C# code:

foreach (var value in readOnlyValuesCtor)
    Console.WriteLine(value);

.. becomes this JS:

$t = Bridge.getEnumerator(readOnlyValuesCtor);
try {
    while ($t.moveNext()) {
        var value = $t.Current;
        System.Console.WriteLine(value);
    }
} finally {
    if (Bridge.is($t, System.IDisposable)) {
        $t.System$IDisposable$Dispose();
    }
}

Because Bridge needs to support enumerating over arrays, the function "Bridge.getEnumerator" knows what to do if it is given an array reference. And since a ReadOnlyArray is an array reference at runtime, we don't have to do anything special - we don't have to provide a GetEnumerator implementation.

And there we go! As I explained above, I originally encountered this problem when passing an array into a complicated calculation process but this type could also be used for deserialising JSON into a richer type model, just like the NonBlankTrimmedString earlier - again, without any overhead in doing so (no instances of wrapper types will be present runtime and there will be no additional references for the garbage collector to track).

Only possible in Bridge.NET?

I was wracking my brains about whether it would be possible to do something similar with C# running in a .NET environment and I couldn't think of anything. People sometimes think "structs!" when trying to concoct ways to avoid adding references that the garbage collector needs to track but structs are only immune to this if they don't contain any object references within their fields and properties (and there are other edge cases besides this but they're not important right now).

At the end of the day, this "type alias" concept might be a bit of a niche technique and it might even be a case of me playing around, more than it being something that you might use in production.. but I thought that it was interesting nonetheless. And it has made me wish, again, that C# had support for something like this - I've written code before that defines all variety of strongly typed IDs (strings) and Keys (integers) to avoid passing the wrong type of value into the wrong place but it's always felt cumbersome (it's felt worth the effort but that didn't stop wishing me that it was less effort).

Type aliases in other languages

I've linked above to an article Using strongly-typed entity IDs to avoid primitive obsession, which is excellent and eloquently expresses some of my thoughts, but I thought that I'd add a summary in here as well (which also gives me an opportunity to go into more detail about the options in TypeScript and F#).

I'll start with an anecdote to set the scene. In a company that I used to work at, we had systems that would retrieve and render different data for different language options. Sometimes data would vary only by language ("English", "French", etc..) but sometimes it would be more specific and vary by language culture (eg. "English - United Kingdom", "English - United States", etc..). An older version of the system would pass around int values for the language or language culture keys. So there might be a method such as:

private string GetTranslatedName(int languageKey)

A problem that occurred over and over again is that language keys and language culture keys would get mixed up in the code base - in other words, it was quite common for someone to accidentally pass a language key into a method where a language culture key was expected (this situation was not helped by the fact that much of the developer testing was done in English and the language key and language culture key values in many of the databases were both 1 for English / English UK). Something that I was very keen to get into a new version of the system was to introduce "strongly typed keys" so that this sort of accident could no longer occur. The method's signature would be changed to something like this:

private string GetTranslatedName(LanguageKey languageKey)

.. and we would not describe language or language culture keys as ints in the code base. They would always be either an instance of LanguageKey or LanguageCultureKey - this way, if you attempted to pass a key of the wrong type into a method then you would get a compile error.

The downside is that each key type had to be defined as its own struct, with the following (quite verbose) structure:

public struct LanguageKey : IEquatable<LanguageKey>
{
    public LanguageKey(int value) => Value = value;

    public int Value { get; }

    public bool Equals(LanguageKey other) => Value.Equals(other.Value);
    public override bool Equals(object obj) => (obj is LanguageKey key) && (key.Value == Value);
    public override int GetHashCode() => Value;

    public static bool operator ==(LanguageKey x, LanguageKey y) => x.Value == y.Value;
    public static bool operator !=(LanguageKey x, LanguageKey y) => !(x == y);

    public static explicit operator LanguageKey(int value) => new LanguageKey(value);
}

Really, though, that is the only downside. As the strongly typed keys are structs without any reference properties or fields, there is no additional work for the garbage collector and there is no memory overhead vs tracking a simple int. But it does still feel a little arduous to have to have these definitions in the code base, particularly when the equivalent F# code looks like this:

[<Struct>] type LanguageKey = LanguageKey of int

It's worth noting that this is not actually referred to as a "type alias" in F#; this is a "single case union type". There is a concept called a "type alias" in F# that looks like this:

type LanguageKey = int

.. but that code simply says "allow me to use the word 'LanguageKey' anywhere in place of int" - eg. if I have the LanguageKey type alias specified as a method argument type in F# method, like this:

let getTranslatedName (language: LanguageKey) =
    // (Real work to retrieve translated name would go here but we'll
    //  just return the string "Whatever" for the sake of this example)
    "Whatever"

.. then the compiler would allow me to pass an int into that method -

// A LanguageKey type alias lets me pass any old int into the method - rubbish!
let name = getTranslatedName 123

.. and that's exactly what I wanted to avoid!

On the other hand, if the type LanguageKey was a "single case union type" then the code above would not compile:

// error FS0001: This expression was expected to have type 'LanguageKey' but here has type 'int'
let name = getTranslatedName 123

// This DOES compile because the types match
let key = LanguageKey 123
let name = getTranslatedName 123

.. and that's exactly what I did want!

(TypeScript's type aliases are like F#'s type aliases - they are more of a convenience and do not add the sort of type checking that I want)

Things get a bit more awkward if we want to deal with reference types, such as strings, because we could create a C# class similar to LanguageKey (or we could create an F# single case union type) but that would introduce a new instance of a type that must be tracked by the garbage collector - every strongly typed ID involves two references; the underlying string value and the strongly typed wrapper. Much of the time, that's no problem - I've had the odd issue with the .NET GC in the past but, on the whole, it's an amazing and reliable tool.. but because I have had these problems before, it makes me more aware of the trade-off when I introduce wrappers like this.

I'm convinced that using strongly typed IDs is the right thing to do in 99% of cases because it improves code quality and can eradicate a class of real-world mistake. But the concept became even more interesting to me as it appeared possible to introduce a form of type alias into Bridge.NET code that enables those compile time checks but with zero runtime cost. Granted, the type erasure that occurs means that runtime type checking is not possible (the Bridge code can not differentiate between a string or a NonBlankTrimmedString or a type that is derived from NonBlankTrimmedString) but the main driver for me was to improve compile time checking and so that wasn't a problem for me. Maybe it would be a problem in other scenarios, in which case these Bridge.NET "type aliases" might not be appropriate.

Posted at 21:59

Comments