Productive Rage - Reflection

26 July 2017

Trying to set a readonly auto-property value externally (plus, a little BenchmarkDotNet)

If you're thinking that you should try changing a readonly property.. well, in short, you almost certainly shouldn't try.

For example, the following class has a property that should only be set in its constructor and then never mutated again -

public sealed class Example
{
  public Example(int id)
  {
    Id = id;
  }
  public int Id { get; }
}

And it is a good thing that we are able to write code so easily that communicates precisely when a property may (and may not) change.

However..

You might have some very particular scenario in mind where you really do want to try to write to a readonly auto-property's value for an instance that has already been created. It's possible that you are writing some interesting deserialisation code, I suppose. For something that I was looking at, I was curious to look into how feasible it is (or isn't) and I came up with the following three basic approaches.

I think that each approach demonstrates something a little off the beaten track of .NET - granted, there's absolutely nothing here that's never been done before.. but sometimes it's fun to be reminded of how flexible .NET can be, if only to appreciate how hard it works to keep everything reliable and consistent.

TL;DR

I'll show three approaches, in decreasing order of ease of writing. They all depend upon a particular naming conventions in .NET's internals that is not documented and should not be considered reliable (ie. a future version of C# and/or the compiler could break it). Even if you ignore this potential time bomb, only the first of the three methods will actually work. Like I said at the start, this is something that you almost certainly shouldn't be attempting anyway!

Approach 1: Reflection (with some guesswork)

C# 6 introduced read-only auto-properties. Before those were available, you had two options to do something similar. You could use a private setter -

public sealed class Example
{
  public Example(int id)
  {
    Id = id;
  }
  public int Id { get; private set; }
}

.. or you could manually create a private readonly backing field for the property -

public sealed class Example
{
  private readonly int _id;
  public Example(int id)
  {
    _id = id;
  }
  public int Id { get { return _id; } }
}

The first approach requires less code but the guarantees that it claims to make are less strict. When a field is readonly then it may only be set within a constructor but when it has a private setter then it could feasibly change at any point in the lifetime of the instance. In the class above, it's clear to see that it is only set in the constructor but there are no compiler assurances that someone won't come along and add a method to the Example class that mutates the private-setter "Id" property. If you have a readonly "_id" backing field then it would not be possible to write a method to mutate the value*.

* (Without resorting to the sort of shenanigans that we are going to look at here)

So the second class is more reliable and more accurately conveys the author's intentions for the code (that the "Id" property of an Example instance will never change during its lifetime). The disadvantage is that there is more code to write.

The C# 6 syntax is the best of both worlds - as short (shorter, in fact, since there is no setter defined) as the first version but with the stronger guarantees of the second version.

Interestingly, the compiler generates IL that is essentially identical to that which result from the C# 5 syntax where you manually define a property that backs onto a readonly field. The only real difference relates to the fact that it wants to be sure that it can inject a readonly backing field whose name won't clash with any other field that the human code writer may have added to the class. To do this, it uses characters in the generated field names that are not valid to appear in C#, such as "<Id>k__BackingField". The triangle brackets may not be used in C# code but they may be used in the IL code that the compiler generates. And, just to make things extra clear, it adds a [CompilerGenerated] attribute to the backing field.

This is sufficient information for us to try to identify the compiler-generated backing field using reflection. Going back to this version of the class:

public sealed class Example
{
  public Example(int id)
  {
    Id = id;
  }
  public int Id { get; }
}

.. we can identify the backing field for the "Id" property with the following code:

var type = typeof(Example);
var property = type.GetProperty("Id");

var backingField = type
  .GetFields(BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static)
  .FirstOrDefault(field =>
    field.Attributes.HasFlag(FieldAttributes.Private) &&
    field.Attributes.HasFlag(FieldAttributes.InitOnly) &&
    field.CustomAttributes.Any(attr => attr.AttributeType == typeof(CompilerGeneratedAttribute)) &&
    (field.DeclaringType == property.DeclaringType) &&
    field.FieldType.IsAssignableFrom(property.PropertyType) &&
    field.Name.StartsWith("<" + property.Name + ">")
  );

With this backingField reference, we can start doing devious things. Like this:

// Create an instance with a readonly auto-property
var x = new Example(123);
Console.WriteLine(x.Id); // Prints "123"

// Now change the value of that readonly auto-property!
backingField.SetValue(x, 456);
Console.WriteLine(x.Id); // Prints "456"

We took an instance of a class that has a readonly property (meaning that it should never change after the instance has been constructed) and we changed that property. Evil.

One more time, though: this relies upon the current convention that the compiler-generated backing fields follow a particular naming convention. If that changes one day then this code will fail.

Enough with the boring warnings, though - let's get to the real nub of the matter; reflection is slooooooooow, isn't it? Surely we should never resort to such a clunky technology??

Approach 2: Using LINQ Expressions to generate fast code to set the field

If Example had a regular private field that we wanted to set - eg.

public sealed class Example
{
  private int _somethingElse;
  public Example(int id, int somethingElse)
  {
    Id = id;
    _somethingElse = somethingElse;
  }

  public int Id { get; }

  public int GetSomethingElse()
  {
    return _somethingElse;
  }
}

Then we could use reflection to get a reference to that field once and build a delegate using LINQ Expressions that would allow us to update that field value using something like this:

var field = typeof(Example).GetField("_somethingElse", BindingFlags.Instance | BindingFlags.NonPublic);

var sourceParameter = Expression.Parameter(typeof(Example), "source");
var valueParameter = Expression.Parameter(field.FieldType, "value");
var fieldSetter =
  Expression.Lambda<Action<Example, int>>(
    Expression.Assign(
      Expression.MakeMemberAccess(sourceParameter, field),
      valueParameter
    ),
    sourceParameter,
    valueParameter
  )
  .Compile();

We could then cache that "fieldSetter" delegate and call it any time that we wanted to update the private "_somethingElse" field on an Example instance. There would be a one-off cost to the reflection that identifies the field and a one-off cost to generating that delegate initially but any subsequent call should be comparably quick to hand-written field-updating code (obviously it's not possible to hand-write code to update a private field from outside the class.. but you get the point).

There's one big problem with this approach, though; it doesn't work for readonly fields. The "Expression.Assign" call will throw an ArgumentException if the specified member is readonly:

Expression must be writeable

SAD FACE.

This is quite unfortunate. It had been a little while since I'd played around with LINQ Expressions and I was feeling quite proud of myself getting the code to work.. only to fall at the last hurdle.

Never mind.

One bright side is that I also tried out this code in a .NET Core application and it worked to the same extent as the "full fat" .NET Framework - ie. I was able to generate a delegate using LINQ Expressions that would set a non-readonly private field on an instance. Considering that reflection capabilities were limited in the early days of .NET Standard, I found it a nice surprise that support seems so mature now.

Approach 3: Emitting IL

Time to bring out the big guns!

If the friendlier way of writing code that dynamically compiles other .NET code (ie. LINQ Expressions) wouldn't cut it, surely the old fashioned (and frankly intimidating) route of writing code to directly emit IL would do the job?

It's been a long time since I've written any IL-generating code, so let's take it slow. If we're starting with the case that worked with LINQ Expressions then we want to create a method that will take an Example instance and an int value in order to set the "_somethingElse" field on the Example instance to that new number.

The first thing to do is to create some scaffolding. The following code is almost enough to create a new method of type Action<Example, int> -

// Set restrictedSkipVisibility to true to avoid any pesky "visibility" checks being made (in other
// words, let the IL in the generated method access any private types or members that it tries to)
var method = new DynamicMethod(
  name: "SetSomethingElseField",
  returnType: null,
  parameterTypes: new[] { typeof(Example), typeof(int) },
  restrictedSkipVisibility: true
);

var gen = method.GetILGenerator();

// TODO: Emit require IL op codes here..

var fieldSetter = (Action<Example, int>)method.CreateDelegate(typeof(Action<Example, int>));

The only problem is that "TODO" section.. the bit where we have to know what IL to generate.

There are basically two ways you can go about working out what to write here. You can learn enough about IL (and remember it again years after you learn some!) that you can just start hammering away at the keyboard.. or you can write some C# that basically does what you want, compile that using Visual Studio and then use a disassembler to see what IL is produced. I'm going for plan b. Handily, if you use Visual Studio then you probably already have a disassembler installed! It's called ildasm.exe and I found it on my computer in "C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.6.1 Tools" after reading this: "Where are the SDK tools? Where is ildasm?".

To make things as simple as possible, I created a new class in a C# project -

class SomethingWithPublicField
{
  public int Id;
}

and then created a static method that I would want to look at the disassembly of:

static void MethodToCopy(SomethingWithPublicField source, int value)
{
  source.Id = value;
}

I compiled the console app, opened the exe in ildasm and located the method. Double-clicking it revealed this:

.method private hidebysig static void  MethodToCopy(class Test.Program/SomethingWithPublicField source,
                                                    int32 'value') cil managed
{
  // Code size         9 (0x9)
  .maxstack  8
  IL_0000:  nop
  IL_0001:  ldarg.0
  IL_0002:  ldarg.1
  IL_0003:  stfld      int32 Test.Program/SomethingWithPublicField::Id
  IL_0008:  ret
} // end of method Program::MethodToCopyTyped

Ok. That actually couldn't be much simpler. The "ldarg.0" code means "load argument 0 onto the stack", "ldarg.1" means "load argument 1 onto the stack" and "stfld" means take the instance of the first object on the stack and set the specified field to be the second object on the stack. "ret" just means exit method (returning any value, if there is one - which there isn't in this case).

This means that the "TODO" comment in my scaffolding code may be replaced with real content, resulting in the following:

var field = typeof(Example).GetField("_somethingElse", BindingFlags.Instance | BindingFlags.NonPublic);

var method = new DynamicMethod(
  name: "SetSomethingElseField",
  returnType: null,
  parameterTypes: new[] { typeof(Example), typeof(int) },
  restrictedSkipVisibility: true
);
var gen = method.GetILGenerator();
gen.Emit(OpCodes.Ldarg_0);
gen.Emit(OpCodes.Ldarg_1);
gen.Emit(OpCodes.Stfld, field);
gen.Emit(OpCodes.Ret);

var fieldSetter = (Action<Example, int>)method.CreateDelegate(typeof(Action<Example, int>));

That's it! We now have a delegate that is a compiled method for writing a new int into the private field "_somethingElse" for any given instance of Example.

Unfortunately, things go wrong at exactly the same point as they did with LINQ Expressions. The above code works fine for setting a regular private field but if we tried to set a readonly field using the same approach then we'd be rewarded with an error:

System.Security.VerificationException: 'Operation could destabilize the runtime.'

Another disappointment!*

* (Though hopefully not a surprise if you're reading this article since I said right at the top that only the first of these three approaches would work!)

But, again, to try to find a silver lining, I also tried the non-readonly-private-field-setting-via-emitted-IL code in a .NET Core application and I was pleasantly surprised to find that it worked. It required the packages "System.Reflection.Emit.ILGeneration" and "System.Reflection.Emit.Lightweight" to be added through NuGet but nothing more difficult than that.

Although I decided last month that I'm still not convinced that .NET Core is ready for me to use in work, I am impressed by how much does work with it.

Update (9th March 2021): I realised some time after writing this post that it is possible to make this work with emitted IL, the only difference required is to take this code:

var method = new DynamicMethod(
  name: "SetSomethingElseField",
  returnType: null,
  parameterTypes: new[] { typeof(Example), typeof(int) },
  restrictedSkipVisibility: true
);

.. and add an additional argument like this:

var method = new DynamicMethod(
  name: "SetSomethingElseField",
  returnType: null,
  parameterTypes: new[] { typeof(Example), typeof(int) },
  m: field.DeclaringType.Module,
  restrictedSkipVisibility: true
);

I haven't recreated the benchmarks to try this code, I'm hoping that the performance difference will be minimal between setting a private readonly field via emitted IL and setting a private non-readonly field via emitted IL (which is benchmarked below). I'm using this approach in my DanSerialiser project.

Performance comparison

So we've ascertained that there is only one way* to set a readonly field on an existing instance and, regrettably, it's also the slowest. I guess that a pertinent question to ask, though, is just how much slower is the slowest?

* (As further evidence that there isn't another way around this, I've found an issue from EntityFramework's GitHub repo: "Support readonly fields" which says that it's possible to set a readonly property with reflection but that the issue-raiser encountered the same two failures that I've demonstrated above when he tried alternatives and no-one has proposed any other ways to tackle it)

Obviously we can't compare the readonly-field-setting performance of the three approaches above because only one of them is actually capable of doing that. But we can compare the performance of something similar; setting a private (but not readonly) field, since all three are able to achieve that.

Ordinarily at this point, I would write some test methods and run them in a loop and time the loop and divide by the number of runs and then maybe repeat a few times for good measure and come up with a conclusion. Today, though, I thought that I might try something a bit different because I recently heard again about something called "BenchmarkDotNet". It claims that:

Benchmarking is really hard (especially microbenchmarking), you can easily make a mistake during performance measurements. BenchmarkDotNet will protect you from the common pitfalls (even for experienced developers) because it does all the dirty work for you: it generates an isolated project per each benchmark method, does several launches of this project, run multiple iterations of the method (include warm-up), and so on. Usually, you even shouldn't care about a number of iterations because BenchmarkDotNet chooses it automatically to achieve the requested level of precision.

This sounds ideal for my purposes!

What I'm most interesting in is how reflection compares to compiled LINQ expressions and to emitted IL when it comes to setting a private field. If this is of any importance whatsoever then presumably the code will be run over and over again and so it should be the execution time of the compiled property-setting code that is of interest - the time taken to actually compile the LINQ expressions / emitted IL can probably be ignored as it should disappear into insignificance when the delegates are called enough times. But, for a sense of thoroughness (and because BenchmarkDotNet makes it so easy), I'll also measure the time that it takes to do the delegate compilation as well.

To do this, I created a .NET Core Console application in VS2017, added the BenchmarkDotNet NuGet package and changed the .csproj file by hand to build for both .NET Core and .NET Framework 4.6.1 by changing

<TargetFramework>netcoreapp1.1</TargetFramework>

<TargetFrameworks>netcoreapp1.1;net461</TargetFrameworks>
<PlatformTarget>AnyCPU</PlatformTarget>

(as described in the BenchmarkDotNet FAQ).

Then I put the following together. There are six benchmarks in total; three to measure the creation of the different types of property-setting delegates and three to then measure the execution time of those delegates -

class Program
{
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<TimedSetter>();
        Console.ReadLine();
    }
}

[CoreJob, ClrJob]
public class TimedSetter
{
    private SomethingWithPrivateField _target;
    private FieldInfo _field;
    private Action<SomethingWithPrivateField, int>
        _reflectionSetter,
        _linqExpressionSetter,
        _emittedILSetter;

    [GlobalSetup]
    public void GlobalSetup()
    {
        _target = new SomethingWithPrivateField();

        _field = typeof(SomethingWithPrivateField)
            .GetFields(BindingFlags.NonPublic | BindingFlags.Instance)
            .FirstOrDefault(f => f.Name == "_id");

        _reflectionSetter = ConstructReflectionSetter();
        _linqExpressionSetter = ConstructLinqExpressionSetter();
        _emittedILSetter = ConstructEmittedILSetter();
    }

    [Benchmark]
    public Action<SomethingWithPrivateField, int> ConstructReflectionSetter()
    {
        return (source, value) => _field.SetValue(source, value);
    }

    [Benchmark]
    public Action<SomethingWithPrivateField, int> ConstructLinqExpressionSetter()
    {
        var sourceParameter = Expression.Parameter(typeof(SomethingWithPrivateField), "source");
        var valueParameter = Expression.Parameter(_field.FieldType, "value");
        var fail = Expression.Assign(
            Expression.MakeMemberAccess(sourceParameter, _field),
            valueParameter
        );
        return Expression.Lambda<Action<SomethingWithPrivateField, int>>(
                Expression.Assign(
                    Expression.MakeMemberAccess(sourceParameter, _field),
                    valueParameter
                ),
                sourceParameter,
                valueParameter
            )
            .Compile();
    }

    [Benchmark]
    public Action<SomethingWithPrivateField, int> ConstructEmittedILSetter()
    {
        var method = new DynamicMethod(
            name: "SetField",
            returnType: null,
            parameterTypes: new[] { typeof(SomethingWithPrivateField), typeof(int) },
            restrictedSkipVisibility: true
        );
        var gen = method.GetILGenerator();
        gen.Emit(OpCodes.Ldarg_0);
        gen.Emit(OpCodes.Ldarg_1);
        gen.Emit(OpCodes.Stfld, _field);
        gen.Emit(OpCodes.Ret);
        return (Action<SomethingWithPrivateField, int>)method.CreateDelegate(
          typeof(Action<SomethingWithPrivateField, int>)
        );
    }

    [Benchmark]
    public void SetUsingReflection()
    {
        _reflectionSetter(_target, 1);
    }

    [Benchmark]
    public void SetUsingLinqExpressions()
    {
        _linqExpressionSetter(_target, 1);
    }

    [Benchmark]
    public void SetUsingEmittedIL()
    {
        _emittedILSetter(_target, 1);
    }
}

public class SomethingWithPrivateField
{
    private int _id;
}

The "GlobalSetup" method will be run once and will construct the delegates for delegate-executing benchmark methods ("SetUsingReflection", "SetUsingLinqExpressions" and "SetUsingEmittedIL"). The time that it takes to execute the [GlobalSetup] method does not contribute to any of the benchmark method times - the benchmark methods will record only their own execution time.

However, having delegate-creation benchmark methods ("ConstructReflectionSetter", "ConstructLinqExpressionSetter" and "ConstructEmittedILSetter") means that I'll have an idea how large the initial cost to construct each delegate is (or isn't), separate to the cost of executing each type of delegate.

BenchmarkDotNet has capabilities beyond what I've taken advantage of. For example, it can also build for Mono (though I don't have Mono installed on my computer, so I didn't try this) and it can test 32-bit vs 64-bit builds.

Aside from testing .NET Core 1.1 and .NET Framework 4.6.1, I've kept things fairly simple.

After it has run, it emits the following summary about my computer:

BenchmarkDotNet=v0.10.8, OS=Windows 8.1 (6.3.9600)

Processor=AMD FX(tm)-8350 Eight-Core Processor, ProcessorCount=8

Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET

dotnet cli version=1.0.4

[Host] : .NET Core 4.6.25211.01, 64bit RyuJIT [AttachedDebugger]

Clr : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1087.0

Core : .NET Core 4.6.25211.01, 64bit RyuJIT

And produces the following table:

Method	Job	Runtime	Mean	Error	StdDev
ConstructReflectionSetter	Clr	Clr	9.980 ns	0.2930 ns	0.4895 ns
ConstructLinqExpressionSetter	Clr	Clr	149,552.853 ns	1,752.4151 ns	1,639.2100 ns
ConstructEmittedILSetter	Clr	Clr	126,454.797 ns	1,143.9593 ns	1,014.0900 ns
SetUsingReflection	Clr	Clr	158.784 ns	3.1892 ns	3.6727 ns
SetUsingLinqExpressions	Clr	Clr	1.139 ns	0.0542 ns	0.0742 ns
SetUsingEmittedIL	Clr	Clr	1.832 ns	0.0689 ns	0.1132 ns

Method	Job	Runtime	Mean	Error	StdDev
ConstructReflectionSetter	Core	Core	9.465 ns	0.1083 ns	0.0904 ns
ConstructLinqExpressionSetter	Core	Core	66,430.408 ns	1,303.5243 ns	2,104.9488 ns
ConstructEmittedILSetter	Core	Core	38,483.764 ns	605.3819 ns	536.6553 ns
SetUsingReflection	Core	Core	2,626.527 ns	24.1110 ns	22.5534 ns
SetUsingLinqExpressions	Core	Core	1.063 ns	0.0516 ns	0.0688 ns
SetUsingEmittedIL	Core	Core	1.718 ns	0.0599 ns	0.0560 ns

The easiest thing to interpret is the "Mean" - BenchmarkDotNet did a few "pilot runs" to try to see how long the benchmark methods would take and then tries to decide what is an appropriate number of runs to do for real in order to get reliable results.

The short version is that when delegates are compiled using LINQ Expressions and emitted-IL that they both execute a lot faster than reflection; over 85x faster for .NET Framework 4.6.1 and 1,500x faster for .NET Core 1.1!

The huge difference between reflection and the other two approaches, though, may slightly overshadow the fact that the LINQ Expression delegates are actually about 1.6x faster than the emitted-IL delegates. I hadn't expected this at all, I would have thought that they would be almost identical - in fact, I'm still surprised and don't currently have any explanation for it.

The mean value doesn't usually tell the whole story, though. When looking at the mean, it's also useful to look at the Standard Deviation ("StdDev" in the table above). The mean might be within a small spread of values or a very large spread of values. A small spread is better because it suggests that the single mean value that we're looking at is representative of behaviour in the real world and that values aren't likely to vary too wildly - a large standard deviation means that there was much more variation in the recorded values and so the times could be all over the place in the real world. (Along similar lines, the "Error" value is described as being "Half of 99.9% confidence interval" - again, the gist is that smaller values suggest that the mean is a more useful indicator of what we would see in the real world for any given request).

What I've ignored until this point are the "ConstructReflectionSetter" / "ConstructLinqExpressionSetter" / "ConstructEmittedILSetter" methods. If we first look at the generation of the LINQ Expression delegate on .NET 4.6.1, we can see that the mean time to generate that delegate was around 150ms - compared to approx 10ms for the reflection delegate. Each time the LINQ Expressions delegate is used to set the field instead of the reflection delegate we save around 0.16ms. That means that we need to call the delegate around 950 times in order to pay of the cost of constructing it!

As I suggested earlier, it would only make sense to investigate these sort of optimisations if you expect to execute the code over and over and over again (otherwise, why not just keep it simple and stick to using plain old reflection).. but it's still useful to have the information about just how much "upfront cost" there is to things like this, compared to how much you hope to save in the long run.

It's also interesting to see the discrepancies between .NET Framework 4.6.1 and .NET Core 1.1 - the times to compile LINQ Expressions and emitted-IL delegates are noticeably shorter and the time to set the private field by reflection noticeably longer. In fact, these differences mean that you only need to set the field 25 times before you start to offset the cost of creating the LINQ Expressions delegate (when you compare it to updating the field using reflection) and only 14 times to offset the cost of creating the emitted-IL delegate!

BenchmarkDotNet is fun

I'm really happy with how easy BenchmarkDotNet makes it to measure these sorts of very short operations. Whenever I've tried to do something similar in the past, I've felt niggling doubts that maybe I'm not running it enough times or maybe there are some factors that I should try to average out. Even when I get a result, I've sometimes just looked at the single average (ie. the mean) time taken, which is a bit sloppy since the spread of results can be of vital importance as well. That BenchmarkDotNet presents the final data in such a useful way and with so few decisions on my part is fantastic.

Benchmarking across multiple frameworks

I forget each time that I start a new project how the running-benchmarks-against-multiple-frameworks functionality works, so I'll add a note here for anyone else that gets confused (and, likely, for me in the future!) - the first thing to do is to manually edit the .csproj file of the benchmark project so that it includes the following:

<TargetFrameworks>netcoreapp2.0;net461</TargetFrameworks>

It's not currently possible to specify multiple projects using the VS GUI, so your .csproj file will normally have a line like this:

<TargetFramework>netcoreapp2.0</TargetFramework>

(Not only is there only a single framework specified but the node is called "TargetFramework" - without an "s" - as opposed to "TargetFrameworks" with an "s")

After you've done this, you need to run the benchmark project from the command line (if you try to run it from within VS, even in Release configuration, you will get a warning that the results may be inaccurate as a debugger is attached). You do that with a command like this (it may vary if you're using a different version of .NET Core) -

dotnet run --framework netcoreapp2.0 --configuration release

You have to specify a framework to run the project as but (and this is the important part) that does not mean that the benchmarks will only be run against the framework. What happens when you run this command is that multiple executables are built and then executed, which run the tests in each of the frameworks that you specified in the benchmark attributes and in the "TargetFrameworks" node in the .csproj file. The results of these multiple executables are aggregated to give you the final benchmark output.

.NET Core upsets me again

On the other hand, unfortunately .NET Core has been hard work for me again when it came to BenchmarkDotNet. I made it sound very easy earlier to get everything up and running because I didn't want dilute my enthusiasm for the benchmarking. However, I did have a myriad of problems before everything started working properly.

When I was hand-editing the .csproj file to target multiple frameworks (I still don't know why this isn't possible within VS when editing project properties), Visual Studio would only seem to intermittently acknowledge that I'd changed it and offer to reload. This wasn't super-critical but it also didn't fill me with confidence.

When it was ready to build and target both .NET Framework 4.6.1 and .NET Core 1.1, I got a cryptic warning:

Detected package downgrade: Microsoft.NETCore.App from 1.1.2 to 1.1.1

CoreExeTest (>= 1.0.0) -> BenchmarkDotNet (>= 0.10.8) -> Microsoft.NETCore.App (>= 1.1.2)
CoreExeTest (>= 1.0.0) -> Microsoft.NETCore.App (>= 1.1.1)

Everything seemed to build alright but I didn't know if this was something to worry about or not (I like my projects to be zero-warning). It suggested to me that I was targeting .NET Core 1.1 and BenchmarkDotNet was expecting .NET Core 1.1.2 - sounds simple enough, surely I can upgrade? I first tried changing the .csproj to target "netcoreapp1.1.2" but that didn't work. In fact, it "didnt work" in a very unhelpful way; when I ran the project it would open in a window and immediately close, with no way to break and catch the exception in the debugger. I used "dotnet run"* on the command line to try to see more information and was then able to see the error message:

The specified framework 'Microsoft.NETCore.App', version '1.1.2' was not found.

Check application dependencies and target a framework version installed at:

C:\Program Files\dotnet\shared\Microsoft.NETCore.App

The following versions are installed:

1.0.1

1.0.4

1.1.1

Alternatively, install the framework version '1.1.2'.

* (Before being able to use "dotnet run" I had to manually edit the .csproj file to only target .NET Core - if you target multiple frameworks and try to use "dotnet run" then you get an error "Unable to run your project. Please ensure you have a runnable project type and ensure 'dotnet run' supports this project")

I changed the .csproj file back from "netcoreapp1.1.2" to "netcoreapp1.1" and went to the NuGet UI to see if I could upgrade the "Microsoft.NETCore.App" package.. but the version dropdown wouldn't let me change it (stating that the other versions that it was aware of were "Blocked by project").

I tried searching online for a way to download and install 1.1.2 but got nowhere.

Finally, I saw that VS 2017 had an update pending entitled "Visual Studio 15.2 (26430.16)". The "15.2" caught me out for a minute because I initially presumed it was an update for VS 2015. The update includes .NET Core 1.1.2 (see this dotnet GitHub issue) and, when I loaded my solution again, the warning above had gone. Looking at the installed packages for my project, I saw that "Microsoft.NETCore.App" was now on version 1.1.2 and that all other versions were "Blocked by project". This does not feel friendly and makes me worry about sharing code with others - if they don't have the latest version of Visual Studio then the code may cause them warnings like the above that don't happen on my PC. Yuck.

After all this, I got the project compiling (without warnings) and running, only for it to intermittently fail as soon as it started:

Access to the path 'BDN.Generated.dll' is defined

This relates to an output folder created by BenchmarkDotNet. Sometimes this folder would be locked and it would not be possible to overwrite the files on the next run. Windows wouldn't let me delete the folder directly but I could trick it by renaming the folder and then deleting it. I didn't encounter this problem if I created an old-style .NET Framework project and used BenchmarkDotNet there - this would prevent me from running tests against multiple frameworks but it might have also prevented me from teetering over the brink of insanity.

This is not how I would expect mature tooling to behave. For now, I continue to consider .NET Core as the Top Gear boys (when they still were the Top Gear boys) described old Alfa Romeos; "you want to believe that it can be something wonderful but you couldn't, in all good conscience, recommend it to a friend".

Summary

I suspect that, to some, this may seem like one of my more pointless blog posts. I tried to do something that .NET really doesn't want you to do (and that whoever wrote the code containing the readonly auto-properties really doesn't expect you to do) and then tried to optimise that naughty behaviour - then spent a lot more time explaining how it wasn't possible to do so!

However, along the way I discovered BenchmarkDotNet and I'm counting that as a win - I'll be keeping that in my arsenal for future endeavours. And I also enjoyed revisiting what is and isn't possible with reflection and reminding myself of the ways that .NET allows you to write code that could make my code appear to work in surprising ways.

Finally, it was interesting to see how the .NET Framework compared to .NET Core in terms of performance for these benchmarks and to see take another look at the question of how mature .NET Core and its tooling is (or isn't). And when you learn a few things, can it ever really count as a waste of time?

Minor follow-up (8th August 2017)

A comment on this post by "ai_enabled" asked about the use of the reflection method "SetValueDirect" instead of "SetValue". I must admit that I was unaware of this method but it was an interesting question posed about its performance in comparison to "SetValue" and there was a very important point made about the code that I'd presented so far when it comes to structs; in particular, because structs are copied when they're passed around, the property-update mechanisms that I've shown wouldn't have worked. I'll try to demonstrate this with some code:

public static void Main()
{
    var field = typeof(SomeStructWithPrivateField)
      .GetFields(BindingFlags.NonPublic | BindingFlags.Instance)
      .FirstOrDefault(f => f.Name == "_id");

    // FAIL! "target" will still have an "_id" value of zero :(
    var target = new SomeStructWithPrivateField();
    field.SetValue(target, 123);
}

public struct SomeStructWithPrivateField
{
    private int _id;
}

Because the "SetValue" method's first parameter is of type object, the "target" struct will get boxed - any time that a non-reference type is passed as an argument where a reference type is expected, it effectively gets "wrapped up" into an object. I won't go into all of the details of boxing / unboxing here (if you're interested, though, then "Boxing and Unboxing (C# Programming Guide)" is a good starting point) but one important thing to note is that structs are copied as part of the boxing process. This means "SetValue" will be working on a copy of "target" and so the "_id" property of the "target" value will not be changed by the "SetValue" call!

The way around this is to use "SetValueDirect", which takes a special TypedReference argument. The way in which this is done is via the little-known "__makeref" keyword (I wasn't aware of it before looking into "SetValueDirect") -

public static void Main()
{
    var field = typeof(SomeStructWithPrivateField)
      .GetFields(BindingFlags.NonPublic | BindingFlags.Instance)
      .FirstOrDefault(f => f.Name == "_id");

    // SUCCESS! "target" will have its "_id" value updated!
    var target = new SomeStructWithPrivateField();
    field.SetValueDirect(__makeref(target), 123);
}

If we wanted to wrap this up into a delegate then we need to ensure that the target parameter is marked as being "ref", otherwise we'll end up creating another place that the struct gets copied and the update lost. That means that we can no longer use something like:

Action<SomeStructWithPrivateField, int>

In fact, we can't use the generic Action class at all because it doesn't allow for "ref" parameters to be specified. Instead, we'll need to define a new delegate -

public delegate void Updater(ref SomeStructWithPrivateField target, int value);

Instances of this may be created like this:

var field = typeof(SomeStructWithPrivateField)
  .GetFields(BindingFlags.NonPublic | BindingFlags.Instance)
  .FirstOrDefault(f => f.Name == "_id");

Updater updater = (ref SomeStructWithPrivateField target, int value)
  => field.SetValueDirect(__makeref(target), id);

If we want to do the same with the LINQ Expressions or generated IL approaches then we need only make some minor code tweaks to what we saw earlier. The first argument of the generated delegates must be changed to be a "ref" type and we need to generate a delegate of type Updater instead of Action<SomeStructWithPrivateField, int> -

// Construct an "Updater" delegate using LINQ Expressions
var sourceParameter = Expression.Parameter(
  typeof(SomeStructWithPrivateField).MakeByRefType(),
  "source"
);
var valueParameter = Expression.Parameter(field.FieldType, "value");
var fail = Expression.Assign(
  Expression.MakeMemberAccess(sourceParameter, field),
  valueParameter
);
var linqExpressionUpdater = Expression.Lambda<Updater>(
    Expression.Assign(
      Expression.MakeMemberAccess(sourceParameter, field),
      valueParameter
    ),
    sourceParameter,
    valueParameter
  )
  .Compile();

// Construct an "Updater" delegate by generating IL
var method = new DynamicMethod(
  name: "SetField",
  returnType: null,
  parameterTypes: new[] { typeof(SomeStructWithPrivateField).MakeByRefType(), typeof(int) },
  restrictedSkipVisibility: true
);
var gen = method.GetILGenerator();
gen.Emit(OpCodes.Ldarg_0);
gen.Emit(OpCodes.Ldind_Ref);
gen.Emit(OpCodes.Ldarg_1);
gen.Emit(OpCodes.Stfld, field);
gen.Emit(OpCodes.Ret);
var emittedIlUpdater = (Updater)method.CreateDelegate(typeof(Updater));

(Note: There is an additional "Ldind_Ref" instruction required for the IL to "unwrap" the ref argument but it's otherwise the same)

I used BenchmarkDotNet again to compare the performance of the two reflection methods ("SetValue" and "SetValueDirect") against LINQ Expressions and emitted IL when setting a private field on an instance and found that having to call "__makeRef" and "SetValueDirect" was much slower on .NET 4.6.1 than just calling "SetValue" (about 17x slower) but actually marginally faster on .NET Core.

Method	Runtime	Mean	Error	StdDev
ConstructReflectionSetter	Clr	11.285 ns	0.3082 ns	0.8281 ns
ConstructReflectionWithSetDirectSetter	Clr	10.597 ns	0.2845 ns	0.4345 ns
ConstructLinqExpressionSetter	Clr	196,194.530 ns	2,075.5246 ns	1,839.8983 ns
ConstructEmittedILSetter	Clr	170,913.441 ns	2,289.5219 ns	2,141.6200 ns
SetUsingReflection	Clr	142.976 ns	2.8706 ns	3.3058 ns
SetUsingReflectionAndSetDirect	Clr	2,444.816 ns	40.9226 ns	38.2790 ns
SetUsingLinqExpressions	Clr	2.370 ns	0.0795 ns	0.0744 ns
SetUsingEmittedIL	Clr	2.616 ns	0.0849 ns	0.0834 ns

Method	Runtime	Mean	Error	StdDev
ConstructReflectionSetter	Core	10.595 ns	0.2196 ns	0.1946 ns
ConstructReflectionWithSetDirectSetter	Core	10.540 ns	0.2838 ns	0.3378 ns
ConstructLinqExpressionSetter	Core	117,697.478 ns	758.9277 ns	672.7696 ns
ConstructEmittedILSetter	Core	82,080.062 ns	310.8230 ns	275.5365 ns
SetUsingReflection	Core	2,782.834 ns	17.5705 ns	16.4355 ns
SetUsingReflectionAndSetDirect	Core	2,541.563 ns	21.8272 ns	20.4172 ns
SetUsingLinqExpressions	Core	2.421 ns	0.0227 ns	0.0212 ns
SetUsingEmittedIL	Core	2.655 ns	0.0090 ns	0.0080 ns

It's worth noting that the LINQ Expressions and emitted-IL approaches are slightly slower when working with a "ref" parameter than they were in the original version of the code. I suppose that this makes sense because there is an extra instruction explicitly required in the emitted-IL code and the LINQ-Expression-constructed delegate will have to deal with the added indirection under the hood (though this happens "by magic" and the way that LINQ Expressions code doesn't need to be changed to account for it).

I guess that it's possible that "SetValueDirect" could be faster if you already have a TypedReference (which is what "__makeRef" gives you) and you want to set multiple properties on it.. but that wasn't the use case that I had in mind when I looked into all of this and so I haven't tried to measured that.

All in all, this was another fun diversion. It's curious that the performance between using "SetValue" and "__makeRef" / "SetValueDirect" is so pronounced in the "classic" .NET Framework but much less so in Core. On the other hand, if the target reference is a struct then the performance discrepancies are moot since trying to use "SetValue" won't work!

Some more notes about .NET Core

If you want to try to reproduce this for yourself in .NET Core then (accurate as of 8th August 2017) you need to install Visual Studio 2017 15.3 Preview 2 so that you can build .NET Core 2.0 projects and you'll then need to install the NuGet package "System.Runtime.CompilerServices.Unsafe"*. Without both of these, you won't be able to use "__makeRef", you'll get a slightly cryptic error:

Predefined type 'System.TypedReference' is not defined or imported

* (I found this out via Ben Bowen's post "Fun With __makeref")

Once you have these bleeding edge bits, though, you can build a project with BenchmarkDotNet tests configured to run in both .NET Framework and .NET Core with a .csproj like this:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>netcoreapp2.0;net461</TargetFrameworks>
    <PlatformTarget>AnyCPU</PlatformTarget>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.10.8" />
  </ItemGroup>

</Project>

and you can run the tests in both frameworks using this:

dotnet run --framework netcoreapp2.0 --configuration release

Don't be fooled by the fact that you have to specify a single framework - through BenchmarkDotNet magic, the tests will be run for both frameworks (so long as you annotate the benchmark class with "[CoreJob, ClrJob]") and the results will be displayed in one convenient combined table at the end.

Posted at 20:31

Tags:

Comments

14 October 2014

Writing run-time compiled LINQ Expressions

In September, I was talking about Implementing F#-inspired "with" updates for immutable classes in C# - in which I mentioned that

The UpdateWithSignature delegate itself is a compiled LINQ expression so, once the cost of generating it has been paid the first time that it's required, the calls to this delegate are very fast

In the past, I've made use of LINQ expressions for writing code that is generated at runtime that will be executed many times and would benefit from being as fast as code written at compile time. I'm by no means an expert, but I've put it to use a few times and feel like I'm slowly getting the hang of it. What struck me, the first few times that I tried to find out how to do things, was how sparse the information seems to be out there. There's reference material from Microsoft - which is helpful if you basically have a good grasp on what you're doing and need to look up some minutiae of the implementation. You can find quite advanced articles, but these also tend to assume deep knowledge and jump right into the deep end. Then there's beginner-oriented overview articles - these can be excellent at giving an introduction into what LINQ expressions can mean and what they can be used for, but they don't tend to go beyond fairly simple examples and don't (I found) help you get into the frame of mind that you need to build up complex expressions.

I thought it might be helpful for an "intermediary level" article to exist that takes the basic concepts and walks through creating something useful with it.

Now, I should maybe preface this with an admission that in a subsequent post to the one I quoted above (see "A follow-up to "Implementing F#-inspired 'with' updates in C#"), I said that I probably wouldn't end up using this dynamic run-time-compiled code in real world projects - but it still seems like a good sized example of something real-world-ish. So I'm going to walk through recreating it. If you haven't read that post, no worries, I'm going to act like you haven't and explain everything as I go (and after you're finished, if you haven't read that post, you can go do so :)

The (slightly-contrived) example

Here's the setup, which corresponds roughly to what I was trying to do last month. I've got a class RoleDetails -

public class RoleDetails
{
  public RoleDetails(string title, DateTime startDate, DateTime? endDateIfAny)
  {
    Title = title;
    StartDate = startDate;
    EndDateIfAny = endDateIfAny;
  }

  public string Title { get; private set; }
  public DateTime StartDate { get; private set; }
  public DateTime? EndDateIfAny { get; private set; }
}

I wanted to generate a method at runtime which I could pass the arguments "title", "startDate" and "endDateIfAny" to and have it return a new instance of the class, using those values to define the new instance. The equivalent of

public static RoleDetails UpdateWith(
  RoleDetails source,
  Optional<string> title,
  Optional<DateTime> startDate,
  Optional<DateTime?> endDateIfAny)
{
  if (source == null)
    return new ArgumentNullException("source");

  if (!title.IndicatesChangeFromValue(source.Title)
  && !startDate.IndicatesChangeFromValue(source.StartDate)
  && !endDateIfAny.IndicatesChangeFromValue(source.EndDateIfAny))
    return this;

  return new RoleDetails(
    title.GetValue(source.Title),
    startDate.GetValue(source.StartDate),
    endDateIfAny.GetValue(source.EndDateIfAny)
  );
}

where the Optional type is a struct

public struct Optional<T>
{
  private T _valueIfSet;
  private bool _valueHasBeenSet;

  public T GetValue(T valueIfNoneSet)
  {
    return _valueHasBeenSet ? _valueIfSet : valueIfNoneSet;
  }

  public bool IndicatesChangeFromValue(T value)
  {
    if (!_valueHasBeenSet)
      return false;

    if ((value == null) && (_valueIfSet == null))
      return false;
    else if ((value == null) || (_valueIfSet == null))
      return true;

    return !value.Equals(_valueIfSet);
  }

  public static implicit operator Optional<T>(T value)
  {
    return new Optional<T>
    {
      _valueIfSet = value,
      _valueHasBeenSet = true
    };
  }

The structure we need to form is a "body expression" that performs the method's internals, and for this to be packaged up with method arguments into a compiled lambda expression.

Before going too wild, a simple example to illustrate this is a good start:

private Func<RoleDetails, string> GetRoleTitleRetriever()
{
  var sourceParameter = Expression.Parameter(typeof(RoleDetails), "source");
  var titlePropertyRetriever = Expression.Property(
    sourceParameter,
    typeof(RoleDetails).GetProperty("Title")
  );
  return
    Expression.Lambda<Func<RoleDetails, string>>(
      titlePropertyRetriever,
      sourceParameter
    ).Compile();
}

The "titlePropertyRetriever" is an expression that will return the value of the "Title" property from a RoleDetails instance that is represented by an expression passed into it. In this case, that expression is the "sourceParameter" which will be combined with the "titlePropertyRetriever" (which, here, represents the entirety of the "body" expression) to create a lambda expression. The "sourceParameter" is the single argument for this particular lambda.

A lamba expression can be used in conjunction with other expressions if you need to form a complicated construct. Here we have a very simple action and so we call the "Compile" method and return the result. The static method "Expression.Lambda" returns an Expression<Func<RoleDetails, string>>. Calling "Compile" on it returns a Func<RoleDetails, string>. This is a delegate that we can call directly in our code.

This code could be used thusly -

var role1 = new RoleDetails("Head Honcho", DateTime.Now, null);
var role2 = new RoleDetails("Dogsbody", DateTime.Now, null);

var titleRetriever = GetRoleTitleRetriever();
var title1 = titleRetriever(role1); // This equals "Head Honcho"
var title2 = titleRetriever(role2); // This equals "Dogsbody"

While obviously very simplistic, this gives us some sense of the structure we want. Note that there is no explicit representation of "return" in there - the body expression is expected to return a value and the result of the last expression is taken as the return value.

What I'm trying to get towards, though, is something that produces a new instance of the same type - not just a property retrieved from it. So instead of a Func<RoleDetails, string>, we want a Func<RoleDetails, RoleDetails> which does some sort of work.

The next (small) step is to illustrate calling a constructor. The following will return a new instance of RoleDetails using all the same property values.

private Func<RoleDetails, RoleDetails> GetCloner()
{
  var sourceParameter = Expression.Parameter(typeof(RoleDetails), "source");
  var constructor = typeof(RoleDetails).GetConstructor(new[] {
    typeof(string),
    typeof(DateTime),
    typeof(DateTime?)
  });
  return
    Expression.Lambda<Func<RoleDetails, RoleDetails>>(
      Expression.New(
        constructor,
        Expression.Property(
          sourceParameter,
          typeof(RoleDetails).GetProperty("Title")
        ),
        Expression.Property(
          sourceParameter,
          typeof(RoleDetails).GetProperty("StartDate")
        ),
        Expression.Property(
          sourceParameter,
          typeof(RoleDetails).GetProperty("EndDateIfAny")
        )
      ),
      sourceParameter
    ).Compile();
}

Reflection is used to get a reference to the constructor that takes arguments with types string, DateTime and DateTime?. The body expression consists of the result of "Expression.New", which takes a constructor and expressions for the values and returns an expression for the new instance.

I'm not going to go into too much detail about using reflection to look up constructors, properties, methods, etc.. since it's easy to good find resources that cover this topic at all levels. In my original post, I was talking about code that would analyse the target type and match the best constructor to available properties (for cases where there might be multiple constructors) - here, I'm going to leave all that out and "hard code" the mappings. It shouldn't be a big deal to take what I'm (hopefully!) going to explain here and then complicate it by bolting on some reflection-based type analysis.

It's worth noting here, that the code above has hard-coded strings for property names and relies upon a particular constructor signature being available. As soon as you introduce reflection like this, you give up much of what the compiler can offer you and if the target types get refactored in a manner that changes the facts relied on then you won't find out until runtime that you've got a problem. But that's the cost of runtime craziness like this. I think it's a fair assumption going in that you're well aware of all this! Often this sort of code gymnastics is not appropriate, but other times runtime convenience is very useful - I tend to use AutoMapper as the canonical example since people seem to find it very easy to grasp the pros and cons of it but it's an prime example of something that you mightn't find issues with until runtime (as opposed to less "dynamic" code that issues can be identified with at compile time, through static analysis in the IDE).

The next step in our example code is the introduction of additional arguments. To stick to baby steps, all we'll do is define a delegate the takes a source reference and returns a new instance by overriding a single property - specifically the "Title":

private Func<RoleDetails, string, RoleDetails> GetTitleUpdater()
{
  var sourceParameter = Expression.Parameter(typeof(RoleDetails), "source");
  var titleParameter = Expression.Parameter(typeof(string), "title");
  var constructor = typeof(RoleDetails).GetConstructor(new[] {
    typeof(string),
    typeof(DateTime),
    typeof(DateTime?)
  });
  return
    Expression.Lambda<Func<RoleDetails, string, RoleDetails>>(
      Expression.New(
        constructor,
        titleParameter,
        Expression.Property(
          sourceParameter,
          typeof(RoleDetails).GetProperty("StartDate")
        ),
        Expression.Property(
          sourceParameter,
          typeof(RoleDetails).GetProperty("EndDateIfAny")
        )
      ),
      sourceParameter,
      titleParameter
    ).Compile();
}

Here, it's clear how to affect the Func signature - by changing the type param for "Expression.Lambda" and by specifying the corresponding number of parameters. Instead of the body expression and a single argument, we now specify two arguments. The types of the arguments in the Func are consistent with the types of the "sourceParameter" and "titleParameter". The third Func type param is also RoleDetails since that is what is returned by the RoleDetails constructor that "Expression.New" is using.

It's also clear how expressions can be interchanged - before the "Expression.New" call was taking a constructor reference and then expressions for the constructor arguments that all consisted of type MemberExpression (since this the return type of "Expression.Property"). Now we've switched the first MemberExpression out for a ParameterExpression. "Expression.New" doesn't care if the constructor argument expressions are property retrievals, method arguments, constant values - they can be anything, so long as they can be described by a type of Expression.

Introducing Optional arguments

Let's address two issue now. Firstly, there should be three arguments passed in - for each of the three constructor arguments - instead of one. And these arguments should be of type Optional<T>, so that they can effectively have "no value" and default to the value they have on the "source" reference.

private delegate RoleDetails RoleDetailsUpdater(
  RoleDetails source,
  Optional<string> title,
  Optional<DateTime> startDate,
  Optional<DateTime?> endDateIfAny
);

private RoleDetailsUpdater GetSimpleUpdater()
{
  var sourceParameter = Expression.Parameter(typeof(RoleDetails), "source");
  var titleParameter = Expression.Parameter(typeof(Optional<string>), "title");
  var startDateParameter = Expression.Parameter(typeof(Optional<DateTime>), "startDate");
  var endDateIfAnyParameter = Expression.Parameter(typeof(Optional<DateTime?>), "endDateIfAny");
    var constructor = typeof(RoleDetails).GetConstructor(new[] {
      typeof(string),
      typeof(DateTime),
      typeof(DateTime?)
    });
    return
      Expression.Lambda<RoleDetailsUpdater>(
        Expression.New(
          constructor,
          Expression.Call(
            titleParameter,
            typeof(Optional<string>).GetMethod("GetValue"),
            Expression.Property(sourceParameter, "Title")
          ),
          Expression.Call(
            startDateParameter,
            typeof(Optional<DateTime>).GetMethod("GetValue"),
            Expression.Property(sourceParameter, "StartDate")
          ),
          Expression.Call(
            endDateIfAnyParameter,
            typeof(Optional<DateTime?>).GetMethod("GetValue"),
            Expression.Property(sourceParameter, "EndDateIfAny")
          )
        ),
        sourceParameter,
        titleParameter,
        startDateParameter,
        endDateIfAnyParameter
      ).Compile();
}

Now we're using the return values of "GetValue" method calls for all of the constructor arguments. Each method call is taking a property value extracted from the "source" reference as an argument (since "GetValue" takes a single argument - as per the definition of Optional earlier). I've also snuck in another change. I found that it was getting unwieldy specifying a Func with five arguments (RoleDetails, string, DateTime and DateTime? as inputs and another RoleDetails as the output) and so defined a delegate to instead. This delegate can be used with "Expression.Lambda" just as well as any Func can.

There are two concepts missing still, though, that are key to the original intention. We need to throw an exception if the "source" reference is null. And we need to return the source reference straight back out if none of the arguments represent a change; if the "title", "startDate" and "endDateIfAny" values all match those on the source reference then we may as well return that source reference straight back, rather than creating a new instance that we don't need - this only makes sense because RoleDetails is an immutable type (but since it is, it does make sense).

To deal with branching, there is a method "Expression.IfThenElse" which takes an expression for the condition (this expression must represent a boolean-returning operation) and then expressions for if-true and if-false. There is something to be aware of here, though - in C# (and in LINQ expressions) an "If" (or "If..Else") statement is not an "expression" where "expression" means "something that returns a value". It branches execution but, unlike with the property accessor or methods calls we've seen so far, it doesn't return a value. To make it work as required here, at the end of each branch we need to return via a "label" that marks the end of the block and has a type corresponding to the block's return type.

A label indicates an exit point in a block. If the block has a return type, then the label must have a compatible return value (if the block's return type is void then the label needn't specify a return value since the block will not be returning any value). The label's return value may be null, but if a type other than System.Object is being returned then that null constant must be described with the actual return type.

Once this label is defined, "Expression.Return" can be used to terminate the branches on an if-then-else construct. This is all illustrated in the next example.

The source-argument-null-check is easier; we'll combine "Expression.IfThen" (rather than "IfThenElse") with "Expression.Throw". There is no return label nonsense to worry about since, once an exception has been thrown, there's no return value involved! "Expression.Throw" takes a single argument, which is the exception that it should throw.

private RoleDetailsUpdater GetUpdater()
{
  // These are the parameters for the RoleDetailsUpdater delegate that will be generated
  var sourceParameter = Expression.Parameter(typeof(RoleDetails), "source");
  var titleParameter = Expression.Parameter(typeof(Optional<string>), "title");
  var startDateParameter = Expression.Parameter(typeof(Optional<DateTime>), "startDate");
  var endDateIfAnyParameter = Expression.Parameter(
    typeof(Optional<DateTime?>),
    "endDateIfAny"
  );

  // When evaluated, these expressions will extract the property values from the "source" reference
  var sourceTitleRetriever = Expression.Property(
    sourceParameter,
    typeof(RoleDetails).GetProperty("Title")
  );
  var sourceStartDateRetriever = Expression.Property(
    sourceParameter,
    typeof(RoleDetails).GetProperty("StartDate")
  );
  var sourceEndDateIfAnyRetriever = Expression.Property(
    sourceParameter,
    typeof(RoleDetails).GetProperty("EndDateIfAny")
  );

  // When evaluated, these will determine whether any of the argument values differ from the
  // property values on the "source" reference
  var isTitleValueNew = Expression.Call(
    titleParameter,
    typeof(Optional<string>).GetMethod("IndicatesChangeFromValue"),
    sourceTitleRetriever
  );
  var isStartDateValueNew = Expression.Call(
    startDateParameter,
    typeof(Optional<DateTime>).GetMethod("IndicatesChangeFromValue"),
    sourceStartDateRetriever
  );
  var isEndDateIfAnyValueNew = Expression.Call(
    endDateIfAnyParameter,
    typeof(Optional<DateTime?>).GetMethod("IndicatesChangeFromValue"),
    sourceEndDateIfAnyRetriever
  );
  var areAnyValuesNew = Expression.OrElse(
    isTitleValueNew,
    Expression.OrElse(isStartDateValueNew, isEndDateIfAnyValueNew)
  );

  // This is the where the real work takes place: If "source" is null then throw an exception.
  // If any of the arguments require that a new instance being created, then construct that
  // instance with the new data and return it. Otherwise just return the source reference.
  var returnTarget = Expression.Label(typeof(RoleDetails));
  return
    Expression.Lambda<RoleDetailsUpdater>(
      Expression.Block(
        Expression.IfThen(
          Expression.Equal(sourceParameter, Expression.Constant(null)),
          Expression.Throw(Expression.Constant(new ArgumentNullException("source")))
        ),
        Expression.IfThenElse(
          areAnyValuesNew,
          Expression.Return(
            returnTarget,
            Expression.New(
              typeof(RoleDetails).GetConstructor(new[] {
                typeof(string),
                typeof(DateTime),
                typeof(DateTime?)
              },
              Expression.Call(
                titleParameter,
                typeof(Optional<string>).GetMethod("GetValue"),
                sourceTitleRetriever
              ),
              Expression.Call(
                startDateParameter,
                typeof(Optional<DateTime>).GetMethod("GetValue"),
                sourceStartDateRetriever
              ),
              Expression.Call(
                endDateIfAnyParameter,
                typeof(Optional<DateTime?>).GetMethod("GetValue"),
                sourceEndDateIfAnyRetriever
              )
            )
          ),
          Expression.Return(
            returnTarget,
            sourceParameter
          )
        ),
        Expression.Label(returnTarget, Expression.Constant(null, typeof(RoleDetails)))
      ),
      sourceParameter,
      titleParameter,
      startDateParameter,
      endDateIfAnyParameter
    ).Compile();
}

Note the use of "Expression.OrElse" above. We use this since we're dealing with boolean logic (eg. is the start-date-value new or is the end-date-value new). There is an "Expression.Or" method, but that is for numeric operations (eg. 1 & 4).

Winning!

At this point, we've achieved what I laid out as the original intention.

I'm not going to pretend that it's particularly beautiful or succinct - especially compared to the code that you would write by hand. But then, if you had a scenario where you could write this by hand then you probably would, rather than having to resort to writing code that generates more code! And when I think about how LINQ expressions compare to the "old school" alternative of directly generating IL then it's a lot more read-and-write-able. Perhaps "maintainable" is a better word for it :)

IL generation, unfortunately, still has its place - it's been a while since I looked into this, but I think that if you wanted to generate an entire class, rather than a delegate, then you have to resort to emitting IL.

Debugging

Debugging compiled expressions can be a mixed bag. If you make mistakes that compile but cause errors at runtime then you may get a helpful error or you may get something fairly cryptic.

The safest approach, I've found, is to construct the code in the smallest functional units you can and to then test it with code that exercises every path in the generated expression. In the example above, this was done by starting with code that simply extracted a single property value from the source argument. Then multiple property values were extracted and passed into a constructor method. Then the delegate signature was changed to take multiple arguments and to use these with the constructor. Then these arguments were changed to the use Optional type and use its "GetValue" method to fall back to the source properties if required. Finally a guard clause was added for a null source reference and a condition added to return the source reference straight back if none of the arguments indicate that a new instance is required. If any of these steps introduced a new error, it should have been relatively easy to work out what went wrong.

To illustrate, if you were adding the code that will "exit early" if no new instance is required, and you forgot to specify a type for the null constant - ie. instead of

Expression.Label(returnTarget, Expression.Constant(null, typeof(RoleDetails)))

you wrote

Expression.Label(returnTarget, Expression.Constant(null))

then you would get an error (if you called the code with arguments that executed the no-new-instance-required code path):

Expression of type 'System.Object' cannot be used for label of type 'Tester.RoleDetails'

I would say that this could be considered half way between helpful and cryptic.. if you realise what it means then it's perfectly sensible, but if you can't see what it's referring to (and it's not like you will get the line number of the incorrect Expression call to help you - you'll just get an exception when "Expression.Block" is executed) then it can appear somewhat incomprehensible.

As I said before, an expression block must have a consistent return type, and the block in the code above should return a RoleDetails instance - but "Expression.Constant(null)" defaults to type System.Object since it is not instructed otherwise. If the code is built and tested step-by-step, then it should be fairly simple to trace the error back to its source.

Another approach for examining the behaviour of expressions is to look at the "Body" property of the lambda expression. This must be done on the return value of "Expression.Lambda", before "Compile" is called. It also requires that the expression be valid. So this will not apply to the above example, where the label is of an incorrect type, since that will result in an exception being thrown when "Expression.Block" is called (since that method does work to ensure that the content described obeys its rules for consistency).

It may be useful, though, in a case where you have an expression that is valid but that does not behave as you expect. If you take the code above and tweaked it a bit by changing

return
  Expression.Lambda<RoleDetailsUpdater>(
    // .. the rest of the expression generation code still goes here
  ).Compile();

into

var lambda = Expression.Lambda<RoleDetailsUpdater>(
  // .. the rest of the expression generation code still goes here
);
return lambda.Compile();

then you could insert a break point before the return statement and look at the "Body" property of the "lambda" reference. It would have the following value:

.Block() {
  .If ($source == null) {
    .Throw .Constant<System.ArgumentNullException>(
      System.ArgumentNullException: Value cannot be null.Parameter name: source
    )
  } .Else {
    .Default(System.Void)
  };
  .If (
    .Call $title.IndicatesChangeFromValue($source.Title)
      || .Call $startDate.IndicatesChangeFromValue($source.StartDate)
      || .Call $endDateIfAny.IndicatesChangeFromValue($source.EndDateIfAny)
  ) {
    .Return #Label1 { .New Tester.RoleDetails(
    .Call $title.GetValue($source.Title),
    .Call $startDate.GetValue($source.StartDate),
    .Call $endDateIfAny.GetValue($source.EndDateIfAny)) }
  } .Else {
    .Return #Label1 { $source }
  };
  .Label
    null
  .LabelTarget #Label1:
}

This isn't exactly C# but it does illustrate the code paths in a way that is fairly comprehensible and may highlight any logic errors you've made.

Performing other manipulations

At this point, I'd say we've covered a lot of the basic and - hopefully - you're set up to break into the reference material (such as the MSDN docs). Most of the static "Expression" methods are sensibly named and so usually fairly easy to find information for with a little searching (or just relying on intellisense).

If, for example, you had an Expression whose type was an array and you wanted an element from that array, you would use "Expression.ArrayAccess" (whose first parameter is the target Expression and the next is a set of index expression - the number of required index expressions will depend upon the number of dimensions the array has).

Something I particularly remember finding difficult to track an example for was code where you needed local variables within a block. There are examples for accessing arguments and setting lambda return values and altering the values of the arguments, but setting a local variable within a block scope.. not as easy to find.

I may well have been having a bad day when I was struggling with it - once you see how it's implemented, it looks easy! But I thought I'd use it as an excuse for another example. Because I'm still not an expert in writing this sort of code, I prepared this example in Visual Studio in the manner in which I explained above; bit-by-bit, starting with an implementation that returned a constant, then one that returned the hash code of a single property, then extended it to cover all of the variables and then deal with the special cases like a null "source" reference and then ValueType properties and then static properties.

The example I had in mind was a way to generate a method that would calculate a hash code for a given type. As it stands, in isolation, it's not quite a real-world requirement - but hopefully you can conceive of how something not completely dissimilar could be useful. Plus it's a nice size in that it's not quite trivial but not enormous - and it reiterates some of the same techniques seen above.

So.. if this was to be written using just reflection, it could be something like this:

public Func<T, int> GetHashCodeGenerator<T>()
{
  // ToArray is called against these properties since the set is going to be enumerated every time
  // the returned hash code generator is executed - so it makes sense to do the work to retrieve
  // this data only once and then stash it away
  var properties = typeof(T).GetProperties()
    .Where(p => p.CanRead)
    .OrderBy(p => p.Name)
    .ToArray();
  var firstIndexedProperty = properties.FirstOrDefault(p => p.GetIndexParameters().Any());
  if (firstIndexedProperty != null)
  {
    throw new ArgumentException(string.Format(
      "Indexed property encountered, this is not supported: {0}",
      firstIndexedProperty.Name
    ));
  }

  return source =>
  {
    if (source == null)
      throw new ArgumentNullException("source");

    var hashCode = 0;
    foreach (var property in properties)
    {
      // Even if it's a static property, there is no problem executing property.GetValue(source),
      // it will return the same value for any instance, but it won't throw an exception
      var propertyValue = property.GetValue(source);
      var propertyValueHashCode = (propertyValue == null) ? 0 : propertyValue.GetHashCode();
      hashCode = 3 * (hashCode ^ propertyValueHashCode);
    }
    return hashCode;
  };
}

This is called (using our faithful RoleDetails example class) in the manner:

// Retrieve a delegate that takes a RoleDetails instance and returns a hash code
var roleDetailsHashCodeGenerator = GetHashCodeGenerator<RoleDetails>();

// Use this delegate to generate some hash codes
var role1HashCode = roleDetailsHashCodeGenerator(role1);
var role2HashCode = roleDetailsHashCodeGenerator(role2);

Note: If you look carefully you'll see something odd - when I call "GetProperties" I use "OrderBy" on the results. This is because I included the code above (which relies on reflection for every call) and the code that I'll get to below (which uses reflection to generate a compiled expression to perform the same work) in the same test program and wanted them to return the same hash code for any given reference. Which doesn't seems unreasonable. But the algorithm requires that the properties be reported in a consistent order if the two implementations of that algorithm are to return matching hash codes. However, the MSDN article for "Type.GetProperties" states that

Your code must not depend on the order in which properties are returned, because that order varies.

So if I was just writing a single version of this method then I probably wouldn't bother with the OrderBy call, but since I wanted to write a LINQ-expressions-based version and a non-expressions-based version and I wanted the two versions to return identical results then I do need to be explicit about the property ordering.

Returning to the matter in hand

If you're happy with everything covered so far, then the code coming up won't pose any problem.

There are some new Expression methods calls - "Expression.MultiplyAssign" corresponds to the C# statement "x *= y" or "x = x * y" and "Expression.ExclusiveOrAssign" corresponds to "x ^= y" or "x = x ^ y" (the XOR operation).

It's worth being aware that LINQ expressions can be used to define looping constructs - such as a for, foreach or do-while loop - using "Expression.Loop" but you'll have to implement some of the logic of yourself. For a "for" loop, for example, you'll need to declare a local variable that is altered each iteration and then checked against a particular condition each time to determine whether the loop should be exited. For a "foreach" loop, you'll need to call the "GetEnumerator" method on the loop target and use the methods on that to proceed through or exit the loop.

Here, though, I don't need any loops within the expressions. Since I know what properties must be considered when generating the expressions, I'm uneffectively unrolling the loop to generate a set of statements that retrieve a hash code for each property value and then combine them with an accumulator to come up with the final value.

public Func<T, int> GetCompiledHashCodeGenerator<T>()
{
  var properties = typeof(T).GetProperties().Where(p => p.CanRead).OrderBy(p => p.Name);
  var firstIndexedProperty = properties.FirstOrDefault(p => p.GetIndexParameters().Any());
  if (firstIndexedProperty != null)
  {
    throw new ArgumentException(string.Format(
      "Indexed property encountered, this is not supported: {0}",
      firstIndexedProperty.Name
    ));
  }

  var sourceParameter = Expression.Parameter(typeof(T), "source");
  var accumulatorVariable = Expression.Variable(typeof(int), "accumulator");

  var blockExpressions = new List<Expression>();

  // Check for a null "source" reference (can be skipped entirely if T is a ValueType)
  if (!typeof(T).IsValueType)
  {
    blockExpressions.Add(
      Expression.IfThen(
        Expression.Equal(sourceParameter, Expression.Constant(null)),
        Expression.Throw(Expression.Constant(new ArgumentNullException("source")))
      )
    );
  }

  // Calculate a combined hash by starting with zero and then enumerating the properties -
  // performing an XOR between the accumulator and the current property value and then
  // multiplying before continuing
  blockExpressions.Add(
    Expression.Assign(accumulatorVariable, Expression.Constant(0))
  );
  var getHashCodeMethod = typeof(object).GetMethod("GetHashCode");
  foreach (var property in properties)
  {
    // Static properties must specify a null target, otherwise there will be an exception
    // thrown at runtime: "Static property requires null instance, non-static property
    // requires non-null instance."
    var isStaticProperty = property.GetGetMethod().IsStatic;
    var propertyValue = Expression.Property(isStaticProperty ? null : sourceParameter, property);
    if (property.PropertyType.IsValueType)
    {
      // If the property is a ValueType then we don't have to worry about calling GetHashCode
      // on a null reference..
      blockExpressions.Add(
        Expression.ExclusiveOrAssign(
          accumulatorVariable,
          Expression.Call(propertyValue, getHashCodeMethod)
        )
      );
    }
    else
    {
      // .. otherwise we need to check for null and default to a zero hash code if this is the
      // case (I picked zero since it's what Nullable<T> returns from its GetHashCode method
      // if that Nullable<T> is wrapping a null value).
      //
      // Proof-reading update: I've just realised that an XOR-assign operation with zero is
      // equivalent to no-operation and so we could do nothing if the property value is null.
      // But by this point, I've already completed the first draft of the post and done some
      // performance comparisons and I'm too lazy to do that all again! Maybe no-one will pick
      // up on the algorithm mistake and then also not read this comment. If you *are* reading
      // it.. well, hi! :)
      blockExpressions.Add(
        Expression.IfThenElse(
          Expression.Equal(propertyValue, Expression.Constant(null)),
          Expression.ExclusiveOrAssign(accumulatorVariable, Expression.Constant(0)),
          Expression.ExclusiveOrAssign(
            accumulatorVariable,
            Expression.Call(propertyValue, getHashCodeMethod)
          )
        )
      );
    }
    blockExpressions.Add(
      Expression.MultiplyAssign(accumulatorVariable, Expression.Constant(3))
    );
  }

  // The last expression in the block indicates the return value, so make it the
  // "accumulatorVariable" reference
  blockExpressions.Add(accumulatorVariable);

  // This Expression.Block method signature takes a set of local variable expressions (in this
  // case there's only one; the accumulatorVariable) followed by the expressions that form the
  // body of the block
  return
    Expression.Lambda<Func<T, int>>(
      Expression.Block(
        new[] { accumulatorVariable },
        blockExpressions
      ),
      sourceParameter
    ).Compile();
}

I wrote this in the way I recommended earlier; in bite-size chunks. Maybe one day I'll be able to just sit down and reel out complex trees of expressions in one fell swoop and instinctively know where any errors I introduce originate.. actually, if I've got a fictional "one day" then I might as well fantasise that I won't make any mistakes that need tracking down in the first place - which will be even better! However, today, I need to break it down and confirm each step.

The first step was to take a source argument, call GetHashCode on it and return that directly - using the builtin "GetHashCode" method on System.Object, rather than implementing the logic myself. The next step was to loop through each property and generate expressions that would retrieve that property's value from the source reference and combine it with a local accumulator variable, returning this accumulator variable's final value at the end of the block. Then I added null checks to the property retrievals, defaulting to a hash code of zero to prevent trying to call GetHashCode on a null reference (zero is consistent with the hash code returned from Nullable<T> when it wraps a null value). Then I realised that this check could be skipped entirely if the property's PropertyType was a value-type, since that could never be null! The logic that dealt with static properties was added next. Finally, an if-null guard clause was added so that an ArgumentNullException would be raised if the source reference is null. Again, I realised that if the source type was a value-type then this was unnecessary - there's no need to check something for null if you know that it never can be null.

Each step was small enough that any problems could easily be traced to their source but each one contributed real functionality.

It was interesting that the expression-generating code lent itself to these "short cuts" in the final code (the don't-check-for-null-if-the-type-is-such-that-it-never-can-be-null short cuts). In the reflection version, we could write something like

if (!typeof(T).IsValueType && (source == null))
  throw new ArgumentNullException("source");

but there's no advantage to that over

if (source == null)
  throw new ArgumentNullException("source");

In the LINQ expression version, the advantage is that if "T" is a value type then the hash code generation will not include the if-null clause at all!

This is a micro-optimisation, perhaps, especially when we're hoping for much greater saves overall due to avoiding reflection each time our version of GetHashCode is called. And, in a way, it probably is a micro-optimisation, but I would also argue that it's correct and it makes sense to do it regardless of any performance improvement, since it matches the object graph more closely and shows that you've thought about what you're actually doing.

Talking of performance.. the point of this article was to talk about how to write LINQ expressions, it wasn't about when they should be used. But since we now have two implementations of a generic GetHashCode method - one requiring reflection for each call and one using a compiled expression - surely it would be silly not to take it as anecdotal evidence of the potential performance improvements??

Here's the meat of a console app that should illustrate it. To be as accurate as possible, it needs to be built in release configuration and then run several times. Run at the command line (rather than through Visual Studio) to hook in as little debugging jiggery pokery as possible -

var role1 = new RoleDetails("Penguin Cuddler", DateTime.Now, null);

var reflectionBasedHashCodeGenerator = GetHashCodeGenerator<RoleDetails>();
var compiledHashCodeGenerator = GetCompiledHashCodeGenerator<RoleDetails>();

var timerReflection = new Stopwatch();
var timerCompiled = new Stopwatch();
for (var outerLoop = 0; outerLoop < 100; outerLoop++)
{
  timerReflection.Start();
  for (var innerLoop = 0; innerLoop < 10000; innerLoop++)
    reflectionBasedHashCodeGenerator(role1);
  timerReflection.Stop();

  timerCompiled.Start();
  for (var innerLoop = 0; innerLoop < 10000; innerLoop++)
    compiledHashCodeGenerator(role1);
  timerCompiled.Stop();
}
Console.WriteLine("Total Reflection: " + timerReflection.ElapsedMilliseconds + "ms");
Console.WriteLine("Total Compiled: " + timerCompiled.ElapsedMilliseconds + "ms");
Console.WriteLine(
  "Improvement: {0}x",
  ((double)timerReflection.ElapsedMilliseconds / timerCompiled.ElapsedMilliseconds).ToString("0.00")
);

I ran this three times and got an average 3000ms for the reflection-only approach and 186ms for the compiled-expression version. That's a 16.1x improvement - not bad!

Now, there is a cost to building and compiling these expressions. I would say that if you're not expecting to execute them hundreds of thousands of times, then what's the point of going through the pain of writing the expression-generating code - it would be much easier to just write the reflection-based version. And if you're running it (at least) hundreds of thousands of times, then the overhead of compiling the expressions will be negligible.

But maybe that rationalisation would be a cop-out.

So I changed the above code to consider the time taken to call GetHashCodeGenerator and GetCompiledHashCodeGenerator, such that it contributed to the timerReflection and timerCompiled totals.

With this change, the averages over three runs were now 3008ms and 204ms, giving an average performance multiplier of 14.8 times (where each run performed a million calls per implementation). Still a very respectable performance bump for somewhere that you know it will make a difference (again, why bother with all this hard work if you don't already know that it's worth it - or, rather, if a profiler hasn't shown you that it will be).

A couple of years ago, I wrote a library that basically aimed to be "like AutoMapper but supporting instantiate-by-constructor" (actually, I started it with the intention of it being an extension to AutoMapper to support instantiate-by-constructor - which it can still be used as - but then it took on a bit of a life of its own and became happy to stand on its own two feet). Earlier this year, I updated it to support optional constructor arguments and thought I'd look at how the performance of the library compared to AutoMapper. My library has less functionality than AutoMapper (though it does, of course, have the automated instantiate-by-constructor feature, which AutoMapper does not - the reason I wrote the library!) but by doing less and using compiled expressions for the entire translation, it tackled the example I was using (which was not chosen to try to game the system in any way, incidentally) 100x faster for each translation. Which just goes to show that if you avoid work you don't need to do (like the micro-optimisations above) and speed up the big slow bits (replacing repeated reflection with compiled expressions) you can get serious gains! You can read about it in Reflection and C# optional constructor arguments.. just in case you want to pick holes in my reasoning :)

Posted at 23:07

Tags:

Comments

17 September 2014

Implementing F#-inspired "with" updates for immutable classes in C#

I've been prototyping a data service for a product at work that communicates with immutable types and one of the feedback comments was a question as to whether the classes supported a flexible F#-esque "with" method that would allow multiple properties to be changed without the garbage collection churn of creating intermediate references for each individual property (since, of course, the property values aren't actually changed on an instance, a new instance is generated that reflects the requested changes).

To pull an example straight from the excellent F# for fun and profit site:

let p1 = {first="Alice"; last="Jones"}
let p2 = {p1 with last="Smith"}

This creates a new record p2 that takes p1 and changes one of the fields. Multiple fields may be altered in one use "with" statement

let p2 = {p1 with first="John";last="Smith"}

To start with a very simple example in C#, take the following class:

public class RoleDetails
{
  public RoleDetails(string title, DateTime startDate, DateTime? endDateIfAny)
  {
    Title = title;
    StartDate = startDate;
    EndDateIfAny = endDateIfAny;
  }

  public string Title { get; private set; }
  public DateTime StartDate { get; private set; }
  public DateTime? EndDateIfAny { get; private set; }
}

This is a very close parallel to the F# record type since it just assigns read-only properties (they're not strictly read-only since they don't use the "readonly" keyword but they're not externally alterable and are only set once within the class so it's close enough).

If I was writing something like this for real use, I would probably try to make more guarantees.. or at least, document behaviour. Something like:

public class RoleDetails
{
  public RoleDetails(string title, DateTime startDate, DateTime? endDateIfAny)
  {
    if (string.IsNullOrWhiteSpace(title))
      throw new ArgumentException("title");
    if ((endDateIfAny != null) && (endDateIfAny <= startDate))
      throw new ArgumentException("title");

    Title = title.Trim();
    StartDate = startDate;
    EndDateIfAny = endDateIfAny;
  }

  /// <summary>
  /// This will never be null or blank, it will not have any leading or trailing whitespace
  /// </summary>
  public string Title { get; private set; }

  public DateTime StartDate { get; private set; }

  /// <summary>
  /// If non-null, this will greater than the StartDate
  /// </summary>
  public DateTime? EndDateIfAny { get; private set; }
}

As I've said before, this validation and commenting is really a poor substitute for code contracts which would allow for compile time detection of invalid data rather than relying on runtime exceptions (speaking of which, I need to give the .net code contracts solution another go - last time I got stuck in I hit some problems which hopefully they've ironed out by now).

Another variation on the "aggressive validation" illustrated above would be a type that represents a non-blank string to prevent duplicating calls to IsNullOrWhiteSpace and trim. This concept could be taken even further to "strongly type" string values so that a "Title" can not be passed into a function that expects a "Notes" string value, for example. This is far from an original idea but it was something I was experimenting again with recently.

Incidentally, there is talk of a future version of C# getting a record type which would reduce boilerplate code when defining simple immutable types. For example (from the InfoQ article Easier Immutable Objects in C# 6 and VB 12) -

public record class Cartesian(double x: X, double y: Y);

This will define an immutable class with two read-only properties that are set through a constructor call. This future C# specification is also apparently going to allow read-only auto properties - so in my RoleDetails class above instead of "get; private set;" properties, which are externally unalterable but could actually be changed within the instance, the properties could be truly readonly. This is possible currently but it requires a private readonly field and a property with a getter that returns that field's value, which is even more boring boilerplate.

The obvious, verbose and potentially more GC-churny way

To prevent callers from having to call the constructor every time a property needs to be altered, update methods for each "mutable" property may be added (they don't really mutate the values since a new instance is returned rather than the value changed on the current instance). This prevents the caller from having to repeat all of the constructor arguments that are not to be changed whenever one property needs altering. Forcing callers to call constructors in this way is particularly annoying if a constructor argument is added, removed or re-ordered at a later date; this can result in a lot of calling code that needs correcting.

public class RoleDetails
{
  public RoleDetails(string title, DateTime startDate, DateTime? endDateIfAny)
  {
    Title = title;
    StartDate = startDate;
    EndDateIfAny = endDateIfAny;
  }

  public string Title { get; private set; }
  public DateTime StartDate { get; private set; }
  public DateTime? EndDateIfAny { get; private set; }

  public RoleDetails Update(string title)
  {
    return (title == Title)
      ? this
      : new RoleDetails(title, StartDate, EndDateIfAny);
  }
  public RoleDetails UpdateStartDate(DateTime startDate)
  {
    return (startDate == StartDate)
      ? this
      : new RoleDetails(Title, startDate, EndDateIfAny);
  }
  public RoleDetails UpdateEndDateIfAny(DateTime? endDateIfAny)
  {
    return (endDateIfAny == EndDateIfAny)
      ? this
      : new RoleDetails(Title, StartDate, endDateIfAny);
  }
}

To update two properties on a given instance, you would need to call

var updatedRoleDetails = existingRoleDetails
  .UpdateStartDate(new DateTime(2014, 9, 21))
  .UpdateEndDateIfAny(new DateTime(2014, 11, 21));

If either of the new values is the same as the property value that it should be replacing, then no new instance is required for that property update - since the Update*{Whatever}* method will return back the same instance. But if both properties are changed then two new instances are required even though the first, intermediate value is immediately discarded and so is "wasted".

There could be an Update method that takes multiple parameters for the different properties but then you're basically just mirroring the constructor. Or there could be various Update methods that took combinations of properties to try to cover either the most common cases or all combinations of cases, but neither of these are particularly elegant and they would all result in quite a lot of code duplication.

A better way

It struck me that it should be possible to do something with named and optional method arguments (support for which was added to C# when .net 4 came out, if I remember correctly). Something like

public RoleDetails UpdateWith(
  string title = Title,
  DateTime startDate = StartDate,
  DateTime? endDateIfAny = EndDateIfAny)
{
  if ((title == Title) && (startDate == StartDate) && (endDateIfAny == EndDateIfAny))
    return this;
  return new RoleDetails(title, startDate, endDateIfAny);
}

would allow for only a subset of the arguments to be specified and for those that are left unspecified to default to the current property value of the instance. So the earlier update code becomes

var updatedRoleDetails = existingRoleDetails
  .UpdateWith(startDate: new DateTime(2014, 9, 21), endDateIfAny: new DateTime(2014, 11, 21));

However, this won't fly. The compiler gives the errors

Default parameter value for 'title' must be a compile-time constant

Default parameter value for 'startDate' must be a compile-time constant

Default parameter value for 'endDateIfAny' must be a compile-time constant

That's a bummer.

Another thought that briefly crossed my mind was for the default argument values to all be null. This would work if the arguments were all reference types and would result in the method body looking something like

if ((title == null) && (startDate == null) && (endDateIfAny == null))
  return this;
return new RoleDetails(title ?? Title, startDate ?? StartDate, endDateIfAny ?? EndDateIfAny);

But that is too restrictive a constraint since in this case we have a non-reference type argument (startDate) and we also have a reference type for which null is a valid value (endDateIfAny).

So what we really need is a wrapper type around the arguments that indicates when no value has been specified. Since we're being conscious of avoiding GC churn, this should be a struct since structs essentially avoid adding GC pressure since they are always copied when passed around - this means that no struct is referenced by multiple scopes and so they don't have to be traced in the same way as reference types; when the scope that has access to the struct is terminated, the struct can safely be forgotten as well. This is not a particularly precise description of what happens and more details can be found in the MSDN article Choosing Between Class and Struct. Particularly see the paragraph

The first difference between reference types and value types we will consider is that reference types are allocated on the heap and garbage-collected, whereas value types are allocated either on the stack or inline in containing types and deallocated when the stack unwinds or when their containing type gets deallocated. Therefore, allocations and deallocations of value types are in general cheaper than allocations and deallocations of reference types.

The other guidelines in that article around cases where structs may be appropriate (if the type "logically represents a single value", "has an instance size under 16 bytes", "is immutable" and "will not have to be boxed frequently") are followed by this type:

public struct Optional<T>
{
  private T _valueIfSet;
  private bool _valueHasBeenSet;

  public T GetValue(T valueIfNoneSet)
  {
    return _valueHasBeenSet ? _valueIfSet : valueIfNoneSet;
  }

  public bool IndicatesChangeFromValue(T value)
  {
    if (!_valueHasBeenSet)
      return false;

    if ((value == null) && (_valueIfSet == null))
      return false;
    else if ((value == null) || (_valueIfSet == null))
      return true;

    return !value.Equals(_valueIfSet);
  }

  public static implicit operator Optional<T>(T value)
  {
    return new Optional<T>
    {
      _valueIfSet = value,
      _valueHasBeenSet = true
    };
  }
}

This type allows us to write an UpdateWith method

public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  if (!title.IndicatesChangeFromValue(Title)
  && !startDate.IndicatesChangeFromValue(StartDate)
  && !endDateIfAny.IndicatesChangeFromValue(EndDateIfAny))
    return this;

  return new RoleDetails(
    title.GetValue(Title),
    startDate.GetValue(StartDate),
    endDateIfAny.GetValue(EndDateIfAny)
  );
}

The Optional type could have exposed properties for has-a-value-been-set and get-value-if-any but since each property comparison (to determine whether a new instance is actually required) would have to follow the pattern if-value-has-been-set-and-if-value-that-has-been-set-does-not-equal-current-value, it made sense to me to hide the properties and to instead expose only the access methods "IndicatesChangeFromValue" and "GetValue". The "IndicatesChangeFromValue" method returns true if the Optional describes a value that is different to that passed in and "GetValue" returns the wrapped value if there is one, and returns the input argument if not. This enables the relatively succinct "UpdateWith" method format shown above.

The other method on the struct is an implicit operator for the wrapped type which makes the "UpdateWith" calling code simpler. Instead of having to do something like

var updatedRoleDetails = existingRoleDetails
  .UpdateWith(startDate = Optional<DateTime>(new DateTime(2014, 9, 21)));

the implicit conversion allows you to write

var updatedRoleDetails = existingRoleDetails
  .UpdateWith(startDate = new DateTime(2014, 9, 21));

because the DateTime will be implicitly converted into an Optional<DateTime>. In fact, I went one step further and made it such that this is the only way to create an Optional that wraps a value. There is no constructor that may be used to initialise an Optional with a value, you must rely upon the implicit conversion. This means that it's very clear that there's only one way to use this type. It also happens to be very similar to the most common way that the Nullable type is used in C# - although that does have a public constructor that accepts the value to wrap, in practice I've only ever seen values cast to Nullable (as opposed to the Nullable constructor being passed the value).

Turning it up to eleven

Now this is all well and good and I think it would be a solid leap forward simply to leave things as they are shown above. Unnecessary GC pressure is avoided since there are no "intermediary" instances when changing properties, while the use of structs means that we're not generating a load of property-update-value references that need to be collected either.

But I just couldn't resist trying to push it a bit further since there's still quite a lot of boring code that needs to be written for every immutable type - the UpdateWith method needs to check all of the properties to ensure that they haven't changed and then it needs to pass values into a constructor. If a class has quite a lot of properties (which is not especially unusual if the types are representing complex data) then this UpdateWith method could grow quite large. Wouldn't it be nice if we could just write something like:

public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return magicUpdater(title, startDate, endDateIfAny);
}

Wouldn't it?? Yes it would.

And we can.. if we dip into some of the .net framework's crazier parts - reflection and stack tracing. With some LINQ expressions thrown in to make it work efficiently when called more than once or twice.

What this "magicUpdater" needs to do is take the names and values of the arguments passed to it and then analyse the target type (RoleDetails in this example) to find the constructor to call that will allow all of these named values to be passed into a new instance, using existing property values on the source instance for any constructor arguments that are not provided by the update arguments. It also needs to do the same work to determine whether the update arguments actually require a new instance to be generated - if only the StartDate is being provided to change but the new value is the same as the current value then no new instance is required, the source instance can be returned directly by the "magicUpdater".

This is handled by two steps. The first based around this line:

var callingMethod = new StackFrame(1).GetMethod();

It returns a MethodBase with metadata about the method that called the "magicUpdater" (the "1" in the call above is how many steps back to go in the call stack). From this the names of the arguments can be extracted and a delegate returned which will take the argument values themselves. So the call would actually look more like (if this "magicUpdater" method return a delegate which then must itself be called):

return magicUpdater()(title, startDate, endDateIfAny);

Before we move on to the second step, there are some important considerations in relation to the use of StackFrame. Firstly, there is some expense to performing analysis like this, as with using reflection - but we'll not worry about that here, some optimisations will be covered later which hopefully mean we can ignore it. What's more important is that analysing the call stack can seem somewhat.. unreliable, in a sense. In the real world, the code that gets executed is not always the code as it appears in the C# source. A release build will apply optimisations that won't be applied to debug builds and when code is manipulated by the JIT compiler more optimisations again may occur - one of the more well-known of which is "method inlining". Method inlining is when the compiler sees a chain of Method1 -> Method2 -> Method3 -> Method4 and observes that Method2 is so small that instead of being a distinct method call (which has a cost, as every method call does - the arguments have to be passed into a new scope and this must be considered by the garbage collector; as a very basic example of one of these costs) the code inside Method2 can be copied inside Method1. This would mean that if Method3 tried to access Method2's metadata through the StackFrame class, it would be unable to - it would be told it was called by Method1!

There's a short but informative article about this by Eric Gunnerson: More on inlining. In a nutshell it says that -

Methods that are greater than 32 bytes of IL will not be inlined.
Virtual functions are not inlined.
Methods that have complex flow control will not be in-lined. Complex flow control is any flow control other than if/then/else; in this case, switch or while.
Methods that contain exception-handling blocks are not inlined, though methods that throw exceptions are still candidates for inlining.
If any of the method's formal arguments are structs, the method will not be inlined.

This means that we shouldn't have to worry about the UpdateWith method being inlined (since its arguments are all Optional which are structs), but the "magicUpdater" method may be a concern. The way that my library gets around that is that the method "GetGenerator" on the UpdateWithHelper class (it's not really called "magicUpdater" :) has the attribute

[MethodImpl(MethodImplOptions.NoInlining)]
public UpdateWithSignature<T> GetGenerator<T>(int numberOfFramesFromCallSite = 1)

which tells the JIT compiler not to inline it and so, since the caller isn't inlined (because of the structs), we don't have to worry about stack "compressing".

This "GetGenerator" method, then, has access to the argument names and argument types of the method that called it. The generic type param T is the immutable type that is being targeted by the "UpdateWith" method. UpdateWithSignature<T> is a delegate with the signature

public delegate T UpdateWithSignature<T>(T source, params object[] updateValues);

This delegate is what takes the property update values and creates a new instance (or returns the source instance if no changes are required). It does this by considering every public constructor that T has and determining what constructor arguments it can satisfy with update arguments. It does this by matching the update argument names to the constructor argument names and ensuring that the types are compatible. If a constructor is encountered with arguments that don't match any update arguments but T has a property whose name and type matches the constructor argument, then that will be used. If a constructor argument is encountered that can't be matched to an update argument or a property on T but the constructor argument has a default value, then the default value may be used if the constructor.

If a constructor does not have at least one argument that can be matched to each update argument name, then that constructor is ignored (otherwise an update argument would be ignored, which would make the UpdateWith somewhat impotent!). If there are multiple constructors that meet all of these conditions, they are sorted by the number of arguments they have that are fulfilled by update arguments and then sorted by the number of arguments that are satisfied by other properties on T - the best match from this sorted set it used.

The return UpdateWithSignature<T> delegate itself is a compiled LINQ expression so, once the cost of generating it has been paid the first time that it's required, the calls to this delegate are very fast. The "GetGenerator" method caches these compiled expressions, so the method

public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return DefaultUpdateWithHelper.GetGenerator<RoleDetails>()(this, title, startDate);
}

can be called repeatedly and cheaply.

Note that in the above example, the DefaultUpdateWithHelper is used. This is a static wrapper around the UpdateWithHelper which specifies a default configuration. The UpdateWithHelper takes arguments that describe how to match update argument names to constructor argument names, for example (amongst other configuration options). The implementation in the DefaultUpdateWithHelper matches by name in a case-insensitive manner, which should cover the most common cases. But the relevant UpdateWithHelper constructor argument is of type

public delegate bool UpdateArgumentToConstructorArgumentComparison(
  ParameterInfo updateArgument,
  ConstructorInfo constructor,
  ParameterInfo constructorArgument);

so a custom implementation could implement any complex scheme based upon target type or constructor or update argument type.

The UpdateWithHelper also requires a cache implementation for maintaining the compiled expressions, as well as matchers for other comparisons (such as property name to constructor argument name, for constructor arguments that can't be matched by an update argument). If a custom UpdateWithHelper is desired that only needs to override some behaviour, the DefaultUpdateWithHelper class has a static nested class DefaultValues with properties that are the references that it uses for the UpdateWithHelper constructor arguments - some of these may be reused by the custom configuration, if appropriate.

I considered going into some detail about how the LINQ expressions are generated since I think it's hard to find a good "how-to" walkthrough on these. It's either information that seems too simple or fine-grained that it's hard to put it together into something useful or it's the other extreme; dense code that's hard to get to grips with if you don't know much about them. But I feel that it would balloon this post too much - so maybe another day!

Incidentally, the DefaultUpdateWithHelper's static "GetGenerator" method inserts another layer into the call stack, which is why the UpdateWithHelper's method requires an (optional) "numberOfFramesFromCallSite" argument - so that it can be set to 2 in this case, rather than the default 1 (since it will need to step back through the DefaultUpdateWithHelper method before getting to the real "UpdateWith" method). This also means that DefaultUpdateWithHelper has the "MethodImplOptions.NoInlining" attribute on its "GetGenerator" method.

It's also worthy of note that the "GetGenerator" methods support extension methods for "UpdateWith" implementations, as opposed to requiring that they be instance methods. So the following is also acceptable

public static RoleDetails UpdateWith(
  this RoleDetails source,
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return DefaultUpdateWithHelper.GetGenerator<RoleDetails>()(source, title, startDate);
}

The analysis detects that the first argument is not an OptionalType<T> and asserts that its type is assignable to the type param T and then ignores it when generating the translation expression. The extension method will pass through the "source" reference where "this" was used in the instance method implementation shown earlier.

Further performance optimisations

Although the compiled "generator" expressions are cached, the cache key is based upon the "UpdateWith" method's metadata. This means that the cost of accessing the StackFrame is paid for every "UpdateWith" call, along with the reflection access to get the UpdateWith argument's metadata. If you feel that this might be an unbearable toll, a simple alternative is something like

private static UpdateWithSignature<RoleDetails> updater
  = DefaultUpdateWithHelper.GetGenerator<RoleDetails>(typeof(RoleDetails).GetMethod("UpdateWith"));
public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return updater(this, title, startDate);
}

The "GetGenerator" methods have alternate signatures that accept a MethodBase reference relating to the "UpdateWith" method, rather than relying upon StackFrame to retrieve it. And using a static "updater" reference means that "GetGenerator" is only ever called once, so subsequent calls that would require reflection in order to check for a cached expression are avoided entirely. The trade-off is that the method must be named in a string, which would break if the method was renamed. Not quite as convenient as relying upon stack-tracing magic.

If you really want to get crazy, you can go one step further. If part of the reason for this experiment was to reduce GC pressure, then surely the params array required by the UpdateWithSignature<T> is a step backwards from the less-automated method, where the number of update arguments is known at compile time? (Since that didn't require a params array for a variable number of arguments, there were no method calls where the precise number of update arguments was unknown). Well that params array can be avoided if we make some more trade-offs. Firstly, we may only use an approach like above, which doesn't rely on expression caching (ie. use a static property that requests a generator only once). Secondly, there may only be up to nine update arguments. The first reason is because the cache that the UpdateWithHelper uses records UpdateWithSignature<T> references, which are no good since they use the params array that we're trying to avoid. The second reason is because a distinct delegate is required for each number of arguments, as is a distinct method to construct the generator - so there had to be a limit somewhere and I chose nine. The methods are

public UpdateWithSignature1<T> GetUncachedGenerator1<T>(MethodBase updateMethod)
public UpdateWithSignature2<T> GetUncachedGenerator2<T>(MethodBase updateMethod)
public UpdateWithSignature3<T> GetUncachedGenerator3<T>(MethodBase updateMethod)
// .. etc, up to 9

and the delegates are of the form

public delegate T UpdateWithSignature1<T>(T source, object arg0);
public delegate T UpdateWithSignature2<T>(T source, object arg0, object arg1);
public delegate T UpdateWithSignature3<T>(T source, object arg0, object arg1, object arg2);
// .. etc, up to 9

They may be used in a similar manner to that already shown, but you must be careful to match the number of arguments required by the "UpdateWith" method. In a way, there is actually a compile-time advantage here - if you choose the wrong one, then the compiler will warn you that you have specified three update arguments when the delegate requires four (for example). With the generic form (the non-numbered "GetGenerator" method), the params array means that you can specify any number of update arguments and you won't find out until runtime that you specified the wrong amount.

So, to illustrate -

private static UpdateWithSignature3<RoleDetails> updater
  = DefaultUpdateWithHelper.GetUncachedGenerator3<RoleDetails>(
    typeof(RoleDetails).GetMethod("UpdateWith"));

public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return updater(this, title, startDate, endDateIfAny);
}

If I'm being honest, however, if you really think that this optimisation is beneficial (by which, I mean you've done performance analysis and found it to be a bottleneck worth addressing), you're probably better replacing this automated approach with the hand-written code that I showed earlier. It's not all that long and it removes all of this "magic" and also gives the compiler more opportunity to pick up on mistakes. But most importantly (in terms of performance) may be the fact that all update arguments are passed as "object" in these delegates. This means that any value types (ints, structs, etc..) will be boxed when they are passed around and then unboxed when used as constructor arguments. This is explained very clearly in the article 5 Basic Ways to Improve Performance in C# and more information about the use of the heap and stack can be found at Six important .NET concepts: Stack, heap, value types, reference types, boxing, and unboxing - I'd not seen this article before today but I thought it explained things really clearly.

Chances are that you won't have to worry about such low level details as whether values are being boxed-unboxed 99% of the time and I think there's a lot of mileage to be had from how convenient this automated approach is. But it's worth bearing in mind the "what ifs" of performance for the times when they do make a difference.

Any other downsides to the automagical way?

I can't claim to have this code in production anywhere yet. But I'm comfortable enough with it at this stage that I intend to start introducing it into prototype projects that it will be applicable to - and then look to using it in real-world, scary, production projects before too long! My only concern, really, is about making silly mistakes with typos in update argument names. If I mistyped "tittle" in the RoleDetails "UpdateWith" example I've been using, I wouldn't find out until runtime that I'd made the mistaken - at which point, the "GetGenerator" call would throw an exception as it wouldn't be able to match "tittle" to any argument on any accessible constructor. I think the trade-off here would be that every "UpdateWith" method that used this library would need a unit test so that discovering the problem at "runtime" doesn't mean "when I hit code in manual testing that triggers the exception" but rather equates to "whenever the test suite is run - whether locally or when pushed to the build server". I doubt that Update methods of this type would normally get a unit test since they're so basic (maybe you disagree!) but in this case the convenience of using the automated "GetGenerator" method still wins even with the (simple) unit test recommended for each one.

Now that I think about it, this is not a dissimilar situation to using a Dependency Injection framework or using AutoMapper in your code - there is a lot of convenience to be had, but at the risk that configuration errors are not exposed until the code is executed.

In summary, until I find a good reason not to use this library going forward, I intend to do so! To revisit my (F#) inspiration, how can it not be enticing to be able to write

// F#
let p2 = {p1 with first="Jim";last="Smith"}

// C#
var p2 = p1.UpdateWith(first:"Jim",last:"Smith");

with so little code having to be written to enable it?!

Go get the code at bitbucket.org/DanRoberts/updatewith!

Or alternatively, pull the NuGet package straight down from nuget.org/packages/CSharpImmutableUpdateWith.

Update (19th September 2014): There's been quite a lot of interest in this post and some good comments made here and at the discussion on Reddit/implementing-f-sharp-inspired-with-updates-for-immutable-classes-in-c-sharp. I intend to write a follow-up post that talks about some of the observations and includes some performance stats. In summary, though, I may have to admit to considering a slight about-turn in the crazy magical approach and strike that up as a convenience for rattling out code quickly but probably something that won't make it into production code that I write. The idea of using an "UpdateWith" method with named, optional arguments (using the Optional struct) will make it into my "real world" code, though! It's also strikingly similar to some of the code in Roslyn, it was pointed out (I'll touch on this in the follow-up too). I still had a lot of fun with the "Turning it up to eleven" investigation and I think there's useful information in here and in the library code I wrote - even more so when I get round to documenting how I approach writing the LINQ expression-generating code. But maybe it didn't result in something that should always be everyone's immediate go-to method for writing this sort of code. Such is life! :)

Update (2nd October 2014): See A follow-up to "Implementing F#-inspired 'with' updates in C#".

Posted at 23:12

Tags:

Comments

27 February 2014

Entity Framework projections to Immutable Types (IEnumerable vs IQueryable)

Last month I was looking back into one of my old projects that was inspired by AutoMapper but for mapping to immutable types, rather than mapping to mutable types (that typically have a parameterless constructor and are initialised by setting individual properties)* - see Bonus provocative headline: Like AutoMapper, but 100x faster.

When I was working on it, I had a fairly good idea of what scenarios I wanted to use it in and had no intention of trying to replicate the entire range of functionality that AutoMapper supports. However, something I was particularly taken with was the recently(ish) added support to AutoMapper for LINQ auto-projections.

* Note from the future: There was a time when AutoMapper didn't have good support for mapping to immutable types, it wouldn't apply automatic its name / type matching logic to the case where property values are read from the source type and used to provide constructor arguments on the destination type (and it was to fill that feature gap that I started writing the Compilable Type Converter). However, that situation changed at some point and now AutoMapper does have good support for mapping to immutable types - though I wasn't able to track down from the release notes when precisely that was.

Projections are when data is retrieved from an IQueryable source and mapped onto either a named class or an anonymous type - eg.

using (var context = new AlbumDataEntities())
{
  var albums = context.Albums.Select(album => new {
    Key = album.AlbumKey,
    Name = album.Name
  });

The clever thing about IQueryable and these projections is that the data source (Entity Framework, for example) can construct a query to retrieve only the data required. Without projections, the database querying and the mapping are completely separate steps and so it is very common for the ORM to have to retrieve much more data than is strictly necessary (and often to introduce the dreaded N+1 SELECT problem). Projections allows the mapping to directly affect how the database query is constructed so that precisely the right amount of data may be requested (and all in one query, not in N+1).

The code shown above could be referred to as a manual projection (where the properties - "Key" and "Name" in that example - are explicitly mapped). Auto-projection is the utilisation of a mapping library to automatically create this projection / mapping. This means that you don't have to write the boring "Key = album.AlbumKey, Name = album.Name" code for each property in a mapping. In essence, something like:

using (var context = new AlbumDataEntities())
{
  var albums = context.Albums.MapTo<ImmutableAlbum>();

When you've only got a couple of properties (like Key and Name in the earlier code), it's not too bad but it's still the sort of manual work that gets boring (and error prone) very quickly.

IQueryable vs IEnumerable

Allow me to briefly go off on a tangent..

The IEnumerable type in .net allows for lazy initialisation such that data is only processed as it is requested. For a really simple example, the following code starts off with a data "source" that will generate up to ten million objects, returned as an enumerable set. On this we call Take(5), which returns another enumerable set that will only enumerate the first five items. Until ToArray is called, none of the data is actually delivered and none of the objects are created. Even when ToArray is called, only five of the source objects are actually initialised as that is how many are actually required - the remaining 9,999,995 objects that could have been created are not since they are not required (this is lazy evaluation in action).

var fiveItemsFromLargeEnumerableRange =
  Enumerable.Range(1, 10000000).Select(i => new { Name = "Item" + i })
    .Take(5)
    .ToArray();

IEnumerable can be thought to operate on in-memory data sets. The data itself may not originate from an in-memory source. Above it does, but the source could also be something delivering lines from a log file or rows from a database. Each entry in the data, though, is fully loaded when required and then processed in memory. Although each entity is only loaded when required, if the loading of each entity is expensive and only a subset of its data is required for the operation at hand, then even this form of lazy evaluation can become a burden. IEnumerable sets do not inherently expose a way to "partially load" the entities.

It's worth noting at this point that many ORMs (including Entity Framework) support "lazy loading" of data for child properties to try to address this very point; the data for the properties of the returned objects is not loaded until the properties are accessed. At this point the database (or whatever data source is being used) is hit again to retrieve this information. The downside to this is that the data that is accessed may require multiple database hits for each entity when only a single query may have been required if "eagerly" loading all of the data for the entity. But if "eager loading" is used and only a subset of the data is required then too much data was being pulled down!

IQueryable sets have similar intentions to IEnumerable but a different approach, they are more tightly tied to the data source. Where IEnumerable sets may be considered to be in-memory (for each entity), IQueryable sets are all prepared in the data source and filtering may be applied there to prevent too much data from being sent.

To illustrate with an example, say we have data about albums. There's an Albums table with AlbumKey, Name, Year and ArtistKey fields. There's a Tracks table with TrackKey, AlbumKey, TrackNumber and Name fields. And there's an Artists table with fields ArtistKey and Name.

If I point Entity Framework at this then it will generate a model to dip into all this data. The simplest retrieval is probably for all Album names -

using (var context = new AlbumDataEntities())
{
  var allAlbumNames = context.Albums
    .Select(album => album.Name)
    .OrderBy(name => name);

  // Calling ToString on an Entity Framework IQueryable pointing at a SQL database
  // returns the SQL that will be executed to perform the query
  var allAlbumNamesQuery = allAlbumNames.ToString();
  var allAlbumNamesResults = allAlbumNames.ToArray();

  Console.WriteLine("Query:");
  Console.WriteLine(allAlbumNamesQuery);
  Console.WriteLine();

  Console.WriteLine("Results:");
  Console.WriteLine(string.Join(Environment.NewLine, allAlbumNamesResults));
}

This shows that the SQL executed was

SELECT
  [Extent1].[Name] AS [Name]
  FROM [dbo].[Albums] AS [Extent1]
  ORDER BY [Extent1].[Name] ASC

Which is pretty much what you would hope for.. but clever when you think about it. It's done some analysis of the request we've described and realised that it only needs to consider one particular column from that one table, even though it's all configured to potentially do so much more.

If instead we request

var allCombinedAlbumAndTrackNames = context.Albums
  .SelectMany(album => album.Tracks.Select(track => new {
    AlbumName = album.Name,
    TrackName = track.Name,
    TrackNumber = track.TrackNumber
  }))
  .OrderBy(combinedEntry => combinedEntry.AlbumName)
  .ThenBy(combinedEntry => combinedEntry.TrackNumber)
  .Select(combinedEntry => combinedEntry.AlbumName + "/" + combinedEntry.TrackName);

then the following SQL is executed:

SELECT
  [Project1].[C1] AS [C1]
  FROM ( SELECT
    [Extent1].[Name] + N'/' + [Extent2].[Name] AS [C1],
    [Extent1].[Name] AS [Name],
    [Extent2].[TrackNumber] AS [TrackNumber]
    FROM  [dbo].[Albums] AS [Extent1]
    INNER JOIN [dbo].[Tracks] AS [Extent2]
    ON [Extent1].[AlbumKey] = [Extent2].[AlbumKey]
  )  AS [Project1]
  ORDER BY [Project1].[Name] ASC, [Project1].[TrackNumber] ASC

This was not such a simple translation to make - this query got mapped into an interim anonymous type, there are multiple sorts and the final values are constructed by concatenating two of the fields in the interim type. Nonetheless, the SQL that was generated was very efficient and a good reflection of the data that was requested.

One more, for fun..

var namesOfTheFiveAlbumsWithTheGreatestNumberOfTracks = context.Albums
  .OrderByDescending(album => album.Tracks.Count())
  .Select(album => album.Name)
  .Take(5);

results in:

SELECT TOP (5)
  [Project1].[Name] AS [Name]
  FROM ( SELECT
    [Extent1].[Name] AS [Name],
    (SELECT
      COUNT(1) AS [A1]
      FROM [dbo].[Tracks] AS [Extent2]
      WHERE [Extent1].[AlbumKey] = [Extent2].[AlbumKey]) AS [C1]
    FROM [dbo].[Albums] AS [Extent1]
  )  AS [Project1]
  ORDER BY [Project1].[C1] DESC

This not only performed an aggregate operation (by considering the number of Tracks per Album) but also incorporated the "Take(5)" into the query. This is an example of how a request may be translated into something handled by the data source that ensures that it can deliver the bare minimum data; if the "Take(5)" call had not been translated into part of the query then more rows might have been returned than we cared about. (If the "Take(5)" call could not have been translated into part of the database query then the first five results could have been isolated by a similar "in-memory" operation to that illustrated by the 1,000,000 item IEnumerable example earlier, but it wouldn't be as efficient to do so since the additional rows would have had to have been delivered from the database and then filtered out.. which would have been wasteful).

These examples demonstrate some of the ways in which use of IQueryable can ensure that the minimum amount of data required is transmitted from the data source. None of them even touch the Artists table since none of the requests asked for Artist data! The IQueryable implementation is what performs this magic, whether that be provided by Entity Framework, NHibernate, SubSonic or whatever - it is responsible for translating expressions into SQL (or whatever language the backing data source uses; it could be another SQL-like database or it could be a document database such as MongoDB).

Applying this to mappings

In the above examples, ToArray() was used to force the retrieval / evaluation of the information. This could just as easily have been a call to ToList() or been a loop that enumerated through the data.

With IEnumerable sets, the source data is not run through until it is explicitly enumerated. With IQueryable, the data is not retrieved from the source until the IQueryable reference is treated as an IEnumerable. This is possible since IQueryable implements IEnumerable and so any method that can operate on IEnumerable may also operate on IQueryable. But what's important here is that as soon as this is done, the IQueryable reference will then "become" an IEnumerable reference and the underlying data request will have been made in order for this to happen.

The clever thing above, where the "Take(5)" method resulted in "SELECT TOP (5)" becoming part of the SQL query, comes about as LINQ has a load of extension methods for operating against IQueryable as well IEnumerable - so as well as

public static IEnumerable<TSource> Take<TSource>(
  this IEnumerable<TSource> source,
  int count
);

there is also

public static IQueryable<TSource> Take<TSource>(
  this IQueryable<TSource> source,
  int count
);

The latter ensures that an IQueryable remains as an IQueryable and so postpones its evaluation.

By the way, I am finally approaching the point of this post now, so bear with me! :)

The LINQ "Select" extension method similarly has alternative method signatures. The more common version is

public static IEnumerable<TResult> Select<TSource, TResult>(
  this IEnumerable<TSource> source,
  Func<TSource, TResult> selector
);

where a particular transformation is performed upon each item in a IEnumerable set.

But there is a corresponding signature

public static IQueryable<TResult> Select<TSource, TResult>(
  this IQueryable<TSource> source,
  Expression<Func<TSource, TResult>> selector
);

where an Expression will be translated by the IQueryable provider into the language of the underlying data source (but since the IQueryable reference remains as an IQueryable this translation won't happen yet).

The difference between Expression<Func<TSource, TResult>> and Func<TSource, TResult> is subtle but important. The compiler is clever enough that often you needn't even be aware that you're passing an Expression. Above we were performing various manipulations (such as wrapping data up in anonymous types and combining fields with string concatenation) without having to think about it. But if we tried to do something like

var nextId = 0;
var allAlbumNamesWithExternallGeneratedIds = context.Albums
  .Select(album => new { Name = album.Name, Id = ++nextId })
  .OrderBy(name => name);

we'd get a compiler error

An expression tree may not contain an assignment operator

So, unfortunately, it's not just any old lambda (aka anonymous function) that can be translated into an Expression. A different problem is encountered if we attempt to use AutoMapper to process the data - eg.

Mapper.CreateMap<Album, AlbumStub>();
var allAlbumKeyAndNames = context.Albums
  .Select(album => Mapper.Map<Album, AlbumStub>(album))
  .OrderBy(name => name);

where the target class is

public class AlbumStub
{
  public int AlbumKey { get; set; }
  public string Name { get; set; }
}

This will result in a NotSupportedException being raised by Entity Framework with the following message:

LINQ to Entities does not recognize the method 'AlbumStub Map[Album,AlbumStub](ProjectionExamples.AlbumStub)' method, and this method cannot be translated into a store expression.

What has happened here is that the compiler has recognised

album => Mapper.Map<Album, AlbumStub>(album)

as a valid Expression but when the query provider has tried to work its magic and translate it into SQL, it doesn't know what to do.

We could try a different approach and call:

Mapper.CreateMap<Album, AlbumStub>();
var allAlbumKeyAndNames = context.Albums
  .Select(Mapper.Map<Album, AlbumStub>)
  .OrderBy(name => name);

But here the Select method that has been called is the Select method that works against IEnumerable and so all of the data in the context.Albums object graph has been evaluated. Even though we only want the Album Keys and Names, all of the Album, Track and Artist data has been retrieved. At the point at which the IQueryable was forced into operating as an IEnumerable it had to be evaluated, and the provider is given no way way of knowing that only the Album Keys and Names are required. What a waste!

(Incidentally, exactly the same problem was being exhibited by my "Compiler Type Converter" code, this isn't something particular to AutoMapper).

But back in February 2011, the author of AutoMapper wrote an article talking about this and how he'd been doing some work to improve the situation (Autoprojecting LINQ queries). I believe that it became a standard part of the library in the August 2013 3.0 release (according to the GitHub Release Notes).

The way it works is by adding some extension methods for IQueryable that work with AutoMapper. The above example now becomes:

Mapper.CreateMap<Album, AlbumStub>();
var allAlbumKeyAndNames = context.Albums
  .OrderBy(name => name);
  .Project().To<AlbumStub>();

The ".Project().To<AlbumStub>()" converts the IQueryable set into an IEnumerable but it does so in such a manner that only the minimum data is requested from the data source. So in this example, there will be no joins to the Tracks or Artists tables, nor will the ArtistKey field of the Album table even be mentioned in the underlying query! The "OrderBy" call is moved up so that it operates against the IQueryable and can be performed by SQL rather than retrieving the data from the db and having to sort it in-memory (which is what would happen if OrderBy was called after Project..To since it would be operating against an IEnumerable reference rather than an IQueryable).

There are some limitations to the projections that can be performed (which are documented in the AutoMapper GitHub wiki page Queryable Extensions). One problem that I found early on is that, while with IEnumerable mappings you could map to an immutable type such as

public class ImmutableAlbumStub
{
  public ImmutableAlbumStub(int albumKey, string name)
  {
    if (string.IsNullOrWhiteSpace(name))
      throw new ArgumentException("Null/blank name specified");
    AlbumKey = albumKey;
    Name = name;
  }
  public int AlbumKey { get; private set; }
  public string Name { get; private set; }
}

by using

Mapper.CreateMap<Album, ImmutableAlbumStub>()
  .ConstructUsing(a => new ImmutableAlbumStub(a.AlbumKey, a.Name));

if you attempt this using this mapping with Project..To results you'll receive an ArgumentException with the message

'ProjectionExamples.ImmutableAlbumStub' does not have a default constructor

Hmm. Bummer.

But, on the whole, I thought that this general "autoprojecting" thing was an awesome idea! And one that I wanted to steal (er.. I mean incorporate into my own code :)

Auto-Projecting to Immutable Types (with the Compilable Type Converter)

At its core, the problem is that we need to be able to provide Expression-based type converters that we can use with the IQueryable-based extension methods. Being able to do this will allow the IQueryable provider to analyse the Expressions and retrieve the bare minimum data required to satisfy the operation. I figured that this would be a walk in the park since the ICompilableTypeConverter is all about this - that's what enables its conversions to be compiled and be so fast!

Unfortunately, the very idea of analysing arbitrary expressions and translating them into SQL (or whatever) is a complex matter and, since this translation is handled by the query provider, it may vary from one provider to another. So far I've only tested this with Entity Framework and it's Entity Framework's limitations that I've encountered and worked with / around.

The first problem is to do with the handling of null values. If we continue with the album data model and imagine that it's actually optional to assign an artist to an album, then in such a case there would be a null ArtistKey on the Album database record. This would mean that the Artist property on the corresponding instance of the Entity-Framework-generated class would also be null. But if I try to map this onto another type structure such as with

var albumsWithArtists = context.Albums
  .Select(a => new {
    Name = a.Name,
    Artist = (a.Artist == null) ? null : new { Name = a.Artist.Name }
  });

then we get another NotSupportedException as soon as the data is evaluated, this time with the message

Unable to create a null constant value of type 'Anonymous type'. Only entity types, enumeration types or primitive types are supported in this context.

Unfortunately, this is - broadly speaking - what happens in the type converters that my code generates. And something similar happens with properties that are enumerable. The Tracks property, for example:

var albumsWithTrackNames = context.Albums
  .Select(a => new {
    Name = a.Name,
    TrackNames = (a.Tracks == null) ? null : a.Tracks.Select(t => t.Name)
  });

Cannot compare elements of type 'System.Collections.Generic.ICollection`1[[ProjectionExamples.Album, ProjectionExamples, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]]'. Only primitive types, enumeration types and entity types are supported.

This second one doesn't seem all that unreasonable from Entity Framework's side; if there are no tracks associated with an album then an empty Tracks list would be recorded against an Album instance, not a null Tracks reference. Unfortunately my conversion methods don't assume this and just performing this null checking makes Entity Framework throw its toys out of the pram. We can't even check for nulls and then resort to a default empty array -

var albumsWithTrackNames = context.Albums
  .Select(a => new {
    Name = a.Name,
    TrackNames = (a.Tracks == null) ? new string[0] : a.Tracks.Select(t => t.Name)
  });

as that will result in the same error.

And then, the killer:

var albumsAsImmutableTypes = context.Albums
  .Select(a => new ImmutableAlbum(a.AlbumKey, a.Name));

This results in a NotSupportedException with the message

Only parameterless constructors and initializers are supported in LINQ to Entities.

Oh dear.

Soooooo...

The approach I took to address this was two-fold. First, assume that all lists will be empty if there is no data for them and so assume that lists will never be null. Second, perform two mappings for each translation. Firstly to interim objects that have only the required properties, this is done while dealing with IQueryable data. And then map these interim types to the real destination objects, this is done after pushing the interim results into an IEnumerable set so that the limitations of the query provider no longer apply. The interim objects all have an "is-initialised" property on them so that if the source object is null then it can be mapped to an interim object with its "is-initialised" flag set to false, otherwise the flag will be set to true. When the interim types are mapped to the destination types, instances with "is-initialised" set to false will be mapped to null references.

This means that only the minimum required data will be retrieved but that the data may be mapped to immutable objects and that Entity Framework's awkward behaviour around nulls can be side-stepped. It's a bit like an automated version of

var albumsAsImmutableTypes = context.Albums
  .Select(a => (a == null) ? new { AlbumKey = a.AlbumKey, Name = a.Name })
  .AsEnumerable()
  .Select(a => new ImmutableAlbumStub(a.AlbumKey, a.Name));

but without having to write that interim mapping by hand.

When building the mappings, first an "ideal mapping" is generated from the source types (the Entity Framework types) to the destination types (the ImmutableAlbumStub). This will never be used directly but performing this work reveals what property mappings are required and allows the interim types to be constructed to expose only the minimum required data.

Since there is an overhead to performing this work (when not dealing with IQueryable data the "ideal mapping" is fine to use and none of this extra work is required) and since there are some tweaks to behaviour (such as the assumption that enumerable sets will never be null), I created a separate static class to use, the ProjectionConverter. It works as follows (this example includes a mapping of nested types so that it's not as simple as the album "stub" example above):

ProjectionConverter.CreateMap<Track, ImmutableTrack>();
ProjectionConverter.CreateMap<Album, ImmutableAlbum>();
using (var context = new ProjectionTestEntities1())
{
  var albumsWithTrackListings = context.Albums
    .Project().To<ImmutableAlbum>();

The target classes are:

public class ImmutableAlbum
{
  public ImmutableAlbum(string name, IEnumerable<ImmutableTrack> tracks)
  {
    if (string.IsNullOrWhiteSpace(name))
      throw new ArgumentException("Null/blank name specified");
    if (tracks == null)
      throw new ArgumentNullException("tracks");

    Name = name;
    Tracks = tracks.ToList().AsReadOnly();
    if (Tracks.Any(t => t == null))
      throw new ArgumentException("Null reference encountered in tracks set");
  }

  /// <summary>
  /// This will never be null or blank
  /// </summary>
  public string Name { get; private set; }

  /// <summary>
  /// This will never be null nor contain any null references
  /// </summary>
  public IEnumerable<ImmutableTrack> Tracks { get; private set; }
}

public class ImmutableTrack
{
  public ImmutableTrack(int number, string name)
  {
    if (string.IsNullOrWhiteSpace(name))
      throw new ArgumentException("Null/blank name specified");
    if (number < 1)
      throw new ArgumentOutOfRangeException("number must be greater than zero");

    Number = number;
    Name = name;
  }

  /// <summary>
  /// This will always be greater than zero
  /// </summary>
  public int Number { get; private set; }

  /// <summary>
  /// This will never be null or blank
  /// </summary>
  public string Name { get; private set; }
}

The Project and To methods are IQueryable extensions in my "Compilable Type Converter" project, not the ones in AutoMapper. All of the same options that I talked about last time are available for the projections (so some or all of the target types may be initialised by-property-setter instead of by-constructor), the big difference is that the ProjectionConverter must be used instead of the regular Converter.

And with that, I'm done! IQueryable-based mappings to immutable types are now possible in a simple and efficient manner!

Bonus material: Dynamic "anonymous" types

The interim types that are generated by the code are created dynamically. The ProjectionConverter maintains a dictionary of generated types so if a mapping is required that requires an interim type with the exact same set of properties as an interim type that has been used before, then a new instance of that type will be created, rather than having to build an entirely new type and then creating an instance of that. Obviously, the first time that any mapping is generated, some new types will have to be built.

Since the C# compiler uses anonymous types, I'd wondered if there was some .net mechanism to generate these types on-the-fly. But after doing some testing (by compiling some code and investigating the output using ildasm), it would seem that the compiler analyses the source code at compile time and bakes in classes to the IL that may be used for all of the required anonymous types. So that was a no-go.

But a few years ago I'd been experimenting with a similar topic, so I was able to dust off and repurpose some old code. Which was convenient! All that I required was for a new type to be created with a particular set of non-indexed read-and-write properties. It doesn't need any methods, fields or events, it doesn't need any static properties, it doesn't need any read-only or write-only fields. It just requires a simple set of gettable/settable instance properties with particular names and types. I used the following to achieve this:

using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Threading;

namespace CompilableTypeConverter.QueryableExtensions.ProjectionConverterHelpers
{
  public class AnonymousTypeCreator
  {
    public static AnonymousTypeCreator DefaultInstance
      = new AnonymousTypeCreator("DefaultAnonymousTypeCreatorAssembly");

    private readonly ModuleBuilder _moduleBuilder;
    public AnonymousTypeCreator(string assemblyName)
    {
      if (string.IsNullOrWhiteSpace(assemblyName))
        throw new ArgumentException("Null/blank assemblyName specified");

      var assemblyBuilder = Thread.GetDomain().DefineDynamicAssembly(
        new AssemblyName(assemblyName),
        AssemblyBuilderAccess.Run
      );
      _moduleBuilder = assemblyBuilder.DefineDynamicModule(
        assemblyBuilder.GetName().Name,
        false // emitSymbolInfo (not required here)
      );
    }

    public Type Get(AnonymousTypePropertyInfoSet properties)
    {
      if (properties == null)
        throw new ArgumentNullException("properties");

      var typeName = "<>AnonymousType-" + Guid.NewGuid().ToString("N");
      var typeBuilder = _moduleBuilder.DefineType(
        typeName,
        TypeAttributes.Public
           | TypeAttributes.Class
           | TypeAttributes.AutoClass
           | TypeAttributes.AnsiClass
           | TypeAttributes.BeforeFieldInit
           | TypeAttributes.AutoLayout
      );

      var ctorBuilder = typeBuilder.DefineConstructor(
        MethodAttributes.Public,
        CallingConventions.Standard,
        Type.EmptyTypes // constructor parameters
      );
      var ilCtor = ctorBuilder.GetILGenerator();
      ilCtor.Emit(OpCodes.Ldarg_0);
      ilCtor.Emit(OpCodes.Call, typeBuilder.BaseType.GetConstructor(Type.EmptyTypes));
      ilCtor.Emit(OpCodes.Ret);

      foreach (var property in properties)
      {
        // Prepare the property we'll add get and/or set accessors to
        var propBuilder = typeBuilder.DefineProperty(
          property.Name,
          PropertyAttributes.None,
          property.PropertyType,
          Type.EmptyTypes
        );
        var backingField = typeBuilder.DefineField(
          property.Name,
          property.PropertyType,
          FieldAttributes.Private
        );

        // Define get method
        var getFuncBuilder = typeBuilder.DefineMethod(
          "get_" + property.Name,
          MethodAttributes.Public
           | MethodAttributes.HideBySig
           | MethodAttributes.NewSlot
           | MethodAttributes.SpecialName
           | MethodAttributes.Virtual
           | MethodAttributes.Final,
          property.PropertyType,
          Type.EmptyTypes
        );
        var ilGetFunc = getFuncBuilder.GetILGenerator();
        ilGetFunc.Emit(OpCodes.Ldarg_0);
        ilGetFunc.Emit(OpCodes.Ldfld, backingField);
        ilGetFunc.Emit(OpCodes.Ret);
        propBuilder.SetGetMethod(getFuncBuilder);

        // Define set method
        var setFuncBuilder = typeBuilder.DefineMethod(
          "set_" + property.Name,
          MethodAttributes.Public
           | MethodAttributes.HideBySig
           | MethodAttributes.SpecialName
           | MethodAttributes.Virtual,
          null,
          new Type[] { property.PropertyType }
        );
        var ilSetFunc = setFuncBuilder.GetILGenerator();
        ilSetFunc.Emit(OpCodes.Ldarg_0);
        ilSetFunc.Emit(OpCodes.Ldarg_1);
        ilSetFunc.Emit(OpCodes.Stfld, backingField);
        ilSetFunc.Emit(OpCodes.Ret);
        propBuilder.SetSetMethod(setFuncBuilder);
      }

      return typeBuilder.CreateType();
    }

    private static MethodInfo MethodInfoInvokeMember = typeof(Type).GetMethod(
      "InvokeMember",
      new[] {
        typeof(string),
        typeof(BindingFlags),
        typeof(Binder),
        typeof(object),
        typeof(object[])
      }
    );
  }
}

The AnonymousTypePropertyInfoSet data that is used to generate new classes is just a set of PropertyInfo instances that don't have the same property name used for multiple different property types and that ensures that none of the properties are indexed. It also overrides the Equals and GetHashCode method so that it can be used as a key in a dictionary of interim types to prevent creating more types that necessary. In essence, really it's an IEnumerable<PropertyInfo> with a few bells and whistles.

(These files can be found in the Bitbucket project at AnonymousTypeCreator.cs and AnonymousTypePropertyInfoSet.cs while the dynamic type creation is required by the PropertyConverter.cs).

And on that note, I really am done!

Posted at 23:12

Tags:

Comments

22 January 2014

Reflection and C# optional constructor arguments

Bonus provocative headline: Like AutoMapper, but 100x faster!

I overheard someone at work bemoaning the fact that StructureMap doesn't seem to support optional constructor arguments (which, having a quick scout around the internet, does indeed seem to be the case, though there are solutions out there such as Teaching StructureMap About C# 4.0 Optional Parameters and Default Values).

This put me in mind of the "Compilable Type Converter" project I wrote a couple of years ago. It started off life as a way to try to easily extend AutoMapper to apply all of its cleverness to constructor arguments as well as properties. So instead of using the properties on one object to populate the properties of another object, it would call a constructor on the destination class and pass in values taken from properties on the source object. (AutoMapper allows constructors to be used for creating new instances, using the "ConvertUsing" method, but it doesn't do its magic with name mappings and type conversions*).

It then grew to generate conversion LINQ Expressions, which were compiled for performance. And from there it became a standalone component that could perform mappings without AutoMapper at all! It could still be used with AutoMapper since the converters it generated could be used in "ConvertUsing" calls but the property-to-constructor-argument mappings would be created automatically instead of manually. And if non-compilable type converters (not compiled to LINQ Expression and so functional but slower) were being generated with my project, there were classes to utilise AutoMapper to help perform the type conversions.

The last thing I had done to it was add in support so that it could generate compiled converters that would populate the destination object using property setters (like AutoMapper does) instead of by-constructor.

I wrote a couple of posts about this a long time ago, but they were early posts and they weren't written all that well, so I'm embarrassed to link to them here. :)

Anyway.. the point is, I was fairly confident that the Compilable Type Converter also did not support optional constructor arguments. And I didn't actually know how optional constructor arguments would look in the reflected information (the converter uses reflection to analyses the source and destination types and decide how to perform the conversion, but then generates a LINQ Expression to do the work which should have performance comparable to custom hand-written conversion code) so it seemed like a good opportunity to brush off an old project and have a bit of fun with it!

.net's representation of optional constructor arguments

This is the easy bit. I hadn't known how easy until I looked into it, but very easy.

Say there is a class

public class TypeWithOptionalConstructorArguments
{
  public TypeWithOptionalConstructorArguments(string name, int number = 1)
  {
    // Do initialisation work..
  }

  // Have the rest of the class here..
}

then to determine that the number argument is optional, we interrogate information about the constructor -

var constructorParameters = typeof(TypeWithOptionalConstructorArguments)
  .GetConstructor(new[] { typeof(string), typeof(int) })
  .GetParameters();

var numberParameter = constructorParameters[1];
var numberParameterType = numberParameter.ParameterType;
var isNumberParameterOptional = numberParameter.IsOptional;
var numberParameterDefaultValue = numberParameter.DefaultValue;

Here we find that numberParameterType is int, isNumberParameterOptional is true and numberParameterDefaultValue is 1. If we considered the first parameter then IsOptional would be false.

Incorporating this into the Compilable Type Converter

Before I give a quick run-down of how I made my code "optional-constructor-argument aware", I'll go quickly through the core concepts it uses when trying to generate a conversion.

There are Property Getters which will take a value from a property on the source type in order to satisfy a value required to generate the destination type (this value may be a constructor argument or a property, depending upon whether a by-constructor or by-property-setter conversion is desired). Property Getters come in several varieties; there is one that will map a source value if the source value's type may be assigned directly to the destination type (ie. the source value matches the destination value or inherits from it / implements it). There is one that will map enum values - from one enum to another, according to a naming convention (this convention is determined by a Name Matcher, see below). There is another one that will perform one specific type translation, so if a converter is generated for mapping between SNested and DNested then a new Property Getter may be created that will help convert type SParent to DParent if SParent has a property with type SNested that needs to be mapped to a property on DParent with type DNested. There's another that's very similar but for enumerable types, so given an SNested -> DNested converter it can help map SParent to DParent if SParent has a property of type IEnumerable<SNested> and DParent has a property of type IEnumerable<DNested>.

Property Getters are created by Property Getter Factories. When a conversion request is being analysed, the Property Getter Factories will be asked "can you perform a mapping from Src to Dest for the property on Dest named Prop?" (the property on Dest may be an actual property or it may be a constructor argument). The factory will look at all of the properties on the Src type and see which, if any, it would map onto Prop based upon the source property's name and type. The type matching depends upon what sort of Property Getter the factory can create (whether that be an assignable-to getter, an enum-translating getter, etc.. all of the options I just described above) and what name matching approach it will use. The name matching depends upon the Name Matcher that was provided to the factory at instantiation.

Name Matchers simply answer the question "are these property/argument names equivalent?", the basic implementation in the project is the CaseInsensitiveSkipUnderscoreNameMatcher. This ignores underscores and case when comparing names, so "Title" and "title" and considered to be the same, as are "Person_Name" and "personName".

Finally, when a by-constructor conversion is being generated, there may be multiple constructors which may be satisfied (ie. all of their constructor arguments may be provided with values from the source object's properties). In this case, a decision will need to be made as to which constructor to use. For this, there is a Constructor Prioritiser. Each may-be-satisfied-constructor is represented by the fully-generated converter that would use that constructor. The prioritiser must then pick one to be used as the converter that should be used for that translation.

The only Constructor Prioritiser implementation that I currently have is an ArgsLengthTypeConverterPrioritiser. This simply picks the constructor which has the most arguments, the thinking being that this must be the constructor that uses the most data from the source type and so will result in the best-populated destination instance possible.

However, if there are two constructors, one with four compulsory arguments and one with five arguments total, but two of them optional, then the five-argument constructor may no longer be the best bet. If a conversion is available that explicitly populates those five values with data from the source object, then this is probably still the best match. But if the only conversion that uses that five-argument constructor is actually relying on the default values for those two optional arguments then it's only actually populating three constructor arguments from the source data, so surely the four-argument constructor is better!

A quick(ish) explanation of how I introduced optional constructor arguments

So I have a CompilableTypeConverterByConstructorFactory. This has a method Get<TSource, TDest>() which will try return an ICompilableTypeConverter<TSource, TDest> that maps from TSource to TDest. If it can't create such a type converter then it will throw an exception.

The particular implementation of ICompilableTypeConverter<TSource, TDest> returned from this class will be a CompilableTypeConverterByConstructor<TSource, TDest>.

This class previously required a ConstructorInfo and a set of Property Getters for each argument in that constructor. The factory's job was to select the best ConstructorInfo and provide those Property Getters from the Property Getter Factories that it had access to. The constructor of the CompilableTypeConverterByConstructor<TSource, TDest> class would do some validation to ensure that the number of Property Getters matched the number of constructor arguments for the specified constructor, and that the types returned by the Property Getters matched the constructor's arguments types.

The change I made was for the CompilableTypeConverterByConstructor<TSource, TDest> to also take a ICompilableConstructorDefaultValuePropertyGetter set - Property Getters which are associated with a particular constructor argument which has a default value, and which just return this default value when a value is requested.

These Default Value Property Getters would only be specified by the Type Converter Factory if there was no Property Getter that could otherwise provide that constructor argument with a value - if it's possible to get data from the source object for a constructor argument then there's no point using the argument's default value!

The benefit of providing two distinct sets of Property Getters (those relying upon default values and those not) to the CompilableTypeConverterByConstructor<TSource, TDest> is that it was possible to add another public property to it; the NumberOfConstructorArgumentsMatchedWithNonDefaultValues (this is the total number of arguments that the target constructor has minus the number of Default Value Property Getters). And the benefit of this is that it allows for a Constructor Prioritiser to consider the number of constructor arguments that were populated with data from the source object, as opposed to the total number of constructor arguments fulfilled, regardless of how many actually had to fall back on to using default values. Which addresses the problem I outlined in the section above.

Code updates

While I was making these changes and experimenting with various scenarios (trying to re-familiarise myself with exactly how everything worked) I found it interesting to note how some I've changed some coding conventions over the years. Particularly, I disliked a method on the ITypeConverterFactory interface -

/// <summary>
/// This will return null if unable to generate the specified converter
/// </summary>
ITypeConverter<TSource, TDest> Get<TSource, TDest>();

From some sides, this doesn't sound all that bad. And it's not uncommon to find code out there that does the same sort of thing; try to get the requested value and return null if unable to.

As a rule, though, I don't like this at all. I prefer to avoid nulls wherever humanly possible and explicitly indicate the possibility of their presence where they must crop up.

If a class exists where a property may be null since the data is not required for that particular structure, then I will prefix that property with "Optional". If a method may be expected to return null then I will prefix it with "TryTo". This isn't a perfect system by any means but it's a convention that I've found useful.

So I could change the Get method above to

/// <summary>
/// This will return null if unable to generate the specified converter
/// </summary>
ITypeConverter<TSource, TDest> TryToGet<TSource, TDest>();

if not being able to return the requested converter is not an error condition.

However, for the cases a converter could not be generated to perform the specified TSource -> TDest mapping, the caller has no additional information - all they have is a null! And I suspect that someone trying to get a converter by calling a "Get" method would indeed consider it an error condition if it didn't actually return a converter.

So I changed it to

/// <summary>
/// This will throw an exception if unable to generate the specified converter, it will never
/// return null
/// </summary>
ITypeConverter<TSource, TDest> Get<TSource, TDest>();

I then changed the Type Converter Factories to throw custom exceptions indicating what property could not be set on the target type or what constructor arguments could not be mapped. Changing the contract so that it is considered an error condition when a mapping could not be created resulted in more information being available to the caller, more useful and important information.

Static convenience wrapper

Since I felt like I was cooking on gas at this point, I thought I'd address another problem with this project; trying to use this code for the first time (if you'd just cloned the project, for example) is difficult! I've got a ReadMe file in the project that tells you how to initialise a converter factory and then generate types but it's quite a lot of work to do so!

In some of my other projects I've included "convenience wrappers" to do the fiddly work of initialising everything for the most common case so that the code is as easy as possible to get working with. For example, the CSSParser has the static Parser class, with its method "ParseCSS" and "ParseLESS" (with method signatures that will read from strings or from TextReaders). The CSSMinifier has the DefaultNonCachedLessCssLoaderFactory and EnhancedNonCachedLessCssLoaderFactory which can be initialised with only an ASP.Net "Server" reference. And, of course, AutoMapper is phenomenally easy to get going with since there is a static Mapper class with CreateMap and Map methods (amongst many others). So I thought that my Type Converter library would benefit from something similar!

It can't get much simpler than this:

Converter.CreateMap<MutablePersonDetails.RoleDetails, ImmutablePersonDetails.RoleDetails>();
var dest = Converter.Convert<MutablePersonDetails, ImmutablePersonDetails>(source);

The "source" object in this example is initialised with

var source = new MutablePersonDetails
{
  Name = "Henry",
  Roles = new List<MutablePersonDetails.RoleDetails>
  {
    new MutablePersonDetails.RoleDetails
    {
      Title = "Head Penguin Cleaner",
      ClearanceLevel = ClearanceLevelOptions.Maximum
    }
  }
};

(The actual classes for the source and destination types will be included later on for completion's sake).

The types MutablePersonDetails.RoleDetails and ImmutablePersonDetails.RoleDetails are considered "nested" as they are not the target of the primary mapping (which is from MutablePersonDetails to ImmutablePersonDetails). There are properties on the source and destination types which are sets of these RoleDetails nested types.

So first a mapping for the nested types is created. The Converter class is able to use this mapping to generate mappings between sets of these types; so creating a MutablePersonDetails.RoleDetails to ImmutablePersonDetails.RoleDetails mapping means that a List<MutablePersonDetails.RoleDetails> to IEnumerable<ImmutablePersonDetails.RoleDetails> becomes available as well.

The Convert call will implicitly try to create a suitable mapping if one is not already available, this is why no explicit call to CreateMap is required for MutablePersonDetails to ImmutablePersonDetails.

The mapping here was a "by-constructor" mapping (which is what I originally started this project for), it takes property values from the source object and uses them to populate constructor arguments on the destination type to create a new instance of it. But "by-property-setter" mappings are also supported, so we could also create a mapping in the opposite direction to that above:

Converter.CreateMap<ImmutablePersonDetails.RoleDetails, MutablePersonDetails.RoleDetails>();
var dest = Converter.Convert<ImmutablePersonDetails, MutablePersonDetails>(source);

The source and destination classes in the examples are as follow:

public class MutablePersonDetails
{
  public string Name { get; set; }
  public List<RoleDetails> Roles { get; set; }

  public class RoleDetails
  {
    public string Title { get; set; }
    public ClearanceLevelOptions ClearanceLevel { get; set; }
  }
}

public class ImmutablePersonDetails
{
  public ImmutablePersonDetails(string name, IEnumerable<RoleDetails> roles)
  {
    if (string.IsNullOrWhiteSpace(name))
      throw new ArgumentException("Null/blank name specified");
    if (roles == null)
      throw new ArgumentNullException("roles");

    Name = name;

    Roles = roles.ToList().AsReadOnly();
    if (Roles.Any(role => role == null))
      throw new ArgumentException("Null reference encountered in roles set");
  }

  public string Name { get; private set; }
  public IEnumerable<RoleDetails> Roles { get; private set; }

  public class RoleDetails
  {
    public RoleDetails(string title, ClearanceLevelOptions clearanceLevel)
    {
      if (string.IsNullOrWhiteSpace(title))
        throw new ArgumentException("Null/blank title specified");
      if (!Enum.IsDefined(typeof(ClearanceLevelOptions), clearanceLevel))
        throw new ArgumentOutOfRangeException("clearanceLevel");

      Title = title;
      ClearanceLevel = clearanceLevel;
    }

    public string Title { get; private set; }
    public ClearanceLevelOptions ClearanceLevel { get; private set; }
  }
}

public enum ClearanceLevelOptions
{
  Regular,
  Maximum
}

Ignoring properties / using default constructor arguments

If the above classes were changed such that MutablePersonDetails.RoleDetails no longer has a ClearanceLevel property and the ImmutablePersonDetails.RoleDetails constructor's clearanceLevel argument is assigned a default value..

// Nested type of MutablePersonDetails
public class RoleDetails
{
  public string Title { get; set; }
}

// Nested type of ImmutablePersonDetails
public RoleDetails(
  string title,
  ClearanceLevelOptions clearanceLevel = ClearanceLevelOptions.Regular)

.. then the Converter will take this into account and still generate the expected mapping with:

Converter.CreateMap<MutablePersonDetails.RoleDetails, ImmutablePersonDetails.RoleDetails>();
var dest = Converter.Convert<MutablePersonDetails, ImmutablePersonDetails>(source);

If we reversed this such that the MutablePersonDetails.RoleDetails still has a ClearanceLevel property but the ImmutablePersonDetails.RoleDetails does not..

// Nested type of MutablePersonDetails
public class RoleDetails
{
  public string Title { get; set; }
  public ClearanceLevelOptions ClearanceLevel { get; set; }
}

// Nested type of ImmutablePersonDetails
public class RoleDetails
{
  public RoleDetails(string title)
  {
    if (string.IsNullOrWhiteSpace(title))
      throw new ArgumentException("Null/blank title specified");
    Title = title;
  }
  public string Title { get; private set; }
}

.. then the mapping will fail as the Converter will throw an exception if it can't map every property on the target when performing a by-property-setter conversion. Unless it is explicitly instructed to ignore the property -

Converter.BeginCreateMap<ImmutablePersonDetails.RoleDetails, MutablePersonDetails.RoleDetails>()
  .Ignore(
    r => r.ClearanceLevel
  )
  .Create();
var dest = Converter.Convert<ImmutablePersonDetails, MutablePersonDetails>(source);

The BeginCreateMap allows for exceptions to be made to the normal mapping process. The Create call (at the end of the BeginCreateMap, Ignore, Create chain) is important since the work to try to generate the converter will not be performed without that call (and all of the BeginCreateMap and any subsequent calls in that chain will be ignored without Create being called).

This is different to the AutoMapper approach since AutoMapper will take in information about how the mappings should be created but not use it until a conversion is required. This means that mappings can be specified in any order with AutoMapper; the following would be fine, for example -

Mapper.CreateMap<ImmutablePersonDetails, MutablePersonDetails>();
Mapper.CreateMap<ImmutablePersonDetails.RoleDetails, MutablePersonDetails.RoleDetails>();

AutoMapper doesn't mind the mappings for the nested type appearing after the mapping for the "containing type" since it won't try to use this information until it actually performs a conversion.

My Converter class, however, generates the converters when CreateMap (or Convert is called). So a mapping for the nested types must be specified before the containing type as a converter for the containing type can't be generated without knowing how to convert the nested types! While I think there are advantages to the flexibility of AutoMapper's approach (not having to worry about converter dependencies; not having to worry about the order in which mappings are specified) I also think there are advantages to my approach since an exception will be raised as soon as a mapping is requested that can not be created (along with information about what properties or constructor arguments could not be satisfied).

Another advantage of the converters being generated as the mappings are specified is that the Converter is keeping track of them and can provide a reference to any of them through a call to GetConverter. The converters are all immutable and if a converter is returned from the GetConverter method then no further changes to the Converter class may affect it. This is reassuring in that the converter may be used elsewhere without having to worry about the mutability of the static Converter class but it also has performance benefits; calls to the Converter's Convert method (and CreateMap and GetConverter methods) require cache lookups and locks. If you use a converter reference delivered by the GetConverter method then you don't need to worry about these lookups and locks. Which brings me neatly to..

The Compilable Type Converter's Performance

First off, the Compilable Type Converter isn't intended to compete feature-for-feature with AutoMapper. AutoMapper is a well-rounded library with all sorts of functionality that address all sorts of edge cases. For example, I only encountered the BeforeMap and AfterMap calls when looking into it more deeply to write this article! It also offers object model flattening and retrieval of data through Get methods rather than properties. I don't have any intention of supporting any of these, though I do intend to add some custom property mappings at some point. Something like

Converter.BeginCreateMap<ImmutablePersonDetails.RoleDetails, MutablePersonDetails.RoleDetails>()
  .Custom(
    dest => dest.ClearanceLevel,
    src => src.GetClearanceLevel()
  )
  .Create();

(Let's not forget the killer feature of my library - for me, at least - is that it performs the name matching magic from properties onto constructor arguments so that immutable classes can be instantiated by the mappers).

So anyway.. making performance comparisons between the two libraries is probably not all that productive. But since I've banged on about the Compilable Type Converter producing LINQ-Expression-compiled converters, I'm going to anyway! :)

We'll stick with the ImmutablePersonDetails to MutablePersonDetails mapping that was in the earlier examples.

There are two aspects that need considering - the startup time and the conversion time. If the Compilable Type Converter can perform conversions faster than AutoMapper but with a greater initialisation cost (which we'd expect since there is expensive LINQ Expression compilation going on) then there will have to be a certain number of conversion performed before we "break even" on the startup time. But after that, it should be all win!

So I've set up a test program that times the initialisation processes, repeated in a loop. At the end of each loop, the Reset method is called for both the Mapper and Converter (these calls are outside of the initialisation work that is timed, since we're not interested in the efficiency of the Reset methods). The last loop doesn't call Reset so that everything is ready for the next section of the program, where I time a conversion from an ImmutablePersonDetails instance to a new MutablePersonDetails (over and over again).

The init sections looks like this (basically the same as we've already seen above). We have to actually perform one mapping in the init code since AutoMapper postpones doing work until a mapping is actually requested, as I've already spoken about.

Mapper.CreateMap<ImmutablePersonDetails, MutablePersonDetails>();
Mapper.CreateMap<ImmutablePersonDetails.RoleDetails, MutablePersonDetails.RoleDetails>();
var destAutoMapperInitialise = Mapper.Map<ImmutablePersonDetails, MutablePersonDetails>(source);

Converter.CreateMap<ImmutablePersonDetails.RoleDetails, MutablePersonDetails.RoleDetails>();
var converter = Converter.GetConverter<ImmutablePersonDetails, MutablePersonDetails>();
var destCompilableTypeConverterInitialise = converter.Convert(source);

Then there are three operations that are individually timed in the "convert loop":

// Convert using AutoMapper
Mapper.Map<ImmutablePersonDetails, MutablePersonDetails>(source);

// Convert using the Compilable Type Converter, through the static convenience wrapper
Converter.Convert<ImmutablePersonDetails, MutablePersonDetails>(source);

// Convert using the Compilable Type Converter, using the converter reference from a GetConverter
// call in the init phase (this will be quicker as the cache lookups and locks in the convenience
// wrapper are not required)
converter.Convert(source);

I've run this whole process half a dozen times and got comparable results each time. The last time I ran it, the average time to initialise AutoMapper was 8ms and to initialise the Compilable Type Converter was 41ms (average taken over 100 repeated initialisations). The average time (taken over 100,000 loops) to perform the conversions was 310 ticks for AutoMapper, 46 ticks for the Compilable Type Converter via the convenience wrapper and 3 ticks for the Compilable Type Converter via the converter reference that was obtained as part of the initialisation work.

The standout result here is that the Compilable Type Converter was able to perform the conversion 100x faster.

100x faster!

That's good times! :)

However, this ignores the initialisation overhead. If you were only ever going to perform a single conversion then speed of the initialised converter is more than offset by the additional initialisation time required. However, if you're expecting to perform a lot of these conversions then this initialisation overhead should be easily offset. (My original aim for this work was to translate a WCF web service's public-facing mutable classes into their internal immutable counterparts, so there would be many conversions in that case). In the example above, it would take 349 conversions to break even if using the Converter wrapper and only 300 if using the converter reference directly.

Another note from the future: AutoMapper 5.0 (released July 2016) has some significant performance improvements such that now the performance tests above (which would need tweaking to compile with modern AutoMapper) are only between 2x and 2.5x faster with the CompilableTypeMapper than with AutoMapper. This is fantastic work from the AutoMapper authors! See AutoMapper 5.0 speed increases for more details.

Posted at 23:19

Tags:

Comments

5 March 2013

Supporting IDispatch through the COMInteraction wrapper

Some time ago, I wrote some code that would generate a wrapper to apply a given interface to any object using reflection. The target object would need to expose the properties and methods of the interface but may not implement the interface itself. This was intended to wrap some old WSC components that I was having to work with but is just as easy to demonstrate with .Net classes:

using System;
using COMInteraction.InterfaceApplication;
using COMInteraction.InterfaceApplication.ReadValueConverters;

namespace DemoApp
{
  class Program
  {
    static void Main(string[] args)
    {
      // Warning: This code will not compile against the current code since the interface has changed
      // since the example was written but read on to find out how it's changed!
      // - Ok, ok, just replace "new InterfaceApplierFactory" with "new ReflectionInterfaceApplierFactory"
      //   and "InterfaceApplierFactory.ComVisibilityOptions.Visible" with "ComVisibilityOptions.Visible"
      //   :)
      var interfaceApplierFactory = new InterfaceApplierFactory(
        "DynamicAssembly",
        InterfaceApplierFactory.ComVisibilityOptions.Visible
      );
      var interfaceApplier = interfaceApplierFactory.GenerateInterfaceApplier<IAmNamed>(
        new CachedReadValueConverter(interfaceApplierFactory)
      );

      var person = new Person() { Id = 1, Name = "Teddy" };
      var namedEntity = interfaceApplier.Apply(person);
    }
  }

  public interface IAmNamed
  {
    string Name { get; }
  }

  public class Person
  {
    public int Id { get; set; }
    public string Name { get; set; }
  }
}

The "namedEntity" reference implements IAmNamed and passes the calls through to the wrapped "Person" instance through reflection (noting that Person does not implement IAmNamed). Of course, this will cause exceptions if an instance is wrapped that doesn't expose the properties or methods of the interface when those are called.

I wrote about the development of this across a few posts: Dynamically applying interfaces to objects, Part 2 and Part 3 (with the code available at the COMInteraction project on Bitbucket).

And this worked fine for the purpose at hand. To be completely frank, I'm not entirely sure how it worked when calling into the WSC components that expose a COM interface since I'm surprised the reflection calls are able to hook into the methods! It felt a bit hacky..

But just recently I've gotten into some of the nitty gritty of IDispatch (see IDispatch (IWastedTimeOnThis but ILearntLots)) and thought maybe I could bring this information to bear on this project.

Supporting Reflection and IDispatch

The existing InterfaceApplierFactory class has been renamed to the ReflectionInterfaceApplierFactory and a new implementation of IInterfaceApplierFactory has been added: the IDispatchInterfaceApplierFactory. (Where do I come up with these catchy names?! :).

Where the existing (and now renamed) class generated IL to access the methods and properties through reflection, the new class generates IL to access the methods and properties using the code from my previous IDispatch post, handily wrapped up into a static IDispatchAccess class.

The code to do this wasn't too difficult to write, starting with the reflection-approach code as a template and writing the odd bit of test code to disassemble with ildasm if I found myself getting a bit lost.

While I was doing this, I changed the structure of the ReflectionInterfaceApplierFactory slightly - I had left a comment in the code explaining that properties would be defined for the type generated by the IL with getter and setter methods attached to it but that these methods would then be overwritten since the code that enumerates methods for the implemented interface(s) picks up "get_" and "set_" methods for each property. The comment goes on to say that this doesn't appear to have any negative effect and so hasn't been addressed. But with this revision I've gone a step further and removed the code the generates the properties entirely, relying solely on the "get_" and "set_" methods that are found in the interface methods as this seems to work without difficulty too! Even indexed properties continue to work as they get the methods "get_Item" and "set_Item" - if you have a class with an indexed property you may not also have a method named "Item" as you'll get a compilation error:

"The type 'Whatever' already contains a definition for 'Item'

I'm not 100% confident at this point that what I've done here is correct and whether or not I'm relying on some conventions that may not be guaranteed in the future. But I've just received a copy of "CLR via C#" in the post so maybe I'll get a better idea of what's going on and amend this in the future if required!

The types generated by the IDispatchInterfaceApplierFactory will not work if properties are not explicitly defined (and so IL is emitted to do this properly in this case).

Choosing Reflection, IDispatch (or neither)

Another new class is the CombinedInterfaceApplierFactory which is intended to take the decision-making out of the use of reflection / IDispatch. It will generate an IInterfaceApplier whose Apply method will apply an IDispatch wrapper if the object-to-wrap's type's IsCOMObject property is true. Otherwise it will use reflection. Actually, it performs a check before this to ensure that the specified object-to-wrap doesn't already implement the required interface - in which case it returns it straight back! (This was useful in some scenarios I was testing out and also makes sense; if the type doesn't require any manipulation then don't perform any).

Not being Lazy

I realised, going back to this project, that I'd got over-excited when I discovered the Lazy<T> class in .Net 4 and used it when there was some work in the code that I wanted to defer until it was definitely required. But this, of course, meant that the code had a dependency on .Net 4! Since I imagine that this could be useful in place of the "dynamic" keyword in some cases, I figured it would make sense to try to remove this dependency. (My over-excitement is probably visible when I wrote about it at Check, check it out).

I was using it with the "threadSafe" option set to true so that the work would only be executed once (at most). This is a fairly straight forward implementation of the double-checked locking pattern (if there is such a thing! :) with a twist that if the work threw an exception then that exception should be thrown for not only the call on the thread that actually performed the work but also subsequent calls:

using System;

namespace COMInteraction.Misc
{
  /// <summary>
  /// This is similar to the .Net 4's Lazy class with the isThreadSafe argument set to true
  /// </summary>
  public class DelayedExecutor<T> where T : class
  {
    private readonly Func<T> _work;
    private readonly object _lock;
    private volatile Result _result;
    public DelayedExecutor(Func<T> work)
    {
      if (work == null)
        throw new ArgumentNullException("work");

      _work = work;
      _lock = new object();
      _result = null;
    }

    public T Value
    {
      get
      {
        if (_result == null)
        {
          lock (_lock)
          {
            if (_result == null)
            {
              try
              {
                _result = Result.Success(_work());
              }
              catch(Exception e)
              {
                _result = Result.Failure(e);
              }
            }
          }
        }
        if (_result.Error != null)
          throw _result.Error;
        return _result.Value;
      }
    }

    private class Result
    {
      public static Result Success(T value)
      {
        return new Result(value, null);
      }
      public static Result Failure(Exception error)
      {
        if (error == null)
          throw new ArgumentNullException("error");
        return new Result(null, error);
      }
      private Result(T value, Exception error)
      {
        Value = value;
        Error = error;
      }

      public T Value { get; private set; }

      public Exception Error { get; private set; }
    }
  }
}

Conclusion

So finally, the project works with .Net 3.5 and can be used with only the following lines:

var interfaceApplierFactory = new CombinedInterfaceApplierFactory(
  new ReflectionInterfaceApplierFactory("DynamicAssembly", ComVisibilityOptions.Visible),
  new IDispatchInterfaceApplierFactory("DynamicAssembly", ComVisibilityOptions.Visible)
);
var interfaceApplier = interfaceApplierFactory.GenerateInterfaceApplier<IWhatever>(
  new CachedReadValueConverter(interfaceApplierFactory)
);
var wrappedInstance = interfaceApplier.Apply(obj);

In real use, you would want to share generated Interface Appliers rather than creating them each time a new instance needs wrapping up in an interface but how you decide to handle that is down to you!*

* (I've also added a CachingInterfaceApplierFactory class which can be handed to multiple places to easily enable the sharing of generated Interface Appliers - that may well be useful in preventing more dynamic types being generated than necessary).

Posted at 23:29

Tags:

Comments

5 March 2013

The Full Text Indexer - Automating Index Generation

In the introductory Full Text Indexer post I showed how to build an Index Generator by defining "Content Retrievers" for each property of the source data type. I didn't think that, in itself, this was a huge amount of code to get started but it did have a generous spattering of potentially-cryptic class instantiations that implied a large assumed knowledge before you could use it.

With that in mind, I've added a project to the Full Text Indexer (Bitbucket) solution that can automate this step by applying a combination of reflection (to examine the source type) and default values for the various dependencies (eg. the string normaliser, token breaker, etc..).

This means that indexing data can now be as simple as:

var indexGenerator = (new AutomatedIndexGeneratorFactoryBuilder<Post, int>()).Get().Get();
var index = indexGenerator.Generate(posts.ToNonNullImmutableList());

where data is a set of Post instances (the ToNonNullImmutableList call is not required if the set is already a NonNullImmutableList<Post>).

public class Post
{
  public int Id { get; set; }
  public string Title { get; set; }
  public string Content { get; set; }
  public IEnumerable<Comment> Comments { get; set; }
}

public class Comment
{
  public string Author { get; set; }
  public string Content { get; set; }
}

The two "Get" calls are because the example uses an AutomatedIndexGeneratorFactoryBuilder which is able to instantiate an AutomatedIndexGeneratorFactory using a handful of defaults (explained below). The AutomatedIndexGeneratorFactory is the class that processes the object model to determine how to extract text data. Essentially it runs through the object graph and looks for text properties, working down through nested types or sets of nested types (like the IEnumerable<Comment> in the Post class above).

So an AutomatedIndexGeneratorFactory is returned from the first "Get" call and this returns an IIndexGenerator<Post, int> from the second "Get".

// This means we can straight away query data like this!
var results = index.GetMatches("potato");

(Note: Ignore the fact that I'm using mutable types for the source data here when I'm always banging on about immutability - it's just for brevity of example source code :)

Tweaking the defaults

This may be enough to get going - because once you have an IIndexGenerator you can start call GetMatches and retrieving search results straight away, and if your data changes then you can update the index reference with another call to

indexGenerator.Generate(posts.ToNonNullImmutableList());

But there are a few simple methods built in to adjust some of the common parameters - eg. to give greater weight to text matched in Post Titles I can specify:

var indexGenerator = (new AutomatedIndexGeneratorFactoryBuilder<Post, int>())
  .SetWeightMultiplier("DemoApp.Post", "Title", 5)
  .Get()
  .Get();

If, for some reason, I decide that the Author field of the Comment type shouldn't be included in the index I can specify:

var indexGenerator = (new AutomatedIndexGeneratorFactoryBuilder<Post, int>())
  .SetWeightMultiplier("DemoApp.Post.Title", 5)
  .Ignore("DemoApp.Comment.Author")
  .Get()
  .Get();

If I didn't want any comments content then I could ignore the Comments property of the Post object entirely:

var indexGenerator = (new AutomatedIndexGeneratorFactoryBuilder<Post, int>())
  .SetWeightMultiplier("DemoApp.Post.Title", 5)
  .Ignore("DemoApp.Post.Comments")
  .Get()
  .Get();

(There are overloads for SetWeightMultiplier and Ignore that take a PropertyInfo argument instead of the strings if that's more appropriate for the case in hand).

Explaining the defaults

The types that the AutomatedIndexGeneratorFactory requires are a Key Retriever, a Key Comparer, a String Normaliser, a Token Breaker, a Weighted Entry Combiner and a Token Weight Determiner.

The first is the most simple - it needs a way to extract a Key for each source data instance. In this example, that's the int "Id" field. We have to specify the type of the source data (Post) and type of Key (int) in the generic type parameters when instantiating the AutomatedIndexGeneratorFactoryBuilder. The default behaviour is to look for properties named "Key" or "Id" on the data type, whose property type is assignable to the type of the key. So in this example, it just grabs the "Id" field from each Post. If alternate behaviour was required then the SetKeyRetriever method may be called on the factory builder to explicitly define a Func<TSource, TKey> to do the job.

The default Key Comparer uses the DefaultEqualityComparer<TKey> class, which just checks for equality using the Equals class of TKey. If this needs overriding for any reason, then the SetKeyComparer method will take an IEqualityComparer<TKey> to do the job.

The String Normaliser used is the EnglishPluralityStringNormaliser, wrapping a DefaultStringNormaliser. I've written about these in detail before (see The Full Text Indexer - Token Breaker and String Normaliser variations). The gist is that punctuation, accented characters, character casing and pluralisation are all flattened so that common expected matches can be made. If this isn't desirable, there's a SetStringNormaliser method that takes an IStringNormaliser. There's a pattern developing here! :)

The Token Breaker dissects text content into individual tokens (normally individual words). The default will break on any whitespace, brackets (round, triangular, square or curly) and other punctuation that tends to define word breaks such as commas, colons, full stops, exclamation marks, etc.. (but not apostrophes, for example, which mightn't mark word breaks). There's a SetTokenBreaker which takes an ITokenBreak reference if you want it.

The Weighted Entry Combiner describes the calculation for combining match weight when multiple tokens for the same Key are found. If, for example, I have the word "article" once in the Title of a Post (with weight multiplier 5 for Title, as in the examples above) and the same word twice in the Content, then how should these be combined into the final match weight for that Post when "article" is searched for? Should it be the greatest value (5)? Should it be the sum of all of the weights (5 + 1 + 1 = 7)? The Weighted Entry Combiner takes a set of match weights and must return the final combined value. The default is to sum them together, but there's always the SetWeightedEntryCombiner method if you disagree!

Nearly there.. the Token Weight Determiner specifies what weight each token that is extracted from the text content should be given. By default, tokens are given a weight of 1 for each match unless they are from a property to ignore (in which they are skipped) or they are from a property that was specified by the SetWeightCombiner method, in which case they will take the value provided there. Any English stop words (common and generally irrelevant words such as "a", "an" and "the") have their weights divided by 100 (so they're not removed entirely, but matches against them count much less than matches for anything else). This entire process can be replaced by calling SetTokenWeightDeterminer with an alternate implementation (the property that the data has been extracted from will be provided so different behaviour per-source-property can be supported, if required).

Phew!

Well done if you got drawn in with the introductory this-will-make-it-really-easy promise and then actually got through the detail as well! :)

I probably went deeper off into a tangent on the details than I really needed to for this post. But if you're somehow desperate for more then I compiled my previous posts on this topic into a Full Text Indexer Round-up where there's plenty more to be found!

Posted at 00:01

Tags:

Comments

26 February 2012

The artist previously known as the AutoMapper-By-Constructor

I've had a series of posts that was initiated by a desire to integrate AutoMapper more easily with classes that are instantiated with so-called "verbose constructors"..

.. that ended up going on somewhat of a tangent and enabled the generation of compilable converters (using LINQ Expressions) that didn't utilise AutoMapper for the majority of simple cases.

While the original intention of the project was to handle the conversion to these "verbose constructor"-based types, it struck me a few days ago that it shouldn't be much work to put together a class similar to the CompilableTypeConverterByConstructor that instead instantiates a type with a parameter-less constructor and sets the data through property-setters rather than by converter. The concept that started this all off in my head was a service that exposed xml-serialisable objects at the boundary but used "always-valid" internal representations (ie. immutable data where all values were specified and validated by constructor) - I wanted a way to convert to internal types. But with this property-setting approach the code could transform both ways.

(Just a quick side-node that for transformations to data-set-by-property types, AutoMapper is actually a much more full-featured package but for what I had in mind the simple name-matching in my project coupled with the significantly improved performance from the compiled converters was a better fit).

I still find LINQ Expressions hard to write

I envisaged something along the lines of a new class

public class CompilableTypeConverterByPropertySetting<TSource, TDest>
    : ICompilableTypeConverter<TSource, TDest> where TDest : new()
{
    public CompilableTypeConverterByPropertySetting(
        IEnumerable<ICompilablePropertyGetter> propertyGetters,
        IEnumerable<PropertyInfo> propertiesToSet)
    {
        // Do constructor work..

where the number of propertyGetters would match the number of propertiesToSet. I won't go back over the ICompilableTypeConverter since it's not that important right this second but the property getters are:

public interface ICompilablePropertyGetter : IPropertyGetter
{
    /// <summary>
    /// This must return a Linq Expression that retrieves the value from SrcType.Property as
    /// TargetType - the specified "param" Expression must have a type that is assignable to
    /// SrcType.
    /// </summary>
    Expression GetPropertyGetterExpression(Expression param);
}

public interface IPropertyGetter
{
    /// <summary>
    /// The type whose property is being accessed
    /// </summary>
    Type SrcType { get; }

    /// <summary>
    /// The property on the source type whose value is to be retrieved
    /// </summary>
    PropertyInfo Property { get; }

    /// <summary>
    /// The type that the property value should be converted to and returned as
    /// </summary>
    Type TargetType { get; }

    /// <summary>
    /// Try to retrieve the value of the specified Property from the specified object (which
    /// must be of type SrcType) - this will throw an exception for null or if retrieval fails
    /// </summary>
    object GetValue(object src);
}

So this should be easy! All I need is to create LINQ Expressions that can take a ParameterExpression of type TSource, use it to instantiate a new TDest and set each of the properties that I already have. And I've already got Expressions to retrieve the data from the TSource instance for each of the properties!

private Func<TSource, TDest> GenerateCompiledConverter()
{
    // Declare an expression to represent the src parameter
    var src = Expression.Parameter(typeof(TSource), "src");

    // Declare a local variable that will be used within the Expression block to have a new
    // instance assigned to it and properties set
    var dest = Expression.Parameter(typeof(TDest));

    // Build up a list of Expressions that:
    // 1. Instantiate a new TDest instance
    var newInstanceGenerationExpressions = new List<Expression>
    {
        Expression.Assign(
            dest,
            Expression.New(typeof(TDest).GetConstructor(new Type[0]))
        )
    };

    // 2 Set properties on the new instance
    for (var index = 0; index < _propertiesToSet.Count; index++)
    {
        newInstanceGenerationExpressions.Add(
            Expression.Call(
                dest,
                _propertiesToSet[index].GetSetMethod(),
                _propertyGetters[index].GetPropertyGetterExpression(src)
            )
        );
    }

    // 3. Return the reference
    newInstanceGenerationExpressions.Add(
        dest
    );

    // Return compiled expression that instantiates a new object by retrieving properties
    // from the source and passing as constructor arguments
    return Expression.Lambda<Func<TSource, TDest>>(
        Expression.Block(
            new[] { dest },
            newInstanceGenerationExpressions
        ),
        src
    ).Compile();
}

(Take it as read that _propertiesToSet and _propertyGetters are PropertyInfo[] and ICompilablePropertyGetter[] that are validated and set as class-scoped members by the constructor).

And indeed it does look easy! And I'm kinda wondering what all the fuss was about, but it took me a fair bit of tinkering and reasoning to get here since the LINQ Expression tutorials and examples just aren't that easy to track down! And it's not like you can easily take apart arbitrary example code like when dealing with IL (see the IL Disassembler mention in Dynamically applying interfaces to objects).

But I got there in the end! The only slightly odd thing is that the last expression has to be the ParameterExpression "dest" that we've constructed, otherwise the block won't return anything - it just returns the result of the last expression.

Ok. I've actually lied. That isn't quite all of it. As an ICompilableTypeConverter, the CompilableTypeConverterByPropertySetting should be able to handle null values so that the CompilableTypeConverterPropertyGetter class can take any ICompilableTypeConverter reference and use it to retrieve and convert property values.. even when they're null. So the last section becomes:

    // Return compiled expression that instantiates a new object by retrieving properties
    // from the source and passing as constructor arguments
    return Expression.Lambda<Func<TSource, TDest>>(

        Expression.Condition
            Expression.Equal(
                src,
                Expression.Constant(null)
            ),
            Expression.Constant(default(TDest), typeof(TDest)),
            Expression.Block(
                new[] { dest },
                newInstanceGenerationExpressions
            )
        ),

        src

    ).Compile();

.. so that it will return the default value to TDest (null unless TDest is a ValueType) if the TSource value is null.

Wrapping in a Factory

As with the similar CompilableTypeConverterByConstructor class there's a factory class which will examine given TSource and TDest types and try to generate a CompilableTypeConverterByPropertySetting<TSource, TDest> instance based on the ICompilablePropertyGetter set it has (and the INameMatcher for matching source and destination properties).

I've also updated the ExtendableCompilableTypeConverterFactory (see The Less-Effort Extendable LINQ-compilable Mappers) such that it is more generic and doesn't insist on being based around CompilableTypeConverterByConstructorFactory. There is now a static helper class to instantiate an ExtendableCompilableTypeConverterFactory instance based upon whether the target type is to have its data set by-constructor or by-property-setting since the changes to ExtendableCompilableTypeConverterFactory have made it very abstract!

Splitting the AutoMapper dependency

Since the majority of work in this solution no longer requires AutoMapper, I've broken out a separate project "AutoMapperIntegration" which houses the AutoMapperEnabledPropertyGetter and AutoMapperEnabledPropertyGetterFactory classes so now the main project has no AutoMapper reference. My original intention was improve how AutoMapper worked with by-constructor conversions and this functionality is still available - without taking advantage of the compiled converters - by referencing the main project along with AutoMapperIntegration (and so the example in Teaching AutoMapper about "verbose constructors" is still applicable).

And so I've renamed the solution itself to...

The Compilable Type Converter!

Yeah, yeah, not too imaginative a title, I will admit! :)

I've actually moved my code over to BitBucket (see upcoming post!) from GitHub, so the code that I've been talking about can now be found at:

https://bitbucket.org/DanRoberts/compilabletypeconverter

An apology

This has been a particularly dry and largely self-involved post but if the Compilable Type Converter sounds like it might be useful to you, check out that BitBucket link and there's an introduction on the Overview page which jumps straight into example code.

Examples

To demonstrate the generation of a converter from a generic SourceType class to one that is based upon verbose constructors:

// Prepare a converter factory using the base types (AssignableType and
// EnumConversion property getter factories)
var nameMatcher = new CaseInsensitiveSkipUnderscoreNameMatcher();
var converterFactory = ExtendableCompilableTypeConverterFactoryHelpers.GenerateConstructorBasedFactory(
    nameMatcher,
    new ArgsLengthTypeConverterPrioritiserFactory(),
    new ICompilablePropertyGetterFactory[]
    {
        new CompilableAssignableTypesPropertyGetterFactory(nameMatcher),
        new CompilableEnumConversionPropertyGetterFactory(nameMatcher)
    }
);

// Extend the converter to handle SourceType.Sub1 to ConstructorDestType.Sub1 and
// IEnumerable<SourceType.Sub1> to IEnumerable<ConstructorDestType.Sub1>
// - This will raise an exception if unable to create the mapping
converterFactory = converterFactory.CreateMap<SourceType.Sub1, ConstructorDestType.Sub1>();

// This will enable the creation of a converter for SourceType to ConstructorDestType
// - This will return null if unable to generate an appropriate converter
var converter = converterFactory.Get<SourceType, ConstructorDestType>();
if (converter == null)
    throw new Exception("Unable to obtain a converter");

var result = converter.Convert(new SourceType()
{
    Value = new SourceType.Sub1() { Name = "Bo1" },
    ValueList = new[]
    {
        new SourceType.Sub1() { Name = "Bo2" },
        null,
        new SourceType.Sub1() { Name = "Bo3" }
    },
    ValueEnum = SourceType.Sub2.EnumValue2
});

public class SourceType
{
    public Sub1 Value { get; set; }
    public IEnumerable<Sub1> ValueList { get; set; }
    public Sub2 ValueEnum { get; set; }
    public class Sub1
    {
        public string Name { get; set; }
    }
    public enum Sub2
    {
        EnumValue1,
        EnumValue2,
        EnumValue3,
        EnumValue4,
        EnumValue5,
        EnumValue6,
        EnumValue7,
        EnumValue8
    }
}

public class ConstructorDestType
{
    public ConstructorDestType(Sub1 value, IEnumerable<Sub1> valueList, Sub2 valueEnum)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        if (valueList == null)
            throw new ArgumentNullException("valueList");
        if (!Enum.IsDefined(typeof(Sub2), valueEnum))
            throw new ArgumentOutOfRangeException("valueEnum");
        Value = value;
        ValueList = valueList;
        ValueEnum = valueEnum;
    }
    public Sub1 Value { get; private set; }
    public IEnumerable<Sub1> ValueList { get; private set; }
    public Sub2 ValueEnum { get; private set; }
    public class Sub1
    {
        public Sub1(string name)
        {
            name = (name ?? "").Trim();
            if (name == "")
                throw new ArgumentException("Null/empty name specified");
            Name = name;
        }
        public string Name { get; private set; }
    }
    public enum Sub2 : uint
    {
        EnumValue1 = 99,
        EnumValue2 = 100,
        EnumValue3 = 101,
        EnumValue4 = 102,
        EnumValue5 = 103,
        enumValue_6 = 104,
        EnumValue7 = 105
    }
}

.. and the equivalent where the destination types are based upon property-setting:

// Prepare a converter factory using the base types (AssignableType and EnumConversion property
// getter factories)
var nameMatcher = new CaseInsensitiveSkipUnderscoreNameMatcher();
var converterFactory = ExtendableCompilableTypeConverterFactoryHelpers.GeneratePropertySetterBasedFactory(
    nameMatcher,
    CompilableTypeConverterByPropertySettingFactory.PropertySettingTypeOptions.MatchAsManyAsPossible,
    new ICompilablePropertyGetterFactory[]
    {
        new CompilableAssignableTypesPropertyGetterFactory(nameMatcher),
        new CompilableEnumConversionPropertyGetterFactory(nameMatcher)
    }
);

// Extend the converter to handle SourceType.Sub1 to ConstructorDestType.Sub1 and
// IEnumerable<SourceType.Sub1> to IEnumerable<ConstructorDestType.Sub1>
// - This will raise an exception if unable to create the mapping
converterFactory = converterFactory.CreateMap<SourceType.Sub1, PropertySettingDestType.Sub1>();

// This will enable the creation of a converter for SourceType to ConstructorDestType
// - This will return null if unable to generate an appropriate converter
var converter = converterFactory.Get<SourceType, PropertySettingDestType>();
if (converter == null)
    throw new Exception("Unable to obtain a converter");

var result = converter.Convert(new SourceType()
{
    Value = new SourceType.Sub1() { Name = "Bo1" },
    ValueList = new[]
    {
        new SourceType.Sub1() { Name = "Bo2" },
        null,
        new SourceType.Sub1() { Name = "Bo3" }
    },
    ValueEnum = SourceType.Sub2.EnumValue2
});

public class SourceType
{
    public Sub1 Value { get; set; }
    public IEnumerable<Sub1> ValueList { get; set; }
    public Sub2 ValueEnum { get; set; }
    public class Sub1
    {
        public string Name { get; set; }
    }
    public enum Sub2
    {
        EnumValue1,
        EnumValue2,
        EnumValue3,
        EnumValue4,
        EnumValue5,
        EnumValue6,
        EnumValue7,
        EnumValue8
    }
}

public class PropertySettingDestType
{
    public Sub1 Value { get; set; }
    public IEnumerable<Sub1> ValueList { get; set; }
    public Sub2 ValueEnum { get; set; }
    public class Sub1
    {
        public string Name { get; set; }
    }
    public enum Sub2 : uint
    {
        EnumValue1 = 99,
        EnumValue2 = 100,
        EnumValue3 = 101,
        EnumValue4 = 102,
        EnumValue5 = 103,
        enumValue_6 = 104,
        EnumValue7 = 105
    }
}

Posted at 21:39

Tags:

Comments

3 January 2012

The Less-Effort Extendable LINQ-compilable Mappers

The last post almost finished off something I originally started back last April and enabled the creation of Compilable Type Converters which take properties from a source type and feed them in as constructor arguments on a destination type.

The only issue I had is that the final code to set up conversions was a bit verbose. To create a Converter from SourceEmployee to DestEmployee -

public class SourceEmployee
{
    public string Name { get; set; }
    public SourceRole Role { get; set; }
}

public class SourceRole
{
    public string Description { get; set; }
}

public class DestEmployee
{
    public DestEmployee(string name, DestRole role)
    {
        Name = name;
        Role = role;
    }
    public string Name { get; private set; }
    public DestRole Role { get; private set; }
}

public class DestRole
{
    public DestRole(string description)
    {
        Description = description;
    }
    public string Description { get; private set; }
}

the following code was required:

var nameMatcher = new CaseInsensitiveSkipUnderscoreNameMatcher();

var roleConverterFactory = new CompilableTypeConverterByConstructorFactory(
    new ArgsLengthTypeConverterPrioritiserFactory(),
    new CombinedCompilablePropertyGetterFactory(
        new ICompilablePropertyGetterFactory[]
        {
            new CompilableAssignableTypesPropertyGetterFactory(nameMatcher),
            new CompilableEnumConversionPropertyGetterFactory(nameMatcher)
        }
    )
);

var employeeConverterFactory = new CompilableTypeConverterByConstructorFactory(
    new ArgsLengthTypeConverterPrioritiserFactory(),
    new CombinedCompilablePropertyGetterFactory(
        new ICompilablePropertyGetterFactory[]
        {
            new CompilableAssignableTypesPropertyGetterFactory(nameMatcher),
            new CompilableEnumConversionPropertyGetterFactory(nameMatcher),
            new CompilableTypeConverterPropertyGetterFactory<SourceRole, DestRole>(
                nameMatcher,
                roleConverterFactory.Get<SourceRole, DestRole>()
            )
        }
    )
);

var employeeConverter = employeeConverterFactory.Get<SourceEmployee, DestEmployee>();

For more complicated type graphs this could quickly get tiring! What I really wanted to do was this:

var nameMatcher = new CaseInsensitiveSkipUnderscoreNameMatcher();
var converterFactory = new ExtendableCompilableTypeConverterFactory(
    nameMatcher,
    new ArgsLengthTypeConverterPrioritiserFactory(),
    new ICompilablePropertyGetterFactory[]
    {
        new CompilableAssignableTypesPropertyGetterFactory(nameMatcher),
        new CompilableEnumConversionPropertyGetterFactory(nameMatcher)
    }
);
converterFactory = converterFactory.CreateMap<SourceRole, DestRole>();
var converter = converterFactory.Get<SourceEmployee, DestEmployee>();

The ExtendableCompilableTypeConverterFactory

This class basically wraps up the duplication seen above and returns a new ExtendableCompilableTypeConverterFactory instance each time that CreateMap is successfully called, the new instance having a Compilable Property Getter than can support that mapping. If the CreateMap calls was not successful then an exception will be raised - this will be case if there is no constructor on the destination type whose arguments can all be satisfied by properties on the source type (this also covers cases where additional mappings are required for referenced types). This exception is equivalent to the AutoMapperMappingException that AutoMapper throws in similar circumstances.

I'm just going to jump right in with this - if you've been reading this far then this will hold no challenges or surprises.

public class ExtendableCompilableTypeConverterFactory : ICompilableTypeConverterFactory
{
    private INameMatcher _nameMatcher;
    private ITypeConverterPrioritiserFactory _converterPrioritiser;
    private List<ICompilablePropertyGetterFactory> _basePropertyGetterFactories;
    private Lazy<ICompilableTypeConverterFactory> _typeConverterFactory;
    public ExtendableCompilableTypeConverterFactory(
        INameMatcher nameMatcher,
        ITypeConverterPrioritiserFactory converterPrioritiser,
        IEnumerable<ICompilablePropertyGetterFactory> basePropertyGetterFactories)
    {
        if (nameMatcher == null)
            throw new ArgumentNullException("nameMatcher");
        if (converterPrioritiser == null)
            throw new ArgumentNullException("converterPrioritiser");
        if (basePropertyGetterFactories == null)
            throw new ArgumentNullException("basePropertyGetterFactories");

        var basePropertyGetterFactoryList = new List<ICompilablePropertyGetterFactory>();
        foreach (var basePropertyGetterFactory in basePropertyGetterFactories)
        {
            if (basePropertyGetterFactory == null)
                throw new ArgumentException("Null entry encountered in basePropertyGetterFactories");
            basePropertyGetterFactoryList.Add(basePropertyGetterFactory);
        }

        _nameMatcher = nameMatcher;
        _converterPrioritiser = converterPrioritiser;
        _basePropertyGetterFactories = basePropertyGetterFactoryList;
        _typeConverterFactory = new Lazy<ICompilableTypeConverterFactory>(
            getConverterFactory,
            true
        );
    }

    private ICompilableTypeConverterFactory getConverterFactory()
    {
        return new CompilableTypeConverterByConstructorFactory(
            _converterPrioritiser,
            new CombinedCompilablePropertyGetterFactory(_basePropertyGetterFactories)
        );
    }

    /// <summary>
    /// This will return null if a converter could not be generated
    /// </summary>
    public ICompilableTypeConverterByConstructor<TSource, TDest> Get<TSource, TDest>()
    {
        return _typeConverterFactory.Value.Get<TSource, TDest>();
    }

    ITypeConverter<TSource, TDest> ITypeConverterFactory.Get<TSource, TDest>()
    {
        return Get<TSource, TDest>();
    }

    /// <summary>
    /// This will throw an exception if unable to generate the requested mapping - it will
    /// never return null. If the successful, the returned converter factory will be able
    /// to convert instances of TSourceNew as well as IEnumerable / Lists of them.
    /// </summary>
    public ExtendableCompilableTypeConverterFactory CreateMap<TSourceNew, TDestNew>()
    {
        // Try to generate a converter for the requested mapping
        var converterNew = _typeConverterFactory.Value.Get<TSourceNew, TDestNew>();
        if (converterNew == null)
            throw new Exception("Unable to create mapping");
        return AddNewConverter<TSourceNew, TDestNew>(converterNew);
    }

    /// <summary>
    /// Generate a further extended converter factory that will be able to handle conversion
    /// of instances of TSourceNew as well as IEnumerable / Lists of them. This will never
    /// return null.
    /// </summary>
    public ExtendableCompilableTypeConverterFactory AddNewConverter<TSourceNew, TDestNew>(
        ICompilableTypeConverter<TSourceNew, TDestNew> converterNew)
    {
        if (converterNew == null)
            throw new ArgumentNullException("converterNew");

        // Create a property getter factory that retrieves and convert properties using this
        // converter and one that does the same for IEnumerable properties, where the
        // IEnumerables' elements are the types handled by the converter
        var extendedPropertyGetterFactories = new List<ICompilablePropertyGetterFactory>(
            _basePropertyGetterFactories
        );
        extendedPropertyGetterFactories.Add(
            new CompilableTypeConverterPropertyGetterFactory<TSourceNew, TDestNew>(
                _nameMatcher,
                converterNew
            )
        );
        extendedPropertyGetterFactories.Add(
            new ListCompilablePropertyGetterFactory<TSourceNew, TDestNew>(
                _nameMatcher,
                converterNew
            )
        );

        // Return a new ExtendableCompilableTypeConverterFactory that can make use of these
        // new property getter factories
        return new ExtendableCompilableTypeConverterFactory(
            _nameMatcher,
            _converterPrioritiser,
            extendedPropertyGetterFactories
        );
    }
}

Ok.. except one. I've sprung the ListCompilablePropertyGetterFactory. The ListCompilablePropertyGetter is similar to the CompilableTypeConverterPropertyGetter but will deal with properties and constructor arguments which are IEnumerable<SourceType> and IEnumerable<DestType>, resp.

This means that the ExtendableCompilableTypeConverterFactory setup code above would have worked if the SourceType and DestType were

public class SourceEmployee
{
    public string Name { get; set; }
    public SourceRole[] Role { get; set; }
}

public class DestEmployee
{
    public DestEmployee(string name, IEnumerable<DestRole> role)
    {
        Name = name;
        Role = role;
    }
    public string Name { get; private set; }
    public DestRole Role { get; private set; }
}

as the CreateMap would return a Converter Factory that could map SourceRole to DestRole and IEnumerable<SourceRole> to IEnumerable<DestRole>.

CreateMap vs AddNewConverter

The CreateMap method will try to generate a new Converter and build new Property Getter Factories using that by passing it to AddNewConverter. If you need to add any custom mapping mechanisms then AddNewConverter may be called with an ICompilableTypeConverter.

For example, if our types now looked like

public class SourceEmployee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public SourceRole[] Role { get; set; }
}

public class DestEmployee
{
    public DestEmployee(string id, string name, IEnumerable<DestRole> role)
    {
        Id = id;
        Name = name;
        Role = role;
    }
    public string Id { get; private set; }
    public string Name { get; private set; }
    public DestRole Role { get; private set; }
}

then we would need a way to translate int to string when the name matcher identifies the potential "Id" to "id" mapping. We could do that with AddNewConverter and a custom ICompilableTypeConverter implementation -

var nameMatcher = new CaseInsensitiveSkipUnderscoreNameMatcher();
var converterFactory = new ExtendableCompilableTypeConverterFactory(
    nameMatcher,
    new ArgsLengthTypeConverterPrioritiserFactory(),
    new ICompilablePropertyGetterFactory[]
    {
        new CompilableAssignableTypesPropertyGetterFactory(nameMatcher),
        new CompilableEnumConversionPropertyGetterFactory(nameMatcher)
    }
);
converterFactory = converterFactory.CreateMap<SourceRole, DestRole>();
converterFactory = converterFactory.AddNewConverter<int, string>(
    new CompilableIntToStringTypeConverter()
);
var converter = converterFactory.Get<SourceEmployee, DestEmployee>();

public class CompilableIntToStringTypeConverter : ICompilableTypeConverter<int, string>
{
    public string Convert(int src)
    {
        return src.ToString();
    }

    public Expression GetTypeConverterExpression(Expression param)
    {
        if (param == null)
            throw new ArgumentNullException("param");
        return Expression.Call(
            param,
            typeof(int).GetMethod("ToString", new Type[0])
        );
    }
}

See, I promised last time that splitting ICompilableTypeConverter away from ICompilableTypeConverterByConstructor at some point! :)

Signing off

This has all turned into a bit of a saga! The final code for all this can be found at

https://github.com/ProductiveRage/AutoMapper-By-Constructor-1/

I've not done loads of performance testing but the generated Converters have consistently been around 1.1 or 1.2 times as slow as hand-rolled code (ie. approximately the same), not including the work required to generate the Converters. Compared to AutoMapper, this is quite a win (which was what originally inspired me to go on this journey). But out of the box it doesn't support all the many configurations that AutoMapper does! My main use case was to map legacy WebService objects (with parameter-less constructors) onto internal objects (with verbose constructors) which is all done. But there's currently no way to map back.. I think that's something to worry about another day! :)

Posted at 21:52

Tags:

Comments

2 January 2012

Extendable LINQ-compilable Mappers

To pick up from where I left off in a previous post, I was trying to write something that could automatically generate LINQ Expressions that could translate from (for example) -

public class SourceEmployee
{
    public string Name { get; set; }
    public SourceRole Role { get; set; }
}

public class SourceRole
{
    public string Description { get; set; }
}

public class DestEmployee
{
    public DestEmployee(string name, DestRole role)
    {
        Name = name;
        Role = role;
    }
    public string Name { get; private set; }
    public DestRole Role { get; private set; }
}

public class DestRole
{
    public DestRole(string description)
    {
        Description = description;
    }
    public string Description { get; private set; }
}

by applying name matching logic between properties on the source types and constructor arguments on the destination types. Having this all performed by LINQ Expressions should allow the final conversion to be comparatively fast to hand-rolled code.

This was all kicked off initially since I was using AutoMapper for some work and wasn't happy with its approach to mapping to types that have to be initialised with verbose constructors (as opposed to a parameter-less constructor and then the setting of individual properties). This much was achieved and the solution can be found here -

https://github.com/ProductiveRage/AutoMapper-By-Constructor-1/tree/FirstImplementation.

But I wanted to see if I could improve the performance by removing AutoMapper from the equation and using LINQ Expressions.

A more detailed recap

Where we left the code as of

https://github.com/ProductiveRage/AutoMapper-By-Constructor-1/tree/LinqExpressionPropertyGetters

we had the class

public class CompilableTypeConverterByConstructor<TSource, TDest>
    : ITypeConverterByConstructor<TSource, TDest>
{
    // ..
    private Lazy<Func<TSource, TDest>> _converter;
    public CompilableTypeConverterByConstructor(
        IEnumerable<ICompilablePropertyGetter> propertyGetters,
        ConstructorInfo constructor)
    {
        // ..
        _converter = new Lazy<Func<TSource, TDest>>(generateCompiledConverter, true);
    }

    public ConstructorInfo Constructor
    {
        get
        {
            // ..
        }
    }

    public TDest Convert(TSource src)
    {
        if (src == null)
            throw new ArgumentNullException("src");

        return _converter.Value(src);
    }

    private Func<TSource, TDest> generateCompiledConverter()
    {
        var srcParameter = Expression.Parameter(typeof(TSource), "src");
        var constructorParameterExpressions = new List<Expression>();
        foreach (var constructorParameter in _constructor.GetParameters())
        {
            var index = constructorParameterExpressions.Count;
            constructorParameterExpressions.Add(
                _propertyGetters[index].GetPropertyGetterExpression(srcParameter)
            );
        }

        return Expression.Lambda<Func<TSource, TDest>>(
            Expression.New(
                _constructor,
                constructorParameterExpressions.ToArray()
            ),
            srcParameter
        ).Compile();
    }
}

public interface ITypeConverterByConstructor<TSource, TDest>
{
    ConstructorInfo Constructor { get; }
    TDest Convert(TSource src);
}

which took a set of "Compilable Property Getters" that matched the arguments for a specified ConstructorInfo

public interface ICompilablePropertyGetter : IPropertyGetter
{
    Expression GetPropertyGetterExpression(Expression param);
}

public interface IPropertyGetter
{
    Type SrcType { get; }
    PropertyInfo Property { get; }
    Type TargetType { get; }
    object GetValue(object src);
}

and generated an internal conversion using LINQ Expressions.

There were only two Compilable Property Getters - CompilableAssignableTypesPropertyGetter, which would work with property-to-constructor-arguments where no conversion was required (eg. the available property was a string array and the constructor argument was an IEnumerable<string>) and CompilableEnumConversionPropertyGetter, which mapped one enum to another using an INameMatcher implementation. (The enum mapping LINQ Expression is generated by first coming up with a set of mappings and then generating a LINQ Expression consisting of a set of nested "if" statements for each mapped enum value).

public class CompilableAssignableTypesPropertyGetter<TSourceObject, TPropertyAsRetrieved>
    : AbstractGenericCompilablePropertyGetter<TSourceObject, TPropertyAsRetrieved>
{
    private PropertyInfo _propertyInfo;
    public CompilableAssignableTypesPropertyGetter(PropertyInfo propertyInfo)
    {
        if (propertyInfo == null)
            throw new ArgumentNullException("propertyInfo");
        if (!propertyInfo.DeclaringType.Equals(typeof(TSourceObject)))
            throw new ArgumentException("Invalid propertyInfo - DeclaringType must match TSourceObject");

        _propertyInfo = propertyInfo;
    }

    public override PropertyInfo Property
    {
        get { return _propertyInfo; }
    }

    public override Expression GetPropertyGetterExpression(Expression param)
    {
        if (param == null)
            throw new ArgumentNullException("param");
        if (!typeof(TSourceObject).IsAssignableFrom(param.Type))
            throw new ArgumentException("param.Type must be assignable to typeparam TSourceObject");

        Expression getter = Expression.Property(
            param,
            _propertyInfo
        );

        var targetType = typeof(TPropertyAsRetrieved);
        if (!targetType.IsAssignableFrom(_propertyInfo.PropertyType))
            getter = Expression.Convert(getter, targetType);

        if (!targetType.IsValueType && _propertyInfo.PropertyType.IsValueType)
            getter = Expression.TypeAs(getter, typeof(object));

        return getter;
    }
}

public abstract class AbstractGenericCompilablePropertyGetter<TSourceObject, TPropertyAsRetrieved>
    : ICompilablePropertyGetter
{
    private Lazy<Func<TSourceObject, TPropertyAsRetrieved>> _getter;
    public AbstractGenericCompilablePropertyGetter()
    {
        _getter = new Lazy<Func<TSourceObject, TPropertyAsRetrieved>>(generateGetter, true);
    }

    public Type SrcType
    {
        get { return typeof(TSourceObject); }
    }

    public abstract PropertyInfo Property { get; }

    public Type TargetType
    {
        get { return typeof(TPropertyAsRetrieved); }
    }

    object IPropertyGetter.GetValue(object src)
    {
        if (src == null)
            throw new ArgumentNullException("src");
        if (!src.GetType().Equals(typeof(TSourceObject)))
            throw new ArgumentException("The type of src must match typeparam TSourceObject");
        return GetValue((TSourceObject)src);
    }

    public TPropertyAsRetrieved GetValue(TSourceObject src)
    {
        if (src == null)
            throw new ArgumentNullException("src");
        return _getter.Value(src);
    }

    public abstract Expression GetPropertyGetterExpression(Expression param);

    private Func<TSourceObject, TPropertyAsRetrieved> generateGetter()
    {
        var param = Expression.Parameter(typeof(TSourceObject), "src");
        return Expression.Lambda<Func<TSourceObject, TPropertyAsRetrieved>>(
            GetPropertyGetterExpression(param),
            param
        ).Compile();
    }
}

public interface ICompilablePropertyGetter : IPropertyGetter
{
    /// <summary>
    /// This Linq Expression will retrieves the value from SrcType.Property as TargetType,
    /// the specified "param" Expression must have a type that is assignable to SrcType.
    /// </summary>
    Expression GetPropertyGetterExpression(Expression param);
}

public interface IPropertyGetter
{
    /// <summary>
    /// This is the type whose property is being accessed
    /// </summary>
    Type SrcType { get; }

    /// <summary>
    /// This is the property on the source type whose value is to be retrieved
    /// </summary>
    PropertyInfo Property { get; }

    /// <summary>
    /// This is the type that the property value should be converted to and returned as
    /// </summary>
    Type TargetType { get; }

    /// <summary>
    /// Try to retrieve the value of the specified Property from the specified object
    /// (which must be of type SrcType)
    /// </summary>
    object GetValue(object src);
}

and to generate instances of these classes we had some factories (CompilableTypeConverterByConstructorFactory, CompilableAssignableTypesPropertyGetterFactory and CompilableEnumConversionPropertyGetterFactory). These would do the work of examining the properties and constructors of specified source and destination type pairs and determining the best constructor that could be satisfied (if any) with the Compilable Property Getters. The code in these factories is none too exciting.

The problem

If the mappings we want to generate are for very simple structures (in this case, "simple" means that all property-to-constructor-argument mappings are either directly assignable-to or are enum mappings) then everything's rosy - eg.

public class SourceEmployee
{
    public string Name { get; set; }
    public SourceRole Role { get; set; }
}

public enum SourceRole
{
    big_boss_man,
    worker_bee
}

public class DestEmployee
{
    public DestEmployee(string name, DestRole role)
    {
        Name = name;
        Role = role;
    }
    public string Name { get; private set; }
    public DestRole Role { get; private set; }
}

public enum DestRole
{
    BigBossMan,
    WorkerBee
}

(The enum mapping in this example would be handled by specifying a CaseInsensitiveSkipUnderscoreNameMatcher for the CompilableEnumConversionPropertyGetterFactory).

But the problem I opened with does not come under this "simple structure" umbrella as in that case SourceRole and DestRole are types for which we have no Compilable Property Getter! Oh noes!

The CompilableTypeConverterPropertyGetter

For inspiration, I go back to AutoMapper since it too can not magically handle nested types -

class Program
{
    static void Main(string[] args)
    {
        AutoMapper.Mapper.CreateMap<SourceTypeSub1, DestTypeSub1>();
        AutoMapper.Mapper.CreateMap<SourceType, DestType>();
        var dest = AutoMapper.Mapper.Map<SourceType, DestType>(
            new SourceType()
            {
                Value = new SourceTypeSub1() { Name = "N1" }
            }
        );
    }
}

public class SourceType
{
    public SourceTypeSub1 Value { get; set; }
}

public class SourceTypeSub1
{
    public string Name { get; set; }
}

public class DestType
{
    public DestTypeSub1 Value { get; set; }
}

public class DestTypeSub1
{
    public string Name { get; set; }
}

without the CreateMap call for SourceTypeSub1 to DestTypeSub1, the Map call from SourceType to DestType would fail with an AutoMapperMappingException.

Following the same tack, a way to create a new Compilable Property Getter from a CompilableTypeConverterByConstructor (which could then be used alongside the existing AssignableType and Enum Compilable Property Getters) should solve the problem. A plan!

Step one is going to be to expose a way to request the LINQ Expression that the CompilableTypeConverterByConstructor uses in its conversion. To address this we'll update CompilableTypeConverterByConstructor to implement a new interface ICompilableTypeConverterByConstructor which in turn implements ITypeConverterByConstructor (which is all that CompilableTypeConverterByConstructor implemented previously) -

public interface ICompilableTypeConverterByConstructor<TSource, TDest>
    : ICompilableTypeConverter<TSource, TDest>,
      ITypeConverterByConstructor<TSource, TDest> { }

public interface ICompilableTypeConverter<TSource, TDest>
    : ITypeConverter<TSource, TDest>
{
    /// <summary>
    /// This Linq Expression will generate a new TDest instance - the specified "param"
    /// Expression must have a type that is assignable to TSource
    /// </summary>
    Expression GetTypeConverterExpression(Expression param);
}

public interface ITypeConverterByConstructor<TSource, TDest> : ITypeConverter<TSource, TDest>
{
    ConstructorInfo Constructor { get; }
}

public interface ITypeConverter<TSource, TDest>
{
    TDest Convert(TSource src);
}

The ITypeConverterByConstructor has now become a specialised form of ITypeConverter (with corresponding Compilable variants) which inherently makes sense but will also be useful where we're going (but let's not get ahead of ourselves, that's coming up later in the post).

More importantly is the ICompilableTypeConverter GetTypeConverterExpression method which allows the creation of a Compilable Property Getter that is based upon a conversion that we want to feed back into the mapper -

public class CompilableTypeConverterPropertyGetter<TSourceObject, TPropertyOnSource, TPropertyAsRetrieved>
    : AbstractGenericCompilablePropertyGetter<TSourceObject, TPropertyAsRetrieved>
{
    private PropertyInfo _propertyInfo;
    private ICompilableTypeConverter<TPropertyOnSource, TPropertyAsRetrieved> _compilableTypeConverter;
    public CompilableTypeConverterPropertyGetter(
        PropertyInfo propertyInfo,
        ICompilableTypeConverter<TPropertyOnSource, TPropertyAsRetrieved> compilableTypeConverter)
    {
        if (propertyInfo == null)
            throw new ArgumentNullException("propertyInfo");
        if (!propertyInfo.DeclaringType.Equals(typeof(TSourceObject)))
            throw new ArgumentException("Invalid propertyInfo - DeclaringType must match TSourceObject");
        if (!propertyInfo.PropertyType.Equals(typeof(TPropertyOnSource)))
            throw new ArgumentException("Invalid propertyInfo - PropertyType must match TPropertyOnSource");
        if (compilableTypeConverter == null)
            throw new ArgumentNullException("compilableTypeConverter");

        _propertyInfo = propertyInfo;
        _compilableTypeConverter = compilableTypeConverter;
    }

    public override PropertyInfo Property
    {
        get { return _propertyInfo; }
    }

    /// <summary>
    /// This Linq Expression will retrieves the value from SrcType.Property as TargetType,
    /// the specified "param" Expression must have a type that is assignable to SrcType.
    /// </summary>
    public override Expression GetPropertyGetterExpression(Expression param)
    {
        if (param == null)
            throw new ArgumentNullException("param");
        if (typeof(TSourceObject) != param.Type)
            throw new ArgumentException("param.NodeType must match typeparam TSourceObject");

        // Get property value (from object of type TSourceObject) without conversion (this
        // will be as type TPropertyOnSource)
        // - If value is null, return default TPropertyAsRetrieved (not applicable if a
        //   value type)
        // - Otherwise, pass through type converter (to translate from TPropertyOnSource
        //   to TPropertyAsRetrieved)
        var propertyValue = Expression.Property(param, _propertyInfo);
        var conversionExpression = _compilableTypeConverter.GetTypeConverterExpression(propertyValue);
        if (typeof(TPropertyOnSource).IsValueType)
            return conversionExpression;
        return Expression.Condition(
            Expression.Equal(
                propertyValue,
                Expression.Constant(null)
            ),
            Expression.Constant(default(TPropertyAsRetrieved), typeof(TPropertyAsRetrieved)),
            conversionExpression
        );
    }
}

A corresponding CompilableTypeConverterPropertyGetterFactory is straight-forward to write. Like the other Property Getter Factories, it doesn't do a huge amount - it will determine whether a named property can be retrieved from a specified type and converted into a specified type based upon name match rules and what kind of Property Getter that Factory can generate)

public class CompilableTypeConverterPropertyGetterFactory<TPropertyOnSource, TPropertyAsRetrieved>
    : ICompilablePropertyGetterFactory
{
    private INameMatcher _nameMatcher;
    private ICompilableTypeConverter<TPropertyOnSource, TPropertyAsRetrieved> _typeConverter;
    public CompilableTypeConverterPropertyGetterFactory(
        INameMatcher nameMatcher,
        ICompilableTypeConverter<TPropertyOnSource, TPropertyAsRetrieved> typeConverter)
    {
        if (nameMatcher == null)
            throw new ArgumentNullException("nameMatcher");
        if (typeConverter == null)
            throw new ArgumentNullException("typeConverter");

        _nameMatcher = nameMatcher;
        _typeConverter = typeConverter;
    }

    /// <summary>
    /// This will return null if unable to return an ICompilablePropertyGetter for the
    /// named property that will return a value as the requested type
    /// </summary>
    public ICompilablePropertyGetter Get(
        Type srcType,
        string propertyName,
        Type destPropertyType)
    {
        if (srcType == null)
            throw new ArgumentNullException("srcType");
        propertyName = (propertyName ?? "").Trim();
        if (propertyName == "")
            throw new ArgumentException("Null/empty propertyName specified");
        if (destPropertyType == null)
            throw new ArgumentNullException("destPropertyType");

        // If destination type does not match type converter's destination type then can
        // not handle the request; return null
        if (destPropertyType != typeof(TPropertyAsRetrieved))
            return null;

        // Try to get a property we CAN retrieve and convert as requested..
        var property = srcType.GetProperties().FirstOrDefault(p =>
            p.GetIndexParameters().Length == 0
            && _nameMatcher.IsMatch(propertyName, p.Name)
            && p.PropertyType == typeof(TPropertyOnSource)
        );
        if (property == null)
            return null;

        // .. if successful, use to instantiate a CompilableTypeConverterPropertyGetter
        return (ICompilablePropertyGetter)Activator.CreateInstance(
            typeof(CompilableTypeConverterPropertyGetter<,,>).MakeGenericType(
                srcType,
                property.PropertyType,
                destPropertyType
            ),
            property,
            _typeConverter
        );
    }

    IPropertyGetter IPropertyGetterFactory.Get(
        Type srcType,
        string propertyName,
        Type destPropertyType)
    {
        return Get(srcType, propertyName, destPropertyType);
    }
}

Note: I skipped over actually altering the CompilableTypeConverterByConstructor class to implement the GetTypeConverterExpression but it wasn't anything too complex, the generateCompiledConverter method was changed from

private Func<TSource, TDest> generateCompiledConverter()
{
    var srcParameter = Expression.Parameter(typeof(TSource), "src");
    var constructorParameterExpressions = new List<Expression>();
    foreach (var constructorParameter in _constructor.GetParameters())
    {
        var index = constructorParameterExpressions.Count;
        constructorParameterExpressions.Add(
            _propertyGetters[index].GetPropertyGetterExpression(srcParameter)
        );
    }

    return Expression.Lambda<Func<TSource, TDest>>(
        Expression.New(
            _constructor,
            constructorParameterExpressions.ToArray()
        ),
        srcParameter
    ).Compile();
}

and expanded into

private Func<TSource, TDest> generateCompiledConverter()
{
    var srcParameter = Expression.Parameter(typeof(TSource), "src");
    return Expression.Lambda<Func<TSource, TDest>>(
        GetTypeConverterExpression(srcParameter),
        srcParameter
    ).Compile();
}

/// <summary>
/// This Linq Expression will generate a new TDest instance - the specified "param"
/// Expression must have a type that is assignable to TSource
/// </summary>
public Expression GetTypeConverterExpression(Expression param)
{
    if (param == null)
        throw new ArgumentNullException("param");
    if (!typeof(TSource).IsAssignableFrom(param.Type))
        throw new ArgumentException("param.Type must be assignable to typeparam TSource");

    // Instantiate expressions for each constructor parameter by using each of the
    // property getters against the source value
    var constructorParameterExpressions = new List<Expression>();
    foreach (var constructorParameter in _constructor.GetParameters())
    {
        var index = constructorParameterExpressions.Count;
        constructorParameterExpressions.Add(
            _propertyGetters[index].GetPropertyGetterExpression(param)
        );
    }

    // Return an expression that to instantiate a new TDest by using property getters
    // as constructor arguments
    return Expression.Condition(
        Expression.Equal(
            param,
            Expression.Constant(null)
        ),
        Expression.Constant(default(TDest), typeof(TDest)),
        Expression.New(
            _constructor,
            constructorParameterExpressions.ToArray()
        )
    );
}

The only notable difference is that GetTypeConverterExpression should return an Expression that can deal with null values - we need this so that null properties can be retrieved from source types and passed to destination type constructors. Previously there was a null check against the "src" parameter passed to the Convert method, but this can be relaxed now that nulls have to be supported for this class to work as part of a Property Getter.

Almost there!

With the introduction of a CombinedCompilablePropertyGetterFactory (which will run through a set a Compilable Property Getter Factories for each request until one of the returns a non-null value to the Get request), we end up with this structure:

var nameMatcher = new CaseInsensitiveSkipUnderscoreNameMatcher();
var converterFactory = new CompilableTypeConverterByConstructorFactory(
    new ArgsLengthTypeConverterPrioritiserFactory(),
    new CombinedCompilablePropertyGetterFactory(
        new ICompilablePropertyGetterFactory[]
        {
            // Insert Compilable Property Getter Factories here..
        }
    )
);

which finally allows a setup such as:

var nameMatcher = new CaseInsensitiveSkipUnderscoreNameMatcher();

var roleConverterFactory = new CompilableTypeConverterByConstructorFactory(
    new ArgsLengthTypeConverterPrioritiserFactory(),
    new CombinedCompilablePropertyGetterFactory(
        new ICompilablePropertyGetterFactory[]
        {
            new CompilableAssignableTypesPropertyGetterFactory(nameMatcher),
            new CompilableEnumConversionPropertyGetterFactory(nameMatcher)
        }
    )
);

var employeeConverterFactory = new CompilableTypeConverterByConstructorFactory(
    new ArgsLengthTypeConverterPrioritiserFactory(),
    new CombinedCompilablePropertyGetterFactory(
        new ICompilablePropertyGetterFactory[]
        {
            new CompilableAssignableTypesPropertyGetterFactory(nameMatcher),
            new CompilableEnumConversionPropertyGetterFactory(nameMatcher),
            new CompilableTypeConverterPropertyGetterFactory<SourceRole, DestRole>(
                nameMatcher,
                roleConverterFactory.Get<SourceRole, DestRole>()
            )
        }
    )
);

var employeeConverter = employeeConverterFactory.Get<SourceEmployee, DestEmployee>();

var dest = employeeConverter.Convert(
    new SourceEmployee()
    {
        Name = "Richard",
        Role = new SourceRole() { Description = "Penguin Cleaner" }
    }
);

Hoorah!

Now, there's a slight refinement that I want to look at next time but I think this post has gone on more than long enough.

Footnote

For the super-observant, I mentioned that the use of ITypeConverter (as opposed to necessarily requiring ITypeConverterByConstructor) would be touched on again in this post. Since I've run out of steam that will be covered next time too.

Posted at 14:11

Tags:

Comments

27 December 2011

Dynamically applying interfaces to objects - Part 3

In this final part of this mini series there are two enhancements I want to add:

For cases where the same interface may be applied to multiple objects, I want to replace the WrapObject method (which applies the interface to a single reference) with a GenerateInterfaceApplier method which returns an object that applies a specified interface to any given reference but only generating the IL once
The ability to recursively wrap returned data

eg. be able to apply ITest to the ExampleObject reference:

public interface ITest
{
    IEmployee Get(int id);
}

public interface IEmployee
{
    int Id { get; }
    string Name { get; }
}

public class ExampleObject
{
    public object Get(int id)
    {
        if (id != 1)
            throw new ArgumentException("Only id value 1 is supported");
        return new Person(1, "Ted");
    }

    public class Person
    {
        public Person(int id, string name)
        {
            if (string.IsNullOrEmpty(name))
                throw new ArgumentException("Null/blank name specified");
            Id = id;
            Name = name.Trim();
        }
        public int Id { get; private set; }
        public string Name { get; private set; }
    }
}

Task 1: The IInterfaceApplier class

The first part is fairly straight-forward so I'll jump straight in -

public class InterfaceApplierFactory : IInterfaceApplierFactory
{
    private string _assemblyName;
    private bool _createComVisibleClasses;
    private Lazy<ModuleBuilder> _moduleBuilder;
    public InterfaceApplierFactory(string assemblyName, ComVisibility comVisibilityOfClasses)
    {
        assemblyName = (assemblyName ?? "").Trim();
        if (assemblyName == "")
            throw new ArgumentException("Null or empty assemblyName specified");
        if (!Enum.IsDefined(typeof(ComVisibility), comVisibilityOfClasses))
            throw new ArgumentOutOfRangeException("comVisibilityOfClasses");

        _assemblyName = assemblyName;
        _createComVisibleClasses = (comVisibilityOfClasses == ComVisibility.Visible);
        _moduleBuilder = new Lazy<ModuleBuilder>(
            () =>
            {
                var assemblyBuilder = Thread.GetDomain().DefineDynamicAssembly(
                    new AssemblyName(_assemblyName),
                    AssemblyBuilderAccess.Run
                );
                return assemblyBuilder.DefineDynamicModule(
                    assemblyBuilder.GetName().Name,
                    false
                );
            },
            true // isThreadSafe
        );
    }

    public enum ComVisibility
    {
        Visible,
        NotVisible
    }

    public IInterfaceApplier<T> GenerateInterfaceApplier<T>()
    {
        if (!typeof(T).IsInterface)
            throw new ArgumentException("typeparam must be an interface type");

        var typeName = "InterfaceApplier" + Guid.NewGuid().ToString("N");
        var typeBuilder = _moduleBuilder.Value.DefineType(
            typeName,
            TypeAttributes.Public | TypeAttributes.Class | TypeAttributes.AutoClass |
            TypeAttributes.AnsiClass | TypeAttributes.BeforeFieldInit |
            TypeAttributes.AutoLayout,
            typeof(object),
            new Type[] { typeof(T) }
        );

        // The content from the previous posts goes here (generating the constructor,
        // properties and methods)..

        return new InterfaceApplier<T>(
            src => (T)Activator.CreateInstance(
                typeBuilder.CreateType(),
                src
            )
        );
    }

    public IInterfaceApplier GenerateInterfaceApplier(Type targetType)
    {
        var generate = this.GetType().GetMethod("GenerateInterfaceApplier", Type.EmptyTypes);
        var generateGeneric = generate.MakeGenericMethod(targetType);
        return (IInterfaceApplier)generateGeneric.Invoke(this, new object[0]);
    }

    private class InterfaceApplier<T> : IInterfaceApplier<T>
    {
        private Func<object, T> _conversion;
        public InterfaceApplier(Func<object, T> conversion)
        {
            if (!typeof(T).IsInterface)
                throw new ArgumentException("Invalid typeparam - must be an interface");
            if (conversion == null)
                throw new ArgumentNullException("conversion");
            _conversion = conversion;
        }

        public Type TargetType
        {
            get { return typeof(T); }
        }

        public T Apply(object src)
        {
            return _conversion(src);
        }

        object IInterfaceApplier.Apply(object src)
        {
            return Apply(src);
        }
    }
}

public interface IInterfaceApplierFactory
{
    IInterfaceApplier<T> GenerateInterfaceApplier<T>();
    IInterfaceApplier GenerateInterfaceApplier(Type targetType);
}

public interface IInterfaceApplier<T> : IInterfaceApplier
{
    new T Apply(object src);
}

public interface IInterfaceApplier
{
    Type TargetType { get; }
    object Apply(object src);
}

Using this class means that we only need one instance of the ModuleBuilder no matter how many interfaces we're wrapping around objects and an "IInterfaceApplier" is returned instead of a reference to the interface-wrapped object. Note that I've used the .Net 4.0 Lazy class to instantiate the ModuleBuilder only the first time that it's required, but if you're using an earlier version of .Net then this could be replaced with the implementation (see my previous post about this here) or even by instantiating it directly from within the constructor.

I've also supported an alternative method signature for GenerateInterfaceApplier such that the target interface can be specified as an argument rather than a typeparam to a generic method - this will become important in the next section and the only interesting things to note are how IInterfaceApplier<T> is returned from the generic method as opposed to the IInterfaceApplier returned from the typeparam-less signature and how the alternate method calls into the typeparam'd version using reflection.

Task 2: Recursively applying interfaces

The approach I'm going to use here is to introduce a new interface that will be used when generating the interface appliers -

public interface IReadValueConverter
{
    object Convert(PropertyInfo property, object value);
    object Convert(MethodInfo method, object value);
}

Values will be passed through this when returned by property getters or (non-void) methods and it wil be responsible for ensuring that the value returned from the Convert method matches the property.PropertyType / method.ReturnType.

This will mean we'll change the method signature to:

public InterfaceApplier<T> GenerateInterfaceApplier<T>(IReadValueConverter readValueConverter)

That we'll change the constructor on the generated type:

// Declare private fields
var srcField = typeBuilder.DefineField("_src", typeof(object), FieldAttributes.Private);
var readValueConverterField = typeBuilder.DefineField(
    "_readValueConverter",
    typeof(IReadValueConverter),
    FieldAttributes.Private
);

// Generate: base.ctor()
var ctorBuilder = typeBuilder.DefineConstructor(
    MethodAttributes.Public,
    CallingConventions.Standard,
    new[] { typeof(object) }
);
var ilCtor = ctorBuilder.GetILGenerator();
ilCtor.Emit(OpCodes.Ldarg_0);
ilCtor.Emit(OpCodes.Call, typeBuilder.BaseType.GetConstructor(Type.EmptyTypes));

// Generate: if (src != null), don't throw exception
var nonNullSrcArgumentLabel = ilCtor.DefineLabel();
ilCtor.Emit(OpCodes.Ldarg_1);
ilCtor.Emit(OpCodes.Brtrue, nonNullSrcArgumentLabel);
ilCtor.Emit(OpCodes.Ldstr, "src");
ilCtor.Emit(
    OpCodes.Newobj,
    typeof(ArgumentNullException).GetConstructor(new[] { typeof(string) })
);
ilCtor.Emit(OpCodes.Throw);
ilCtor.MarkLabel(nonNullSrcArgumentLabel);

// Generate: if (readValueConverter != null), don't throw exception
var nonNullReadValueConverterArgumentLabel = ilCtor.DefineLabel();
ilCtor.Emit(OpCodes.Ldarg_2);
ilCtor.Emit(OpCodes.Brtrue, nonNullReadValueConverterArgumentLabel);
ilCtor.Emit(OpCodes.Ldstr, "readValueConverter");
ilCtor.Emit(
    OpCodes.Newobj,
    typeof(ArgumentNullException).GetConstructor(new[] { typeof(string) })
);
ilCtor.Emit(OpCodes.Throw);
ilCtor.MarkLabel(nonNullReadValueConverterArgumentLabel);

// Generate: this._src = src
ilCtor.Emit(OpCodes.Ldarg_0);
ilCtor.Emit(OpCodes.Ldarg_1);
ilCtor.Emit(OpCodes.Stfld, srcField);

// Generate: this._readValueConverter = readValueConverter
ilCtor.Emit(OpCodes.Ldarg_0);
ilCtor.Emit(OpCodes.Ldarg_2);
ilCtor.Emit(OpCodes.Stfld, readValueConverterField);

// All done
ilCtor.Emit(OpCodes.Ret);

That we'll change the reading of properties:

// Define get method, if required
if (property.CanRead)
{
    var getFuncBuilder = typeBuilder.DefineMethod(
        "get_" + property.Name,
        MethodAttributes.Public | MethodAttributes.HideBySig | MethodAttributes.NewSlot |
        MethodAttributes.SpecialName | MethodAttributes.Virtual | MethodAttributes.Final,
        property.PropertyType,
        Type.EmptyTypes
    );

    // Generate: return this._readValueConverter.Convert(
    //  property.DeclaringType.GetProperty(property.Name)
    //  _src.GetType().InvokeMember(property.Name, BindingFlags.GetProperty, null, _src, null)
    // );
    var ilGetFunc = getFuncBuilder.GetILGenerator();
    ilGetFunc.Emit(OpCodes.Ldarg_0);
    ilGetFunc.Emit(OpCodes.Ldfld, readValueConverterField);
    ilGetFunc.Emit(OpCodes.Ldtoken, property.DeclaringType);
    ilGetFunc.Emit(
        OpCodes.Call,
        typeof(Type).GetMethod("GetTypeFromHandle", new[] { typeof(RuntimeTypeHandle) })
    );
    ilGetFunc.Emit(OpCodes.Ldstr, property.Name);
    ilGetFunc.Emit(
        OpCodes.Call,
        typeof(Type).GetMethod("GetProperty", new[] { typeof(string) })
    );
    ilGetFunc.Emit(OpCodes.Ldarg_0);
    ilGetFunc.Emit(OpCodes.Ldfld, srcField);
    ilGetFunc.Emit(OpCodes.Callvirt, typeof(Type).GetMethod("GetType", Type.EmptyTypes));
    ilGetFunc.Emit(OpCodes.Ldstr, property.Name);
    ilGetFunc.Emit(OpCodes.Ldc_I4, (int)BindingFlags.GetProperty);
    ilGetFunc.Emit(OpCodes.Ldnull);
    ilGetFunc.Emit(OpCodes.Ldarg_0);
    ilGetFunc.Emit(OpCodes.Ldfld, srcField);
    ilGetFunc.Emit(OpCodes.Ldnull);
    ilGetFunc.Emit(OpCodes.Callvirt, methodInfoInvokeMember);
    ilGetFunc.Emit(
        OpCodes.Callvirt,
        typeof(IReadValueConverter).GetMethod(
            "Convert",
            new[] { typeof(PropertyInfo), typeof(object) }
        )
    );
    if (property.PropertyType.IsValueType)
        ilGetFunc.Emit(OpCodes.Unbox_Any, property.PropertyType);
    ilGetFunc.Emit(OpCodes.Ret);
    propBuilder.SetGetMethod(getFuncBuilder);
}

And that we'll change the calling of methods:

// .. skipped out the first half of the method-generating code
// - see https://www.productiverage.com/Read/15

// Generate either:
//  _src.GetType().InvokeMember(method.Name, BindingFlags.InvokeMethod, null, _src, args);
// or
//  return this._readValueConverter.Convert(
//   method.DeclaringType.GetMethod(method.Name, {MethodArgTypes})
//   this._src.GetType().InvokeMember(
//    property.Name,
//    BindingFlags.InvokeMethod,
//    null,
//    _src,
//    null
//   )
//  );
if (!method.ReturnType.Equals(typeof(void)))
{
    // We only need to use the readValueConverter if returning a value

    // Generate: Type[] argTypes
    var argTypes = ilFunc.DeclareLocal(typeof(Type[]));

    // Generate: argTypes = new Type[x]
    ilFunc.Emit(OpCodes.Ldc_I4, parameters.Length);
    ilFunc.Emit(OpCodes.Newarr, typeof(Type));
    ilFunc.Emit(OpCodes.Stloc_1);
    for (var index = 0; index < parameters.Length; index++)
    {
        // Generate: argTypes[n] = ..;
        var parameter = parameters[index];
        ilFunc.Emit(OpCodes.Ldloc_1);
        ilFunc.Emit(OpCodes.Ldc_I4, index);
        ilFunc.Emit(OpCodes.Ldtoken, parameters[index].ParameterType);
        ilFunc.Emit(OpCodes.Stelem_Ref);
    }

    // Will call readValueConverter.Convert, passing MethodInfo reference before value
    ilFunc.Emit(OpCodes.Ldarg_0);
    ilFunc.Emit(OpCodes.Ldfld, readValueConverterField);
    ilFunc.Emit(OpCodes.Ldtoken, method.DeclaringType);
    ilFunc.Emit(
        OpCodes.Call,
        typeof(Type).GetMethod("GetTypeFromHandle", new[] { typeof(RuntimeTypeHandle) })
    );
    ilFunc.Emit(OpCodes.Ldstr, method.Name);
    ilFunc.Emit(OpCodes.Ldloc_1);
    ilFunc.Emit(
        OpCodes.Call,
        typeof(Type).GetMethod("GetMethod", new[] { typeof(string), typeof(Type[]) })
    );
}
ilFunc.Emit(OpCodes.Ldarg_0);
ilFunc.Emit(OpCodes.Ldfld, srcField);
ilFunc.Emit(OpCodes.Callvirt, typeof(Type).GetMethod("GetType", Type.EmptyTypes));
ilFunc.Emit(OpCodes.Ldstr, method.Name);
ilFunc.Emit(OpCodes.Ldc_I4, (int)BindingFlags.InvokeMethod);
ilFunc.Emit(OpCodes.Ldnull);
ilFunc.Emit(OpCodes.Ldarg_0);
ilFunc.Emit(OpCodes.Ldfld, srcField);
ilFunc.Emit(OpCodes.Ldloc_0);
ilFunc.Emit(OpCodes.Callvirt, methodInfoInvokeMember);

if (method.ReturnType.Equals(typeof(void)))
    ilFunc.Emit(OpCodes.Pop);
else
{
    ilFunc.Emit(
        OpCodes.Callvirt,
        typeof(IReadValueConverter).GetMethod(
            "Convert",
            new[] { typeof(MethodInfo), typeof(object) }
        )
    );
    if (method.ReturnType.IsValueType)
        ilFunc.Emit(OpCodes.Unbox_Any, method.ReturnType);
}

ilFunc.Emit(OpCodes.Ret);

Task 2.1: Implementing IReadValueConverter

A naive implementation might be as follows:

public class SimpleReadValueConverter : IReadValueConverter
{
    private IInterfaceApplierFactory _interfaceApplierFactory;
    public SimpleReadValueConverter(IInterfaceApplierFactory interfaceApplierFactory)
    {
        if (interfaceApplierFactory == null)
            throw new ArgumentNullException("interfaceApplierFactory");
        _interfaceApplierFactory = interfaceApplierFactory;
    }

    public object Convert(PropertyInfo property, object value)
    {
        if (property == null)
            throw new ArgumentNullException("property");
        return tryToConvertValueIfRequired(property.PropertyType, value);
    }

    public object Convert(MethodInfo method, object value)
    {
        if (method == null)
            throw new ArgumentNullException("method");
        return tryToConvertValueIfRequired(method.ReturnType, value);
    }

    private object tryToConvertValueIfRequired(Type targetType, object value)
    {
        if (targetType == null)
            throw new ArgumentNullException("targetType");

        // If no conversion is required, no work to do
        // - Note: We can only deal with applying interfaces to objects so if a conversion
        //   is required where the target is not an interface then there's nothing we can do
        //   here, we'll have to return the value unconverted (likewise, if the target type
        //   is an int but the current value is null, although this is obviously incorrect
        //   there's nothing we can do about it here)
        if (!targetType.IsInterface || (value == null)
        || (value.GetType().IsSubclassOf(targetType)))
            return value;

        return _interfaceApplierFactory.GenerateInterfaceApplier(targetType, this)
            .Apply(value);
    }
}

This will do the job but it jumps out at me that if the same interface needs to be applied to multiple return values (ie. from different properties or methods) then the work done to generate that interface applier will be repeated for each request. It might be better (require less memory and cpu resources) to build up a list of interfaces that have already been handled and re-use the interface appliers where possible -

public class CachedReadValueConverter : IReadValueConverter
{
    private IInterfaceApplierFactory _interfaceApplierFactory;
    private NonNullImmutableList<IInterfaceApplier> _interfaceAppliers;
    private object _writeLock;

    public CachedReadValueConverter(IInterfaceApplierFactory interfaceApplierFactory)
    {
        if (interfaceApplierFactory == null)
            throw new ArgumentNullException("interfaceApplierFactory");

        _interfaceApplierFactory = interfaceApplierFactory;
        _interfaceAppliers = new NonNullImmutableList<IInterfaceApplier>();
        _writeLock = new object();
    }

    public object Convert(PropertyInfo property, object value)
    {
        if (property == null)
            throw new ArgumentNullException("property");

        return tryToConvertValueIfRequired(property.PropertyType, value);
    }

    public object Convert(MethodInfo method, object value)
    {
        if (method == null)
            throw new ArgumentNullException("method");

        return tryToConvertValueIfRequired(method.ReturnType, value);
    }

    private object tryToConvertValueIfRequired(Type targetType, object value)
    {
        if (targetType == null)
            throw new ArgumentNullException("targetType");

        // If no conversion is required, no work to do
        // - Note: We can only deal with applying interfaces to objects so if a conversion
        //   is required where the target is not an interface then there's nothing we can
        //   do here so we'll have to return the value unconverted (likewise, if the target
        //   type is an int but the current value is null, although this is obviously
        //   incorrect but there's nothing we can do about it here)
        if (!targetType.IsInterface || (value == null)
        || (value.GetType().IsSubclassOf(targetType)))
            return value;

        // Do we already have an interface applier available for this type?
        var interfaceApplierExisting = _interfaceAppliers.FirstOrDefault(
            i => i.TargetType.Equals(targetType)
        );
        if (interfaceApplierExisting != null)
            return interfaceApplierExisting.Apply(value);

        // Try to generate new interface applier
        var interfaceApplierNew = _interfaceApplierFactory.GenerateInterfaceApplier(
            targetType,
            this
        );
        lock (_writeLock)
        {
            if (!_interfaceAppliers.Any(i => i.TargetType.Equals(targetType)))
                _interfaceAppliers = _interfaceAppliers.Add(interfaceApplierNew);
        }
        return interfaceApplierNew.Apply(value);
    }
}

There will still be cases where there have to be multiple interface appliers for a given interface if there are interrelated references but this should limit how many duplicates are generated. For example:

using COMInteraction.InterfaceApplication;
using COMInteraction.InterfaceApplication.ReadValueConverters;

namespace Tester
{
    class Program
    {
        static void Main(string[] args)
        {
            var n1 = new Node() { Name = "Node1" };
            var n2 = new Node() { Name = "Node2" };
            var n3 = new Node() { Name = "Node3" };
            n1.Next = n2;
            n2.Previous = n1;
            n2.Next = n3;
            n3.Previous = n2;

            var interfaceApplierFactory = new InterfaceApplierFactory(
                "DynamicAssembly",
                InterfaceApplierFactory.ComVisibility.NotVisible
            );
            var interfaceApplier = interfaceApplierFactory.GenerateInterfaceApplier<INode>(
                new CachedReadValueConverter(interfaceApplierFactory)
            );
            var n2Wrapped = interfaceApplier.Apply(n2);

        }

        public class Node
        {
            public string Name { get; set; }
            public Node Previous { get; set; }
            public Node Next { get; set; }
        }
    }

    public interface INode
    {
        string Name { get; set; }
        INode Previous { get; set; }
        INode Next { get; set; }
    }
}

Each unique interface applier has a specific class name (of the form "InterfaceApplier{0}" where {0} is a guid). Examining the properties on n2Wrapped you can see that the class for n2Wrapped is different to the class for its Next and Previous properties as the INode interface applier hadn't been completely generated against n2 before an INode interface applier was required for these properties. But after this, all further INode-wrapped instances will share the same interface applier as the Previous and Next properties received - so there will be one "wasted" generated class but that's still a better job than the SimpleReadValueConverter would have managed.

"Inaccessible Interface"

In the above example, the INode interface has to be located outside of the Program class since it must be accessible by the InterfaceApplierFactory and the Program class is private. This isn't an issue for the Node class as the InterfaceApplierFactory doesn't need direct access to that, it just returns an IInterfaceApplier that we pass the n2 reference to as an argument. Alternatively, the Program class could be made public. This isn't exactly rocket science but if it slips your mind then you're presented with an error such as -

TypeLoadException: "Type 'InterfaceApplierdb2cb792e09d424a8dcecbeca6276dc8' from assembly 'DynamicAssembly, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' is attempting to implement an inaccessible interface."

at the line

return new InterfaceApplier<T>(
    src => (T)Activator.CreateInstance(
        typeBuilder.CreateType(),
        src,
        readValueConverter
    )
);

in InterfaceApplierFactory, which isn't the friendliest of warnings!

Posted at 18:38

Tags:

Comments

14 December 2011

Dynamically applying interfaces to objects - Part 2

Today, I'm going to address some of the "future developments" I left at the end of the last post. Specifically:

Wrapping up the "Interface Applier" into a generic method that specifies the target interface
Handling interface hierarchies
Marking the wrapper as being ComVisible

Wrapping up in a method

Part one is easy. Take the code from the last article and wrap in

public static T WrapObject<T>(object src)
{
    if (src == null)
        throw new ArgumentNullException("src");
    if (!typeof(T).IsInterface)
        throw new ArgumentException("Typeparam T must be an interface type");

    // Insert existing code that generates the new class with its constructor, properties
    // and methods here..

    return (T)Activator.CreateInstance(
        typeBuilder.CreateType(),
        src
    );
}

Ta-da! Note that we ensure that the typeparam T really is an interface - we made assumptions about this in the last article, so we need to assert this fact here. (It means that we only ever have to deal with properties and methods, and that they will always be public).

Handling interface hierarchies

This part is not much more difficult. We'll introduce something to recursively trawl through any interfaces that the target interface implements and build a list of them all:

public class InterfaceHierarchyCombiner
{
    private Type _targetInterface;
    private List<Type> _interfaces;
    public InterfaceHierarchyCombiner(Type targetInterface)
    {
        if (targetInterface == null)
            throw new ArgumentNullException("targetInterface");
        if (!targetInterface.IsInterface)
            throw new ArgumentException("targetInterface must be an interface type", "targetInterface");

        _interfaces = new List<Type>();
        buildInterfaceInheritanceList(targetInterface, _interfaces);
        _targetInterface = targetInterface;
    }

    private static void buildInterfaceInheritanceList(Type targetInterface, List<Type> types)
    {
        if (targetInterface == null)
            throw new ArgumentNullException("targetInterface");
        if (!targetInterface.IsInterface)
            throw new ArgumentException("targetInterface must be an interface type", "targetInterface");
        if (types == null)
            throw new ArgumentNullException("types");

        if (!types.Contains(targetInterface))
            types.Add(targetInterface);

        foreach (var inheritedInterface in targetInterface.GetInterfaces())
        {
            if (!types.Contains(inheritedInterface))
            {
                types.Add(inheritedInterface);
                buildInterfaceInheritanceList(inheritedInterface, types);
            }
        }
    }

    public Type TargetInterface
    {
        get { return _targetInterface; }
    }

    public IEnumerable<Type> Interfaces
    {
        get { return _interfaces.AsReadOnly(); }
    }
}

Then, in this new WrapObject method, we call instantiate a new InterfaceHierarchyCombiner for the typeparam T and use retrieve all the properties and methods from the Interfaces list, rather than just those on T.

eg. Instead of

foreach (var property in typeof(ITest).GetProperties())
{
    // Deal with the properties..

we consider

var interfaces = (new InterfaceHierarchyCombiner(typeof(T))).Interfaces;
foreach (var property in interfaces.SelectMany(i => i.GetProperties()))
{
    // Deal with the properties..

and likewise for the methods.

It's worth noting that there may be multiple methods within the interface hierarchy with the same name and signature. It may be worth keeping track of which properties / methods have had corresponding IL generated but - other than generating more instructions in the loop than strictly necessary - it doesn't do any harm generating duplicate properties or methods (so I haven't worried about it for now).

Com Visibility

What I wanted to do for the wrappers I was implementing was to create classes with the [ComVisible(true)] and [ClassInterface(ClassInterface.None)] attributes. This is achieved by specifying these attributes on the typeBuilder (as seen in the code in the last article):

typeBuilder.SetCustomAttribute(
    new CustomAttributeBuilder(
        typeof(ComVisibleAttribute).GetConstructor(new[] { typeof(bool) }),
        new object[] { true }
    )
);
typeBuilder.SetCustomAttribute(
    new CustomAttributeBuilder(
        typeof(ClassInterfaceAttribute).GetConstructor(new[] { typeof(ClassInterfaceType) }),
        new object[] { ClassInterfaceType.None }
    )
);

Again, easy! Once you know how :)

Example code / jumping to the end

I've not included a complete sample here since it would take up a fairly hefty chunk of space but also because I created a GitHub repository with the final code. This can be found at https://github.com/ProductiveRage/COMInteraction

This includes the work here but also the work I want to address next time; a way to automagically wrap return values where required and a way to change the WrapObject method so that if the same interface is to be applied to multiple objects, only a single call is required (and an object be returned that can wrap any given reference in that interface). The example I put out for this return-value-wrapping was that we want to wrap an object in ITest and have the return value from its Get method also be wrapped if it did not already implement IEmployee:

public interface ITest
{
    IEmployee Get(int id);
}

public interface IEmployee
{
    int Id { get; }
    string Name { get; }
}

For more details of how exactly this will all work, I'll see you back here; same bat-time, same bat-channel! :)

Posted at 21:16

Tags:

Comments

13 December 2011

Dynamically applying interfaces to objects

Another area of this migration proof-of-concept work I'm doing at the day job involves investigating the best way to swap out a load of COM components for C# versions over time. The plan initially is to define interfaces for them and code against those interfaces, write wrapper classes around the components that implement these interfaces and one day rewrite them one-by-one.

Writing the interfaces is valuable since it enables some documentation-through-comments to be generated for each method and property and forces me to look into the idiosyncracies of the various components.

However, writing endless wrappers for the components to "join" them to the interfaces sounded boring! Even if I used the .Net 4.0 "dynamic" keyword it seemed like there'd be a lot of repetition and opportunity for me to mistype a property name and not realise until debugging / writing tests. (Plus I found a problem that prevented me from using "dynamic" with the WSCs I was wrapping - see the bottom of this post for more details).

I figured this is the sort of thing that should be dynamically generatable from the interfaces instead of writing them all by hand - something like how Moq can create generate Mock<ITest> implementations. I did most of this investigation back in Summer, not long after the AutoMapper work I was looking into, and had hoped I'd be able to leverage my sharpened Linq Expression dynamic code generation skills. Alas, it seems that new classes can not be defined in this manner so I had to go deeper..

Reflection.Emit

I was aware that IL could be generated by code at runtime and executed as any other other loaded assembly might be. I'd read (and chuckled at) this article in the past but never taken it any further: Dynamic... But Fast: The Tale of Three Monkeys, A Wolf and the DynamicMethod and ILGenerator Classes

As I tried to find out more information, though, it seemed that a lot of articles would make the point that you could find out how to construct IL by using the IL Disassembler that's part of the .Net SDK: ildasm.exe (located in C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin on my computer). This makes sense because once you start constructing simple classes and examining the generated code in ildasm you can start to get a reasonable idea for how to write the generation code yourself. But it still took me quite a while to get to the point where the following worked!

What I really wanted was something to take, for example:

public interface ITest
{
    int GetValue(string id);
    string Name { get; set; }
}

and wrap an object that had that method and property such that the interface was exposed - eg.

public class TestWrapper : ITest
{
  private object _src;
  public TestWrapper(object src)
  {
    if (src == null)
      throw new ArgumentNullException("src");
    _src = src;
  }

  public int GetValue(string id)
  {
    return _src.GetType().InvokeMember(
      "GetValue",
      BindingFlags.InvokeMethod,
      null,
      _src,
      new object[] { id }
    );
  }

  public string Name
  {
    get
    {
      return _src.GetType().InvokeMember("Name", BindingFlags.GetProperty, null, _src, null)
    }
    set
    {
      _src.GetType().InvokeMember("Name", BindingFlags.SetProperty, null, _src, new object[] { value });
    }
  }
}

It may seem like using reflection will result in there being overhead in the calls but the primary objective was to wrap a load of WSCs in C# interfaces so they could be rewritten later while doing the job for now - so performance wasn't really a massive concern at this point.

Somewhere to work in

The first thing to be aware of is that we can't create new classes in the current assembly, we'll have to create them in a new one. So we start off with

var assemblyBuilder = Thread.GetDomain().DefineDynamicAssembly(
  new AssemblyName("DynamicAssembly"), // This is not a magic string, it can be called anything
  AssemblyBuilderAccess.Run
);
var moduleBuilder = assemblyBuilder.DefineDynamicModule(
  assemblyBuilder.GetName().Name,
  false
);

// This NewGuid call is just to get a unique name for the new construct
var typeName = "InterfaceApplier" + Guid.NewGuid().ToString();
var typeBuilder = moduleBuilder.DefineType(
  typeName,
  TypeAttributes.Public
    | TypeAttributes.Class
    | TypeAttributes.AutoClass
    | TypeAttributes.AnsiClass
    | TypeAttributes.BeforeFieldInit
    | TypeAttributes.AutoLayout,
  typeof(object),
  new Type[] { typeof(ITest) }
);

The TypeAttribute values I copied from the ildasm output I examined.

Note that we're specifying ITest as the interface we're implementing by passing it as the "interfaces" parameter to the moduleBuilder's DefineType method.

The Constructor

The constructor is fairly straight forward. The thing that took me longest to wrap my head around was how to form the "if (src == null) throw new ArgumentNullException()" construct. If seems that this is most easily done by declaring a label to jump to if src is not null which allows execution to leap over the point at which an ArgumentNullException will be raised.

// Declare private _src field
var srcField = typeBuilder.DefineField("_src", typeof(object), FieldAttributes.Private);
var ctorBuilder = typeBuilder.DefineConstructor(
  MethodAttributes.Public,
  CallingConventions.Standard,
  new[] { typeof(object) }
);

// Generate: base.ctor()
var ilCtor = ctorBuilder.GetILGenerator();
ilCtor.Emit(OpCodes.Ldarg_0);
ilCtor.Emit(OpCodes.Call, typeBuilder.BaseType.GetConstructor(Type.EmptyTypes));

// Generate: if (src != null), don't throw new ArgumentException("src")
var nonNullSrcArgumentLabel = ilCtor.DefineLabel();
ilCtor.Emit(OpCodes.Ldarg_1);
ilCtor.Emit(OpCodes.Brtrue, nonNullSrcArgumentLabel);
ilCtor.Emit(OpCodes.Ldstr, "src");
ilCtor.Emit(OpCodes.Newobj, typeof(ArgumentNullException).GetConstructor(new[] { typeof(string) }));
ilCtor.Emit(OpCodes.Throw);
ilCtor.MarkLabel(nonNullSrcArgumentLabel);

// Generate: _src = src
ilCtor.Emit(OpCodes.Ldarg_0);
ilCtor.Emit(OpCodes.Ldarg_1);
ilCtor.Emit(OpCodes.Stfld, srcField);

// All done!
ilCtor.Emit(OpCodes.Ret);

Properties

Although there's only a single property in the ITest example we're looking at, we might as look ahead and loop over all properties the interface has so we can apply the same sort of code to other interfaces. Since we are dealing with interfaces, we only need to consider whether a property is gettable, settable or both - there's no public / internal / protected / private / etc.. to worry about. Likewise, we only have to worry about properties and methods - interfaces can't declare fields.

foreach (var property in typeof(ITest).GetProperties())
{
  var methodInfoInvokeMember = typeof(Type).GetMethod(
    "InvokeMember",
    new[]
    {
      typeof(string),
      typeof(BindingFlags),
      typeof(Binder),
      typeof(object),
      typeof(object[])
    }
  );

  // Prepare the property we'll add get and/or set accessors to
  var propBuilder = typeBuilder.DefineProperty(
    property.Name,
    PropertyAttributes.None,
    property.PropertyType,
    Type.EmptyTypes
  );

  // Define get method, if required
  if (property.CanRead)
  {
    var getFuncBuilder = typeBuilder.DefineMethod(
      "get_" + property.Name,
      MethodAttributes.Public
        | MethodAttributes.HideBySig
        | MethodAttributes.NewSlot
        | MethodAttributes.SpecialName
        | MethodAttributes.Virtual
        | MethodAttributes.Final,
      property.PropertyType,
      Type.EmptyTypes
    );

    // Generate:
    //   return _src.GetType().InvokeMember(property.Name, BindingFlags.GetProperty, null, _src, null)
    var ilGetFunc = getFuncBuilder.GetILGenerator();
    ilGetFunc.Emit(OpCodes.Ldarg_0);
    ilGetFunc.Emit(OpCodes.Ldfld, srcField);
    ilGetFunc.Emit(OpCodes.Callvirt, typeof(Type).GetMethod("GetType", Type.EmptyTypes));
    ilGetFunc.Emit(OpCodes.Ldstr, property.Name);
    ilGetFunc.Emit(OpCodes.Ldc_I4, (int)BindingFlags.GetProperty);
    ilGetFunc.Emit(OpCodes.Ldnull);
    ilGetFunc.Emit(OpCodes.Ldarg_0);
    ilGetFunc.Emit(OpCodes.Ldfld, srcField);
    ilGetFunc.Emit(OpCodes.Ldnull);
    ilGetFunc.Emit(OpCodes.Callvirt, methodInfoInvokeMember);
    if (property.PropertyType.IsValueType)
      ilGetFunc.Emit(OpCodes.Unbox_Any, property.PropertyType);
    ilGetFunc.Emit(OpCodes.Ret);
    propBuilder.SetGetMethod(getFuncBuilder);
  }

  // Define set method, if required
  if (property.CanWrite)
  {
    var setFuncBuilder = typeBuilder.DefineMethod(
      "set_" + property.Name,
      MethodAttributes.Public
        | MethodAttributes.HideBySig
        | MethodAttributes.SpecialName
        | MethodAttributes.Virtual,
      null,
      new Type[] { property.PropertyType }
    );
    var valueParameter = setFuncBuilder.DefineParameter(1, ParameterAttributes.None, "value");
    var ilSetFunc = setFuncBuilder.GetILGenerator();

    // Generate:
    //   _src.GetType().InvokeMember(
    //     property.Name, BindingFlags.SetProperty, null, _src, new object[1] { value }
    //   );
    // Note: Need to declare assignment of local array to pass to InvokeMember (argValues)
    var argValues = ilSetFunc.DeclareLocal(typeof(object[]));
    ilSetFunc.Emit(OpCodes.Ldarg_0);
    ilSetFunc.Emit(OpCodes.Ldfld, srcField);
    ilSetFunc.Emit(OpCodes.Callvirt, typeof(Type).GetMethod("GetType", Type.EmptyTypes));
    ilSetFunc.Emit(OpCodes.Ldstr, property.Name);
    ilSetFunc.Emit(OpCodes.Ldc_I4, (int)BindingFlags.SetProperty);
    ilSetFunc.Emit(OpCodes.Ldnull);
    ilSetFunc.Emit(OpCodes.Ldarg_0);
    ilSetFunc.Emit(OpCodes.Ldfld, srcField);
    ilSetFunc.Emit(OpCodes.Ldc_I4_1);
    ilSetFunc.Emit(OpCodes.Newarr, typeof(Object));
    ilSetFunc.Emit(OpCodes.Stloc_0);
    ilSetFunc.Emit(OpCodes.Ldloc_0);
    ilSetFunc.Emit(OpCodes.Ldc_I4_0);
    ilSetFunc.Emit(OpCodes.Ldarg_1);
    if (property.PropertyType.IsValueType)
      ilSetFunc.Emit(OpCodes.Box, property.PropertyType);
    ilSetFunc.Emit(OpCodes.Stelem_Ref);
    ilSetFunc.Emit(OpCodes.Ldloc_0);
    ilSetFunc.Emit(OpCodes.Callvirt, methodInfoInvokeMember);
    ilSetFunc.Emit(OpCodes.Pop);

    ilSetFunc.Emit(OpCodes.Ret);
    propBuilder.SetSetMethod(setFuncBuilder);
  }
}

The gist is that for the getter and/or setter, we have to declare a method and then assign that method to be the GetMethod or SetMethod for the property. The method is named by prefixing the property name with either "get_" or "set_", as is consistent with how C# generates it class properties' IL.

The call to

_src.GetType().InvokeMember(
  property.Name,
  BindingFlags.SetProperty,
  null,
  _src,
  new object[] { value }
);

is a bit painful as we have to declare an array with a single element to pass to the method, where that single element is the "value" reference available within the setter.

Also worthy of note is that when returning a ValueType or setting a ValueType. As we're expecting to either return an object or set an object, the value has to be "boxed" otherwise bad things will happen!

Like the TypeAttributes in the Constructor, the MethodAttributes I've applied here were gleaned from looking at IL generated by Visual Studio.

Methods

We're on the home stretch now! Methods are very similar to the property setters except that we may have zero, one or multiple parameters to handle and we may or may not (if the return type is void) return a value from the method.

foreach (var method in typeof(ITest).GetMethods())
{
  var parameters = method.GetParameters();
  var parameterTypes = new List<Type>();
  foreach (var parameter in parameters)
  {
    if (parameter.IsOut)
      throw new ArgumentException("Output parameters are not supported");
    if (parameter.IsOptional)
      throw new ArgumentException("Optional parameters are not supported");
    if (parameter.ParameterType.IsByRef)
      throw new ArgumentException("Ref parameters are not supported");
    parameterTypes.Add(parameter.ParameterType);
  }
  var funcBuilder = typeBuilder.DefineMethod(
    method.Name,
    MethodAttributes.Public
      | MethodAttributes.HideBySig
      | MethodAttributes.NewSlot
      | MethodAttributes.Virtual
      | MethodAttributes.Final,
    method.ReturnType,
    parameterTypes.ToArray()
  );
  var ilFunc = funcBuilder.GetILGenerator();

  // Generate: object[] args
  var argValues = ilFunc.DeclareLocal(typeof(object[]));

  // Generate: args = new object[x]
  ilFunc.Emit(OpCodes.Ldc_I4, parameters.Length);
  ilFunc.Emit(OpCodes.Newarr, typeof(Object));
  ilFunc.Emit(OpCodes.Stloc_0);
  for (var index = 0; index < parameters.Length; index++)
  {
    // Generate: args[n] = ..;
    var parameter = parameters[index];
    ilFunc.Emit(OpCodes.Ldloc_0);
    ilFunc.Emit(OpCodes.Ldc_I4, index);
    ilFunc.Emit(OpCodes.Ldarg, index + 1);
    if (parameter.ParameterType.IsValueType)
      ilFunc.Emit(OpCodes.Box, parameter.ParameterType);
    ilFunc.Emit(OpCodes.Stelem_Ref);
  }

  var methodInfoInvokeMember = typeof(Type).GetMethod(
    "InvokeMember",
    new[]
    {
      typeof(string),
      typeof(BindingFlags),
      typeof(Binder),
      typeof(object),
      typeof(object[])
    }
  );

  // Generate:
  //   [return] _src.GetType().InvokeMember(method.Name, BindingFlags.InvokeMethod, null, _src, args);
  ilFunc.Emit(OpCodes.Ldarg_0);
  ilFunc.Emit(OpCodes.Ldfld, srcField);
  ilFunc.Emit(OpCodes.Callvirt, typeof(Type).GetMethod("GetType", Type.EmptyTypes));
  ilFunc.Emit(OpCodes.Ldstr, method.Name);
  ilFunc.Emit(OpCodes.Ldc_I4, (int)BindingFlags.InvokeMethod);
  ilFunc.Emit(OpCodes.Ldnull);
  ilFunc.Emit(OpCodes.Ldarg_0);
  ilFunc.Emit(OpCodes.Ldfld, srcField);
  ilFunc.Emit(OpCodes.Ldloc_0);
  ilFunc.Emit(OpCodes.Callvirt, methodInfoInvokeMember);

  if (method.ReturnType.Equals(typeof(void)))
    ilFunc.Emit(OpCodes.Pop);
  else if (method.ReturnType.IsValueType)
    ilFunc.Emit(OpCodes.Unbox_Any, method.ReturnType);

  ilFunc.Emit(OpCodes.Ret);
}

The boxing of ValueTypes when passed as parameters or returned from the method is required, just like the property accessors.

The only real point of interest here is the array generation for the parameters - this also took me a little while to wrap my head around! You may note that I've been a bit lazy and not supported optional, out or ref parameters - I didn't need these for anything I was working on and didn't feel like diving into it at this point. I'm fairly sure that if they become important features then whipping out ildasm and looking at the generated code there will reveal the best way to proceed with these.

The payoff!

Now that we've defined everything about the class, we can instantiate it!

var wrapper = (ITest)Activator.CreateInstance(
  typeBuilder.CreateType(),
  src
);

This gives us back an instance of a magic new class that wraps a specified "src" and passes through the ITest properties and methods! If it's applied to an object that doesn't have the required properties and methods then exceptions will be thrown when they are called - the standard exceptions that reflection calls to invalid properties/methods would result in.

But I think that's pretty much enough for this installment - it's been a bit dry but I think it's been worthwhile!

And then..

There are a lot of extension points that naturally arise from this rough-from-the-edges code - the first things that spring to my mind are a way to wrap this up nicely into a generic class that could create wrappers for any given interface, a way to handle interface inheritance, a way to possibly wrap the returned values - eg. if we have

public interface ITest
{
  IEmployee Get(int id);
}

public interface IEmployee
{
  int Id { get; }
  string Name { get; }
}

can the method that applies ITest to an object also apply IEmployee to the value returned from ITest.Get if that value itself doesn't already implement IEmployee??

Finally, off the top of my head, if I'm using these generated classes to read interact with WSCs / COM components, am I going to need to pass references over to COM components? If so, I'm going to have to find a way to flag them as ComVisible.

But these are issues to address another time :)

Footnote: WSC properties and "dynamic"

The code here doesn't use the .Net 4.0 "dynamic" keyword and so will compile under .Net 2.0. I had a bit of a poke around in the IL generated that makes use of dynamic since in some situations it should offer benefits - often performance is one such benefit! However, in the particularly niche scenario I'm working with it refuses to work :( Most of the components I'm wrapping are legacy VBScript WSCs and trying to set properties on these seems to fail when using dynamic.

I've pulled this example from a test (in xUnit) I wrote to illustrate that there were issues ..

[Fact]
public void SettingCachePropertyThrowsRuntimeBinderException()
{
  // The only way to demonstrate properly is unfortunately by loading an actual wsc - if we used a
  // standard .Net class as the source then it would work fine
  var src = Microsoft.VisualBasic.Interaction.GetObject(
    Path.Combine(
      new FileInfo(this.GetType().Assembly.FullName).DirectoryName,
      "TestSrc.wsc"
    )
  );
  var srcWithInterface = new ControlInterfaceApplierUsingDynamic(src);

  // Expect a RuntimeBinderException with message "'System.__ComObject' does not contain a definition
  // for 'Cache'"
  Assert.Throws<RuntimeBinderException>(() =>
  {
    srcWithInterface.Cache = new NullCache();
  });
}

public class ControlInterfaceApplierUsingDynamic : IControl
{
  private dynamic _src;
  public ControlInterfaceApplierUsingDynamic(object src)
  {
    if (src == null)
      throw new ArgumentNullException("src");

    _src = src;
  }

  public ICache Cache
  {
    set
    {
      _src.Cache = value;
    }
  }
}

[ComVisible(true)]
private class NullCache : ICache
{
  public object this[string key] { get { return null; } }
  public void Add(string key, object value, int cacheDurationInSeconds) { }
  public bool Exists(string key) { return false; }
  public void Remove(string key) { }
}

The WSC content is as follows:

<?xml version="1.0" ?>
<?component error="false" debug="false" ?>
<package>
  <component id="TestSrc">
    <registration progid="TestSrc" description="Test Control" version="1" />
    <public>
      <property name="Config" />
    </public>
    <script language="VBScript">
    <![CDATA[
      Public Cache
    ]]>
    </script>
  </component>
</package>

I've not been able to get to the bottom of why this fails and it's not exactly a common problem - most people left WSCs back in the depths of time.. along with VBScript! But for the meantime we're stuck with them since the thought of trying to migrate the entire codebase fills me with dread, at least splitting it this way means we can move over piecemeal and re-write isolated components of the code at a time. And then one day the old cruft will have gone! And people will consider the earlier migration code the new "old cruft" and the great cycle can continue!

Update (2nd May 2014): It turns out that if

var src = Microsoft.VisualBasic.Interaction.GetObject(
  Path.Combine(
    new FileInfo(this.GetType().Assembly.FullName).DirectoryName,
    "TestSrc.wsc"
  )
);

is altered to read

var src = Microsoft.VisualBasic.Interaction.GetObject(
  "script:"
  Path.Combine(
    new FileInfo(this.GetType().Assembly.FullName).DirectoryName,
    "TestSrc.wsc"
  )
);

then this test will pass! I'll leave the content here for posterity but it struck me while flipping through this old post that I had seen and addressed this problem since writing this.

Posted at 22:01

Tags:

Comments

13 July 2011

AutoMapper-By-Constructor without AutoMapper.. and faster

I've been wanting to see if I can improve the performance of the by-constructor type converter I wrote about (here). The plan is to implement Property Getters that can retrieve the property values - translated, if required - from a source object using LINQ Expressions. Then to push these through a ConstructorInfo call using more LINQ Expressions such that a single expression can be constructed that converts from source to destination types at the same speed that hand-rolled code would. In a lot of cases, this could be merely academic but if 1000s of instances are being converted together, then the overhead of AutoMapper could make a signficant difference.

So I want to expand

public interface IPropertyGetter
{
    Type SrcType { get; }
    PropertyInfo Property { get; }
    Type TargetType { get; }
    object GetValue(object src);
}

with

public interface ICompilablePropertyGetter : IPropertyGetter
{
    Expression GetPropertyGetterExpression(Expression param);
}

and to expand

public interface ITypeConverterByConstructor<TSource, TDest>
{
    ConstructorInfo Constructor { get; }
    TDest Convert(TSource src);
}

with

public interface ICompilableTypeConverterByConstructor<TSource, TDest>
    : ITypeConverterByConstructor<TSource, TDest>
{
    Expression GetTypeConverterExpression(Expression param);
}

Compilable Property Getter

Turns it out this was quite easy to implement if you know how.. but quite difficult to find examples out there if you don't! One of the things I like about LINQ Expressions code is that when you read it back it scans quite well and kinda makes sense. However, I'm still really not that experienced with it and when I want to try something new it takes me quite a while to get to grips with how I need to form the code.

The first property getter I've got will retrieve the value of a property from a specified source type TSourceObject, where the property value is of type TPropertyAsRetrieved. TPropertyAsRetrieved in this case must be assignable-to from the type of the property on TSourceObject. So TPropertyAsRetrieved could be a string IEnumerable if the property on TSourceObject was a string array, for example (as IEnumerable<string> is assignable-to from string[]).

public class CompilableAssignableTypesPropertyGetter<TSourceObject, TPropertyAsRetrieved>
    : AbstractGenericCompilablePropertyGetter<TSourceObject, TPropertyAsRetrieved>
{
    private PropertyInfo _propertyInfo;
    public CompilableAssignableTypesPropertyGetter(PropertyInfo propertyInfo)
    {
        if (propertyInfo == null)
            throw new ArgumentNullException("propertyInfo");
        if (!propertyInfo.DeclaringType.Equals(typeof(TSourceObject)))
            throw new ArgumentException("Invalid propertyInfo - DeclaringType must match TSourceObject");

        _propertyInfo = propertyInfo;
    }

    public override PropertyInfo Property
    {
        get { return _propertyInfo; }
    }

    public override Expression GetPropertyGetterExpression(Expression param)
    {
        if (param == null)
            throw new ArgumentNullException("param");
        if (!typeof(TSourceObject).IsAssignableFrom(param.Type))
            throw new ArgumentException("param.Type must be assignable to typeparam TSourceObject");

        // Prepare to grab the property value from the source object directly
        Expression getter = Expression.Property(
            param,
            _propertyInfo
        );

        // Try to convert types if not directly assignable (eg. this covers some common enum type conversions)
        var targetType = typeof(TPropertyAsRetrieved);
        if (!targetType.IsAssignableFrom(_propertyInfo.PropertyType))
            getter = Expression.Convert(getter, targetType);

        // Perform boxing, if required (eg. when enum being handled and TargetType is object)
        if (!targetType.IsValueType && _propertyInfo.PropertyType.IsValueType)
            getter = Expression.TypeAs(getter, typeof(object));

        return getter;
    }
}

In order to keep the interesting compilable getter code separate from the boring stuff which implements the rest of IPropertyGetter, I've used a base class AbstractGenericCompilablePropertyGetter -

public abstract class AbstractGenericCompilablePropertyGetter<TSourceObject, TPropertyAsRetrieved>
    : ICompilablePropertyGetter
{
    private Lazy<Func<TSourceObject, TPropertyAsRetrieved>> _getter;
    public AbstractGenericCompilablePropertyGetter()
    {
        _getter = new Lazy<Func<TSourceObject, TPropertyAsRetrieved>>(generateGetter, true);
    }

    public Type SrcType
    {
        get { return typeof(TSourceObject); }
    }

    public abstract PropertyInfo Property { get; }

    public Type TargetType
    {
        get { return typeof(TPropertyAsRetrieved); }
    }

    object IPropertyGetter.GetValue(object src)
    {
        if (src == null)
            throw new ArgumentNullException("src");
        if (!src.GetType().Equals(typeof(TSourceObject)))
            throw new ArgumentException("The type of src must match typeparam TSourceObject");
        return GetValue((TSourceObject)src);
    }

    public TPropertyAsRetrieved GetValue(TSourceObject src)
    {
        if (src == null)
            throw new ArgumentNullException("src");
        return _getter.Value(src);
    }

    public abstract Expression GetPropertyGetterExpression(Expression param);

    private Func<TSourceObject, TPropertyAsRetrieved> generateGetter()
    {
        var param = Expression.Parameter(typeof(TSourceObject), "src");
        return Expression.Lambda<Func<TSourceObject, TPropertyAsRetrieved>>(
            GetPropertyGetterExpression(param),
            param
        ).Compile();
    }
}

Compilable Type-Converter-By-Constructor

The general concept for this is straight-forward; a CompilableTypeConverterByConstructor<TSource, TDest> class will take a set of compilable property getters and a ConstructorInfo reference (that is used to instantiates instances of TDest and that takes the same number of parameters are there are property getters specified). The compilable type converter generates a LINQ Expression to perform the translation from TSource to TDest, given a ParameterExpression for the source object -

public Expression GetTypeConverterExpression(Expression param)
{
    if (param == null)
        throw new ArgumentNullException("param");
    if (!typeof(TSource).IsAssignableFrom(param.Type))
        throw new ArgumentException("param.Type must be assignable to typeparam TSource");

    // Instantiate expressions for each constructor parameter by using each of the
    // property getters against the source value
    var constructorParameterExpressions = new List<Expression>();
    foreach (var constructorParameter in _constructor.GetParameters())
    {
        var index = constructorParameterExpressions.Count;
        constructorParameterExpressions.Add(
            _propertyGetters[index].GetPropertyGetterExpression(param)
        );
    }

    // Return an expression that to instantiate a new TDest by using property getters
    // as constructor arguments
    return Expression.Condition(
        Expression.Equal(
            param,
            Expression.Constant(null)
        ),
        Expression.Constant(default(TDest), typeof(TDest)),
        Expression.New(
            _constructor,
            constructorParameterExpressions.ToArray()
        )
    );
}

There's some handling in there to return default(TDest) if a null source reference is passed in but there are no other particular areas of note.

Limitations

There's a lot more work to be done down this avenue, since currently there's only Compilable Property Getters for Assignable Types (where no real conversion is happening) and Enums (where lookups from the source values to destination values are attempted by name before falling back to a straight numeric mapping). The code as described here is available in this tagged release:

https://github.com/ProductiveRage/AutoMapper-By-Constructor-1/tree/LinqExpressionPropertyGetters

However, there's more on the way! I want to be able to take these simple compilable classes and use them to create more complicated type converters, so that once we have a compilable converter from:

public class SourceRole
{
    public string Description { get; set; }
}

public class DestRole
{
    public DestRole(string description)
    {
        Description = description;
    }
    public string Description { get; private set; }
}

we could leverage it translate

public class SourceEmployee
{
    public string Name { get; set; }
    public SourceRole Role { get; set; }
}

public class DestEmployee
{
    public DestEmployee(string name, DestRole role)
    {
        Name = name;
        Roles = roles;
    }
    public string Name { get; private set; }
    public DestRole Role { get; private set; }
}

or:

public class SourceRole
{
    public string Description { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

public class SourceEmployee
{
    public string Name { get; set; }
    public IEnumerable<SourceRole> Roles { get; set; }
}

public class DestRole
{
    public DestRole(string description, DateTime startDate, DateTime endDate)
    {
        Description = description;
        StartDate = startDate;
        EndDate = endDate
    }
    public string Description { get; private set; }
    public DateTime StartDate { get; private set; }
    public DateTime EndDate { get; private set; }
}

public class DestEmployee
{
    public DestEmployee(string name, IEnumerable<DestRole> roles)
    {
        Name = name;
        Roles = roles;
    }
    public string Name { get; private set; }
    public IEnumerable<DestRole> Roles { get; private set; }
}

.. something similar to the way in which AutoMapper's CreateMap method works.

Update (2nd January 2012)

I've finally got round to writing up this conclusion; here.

Posted at 20:11

Tags:

Comments

25 April 2011

Teaching AutoMapper about "verbose constructors"

As I alluded to in an earlier post (The joys of AutoMapper), I've been wanting to look into a way to get AutoMapper to work with these once-instantiated / always-valid / verbose-constructor classes I'm such a fan of. As I'd hoped, it's actually not that big of a deal and I've put together a demo project:

https://github.com/ProductiveRage/AutoMapper-By-Constructor-1

There's an example in that download (and at the bottom of this post) if curiosity gets the better of you but I'm going to step through an outline of the solution here.

Before we get going, it's worth noting that I'm hoping to expand on this solution and improve it in a number of areas - to make life easier if you're starting with this post, I've tagged the repository as "FirstImplementation" in its current state, so for the solution in its current form (as I'm about to describe), it may be best to download it from here:

https://github.com/ProductiveRage/AutoMapper-By-Constructor-1/tree/FirstImplementation

The Plan

Take two Types - srcType and destType - and consider every constructor in destType..
For each argument in the constructor, try to find a property in srcType that can be used as the value for that argument

That property must meet name-matching criteria
Its value must be mappable to the constructor argument's type

If multiple destType constructors can be called using srcType's data, the most appropriate one must be selected
The ConstructorInfo reference and a list of "Property Getters" are handed off to a "Type Converter" class that now has all of the necessary information to create a new instance of destType given a srcType reference

A "Property Getter" is an object that can retrieve the value of a specified PropertyInfo from a srcType instance and cast that value to a particular type (ie. the type that the destType constructor argument requires)

This "Type Converter" will expose a Convert method that accepts a srcType reference and returns a new destType instance - we can pass this as a Func<srcType, destType> to an AutoMapper ConstructUsing method call and we're all done!

The Players

There's a class that tries to locate a property on srcType which can be used as a particular constructor argument:

public interface IPropertyGetterFactory
{
    IPropertyGetter Get(Type srcType, string propertyName, Type destPropertyType);
}

The IPropertyGetterFactory implementation will apply the name-matching criteria - it will compare "propertyName" to the actual names of properties on srcType - so it will have access to:

public interface INameMatcher
{
    bool IsMatch(string from, string to);
}

If the IPropertyGetterFactory manages to find a property name / type match it return an IPropertyGetter:

public interface IPropertyGetter
{
    Type SrcType { get; }
    PropertyInfo Property { get; }
    Type TargetType { get; }
    object GetValue(object src);
}

We have a class which considers all of the constructors of destType and tries to match up their argument names to srcType properties using an IPropertyGetterFactory:

public interface ITypeConverterByConstructorFactory
{
    ITypeConverterByConstructor<TSource, TDest> Get<TSource, TDest>();
}

If ITypeConverterByConstructorFactory is able to find destType constructors whose arguments can be fully populated by srcType data, it returns:

public interface ITypeConverterByConstructor<TSource, TDest>
{
    TDest Convert(TSource src);
    ConstructorInfo Constructor { get; }
    IEnumerable<PropertyInfo> SrcProperties { get; }
}

The ITypeConverterByConstructor may make use of an IConstructorInvoker implementation which handles the passing of the arguments to the constructor to create the new destType instance.

public interface IConstructorInvokerFactory
{
    IConstructorInvoker<T> Get<T>(ConstructorInfo constructor);
}

public interface IConstructorInvoker<TDest>
{
    TDest Invoke(object[] args);
}

For the cases where multiple destType constructors where available, a way to decide which is best is required (in most cases, we'll probably be interested in the constructor which has the most arguments, but there might be special cases):

public interface ITypeConverterPrioritiserFactory
{
    ITypeConverterPrioritiser<TSource, TDest> Get<TSource, TDest>();
}

public interface ITypeConverterPrioritiser<TSource, TDest>
{
    ITypeConverterByConstructor<TSource, TDest> Get(IEnumerable<ITypeConverterByConstructor<TSource, TDest>> options);
}

Some of the key elements - ITypeConverterByConstructor, IConstructorInvoker, ITypeConverterPrioritiser - have generic typeparams specified but the ITypeConverterByConstructorFactory that prepares the ITypeConverterByConstructor does not; I wanted to be able to use one ITypeConverterByConstructorFactory instance to prepare converters for various combinations of srcType, destType. This is why these key elements have factory interfaces to instantiate them - the factory class will have no typeparam specification but will create "worker" classes that do. IPropertyGetter is an exception to this pattern as I was expecting to have to have to maintain a list of them in each ITypeConverterByConstructor and so they would have to at least share a interface without typeparams.

The Plan - Re-written

Pass srcType and destType to an ITypeConverterByConstructorFactory and call Get, this will consider each of destType's constructors and determine which can have its arguments specified with data from srcType's properties by..
Calling Get on an IPropertyGetterFactory for each argument name and type, passing the srcType

IPropertyGetterFactory will use an INameMatcher to compare property names to the specified argument name
IPropertyGetterFactory will use its own judgement to determine whether the property mapping is valid (eg. the AutoMapperEnabledPropertyGetterFactory will allow properties whose type can be converted as required by AutoMapper)

The ITypeConverterByConstructorFactory may now have multiple ITypeConverterByConstructor instances (each will represent a ConstructorInfo and contain IPropertyGetters to retrieve data from the srcType to satisfy all of the constructor's arguments), it will use an ITypeConverterPrioritiser to pick the best one
This ITypeConverterByConstructor<srcType, destType> has a method Convert which returns a new destType instance given a srcType reference - success! This allows us to hook up AutoMapper with CreateMap and ConstructUsing.

The implementation

These interfaces and corresponding classes can all be found in the GitHub repository and hopefully it will make a reasonable amount of sense now that everything's been outlined here. With a basic knowledge of reflection and AutoMapper hopefully the code won't be too difficult to read through and there are examples both in the solution itself and in the Readme.

Again, there is a repository branch that only covers what's discussed here and not all the following work I'm planning for it:

https://github.com/ProductiveRage/AutoMapper-By-Constructor-1/tree/FirstImplementation

And now?

I'm happy I've solved the initial case I set out to, but it seems now like AutoMapper needn't be as key as I was first envisaging! For cases where the types don't all match up into nice assignable-to conversions, AutoMapper definitely comes in handy - but one class of cases I'd like to use this for would be converting from (asmx) webservice interface objects (where all properties have loose getters and setters) to a validated-by-constructor class. Most of the time the property types would match and wouldn't need AutoMapper. And then maybe the conversion could be compiled using IL generation or Linq Expressions so that it would be as fast as hand-written code, just without the opportunity for typos.. Intriguing!

Example

// Get a no-frills, run-of-the-mill AutoMapper Configuration reference..
var mapperConfig = new Configuration(
    new TypeMapFactory(),
    AutoMapper.Mappers.MapperRegistry.AllMappers()
);
mapperConfig.SourceMemberNamingConvention = new LowerUnderscoreNamingConvention();

// .. teach it the SourceType.Sub1 to DestType.Sub1 mapping (unfortunately AutoMapper can't
// magically handle nested types)
mapperConfig.CreateMap<SourceType.Sub1, ConstructorDestType.Sub1>();

// If the translatorFactory is unable to find any constructors it can use for the conversion,
// the translatorFactory.Get method will return null
var translatorFactory = new SimpleTypeConverterByConstructorFactory(
    new ArgsLengthTypeConverterPrioritiserFactory(),
    new SimpleConstructorInvokerFactory(),
    new AutoMapperEnabledPropertyGetterFactory(
        new CaseInsensitiveSkipUnderscoreNameMatcher(),
        mapperConfig
    )
);
var translator = translatorFactory.Get<SourceType, ConstructorDestType>();
if (translator == null)
    throw new Exception("Unable to obtain a mapping");

// Make our translation available to the AutoMapper configuration
mapperConfig.CreateMap<SourceType, ConstructorDestType>().ConstructUsing(translator.Convert);

// Let AutoMapper do its thing!
var dest = (new MappingEngine(mapperConfig)).Map<SourceType, ConstructorDestType>(
    new SourceType()
    {
        Value = new SourceType.Sub1() { Name = "Test1" },
        ValueList = new[]
        {
            new SourceType.Sub1() { Name = "Test2" },
            new SourceType.Sub1() { Name = "Test3" }
        },
        ValueEnum = SourceType.Sub2.EnumValue2
    }
);

public class SourceType
{
    public Sub1 Value { get; set; }
    public IEnumerable<Sub1> ValueList { get; set; }
    public Sub2 ValueEnum { get; set; }

    public class Sub1
    {
        public string Name { get; set; }
    }

    public enum Sub2
    {
        EnumValue1,
        EnumValue2,
        EnumValue3
    }
}

public class ConstructorDestType
{
    private Sub1 _value;
    private IEnumerable<Sub1> _valueList;
    private Sub2 _valueEnum;
    public ConstructorDestType(Sub1 value, IEnumerable<Sub1> valueList, Sub2 valueEnum)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        if (valueList == null)
            throw new ArgumentNullException("valueList");
        if (!Enum.IsDefined(typeof(Sub2), valueEnum))
            throw new ArgumentOutOfRangeException("valueEnum");
        _value = value;
        _valueList = valueList;
        _valueEnum = valueEnum;
    }

    public Sub1 Value { get { return _value; } }
    public IEnumerable<Sub1> ValueList { get { return _valueList; } }
    public Sub2 ValueEnum { get { return _valueEnum; } }

    public class Sub1
    {
        public string Name { get; set; }
    }

    public enum Sub2
    {
        EnumValue1,
        EnumValue_2,
        EnumValue3
    }
}

Posted at 17:58

Tags:

Comments

You may also be interested in (see here for information about how these are generated):

You may also be interested in:

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

You may also be interested in (see here for information about how these are generated):

About

Recent Posts

Highlights

Archives