reublog - Code Generation on .NET

This is the first part in what's hopefully a series of short posts covering code generation on the .NET platform.

Almost every .NET application relies on code generation in some form, usually because it depends on a library which generates code as a part of how it functions. Eg, Json.NET leverages code generation, and so do ASP.NET, Entity Framework, Orleans, most serialization libraries, many dependency injection libraries, and probably every test mocking library.

Let's skip past why code generation is useful and jump straight into a high level overview of code generation technologies for .NET.

Kinds of Code Generation

The 3 code gen methods for .NET which we'll discuss are: Expression Trees, IL Generation, and Syntax Generation. There are other methods, such as text templating (eg using T4). Here are the pros and cons of each as I see them.

Expression Trees

Using LINQ Expression Trees to compile expressions at runtime.

Easy to use, expressive API.
Allows some level of access to private members.
Not fully supported on AOT-only platforms like iOS. Expression Trees will be interpreted instead of compiled.
Not all language constructs are supported.
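
To make this concrete, here is a minimal sketch of building and compiling an expression tree at runtime. The Counter type and its count field are hypothetical, made up for this illustration; everything else comes from System.Linq.Expressions and System.Reflection.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical type, used only for this illustration.
public class Counter
{
    private int count = 41;
}

public static class ExpressionTreeExample
{
    // Builds and compiles the equivalent of: (Counter c) => c.count + 1
    // Plain C# outside of Counter could not read the private field directly.
    public static Func<Counter, int> BuildNextCountGetter()
    {
        FieldInfo field = typeof(Counter).GetField(
                "count", BindingFlags.Instance | BindingFlags.NonPublic)
            ?? throw new MissingFieldException(nameof(Counter), "count");

        ParameterExpression instance = Expression.Parameter(typeof(Counter), "c");
        Expression body = Expression.Add(
            Expression.Field(instance, field),
            Expression.Constant(1));

        // Compile() turns the tree into a delegate at runtime.
        return Expression.Lambda<Func<Counter, int>>(body, instance).Compile();
    }
}
```

Calling ExpressionTreeExample.BuildNextCountGetter() once and caching the delegate is the usual pattern, since compilation is the expensive step. Invoking the delegate with a new Counter returns 42.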

IL Generation

Using Reflection.Emit to dynamically create types and methods using Common Intermediate Language (known as CIL or just IL), which is the assembly language of the CLR.

Can produce code which cannot be expressed in C#, eg access private members of some type.
Very verbose.
Very difficult to debug: Visual Studio will not show you IL for dynamic methods; they are represented as an opaque Lightweight Function entry in the stack trace view.
Difficult to implement higher level features like C#'s async/await.
Not supported on AOT-only platforms like iOS.
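
As a rough illustration of the verbosity and of the private member access mentioned above, here is a small sketch that emits a three-instruction method with DynamicMethod. The Secret type and its secretNumber field are hypothetical; the Reflection.Emit calls are the standard ones.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical type, used only for this illustration.
public class Secret
{
    private int secretNumber;
    public Secret(int secretNumber) { this.secretNumber = secretNumber; }
}

public static class IlGenerationExample
{
    // Emits the IL equivalent of: int GetSecretNumber(Secret s) => s.secretNumber;
    public static Func<Secret, int> BuildFieldGetter()
    {
        FieldInfo field = typeof(Secret).GetField(
                "secretNumber", BindingFlags.Instance | BindingFlags.NonPublic)
            ?? throw new MissingFieldException(nameof(Secret), "secretNumber");

        var method = new DynamicMethod(
            "GetSecretNumber",
            typeof(int),               // return type
            new[] { typeof(Secret) },  // parameter types
            typeof(Secret),            // owner: the method is associated with this type
            skipVisibility: true);     // skip JIT visibility checks

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);      // push the Secret argument
        il.Emit(OpCodes.Ldfld, field); // replace it with the field's value
        il.Emit(OpCodes.Ret);          // return the value on the stack

        return (Func<Secret, int>)method.CreateDelegate(typeof(Func<Secret, int>));
    }
}
```

Even a one-line getter needs explicit stack management here, which is why anything larger gets hard to read and debug quickly.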

Syntax Generation

Using Roslyn or some other API to generate C# syntax trees or source code and compile it either at runtime or when the target project is built.

Easy access to all C# language features.
Supported on AOT-only platforms, since output is C# code which can be compiled.
Well supported: it's the C# compiler, it's not going away or being deprecated any time soon.
API isn't always obvious since the primary purpose of the API is parsing/compiling code rather than generating it.
Note: to support runtime code generation you need to include Roslyn with your app, which can add around 6MB to your distribution.
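
For the runtime flavor of this approach, a minimal sketch might look like the following. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced, and it parses source text rather than building syntax nodes by hand. The Generated class and its Add method are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class SyntaxGenerationExample
{
    public static Func<int, int, int> BuildAdder()
    {
        // The "generated" code: ordinary C# source held in a string.
        const string source = @"
public static class Generated
{
    public static int Add(int a, int b) => a + b;
}";
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        // Assumption: referencing the core library alone is enough for this
        // tiny example. Assembly.Location can be empty in single-file deployments.
        CSharpCompilation compilation = CSharpCompilation.Create(
            "GeneratedAssembly",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using var stream = new MemoryStream();
        var result = compilation.Emit(stream);
        if (!result.Success)
            throw new InvalidOperationException(
                string.Join(Environment.NewLine, result.Diagnostics));

        // Load the compiled assembly and bind a delegate to the generated method.
        Assembly assembly = Assembly.Load(stream.ToArray());
        MethodInfo add = assembly.GetType("Generated", throwOnError: true)!
            .GetMethod("Add")!;
        return (Func<int, int, int>)add.CreateDelegate(typeof(Func<int, int, int>));
    }
}
```

When generation happens at build time instead, the same source text would simply be written to a file and compiled along with the rest of the project, so the app does not need to ship Roslyn.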

Orleans

Microsoft Orleans uses the latter two approaches: IL and Roslyn. It uses Roslyn wherever possible, since it allows for easy access to C# language features like async and since it's easy to comprehend both the code generator and the generated code. Otherwise, IL generation is used for two things:

Generating code at runtime. For example ILSerializerGenerator generates serializers as a last resort for types which C# serializers couldn't be generated for (for example, private inner classes). It's a faster and less restricted alternative to .NET's BinaryFormatter.
Producing code which cannot be expressed in C#. For example, FieldUtils provides access to private fields and methods for serialization.

General Strategy

Regardless of which technology a library makes use of, code generation typically involves two phases:

Metadata Collection
- The code generator takes some input and creates an abstract representation of it in order to drive the code synthesis process.
- Eg, a library for deeply cloning objects might take a Type as input and generate an object describing each field in that type.
Code Synthesis
- The code generator uses the metadata model to drive the process of actually generating code (LINQ expressions, IL instructions, syntax tree nodes).
- Eg, our deep cloning library will generate a method which takes an object of the type specified in the metadata model and then recursively copies each of its fields.
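
The two phases above can be sketched roughly as follows. This is a simplification under stated assumptions: it produces a shallow, field-by-field copy rather than a recursive deep clone, it uses expression trees for the synthesis step, and CopyModel, CopierGenerator and Point are all hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical metadata model: the fields a copier must handle for one type.
public sealed class CopyModel
{
    public CopyModel(Type type, IReadOnlyList<FieldInfo> fields)
    {
        Type = type;
        Fields = fields;
    }

    public Type Type { get; }
    public IReadOnlyList<FieldInfo> Fields { get; }
}

// Hypothetical sample type to copy.
public class Point
{
    public int X;
    private string label = "origin";
    public string Label { get => label; set => label = value; }
}

public static class CopierGenerator
{
    // Phase 1: metadata collection. Inspect the type and describe its fields.
    // Readonly fields are skipped because expression trees may refuse to assign
    // them; private fields declared on base classes are not returned by this call.
    public static CopyModel CollectMetadata(Type type)
    {
        List<FieldInfo> fields = type
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .Where(f => !f.IsInitOnly)
            .ToList();
        return new CopyModel(type, fields);
    }

    // Phase 2: code synthesis. Turn the model into a compiled copy method.
    public static Func<T, T> Synthesize<T>(CopyModel model) where T : new()
    {
        if (model.Type != typeof(T))
            throw new ArgumentException("Model does not describe " + typeof(T), nameof(model));

        ParameterExpression source = Expression.Parameter(typeof(T), "source");
        ParameterExpression copy = Expression.Variable(typeof(T), "copy");

        // copy = new T();
        var statements = new List<Expression>
        {
            Expression.Assign(copy, Expression.New(typeof(T)))
        };

        // copy.field = source.field; for every field in the model
        foreach (FieldInfo field in model.Fields)
        {
            statements.Add(Expression.Assign(
                Expression.Field(copy, field),
                Expression.Field(source, field)));
        }

        statements.Add(copy); // the last expression is the block's result

        BlockExpression body = Expression.Block(new[] { copy }, statements);
        return Expression.Lambda<Func<T, T>>(body, source).Compile();
    }
}
```

Keeping the model separate from the synthesis step means the same CopyModel could later drive an IL or Roslyn back end without touching the reflection code.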

The two phases can be merged for simple code generators. Orleans uses two phases. In phase 1, the input assembly is scanned and metadata is collected for types matching various criteria: Grain classes, Grain interfaces, serializable types, and custom serializer registrations. In phase 2, support classes are generated. For example, each grain interface has two classes generated: an RPC proxy and an RPC stub.

Conclusion

That's enough for now. Maybe next time we'll take a look at writing that hypothetical deep cloning library using IL generation. After that, we can take a look at a serialization library I've been working on which uses Roslyn for both metadata collection and syntax generation. If either of those things is interesting to you, let me know here or on Twitter.

Next Post: .NET IL Generation - Writing DeepCopy