Tuesday, May 22, 2012

Expression Trees - Part I

Expression trees are the representation of code as data structure instead of executable code. We have used expression trees for the discussion regarding INotifyPropertyChanged but never got a chance to discuss the concept itself.

The feature was introduced in .net framework to support meta-programming in .net framework i.e. to be able to treat code as data. The feature enables us to analyze the code at runtime. It also allows the run-time creation of code. In INotifyPropertyChanged example, we used Expression trees to generate new code based on the expression passed as argument. This involved code analysis and generation which was only possible by the use of expression trees.

Expression tree works by creating a semantic model of code. This semantic model can be used by compilation tools to generate runnable code.

What does it enable me?
Expression trees enables various feature both in the framework and to the developers. It can be used for:
  1. Dynamic modification of executable code.
  2. The execution of LINQ query.
  3. Creation of dynamic queries.
Expression Trees In .Net Framework
The expression tree types were introduced in .net 3.5. They are further enhanced in .net 4.0. The types can be found in System.Linq.Expression namespace in System.Core assembly. The class hierarchy of the types can be presented as follows:

Amongst the class in the hierarchy, there are three sealed classes:
  1. BlockExpression
  2. GotoExpression
  3. Expression<TDelegate>
How to create Expression Trees?
Expression trees can be created using the following:
  1. Using a lambda expression
  2. Manual creation using the available API
We will be discussing both of them in this post.

Assignment of Lambda Expression
A lambda expression can be assigned to an expression tree or a delegate. It cannot be assigned to an implicitly typed variable. Basically C# compiler emits an expression tree or delegate based on the destination variable type. If we are using an implicitly typed variable then it would not be possible for the compiler to decide and hence the error.

We can also base our expression tree off of Action delegate if the expression is not returning any data. In the following example we are just creating an Expression Tree which just prints a string on the console. It doesn't return any data so we are using Action delegate here. We compile the expression into delegate in the same way as we did before. Now we can use it like a regular delegate.

When a lambda expression / anonymous function is assigned to a delegate it creates a reference for the executable code which might be used to call the executable function. On the other hand, assigning to expression tree creates a representation of the code of the lambda expression (anonymous function). This can not be executed directly but it may be used to find out how the executable code would look like. We can also tweak the code by playing with the nodes of expression tree. We can also execute other code based on the nodes in expression tree. That is exactly what we did with the example referred in first paragraph. In the example code, we determine the property (member expression) in the expression tree and then we raise PropertyChanged event for the said property. Doing this relieved us from using the magic string in the setters of our models and view models.

Conversion between Expression Tree & Delegate
An expression tree can be converted into a delegate by compiling it. It must be remembered that the reverse is not true i.e. a delegate can not be converted to an expression tree.

If a conversion exists from an anonymous function to a delegate type D, a conversion also exists to the expression tree type Expression<D>. Based on the same principle, the compiler converts the lambda expression, passed as argument, to the type of parameter. The parameter type might be either ExpressionTree or Delegate.

Building Expression Trees:
Expression trees are a lot easier to build if we think bottom-up. We can start building from leaf and work our way towards the top of the tree. It is a lot easier if we make a rough sketch of how the tree structure would be and then build the expression tree.

Let's first build a tree to represent the expression in the above algorithm.

If we were using lambda expression, then we can easily create a lambda to assign to a variable of Expression type. We can also use this to create a delegate. We can also use the delegate to generate some result.

Here the lambda expression returns and integer based on the evaluation of an expression. The lambda expression is assigned to an Expression type of variable. This is then compiled into a delegate as described above. The result is computed by the delegate and result is returned. This is rather confusing and not recommended use of expression tree. Why in the world we created the expression when we just needed an executable code. We could have rather assigned the lambda expression to a Func delegate and use that for computing. Or we could just return the result of Lambda expression and the framework would take care of creating an implicit anonymous function for the lambda expression. We would certainly never use this for production code. But in order to understand the feature, we would continue to use this to check the correctness of our expressions.

How to introduce variables:
More than usual we need expressions involving variables. In Expression Trees, variables are specified using Parameter Expressions. We have two options to declare parameter expressions.
  1. We can instantiate ParameterExpression directly.
  2. We can use Expression's factory method i.e. Expression.Variable to get the instance of parameter expression.
I prefer using the 2nd option and that seems to be a recommendation in documentation as well.

We can also use Expression.Parameter to create a ParameterExpression. In the following example, above example is recreated just changing Expression.Variable to Expression.Parameter.

This seems similar to the previous example except that we introduced Expression.Parameter instead of Expression.Variable.

Debugging Expression Trees
Visual Studio has little support of debugging expression tree. We can see how expression tree structure looks like while debugging. Just hover over an expression tree variable and use TextVisualizer to see DebugView.

The debug view shows the expression tree in a particular format. Different Node types are represented differently. The details of the format might be found here [http://msdn.microsoft.com/en-us/library/ee725345%28VS.100%29.aspx]

We can also use Html Visualizer to see the expression tree in the same format as the Text Visualizer.

Instead of hovering over the expression variable, we can use quick watch to see the same expression tree.

We can also use Immediate Window to see the same result.

Expression Trees & Dynamic Language Runtime
.Net CLR makes use of Dynamic Language Runtime [DLR] to provide support to dynamic languages or dynamic features of a language. The runtime uses expression trees for the dynamic code. It passes the expression tree to the appropriate runtime binder. The runtime binder selection would depend on the dynamic code including C# dynamic binder, Iron Python dynamic binder or for legacy COM Objects.

The runtime binder maps the request to the target's object call structure.

While binding, it might not find the expected properties / operations on the target object which might result in some exceptions. Now you realize the Microsoft.CSharp assembly reference to your .net 4.0 projects.

The use of expression trees by DLR happens on backend by the runtime. All you need to care about is the exception which could result.

It must be remembered that lambda statements cannot be used to build expression tree. These are lambdas specified in terms of code blocks [in c# they are defined within the braces {lambda_stamement}] which might return some values.

No comments: