1. Table of Contents
- Table of Contents
- Introduction
- An introduction to IL
- Under the hood of everyday .NET code
- 4.1. The case of Properties
- 4.2. The case of With
- 4.3. The case of For Each
- 4.4. The case of Lambda Expressions
- 4.4.1. An easy level example
- 4.4.2. A medium level example
- 4.4.3. A hard level example
- 4.4.4. An insane level example
- 4.5. The case of Anonymous Types
- 4.6. The case of Cases
- 4.7. The case of Iterators
- Emitting IL using VB or C#
- Generating IL using Expression Trees
- The curious case of F#
- Afterword
2. Introduction
Well over a year ago, when I was still pretty new to .NET and programming in general, I attended a lecture by Bart de Smet titled "Behind the Scenes of 10 C# Language Features". I was unfamiliar with most of the features, I was unfamiliar with C# (I'm a VB programmer by trade), I had never even heard of Intermediate Language (which was abundant!) and even for seasoned C# veterans this lecture was tough. Needless to say I had a great time! Even though I hardly got anything Bart said that day, it did inspire me. It inspired me to look beyond the code I was writing and it inspired me to write this article. And when I watched the lecture again on Channel9 just last week, I understood what Bart was saying.
Here is a link to the video of that lecture: Bart de Smet - Behind the Scenes of 10 C# Language Features
Throughout this article I will make references to this video when I think it could help to better understand the topic I am discussing. Needless to say I encourage you to watch the entire video, although it is not necessary to understand this article. Is this article the written version of the video? Most certainly not! Although the video and this article have some overlap, I hope they are mostly complementary.
Another source of inspiration and knowledge is a book I recently reviewed for Manning Publications, called Metaprogramming in .NET by Kevin Hazzard and Jason Bock. Unfortunately the book has not been published yet, but it will be this fall. The introductory chapter is available for free, so I recommend you read it. Once again, reading this book is not necessary for understanding this article, but I might tell you to read a certain chapter when I think it could help to better understand the topic I am discussing (so I won't have to edit my article once the book is released).
So what IS necessary to understand this article? For starters, a bit of persistence and goodwill. This article is BIG and I realize that. On top of that, the topics discussed in this article are not easy, but I'll make sure you'll get there. A little understanding of Intermediate Language comes in handy. If you have never heard of Intermediate Language (or IL), or you do know it and think it is really very scary (which I would totally understand), don't turn away yet, I'll explain about it in a bit. Furthermore we'll see some .NET constructs you may or may not yet be familiar with, such as Auto-Properties, Anonymous Types, Lambda Expressions and Iterator Methods. Again, don't sweat it. It sounds harder than it is.
I would advise you not to read this article all at once. Take a break every now and then, put it in your bookmarks and read on tomorrow evening. Let the newfound knowledge settle into your brain before continuing. I wish you lots of pleasure reading this. So are we ready? Let's go!
3. An introduction to IL
So as I said, we will look at IL (Intermediate Language). What is this IL? You could say sweet dreams are made of these! All the .NET code you will ever write is compiled into IL. This means the code you write in a .NET language (C#, VB, F#...) is transformed into this language. This IL is then turned into machine code by the JIT compiler at run time, so your software does what it does. Did you click that link there? Did you notice that IL refers to a broader concept that is not unique to .NET? That is because when I say IL I actually mean CIL or MSIL! So it's actually this Common IL or Microsoft IL that we are going to look at. So why are we looking at this IL? We can see what happens from looking at our VB or C# code, right? Sure, but what is REALLY going on is only visible at the IL level (or even at the assembly level, but let's not go there). For example, For Each (VB) or foreach (C#) cannot be literally turned into IL. What really happens are some method calls on the IEnumerable and IEnumerator interfaces, as we will see in an example later in this article.
Before we look at an example of IL, let me tell you why you would want to learn IL. First of all, learning a new language can be good fun, and any new language you learn will make the next language easier to learn. Second, IL is not like any higher-level language such as VB or C#. Learning how things are handled in other languages makes you think about how you code in your own language. Whether you use this newfound knowledge or not is up to you. At least you know the alternatives. Another good reason to learn IL specifically is that it gives you a better understanding of how .NET works under the hood. Whether you value such knowledge or not is up to you, but at least you can brag about it to your colleagues. A more practical reason to learn IL is that you can write, compile and execute your own IL at runtime using Reflection.Emit. Doing this might be useful because in IL you can use constructs that are not available in VB or C#. As a bonus, code emitted through Reflection.Emit generally runs about as fast as regular compiled code, which makes it one of the fastest forms of dynamic code generation you will find. We will see an example of this at the end of this article. I hear you thinking: you never needed this before, so why would you need it now? The truth is that you probably won't, but it is good to know the options are open to you.
So you must now be eager to see some IL! Let's first look at a simple Hello World example (yes, really). Open up Visual Studio and create a new Console Application, either in VB or C#. Paste the following code into the (parameterless) Main method.
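A minimal C# sketch of such a Main method, reconstructed from the IL discussed below (a string local named s, the literal "Hello IL!", one call to Console.WriteLine and one to Console.ReadKey). The exact original listing is an assumption; only those four details are taken from the IL.

```csharp
using System;

class Program
{
    static void Main()
    {
        // A single local string; it shows up in the IL as local variable 0.
        string s = "Hello IL!";
        Console.WriteLine(s);

        // Keeps the console window open; the returned ConsoleKeyInfo is discarded.
        Console.ReadKey();
    }
}
```

Build it in Debug mode if you want the nop instructions to show up the way they do in the listing further down.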
Whether you can see it or not, IL was emitted when you built this. You can look at the IL of an assembly by using the Intermediate Language Disassembler, or ILDASM for short. It is included in the Microsoft SDKs. If you have Visual Studio 2010 installed you should have no trouble finding it (you can simply use the 'Find' tool to search for ILDASM). In case you can't find it, you can download an older version of ILDASM right here. So let's start it up and you should get a window looking something like this:
Now try opening the Console Application you just created. Go to File -> Open and select the ConsoleApplication (make sure you saved and built your project). It should be available in the bin\debug folder. You should now get a tree view containing Namespaces, Classes, and Methods.
You can double-click on any Method to see its IL. Whether you have created your Console Application in VB or C# does not matter. The IL will be mostly the same. The part of the IL we will be looking at is the following part, which should be the same for VB and C#.
.locals init ([0] string s)
IL_0000: nop
IL_0001: ldstr "Hello IL!"
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: call void [mscorlib]System.Console::WriteLine(string)
IL_000d: nop
IL_000e: call valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
IL_0013: pop
IL_0014: ret
} // end of method Program::Main
Yikes! Now that is pretty scary code! No, it isn't. We will look at it one line at a time. But before we do, there is something you should know about IL. IL is a stack-based language. That means values are pushed onto a stack (just literally think of it as a stack of values) and are 'consumed' from the top, so the value that was pushed last is the first one to be taken off again. So let's look at the code sample. In the first line we see .locals init ([0] string s). Does this even need explanation? This is simply the IL declaration of the string s we declared in our program. The next line says nop, which is a pretty accurate description of what it does: no operation. We will ignore any nop opcodes we run across. Did I say opcodes? Yes I did, because what we see here is an opcode, an OPeration CODE, that tells the machine what to do. Basically every instruction you see in IL is an opcode, so it's a lot less scary than it sounds. Let's continue with the next line of code. This is where it gets interesting: ldstr "Hello IL!". The ldstr opcode means that a string should be pushed on the stack. In this case that string is "Hello IL!". The next line, stloc.0, stores this string in local variable 0 (basically it takes the top item off the stack, which is the string "Hello IL!", and stores it in local variable 0). So what is local variable 0? Take a look at the first line again, .locals init ([0] string s). There is your answer, the string s.
The next line says ldloc.0. Can you guess what it does? It loads the value of local variable 0 onto the stack. You will see a lot of opcodes starting with st or ld. It is safe to say these always mean STore and LoaD; you will do well to remember that. So what will IL do now that "Hello IL!" is back on the stack? It makes a call to System.Console.WriteLine, which takes a string as argument. At this point the string that is on the stack is consumed and the stack is empty again. Whatever happens inside Console.WriteLine is unknown to us (if you wish to look it up, be my guest though). When the code returns from Console.WriteLine, IL makes a call to System.Console.ReadKey, which returns the valuetype System.ConsoleKeyInfo and puts this on the stack. Since we are not using this ConsoleKeyInfo, the next opcode is pop. Pop simply removes the top value from the stack. That concludes the example and ret is emitted, meaning return to the calling code.
Was that so hard? I don't think so. There are more opcodes you will see in this article, but you have seen the basics of IL, a stack-based language.
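To make the stack idea a bit more tangible, here is a small C# sketch that replays the Hello World IL by hand on an ordinary Stack<object>. It is only an illustration and not how the runtime actually executes IL; the class name StackWalkthrough, the output list and the placeholder 'x' pushed for ReadKey are all made up for this sketch.

```csharp
using System;
using System.Collections.Generic;

static class StackWalkthrough
{
    // Replays the Hello World instructions and returns what would have been printed.
    public static List<string> Run()
    {
        var stack = new Stack<object>();   // stands in for the evaluation stack
        var locals = new object[1];        // local variable 0 is the string s
        var output = new List<string>();   // stands in for the console

        stack.Push("Hello IL!");           // ldstr "Hello IL!"
        locals[0] = stack.Pop();           // stloc.0
        stack.Push(locals[0]);             // ldloc.0
        output.Add((string)stack.Pop());   // call WriteLine(string): consumes its argument
        stack.Push('x');                   // call ReadKey(): a placeholder return value is pushed
        stack.Pop();                       // pop: the unused return value is discarded

        if (stack.Count != 0)
            throw new InvalidOperationException("The stack should be empty at ret.");
        return output;                     // ret
    }
}
```

Every push is matched by something that consumes the value again, which is why the stack is empty by the time ret is reached.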
Further reading:
ILDASM.exe Tutorial
A list of OpCodes from the OpCodes Class
The first part of chapter 5 of the book Metaprogramming in .NET
Purpose of the nop opcode?
4. Under the hood of everyday .NET code
So now that you have seen a simple Hello World program, let's look at some more interesting IL. Actually, let's look at some IL you may not have expected from looking at your code. Open one of the sample applications that can be downloaded at the top of this article. Either UnderTheHoodVB or UnderTheHoodCSharp will do. You should leave TheCuriousCaseOfFSharp alone for now. Once you open the solution you will see two projects. One project contains some Windows Forms and the other contains some Classes that may or may not be used by the WinForms in the other project. The truth is that we are not going to run some of the code; it just sits there for theoretical analysis. The code that we will be running is mostly there to show that the code really does what I say it does. You might as well run ILDASM and open either UnderTheHoodVB.Examples.dll or UnderTheHoodCSharp.Examples.dll, since that is what we will mostly be looking at. The DLLs can be found in the bin\debug folder of their respective project folders. So, are you set? Let's look at our first IL example!
4.1. The case of Properties
A question I have seen quite often in the QA section of CP is "What is the difference between a Public field and a Property?" or "Why would I use a Property instead of get and set functions?". Let's look at the first question first. In your solution open up the AutoPropertyClass, the PropertyClass and the GetterSetterClass (all three found under the PropertyExample folder). You will find the following code:
Three seemingly completely different Classes. However, if you open up ILDASM and look at the code that was generated, you will find that the classes are actually pretty much the same! The AutoPropertyClass and the PropertyClass are even exactly the same. The compiler has actually generated backing fields for the auto-properties, as well as get and set functions, just like the ones you see written in the GetterSetterClass. What more do we see? A Property is nothing more than a wrapper for a get and a set function (Properties are the red triangles in ILDASM; they are not present in the GetterSetterClass's IL).
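For reference, the three classes discussed above could look roughly like this in C#. The class names and the Text and Number members come from the article and its IL; the backing field names and the method names in GetterSetterClass are assumptions made for this sketch.

```csharp
public class AutoPropertyClass
{
    // The compiler generates a hidden backing field plus get_Text/set_Text methods.
    public string Text { get; set; }
    public int Number { get; set; }
}

public class PropertyClass
{
    // Hand-written backing fields: what the compiler does for you in AutoPropertyClass.
    private string _text;
    private int _number;

    public string Text
    {
        get { return _text; }
        set { _text = value; }
    }

    public int Number
    {
        get { return _number; }
        set { _number = value; }
    }
}

public class GetterSetterClass
{
    private string _text;
    private int _number;

    // Plain methods: same job as the accessors above, but no Property metadata wraps them.
    public string GetText() { return _text; }
    public void SetText(string value) { _text = value; }
    public int GetNumber() { return _number; }
    public void SetNumber(int value) { _number = value; }
}
```

Opening such an assembly in ILDASM should show fields and accessor methods in all three classes, with the property entries present only in the first two.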
So what happens when we get or set the value of a Property? This can be seen in the PropertyUser class. It contains three methods: one that gets and sets the Properties in the AutoPropertyClass, one that does the same for the PropertyClass and one that calls the get and set functions in the GetterSetterClass. The generated IL? It's exactly the same for all three methods!
.locals init ([0] int32 n,
[1] class UnderTheHoodVB.Examples.PropertyExample.PropertyClass p,
[2] string t)
IL_0000: nop
IL_0001: newobj instance void UnderTheHoodVB.Examples.PropertyExample.PropertyClass::.ctor()
IL_0006: stloc.1
IL_0007: ldloc.1
IL_0008: ldstr "Hello"
IL_000d: callvirt instance void UnderTheHoodVB.Examples.PropertyExample.PropertyClass::set_Text(string)
IL_0012: nop
IL_0013: ldloc.1
IL_0014: ldc.i4.s 42
IL_0016: callvirt instance void UnderTheHoodVB.Examples.PropertyExample.PropertyClass::set_Number(int32)
IL_001b: nop
IL_001c: ldloc.1
IL_001d: callvirt instance string UnderTheHoodVB.Examples.PropertyExample.PropertyClass::get_Text()
IL_0022: stloc.2
IL_0023: ldloc.1
IL_0024: callvirt instance int32 UnderTheHoodVB.Examples.PropertyExample.PropertyClass::get_Number()
IL_0029: stloc.0
IL_002a: nop
IL_002b: ret
} // end of method PropertyUser::UseTheProperties
After the Hello World example you should actually be able to read this pretty well. We see some new opcodes such as newobj, which is pretty self-explanatory. ldc.i4.s 42 might need some explanation. ldc.i4 pushes a supplied Int32 on the stack. The .s suffix marks the short form of the instruction: the value is encoded in the IL as a single signed byte (which works for anything from -128 to 127, so 42 fits just fine) and is still pushed on the stack as an Int32. What about the callvirt opcode? This is used to call overridable methods in a polymorphic manner. That is, callvirt will call the method as overridden in the derived class, even if the design-time type of the variable is the base class (as long as the object it holds is actually an instance of the derived class). Sounds difficult? Don't worry about it. In this context just assume callvirt does the same as call. So what do we see in the IL above? No such thing as a Property is called, they are all get and set methods! So why would we still use Properties? For starters, they provide an intuitive API when coding. Instead of looking for the correct function to get or set some value we simply use one Property to get or set the same value. Why not use a Public field? Well, I hope that's pretty obvious. Properties, through get and set methods, provide encapsulation and allow you to run extra code whenever a Property's value is read or written.
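If the polymorphic behavior of callvirt sounds abstract, a tiny C# sketch may help. The Animal and Dog classes and the CallvirtDemo helper are invented for this illustration and are not part of the sample solution.

```csharp
public class Animal
{
    public virtual string Speak() { return "..."; }
}

public class Dog : Animal
{
    public override string Speak() { return "Woof"; }
}

public static class CallvirtDemo
{
    public static string Describe(Animal a)
    {
        // The compiler emits callvirt Animal::Speak() here. Which override actually
        // runs is decided at run time from the real type of the object, not from
        // the declared type of the parameter.
        return a.Speak();
    }
}
```

Passing a Dog to Describe runs the Dog override even though the parameter is typed as Animal. For a method that is never overridden, as with the property accessors above, the outcome is the same as a plain call.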
That wraps up our example on Properties. Was it what you expected it to be? Let's look at another VB and C# construct and see what IL makes of it.
4.2. The case of With
Have you ever wondered what happens when you use the With keyword in VB? It allows you to set some Properties after initializing an Object without having to repeat a reference to the Object for every Property. C# knows this same construct (the object initializer), but does not have a keyword for it like VB does. Let's look at the code. You can find it under the WithExample folder.
I think by now you can guess what the IL of ExampleWithoutWith looks like, so I'm not going to discuss it. But what happens when we use that With keyword? Here is the IL:
.locals init ([0] class UnderTheHoodVB.Examples.Person p,
[1] class UnderTheHoodVB.Examples.Person VB$t_ref$S0)
IL_0000: nop
IL_0001: newobj instance void UnderTheHoodVB.Examples.Person::.ctor()
IL_0006: stloc.1
IL_0007: ldloc.1
IL_0008: ldstr "Fu"
IL_000d: callvirt instance void UnderTheHoodVB.Examples.Person::set_FirstName(string)
IL_0012: nop
IL_0013: ldloc.1
IL_0014: ldstr "Bar"
IL_0019: callvirt instance void UnderTheHoodVB.Examples.Person::set_LastName(string)
IL_001e: nop
IL_001f: ldloc.1
IL_0020: ldc.i4.s 50
IL_0022: callvirt instance void UnderTheHoodVB.Examples.Person::set_Age(int32)
IL_0027: nop
IL_0028: ldloc.1
IL_0029: stloc.0
IL_002a: nop
IL_002b: ret
} // end of method WithExample::ExampleUsingWith
The first thing you should see is the extra local variable that is initialized. It has some weird name (by which you can see I'm using the VB generated IL) that is not valid in regular VB or C# code. What we then see is that a new Person is created, but it is not assigned to Person p; it is assigned to the extra, weird variable. From then on everything is pretty normal: all the Properties are set on the extra variable. After that, at IL_0028, the weird variable is assigned to our own variable p. If you are looking at the C# generated IL you will see exactly the same, except that the extra variable has a different name. See how the compiler is playing tricks on us again?
Let's look at another, pretty easy example before we start off with the more difficult stuff.
4.3. The case of For Each
One of my favorite, yet easy, compiler tricks is how it handles the For Each... Next statement (foreach in C#). You can open up the example in the ForEachExamples folder. It contains a single Class with four methods: two methods for an iteration on an IEnumerable and two methods for an iteration on an IEnumerable(Of T) (IEnumerable<T> in C#). Of each pair, the first method simply uses the For Each keyword, while the second uses the code as it is emitted by the compiler (as seen in IL). So let's look at the code for the IEnumerable which uses For Each.
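A C# sketch of the ForEach method, based on the signature visible in the IL below (an IEnumerable named l, an Action<object> named handler and a local named obj). Anything beyond those names is an assumption, and the class is marked partial here only so that this sketch can be combined with other sketches of the same class.

```csharp
using System;
using System.Collections;

public static partial class ForEachExamples
{
    // Calls handler once for every element of l.
    public static void ForEach(IEnumerable l, Action<object> handler)
    {
        foreach (object obj in l)
        {
            handler(obj);
        }
    }
}
```

A call such as ForEachExamples.ForEach(new object[] { 1, "two" }, Console.WriteLine) would print each element on its own line.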
These methods look pretty straightforward and it's probably something you have done a million times before. The IL code that is generated is slightly different for C# and VB. So let's look at the IL code for VB. Have a look at the IL of C# at your own leisure.
.method public static void ForEach(class [mscorlib]System.Collections.IEnumerable l,
class [mscorlib]System.Action`1<object> 'handler') cil managed
{
// Code size 82 (0x52)
.maxstack 2
.locals init ([0] object obj,
[1] class [mscorlib]System.Collections.IEnumerator VB$t_ref$L0,
[2] bool VB$CG$t_bool$S0)
IL_0000: nop
IL_0001: nop
.try
{
IL_0002: ldarg.0
IL_0003: callvirt instance class [mscorlib]System.Collections.IEnumerator [mscorlib]System.Collections.IEnumerable::GetEnumerator()
IL_0008: stloc.1
IL_0009: br.s IL_0025
IL_000b: ldloc.1
IL_000c: callvirt instance object [mscorlib]System.Collections.IEnumerator::get_Current()
IL_0011: call object [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::GetObjectValue(object)
IL_0016: stloc.0
IL_0017: ldarg.1
IL_0018: ldloc.0
IL_0019: call object [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::GetObjectValue(object)
IL_001e: callvirt instance void class [mscorlib]System.Action`1<object>::Invoke(!0)
IL_0023: nop
IL_0024: nop
IL_0025: ldloc.1
IL_0026: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
IL_002b: stloc.2
IL_002c: ldloc.2
IL_002d: brtrue.s IL_000b
IL_002f: nop
IL_0030: leave.s IL_0050
} // end .try
finally
{
IL_0032: ldloc.1
IL_0033: isinst [mscorlib]System.IDisposable
IL_0038: ldnull
IL_0039: ceq
IL_003b: ldc.i4.0
IL_003c: ceq
IL_003e: stloc.2
IL_003f: ldloc.2
IL_0040: brfalse.s IL_004e
IL_0042: ldloc.1
IL_0043: isinst [mscorlib]System.IDisposable
IL_0048: callvirt instance void [mscorlib]System.IDisposable::Dispose()
IL_004d: nop
IL_004e: nop
IL_004f: endfinally
} // end handler
IL_0050: nop
IL_0051: ret
} // end of method ForEachExamples::ForEach
Wow! That is a lot of IL for such a short piece of code! I have pasted the entire IL code in here, because there are two input arguments. You see no less than two extra local variables that are created by the compiler: an IEnumerator and a Boolean (bool in IL and C#). Also, we see a Try Finally block that I really didn't put there in my code. As you can see, the first thing that is done is that a call is made to the GetEnumerator method on the IEnumerable (which is pushed on the stack by ldarg.0, or 'load argument 0', where argument means a parameter that was passed to the method). We then see a weird opcode, br.s. Whenever you see an opcode that starts with br this usually means BRanch. It is followed by an address such as IL_0025. br.s means that the code will continue executing at the specified address (a jump or GoTo if you will). So if we follow this path we will see that a call to MoveNext is made on the enumerator. The result, a Boolean (bool in C#), is stored in local variable 2. We should now be able to guess what brtrue.s means: BRanch if TRUE to the specified address. We seek out the address IL_000b and end up right after the br.s we started from. A call to get_Current (a Property!) is made. We are going to ignore the next line; it has to do with boxing the Object. VB generated IL differs from C# generated IL on this point. C# never makes the call to GetObjectValue. Next we call Invoke on the delegate we passed to the method (the Object from the call to get_Current is on the stack and is passed to the Invoke method). MoveNext is called again and the loop starts again. If MoveNext returns false we leave the try block and move on to the finally block. Here the IEnumerator is checked for type IDisposable. The isinst opcode tests whether an Object is of a specified type: it pushes the object as that type if it is, and null if it isn't. If the IEnumerator implements IDisposable then Dispose is called (to clean up resources) and the method is finished executing.
Now that we have stepped through the IL line by line we should actually be able to translate that IL back into VB or C#! That is exactly what I have done. Take a look at the following code and also notice the slight difference between VB and C#.
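A C# sketch of what such a hand-translated method might look like, following the IL step by step. The local names enumerator and disposable are made up. In the VB generated IL the GetEnumerator call sits inside the try block, as it does here; the C# compiler is expected to place it just before the try, so treat that detail as an assumption.

```csharp
using System;
using System.Collections;

public static partial class ForEachExamples
{
    public static void ForEachRewritten(IEnumerable l, Action<object> handler)
    {
        IEnumerator enumerator = null;
        try
        {
            enumerator = l.GetEnumerator();
            // MoveNext is tested before every element, just like the branch to IL_0025.
            while (enumerator.MoveNext())
            {
                object obj = enumerator.Current;   // get_Current()
                handler(obj);                      // Action<object>.Invoke
            }
        }
        finally
        {
            // isinst IDisposable: the 'as' cast yields null when the type does not match.
            IDisposable disposable = enumerator as IDisposable;
            if (disposable != null)
            {
                disposable.Dispose();
            }
        }
    }
}
```

The finally block guarantees that a disposable enumerator is cleaned up even when the handler throws.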
Now compare the IL that was generated by function ForEach and by function ForEachRewritten: they are exactly the same! That's pretty neat, isn't it? There is quite some stuff going on that you didn't know about! Who would have thunk it?
I have done the same for GenericForEach and GenericForEachRewritten (which use an IEnumerable(Of T), or IEnumerable<T> in C#). You may explore their respective IL at your own leisure. You can check whether the different functions really have the same output by starting the application and clicking the 'For each' button. You now get a Form with four buttons, which each execute one of the ForEach functions and print their output to the TextBoxes.
4.4. The case of Lambda Expressions
4.4.1. An easy level example
By now you should have gotten the hang of it. Let's look at another example of how the compiler generates something pretty different from what you typed. Lambda Expressions (a sort of anonymous function) are a great example of what the compiler can do! Open up the LambdaExamples folder in the solution and look for the EasyLambdaButtonFactory. What we are going to do is create a set of Buttons and assign a lambda expression to the Button.Click Event. Take a look at the following code.
So what do we see here? In a loop we create ten Buttons. We assign the string value "1" to each Button's Text Property. Whenever the Button is clicked we cast the sender to a Button, convert the Button's Text Property to an Integer, add 1 to it and assign the new value to the Button's Text Property. The result should be that each time you click a Button its Text is incremented by one. You can see this for yourself on the Easy lambda form by starting the application.
Let's look at the IL again. Actually, we can see some weird stuff has happened just by looking at the Class in ILDASM. It got an extra method that we did not implement!
Now where did that extra Shared (static in C#) function come from? That is our lambda expression! Just look at the IL of that thing.
.method private specialname static void _Lambda$__4(object sender,
class [mscorlib]System.EventArgs e) cil managed
{
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
// Code size 38 (0x26)
.maxstack 3
.locals init ([0] class [System.Windows.Forms]System.Windows.Forms.Button senderBtn,
[1] int32 VB$t_i4$S0)
IL_0000: nop
IL_0001: ldarg.0
IL_0002: castclass [System.Windows.Forms]System.Windows.Forms.Button
IL_0007: stloc.0
IL_0008: ldloc.0
IL_0009: ldloc.0
IL_000a: callvirt instance string [System.Windows.Forms]System.Windows.Forms.ButtonBase::get_Text()
IL_000f: call int32 [mscorlib]System.Convert::ToInt32(string)
IL_0014: ldc.i4.1
IL_0015: add.ovf
IL_0016: stloc.1
IL_0017: ldloca.s VB$t_i4$S0
IL_0019: call instance string [mscorlib]System.Int32::ToString()
IL_001e: callvirt instance void [System.Windows.Forms]System.Windows.Forms.ButtonBase::set_Text(string)
IL_0023: nop
IL_0024: nop
IL_0025: ret
} // end of method EasyLambdaButtonFactory::_Lambda$__4
It gets the arguments Object sender and EventArgs e, it casts the sender to a Button, gets the Text and converts it to an Integer, adds one to it (using the add.ovf opcode, an addition with overflow checking) and assigns it to the Text Property of the Button. That can't be a coincidence!
So how is this thing called in the function we DID implement?
Well, here you have it.
IL_001d: ldftn void UnderTheHoodVB.Examples.LambdaExamples.EasyLambdaButtonFactory::_Lambda$__4(object,
class [mscorlib]System.EventArgs)
IL_0023: newobj instance void [mscorlib]System.EventHandler::.ctor(object,
native int)
IL_0028: callvirt instance void [System.Windows.Forms]System.Windows.Forms.Control::add_Click(class [mscorlib]System.EventHandler)
A pointer to the generated function is pushed onto the stack (using the ldftn opcode). A new instance of an EventHandler delegate is created and the pointer is passed to the constructor. The EventHandler is added to the list of listeners for the Button's Click Event. We might as well have coded our own Shared (static in C#) function and used AddressOf (or the += operator with the method name in C#).
So why don't we? Well, first of all it is not very readable to have a lot of Shared functions sitting around in our Class that are used in just one place. Second, we can do very nifty stuff using lambda expressions, as we will see in the next example.
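To see the lambda and its hand-written counterpart side by side, here is a C# sketch. So that it runs without a Windows Forms reference, it uses a made-up FakeButton class as a stand-in for System.Windows.Forms.Button; FakeButton, PerformClick, EasyLambdaSketch and Increment are all hypothetical names, and a single button is created instead of ten.

```csharp
using System;

// Hypothetical stand-in for a WinForms Button: just a Text property and a Click event.
public class FakeButton
{
    public string Text { get; set; }
    public event EventHandler Click;

    public void PerformClick()
    {
        EventHandler handler = Click;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class EasyLambdaSketch
{
    public static FakeButton WithLambda()
    {
        var btn = new FakeButton { Text = "1" };
        // The lambda uses only its own parameters, so the compiler can turn it
        // into an ordinary generated method and wrap that in a delegate.
        btn.Click += (sender, e) =>
        {
            var senderBtn = (FakeButton)sender;
            senderBtn.Text = (Convert.ToInt32(senderBtn.Text) + 1).ToString();
        };
        return btn;
    }

    public static FakeButton WithNamedMethod()
    {
        var btn = new FakeButton { Text = "1" };
        // Roughly what the IL boils down to: ldftn Increment, newobj EventHandler, add_Click.
        btn.Click += new EventHandler(Increment);
        return btn;
    }

    private static void Increment(object sender, EventArgs e)
    {
        var senderBtn = (FakeButton)sender;
        senderBtn.Text = (Convert.ToInt32(senderBtn.Text) + 1).ToString();
    }
}
```

Both buttons behave identically: every PerformClick call bumps the Text by one.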
By the way, you might have noticed that the C# compiler also created a field called something like CachedAnonymousDelegate. Want to know what's up with that? This is a little performance optimization. In order to prevent creating multiple delegates, the C# compiler creates one, stores it and re-uses it instead of creating a new delegate every time.
4.4.2. A medium level example
So it is time to open up the MediumLambdaButtonFactory. This lambda is just slightly different from the first one. Let's see.
What is the trick? We have used the btn variable in the handler, even though it is declared outside the lambda expression (had we created a Shared function we would not have had access to the btn variable)! So let's check ILDASM again.
Holy cow! The compiler created an entire new type called _Closure$__5! What's that? It holds the Button that was outside the scope of the function as a field and has a function called _Lambda$__9. I don't think it's necessary to take a look at the IL of that _Lambda$__9 function. It simply does what the previous lambda example did, except this time it doesn't cast the sender to a Button; instead it uses the btn field. What is interesting is to look at the IL of GenerateButtons. Unfortunately I can't post it here since the emitted lines become too wide for the average monitor. However, what we see in the IL is that a new instance of _Closure$__5 is created and that the Button is assigned to _Closure$__5.$VB$Local_btn. To assign the function to the Button.Click Event the same code is emitted as in the example above, except the delegate now takes a pointer to _Closure$__5._Lambda$__9.
So why did the compiler create an inner type for this function? As I said, the btn variable is out of the function's scope. In this case the btn variable would actually go out of scope as soon as the next iteration of the For Loop starts, but the function that is created by our lambda expression stays alive for as long as the Button that the btn variable points to does, or until the Click Handler is removed. So the compiler must find a way to keep that btn variable alive for as long as the delegate is alive. It does this by wrapping the btn variable in a new Type and keeping a reference to an instance of that Type through the Button.Click Event. So while this example does exactly the same as the previous example (you can check it in the medium lambda form), the emitted IL is quite a bit different!
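The wrapping trick can be imitated by hand. The C# sketch below uses a plain string in place of the Button and Func<string> delegates in place of Click handlers, so every name in it (MediumClosureSketch, Closure, text) is made up; the real generated names look like the _Closure$__5 shown above.

```csharp
using System;
using System.Collections.Generic;

public static class MediumClosureSketch
{
    // Lambda version: each handler captures the 'text' variable of its own iteration.
    public static List<Func<string>> WithLambdas()
    {
        var handlers = new List<Func<string>>();
        for (int i = 0; i < 3; i++)
        {
            string text = "button " + i;            // declared inside the loop, captured below
            handlers.Add(() => text + " clicked");
        }
        return handlers;
    }

    // Hand-written stand-in for the type the compiler generates for the lambda.
    private sealed class Closure
    {
        public string text;                         // the captured local becomes a field
        public string Lambda() { return text + " clicked"; }
    }

    public static List<Func<string>> WithClosureClass()
    {
        var handlers = new List<Func<string>>();
        for (int i = 0; i < 3; i++)
        {
            var closure = new Closure();            // one instance per iteration
            closure.text = "button " + i;
            handlers.Add(new Func<string>(closure.Lambda));
        }
        return handlers;
    }
}
```

The delegates returned by both methods still produce the right text long after the loop has finished, because each delegate keeps its own closure instance alive.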
4.4.3. A hard level example
You might have guessed it already, but what happens if variables from multiple scopes are used in the same lambda expression? The compiler actually creates an inner type for each level of scope! So let's look at the next example, in which we are going to introduce a counter outside the For Loop that is shared among all Button Click Events.
So what we see here is that the btn variable is still within the scope of the For Loop, but the counter is outside the scope of the For Loop and thus shared by all Buttons. This means that if you click one button its Text changes to "2", and if you then click another button its Text changes to "3" (because the first button already incremented the counter). You can see this effect in the hard lambda form.
Once again we will take a look at ILDASM to see what was created for us by the compiler.
That's a type inside a type inside a type that was created for you... The innermost type (_Closure$__2) holds a reference to its outer type (_Closure$__1). Why is that? Well, the lambda needs a reference to a btn variable, which is unique for each Event Handler, and a reference to the counter variable, which is shared between all Event Handlers. So each Button's Click EventHandler will have a reference to a unique instance of _Closure$__2, and these will all hold a reference to the same instance of _Closure$__1, which holds the counter variable. If you look at ILDASM with the C# project you will see that there is no inner-inner type, just two inner types. Besides that small difference all else still holds true for C#. Once again I won't show any IL code, because it wouldn't fit the page. You can look at it yourself. It's quite a bit, but don't be discouraged! Simply read it line by line and you will get it. We will look at the VB and C# equivalents in a minute; if it isn't clear to you now, it will be in the next example.
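The effect of the two scopes can be reproduced in a few lines of C#. As a simplification, this sketch again uses strings and Func<string> delegates instead of Buttons and Click handlers; HardClosureSketch, name and the three-iteration loop are assumptions made for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HardClosureSketch
{
    public static List<Func<string>> CreateHandlers()
    {
        var handlers = new List<Func<string>>();
        int counter = 1;                        // outer scope: one variable shared by every handler
        for (int i = 0; i < 3; i++)
        {
            string name = "button " + i;        // inner scope: a fresh variable per iteration
            handlers.Add(() =>
            {
                counter++;                      // every handler increments the same counter
                return name + ": " + counter;
            });
        }
        return handlers;
    }
}
```

Invoking the first handler yields "button 0: 2" and invoking the third one right after yields "button 2: 3": the names differ per handler while the counter is shared, which is what the two levels of closure types make possible.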
4.4.4. An insane level example
Don't let that title scare you off. The only thing that makes this example slightly more difficult than the previous one is that a new level of scope was added to the lambda. I have created an additional counter called _outerCounter as a field in the InsaneLambdaButtonFactory.
So as you see, the _outerCounter variable is used inside the lambda and is shared by all Buttons (much like the counter variable). There is a difference though. _outerCounter might be changed by something other than a button click. So another scope, another inner type? Nope! In this case the first inner type holds a reference to the instance of the object that created it.
Pretty smart, eh? The function in _Closure$__4 now has access to the instance of the ButtonFactory that created it through the reference to _Closure$__3. This way the ButtonFactory and the lambda function both look at the same _outerCounter. The _outerCounter is incremented by one every time the counter that is shared by just the Buttons has been incremented ten times. If you want to see how it works, go ahead and open up the insane lambda form and click twenty times on whichever buttons you want.
So that is pretty neat, but wouldn't it be clearer if you could see some of this in VB or C#? Well, it's your lucky day! I have studied the IL for the insane lambda example and made a Class that produces the exact same IL (save for some name changes). Go take a look at InsaneLambdaButtonFactoryRewritten and compare the emitted IL to that of the InsaneLambdaButtonFactory. Also compare the VB variant to the C# variant to spot some minor differences.
As you can see, there is no lambda expression in sight. You should be able to debug this code and see what it does. The lambda part was moved to the function in the InnerInnerLambda (InnerLambda2 in C#). This also holds the btn variable and a reference to InnerLambda (or InnerLambda1 in C#). The InnerLambda takes care of the counter variable and holds a reference to the ButtonFactory for the _outerCounter. All of the variables are set in the original function that creates the buttons. In fact, you can see that all local variables have been removed from this function and replaced by references to InnerLambda and InnerInnerLambda.
You can check that the InsaneLambdaButtonFactoryRewritten really does the same as the InsaneLambdaButtonFactory by running the application and opening the insane lambda rewritten Form.
You can also experiment with this yourself. Try nesting even further using nested For Each loops and If Then Else Statements. You now know how it works!
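To make the idea concrete without the sample solution at hand, here is a minimal C# sketch of the same rewrite. It is not the code from InsaneLambdaButtonFactoryRewritten: the names CounterFactory, SharedScope and PerHandlerScope are made up for this illustration, and plain Func(Of String) (Func<string>) delegates stand in for the Buttons and their click handlers.

```csharp
using System;
using System.Collections.Generic;

public class CounterFactory
{
    // Plays the role of _outerCounter: a field on the factory itself.
    public int OuterCounter;

    // Plays the role of the first generated closure type: it holds the
    // counter shared by all handlers and a reference to the factory.
    private sealed class SharedScope
    {
        public CounterFactory Owner;
        public int Counter;
    }

    // Plays the role of the second generated closure type: it holds the
    // per-handler variable and a reference to the shared scope.
    private sealed class PerHandlerScope
    {
        public SharedScope Shared;
        public string Name;

        // This method is what the body of the lambda turns into.
        public string OnClick()
        {
            Shared.Counter++;
            if (Shared.Counter % 10 == 0)
            {
                Shared.Owner.OuterCounter++;
            }
            return Name + " clicked, counter = " + Shared.Counter;
        }
    }

    public List<Func<string>> CreateHandlers(int count)
    {
        // One shared scope for the whole call, one inner scope per iteration.
        var shared = new SharedScope { Owner = this };
        var handlers = new List<Func<string>>();
        for (int i = 0; i < count; i++)
        {
            var scope = new PerHandlerScope { Shared = shared, Name = "Button " + i };
            handlers.Add(scope.OnClick);
        }
        return handlers;
    }
}
```

Calling the handlers twenty times in any order should leave OuterCounter at 2, because every handler increments the same SharedScope instance.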
Further reading:
Lambda Expressions (Visual Basic) on MSDN
Anonymous functions (C# Programming Guide) on MSDN
A couple of blogs on the subject:
Anonymous Methods, Part 1 of ?
Anonymous methods as event handlers - Part 1
The implementation of anonymous methods in C# and its consequences (part 1)
I now encourage you to take a look at the lecture of Bart de Smet. He also takes a look at lambda expressions and additionally explains how they could cause memory leaks if you are not careful. It is actually the first topic he talks about so you can just start the video, sit back and relax.
Bart de Smet - Behind the Scenes of 10 C# Language Features
4.5. The case of Anonymous Types
Let's move on to the next VB and C# construct I have prepared for you, anonymous types. You can find the examples in the AnonymousTypeExamples folder in your solution. What we are going to do is create a collection of Person objects and select a sub-set of Properties, which will create a so-called anonymous type. So let's look at the first example. Look at either the FormalNamePeopleFactory or the NickNamePeopleFactory. It does not matter which one we look at first, so I'll go with the FormalNamePeopleFactory. Here is the code for it.
That's not a lot of code, but a lot is going on that you don't know about (but will know about in a few moments). First, let's see what this code actually does. PeopleHelper.GetPeople simply creates a collection of Person objects. We then call the Select function, which is an Extension Method on IEnumerable(Of T) (IEnumerable<T> in C#). You can see we are creating a new Object because both the VB and C# example have the New Keyword. However, instead of defining a Type, such as New Person, we are using that With Keyword again (see the earlier With example in this article). We then define a set of non-existent Properties and assign a value to them.
Let's look at the IL that was generated for this function.
.method private specialname static class VB$AnonymousType_0`2<string,int32>
_Lambda$__2(class UnderTheHoodVB.Examples.Person p) cil managed
{
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
// Code size 38 (0x26)
.maxstack 3
.locals init ([0] class VB$AnonymousType_0`2<string,int32> _Lambda$__2,
[1] class VB$AnonymousType_0`2<string,int32> VB$t_ref$S0)
IL_0000: ldarg.0
IL_0001: callvirt instance string UnderTheHoodVB.Examples.Person::get_LastName()
IL_0006: ldstr ", "
IL_000b: ldarg.0
IL_000c: callvirt instance string UnderTheHoodVB.Examples.Person::get_FirstName()
IL_0011: call string [mscorlib]System.String::Concat(string,
string,
string)
IL_0016: ldarg.0
IL_0017: callvirt instance int32 UnderTheHoodVB.Examples.Person::get_Age()
IL_001c: newobj instance void class VB$AnonymousType_0`2<string,int32>::.ctor(!0,
!1)
IL_0021: stloc.0
IL_0022: br.s IL_0024
IL_0024: ldloc.0
IL_0025: ret
} // end of method FormalNamePeopleFactory::_Lambda$__2
Of course we have used a lambda expression, so we should look at the IL in the generated _Lambda$__2 function. As you can see this is a function that returns a VB$AnonymousType_0`2<string, int32> (it's in the topmost line). We see that the parts of the full name, p.LastName, ", " and p.FirstName, are pushed onto the stack and concatenated. Then p.Age is pushed onto the stack as well and a new instance of an AnonymousType(Of T1, T2) (AnonymousType<T1, T2> in C#) is created where T1 is a string (the FullName Property) and T2 is an int32 (the Age Property). So where did this AnonymousType(Of T1, T2) come from and why is it Generic?
When you check ILDASM you can actually see the AnonymousType
sitting in the Global Namespace
.
So the compiler actually creates a new type for you (making the anonymous type a lot less anonymous under the hood). So that explains where the AnonymousType came from, but not why it is Generic or why it is sitting in the Global Namespace and not just right next to the function where it is used (perhaps even as another inner type).
That second part can be explained by looking at the other example, NickNamePeopleFactory
. So let's look at the code.
As you can see this function does almost exactly the same, except the FullName Property is formatted slightly differently. Since this is another function in another Class you would expect the compiler to simply create another anonymous type (after all, it does that for lambdas too). This is not the case however. When we look at the IL code of this function we can see the following.
.locals init ([0] class VB$AnonymousType_0`2<string,int32> _Lambda$__3,
[1] class VB$AnonymousType_0`2<string,int32> VB$t_ref$S0)
IL_0000: ldarg.0
IL_0001: callvirt instance string UnderTheHoodVB.Examples.Person::get_FirstName()
IL_0006: ldstr " "
IL_000b: ldarg.0
IL_000c: callvirt instance string UnderTheHoodVB.Examples.Person::get_LastName()
IL_0011: call string [mscorlib]System.String::Concat(string,
string,
string)
IL_0016: ldarg.0
IL_0017: callvirt instance int32 UnderTheHoodVB.Examples.Person::get_Age()
IL_001c: newobj instance void class VB$AnonymousType_0`2<string,int32>::.ctor(!0,
!1)
IL_0021: stloc.0
IL_0022: br.s IL_0024
IL_0024: ldloc.0
IL_0025: ret
} // end of method NickNamePeopleFactory::_Lambda$__3
It looks much like the IL that the previous method generated, although we can see a difference in the formatting of the FullName Property. But that's really the only difference we see! The same anonymous type is used for this function! Now what if this anonymous type was used in a Private Inner Class and the anonymous type had been generated as another nested type of that Private Class? Then obviously the NickNamePeopleFactory would not have access to it anymore and a new AnonymousType would have to be generated. Apparently it takes longer for the compiler to generate a new AnonymousType than to reuse an already existing one.
So why, then, is it Generic? Because the AnonymousType is in the Global Namespace it does not have access to any Private Types, but because the AnonymousType is Generic it never actually references any Private Types and as such it could be reused with any Type you could possibly think of. Let's look at the BuggedPeopleFactory.
As you can see some nitwit programmer (in this case me) switched Age and FullName so Age now displays FullName and FullName displays Age! That means FullName is no longer a string and Age is no longer an int32. Yet when we look at the generated IL we can see that the same AnonymousType is used.
.method private specialname static class VB$AnonymousType_0`2<int32,string>
_Lambda$__1(class UnderTheHoodVB.Examples.Person p) cil managed
{
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
// Code size 22 (0x16)
.maxstack 2
.locals init ([0] class VB$AnonymousType_0`2<int32,string> _Lambda$__1,
[1] class VB$AnonymousType_0`2<int32,string> VB$t_ref$S0)
IL_0000: ldarg.0
IL_0001: callvirt instance int32 UnderTheHoodVB.Examples.Person::get_Age()
IL_0006: ldarg.0
IL_0007: callvirt instance string UnderTheHoodVB.Examples.Person::get_FirstName()
IL_000c: newobj instance void class VB$AnonymousType_0`2<int32,string>::.ctor(!0,
!1)
IL_0011: stloc.0
IL_0012: br.s IL_0014
IL_0014: ldloc.0
IL_0015: ret
} // end of method BuggedPeopleFactory::_Lambda$__1
How about that? It simply returns the same AnonymousType
, but with different Generic
parameters
. Now if we would have used a Private Type
that another method that uses the same AnonymousType
does not have access to? It can still use the same AnonymousType
, just with other Generic parameters
! It really is a work of beauty!
But why and when are anonymous types reused anyway? They are reused when the number, name and order of the Properties on the anonymous type are the same. For example, try switching FullName and Age around on one of the functions and you will see a second anonymous type being created in ILDASM. You could also spell FulName with a single L on one of the functions and you will likewise see a new anonymous type being generated. The reason they are reused is so you can have two lists of anonymous types that represent the same Object and you can still compare them (if they had been entirely different Types a comparison would always return False).
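You can check these rules without ILDASM as well. The following C# sketch is not part of the sample solution (the AnonymousTypeReuse class and its method names are made up for this illustration) and it relies on the C# compiler implementing anonymous types as generic classes, which is what we saw in the IL above. It compares the runtime types of a few anonymous objects.

```csharp
using System;

public static class AnonymousTypeReuse
{
    // Same names, same order, same property types: one and the same type.
    public static bool SameTypeForSameShape()
    {
        var formal = new { FullName = "Doe, John", Age = 42 };
        var nick = new { FullName = "John Doe", Age = 42 };
        return formal.GetType() == nick.GetType();
    }

    // Same names and order but other property types: the constructed types
    // differ, yet both are built from the same generic type definition.
    public static bool SameGenericDefinitionForOtherPropertyTypes()
    {
        var normal = new { FullName = "John Doe", Age = 42 };
        var bugged = new { FullName = 42, Age = "John" };
        return normal.GetType() != bugged.GetType()
            && normal.GetType().GetGenericTypeDefinition()
               == bugged.GetType().GetGenericTypeDefinition();
    }

    // Switching the order of the properties yields a second generic type.
    public static bool DifferentDefinitionForDifferentOrder()
    {
        var a = new { FullName = "John Doe", Age = 42 };
        var b = new { Age = 42, FullName = "John Doe" };
        return a.GetType().GetGenericTypeDefinition()
            != b.GetType().GetGenericTypeDefinition();
    }

    // Because the type is shared, two instances can be compared by value.
    public static bool EqualValuesCompareEqual()
    {
        var a = new { FullName = "John Doe", Age = 42 };
        var b = new { FullName = "John Doe", Age = 42 };
        return a.Equals(b);
    }
}
```

All four methods are expected to return true when the objects are created in the same assembly. Note that this is the C# behaviour; in VB only Key Properties take part in the comparison.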
By the way, did you notice all the functions in the example return an IList? That is so I can bind to the anonymous type that is returned by the functions. You can see this in action in the Forms that are in the GroupBox labeled 'Anonymous type examples'.
Further reading:
Anonymous Types (Visual Basic) on MSDN
Anonymous Types (C# Programming Guide) on MSDN
Why are anonymous types generic?
And once again I also want to point you at the lecture by Bart de Smet. He explains a thing or two about anonymous types at around 45:30 mins.
Bart de Smet - Behind the Scenes of 10 C# Language Features
4.6. The case of Cases
The next thing I want to talk about is the Select Case Statement (switch Statement in C#). There is some magic going on here which is explained very well by Bart de Smet at around 13:40 mins. I recommend you watch this part before continuing. At this point I have some sad news for the C#ers who are reading this article. The next section is VB only (but of course you're very welcome to read it too). Why VB only? VB has a special kind of Select Case, being one where they turn things around quite a bit. A regular Select Case compares a value to other values and executes the Case where the two values are the same (or Else). In this Select Case however the first Case whose expression returns True is executed. Let's look at an example of a regular case in VB. You can find the code under the SelectCaseExample in the VB solution.
Public Sub DoACase()
Dim i As Integer = 10
Select Case i
Case 1
Console.WriteLine("i = 1")
Case 2
Console.WriteLine("i = 2")
Case 3
Console.WriteLine("i = 3")
Case Else
Console.WriteLine("i is something else.")
End Select
End Sub
As you can see i is compared to 1, 2 and 3 and the code inside a Case is executed only when the comparison for that Case returns True. Now let's turn things around.
Public Sub DoATrueCase()
Dim i As Integer = 10
Select Case True
Case i = 1
Console.WriteLine("i = 1")
Case i = 2
Console.WriteLine("i = 2")
Case i = 3
Console.WriteLine("i = 3")
Case Else
Console.WriteLine("i is something else.")
End Select
End Sub
As you can see in this Select Case
we test for a couple of statements (that could be anything as long as it returns a Boolean
) and compare their outcomes to True
. However, let's look at their respective generated IL.
.method public instance void DoACase() cil managed
{
// Code size 86 (0x56)
.maxstack 2
.locals init ([0] int32 i,
[1] int32 VB$t_i4$L0,
[2] int32 VB$CG$t_i4$S0)
IL_0000: nop
IL_0001: ldc.i4.s 10
IL_0003: stloc.0
IL_0004: nop
IL_0005: ldloc.0
IL_0006: ldc.i4.1
IL_0007: sub
IL_0008: stloc.2
IL_0009: ldloc.2
IL_000a: switch (
IL_001d,
IL_002b,
IL_0039)
IL_001b: br.s IL_0047
IL_001d: nop
IL_001e: ldstr "i = 1"
IL_0023: call void [mscorlib]System.Console::WriteLine(string)
IL_0028: nop
IL_0029: br.s IL_0053
IL_002b: nop
IL_002c: ldstr "i = 2"
IL_0031: call void [mscorlib]System.Console::WriteLine(string)
IL_0036: nop
IL_0037: br.s IL_0053
IL_0039: nop
IL_003a: ldstr "i = 3"
IL_003f: call void [mscorlib]System.Console::WriteLine(string)
IL_0044: nop
IL_0045: br.s IL_0053
IL_0047: nop
IL_0048: ldstr "i is something else."
IL_004d: call void [mscorlib]System.Console::WriteLine(string)
IL_0052: nop
IL_0053: nop
IL_0054: nop
IL_0055: ret
} // end of method SelectCaseExample::DoACase
You can clearly see a Select Case
being executed here (it's the switch opcode). So let's look at the second example where the Select Case does not compare values, but looks if a given statement returns True.
.method public instance void DoATrueCase() cil managed
{
// Code size 97 (0x61)
.maxstack 3
.locals init ([0] int32 i,
[1] bool VB$t_bool$L0,
[2] bool VB$CG$t_bool$S0)
IL_0000: nop
IL_0001: ldc.i4.s 10
IL_0003: stloc.0
IL_0004: nop
IL_0005: ldc.i4.1
IL_0006: stloc.1
IL_0007: nop
IL_0008: ldloc.1
IL_0009: ldloc.0
IL_000a: ldc.i4.1
IL_000b: ceq
IL_000d: ceq
IL_000f: stloc.2
IL_0010: ldloc.2
IL_0011: brfalse.s IL_0020
IL_0013: ldstr "i = 1"
IL_0018: call void [mscorlib]System.Console::WriteLine(string)
IL_001d: nop
IL_001e: br.s IL_005e
IL_0020: nop
IL_0021: ldloc.1
IL_0022: ldloc.0
IL_0023: ldc.i4.2
IL_0024: ceq
IL_0026: ceq
IL_0028: stloc.2
IL_0029: ldloc.2
IL_002a: brfalse.s IL_0039
IL_002c: ldstr "i = 2"
IL_0031: call void [mscorlib]System.Console::WriteLine(string)
IL_0036: nop
IL_0037: br.s IL_005e
IL_0039: nop
IL_003a: ldloc.1
IL_003b: ldloc.0
IL_003c: ldc.i4.3
IL_003d: ceq
IL_003f: ceq
IL_0041: stloc.2
IL_0042: ldloc.2
IL_0043: brfalse.s IL_0052
IL_0045: ldstr "i = 3"
IL_004a: call void [mscorlib]System.Console::WriteLine(string)
IL_004f: nop
IL_0050: br.s IL_005e
IL_0052: nop
IL_0053: ldstr "i is something else."
IL_0058: call void [mscorlib]System.Console::WriteLine(string)
IL_005d: nop
IL_005e: nop
IL_005f: nop
IL_0060: ret
} // end of method SelectCaseExample::DoATrueCase
Well, well, well! Not a single switch opcode can be found! What we have here is a lot of comparisons and BRanch opcodes. What you can see here is that something that looks like an If Then ElseIf ElseIf Else Statement is being generated. Compare the IL of the previous example with the IL of the DoAnIfThenElseIf method. You will see some similarities. You might also see why C# does not support it. It is not a switch Statement, but it is also not as concise as If Then ElseIf Else. As for readability, I'll leave that up to you.
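For the C# readers, this is roughly what the IL of DoATrueCase amounts to when written out by hand. It is only a sketch (TrueCaseSketch and Describe are names made up for this illustration, and the method returns the text instead of writing it to the Console), but it shows the double comparison: first i is compared to the constant, then the outcome is compared to the value behind Select Case.

```csharp
using System;

public static class TrueCaseSketch
{
    public static string Describe(int i)
    {
        // The value that follows Select Case, stored in a local first.
        bool selector = true;

        // Each Case becomes "selector == (condition)", the two ceq opcodes.
        if (selector == (i == 1))
        {
            return "i = 1";
        }
        else if (selector == (i == 2))
        {
            return "i = 2";
        }
        else if (selector == (i == 3))
        {
            return "i = 3";
        }
        else
        {
            return "i is something else.";
        }
    }
}
```

Calling Describe(10) ends up in the final else branch, just like the VB example with i = 10.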
4.7. The case of Iterators
Where the previous example was for VB readers, this example is actually for C# people. Iterator methods have been featured in C# for a while now. They are featured in VB in the VS Async CTP release and will be featured in VB11 by default (or so I was told). I will not explain this part in much detail since Bart de Smet does a great job at explaining it too. I am simply going to point out some stuff.
Let's first look at a code example using an Iterator
. It can be found in the IteratorExample
folder in your C# solution.
public static string UseTheItator()
{
StringBuilder sb = new StringBuilder();
foreach (string s in EnumeratorFunction())
{
sb.AppendLine(s);
}
return sb.ToString();
}
private static IEnumerable<string> EnumeratorFunction()
{
string hello = "Hello";
yield return hello;
hello += " people!";
yield return "Iterator!";
}
What do you think the StringBuilder in UseTheIterator will return? "Hello people! Iterator"? On the Main form press the Iterator button and find out. You can see the returned text is "Hello Iterator". This is strange, because that would mean the hello variable was already returned to the calling method before " people!" was appended to it, but "Iterator!" was returned as well. This is exactly what an Iterator does. The yield Keyword tells the function to return to the calling method, then come back and continue executing. Just how does it do this? ILDASM has the answer.
Sweet mother of IL! The C# compiler generated a Type that implements both IEnumerable<string> and IEnumerator<string>! If we look at the IL of the original EnumeratorFunction we can see that it simply returns an instance of this generated type.
.locals init ([0] class UnderTheHoodCSharp.Examples.IteratorExample.IteratorExample/'<EnumeratorFunction>d__0' V_0,
[1] class [mscorlib]System.Collections.Generic.IEnumerable`1<string> V_1)
IL_0000: ldc.i4.s -2
IL_0002: newobj instance void UnderTheHoodCSharp.Examples.IteratorExample.IteratorExample/'<EnumeratorFunction>d__0'::.ctor(int32)
IL_0007: stloc.0
IL_0008: ldloc.0
IL_0009: stloc.1
IL_000a: br.s IL_000c
IL_000c: ldloc.1
IL_000d: ret
} // end of method IteratorExample::EnumeratorFunction
So where is all the logic to return "Hello", append " people!" etc.? You can find it all in the MoveNext method of the generated type.
.method private hidebysig newslot virtual final
instance bool MoveNext() cil managed
{
.override [mscorlib]System.Collections.IEnumerator::MoveNext
// Code size 142 (0x8e)
.maxstack 3
.locals init ([0] bool CS$1$0000,
[1] int32 CS$4$0001)
IL_0000: ldarg.0
IL_0001: ldfld int32 UnderTheHoodCSharp.Examples.IteratorExample.IteratorExample/'<EnumeratorFunction>d__0'::'<>1__state'
IL_0006: stloc.1
IL_0007: ldloc.1
IL_0008: switch (
IL_001f,
IL_001b,
IL_001d)
IL_0019: br.s IL_0021
IL_001b: br.s IL_004d
IL_001d: br.s IL_0080
IL_001f: br.s IL_0023
IL_0021: br.s IL_0088
IL_0023: ldarg.0
IL_0024: ldc.i4.m1
IL_0025: stfld int32 UnderTheHoodCSharp.Examples.IteratorExample.IteratorExample/'<EnumeratorFunction>d__0'::'<>1__state'
IL_002a: nop
IL_002b: ldarg.0
IL_002c: ldstr "Hello"
IL_0031: stfld string UnderTheHoodCSharp.Examples.IteratorExample.IteratorExample/'<EnumeratorFunction>d__0'::'<hello>5__1'
IL_0036: ldarg.0
IL_0037: ldarg.0
IL_0038: ldfld string UnderTheHoodCSharp.Examples.IteratorExample.IteratorExample/'<EnumeratorFunction>d__0'::'<hello>5__1'
IL_003d: stfld string UnderTheHoodCSharp.Examples.IteratorExample.IteratorExample/'<EnumeratorFunction>d__0'::'<>2__current'
// Etc...
IL_0080: ldarg.0
IL_0081: ldc.i4.m1
IL_0082: stfld int32 UnderTheHoodCSharp.Examples.IteratorExample.IteratorExample/'<EnumeratorFunction>d__0'::'<>1__state'
So what can we see here? MoveNext switches on the <>1__state field and, dependent on its value, performs another piece of code. Before MoveNext returns a value it stores the state it should resume from, so each call picks up where the previous one left off. Feels kind of 'dirty' doesn't it? Anyway, the compiler really does an excellent job in keeping such difficult stuff hidden from the programmer. It really is a piece of art!
As I said, you should really check out Bart de Smet's talk on Iterators. He starts talking about Iterators after about 36:30 mins.
Bart de Smet - Behind the Scenes of 10 C# Language Features
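If you would rather read C# than IL, the following sketch shows the same trick written by hand. It is a simplification, not a decompilation of the generated type: the class name HandWrittenIterator and the field names are made up, and details such as the initial state of -2 and the thread check that the compiler adds are left out.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public sealed class HandWrittenIterator : IEnumerable<string>, IEnumerator<string>
{
    private int _state;       // which piece of the original method runs next
    private string _current;  // the value handed out by the last yield return
    private string _hello;    // the local variable, hoisted to a field

    public string Current { get { return _current; } }
    object IEnumerator.Current { get { return _current; } }

    // Every enumeration gets its own state machine in this sketch.
    public IEnumerator<string> GetEnumerator() { return new HandWrittenIterator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:
                // string hello = "Hello"; yield return hello;
                _hello = "Hello";
                _current = _hello;
                _state = 1;
                return true;
            case 1:
                // hello += " people!"; yield return "Iterator!";
                _hello += " people!";
                _current = "Iterator!";
                _state = 2;
                return true;
            default:
                // The end of the method was reached.
                _state = -1;
                return false;
        }
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

A foreach over a new HandWrittenIterator yields "Hello" first and "Iterator!" second. The " people!" part is appended to the hoisted field only after "Hello" was already handed to the caller, which is why it never shows up.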
5. Emitting IL using VB or C#
As I already mentioned we can emit our own opcodes and generate IL on the fly using VB or C#! That is exactly what we are going to do here. But we are not going to generate just any method, we are going to generate a method that makes use of a Try... Fault Block! This feature is not available in VB or C#, but it is available in IL. The Try... Fault block looks like a Try... Catch with the difference that a Fault block does not actually catch the Exception. It simply executes some code, but only when an Exception is thrown (and ALWAYS if an Exception is thrown, much like the Try... Finally Block). This is not as hard as it sounds, really. Open up the TypeFactory Class in the EmitExamples folder of your solution. When you open it you see a Public Shared (static in C#) function that returns a Type. The Type, however, is generated when the function is called for the first time. Let's see how the Type is created.
First we must create an Assembly (or dll) to hold the type. We can do this using an AssemblyBuilder. After that we create a Module, using a ModuleBuilder, that actually holds the Type we are going to create. With the Module we can get a TypeBuilder which builds the Type and gives us access to MethodBuilders to define new methods on the Type. That is all fairly simple, right? Let's take a look at the code and it will become clear to you.
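As a stand-alone illustration of that chain of builders, here is a minimal C# sketch. It is not the TypeFactory from the sample solution: the names EmitSketch, SketchAssembly, SketchModule, SketchType and Add are made up, the emitted method simply adds two integers, and the assembly is created with AssemblyBuilderAccess.Run, so nothing is saved to disk.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitSketch
{
    public static Type BuildType()
    {
        // 1. The dynamic assembly that will hold everything.
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("SketchAssembly"), AssemblyBuilderAccess.Run);

        // 2. The module inside the assembly.
        ModuleBuilder module = assembly.DefineDynamicModule("SketchModule");

        // 3. The type inside the module.
        TypeBuilder type = module.DefineType(
            "SketchType", TypeAttributes.Public | TypeAttributes.Class);

        // 4. A public static method: int Add(int, int).
        MethodBuilder method = type.DefineMethod(
            "Add",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int),
            new[] { typeof(int), typeof(int) });

        // 5. The method body: push both arguments, add them, return.
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);

        // 6. Bake the type so it can be used like any other Type.
        return type.CreateType();
    }
}
```

The returned Type can be used through reflection, for example EmitSketch.BuildType().GetMethod("Add").Invoke(null, new object[] { 2, 3 }).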
At this point we have done practically everything that is needed to implement our own methods on a type. That was really only a few lines of code! So how do we get our methods? We can see an example of that in the GenerateILMethod method.
As you can see we call the DefineMethod method on the TypeBuilder, which returns a MethodBuilder. The method will have the name "InternalILMethod" and it will be Private and Shared (static in C#). It also requires two parameters, in this case an Action(Of String) (Action<string> in C#) and a Boolean. Now that we have a MethodBuilder we want to give it some body. We want to emit some opcodes so the method actually does something when we are going to call it later. We do this by calling GetILGenerator on the MethodBuilder. GetILGenerator returns an ILGenerator Object for the current method. We can then use the ILGenerator to emit opcodes. This is actually pretty easy as you will see.
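For readers without the sample solution, the following C# sketch emits a small method with a Try... Fault block. It is a simplified stand-in and not the GenerateILMethod code: the names FaultSketch, FaultSketchType and Run are made up, the emitted method is Public so it is easy to call, and the Finally block is left out. It assumes that ILGenerator adds the leave and endfinally opcodes itself when BeginFaultBlock and EndExceptionBlock are called.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class FaultSketch
{
    public static MethodInfo BuildMethod()
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("FaultSketchAssembly"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("FaultSketchModule");
        TypeBuilder type = module.DefineType("FaultSketchType", TypeAttributes.Public);
        MethodBuilder method = type.DefineMethod(
            "Run",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(void),
            new[] { typeof(Action<string>), typeof(bool) });

        MethodInfo invoke = typeof(Action<string>).GetMethod("Invoke");
        ConstructorInfo exceptionCtor =
            typeof(Exception).GetConstructor(new[] { typeof(string) });

        ILGenerator il = method.GetILGenerator();
        Label noThrow = il.DefineLabel();

        // Try block: throw when the Boolean argument is True.
        il.BeginExceptionBlock();
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Brfalse, noThrow);
        il.Emit(OpCodes.Ldstr, "Well, that's it for you!");
        il.Emit(OpCodes.Newobj, exceptionCtor);
        il.Emit(OpCodes.Throw);
        il.MarkLabel(noThrow);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldstr, "finished successfully");
        il.Emit(OpCodes.Callvirt, invoke);

        // Fault block: only runs when an Exception leaves the Try block.
        il.BeginFaultBlock();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldstr, "finished unsuccessfully");
        il.Emit(OpCodes.Callvirt, invoke);
        il.EndExceptionBlock();

        il.Emit(OpCodes.Ret);
        return type.CreateType().GetMethod("Run");
    }

    // Calls the emitted method and collects the messages it sends.
    public static List<string> Run(bool fail)
    {
        var messages = new List<string>();
        try
        {
            BuildMethod().Invoke(null, new object[] { new Action<string>(messages.Add), fail });
        }
        catch (TargetInvocationException ex)
        {
            // The dynamic caller wraps the original Exception.
            messages.Add("caught: " + ex.InnerException.Message);
        }
        return messages;
    }
}
```

Run(false) should only collect the success message, while Run(true) should collect the message from the Fault block followed by the message of the wrapped Exception.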
So we now have a method that would look like the following:
There is a little trick you should remember should you ever need to emit your own opcodes like this. First write and build the code you actually want to emit and then check ILDASM to see the IL that was emitted. Doing this will greatly simplify coding IL like this. So we now have our method, which uses a delegate to pass some messages to its caller and may throw an Exception if the supplied Boolean is True. If an Exception is thrown the code will step into the Fault block and send the message "Emit method finished unsuccessfully.". The code will always execute the part in the Finally block. We have a little problem now though. I will be calling this method dynamically, and the Exception will not be caught by me, but by the dynamic caller, which will wrap it into another Exception's InnerException and then throw it back to me. I prefer to dynamically call a method that does not throw an Exception. So what we are going to do is call the method we just created from another method we are going to create. This extra method will catch the Exception for us and pass the Exception Message and StackTrace to the delegate.
Phew, that was a lot of code for a method that does so little! Well, such is the nature of IL. One thing you should notice is that I am using MethodInfo Objects to make calls to methods on Objects that are on the stack. In the call to the InternalILMethod we just created I can simply put in the MethodBuilder for that method as an argument. Because the method is Shared (static in C#) I do not need a reference to the current Object.
So now that we have implemented two methods, one which calls the other, we have to actually create the Type
to be able to use it. Luckily this is very easy. We just call CreateType on the TypeBuilder
.
Now all I have to do is return the Type
to the caller and call the method we just generated. You can see how that's done in the EmitForm
. You can also open the Emit form
from the Main form
to see what happens when the method is called with and without throwing an Exception
. You can actually see at the StackTrace
that we really created a new method that calls another method in our dynamically created Type
! Ain't that something!?
Further reading:
I can recommend reading the MSDN documentation if you would like to know more about the Reflection.Emit classes. For example, ILGenerator.BeginExceptionBlock and ILGenerator.BeginCatchBlock have pretty nice, worked out examples of creating methods dynamically.
Another recommendation I can make is to read the second part of chapter 5 in the book Metaprogramming in .NET.
Bonus:
CP member Pieter van Parys made me aware of a tool called 'BLToolkit' (Business Logic Toolkit). It has some cool features including some Classes to help you generate dynamic code. The EmitHelper should be especially interesting to anyone who wants to emit his own opcodes using VB or C#!
Thanks to Pieter van Parys for pointing this out to me!
6. Generating IL using Expression Trees
Luckily there is a shorter way to emit IL using the .NET Framework. It's called Expression Trees. You might have noticed that we actually created three methods on our dynamic Type. Two using IL and another one using Expression Trees. What exactly is an Expression Tree? It is a representation of code in the form of data. That sounds pretty abstract, but believe me, it's not. Expression Trees revolve around the System.Linq.Expressions.Expression Type. All Expressions Inherit from this base class and all Expressions can be created using Shared (static in C#) factory methods on this type. Let's look at the code that created the InternalILMethod above, but this time using Expression Trees.
Does that look easy? Not exactly, mostly because I nested every Expression in the containing Expression. You should read it as follows: We declare a TryFinally Expression, which requires an Expression that makes up the body of the Try block and an Expression that makes up the body of the Finally block. As body of the Try block we create a TryFault Expression which, as you can guess, again needs an Expression for the Try block and an Expression for the Fault block. So for the Try block we create a Block of Expressions, starting with a Call Expression which invokes Invoke on the Action(Of String) (Action<string> in C#) parameter and passes the Constant Expression "Entered Expression Tree Method." as an argument. The next Expression in our Block Expression is an IfThenExpression, which of course needs an If and a Then Expression. So for the If we create an EqualExpression and compare the Boolean parameter to the Constant value True. In the Then block we put a ThrowExpression in which we put a NewExpression which creates the Exception. We are now out of the IfThenExpression and into the BlockExpression again, where we put in a final Expression, being another call to the Invoke method of the delegate input parameter. That concludes the Try block and we are now in the Fault block, where we again do a call to Invoke. We are then out of the Fault block and in the Finally block where we make a call to Invoke one last time. That makes up the entire Expression. I admit it takes some time to get used to, but once you get the hang of it, it actually makes sense.
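The description above can be sketched in C# as follows. This is an illustration and not the code from the sample solution: ExpressionTreeSketch, Build and the helper Say are made-up names, the messages are shortened, and the Expressions are stored in variables instead of being nested. The sketch only builds the tree and does not compile it.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class ExpressionTreeSketch
{
    public static Expression<Action<Action<string>, bool>> Build()
    {
        ParameterExpression log = Expression.Parameter(typeof(Action<string>), "log");
        ParameterExpression fail = Expression.Parameter(typeof(bool), "fail");
        MethodInfo invoke = typeof(Action<string>).GetMethod("Invoke");

        // Helper: a Call Expression that passes a constant message to the delegate.
        Func<string, Expression> Say =
            message => Expression.Call(log, invoke, Expression.Constant(message));

        // If fail = True Then Throw New Exception(...)
        Expression throwIfAsked = Expression.IfThen(
            Expression.Equal(fail, Expression.Constant(true)),
            Expression.Throw(Expression.New(
                typeof(Exception).GetConstructor(new[] { typeof(string) }),
                Expression.Constant("Well, that's it for you!"))));

        // The Block that makes up the body of the inner Try.
        Expression tryBody = Expression.Block(
            Say("entered"),
            throwIfAsked,
            Say("finished successfully"));

        // Try... Fault wrapped in Try... Finally.
        Expression body = Expression.TryFinally(
            Expression.TryFault(tryBody, Say("finished unsuccessfully")),
            Say("finally"));

        return Expression.Lambda<Action<Action<string>, bool>>(body, log, fail);
    }
}
```

The Body of the resulting lambda is a TryExpression whose Finally Property is set, and the Body of that TryExpression is another TryExpression whose Fault Property is set. That mirrors the nesting described above.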
So that was the inner method, now let's take a look at the method which catches the Exception. And here we have a problem... When using Expression Trees it is not possible to use a MethodBuilder like we did when using the ILGenerator. The reason, apparently, is that a MethodBuilder is still able to change. Why this restriction only goes for Expression Trees I don't know, but it's a fact we have to live with. So instead of passing in a method I simply call the function that created the Expression Tree and the Expression Tree is neatly combined with the outer Expression Tree when creating the method. So let's see what that looks like in code.
Perhaps this piece of code is a bit easier than the other function. What's notable in this example is that the Expression Tree is wrapped in a LambdaExpression which can be compiled into the newly created method. Actually there are two options for compiling Expression Trees. One is CompileToMethod which emits IL into the MethodBuilder argument. Another way to compile an Expression Tree is to call the Compile method which returns a delegate that can be Invoked right away. However, since we are using a TryFault block this will throw a NotSupportedException since TryFault blocks are not supported in VB and C#.
Again, you can see how this method performs by opening the Expression tree form
from the Main form
. As you can now see in the StackTrace
of the Exception
only one method was created.
Now remember that we saved the created assembly to disk? You can find it in the bin folder of the startup project. Open it using ILDASM and check the emitted IL for both methods. It's almost exactly the same! Only the messages differ, and the Expression Tree version uses callvirt where the Emit version uses call. Here is the IL for the TryFault block for both the Emit and the Expression Trees example.
.try
{
IL_0000: ldarg.0
IL_0001: ldstr "Entered Emit method."
IL_0006: call instance void class [mscorlib]System.Action`1<string>::Invoke(!0)
IL_000b: ldarg.1
IL_000c: ldc.i4.1
IL_000d: ceq
IL_000f: brfalse IL_001f
IL_0014: ldstr "Well, that's it for you!"
IL_0019: newobj instance void [mscorlib]System.Exception::.ctor(string)
IL_001e: throw
IL_001f: ldarg.0
IL_0020: ldstr "Emit method finished successfully."
IL_0025: call instance void class [mscorlib]System.Action`1<string>::Invoke(!0)
IL_002a: leave IL_003b
} // end .try
fault
{
IL_002f: ldarg.0
IL_0030: ldstr "Emit method finished unsuccessfully."
IL_0035: call instance void class [mscorlib]System.Action`1<string>::Invoke(!0)
IL_003a: endfinally
} // end handler
IL_003b: leave IL_004c
} // end .try
.try
{
IL_0000: ldarg.0
IL_0001: ldstr "Entered Expression Tree method."
IL_0006: callvirt instance void class [mscorlib]System.Action`1<string>::Invoke(!0)
IL_000b: ldarg.1
IL_000c: ldc.i4.1
IL_000d: ceq
IL_000f: brfalse IL_001f
IL_0014: ldstr "Well, that's it for you!"
IL_0019: newobj instance void [mscorlib]System.Exception::.ctor(string)
IL_001e: throw
IL_001f: ldarg.0
IL_0020: ldstr "Expression Tree method finished successfully."
IL_0025: callvirt instance void class [mscorlib]System.Action`1<string>::Invoke(!0)
IL_002a: leave IL_003b
} // end .try
fault
{
IL_002f: ldarg.0
IL_0030: ldstr "Expression Tree method finished unsuccessfully."
IL_0035: callvirt instance void class [mscorlib]System.Action`1<string>::Invoke(!0)
IL_003a: endfinally
} // end handler
IL_003b: leave IL_004c
} // end .try
That does look pretty similar! So IL can be emitted using Reflection.Emit opcodes or Expression Trees. Both methods have their pros and cons. The con to Reflection.Emit is obviously that you need lots of code to get things done and debugging is quite hard (but possible). The pro is that the sky is the limit, there is virtually nothing that can't be done with Emit! The pro to Expression Trees is that they are easier to understand and debug, especially when you write them out piece by piece (although you get nice IntelliSense support when nesting them). The con is that they are actually not very well documented. Most pages on the Expression Classes on MSDN have no examples or even descriptions! Also, Expression Trees have some limitations. As we experienced, we could not make a call to a method that did not yet exist. Another limitation is that with Expression Trees we can only generate Shared (static) methods. This may or may not be a problem of course.
Further reading:
While I have no real documentation on Expression Trees you could, like always, check out MSDN.
What you should read is chapter 6 of the book Metaprogramming in .NET
7. The curious case of F#
There are two more things I would like to take a look at: functions as first class citizens and tail calls. Both are features of Microsoft's functional programming language, F#.
7.1. The case of Functions as First Class Citizens
F# (and other functional languages) treats functions as first class citizens, which means that they can be passed as arguments to other functions, returned by functions, and stored in variables. Basically first-class functions are really just treated like any other value such as an Integer or a String. VB and C# can sort of mimic this behaviour through delegates, but it's not quite the same. You can download the TheCuriousCaseOfFSharp sample project at the top of this article. Open the solution and take a look at the code. The first thing you will see is the following.
// Functions as 'first class citizens'.
// This function returns a new function which takes a function as an argument.
let SomeFunc a b c d =
let newFunc f = f (a, b) + f (c, d)
newFunc
// Call SomeFunc passing in four integers and passing in an anonymous function
// that adds two integers to the function that is returned by SomeFunc.
let resultAdd = SomeFunc 1 2 3 4 (fun (a, b) -> a + b)
// Do the same as above, but multiply the integers.
let resultMult = SomeFunc 1 2 3 4 (fun (a, b) -> a * b)
// Print the results.
printfn "The result of adding: %d" resultAdd
printfn "The result of multiplying: %d" resultMult
Well, that goes pretty easy indeed! A function that returns a function that needs a function as argument. Notice the lambdas that are passed to the function that is returned from SomeFunc. We already know lambdas from VB and C#, but those languages borrowed the idea from functional languages such as F#. So what do you think? Does the compiler simply create Shared (static) functions like we saw in the lambda examples earlier?
As you can see a new Type is created for each function. The new types actually inherit from FSharpFunc(Of T, U) (FSharpFunc<T, U>). Now whenever an FSharpFunc is 'used as a value' the compiler generates a call to the Invoke method of the FSharpFunc. I can't show the generated IL here, because it would not fit the screen, but you can take a look for yourself. This is where you will want to look.
And this is what you should be looking out for.
IL_0074: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32,class [FSharp.Core]Microsoft.FSharp.Core.Unit>::Invoke(!0)
I never called Invoke
in my code, so this is what the compiler does for me. And there you have it in a nutshell. Functions as first class citizens!
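To make the comparison with VB and C# a bit more concrete, here is a minimal C# sketch of both ideas. It is an illustration only: FuncObject and NewFunc are hypothetical names made up for this example (they are not the real FSharp.Core types), and the class only approximates what the F# compiler generates. The first part imitates SomeFunc with delegates, the second part shows a hand-written 'function as an object' with an Invoke method.

```csharp
using System;

// Hypothetical stand-in for FSharpFunc<T, U> (NOT the real FSharp.Core type):
// a function value is simply an object with a virtual Invoke method.
public abstract class FuncObject<T, U>
{
    public abstract U Invoke(T arg);
}

// Roughly the kind of class a compiler could generate for the inner 'newFunc':
// one class per function, with the captured values a, b, c and d stored in fields.
public sealed class NewFunc : FuncObject<Func<int, int, int>, int>
{
    private readonly int a, b, c, d;

    public NewFunc(int a, int b, int c, int d)
    {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    public override int Invoke(Func<int, int, int> f)
    {
        return f(a, b) + f(c, d);
    }
}

public static class FirstClassFunctions
{
    // Delegate-based imitation of the F# SomeFunc: takes four integers and
    // returns a function that itself expects a function as its argument.
    public static Func<Func<int, int, int>, int> SomeFunc(int a, int b, int c, int d)
    {
        // The lambda captures a, b, c and d in a compiler-generated closure class.
        return f => f(a, b) + f(c, d);
    }

    public static void Main()
    {
        int resultAdd = SomeFunc(1, 2, 3, 4)((x, y) => x + y);   // (1 + 2) + (3 + 4)
        int resultMult = SomeFunc(1, 2, 3, 4)((x, y) => x * y);  // (1 * 2) + (3 * 4)

        // With the hand-written class we have to call Invoke ourselves.
        int viaClass = new NewFunc(1, 2, 3, 4).Invoke((x, y) => x + y);

        Console.WriteLine("The result of adding: {0}", resultAdd);
        Console.WriteLine("The result of multiplying: {0}", resultMult);
        Console.WriteLine("The result of adding through Invoke: {0}", viaClass);
    }
}
```

The difference with F# is mostly in who does the work: in C# we either lean on delegates or write the Invoke call by hand, while the F# compiler generates both the class and the call for us.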
7.2. The case of Tail Calls
So let's take a look at tail calls. Functional languages excel at recursion, that is, functions that call themselves. If you don't watch out you will get a StackOverflowException! That happens when the nested calls use up all of the thread's stack space. Exactly how many calls that takes depends on the stack size and on how much each call puts on the stack, but it happens. Take a look at the following VB and C# code.
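The listing is not reproduced in this text, so what follows is a reconstruction in C# only, based on the F# function shown further down in this chapter. Treat it as an assumption about what the original code looked like rather than the exact sample; the class name Recursion is made up for this sketch.

```csharp
using System;

public static class Recursion
{
    // Reconstruction (assumption): mirrors the F# GetTenMillion function
    // discussed in this chapter, written as plain recursive C#.
    public static int GetTenMillion(int i)
    {
        if (i < 10000000)
        {
            return GetTenMillion(i + 1);
        }
        else if (i > 10000000)
        {
            return GetTenMillion(i - 1);
        }
        else
        {
            return i;
        }
    }

    public static void Main()
    {
        // Only a handful of nested calls are needed here, so this is safe.
        Console.WriteLine(GetTenMillion(9999990));

        // Roughly ten million nested calls: expect a StackOverflowException,
        // unless the JIT happens to optimize the call away.
        // Console.WriteLine(GetTenMillion(1));
    }
}
```

The dangerous call is left commented out on purpose, because a StackOverflowException cannot be caught and takes the whole process down.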
Any experienced programmer knows this function will cause a StackOverflowException if it is called with an input that is not close to 10000000. So if we call it with input 1 our application will almost certainly crash (unless the JIT happens to optimize the call away). Well here's the deal, it won't in F#! Here is the F# code for the same function together with the calling code.
// Recursive 'tail call'. Eliminates the stack before calling a method.
let rec GetTenMillion i =
    if i < 10000000
    then GetTenMillion (i + 1)
    elif i > 10000000
    then GetTenMillion (i - 1)
    else i
// This would throw a StackOverflowException in VB or C#!
GetTenMillion 1 |> printfn "GetTenMillion from 1: %d"
So what is this 'tail call'? Well, whenever the call to a recursive function is the very last thing that function does, nothing in the current stack frame is needed anymore, so the frame can be discarded before the call is made. In this case we can see that if i is smaller than ten million we call GetTenMillion again with i + 1. In this example i + 1 is evaluated before the call to GetTenMillion and nothing happens after the call returns, so no frame has to be kept around. If, for example, we were to add 1 AFTER the function returned, it would be compiled as just another call on the stack and a StackOverflowException may be thrown. So let's see some IL for this function.
.method public static int32 GetTenMillion(int32 i) cil managed
{
// Code size 41 (0x29)
.maxstack 4
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldc.i4 0x989680
IL_0007: bge.s IL_000b
IL_0009: br.s IL_000d
IL_000b: br.s IL_0014
IL_000d: ldarg.0
IL_000e: ldc.i4.1
IL_000f: add
IL_0010: starg.s i
IL_0012: br.s IL_0000
IL_0014: ldarg.0
IL_0015: ldc.i4 0x989680
IL_001a: ble.s IL_001e
IL_001c: br.s IL_0020
IL_001e: br.s IL_0027
IL_0020: ldarg.0
IL_0021: ldc.i4.1
IL_0022: sub
IL_0023: starg.s i
IL_0025: br.s IL_0000
IL_0027: ldarg.0
IL_0028: ret
} // end of method Program::GetTenMillion
Do you see that? Not a single call opcode was emitted! What happens instead? If i is smaller than ten million, 1 is added to i, the result is stored back into the argument (starg.s) and we simply branch back to the beginning of the function. The same happens if i is bigger than ten million, except 1 is subtracted from i. Simple, but quite effective! Using Reflection.Emit you could use this trick to make your own very deep recursive functions.
You can run the F# application and really see that 10000000 is printed and no StackOverflowException is thrown.
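For those who want to try the branching trick themselves, here is a small C# sketch. The first method is simply the IL above read back as a loop, the second one emits comparable IL by hand using a DynamicMethod. The names TailCallByHand, GetTenMillionLoop and BuildGetTenMillion are made up for this example, and the emitted IL is a simplified version of what the F# compiler produced (without the extra nop and the double branches), so consider it an illustration rather than an exact copy.

```csharp
using System;
using System.Reflection.Emit;

public static class TailCallByHand
{
    private const int Target = 10000000;

    // The IL above written back as a C# loop: instead of calling itself, the
    // method overwrites its argument and jumps back to the start.
    public static int GetTenMillionLoop(int i)
    {
        while (true)
        {
            if (i < Target) { i = i + 1; continue; }
            if (i > Target) { i = i - 1; continue; }
            return i;
        }
    }

    // The same trick emitted by hand with Reflection.Emit.
    public static Func<int, int> BuildGetTenMillion()
    {
        var method = new DynamicMethod("GetTenMillion", typeof(int), new[] { typeof(int) });
        ILGenerator il = method.GetILGenerator();

        Label start = il.DefineLabel();
        Label notSmaller = il.DefineLabel();
        Label done = il.DefineLabel();

        il.MarkLabel(start);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, Target);
        il.Emit(OpCodes.Bge, notSmaller);   // i >= Target: skip the increment

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Starg_S, (byte)0);  // i = i + 1
        il.Emit(OpCodes.Br, start);         // the 'recursive call' is just a jump

        il.MarkLabel(notSmaller);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, Target);
        il.Emit(OpCodes.Ble, done);         // i <= Target here means i == Target

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Sub);
        il.Emit(OpCodes.Starg_S, (byte)0);  // i = i - 1
        il.Emit(OpCodes.Br, start);

        il.MarkLabel(done);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    public static void Main()
    {
        Console.WriteLine(GetTenMillionLoop(1));
        Console.WriteLine(BuildGetTenMillion()(20000000));
    }
}
```

Neither version ever grows the call stack, no matter how far the input is from ten million, because every 'recursive call' has been replaced by a jump.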
Further reading: A book that greatly helped me to understand at least some of F# is Expert F# 2.0 from Apress.
8. Afterword
Well, that certainly was A LOT of writing (for me) and reading (for you). As I said in the introduction this is not an easy subject. I hope I have made it as easy as possible and that you have enjoyed reading it as much as I've enjoyed writing it. Most of the stuff I've written down was new to me before I started writing so I can say I've learned A LOT and I hope you can say the same.
I would be happy to answer any questions or comments.
Happy coding!