C# Source Generators Boost Productivity in the Rx.NET Repo | endjin

Ian Griffiths 19th November 2024

.NET Conf 2024

Code generation has become increasingly important in recent versions of .NET, especially when using AOT compilation.

But source generators can also play a role in boosting your power as a developer. This talk will show how we've been using source generators to extend the reach of the test suite in the Reactive Extensions for .NET. AsyncRx.NET was an experimental project with no tests, but source generators have made it possible to make the existing Rx.NET test suite work in this new project.

Endjin are proud to be a .NET Foundation Corporate Sponsor, as we are maintainers of Reactive Extensions for .NET (AKA ReactiveX AKA Rx.NET) which is one of the most well established and widely used open source .NET projects.

If you're interested in learning more about Rx.NET, download the free Introduction to Rx.NET book.

Transcript

Okay, thank you everyone for attending this talk on how C# source generators are going to help boost productivity in the Rx.NET repo. My name is Ian Griffiths, I am a Technical Fellow at endjin, I'm the author of O'Reilly's Programming C# book, and I am the lead maintainer of the Rx.NET project. So I'm going to start by talking about exactly where and why we are using source generators, which means I'm going to talk about AsyncRx.NET. The most distinctive feature of this generator is that it's an application specific source generator, which is not the obvious way to use this technology, but we think it's actually a really interesting thing to do with source generators. And then I'm going to talk about the significance of performance in source generators, because that is really important.

So just to set the scene, I'm going to talk about the Rx.NET project and the specific part of that project where we're using source generators, and that is a feature called AsyncRx.NET. So just in case you've not heard of it, Rx.NET, or to use its full name, the Reactive Extensions for .NET, is a library for event driven programming. Microsoft originally created it over 15 years ago, then they open sourced it back in 2012. And it defines basic abstractions for working with events, making it useful in any program where things happen. And that's such a fundamental and useful idea that Rx has become popular outside of Microsoft's world, especially actually in JavaScript. But it all started in .NET. Now, although Rx.NET began as a Microsoft project, it's been community supported for over a decade. So the .NET Foundation provides some support, but there have been no Microsoft employees working on it for years now.

Now, about four years ago, all the maintainers moved on to other things, and work actually kind of stopped for two years, but my employer, endjin, offered to take over at the start of 2023, which is why I'm now the lead maintainer. Now my topic for today is the use of the source generator in Rx.NET, and this is part of our overall project for revitalizing Rx. The first phase was just bringing everything up to date with tooling, because when we took over it didn't actually build on current versions of Visual Studio. We then moved on to fixing a few major outstanding problems, but the most interesting work is the new stuff.

So we published a new edition of the free online book, Intro to Rx, originally written by Lee Campbell. We've updated that. It's freely available. We're working on some additional operators. As you'd expect, we've made sure that Rx works with .NET 9, but the main focus for today is this AsyncRx feature. So, when Rx was invented, C# didn't actually have the async and await keywords. And so Rx has a blocking design. If you write a handler for an Rx event source, and if you need to do slow work, there isn't a great way to do that. Either you have to resort to sync over async, which is problematic, or you have to return back from your callback before you're done processing, which can cause some different problems. And it'd be much better if we could just use the await keyword inside of an Rx handler, and you can't do that with standard Rx. Now when we took control of the Rx.NET codebase, there was an experimental implementation of exactly such an async version of Rx. So the code was there, but it hadn't ever been published. So one of the things we've done is make previews of that available on NuGet, but it's labelled as an alpha release because it's not yet ready for production use, and the reason for that ties into our use of source generators.

The actual reason is there's a lack of unit tests for this. And this graph sort of illustrates the problem. On the left, you can see we've got over 7,000 unit tests for ordinary Rx.NET, and over on the right, you can see that AsyncRx is lagging behind just a little bit with a grand total of no tests at all. Yes, unfortunately, we don't have any tests right now, and we know this causes problems. There are some basic mistakes in the code that would easily have been picked up by a simple test suite. So we don't want to support AsyncRx without a comprehensive test suite, and I am getting to source generators, I'll get there very soon.

Now the frustrating thing is we've very nearly got what we need. The existing Rx.NET test suite captures everything that an Rx implementation actually needs to do. Nearly all the things that test suite checks about ordinary Rx are things we would also want to check with AsyncRx.NET. And you can see this if I show you this existing test. So on the left here, we've got a test from ordinary Rx.NET. It's testing the Select operator. It's basically checking that if we chain together a couple of Select operators and feed in specific inputs at specific times, we get the expected outputs at the expected times. All of our tests use virtual time, by the way, so that we're not affected by actual execution speed.

So we can run them quickly. These timings are all kind of artificial, but they enable us to verify that the timing behaviour is what we need. Now, let's look at how this looks in an async world. There's not a lot of difference here. I'll just call out the differences. The actual test method itself has become asynchronous, so it returns a Task and is marked with the async keyword. The scheduler type that we use to do this virtualized timing is a different type, because AsyncRx uses different scheduling internally, and so when we kick that off we have to do an await. But that's basically it. Everything else remains exactly the same. So it's clear that it doesn't take a lot of change to get from the test suite that we have to the test suite that we need for AsyncRx.

So what we're going to do is take all 7,300 of those tests and make the necessary modifications to make them async. Now we could do that by hand. We could do that manually, but with the number of tests that we've got, that will be a slow process. But also, it means anytime we add a new test, we've got to write two of them, and that doesn't seem good. But what if we could automate the conversion? Then we could use the test suite we already have, and anytime we write a new test, it will just work for both the ordinary and the async versions of Rx. Now, when we first put this idea to the Rx developer community, we had a variety of responses. Some people declared that it was impossible. We disagreed with that.

Some people suggested that we could solve this with regular expressions. But as the saying goes, well, then we'd have two problems. And in fact, it actually wouldn't work. Sometimes things are more complex than that slide I just showed you. Sometimes we have to convert a normal lambda into an async one, but only when it's passed as an argument to certain methods. And there's just no sensible way to handle that kind of thing with regular expressions. Also, regex is pretty horrible. So frankly, we'd be glad to be able to rule it out. Now, as it happens, endjin, my employer, doesn't just maintain Rx, we also maintain a project called Reaqtor with a Q, which is a service for hosting persistent, reliable, long running Rx queries.

Now Reaqtor already contains code for converting between synchronous and asynchronous Rx queries. So that's job done, right? We already have the technology. However, Reaqtor works using expression trees, and these are the part of LINQ that enables C# expressions to be rewritten in a data query language like SQL. Now expression trees are immensely powerful, not least because they make it possible to automate exactly this kind of code transformation that we need. However, there is one major problem. They have fallen behind modern C#. Expression trees were introduced over 15 years ago and have barely changed since, and lots of basic language features are just not allowed inside expression trees, so they don't support the await keyword, for example.

Now, hold on. Didn't I just say that Reaqtor already has code for rewriting asynchronous queries? Well, how does that work if we can't use await? Well, it does do this, but to make it work, the Reaqtor asynchronous code means you have to avoid using await anywhere in your queries. And it's possible to work with that, but it's really not ideal. You want to be able to use the language's built in async features if you're doing async, really. So for some time, we at endjin were trying to encourage Microsoft to bring expression trees up to date, but it's become clear that this is never going to happen, and that's partly because there's this thing called Roslyn.

So Roslyn is the C# compiler API, and it is capable of providing a complete description of any C# code. And part of Microsoft's rationale for abandoning expression trees is that Roslyn already does a perfect job of representing C# code. So why would Microsoft maintain a second API that can only ever be a more limited alternative? So, in short, problems that you might once have solved with expression trees are now better solved with Roslyn, and so that's what we're doing. And more specifically, we're plugging into Roslyn as a source generator. Now, to quickly explain how these things work, a source generator is just a .NET assembly.

It's a library that the compiler loads in during compilation. Now, when it's compiling the code, the compiler first looks at all of your source files and tries to understand them. But then it says, are there any source generators registered? And if there are, it loads them up and provides them with access to all of the analysis that it has already performed. So all of the work that it has done to understand the source code in the project becomes available to the source generator. And this, by the way, is one of the big advantages of being a source generator: you are guaranteed that your code will see the source in exactly the same way that the compiler understands it.

You're guaranteed to get exactly the same context that is actually being used to compile your code for real. Now our source generator has access to all this information, and it can generate files which will then be fed back to the C# compiler, so you get to add extra code to the project. That's what a source generator is. Now, most of these things are pretty general purpose. If you look, for example, in the .NET SDK, there are generators that can create code to evaluate regular expressions, a very widely used technology. There is also one that generates code for handling JSON serialization, and again, that's a very broadly used capability.
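The load, inspect, and feed-back cycle described above can be sketched as a very small generator. This is an illustrative sketch only, not the generator from the Rx.NET repo: the class name `MethodListGenerator`, the generated `DiscoveredMethods` type, and the file name are all made up for this example. It assumes a class library that references the `Microsoft.CodeAnalysis.CSharp` NuGet package, and it uses the incremental generator interface (`IIncrementalGenerator`).

```csharp
#nullable enable
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical example generator: lists the name of every method the
// compiler has parsed, and adds one extra source file to the compilation.
[Generator]
public sealed class MethodListGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // The compiler hands us the syntax nodes it has already produced.
        // We keep only method declarations and project each one to its name.
        var methodNames = context.SyntaxProvider
            .CreateSyntaxProvider(
                predicate: static (node, _) => node is MethodDeclarationSyntax,
                transform: static (ctx, _) => ((MethodDeclarationSyntax)ctx.Node).Identifier.Text)
            .Collect();

        // Whenever the collected names change, emit a file that is fed
        // back into the same compilation.
        context.RegisterSourceOutput(methodNames, static (output, names) =>
        {
            var sb = new StringBuilder();
            sb.AppendLine("// <auto-generated/>");
            sb.AppendLine("internal static class DiscoveredMethods");
            sb.AppendLine("{");
            sb.AppendLine("    public static readonly string[] Names =");
            sb.AppendLine("    {");
            foreach (string name in names)
            {
                sb.AppendLine($"        \"{name}\",");
            }
            sb.AppendLine("    };");
            sb.AppendLine("}");

            output.AddSource("DiscoveredMethods.g.cs", SourceText.From(sb.ToString(), Encoding.UTF8));
        });
    }
}
```

A generator like this does nothing on its own; the compiler (or a `CSharpGeneratorDriver` in a unit test) has to load it and run it against a compilation.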

So these are all widely used. They're integrated into the SDK. So for these ones, it's really important that the generator performs well, because it's surprisingly easy to write a source generator that destroys developer productivity. Every time you press a key in a C# source file in your editor, every time you type, the compiler might run again so that it can supply you with diagnostics or suggestions immediately. And source generators can end up running as part of that. And that could slow things down to the point of degrading the productivity of the development environment, and that would be really bad. So it's important these things perform well. However, the thing I'm talking about today is actually slightly different.

I am using the source generator mechanism, but we're not writing a general purpose thing that's going to be used by lots of people. We don't put this source generator in a NuGet package. It's entirely local to the Rx codebase. You can do this. You can write a generator that's just a project in your solution. So this only affects developers who are actually working on Rx. If you're just using Rx, you won't get our code generator because we don't make it available. So we can make trade offs between the performance of our generator and the productivity enhancements it offers without imposing those choices on anyone else.

Of course, performance still matters. We don't want working on Rx to be a horrible experience, but it is slightly less critical than it would be if we thought anyone else was ever going to use this generator. So this is a slightly off label usage of source generators. Our code generator exists purely for the benefit of our own project and nobody else, but we think this kind of application specific source generator is an interesting and important use case. Now to be clear, there are other ways we could have done this. We could actually have written a standalone console application that uses Roslyn directly. And this is work in progress, by the way. The thing I'm going to show you, we haven't shipped it yet. We're not done yet, so it's also possible we'll change our mind.

But we have found there are advantages, as I described earlier, to being a source generator. You get to see the code precisely as the real system does. Okay, so, I should show you this, shouldn't I? Let me go into Visual Studio. So on the screen here you can see on the left the exact same unit test that I showed you on the slide earlier. So this is literally coming from the Rx.NET code base. This is actually from my clone of the repo on my machine. And on the right we have the asynchronous version of that, and you can see the changes. You can see that the method has turned async. You can see that the scheduler type is slightly different. And I just want to show you that this is kind of all hooked up if I change one of these expressions here.

So if instead of adding one in the first value, I add, let's say 15, and I then build the project. So this is set up right now so that my code generator, my source generator, only runs during a build. So the build has to run before we'll see it actually go through. And my machine has now decided to be slow because it knows I'm doing a demo. Okay, there we go. You can see on the right there that 15 has popped up like you would hope. And if I put that back and maybe add a second Select call, where x goes to x times two, and I build that again, but without a syntax error, then you should see, within a few seconds after it's finished the build, it should appear over here.

On the right, there we go. So this is definitely being generated. I only had to write the test once and any changes I make over here are, thanks to my source generator, going to be reproduced over here, but with whatever changes are necessary to make it asynchronous. So this is the basic idea. We're going to have all 7,000 tests. Actually, we can't do literally all of them because there's some feature disparity between the two libraries, but essentially all the tests are going to be duplicated by our source generator. So let's take a look at the source generator and see how it works. So the heart of the code here is this class called TestMethodToAsyncRewriter.

And it derives from a Roslyn type called CSharpSyntaxRewriter. And the basic idea of this is a rewriter is able to take some piece of C# syntax, it might be a whole class, it might just be a method, it might be an expression, and it can change it somehow. Now let me show you how we are actually using this. So if I go to the place where I instantiate this, this is a little bit of a mess, but let me find a place where I use it. So this is in the middle of my actual source generator. You can tell by the look of it that it's a work in progress, but it is real. So, just for demo purposes, if I now run this, and if I go find this window, you can see it's actually launched the C# compiler.

This is genuinely the compiler, the same one that compiles all your code, and this is set up so I can debug my source generator inside the C# compiler process. If you look out there, you can see I'm in the C# compiler process, but this is my code. Now, what's going to happen here is, because I'm a source generator, I get access to all the code, and I've written this to find all the test methods.

So if I look at this original method thing here, and if I just call ToString() on it, we can see that this is some bit of C# code. So it's a test method, and we want to rewrite this to be async. So this is the original synchronous version from normal Rx.NET. It's not using the async scheduler. And if I close this and then just step over this, this is using my method asyncifier. And if I now look at the result that came out of that, and if I call ToString() on that and show the code, then you can see it. It's just rewriting the body, so we don't actually see the outside method declaration here, but you can see it's using the async scheduler type. The call to start async is now using await. And that's about it for this one, but if there were more things going on... oh, actually all of these things here have now returned a value to us, because all these callbacks have to be asynchronous, but they weren't written in that way, and so they had to return a ValueTask in order to compile successfully. They all had to be awaited.

So it's done all of the work to turn it into the async version of the code. So let's take a look at how that actually works. I'm going to show you a specific piece of the rewriter here. So when you derive a type from CSharpSyntaxRewriter, you get to override various methods. So essentially when we ask this thing to rewrite an expression, Roslyn is going to walk through the entire structure of the code. And anytime we've overridden one of these methods, it will say, Oh, you want to know about assignment expressions, do you? All right, I will invoke your VisitAssignmentExpression every time we find an assignment expression in the code.
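The same override mechanism applies to every kind of syntax node, not just assignments. As a minimal sketch (not the code from the Rx.NET repo), here is a `CSharpSyntaxRewriter` that handles one of the differences called out earlier: a `void` test method becoming `async Task`. The names `MethodSignatureAsyncifier` and `MethodSignatureDemo` are invented for this example, and it assumes the `Microsoft.CodeAnalysis.CSharp` NuGet package is referenced.

```csharp
#nullable enable
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical rewriter: Roslyn walks the tree and calls
// VisitMethodDeclaration once for each method it finds.
public sealed class MethodSignatureAsyncifier : CSharpSyntaxRewriter
{
    public override SyntaxNode? VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        // Let the base class visit (and possibly rewrite) everything inside first.
        var visited = (MethodDeclarationSyntax)base.VisitMethodDeclaration(node)!;

        bool returnsVoid = visited.ReturnType is PredefinedTypeSyntax predefined
            && predefined.Keyword.IsKind(SyntaxKind.VoidKeyword);

        // Leave non-void methods and already-async methods untouched.
        if (!returnsVoid || visited.Modifiers.Any(SyntaxKind.AsyncKeyword))
        {
            return visited;
        }

        // void M() becomes async Task M(). This sketch does not add awaits
        // to the body or a using directive for System.Threading.Tasks.
        return visited
            .AddModifiers(SyntaxFactory.Token(SyntaxKind.AsyncKeyword))
            .WithReturnType(SyntaxFactory.IdentifierName("Task"));
    }
}

public static class MethodSignatureDemo
{
    // Parses the source, applies the rewriter, and returns tidied-up text.
    public static string Rewrite(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();
        return new MethodSignatureAsyncifier().Visit(root)!.NormalizeWhitespace().ToFullString();
    }
}
```

Calling `MethodSignatureDemo.Rewrite("class Tests { public void Select_Test() { Run(); } }")` should give back a class whose method is declared as `public async Task Select_Test()`.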

So if I just breakpoint this and let it run a bit further and clear that out, it's hit that breakpoint. And if I look at this node, then, let's do ToString() on that again. This id equals x. Yeah, that looks plausibly like an assignment expression. Now, why do I care about assignment expressions? Well, I've got this method here in my rewriter that says, try replacing disposable assignment with assign async. What is that about? Well, let me find a test for which this is relevant. If I go into the Select test here, and I find, where is it? Dispose inside selector. Here, we're using one of Rx's disposable types. This is a bunch of utilities to implement IDisposable. SerialDisposable has this property, and every time you assign a new disposable into it, it says, did we already have one? If we do, we'll just dispose that one, and this becomes the new one. So it kind of lets you use one disposable after another, making sure you've disposed them each time round.

But for async purposes, this is no good, because this might dispose the existing disposable. And if you're in an async world, well, that's going to be an IAsyncDisposable, and you're going to have to do an await on the DisposeAsync method, so you can't use an assignment. If I find the rewritten version of this test, so this is my async code from the code generator, it actually has to turn into this thing here. The SerialAsyncDisposable, provided by AsyncRx.NET, requires assignment to be done by doing a method invocation so that you can await it. And so you can see if I get these side by side again, you can see that it's this code here. This is the whole of the assignment statement. It's been turned into a call to AssignAsync with basically the same code on the inside, although it's obviously had to rewrite the call to Dispose as a call to await DisposeAsync. So this is what we're attempting to do.

This is the logic being executed right now. Obviously you can see the generated output from the last time it ran, but that's actually what this TryReplaceDisposableAssignmentWithAsyncAssignMethod is for. I've got a breakpoint there. If I let this run, we will actually hit one of these. So this thing says, well, is it one we're interested in? Is this actually assigning into a thing called Disposable? Is it using one of the types that we know requires this special handling? And if it is, we go, right, fine. We need to work out what thing is being assigned into. So that's actually the d variable, the d that has the disposable in it. And we're going to look at the thing being assigned in. So this expression here is basically the right hand side of the assignment. So if we take a look at that, you can see that was the thing being assigned into the disposable. So it's a call to Subscribe; it returns a disposable.

So what we do in our code is, first of all, we actually rewrite that. We say, let's just get the asyncified version of that. And now if I look at this, it's basically the same code, but you can see it's turned the Dispose into an await DisposeAsync. It's turned the Subscribe into a SubscribeAsync, which also needs to be awaited. So that's the thing we were going to assign into the serial disposable, except this has to be turned into an invocation of the AssignAsync method. So we actually rewrite it as that, so this is no longer going to be an assignment expression, because assignment expressions can't work in this scenario, because it needs to be awaitable.

And so you can see we've now got our call to AssignAsync with the thing that used to be assigned as a property. And then of course we've got to wrap all of that in an await. So that's what this thing does here, that wraps it as an await expression. And so that's that. That's now rewritten the thing. So basically, the whole of our rewriter just looks like a bunch of methods like this. So, if I come back out a level, you can see down here, we've asked Roslyn to tell us anytime you see an identifier, and we work out, is that a type that we recognize and know needs to be rewritten to a different type?
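To make the assignment rewrite concrete, here is a simplified sketch of the idea. It is not the repo's `TryReplaceDisposableAssignmentWithAsyncAssignMethod`; the class names `DisposableAssignmentRewriter` and `AssignmentDemo` are invented, and the match is purely on a property named `Disposable`, whereas the talk describes also checking that the type is one that needs this handling. It assumes the `Microsoft.CodeAnalysis.CSharp` NuGet package.

```csharp
#nullable enable
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical, simplified rewriter: turns `target.Disposable = value`
// into `await target.AssignAsync(value)`.
public sealed class DisposableAssignmentRewriter : CSharpSyntaxRewriter
{
    public override SyntaxNode? VisitAssignmentExpression(AssignmentExpressionSyntax node)
    {
        // Assumption for this sketch: any simple assignment to a member
        // called Disposable qualifies. No type information is consulted.
        if (node.IsKind(SyntaxKind.SimpleAssignmentExpression)
            && node.Left is MemberAccessExpressionSyntax target
            && target.Name.Identifier.Text == "Disposable")
        {
            // Rewrite the right-hand side first, so anything nested inside
            // it gets the same treatment.
            var value = (ExpressionSyntax)Visit(node.Right)!;

            // Build target.AssignAsync(value) ...
            var assignAsync = SyntaxFactory.MemberAccessExpression(
                SyntaxKind.SimpleMemberAccessExpression,
                target.Expression,
                SyntaxFactory.IdentifierName("AssignAsync"));

            var call = SyntaxFactory.InvocationExpression(
                assignAsync,
                SyntaxFactory.ArgumentList(
                    SyntaxFactory.SingletonSeparatedList(SyntaxFactory.Argument(value))));

            // ... and wrap it in an await, replacing the assignment entirely.
            return SyntaxFactory.AwaitExpression(call);
        }

        return base.VisitAssignmentExpression(node);
    }
}

public static class AssignmentDemo
{
    // Parses a single statement, rewrites it, and returns tidied-up text.
    public static string Rewrite(string statement)
    {
        StatementSyntax parsed = SyntaxFactory.ParseStatement(statement);
        return new DisposableAssignmentRewriter().Visit(parsed)!.NormalizeWhitespace().ToFullString();
    }
}
```

For example, `AssignmentDemo.Rewrite("d.Disposable = xs.Subscribe();")` should yield a statement of the form `await d.AssignAsync(xs.Subscribe());`. Unlike the generator shown in the talk, this sketch leaves `Subscribe` and `Dispose` calls as they are.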

So if I come in here, you can see I've got a big list that says, well, if it's test scheduler, it becomes test scheduler async. Observable becomes async observable, and so on. And we've also got a bunch of generic substitutions as well. So essentially, that's how this thing works. Roslyn is going to describe the entire test method to us one piece at a time, and we can just override methods saying, yeah, we need to be able to rewrite generic names, type names, member access expressions in various ways, and that enables us to automate the process of rewriting the test methods to be async.
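The lookup-table approach to type names can be sketched as below. This is illustrative only: `TypeNameRewriter` and `TypeNameDemo` are invented names, the two table entries follow the spoken description in the talk so their exact spellings are assumptions, and the match is purely textual, so it would also rename a variable or member that happened to share one of these names. It assumes the `Microsoft.CodeAnalysis.CSharp` NuGet package.

```csharp
#nullable enable
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical rewriter: swaps known identifiers for their async
// counterparts using a simple lookup table.
public sealed class TypeNameRewriter : CSharpSyntaxRewriter
{
    // Assumed mapping for illustration; spellings are guesses from the talk.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>
        {
            ["TestScheduler"] = "TestSchedulerAsync",
            ["Observable"] = "AsyncObservable",
        };

    public override SyntaxNode? VisitIdentifierName(IdentifierNameSyntax node)
    {
        if (Replacements.TryGetValue(node.Identifier.Text, out string? replacement))
        {
            // Keep the original whitespace and comments around the token.
            SyntaxToken renamed = SyntaxFactory.Identifier(
                node.Identifier.LeadingTrivia,
                replacement,
                node.Identifier.TrailingTrivia);
            return node.WithIdentifier(renamed);
        }

        return base.VisitIdentifierName(node);
    }
}

public static class TypeNameDemo
{
    // Parses the source, applies the rewriter, and returns the text with
    // its original formatting preserved.
    public static string Rewrite(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();
        return new TypeNameRewriter().Visit(root)!.ToFullString();
    }
}
```

Because the replacement token carries over the original trivia, `TypeNameDemo.Rewrite("var s = new TestScheduler(); var xs = Observable.Return(1);")` returns the same text with only the two names changed.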

Okay, so I'm nearly out of time, so let me go back to my slides. Just to remind you, if you are thinking of using this... I'm hoping that's come up. My browser's frozen again. Hopefully you can still hear me. I'm going to wait till it catches up. Okay, that seems to have caught up. Right, so, if you're going to do this sort of thing, performance is critical, because by default you might run as part of every single time the compiler executes, including every keypress. Now actually, Visual Studio has just changed the default on that to make this less of an issue, but you need to be aware of it, and you need to look at the newer source generator interfaces to make sure that your generator can run as efficiently as possible. But we think this is a very powerful technique.

Writing code with code. It's a thing we've been doing in various ways at endjin for a long time. It's a very powerful technique. It lets you express an idea once and then apply it any number of times over. So I only had to work out once that the disposable assignment needs to be replaced with that await AssignAsync method call, and then it automatically works any number of times. Every single place in the code where that substitution is required just happens straight away automatically now. These things can be quite mind bending to write, because you're writing some code that reads some code and generates some code, and it's very easy to forget what you're doing.

But once you get used to it, it's a great technique. Roslyn was the critical enabling technology for this. And you don't have to write source generators, you can talk to Roslyn directly, but, like I say, there are advantages to being integrated into the real build process, because you know you're seeing the code precisely as the build system understands it. And the heart of this is syntax tree transformation.

Okay, so don't forget to download .NET 9. Here are a bunch of resources.

This slide deck will be available for download later, so you can go and find them.

And meanwhile, thank you very much for listening!