Lazy and once-only C# async initialization
There are a couple of closely related optimizations that are often useful: lazy initialization, and "just once" initialization. These are conceptually pretty straightforward, although the details can get messy once multiple threads are involved. But what if the initialization work involves asynchronous operations?
Lazy initialization
Lazy initialization is an important performance technique. Putting off work until you know you need the results enables you to get on with more useful things in the meantime. And, once you've done the work, hanging onto the results may save you from repeating that work later. It's a pretty simple concept, but in multi-threaded environments it's surprisingly easy to get wrong, and so .NET provides some helper types that handle the subtle race conditions for you: Lazy<T> and LazyInitializer. (The latter is more lightweight, but uses an optimistic concurrency policy that means it will occasionally execute your initialization code more than once when threads race, and then discard all but one of the results. If you can't tolerate this, use Lazy<T>.)
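As a rough sketch of how the two helpers differ in use (the LazyHelpersDemo class and its BuildExpensiveValue factory are hypothetical names invented for illustration), something like this shows both wrapping the same factory:

```csharp
using System;
using System.Threading;

public static class LazyHelpersDemo
{
    // Counts how many times the (hypothetical) expensive factory has run.
    private static int factoryCalls;

    // Lazy<T> defaults to LazyThreadSafetyMode.ExecutionAndPublication, so the
    // factory runs at most once even if several threads read Value together.
    private static readonly Lazy<string> lazyValue = new Lazy<string>(() => BuildExpensiveValue());

    // LazyInitializer works against a plain field. Under contention the factory
    // may run more than once, but only one result is ever stored in the field.
    private static string? initializedValue;

    public static int FactoryCalls => Volatile.Read(ref factoryCalls);

    public static string ViaLazy() => lazyValue.Value;

    public static string ViaLazyInitializer() =>
        LazyInitializer.EnsureInitialized(ref initializedValue, () => BuildExpensiveValue());

    private static string BuildExpensiveValue()
    {
        Interlocked.Increment(ref factoryCalls);
        return "expensive value";
    }
}
```

Neither helper does any work until the first call to ViaLazy or ViaLazyInitializer, and repeated calls return the stored result.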
Just-once initialization
Lazy initialization incorporates a simpler idea that we could use in isolation: if you're going to do some expensive work, better to do it just once, instead of repeating it every time you need the results of that work. This idea is so simple that it might seem like it barely needs stating, but once you add in concurrency, it is the reason lazy initialization is surprisingly easy to get wrong.
Lazy<T> gives us just-once initialization, but what if we only want the just-once behaviour, and don't require laziness? Perhaps there's some work that our program is inevitably going to need to perform, so there's no particular benefit in deferring it, but we still want the just-once behaviour.
That shouldn't be so hard, right? We could do that in the constructor:
public class TypeWithExpensiveOneTimeWork
{
    private readonly ExpensivelyCalculatedResults data;

    public TypeWithExpensiveOneTimeWork()
    {
        this.data = ExpensiveWorker.PerformSlowWork();
    }

    public string DoSomething(int input) => data.GetResult(input);
}
There is one objection to this: some take the view that constructors shouldn't do non-trivial work. However, I'm not going to be dogmatic about that. Instead, I want to look at a variation on this: what if the slow work we need to perform just once involves an asynchronous operation?
Async just-once eager initialization
If PerformSlowWork in the preceding example returned a Task<T>, the simple approach I just showed wouldn't work. Constructors cannot be declared as async (because they can't return a Task; by definition they return an instance of the type they construct). We might be tempted to do this:
public class TerribleIdeaNeverEverDoThis
{
    private readonly ExpensivelyCalculatedResults data;

    public TerribleIdeaNeverEverDoThis()
    {
        this.data = ExpensiveWorker.PerformSlowWorkAsync().Result; // NOOOOOOOOOOOOOOOOO!
    }

    public string DoSomething(int input) => data.GetResult(input);
}
Don't do that.
In general, it's a really bad idea to retrieve the Result of a Task<T> unless you can be certain that the task has already completed (which it almost certainly won't have done in this example). There are a handful of exceptions to that rule, but they are specialized and tricky to get right. Unless you are in complete control of the context in which it runs, using Result in the way shown above risks causing a deadlock. (Reading Result on an unfinished task blocks the calling thread, and if the thread has ownership of anything required to complete the asynchronous work, that will prevent completion.)
However, there is a pretty simple way to get the same effect safely:
public class TypeWithExpensiveOneTimeAsyncWork
{
    private readonly Task<ExpensivelyCalculatedResults> dataTask;

    public TypeWithExpensiveOneTimeAsyncWork()
    {
        this.dataTask = ExpensiveWorker.PerformSlowWorkAsync(); // Note: no await
    }

    public async ValueTask<string> DoSomethingAsync(int input)
    {
        ExpensivelyCalculatedResults data = await dataTask.ConfigureAwait(false);
        return data.GetResult(input);
    }
}
Since in this scenario we know we will definitely need to perform the slow work, we don't need lazy behaviour, so we kick the work off immediately in the constructor. But we don't wait for it to finish there; we just store the resulting task. And then, any method that needs access to that expensive-to-obtain information can just await that task.
This works because you are allowed to await the same Task<T> any number of times. (Note that the field has to be a Task<T>, not a ValueTask<T>. You're only allowed to await a ValueTask<T> once.) Calls to DoSomethingAsync that occur before the expensive work is complete will wait at the await. (This is an asynchronous wait: the method is suspended, but no thread is blocked.) If there are multiple concurrent calls to the method while we're in that state, that's fine, they'll all just wait, and then when the expensive initialization completes, they will all become runnable simultaneously. (Whether they actually run concurrently at that point will typically be down to the task scheduler, which by default will defer to the thread pool.)
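To make the "await the same task many times" behaviour concrete, here is a minimal sketch. SharedTaskDemo and its PerformSlowWorkAsync stand-in are hypothetical, and the 100 ms delay merely simulates slow work. Five callers await one shared task before it has finished, and the work itself is only started once:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SharedTaskDemo
{
    // Counts how many times the simulated slow work has been started.
    private static int workCount;

    // Hypothetical stand-in for the slow asynchronous initialization.
    private static async Task<int> PerformSlowWorkAsync()
    {
        Interlocked.Increment(ref workCount);
        await Task.Delay(100).ConfigureAwait(false);
        return 42;
    }

    public static async Task<(int[] Results, int WorkCount)> RunAsync()
    {
        // Start the work once and keep hold of the task, without awaiting it yet.
        Task<int> dataTask = PerformSlowWorkAsync();

        // Each caller awaits the same task; none of them restarts the work.
        async Task<int> CallerAsync(int input) =>
            (await dataTask.ConfigureAwait(false)) + input;

        int[] results = await Task.WhenAll(
            Enumerable.Range(0, 5).Select(i => CallerAsync(i))).ConfigureAwait(false);

        return (results, Volatile.Read(ref workCount));
    }
}
```

All five callers resume once the shared task completes, and each sees the same underlying result.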
In a program that expects to call DoSomethingAsync many times in succession, we would expect the first call to be slow (because it will have to wait for the expensive asynchronous initialization to complete), but subsequent calls will not have to wait, because dataTask will already have completed. This is why I've made DoSomethingAsync return a ValueTask<string>. Asynchronous methods that you expect to complete synchronously most of the time are more memory-efficient if they return a ValueTask<T>, because a synchronous completion then doesn't need to allocate a Task<T> to hold the result.
Async lazy initialization
The eager asynchronous initialization just shown is good if you know you're definitely going to need the results of the work, and are likely to need it as early as possible. But in cases where your code might not need the results at all (e.g., you're writing a command line tool, and only certain command line flags will trigger the behaviour that needs this particular data), then lazy initialization is a better bet.
The Lazy<T> and LazyInitializer types mentioned earlier do not offer any direct support for asynchronous code. That's essentially because they don't need to. You can just use Lazy<Task<T>>, e.g.:
public class TypeWithExpensiveLazyAsyncWork
{
    private readonly Lazy<Task<ExpensivelyCalculatedResults>> dataTaskSource;

    public TypeWithExpensiveLazyAsyncWork()
    {
        this.dataTaskSource = new(() => ExpensiveWorker.PerformSlowWorkAsync());
    }

    public async ValueTask<string> DoSomethingAsync(int input)
    {
        ExpensivelyCalculatedResults data = await dataTaskSource.Value.ConfigureAwait(false);
        return data.GetResult(input);
    }
}
This avoids starting the expensive work until something asks for it. So this will give you "at most once" initialization: if the program never hits the code path that asks for the results of this expensive work, it will never be performed. But once something does ask, Lazy<T> will ensure that it only runs the code that builds the Task<T> once.
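The "at most once" behaviour can be checked with a small self-contained sketch. LazyAsyncDemo, LoadAsync and StartCount are hypothetical names used only for this illustration, with a short delay standing in for the real work:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class LazyAsyncDemo
{
    private readonly Lazy<Task<string>> dataTaskSource;
    private int startCount;

    public LazyAsyncDemo()
    {
        // Nothing runs here: the factory is only invoked on first access to Value.
        this.dataTaskSource = new Lazy<Task<string>>(() => this.LoadAsync());
    }

    // Number of times the initialization has been started.
    public int StartCount => Volatile.Read(ref this.startCount);

    public async ValueTask<int> GetLengthAsync()
    {
        string data = await this.dataTaskSource.Value.ConfigureAwait(false);
        return data.Length;
    }

    // Hypothetical stand-in for the slow asynchronous work.
    private async Task<string> LoadAsync()
    {
        Interlocked.Increment(ref this.startCount);
        await Task.Delay(50).ConfigureAwait(false);
        return "loaded";
    }
}
```

Constructing a LazyAsyncDemo leaves StartCount at zero. However many callers then invoke GetLengthAsync concurrently, the count should reach one and stay there, because Lazy<T> in its default mode only runs the factory once.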
What if your organization tolerates failure?
Although Blofeld might not approve, sometimes it is necessary to tolerate failure. The problem with the async techniques just shown is that if the initialization fails, you are stuck with a Task<T> that is in a faulted state, so every attempt to await it will throw an exception. To be able to recover from this, you would need to be able to reset the field.
Here's one way you could do that:
public class TypeWithRetriableExpensiveLazyAsyncWork
{
    private Lazy<Task<ExpensivelyCalculatedResults>> dataTaskSource;

    public TypeWithRetriableExpensiveLazyAsyncWork()
    {
        this.dataTaskSource = InitializeDataTaskSource();
    }

    private Task<ExpensivelyCalculatedResults> Data
    {
        get
        {
            Task<ExpensivelyCalculatedResults> result = this.dataTaskSource.Value;
            if (result.IsFaulted)
            {
                // Try one more time. If the underlying cause of the problem remains,
                // this will also fail, but each subsequent attempt to get the data
                // will kick off a new try.
                result = this.InitializeDataTaskSource().Value;
            }
            return result;
        }
    }

    private Lazy<Task<ExpensivelyCalculatedResults>> InitializeDataTaskSource()
    {
        return this.dataTaskSource = new(() => ExpensiveWorker.PerformSlowWorkAsync());
    }

    public async ValueTask<string> DoSomethingAsync(int input)
    {
        ExpensivelyCalculatedResults data = await this.Data.ConfigureAwait(false);
        return data.GetResult(input);
    }
}
In practice, error recovery behaviour is often application-specific, so you might need something more complex. But the basic idea of creating a new Lazy<Task<T>> (or new Task<T> if you need just-once but don't care about laziness) will remain.
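One example of "something more complex": in the code above, the field is reassigned without any synchronization, so several callers that observe the faulted task at the same moment could each start their own retry. If that matters, one possible refinement is to swap the field with Interlocked.CompareExchange so that only one replacement wins. The sketch below is an assumption-laden illustration, not part of the original design: the RetriableAsyncLazy<T> type is hypothetical, it treats cancellation the same as failure, and it assumes the factory reports errors through the returned task rather than by throwing synchronously.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: a Lazy<Task<T>> wrapper that allows a retry after failure.
public sealed class RetriableAsyncLazy<T>
{
    private readonly Func<Task<T>> factory;
    private Lazy<Task<T>> source;

    public RetriableAsyncLazy(Func<Task<T>> factory)
    {
        this.factory = factory;
        this.source = new Lazy<Task<T>>(factory);
    }

    public Task<T> Value
    {
        get
        {
            Lazy<Task<T>> current = Volatile.Read(ref this.source);
            Task<T> task = current.Value;
            if (!task.IsFaulted && !task.IsCanceled)
            {
                // Either still running or completed successfully: share it.
                return task;
            }

            // Swap in a fresh Lazy only if nobody else has replaced the failed one.
            // If another caller won the race, we pick up its replacement instead,
            // so at most one retry is started per failed attempt.
            var replacement = new Lazy<Task<T>>(this.factory);
            Interlocked.CompareExchange(ref this.source, replacement, current);
            return Volatile.Read(ref this.source).Value;
        }
    }
}
```

A field of this type could stand in for the Lazy<Task<ExpensivelyCalculatedResults>> field, with methods awaiting its Value property in the same way as before.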
Summary
There's no specific support in .NET for asynchronous lazy or once-only initialization, but you don't need it. A field of type Lazy<Task<T>> will do the job. And if you don't need the lazy part, you can get once-only async initialization by storing just a Task<T> in a field.