An Overview of the Corvus.Retry Library
TL;DR: In this post we look at Corvus.Retry, a C# library for handling transient faults. We give an overview of the main features, show how to get started, and walk through some examples of how to use it.
Introduction
The Corvus GitHub Organisation contains a series of open-source repositories that provide tools and support for common requirements such as tenancy storage, authentication and more. Corvus encapsulates endjin's recommended practices for building .NET applications. These practices are founded in the knowledge gained through over a decade of real-world experience building .NET applications for our customers.
Each blog post in this series will focus on a single Corvus repository, outlining the repository's intended use and how to get started.
Overview of Corvus.Retry
In this blog post Corvus.Retry is under the spotlight. Corvus.Retry is a library for handling transient errors, which are common in code that communicates over the internet, for example. The library was written before Polly existed.
Corvus.Retry vs Polly
Polly is a newer .NET library for building policies and strategies to handle faults. It offers lots of extensibility support for different design patterns not included in Corvus.Retry, like Circuit-breaker and Bulkhead isolation. This makes the API more complex and harder to use, whereas Corvus.Retry is relatively simple, with the extensibility support being around the strategy and policy.
In summary, Corvus.Retry is less feature-rich but much simpler and easier to use than Polly; if your requirements are covered by the functionality of Corvus.Retry, we think you may prefer it over Polly.
Corvus.Retry is built for netstandard2.0.
Features
Retry operations
There are three types of retry operation in this library:
Retriable
This is the most common usage pattern. It wraps an existing method and lets you re-execute that method if it throws an exception, through Retriable.Retry() and Retriable.RetryAsync().
Retriable.Retry(() => DoSomethingThatMightFail(), strategy, policy);
Task resultTask = Retriable.RetryAsync(() => SomeOperationAsync(), strategy, policy);
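To make the idea of wrapping a method concrete, here is a minimal, self-contained sketch of a retry loop. It is not Corvus.Retry's implementation: RetrySketch, maxTries and shouldRetry are hypothetical names that stand in for the roles played by the strategy (how many times) and the policy (which exceptions).

```csharp
using System;

// Hypothetical helper, not part of Corvus.Retry: runs 'operation' up to
// 'maxTries' times, retrying only when 'shouldRetry' accepts the exception.
static T RetrySketch<T>(Func<T> operation, int maxTries, Func<Exception, bool> shouldRetry)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return operation();
        }
        catch (Exception ex) when (attempt < maxTries && shouldRetry(ex))
        {
            // The filter matched, so swallow the exception and go round again.
            // When the filter does not match, the exception propagates to the caller.
            Console.WriteLine($"Attempt {attempt} failed: {ex.Message}");
        }
    }
}

// Usage: an operation that fails twice, then succeeds on the third call.
int calls = 0;
int result = RetrySketch(
    () => ++calls < 3 ? throw new InvalidOperationException("transient") : 42,
    maxTries: 5,
    shouldRetry: ex => ex is InvalidOperationException);

Console.WriteLine($"Result {result} after {calls} calls");
```

The exception filter keeps the two decisions separate: the try count decides whether another attempt is allowed at all, and the predicate decides whether this particular exception is worth retrying.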
RetryTask
Starts a task that executes a function asynchronously, with retry. It is an analog of Task with built-in retry semantics.
Task task = RetryTask.Factory.StartNew(() => DoSomethingThatMightFail(), strategy, policy);
ReliableTaskRunner
Allows you to wrap a long-running asynchronous operation in a host that ensures it continues to run even after a transient failure.
ReliableTaskRunner runner = ReliableTaskRunner.Run(cancellationToken => DoSomeOperationAsync(cancellationToken));
Policy and strategy
There are two types that help you to control when a failed operation is retried, and how that retry occurs: IRetryPolicy and IRetryStrategy.
You can implement your own versions of these; there are also a number of built-in implementations.
IRetryPolicy
The retry policy gives you the ability to determine whether the operation should be retried or not, based on the exception that has been thrown.
There are three built-in retry policies:
AnyExceptionPolicy
This will always retry on any exception, and is the default for Retriable.
DoNotRetryPolicy
This will never retry, regardless of the exception. You use this to disable retry, without having to comment out your retry code.
AggregatePolicy
This gives you a means of ANDing together multiple policies. The AggregatePolicy only succeeds if ALL of its children succeed.
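The ANDing behaviour can be sketched without the library at all. In the snippet below each policy is modelled as a plain predicate over the exception; AllOf, isTimeout and mentionsBusy are hypothetical names used only for illustration, and this is not how AggregatePolicy itself is written.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for an aggregate policy: the combined predicate
// returns true only if every child predicate returns true.
static Func<Exception, bool> AllOf(params Func<Exception, bool>[] policies) =>
    exception => policies.All(policy => policy(exception));

// Two illustrative child "policies".
Func<Exception, bool> isTimeout = ex => ex is TimeoutException;
Func<Exception, bool> mentionsBusy = ex => ex.Message.Contains("busy");

Func<Exception, bool> combined = AllOf(isTimeout, mentionsBusy);

Console.WriteLine(combined(new TimeoutException("server busy")));           // True: both children agree
Console.WriteLine(combined(new TimeoutException("host not found")));        // False: message check fails
Console.WriteLine(combined(new InvalidOperationException("server busy")));  // False: not a timeout
```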
IRetryStrategy
The IRetryStrategy controls the way in which the operation is retried. It controls both the delay between each retry and the number of times that it will be retried.
Count
This simply tries the operation up to a specified maximum number of times, with no delay.
DoNotRetry
This is the strategy equivalent of the DoNotRetryPolicy. It forces a retry to be abandoned, regardless of policy.
Linear
This tries a specified number of times, with a constant delay between each try.
For example, new Linear(TimeSpan.FromSeconds(1), 5) will try up to 5 times. The initial try will be immediate, and each retry will be delayed by 1s. So the first retry will be after 1s (wall clock time 1s), the second after another 1s (wall clock time 2s), the third after another 1s (wall clock time 3s).
Incremental
This tries a specified number of times, with an arithmetically increasing delay between each retry.
For example, new Incremental(maxTries: 5, initialDelay: TimeSpan.FromSeconds(1), step: TimeSpan.FromSeconds(1)) will try up to 5 times. The initial try will be immediate, and the delay between the initial try and the first retry will be one second. Each subsequent retry increases the delay by 1s. So the first retry will be after 1s (wall clock time 1s), the second after another 2s (wall clock time 3s), the third after another 3s (wall clock time 6s).
This allows a slowly increasing delay.
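The arithmetic above can be checked with a small stand-alone sketch. IncrementalDelays is a hypothetical helper written for this post, not a Corvus.Retry API; it only reproduces the schedule described in the text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Corvus.Retry: yields the delay before each
// retry for an arithmetically increasing schedule. The first try is immediate,
// so 'maxTries' tries means at most 'maxTries - 1' retries.
static IEnumerable<TimeSpan> IncrementalDelays(int maxTries, TimeSpan initialDelay, TimeSpan step)
{
    for (int retry = 0; retry < maxTries - 1; retry++)
    {
        yield return initialDelay + TimeSpan.FromTicks(step.Ticks * retry);
    }
}

// Accumulate the delays to get the wall clock time of each retry.
TimeSpan wallClock = TimeSpan.Zero;
int retryNumber = 1;
foreach (TimeSpan delay in IncrementalDelays(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1)))
{
    wallClock += delay;
    Console.WriteLine($"Retry {retryNumber++}: wait {delay.TotalSeconds}s (wall clock {wallClock.TotalSeconds}s)");
}
```

With these arguments the waits are 1s, 2s, 3s and 4s, giving wall clock times of 1s, 3s, 6s and 10s. Passing TimeSpan.Zero as the step gives a constant delay, which is the shape of schedule described for Linear.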
Backoff
This tries a specified number of times, with a delay that increases geometrically between each retry.
For example, new Backoff(maxTries: 5, deltaBackoff: TimeSpan.FromSeconds(1)) will try up to 5 times. Each time it will increase the delay by a value calculated roughly like this: 2^n * (delta +/- a small random fudge), where n is the current number of total tries.
This allows a rapidly increasing delay, with a bit of random jitter added to avoid contention.
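As a rough illustration of that formula, the sketch below computes a jittered, geometrically growing delay. BackoffDelay is a hypothetical helper, and the 20% jitter range is an assumption chosen for the example; the library's actual jitter calculation may well differ.

```csharp
using System;

// Hypothetical helper, not part of Corvus.Retry: follows the rough formula
// 2^n * (delta +/- jitter), where n is the number of tries made so far.
// The +/- 20% jitter range is an assumption made for illustration only.
static TimeSpan BackoffDelay(int triesSoFar, TimeSpan deltaBackoff, Random random)
{
    double jitterFactor = 0.8 + (random.NextDouble() * 0.4); // somewhere in [0.8, 1.2)
    double jitteredMs = deltaBackoff.TotalMilliseconds * jitterFactor;
    return TimeSpan.FromMilliseconds(Math.Pow(2, triesSoFar) * jitteredMs);
}

Random rng = new Random(12345); // fixed seed so repeated runs behave the same way
for (int tries = 1; tries < 5; tries++)
{
    TimeSpan delay = BackoffDelay(tries, TimeSpan.FromSeconds(1), rng);
    Console.WriteLine($"After try {tries}: wait about {delay.TotalSeconds:F2}s");
}
```

The jitter matters when many clients fail at the same moment: without it they would all retry in lockstep and collide again.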
How to get started
Corvus.Retry is available on NuGet. To get started with Corvus.Retry, add a reference to the Corvus.Retry NuGet package in your project, by running the following command:
dotnet add package Corvus.Retry
How to use
Within the Corvus.Retry repository, there are a series of interactive notebooks (under the folder Corvus.Retry.Documentation.Examples) that provide a suite of examples on how to use the features of Corvus.Retry. You can pull down the repo and run the notebooks locally yourself.
The README file provides links to each of the notebooks along with animations of the code snippets in action. You can also click the notebook icon (π) in the following headers to get to the corresponding interactive notebook.
Examples
Retriable π
Basic example
As a basic example, you can ping an internet host and retry if an exception is thrown.
The following example uses Retriable.RetryAsync() to ping the host this-internet-host-does-not-exist. It will retry upon detecting any exception, and will try up to a maximum of three times (once for the initial try and a maximum of two retries).
CancellationTokenSource cancellationTokenSource = new();
Ping pingSender = new Ping();
Task<PingReply> task = Retriable.RetryAsync<PingReply>(
    () => pingSender.SendPingAsync("this-internet-host-does-not-exist"),
    cancellationTokenSource.Token,
    strategy: new Count(3),
    policy: new AnyExceptionPolicy());
Since the host this-internet-host-does-not-exist does not exist, the method will throw an exception each time; after the final retry the exception is allowed to bubble up.
You could cancel the operation using the cancellation token.
cancellationTokenSource.Cancel();
Mock HTTP service example
Transient faults are fairly common when using code that communicates over the internet; within that category, errors resulting from the HTTP 429 - "Too many requests" - status code are common.
Let's imagine we're consuming an HTTP service which occasionally gives a 429 - Too Many Requests error. To model this scenario let's create a custom exception, MockHttpServiceException, and a MockHttpService class with a method, MakeLotsOfRequestsAsync(), that always throws a MockHttpServiceException with status code 429.
[Serializable]
public class MockHttpServiceException : Exception
{
    public string StatusCode { get; }

    // Pass the message to the base class rather than overriding Message.
    public MockHttpServiceException(string message, string statusCode)
        : base(message)
    {
        this.StatusCode = statusCode;
    }
}
public class MockHttpService
{
    public async Task MakeLotsOfRequestsAsync()
    {
        Console.WriteLine("MakeLotsOfRequests() method called");
        await Task.Delay(100);
        throw new MockHttpServiceException("429 - Too Many Requests", "429");
    }
}
Creating a custom policy
For this scenario you would want a custom policy that retries upon detecting a MockHttpServiceException with a status code of 429. Let's do that.
It is very simple to create your own custom policy, and you will frequently do so. You implement the IRetryPolicy interface, and its bool CanRetry(Exception exception); method.
public class RetryOnTooManyRequestsPolicy : IRetryPolicy
{
    public bool CanRetry(Exception exception)
    {
        // Retry only for our mock exception, and only when it reports a 429.
        return exception is MockHttpServiceException httpException && httpException.StatusCode == "429";
    }
}
Now you can run MakeLotsOfRequestsAsync() with Retriable.RetryAsync(), passing in a RetryOnTooManyRequestsPolicy for the policy parameter, and a strategy for the strategy parameter. Let's go with the built-in Linear.
MockHttpService mockHttpService = new();
await Retriable.RetryAsync(
    () => mockHttpService.MakeLotsOfRequestsAsync(),
    CancellationToken.None,
    strategy: new Linear(TimeSpan.FromSeconds(1), maxTries: 5),
    policy: new RetryOnTooManyRequestsPolicy());
The code above will run MakeLotsOfRequestsAsync() a total of 5 times, with a one-second delay between tries. Since the method will never succeed (it always throws an exception), the MockHttpServiceException will bubble up after the 5th execution.
RetryTask π
You can use RetryTask.Factory.StartNew() just like you would Task.Factory.StartNew(), with the usual parameters in addition to policy and strategy parameters for configuring retry behaviour.
Mock HTTP service example
Continuing with the mock HTTP service example from earlier, let's say the client doesn't offer asynchronous versions of its methods, but we want to run the operation asynchronously with retry behaviour so that we can do other work whilst that operation runs. We can use RetryTask.Factory.StartNew() to run the method, do some other work, then wait for the Task to complete.
public class MockHttpService
{
    public void MakeLotsOfRequests()
    {
        Console.WriteLine("MakeLotsOfRequests() method called");
        Thread.Sleep(1000);
        throw new MockHttpServiceException("429 - Too Many Requests", "429");
    }
}
MockHttpService mockHttpService = new();
Task retryTask = RetryTask.Factory.StartNew(
    () => mockHttpService.MakeLotsOfRequests(),
    CancellationToken.None,
    strategy: new Count(5),
    policy: new RetryOnTooManyRequestsPolicy());
DoSomeWork();
retryTask.Wait();
ReliableTaskRunner π
You use ReliableTaskRunner when you need something to keep running until explicitly cancelled, which can be done with ReliableTaskRunner.StopAsync().
Let's model such a scenario with a method containing an infinite loop, and cancel the operation some time later.
Example - an operation that runs forever
class ModelALongRunningProcess
{
    public static async Task ExecuteALongRunningTransientProcess(CancellationToken cancellationToken)
    {
        Console.WriteLine("ExecuteALongRunningTransientProcess method executed");
        int counter = 0;
        while (true)
        {
            await Task.Delay(100);
            int quotient = Math.DivRem(counter, 10, out int remainder);
            // Throw an exception every ten times around the loop
            if (remainder == 0 && quotient > 0)
            {
                throw new Exception();
            }
            counter++;
        }
    }
}
CancellationTokenSource cancelTokenSource = new CancellationTokenSource();
CancellationToken token = cancelTokenSource.Token;
// The lambda parameter is the token supplied by the runner, not the local 'token' above.
ReliableTaskRunner runner = ReliableTaskRunner.Run(cancellationToken => ModelALongRunningProcess.ExecuteALongRunningTransientProcess(cancellationToken));
Thread.Sleep(5000); // Wait some time before cancelling operation
Task runnerTask = runner.StopAsync();
// Check to see if the task returned is faulted
Console.WriteLine($"Is runner task faulted:\t{runnerTask.IsFaulted}");
await runnerTask;
The method, ExecuteALongRunningTransientProcess(), pauses for 100ms each time around the loop, and every ten times around throws an exception. The code above initiates the operation then waits for 5 seconds. Each run lasts roughly one second before it throws, so the method is executed several times before being cancelled.
The IsFaulted property on the Task returned by runner.StopAsync() has a value of false in the example above. This is because the method never returned, and never failed in a way that didn't cause a retry. ReliableTaskRunner is for operations that are supposed to run forever, so if the operation completes in a way that doesn't cause a retry, IsFaulted on the Task returned by runner.StopAsync() is set to true.
As you can see, Corvus.Retry is a useful collection of functionality for handling transient errors. It's a library that works well and is used by other endjin-owned open source libraries. If you have any thoughts on better ways to approach particular problems, or related tools you'd like to see, then we'd like to hear them; please comment below, or in the issues section of the project.