By Liam Mooney, Apprentice Engineer II
An Overview of the Corvus.Retry Library

TL;DR: In this post we look at Corvus.Retry, a C# library for handling transient faults. We give an overview of its main features, show how to get started with it, and walk through some examples demonstrating how to use it.

Introduction

The Corvus GitHub organisation contains a series of open-source repositories that provide tools and support for common requirements such as tenancy, storage, authentication and more. Corvus encapsulates endjin's recommended practices for building .NET applications. These practices are founded on the knowledge gained through over a decade of real-world experience building .NET applications for our customers.

Each blog post in this series will focus on a single Corvus repository, outlining the repository's intended use and how to get started.

Overview of Corvus.Retry

In this blog post, Corvus.Retry is under the spotlight. Corvus.Retry is a library for handling transient errors, which are common in, for example, code that communicates over the internet. The library was written before Polly existed.

Corvus.Retry vs Polly

Polly is a newer .NET library for building policies and strategies to handle faults. It offers extensibility support for resilience patterns that Corvus.Retry does not include, such as Circuit-breaker and Bulkhead isolation. That breadth makes its API more complex and harder to use, whereas Corvus.Retry is relatively simple, with its extensibility focused on retry strategies and policies.

In summary, Corvus.Retry is less feature-rich than Polly but much simpler and easier to use; if your requirements are covered by Corvus.Retry's functionality, we think you may prefer it over Polly.

Corvus.Retry is built for netstandard2.0.

Features

Retry operations

There are three types of retry operation in this library:

Retriable

This is the most common usage pattern. Retriable.Retry() and Retriable.RetryAsync() wrap an existing method and re-execute it if it throws an exception.

Retriable.Retry(() => DoSomethingThatMightFail(), strategy, policy);
Task resultTask = Retriable.RetryAsync(() => SomeOperationAsync(), strategy, policy);
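To make the roles of the strategy and policy concrete, here is a hand-rolled sketch of the kind of loop a call like Retriable.Retry() performs. It does not use Corvus.Retry at all: RetrySketch, canRetry, maxTries and delayBeforeRetry are hypothetical names, with canRetry standing in for a policy and maxTries/delayBeforeRetry standing in for a strategy. It is written as C# top-level statements.

```csharp
using System;
using System.Threading;

// Hypothetical retry loop, not Corvus.Retry's implementation:
// - canRetry plays the role of a policy (should this exception be retried?)
// - maxTries and delayBeforeRetry play the role of a strategy (how many tries, how long to wait)
T RetrySketch<T>(Func<T> operation, Func<Exception, bool> canRetry, int maxTries, Func<int, TimeSpan> delayBeforeRetry)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return operation();
        }
        catch (Exception ex) when (attempt < maxTries && canRetry(ex))
        {
            // Only reached while both the "policy" and the "strategy" allow another try;
            // otherwise the filter is false and the original exception propagates unchanged.
            Thread.Sleep(delayBeforeRetry(attempt));
        }
    }
}

// Usage: an operation that fails twice and then succeeds.
int calls = 0;
string result = RetrySketch<string>(
    () => ++calls < 3 ? throw new InvalidOperationException("transient") : "ok",
    canRetry: ex => ex is InvalidOperationException,
    maxTries: 5,
    delayBeforeRetry: _ => TimeSpan.FromMilliseconds(10));
Console.WriteLine($"{result} after {calls} calls");
```

Using an exception filter (`when`) rather than catching and rethrowing keeps the final exception's stack trace intact when the retries run out.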

RetryTask

This is an analog of Task with built-in retry semantics: it allows you to start a task that executes a function asynchronously, retrying the function if it fails.

Task task = RetryTask.Factory.StartNew(() => DoSomethingThatMightFail(), strategy, policy);

ReliableTaskRunner

Allows you to wrap a long-running asynchronous operation in a host that ensures it continues to run even after a transient failure.

ReliableTaskRunner runner = ReliableTaskRunner.Run(cancellationToken => DoSomeOperationAsync(cancellationToken));
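The idea behind ReliableTaskRunner can be sketched without the library: a loop that restarts an operation after transient failures until cancellation is requested. RunUntilCancelledAsync is a hypothetical helper, not ReliableTaskRunner's implementation; in particular, this sketch simply loops round again if the operation returns normally before cancellation, which the real runner may not do.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical "keep it running" host: restart the operation after each failure
// until the token is cancelled. Returns the number of restarts, for illustration.
async Task<int> RunUntilCancelledAsync(Func<CancellationToken, Task> operation, CancellationToken cancellationToken)
{
    int restarts = 0;
    while (!cancellationToken.IsCancellationRequested)
    {
        try
        {
            await operation(cancellationToken);
        }
        catch (Exception ex) when (!cancellationToken.IsCancellationRequested)
        {
            // Treat the failure as transient: report it and start the operation again.
            Console.WriteLine($"Operation failed ({ex.Message}); restarting");
            restarts++;
        }
    }
    return restarts;
}

// Usage: an operation that fails twice, then requests cancellation on its third run.
using var cts = new CancellationTokenSource();
int runs = 0;
int restartCount = await RunUntilCancelledAsync(async token =>
{
    runs++;
    await Task.Delay(10, token);
    if (runs < 3) throw new InvalidOperationException($"failure {runs}");
    cts.Cancel(); // stands in for the host asking the runner to stop
}, cts.Token);
Console.WriteLine($"runs: {runs}, restarts: {restartCount}");
```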

Policy and strategy

There are two types that help you to control when a failed operation is retried, and how that retry occurs: IRetryPolicy and IRetryStrategy.

You can implement your own versions of these, and there are also a number of built-in implementations.

IRetryPolicy

The retry policy gives you the ability to determine whether the operation should be retried or not, based on the exception that has been thrown.

There are three built-in retry policies:

AnyExceptionPolicy

This will always retry on any exception, and is the default for Retriable.

DoNotRetryPolicy

This will never retry, regardless of the exception. You can use it to disable retries without having to comment out your retry code.

AggregatePolicy

This gives you a means of ANDing together multiple policies: an AggregatePolicy allows a retry only if ALL of its child policies allow it.
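The AND semantics can be illustrated with a small sketch that models policies as predicates over the thrown exception. The delegates stand in for IRetryPolicy.CanRetry, and AllOf is a hypothetical helper rather than Corvus.Retry's AggregatePolicy implementation.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical combinator: the combined "policy" allows a retry only if every child allows it.
Func<Exception, bool> AllOf(params Func<Exception, bool>[] policies) =>
    exception => policies.All(policy => policy(exception));

// Child 1: only retry exception types that are plausibly transient.
Func<Exception, bool> isTransientType = ex => ex is TimeoutException || ex is IOException;
// Child 2: never retry a missing file, even though FileNotFoundException is an IOException.
Func<Exception, bool> isNotMissingFile = ex => ex is not FileNotFoundException;

Func<Exception, bool> combined = AllOf(isTransientType, isNotMissingFile);

Console.WriteLine(combined(new TimeoutException("slow")));         // True: both children allow it
Console.WriteLine(combined(new FileNotFoundException("missing"))); // False: child 2 refuses
Console.WriteLine(combined(new InvalidOperationException("bug"))); // False: child 1 refuses
```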

IRetryStrategy

The IRetryStrategy controls the way in which the operation is retried. It controls both the delay between each retry and the number of times that it will be retried.

Count

This simply tries the operation up to a specified maximum number of times, with no delay between tries.

DoNotRetry

This is the strategy equivalent of the DoNotRetryPolicy. It forces a retry to be abandoned, regardless of policy.

Linear

This tries a specified number of times, with a constant delay between each try.

For example, new Linear(TimeSpan.FromSeconds(1), 5) will try up to 5 times. The initial try will be immediate, and each retry will be delayed by 1s. So the first retry will be after 1s (wall clock time 1s), the second after another 1s (wall clock time 2s), the third after another 1s (wall clock time 3s), and the fourth after another 1s (wall clock time 4s).
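The timeline in the Linear example can be reproduced with a few lines of arithmetic. LinearSchedule is a hypothetical helper that models the behaviour described above (one immediate try, then a constant delay before each retry, so 5 tries means 4 retries); it is not Corvus.Retry's code and it ignores how long each try itself takes.

```csharp
using System;
using System.Collections.Generic;

// Returns the wall clock time (in seconds) at which each retry starts.
List<double> LinearSchedule(TimeSpan delay, int maxTries)
{
    var retryTimes = new List<double>();
    TimeSpan elapsed = TimeSpan.Zero;
    for (int retry = 1; retry < maxTries; retry++) // maxTries - 1 retries follow the initial try
    {
        elapsed += delay;
        retryTimes.Add(elapsed.TotalSeconds);
    }
    return retryTimes;
}

Console.WriteLine(string.Join(", ", LinearSchedule(TimeSpan.FromSeconds(1), 5))); // 1, 2, 3, 4
```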

Incremental

This tries a specified number of times, with an arithmetically increasing delay between each retry.

For example, new Incremental(maxTries: 5, intialDelay: TimeSpan.FromSeconds(1), step: TimeSpan.FromSeconds(1)) will try up to 5 times. The initial try will be immediate, and the delay between the initial try and the first retry will be 1s. Each subsequent delay is 1s longer than the one before. So the first retry will be after 1s (wall clock time 1s), the second after another 2s (wall clock time 3s), the third after another 3s (wall clock time 6s), and the fourth after another 4s (wall clock time 10s).

This allows a slowly increasing delay.
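The Incremental example's delays and wall clock times can be checked with a similar sketch. IncrementalSchedule is a hypothetical helper modelling the documented behaviour (the first retry waits the initial delay, and each later retry waits one step longer than the previous one); it is not the library's implementation and it ignores the time spent in each try.

```csharp
using System;
using System.Collections.Generic;

// Returns (delay before the retry, wall clock time of the retry) in seconds for each retry.
List<(double Delay, double WallClock)> IncrementalSchedule(int maxTries, TimeSpan initialDelay, TimeSpan step)
{
    var schedule = new List<(double Delay, double WallClock)>();
    TimeSpan elapsed = TimeSpan.Zero;
    for (int retry = 1; retry < maxTries; retry++) // maxTries - 1 retries follow the initial try
    {
        TimeSpan delay = initialDelay + step * (retry - 1); // grows arithmetically by `step`
        elapsed += delay;
        schedule.Add((delay.TotalSeconds, elapsed.TotalSeconds));
    }
    return schedule;
}

foreach (var (waitSeconds, wallClockSeconds) in IncrementalSchedule(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1)))
{
    Console.WriteLine($"wait {waitSeconds}s, wall clock {wallClockSeconds}s");
}
```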

Backoff

This tries a specified number of times, with a delay that increases geometrically between each retry.

For example, new Backoff(maxTries: 5, deltaBackoff: TimeSpan.FromSeconds(1)) will try up to 5 times. Each time, it will increase the delay by a value calculated roughly as 2^n × (delta ± a small random fudge), where n is the current total number of tries.

This allows a rapidly increasing delay, with a bit of random jitter added to avoid contention.
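The rough Backoff formula can be illustrated with a small sketch. BackoffDelay is a hypothetical helper, n is taken to be the number of tries so far, and the ±20% jitter range is an assumption chosen for illustration; Corvus.Retry's exact calculation (including any cap on the delay) may differ.

```csharp
using System;

// Approximates 2^n * (delta +/- a small random fudge).
TimeSpan BackoffDelay(int n, TimeSpan delta, double jitterFraction, Random random)
{
    // Fudge factor drawn uniformly from [1 - jitterFraction, 1 + jitterFraction).
    double fudge = 1 + jitterFraction * (2 * random.NextDouble() - 1);
    return TimeSpan.FromTicks((long)(Math.Pow(2, n) * delta.Ticks * fudge));
}

var rng = new Random(42);
for (int tries = 1; tries <= 4; tries++)
{
    Console.WriteLine($"after try {tries}: ~{BackoffDelay(tries, TimeSpan.FromSeconds(1), 0.2, rng).TotalSeconds:F2}s");
}
```

The doubling makes the delay grow quickly, while the jitter spreads out retries from many clients that failed at the same moment.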

How to get started

Corvus.Retry is available on NuGet. To get started with Corvus.Retry, add a reference to the Corvus.Retry NuGet package in your project, by running the following command:

dotnet add package Corvus.Retry

How to use

Within the Corvus.Retry repository, there is a series of interactive notebooks (in the Corvus.Retry.Documentation.Examples folder) that provide examples of how to use the features of Corvus.Retry. You can clone the repo and run the notebooks locally yourself.

The README file links to each of the notebooks, along with animations of the code snippets in action. You can also click the notebook icon (📓) in the following headers to get to the corresponding interactive notebook.

Examples

Retriable 📓

Basic example

As a basic example, you can ping an internet host and have it retry on detecting an exception.

The following example uses Retriable.RetryAsync() to ping the host this-internet-host-does-not-exist. It retries upon detecting any exception, and tries a maximum of three times (the initial try plus up to two retries).

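using System.Net.NetworkInformation;
using System.Threading;
using System.Threading.Tasks;
using Corvus.Retry;
using Corvus.Retry.Policies;
using Corvus.Retry.Strategies;

CancellationTokenSource cancellationTokenSource = new();

Ping pingSender = new();
// Up to 3 tries in total, retrying on any exception
Task<PingReply> task = Retriable.RetryAsync<PingReply>(
    () => pingSender.SendPingAsync("this-internet-host-does-not-exist"),
    cancellationTokenSource.Token,
    strategy: new Count(3),
    policy: new AnyExceptionPolicy());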

Since the host this-internet-host-does-not-exist does not exist, the method throws an exception on every try; after the final retry, the exception is allowed to bubble up.

[Animation: RetryAsync ping example]

You can cancel the operation using the cancellation token:

cancellationTokenSource.Cancel();

Mock HTTP service example

Transient faults are fairly common in code that communicates over the internet; within that category, errors resulting from the HTTP 429 ("Too Many Requests") status code are common.

Let's imagine we're consuming an HTTP service which occasionally returns a 429 Too Many Requests error. To model this scenario, let's create a custom exception, MockHttpServiceException, and a MockHttpService class with a method, MakeLotsOfRequestsAsync(), that always throws a MockHttpServiceException with status code 429.

using System;
using System.Threading.Tasks;

[Serializable]
public class MockHttpServiceException : Exception
{
    public string StatusCode { get; }

    public MockHttpServiceException(string message, string statusCode)
        : base(message) // Let the base Exception class store and expose Message
    {
        this.StatusCode = statusCode;
    }
}

public class MockHttpService
{
    // Simulates a call to a service that always responds with HTTP 429
    public async Task MakeLotsOfRequestsAsync()
    {
        Console.WriteLine("MakeLotsOfRequestsAsync() method called");
        await Task.Delay(100);
        throw new MockHttpServiceException("429 - Too Many Requests", "429");
    }
}

Creating a custom policy

For this scenario you would want a custom policy that retries upon detecting a MockHttpServiceException with a status code of 429. Let's do that.

It is very simple to create your own custom policy, and you will frequently do so: implement the IRetryPolicy interface and its bool CanRetry(Exception exception) method.

public class RetryOnTooManyRequestsPolicy : IRetryPolicy
{
    // Retry only when the mock service reports HTTP 429 (Too Many Requests)
    public bool CanRetry(Exception exception)
    {
        return exception is MockHttpServiceException httpException && httpException.StatusCode == "429";
    }
}

Now you can run MakeLotsOfRequestsAsync() with Retriable.RetryAsync(), passing in a RetryOnTooManyRequestsPolicy for the policy parameter and a strategy for the strategy parameter; let's go with the built-in Linear strategy.

MockHttpService mockHttpService = new();
await Retriable.RetryAsync(
    () => mockHttpService.MakeLotsOfRequestsAsync(),
    CancellationToken.None,
    strategy: new Linear(TimeSpan.FromSeconds(1), maxTries: 5),
    policy: new RetryOnTooManyRequestsPolicy());

The code above runs MakeLotsOfRequestsAsync() a total of 5 times, with each retry starting one second after the previous try fails. Since the method never succeeds (it always throws an exception), the MockHttpServiceException bubbles up after the 5th execution.

[Animation: Custom IRetryPolicy]

RetryTask 📓

You can use RetryTask.Factory.StartNew() just like you would Task.Factory.StartNew(), with the usual parameters in addition to policy and strategy parameters for configuring retry behaviour.

Mock HTTP service example

Continuing with the mock HTTP service example from earlier, let's say the client doesn't offer asynchronous versions of its methods, but we want to run the operation asynchronously with retry behaviour so that we can do other work while it runs. We can use RetryTask.Factory.StartNew() to run the method, do some other work, then wait for the Task to complete.

public class MockHttpService
{
    // A synchronous version of the mock client method
    public void MakeLotsOfRequests()
    {
        Console.WriteLine("MakeLotsOfRequests() method called");
        Thread.Sleep(1000);
        throw new MockHttpServiceException("429 - Too Many Requests", "429");
    }
}

MockHttpService mockHttpService = new();
Task retryTask = RetryTask.Factory.StartNew(
    () => mockHttpService.MakeLotsOfRequests(),
    CancellationToken.None,
    strategy: new Count(5),
    policy: new RetryOnTooManyRequestsPolicy());
DoSomeWork(); // Placeholder for whatever other work you want to do in the meantime
// Blocks until the task completes; because every try fails, Wait() throws an AggregateException
retryTask.Wait();

ReliableTaskRunner 📓

You use ReliableTaskRunner when you need an operation to keep running until it is explicitly stopped, which you do by calling StopAsync() on the runner.

Let's model such a scenario with a method containing an infinite loop, and cancel the operation some time later.

Example - an operation that runs forever

class ModelALongRunningProcess
{
    public static async Task ExecuteALongRunningTransientProcess(CancellationToken cancellationToken)
    {
        Console.WriteLine("ExecuteALongRunningTransientProcess method executed");
        int counter = 0;
        while (true)
        {
            await Task.Delay(100);
            int quotient = Math.DivRem(counter, 10, out int remainder);
            // Throw once the counter reaches 10 (the 11th pass, roughly 1.1s after starting).
            // This simple model never observes cancellationToken.
            if (remainder == 0 && quotient > 0)
            {
                throw new Exception("Simulated transient failure");
            }

            counter++;
        }
    }
}

ReliableTaskRunner runner = ReliableTaskRunner.Run(cancellationToken => ModelALongRunningProcess.ExecuteALongRunningTransientProcess(cancellationToken));
Thread.Sleep(5000); // Wait some time before cancelling the operation
Task runnerTask = runner.StopAsync();
try
{
    await runnerTask;
}
catch (Exception)
{
    // Swallow any failure here so that the task's final state can be inspected below
}

// Check to see if the task returned is faulted
Console.WriteLine($"Is runner task faulted:\t{runnerTask.IsFaulted}");

The method, ExecuteALongRunningTransientProcess(), pauses for 100ms each time around the loop and throws an exception when its counter reaches ten, on the eleventh pass (roughly 1.1 seconds after it starts). The code above starts the operation and waits for 5 seconds before stopping it, so, ignoring any delay ReliableTaskRunner adds between restarts, the method is executed around five times before being cancelled.

The IsFaulted property on the Task returned by runner.StopAsync() has a value of false in the example above, because the method never returned, and never failed in a way that didn't cause a retry. ReliableTaskRunner is intended for operations that are supposed to run forever, so if the operation completes in a way that doesn't cause a retry, IsFaulted on the Task returned by runner.StopAsync() is set to true. Let's see an example of that.

As you can see, Corvus.Retry is a useful collection of functionality for handling transient errors. It works well and is used by other endjin-owned open-source libraries. If you have any thoughts on better ways to approach particular problems, or related tools you'd like to see, then we'd like to hear them: please comment below, or in the issues section of the project.

Liam Mooney

Apprentice Engineer II


Liam studied an MSci in Physics at University College London, which included modules on Statistical Data Analysis, High Performance Computing, Practical Physics and Computing. This led to his dissertation exploring the use of machine learning techniques for analysing LHC particle collision data.

Before joining endjin, Liam had a keen interest in data science and engineering, and did a number of related internships. However, since joining endjin he has developed a much broader set of interests, including DevOps and more general software engineering. He is currently exploring those interests and finding his feet in the tech space.