Exposing legacy batch processing code online using Azure Durable Functions, API Management and Kubernetes
TL;DR: Via a combination of Kubernetes, Azure Durable Functions and Azure API Management, we can easily make legacy batch processing code available as a RESTful API.
Background
As part of a recent project, we were tasked with making a legacy batch analysis library publicly available via an API. The context was a migration from on-premises infrastructure to the cloud, and this piece of work was about getting a quick win to allow time and space for more fundamental change under the covers. The mid-to-long term goal was that the library would be reworked with a cloud-first mindset, but that wasn't feasible in the short term, meaning we needed to look for other ways of exposing the functionality.
The library whose functionality we needed to expose was a custom-written analysis library, built in C++ and running on Linux. It accepts batches of data containing a large number of small records and uses a set of rules to generate insights. Data is supplied via a local text file, execution is controlled via a separate control file, and results are written to a third output file.
Non-functional requirements
The first, and arguably most important, part of building this element of the solution was to identify the constraints under which it would operate. The big risk with making a tool such as this publicly available is that it is successful - and that success can bring unwanted costs. As a result, we needed to consider how we would constrain and monitor:
- the volume of requests
- the amount of data we would need to receive and store per request
- how long we would retain data once it had been processed
- how much running each job would cost
Data volumes per request were a particularly interesting area - if we restricted the volume of input data too much, the API would be useless, as the library is more accurate when it has more data to work with. But allow too much and the costs could skyrocket. After discussions with the developers responsible for the library, and some calculations as to how that might translate to HTTP request sizes, we settled on a request size limit of 10MB, which struck a good balance: a meaningful volume of data can be supplied without incurring significant cost.
Building blocks
From the outset there were some easy choices for our solution. To address the non-functional requirements around protecting our API against malicious use, Azure API Management was an obvious choice. As well as allowing us to define usage policies around request sizes and volumes, it also opens up the possibility of implementing paid usage tiers in the future. For example, a free tier might provide the ability to analyse one data set every 5 minutes (to keep costs under control), but a paid tier could increase that limit or even remove it altogether. With API Management, decisions like this can be made and implemented independently of the underlying code that does the work.
To provide the endpoint that handles the requests, we looked to Azure Functions. Execution of the library itself takes some time - in our tests, anywhere between 5 and 20 seconds depending on the amount of data - which is long enough for us to consider it a "long running process" and look to implement the asynchronous request-reply pattern described here. An obvious way for us to implement this is to make use of Durable Functions, which provides this pattern out of the box.
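To make that concrete, here's roughly what the HTTP-triggered "starter" function looks like using the Durable Functions Python programming model. This is a minimal sketch rather than the production code - the orchestration name is a placeholder, and it uses the built-in check-status response, whereas the solution described below generates its own operation Id and status endpoint.

```python
# HttpStart/__init__.py - a sketch of an HTTP-triggered "starter" function.
# The orchestration name "RunAnalysisOrchestration" is a placeholder.
import azure.functions as func
import azure.durable_functions as df


async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    client = df.DurableOrchestrationClient(starter)

    # Kick off the long-running orchestration, passing the request body as its input.
    instance_id = await client.start_new(
        "RunAnalysisOrchestration", client_input=req.get_body().decode("utf-8")
    )

    # Return 202 Accepted with a Location header pointing at the built-in
    # status endpoint - the essence of the asynchronous request-reply pattern.
    return client.create_check_status_response(req, instance_id)
```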
We then have the question of how we're actually going to get from our incoming HTTP request to the execution of the analysis library. By the time endjin got involved in the project, the on-site team had already been experimenting with containerisation solutions, including Docker and Kubernetes, as potential paths from their existing on-premises solution to the cloud. As part of this, they'd created a Docker image containing the analysis library which we were able to make use of. Coupling this with Kubernetes Jobs provides the answer. In the Kubernetes world, a Job is a type of controller designed to manage a task with a finite execution time, running it through to completion.
When starting a container as a Job, you provide the command that should be executed once the container is running. In our case, we needed to provide the paths for the control, input and output files, which raises the question - how do we get data in and out of the container?
Fortunately there's a relatively simple answer to this: Kubernetes makes it possible to attach an Azure Storage File Share as a volume to a running pod. This means that when executing the job from our durable function, we can write the input data files to the file share and then mount the share as a volume into the pod that is running our job. Once execution is complete, we can read the resulting output file back from the same share.
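As a sketch of what that looks like in practice, this is roughly how a Job that mounts an Azure file share could be created from Python using the official kubernetes client. The image name, share name, secret name and file paths are all illustrative assumptions rather than the real configuration.

```python
# A minimal sketch of creating a Kubernetes Job that mounts an Azure Storage
# File Share, using the official "kubernetes" Python client. The image, share,
# secret and file paths are illustrative placeholders.
from kubernetes import client, config


def start_analysis_job(operation_id: str) -> None:
    config.load_kube_config()  # or load_incluster_config() when running in-cluster

    # Mount the Azure file share into the container at /data. The secret holds
    # the storage account name and key that Kubernetes uses to mount the share.
    volume = client.V1Volume(
        name="analysis-data",
        azure_file=client.V1AzureFileVolumeSource(
            secret_name="storage-account-secret",
            share_name="analysis-jobs",
            read_only=False,
        ),
    )

    container = client.V1Container(
        name="analysis",
        image="myregistry.azurecr.io/analysis-library:latest",
        command=[
            "/app/analyse",
            f"/data/{operation_id}/control.txt",
            f"/data/{operation_id}/input.txt",
            f"/data/{operation_id}/output.txt",
        ],
        volume_mounts=[client.V1VolumeMount(name="analysis-data", mount_path="/data")],
    )

    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"analysis-{operation_id}"),
        spec=client.V1JobSpec(
            backoff_limit=0,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    containers=[container],
                    restart_policy="Never",
                    volumes=[volume],
                )
            ),
        ),
    )

    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```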
It's worth pointing out here that Kubernetes isn't the only option in Azure that would fit the bill. We chose it because it was already in use as part of the project, but Azure Container Instances would also have been a good choice, and the architecture I'm describing here would be equally applicable to ACI.
Architecture walkthrough
With all that said, here's a generalised architecture for this kind of scenario:
So let's walk through what a typical request and the corresponding processing look like.
- The client POSTs a request to the public facing endpoint. This request contains the dataset for analysis.
- The request is first processed by APIM [1]. This checks that:
- The request is not too large (if it is, a 413 response is returned).
- The request rate is not too high (if it is, a 429 response is returned).
- Assuming the policies do not reject the request, APIM then forwards the call to the internal API [2]. The internal API is secured using EasyAuth, so APIM supplies an authentication token obtained using its Managed Identity.
- The request is received by an Azure Function via an HTTP Trigger. This function acts as a Starter function for the Orchestration that manages the job execution. It generates a new operation Id (a GUID), pushes the data into a known location in Blob Storage [3] (based on the operation Id), and then starts the execution Orchestration, supplying the operation Id as the orchestration's instance Id. At this point, the function returns a 202 response to the client with a Location header that contains a URL pointing to a second HTTP-triggered function that the client can use to poll the status of the operation.
- The orchestration executes the following Activities to run the job:
- Data preparation - data is retrieved from Blob Storage [3] and converted to the input format required by the analysis library. This is written to the File Share [4] at a path generated using the operation Id.
- Control file preparation - the control file for the analysis library is also generated and written to the File Share [4].
- Start the Kubernetes Job using the AKS API [5]. This is done by supplying a configuration document which references the Docker image and specifies the command to execute (generated using the operation Id so that it points at the input and control files and specifies the output path). It also includes the various secrets needed to allow this to happen, which are obtained from a Key Vault.
- Check the status of the job, again using the Kubernetes API. This is repeated until the job has completed (or failed), with a delay between each check.
- Once the job has completed, extract the output data and transform it back into a JSON document that is pushed back into Blob Storage [3].
- Clean up the job in AKS (jobs must be manually deleted in order to release associated resources).
- Clean up the data in the File Share [4].
- Generate a SAS token for the output data in Blob Storage [3]. We want to be able to give the client a URL that references this blob directly so that they can download their results, and generating a SAS token allows us to do this without needing to grant specific access permissions on the blob itself. The token is valid for a configurable amount of time, which gives the client a window for downloading their data. Once we have the full URL including the SAS token, we return this as the result of the orchestration function, which is then stored by Durable Functions for retrieval when the client asks for the status of the operation.
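Pulling those activities together, the orchestrator ends up looking something like the sketch below (again using the Durable Functions Python programming model; the activity names and the polling interval are placeholders for the real implementations):

```python
# RunAnalysisOrchestration/__init__.py - a sketch of the orchestrator.
# Activity names and the polling interval are illustrative placeholders.
from datetime import timedelta

import azure.durable_functions as df


def orchestrator_function(context: df.DurableOrchestrationContext):
    operation_id = context.instance_id
    input_data = context.get_input()

    # Prepare the input and control files on the Azure file share.
    yield context.call_activity("PrepareDataFile", {"operationId": operation_id, "data": input_data})
    yield context.call_activity("PrepareControlFile", operation_id)

    # Start the Kubernetes Job, then poll its status until it completes or fails.
    yield context.call_activity("StartAnalysisJob", operation_id)
    while True:
        status = yield context.call_activity("GetAnalysisJobStatus", operation_id)
        if status in ("Succeeded", "Failed"):
            break
        # Durable timers, not time.sleep, so the orchestrator can be unloaded
        # between checks without losing its place.
        yield context.create_timer(context.current_utc_datetime + timedelta(seconds=15))

    # Transform the output into a JSON blob, then clean up the job and the share.
    yield context.call_activity("PublishResults", operation_id)
    yield context.call_activity("DeleteAnalysisJob", operation_id)
    yield context.call_activity("CleanUpFileShare", operation_id)

    # The SAS URL becomes the orchestration's output, handed back to the client
    # the next time they poll the status endpoint.
    sas_url = yield context.call_activity("GenerateResultSasUrl", operation_id)
    return sas_url


main = df.Orchestrator.create(orchestrator_function)
```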
While the orchestration is executing, the client is able to poll the status endpoint to see how it's progressing. Until the orchestration completes, we continue to return a 202 response with the Location header; once it's complete, we return a 302 response with the URL of the result blob in the Location header.
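The status endpoint that backs this polling can be a second HTTP-triggered function along these lines - a simplified sketch, with the route parameter name and response bodies assumed for illustration:

```python
# GetOperationStatus/__init__.py - a sketch of the polling endpoint.
# The route parameter name and response bodies are simplified for illustration.
import azure.functions as func
import azure.durable_functions as df


async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    client = df.DurableOrchestrationClient(starter)
    operation_id = req.route_params["operationId"]

    status = await client.get_status(operation_id)

    if status.runtime_status == df.OrchestrationRuntimeStatus.Completed:
        # The orchestration's output is the SAS URL of the result blob.
        return func.HttpResponse(status_code=302, headers={"Location": status.output})

    if status.runtime_status in (
        df.OrchestrationRuntimeStatus.Failed,
        df.OrchestrationRuntimeStatus.Terminated,
    ):
        return func.HttpResponse("Processing failed.", status_code=500)

    # Still running - tell the client to carry on polling this same URL.
    return func.HttpResponse(status_code=202, headers={"Location": req.url})
```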
This is a really nice, clean approach to the problem at hand, and the architecture described above gives us some additional benefits out of the box.
Implementing using Durable Functions means that we have a certain level of error handling and recovery built in. If individual activities fail, the functions host can be configured to automatically retry them, and if they repeatedly fail then the orchestration will be halted and put into a Failed state with the exception details logged. This immediately gives us a great starting point when attempting to diagnose issues.
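For example, using the Python programming model, an activity call can be wrapped in a retry policy along these lines (the retry interval and attempt count are illustrative values, not our production configuration):

```python
# A fragment showing how an activity call in the orchestrator sketch above
# could be wrapped in a retry policy; the values are illustrative.
import azure.durable_functions as df


def orchestrator_function(context: df.DurableOrchestrationContext):
    # Retry the activity up to three times, with a five second first retry
    # interval, before the orchestration is put into a Failed state.
    retry_options = df.RetryOptions(
        first_retry_interval_in_milliseconds=5000, max_number_of_attempts=3
    )
    status = yield context.call_activity_with_retry(
        "GetAnalysisJobStatus", retry_options, context.instance_id
    )
    return status


main = df.Orchestrator.create(orchestrator_function)
```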
Storing response data in Blob Storage allows us to automatically clean up old data using a lifecycle management policy to delete data after a certain amount of time. This means we can define a clear data retention policy and rely on the Azure infrastructure to enforce it for us.
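The rule itself is a small piece of configuration. The sketch below expresses it as a Python dict mirroring the lifecycle policy JSON - the "results/" prefix and the seven-day window are assumptions for illustration - which could then be applied via the portal, an ARM/Bicep template or your deployment tooling of choice:

```python
# A sketch of a lifecycle management rule that deletes result blobs a week
# after they were written. The "results/" prefix and seven-day window are
# illustrative assumptions; the dict mirrors the lifecycle policy JSON.
import json

lifecycle_policy = {
    "rules": [
        {
            "name": "delete-old-results",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["results/"]},
                "actions": {
                    "baseBlob": {"delete": {"daysAfterModificationGreaterThan": 7}}
                },
            },
        }
    ]
}

# Emit the policy JSON for whatever deployment mechanism applies it.
print(json.dumps(lifecycle_policy, indent=2))
```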
Adopting a serverless approach to implementation gives us maximum flexibility in running our function; if there are few calls, we're not incurring the costs of continually hosting our endpoint or Docker instances running the analysis library, but if we need to allow a higher volume of calls then the infrastructure will ramp up as needed - and we can use the policy features in Azure API Management to act as gatekeeper, ensuring that this does not happen unexpectedly.
Finally, we are future-proofing the API in several ways. Firstly, API Management allows us to apply a versioning scheme to our URLs and to route requests for different versions of the API to different endpoints. This simplifies handling of any breaking changes to the API, as we can easily continue to host multiple versions of the API side by side and have APIM route requests as necessary until we deprecate older versions. It's also possible to use transformations at the APIM layer to map requests to an older version of an API into the format expected by newer versions, giving us maximum flexibility.
Secondly, as the legacy code for the analysis library is migrated to a more modern, cloud-first approach, the internal API ([2] in the diagram) can be swapped out to point at the newer implementation without the need for changes to the client.
This is a great example of how serverless technologies can be used to expose legacy software to end users in a controlled way, allowing you to reap some of the benefits of a cloud-first approach without fully rewriting and migrating existing software.