Multi-layer Caching with the Decorator Pattern
TL;DR: Querying a Databricks SQL Serverless endpoint for analytical data is fast once the cluster is warm, but cold-start latency and query execution times make it unsuitable as the direct backing store for a web API. We solved this with two layers of caching—Azure Blob Storage for persistence across restarts, and IMemoryCache for sub-millisecond in-process reads—implemented cleanly using the Decorator pattern.
The Performance Challenge
As part of a recent project, we were building an analytical web API that serves sales data — products, retailers, historical figures and projections — to a React front-end. The data is produced by an ETL process that runs in Databricks and writes the results to Delta tables in a data lake. The natural way to query that data is through Databricks SQL Serverless: it handles the complex analytical workloads, scales well, and integrates cleanly with the rest of the stack.
There's a catch, though. Databricks SQL Serverless clusters can be paused when idle, and cold-start latency can add several seconds—sometimes tens of seconds—to the first request after a period of inactivity. Even on a warm cluster, query execution time for some of the larger datasets runs into multiple seconds. For a web API that needs to feel responsive, that's a problem.
The good news is that most of the data retrieved from Databricks is reference data: it changes infrequently, and when it does change, it changes in a controlled, versioned way. That observation is what makes aggressive caching safe — and it's the insight that shaped the approach described here.
Key observation: the application works with specific named versions of the sales data. Within a given version, the data is completely immutable. There's no risk of serving stale data from a cache, because the data simply doesn't change once a version is published.
Understanding the Data Access Requirements
All data access in the application flows through a single interface, ISalesDataRepository. Here's a trimmed version of its key methods:
public interface ISalesDataRepository
{
Task<string> GetLatestVersionIdAsync();
Task<VersionInfo> GetVersionAsync(string versionId);
Task<Product[]> GetProductsAsync(string versionId);
Task<Retailer[]> GetRetailersAsync(string versionId);
Task<SalesSummary[]> GetSalesAsync(string versionId, DateRange dateRange);
Task<SalesSummary[]> GetSalesByIdsAsync(string versionId, string[] ids);
}
The versioning model is central to everything. A version is created by the ETL process and is immutable once published. The application operates within a specific version — most requests include a versionId that scopes the data being retrieved.
Not all data falls into the same category, though. When we analyse the methods, three distinct types emerge:
- Fully immutable within a version — products, retailers, and other reference data. Once fetched for a given version, these can be cached indefinitely. They will never change.
- Near-real-time — the latest version ID. This needs a short time-to-live (we use five minutes) to pick up new versions when they're published without hammering the source on every request.
- On-demand lookups — targeted queries like GetSalesByIdsAsync, where the combination of parameters is effectively unbounded. Caching these isn't meaningful; they always go direct to Databricks.
That analysis drives a selective caching strategy rather than a blanket one. Not everything is worth caching, and not everything can be cached with the same TTL.
The Decorator Pattern: A Primer
If you haven't used the Decorator pattern before, the idea is straightforward. A decorator implements the same interface as the class it wraps, and adds its own behaviour before, after, or instead of delegating each call to the inner implementation. The consumer doesn't know, or care, whether it's talking to a "real" implementation or a decorator. It just uses ISalesDataRepository.
This is a natural fit for caching. The actual data access code stays clean and focused on talking to Databricks. The caching logic lives entirely in the decorators. The two concerns don't touch each other.
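To make the shape concrete, here's a minimal sketch of a hypothetical decorator over ISalesDataRepository that only adds logging around the delegated calls. It isn't one of the real cache layers, just an illustration of the mechanics:

public class LoggingSalesDataRepository : ISalesDataRepository
{
    private readonly ISalesDataRepository _inner;
    private readonly ILogger<LoggingSalesDataRepository> _logger;

    public LoggingSalesDataRepository(ISalesDataRepository inner, ILogger<LoggingSalesDataRepository> logger)
    {
        _inner = inner;
        _logger = logger;
    }

    public async Task<Product[]> GetProductsAsync(string versionId)
    {
        // Behaviour added around the call; the inner implementation does the real work.
        _logger.LogInformation("Fetching products for version {VersionId}", versionId);
        var products = await _inner.GetProductsAsync(versionId);
        _logger.LogInformation("Fetched {Count} products", products.Length);
        return products;
    }

    // The remaining ISalesDataRepository members simply delegate.
    public Task<string> GetLatestVersionIdAsync() => _inner.GetLatestVersionIdAsync();
    public Task<VersionInfo> GetVersionAsync(string versionId) => _inner.GetVersionAsync(versionId);
    public Task<Retailer[]> GetRetailersAsync(string versionId) => _inner.GetRetailersAsync(versionId);
    public Task<SalesSummary[]> GetSalesAsync(string versionId, DateRange dateRange) => _inner.GetSalesAsync(versionId, dateRange);
    public Task<SalesSummary[]> GetSalesByIdsAsync(string versionId, string[] ids) => _inner.GetSalesByIdsAsync(versionId, ids);
}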
In our case, the chain looks like this:
Client
└─► MemoryCachingSalesDataRepository (Layer 2: in-process, sub-millisecond)
└─► BlobStorageCachingSalesDataRepository (Layer 1: shared, persistent, fast)
└─► SalesDataRepository (Real implementation: Databricks SQL)
Each layer wraps the one below it. The MemoryCachingSalesDataRepository doesn't know it's wrapping a blob cache—it just knows it has an ISalesDataRepository to delegate to when it misses. Dependency injection wires the chain together; the decorators themselves have no knowledge of each other.
Layer 1: Azure Blob Storage Cache
Why Blob Storage?
An in-process memory cache is lost when the API restarts or when a new instance spins up. Our application runs in Azure Container Apps, which scales out to multiple replicas and restarts during deployments. Without a persistent cache layer, every new instance would need to hit Databricks on its first request—exactly the cold-start problem we're trying to avoid.
Blob Storage is cheap, fast for reads, and shared across all replicas. It's not as fast as in-process memory, but it's orders of magnitude faster than waiting for a Databricks cluster to warm up.
How It Works
BlobStorageCachingSalesDataRepository implements ISalesDataRepository and wraps the real SalesDataRepository. For each cacheable method, it constructs a deterministic blob path based on the data type, version ID, and any relevant parameters — for example, sales/{versionId}/products.bin for the products list.
The core of the implementation is a generic GetOrCreateAsync<T>() helper:
private async Task<T> GetOrCreateAsync<T>(
string blobPath,
Func<Task<T>> factory)
{
var blobClient = _containerClient.GetBlobClient(blobPath);
if (await blobClient.ExistsAsync())
{
var content = await blobClient.DownloadContentAsync();
return MemoryPackSerializer.Deserialize<T>(content.Value.Content.ToArray());
}
// Cache miss: fetch from inner repository
var result = await factory();
// Write to blob storage for next time
var bytes = MemoryPackSerializer.Serialize(result);
await blobClient.UploadAsync(BinaryData.FromBytes(bytes), overwrite: true);
return result;
}
The data is stored as binary blobs using MemoryPack, a high-performance binary serialiser for .NET. Compared to JSON, this keeps blob sizes small and deserialisation fast—both matter at the scale of multiple API instances reading from shared storage.
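To show how the deterministic paths and the helper fit together, each cacheable method in the blob decorator ends up being a thin wrapper. Here's a sketch (the field name and path format are illustrative, following the example above):

// Sketch: a cacheable method in BlobStorageCachingSalesDataRepository. The blob
// path is deterministic for a given data type and version, so every API instance
// computes the same path for the same data.
public Task<Product[]> GetProductsAsync(string versionId) =>
    GetOrCreateAsync(
        $"sales/{versionId}/products.bin",
        () => _innerRepository.GetProductsAsync(versionId));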
Preventing the Thundering Herd
A naïve implementation has a race condition that's particularly nasty during cold starts. If the blob doesn't exist and multiple requests arrive concurrently — which is exactly what happens when a new container instance starts up and the front-end fires several API calls at once—they all miss the cache and all hit Databricks simultaneously.
This is the thundering herd problem. In the worst case, you end up making dozens of parallel queries to a cluster that's still warming up.
The solution is a ConcurrentDictionary<string, SemaphoreSlim> keyed by blob path. When a cache miss occurs, we acquire the semaphore for that blob path before proceeding. Inside the lock, we check the blob again—another request may have already populated it while we were waiting. If it's still a miss, we fetch from the inner repository and write the result. Here's the pattern in full:
// Per-blob-path locks; the dictionary is shared by all requests in this process.
private readonly ConcurrentDictionary<string, SemaphoreSlim> _semaphores = new();

private async Task<T> GetOrCreateAsync<T>(
string blobPath,
Func<Task<T>> factory)
{
var blobClient = _containerClient.GetBlobClient(blobPath);
// Fast path: blob already exists
if (await blobClient.ExistsAsync())
{
var content = await blobClient.DownloadContentAsync();
return MemoryPackSerializer.Deserialize<T>(content.Value.Content.ToArray());
}
// Slow path: acquire per-blob semaphore to prevent thundering herd
var semaphore = _semaphores.GetOrAdd(blobPath, _ => new SemaphoreSlim(1, 1));
await semaphore.WaitAsync();
try
{
// Double-check: another waiter may have populated the blob
if (await blobClient.ExistsAsync())
{
var content = await blobClient.DownloadContentAsync();
return MemoryPackSerializer.Deserialize<T>(content.Value.Content.ToArray());
}
var result = await factory();
var bytes = MemoryPackSerializer.Serialize(result);
await blobClient.UploadAsync(BinaryData.FromBytes(bytes), overwrite: true);
return result;
}
finally
{
semaphore.Release();
}
}
Only one request per unique blob path reaches Databricks. All others wait for the semaphore, benefit from the result, and return immediately. It's worth noting that the semaphore is local to each instance: if multiple container replicas miss the cache at the same time, each will independently query Databricks once and write the blob. In practice that's fine: the first request to any instance pays the Databricks cost; subsequent requests benefit from the cached blob. If it does become a problem, there are more involved solutions based on shared locks, such as Corvus.Leasing, which uses Azure Blob Storage to provide a means of acquiring, releasing and extending exclusive leases that mediate resource access in distributed processing.
What's Deliberately Not Cached Here
Version lookups (GetLatestVersionIdAsync and GetAvailableVersionsAsync) always go to the source. We want the application to notice when a new version is published within a reasonable time. Caching these in blob storage would give us no meaningful benefit over the in-memory TTL we apply at the layer above.
Targeted on-demand lookups by ID also bypass the blob cache. The combination space — different sets of IDs against different versions — is too large to cache meaningfully.
Graceful Degradation
All blob I/O is wrapped in try/catch. If the cache layer fails for any reason (transient connectivity, permissions, a failed deserialisation), it logs a warning and falls through to the inner repository. The application keeps working; it's just slower until the cache is warm again.
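As a rough sketch of that degradation path (the helper name is hypothetical, not taken from the real implementation), the read side looks something like this:

// Sketch: a blob read that degrades gracefully. Any failure in the cache layer is
// logged and treated as a miss, so the caller falls through to the inner repository.
private async Task<T?> TryReadFromBlobAsync<T>(string blobPath)
{
    try
    {
        var blobClient = _containerClient.GetBlobClient(blobPath);
        if (!await blobClient.ExistsAsync())
        {
            return default;
        }

        var content = await blobClient.DownloadContentAsync();
        return MemoryPackSerializer.Deserialize<T>(content.Value.Content.ToArray());
    }
    catch (Exception ex)
    {
        _logger.LogWarning(ex, "Blob cache read failed for {BlobPath}; falling back to the inner repository", blobPath);
        return default;
    }
}

Because the cached data types are all reference types, a default return unambiguously signals a miss.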
Layer 2: In-Memory Cache
Why a Second Layer?
Even a fast Blob Storage read involves a network round-trip and deserialisation overhead. For a busy API serving the same reference data many times per second, that adds up. IMemoryCache keeps deserialised objects in the process's heap. Reads are effectively instantaneous — no network, no deserialisation, just a dictionary lookup.
How It Works
MemoryCachingSalesDataRepository wraps the Blob Storage decorator (which in turn wraps the real repository). It uses the standard cache.GetOrCreateAsync() pattern, with expiry configured per data type:
- Immutable data within a version: no expiry—held in memory until the process restarts.
- Latest version ID: five-minute sliding expiry, so new versions are picked up in a timely manner.
Cache keys incorporate the version ID where relevant, so different versions don't collide.
Here's a representative method:
public async Task<Product[]> GetProductsAsync(string versionId)
{
using var activity = _activitySource.StartActivity("GetProducts");
var cacheKey = $"{ProductsCacheKeyPrefix}{versionId}";
if (_cache.TryGetValue(cacheKey, out Product[]? cached))
{
activity?.SetTag("cache.hit", true);
return cached!;
}
activity?.SetTag("cache.hit", false);
var result = await _innerRepository.GetProductsAsync(versionId);
_cache.Set(cacheKey, result);
return result;
}
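The near-real-time case follows the same shape but sets an expiry. Here's a sketch, assuming the five-minute sliding window described above (the cache key constant is illustrative):

public async Task<string> GetLatestVersionIdAsync()
{
    var versionId = await _cache.GetOrCreateAsync(LatestVersionCacheKey, entry =>
    {
        // Short sliding expiry so newly published versions are noticed promptly
        // without querying the source on every request.
        entry.SlidingExpiration = TimeSpan.FromMinutes(5);
        return _innerRepository.GetLatestVersionIdAsync();
    });

    return versionId!;
}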
Observability with Activity Source
Each method creates an Activity via ActivitySource, which participates in distributed tracing through OpenTelemetry. We record whether the request was a cache hit or miss as a tag on the activity: cache.hit = true/false.
This turns out to be genuinely useful. When we look at the observability dashboard, we can see at a glance what proportion of requests are being served from memory versus falling through to lower layers. It's how we validated that the cache was actually working as expected after deployment.
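Wiring this up is a one-off piece of configuration. As a sketch, assuming the OpenTelemetry .NET hosting packages and an illustrative source name, it might look like this:

// Sketch: register a shared ActivitySource and export its activities via OpenTelemetry.
// The source name "SalesDataAccess" is illustrative.
services.AddSingleton(new ActivitySource("SalesDataAccess"));

services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddSource("SalesDataAccess"));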
What Doesn't Get Cached In Memory
Targeted on-demand lookups by ID, as with the blob layer, always delegate to the inner repository. The combination space makes in-memory caching impractical—we'd risk holding enormous amounts of data with a very low hit rate.
Wiring It Together with Dependency Injection
The decorator chain is composed in the service registration. The order matters: each decorator needs to wrap the layer below it, not the one above. We register the concrete types first, then register the ISalesDataRepository abstraction as the fully-composed outermost decorator:
// Innermost: the real Databricks implementation
services.AddSingleton<SalesDataRepository>();
// Middle layer: Blob Storage cache wrapping the real implementation
services.AddSingleton<BlobStorageCachingSalesDataRepository>(sp =>
new BlobStorageCachingSalesDataRepository(
sp.GetRequiredService<SalesDataRepository>(),
sp.GetRequiredService<BlobServiceClient>(),
sp.GetRequiredService<IOptions<SalesDataBlobCacheOptions>>(),
sp.GetRequiredService<ILogger<BlobStorageCachingSalesDataRepository>>(),
sp.GetRequiredService<ActivitySource>()
)
);
// Outermost: in-memory cache wrapping the blob storage cache
// This is what consumers receive when they depend on ISalesDataRepository
services.AddSingleton<ISalesDataRepository>(sp =>
new MemoryCachingSalesDataRepository(
sp.GetRequiredService<BlobStorageCachingSalesDataRepository>(),
sp.GetRequiredService<IMemoryCache>(),
sp.GetRequiredService<ActivitySource>()
)
);
Any component that depends on ISalesDataRepository via the DI container automatically gets the fully-composed chain. The decorators themselves have no knowledge of how they're composed—they just know they have an ISalesDataRepository to delegate to.
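A consumer therefore looks no different from one talking directly to Databricks. For example, a hypothetical minimal-API endpoint (not taken from the real project):

// The handler depends only on the abstraction; it has no knowledge of the
// three-layer chain the container resolves behind it.
app.MapGet("/api/versions/{versionId}/products",
    async (string versionId, ISalesDataRepository repository) =>
        Results.Ok(await repository.GetProductsAsync(versionId)));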
Tracing a Request Through the Cache Layers
Let's walk through what happens for a call to GetProductsAsync under three different scenarios.
First Request After Deployment (Cold Everything)
- MemoryCachingSalesDataRepository — cache miss; delegates to inner.
- BlobStorageCachingSalesDataRepository — blob not found; acquires semaphore; delegates to inner.
- SalesDataRepository — queries Databricks SQL. If the cluster has been idle, this may take several seconds while it warms up.
- The result flows back up: written to Blob Storage, then stored in the in-process cache.
This is the expensive path. It only happens once per unique dataset per application instance.
Second Request in the Same Process
- MemoryCachingSalesDataRepository — cache hit; returns immediately from memory.
That's it. Sub-millisecond response time regardless of what Databricks is doing.
First Request After a Replica Starts (or a Process Restart)
- MemoryCachingSalesDataRepository — cache miss (new process, empty memory cache).
- BlobStorageCachingSalesDataRepository — blob found; deserialises and returns.
- The result is held in memory for all subsequent requests.
The new instance pays a Blob Storage round-trip on its first request for each dataset, but never needs to hit Databricks. The warm-up time for a new replica is a handful of Blob Storage reads rather than a cluster cold-start.
Results and Trade-offs
What We Gained
After the initial warm-up, the vast majority of read requests are served from the in-memory cache in sub-millisecond time. New replicas warm quickly from Blob Storage without touching Databricks. The thundering herd problem is eliminated: Databricks is hit at most once per unique dataset per version per application instance, regardless of how many concurrent requests arrive.
Honest Trade-offs
It would be misleading to present this as a straightforward win with no downsides. There are real trade-offs:
- Complexity. We now have three classes where one might seem simpler at first glance. Each decorator is individually straightforward, but you need to understand how the chain is composed to navigate the code.
- Staleness by design. The latest version ID has a five-minute lag. The application has to tolerate that, and the product team has to accept it. In our case that's fine — new versions aren't published on a minute-by-minute basis — but it's a deliberate constraint.
- Cache invalidation is implicit. When a new version is published, the old version's blobs remain in Blob Storage — they're just never requested for that version again. A separate cleanup process could remove them if storage cost becomes a concern, but for now the cost is negligible.
- Memory pressure. Keeping large datasets in IMemoryCache indefinitely is a deliberate choice that works because our process's memory budget accommodates it. For larger datasets or more memory-constrained environments, you'd want to think carefully about size limits and eviction policies (there's a sketch of what that could look like after this list).
- The Blob Storage layer adds latency on cold reads. If the Databricks cluster happens to be warm when a blob is missing, going via Blob Storage is actually slower than going direct. In practice, the cluster being warm in a cold-start scenario is the exception rather than the rule—but it's worth being aware of.
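On that memory-pressure point, IMemoryCache does support bounding: you can configure a size limit and have each entry declare a size and an eviction priority. Here's a sketch of what that might look like (the limit and the size estimate are illustrative, and this isn't something we needed in practice):

// Sketch: bounding the memory cache for a more constrained environment.
// Once SizeLimit is set, every entry written to the cache must declare a Size.
services.AddMemoryCache(options =>
{
    options.SizeLimit = 512 * 1024 * 1024; // treat "size" as approximate bytes
});

// When writing an entry, declare its size and let bulky reference data be evicted first.
_cache.Set(cacheKey, result, new MemoryCacheEntryOptions
{
    Size = estimatedSizeInBytes,       // hypothetical per-entry estimate
    Priority = CacheItemPriority.Low
});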
As always, the answer is "it depends." This approach made sense for our workload profile. For a different set of constraints — larger datasets, more frequent version changes, tighter memory budgets — some of these trade-offs might tip the other way.
Conclusions
The Decorator pattern is a clean fit for layered caching because it keeps caching logic entirely separate from data access logic. Adding a new cache tier is additive — it doesn't require changes to existing classes. The chain is composed by the DI configuration, not by the decorators themselves.
The design decisions that made this work were:
- Understanding which data is truly immutable. The versioning model gave us a strong guarantee that made aggressive, indefinite caching safe.
- Choosing the right storage tier for each layer. Blob Storage for persistence and cross-replica sharing; IMemoryCache for the fast path.
- Protecting against the thundering herd. The semaphore-based double-check at the Blob Storage layer is easy to overlook but critical at cold start.
Databricks SQL Serverless is a powerful analytical query engine. The trick is to use it for what it's good at—processing and transforming large analytical datasets—and let fast caches absorb the high-frequency, low-latency reads that a web API demands. The Decorator pattern gives us the architectural seam to do that cleanly.
The same pattern applies well beyond Databricks. Anywhere you have a slow or expensive data source serving data that changes infrequently, layering caches using decorators is a maintainable and extensible approach worth considering.
If you've got any questions or would like to discuss anything we've talked about, please feel free to leave a comment below. You can also find me on X/Twitter at @jon_george1.