Scaling API Ingestion with the Queue-of-Work Pattern
TL;DR: The queue-of-work pattern enables massive parallelism for HTTP-based API ingestion by breaking large jobs into thousands of independent work items processed by concurrent workers. This approach reduced our data ingestion time from roughly 48 hours to under 2 hours while providing automatic retry handling and fault tolerance at a fraction of the cost of traditional orchestration tools.
Sample code to go with this blog post can be found on GitHub
The Problem: When Sequential API Calls Take Days
It's common in data platforms to need to acquire data from HTTP-based APIs. If written well, these APIs will be reliable and fast, and will allow you to control filtering, windowing, and pagination. But even if all of this is true, ingesting large amounts of data via an API can be challenging.
We recently faced one of these challenges when building out a new workload on an existing Azure Synapse-based modern data platform: the initial load of data from a source system required us to ingest about 2,000,000 records and the only way of doing this was via an HTTP API. A full synchronization requires tens of thousands of individual API requests, each fetching a page of 100 records of data.
We started by estimating how long this would take with a simple sequential approach. The API averages about 500ms per request. For 20,000 API calls:
20,000 requests × 0.5 seconds = 10,000 seconds = 2.8 hours
We also need to consider the volume of data being processed. For us, the payload size generally ranges from 200KB to 1.5MB, with an average of 1MB. So 20,000 requests yields around 20 GB of data.
That seems manageable, right? But this calculation ignores several real-world factors:
- Network variability: Some requests take 2-3 seconds, especially during peak hours
- Failures and retries: Network issues and transient API errors require retry logic
- Data processing: Each response needs parsing, transformation, and storage
In reality, a test retrieval of a subset of records suggested we were looking at something in the region of 48 hours to do a full ingestion, accounting for failures and retries.
That's too long. Our requirements were clear:
- Scalable: Process massive volumes efficiently through parallelism
- Fault-tolerant: Handle API failures gracefully without losing progress
- Fast: Complete full ingestion in under 12 hours
Why Not Use Synapse Pipelines?
The customer in this scenario already had a well-established Azure Synapse platform in place. As a result, the first port of call for data ingestion is normally a Synapse pipeline. After all, Synapse is designed for data orchestration, has many built-in connectors and is already fully integrated with our data lake. However, we quickly discovered several dealbreakers for this specific use case:
Cost Inefficiency
Synapse pipelines charge per activity execution (approximately £0.00085 per activity run) and per integration runtime hour. Let's break down the cost for our 20,000 API calls:
20,000 activities × £0.00085 = £17 per full ingestion
Integration Runtime: ~£0.18/hour × 48 hours = £8.64
Total: ~£26 per full sync
While £26 doesn't sound expensive, issues outside our control mean it's likely we'd need to do this several times a year. We'll also be running nightly incremental updates; although these process much smaller amounts of data, there will occasionally be bulk updates in the source system that require larger data volumes to be reingested.
Over a year, this all adds up. But more importantly, Synapse's orchestration overhead makes it unsuitable for high-volume, small operations regardless of cost.
Limited Resilience
Synapse pipelines support retry logic, but their error handling is coarse-grained. If one API call in a batch of 1,000 fails, the retry mechanism repeats the entire batch, not just the failed item. There's no built-in concept of a "poison message queue" for persistently failing records.
When we tested this with a deliberately flaky API endpoint, a single bad record caused the same batch to retry indefinitely, blocking progress on thousands of good records. This simply won't work for production ingestion where some records may have data quality issues.
Orchestration Overhead
Synapse pipelines introduce latency between activity transitions. In our testing, even with an empty pipeline activity, there's typically 3-5 seconds of overhead per activity execution. For 20,000 activities:
20,000 activities × 4 seconds overhead = 80,000 seconds = 22 hours
This overhead alone exceeds our entire time budget. The pipeline execution model is optimized for long-running data transformations, not for coordinating thousands of quick API calls.
Lack of Dynamic Scaling
Scaling Synapse pipeline execution requires manual configuration of integration runtime settings. You can't easily spin up hundreds of concurrent workers dynamically based on queue depth, then scale down to zero when work completes. This inflexibility makes it difficult to optimize both cost and performance.
The Queue-of-Work Pattern: A Better Approach
After ruling out Synapse pipelines, we needed a solution that could handle parallelism without orchestration overhead. The queue-of-work pattern emerged as the ideal approach, and it's surprisingly simple in concept: break the large job into thousands of small, independent work items, put them in a queue, and let multiple workers process them concurrently.
The pattern decouples work distribution from work execution, which turns out to be the key to solving all our challenges. Here's how it works:
Pattern Overview
┌─────────────────┐
│ Ingestor │ 1. Breaks large job into small work items
│ (Enqueuer) │ 2. Enqueues items to Azure Storage Queue
└────────┬────────┘
│
▼
┌─────────────────┐
│ Azure Storage │ 3. Durable, distributed queue
│ Queue │ 4. Supports automatic retry & poison handling
└────────┬────────┘
│
▼
┌─────────────────┐
│ Queue │ 5. Dequeues messages
│ Processor │ 6. Dispatches to work item processors
│ (Workers) │ 7. Processes API calls
└────────┬────────┘
│
▼
┌─────────────────┐
│ Data Lake │ 8. Writes results
│ Storage │
└─────────────────┘
Key Components
Work Items: Strongly-typed dataclasses that represent individual units of work (e.g., "fetch assets with IDs 1000-5999")
Work Queue: An abstraction over Azure Storage Queues that handles serialization, dequeuing, and poison message management. Using an abstraction makes testing simpler.
Work Item Processors: Decorated functions that execute the actual work (API calls, data transformation, persistence)
Queue Processor: A loop that dequeues messages, dispatches them to the appropriate processor, and handles errors
Work Item Dispatcher: A registry that maps work item types to their processors using decorator-based registration
How the Pattern Solves Our Challenges
1. Scalability Through Parallelism
The first benefit of this pattern became apparent when we broke down the ingestion workload. Instead of processing 2 million assets sequentially, we could divide them into manageable chunks:
# Breaking down 2 million assets into 400 independent work items
max_asset_id = 2_000_000
batch_size = 5_000
for start_id in range(0, max_asset_id, batch_size):
    work_queue.enqueue(
        IngestItemsByIdWorkItem(
            snapshot_time=timestamp,
            correlation_id=correlation_id,
            from_id=start_id,
            to_id=start_id + batch_size - 1
        )
    )
This creates 400 independent work items, each responsible for 5,000 assets. Why 5,000? We tested different batch sizes and found this offered the best balance - large enough to amortize queue overhead, small enough that a single failure doesn't waste too much work.
Now here's where it gets interesting. With these work items in the queue, we can run multiple queue processors simultaneously:
- Each processor dequeues and processes messages independently
- No coordination overhead between workers - they don't even know about each other
- Azure Storage Queues handle concurrent access automatically with optimistic concurrency
- Scale up the number of workers simply by deploying more instances
In our production environment, we run this as Azure Container Apps Jobs. We did some testing to find out what level of concurrency can be supported by the target API - sending too many concurrent requests would overload the API and cause failures to spike. As a result of this, during a full ingestion we typically scale to 12 concurrent workers. The math works out well:
20,000 API calls ÷ 12 workers = 1,666 API calls per worker
1,666 calls × 0.5 seconds average = 833 seconds = 14 minutes
In practice, we see full ingestion complete in under 2 hours due to API variability and processing overhead, but this is still a massive improvement over the roughly 48-hour sequential approach.
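As a sanity check on the arithmetic above, the wall-clock floor for different worker counts can be computed directly. This naive model assumes calls split evenly across workers, using the 20,000-call and 0.5-second figures from the text:

```python
def estimated_minutes(total_calls: int, workers: int,
                      seconds_per_call: float = 0.5) -> float:
    """Naive wall-clock estimate: calls split evenly across workers."""
    return (total_calls / workers) * seconds_per_call / 60

# 12 workers gives the ~14 minute theoretical floor quoted above
print(round(estimated_minutes(20_000, 12)))  # 14
```

Real runs land well above this floor because of API variability, retries, and processing overhead, but the model is useful when deciding how many workers to request.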
2. Fault Tolerance Through Retry and Poison Queues
One of the most challenging aspects of large-scale API ingestion is handling failures gracefully. APIs are unreliable - they timeout, return 500 errors, get rate-limited, or occasionally return malformed data. We needed the system to handle these failures without manual intervention.
Azure Storage Queues provide built-in retry semantics through visibility timeouts and dequeue counts. Here's how we leveraged them:
class AzureStorageWorkQueue:
    def __init__(self, queue, poison_queue, max_dequeues=5):
        self._queue = queue
        self._poison_queue = poison_queue
        self._max_dequeues = max_dequeues

    def dequeue(self):
        # Visibility timeout of 300 seconds (5 minutes)
        message = self._queue.receive_message(visibility_timeout=300)
        if message is None:
            return None  # Queue is empty

        # Automatic poison message handling
        if message.dequeue_count > self._max_dequeues:
            # This message has failed 5+ times, move it to poison queue
            self._poison_queue.send_message(message.content)
            self._queue.delete_message(message)
            return None  # Skip this problematic message

        return DequeuedWorkItem(
            work_item=jsonpickle.decode(message.content),
            dequeued_message=message
        )
Note - to keep the code simpler here, I haven't shown any additional error handling logic around the calls to the storage API. Depending on the exact logic and control mechanisms you have in place, you should consider adding error handling and retry logic around API calls like queue.receive_message.
How this works in practice:
When a worker dequeues a message, it becomes invisible to other workers for 5 minutes. If the worker successfully processes it, the message is deleted. If the worker crashes or the API call fails, the message automatically becomes visible again after 5 minutes for another worker to retry.
Azure Storage Queues track how many times each message has been dequeued. After 5 failed attempts, we consider the message "poisoned". This is a standard term in message processing systems that refers to a message which has repeatedly failed to be dispatched or processed, likely due to bad data or a systematic issue. We don't want to waste resources by continually trying to process the message, but we don't want to lose it either so we move it to a separate "poison queue" for manual investigation.
It should be noted here that even if you've chosen a finite TTL for your message queue, you should ensure that messages on the poison queue never expire by setting their TTL to -1.
This approach has proven remarkably resilient, and it's unusual to see runs end with messages on the poison queue. We've added a check at the end of processing to raise an alert if this does happen so that the bad messages can be investigated. Options also exist inside Azure to monitor message queues for other scenarios - e.g. an excessive number of messages in either queue - and raise alerts.
This approach also aids recovery in the event of a catastrophic failure. For example, during one ingestion run, the source API had a 6-hour outage. This meant that the majority of the queue messages ended up on the poison queue. Once we realised what had happened, we were able to quickly move all of the messages from the poison queue back to the processing queue and rerun them.
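The dequeue-count and poison-queue behaviour described above can be simulated in memory, which is also how we unit-test it against our queue abstraction. The FakeMessage and PoisonAwareQueue classes below are illustrative only; the real implementation sits on top of Azure Storage Queues:

```python
from collections import deque

class FakeMessage:
    """Stand-in for a queue message carrying a dequeue count."""
    def __init__(self, content):
        self.content = content
        self.dequeue_count = 0

class PoisonAwareQueue:
    def __init__(self, max_dequeues=5):
        self._queue = deque()
        self.poison = []  # stand-in for the separate poison queue
        self._max_dequeues = max_dequeues

    def enqueue(self, content):
        self._queue.append(FakeMessage(content))

    def dequeue(self):
        if not self._queue:
            return None
        message = self._queue.popleft()
        message.dequeue_count += 1
        if message.dequeue_count > self._max_dequeues:
            # Repeatedly failed: park it for manual investigation
            self.poison.append(message.content)
            return None
        return message

    def retry(self, message):
        # Simulates the visibility timeout expiring after a failure
        self._queue.append(message)

q = PoisonAwareQueue(max_dequeues=2)
q.enqueue("bad-item")
while (m := q.dequeue()) is not None:
    q.retry(m)  # every processing attempt "fails"
print(q.poison)  # → ['bad-item']
```

After two failed attempts the third dequeue trips the threshold and the message lands on the poison list rather than blocking the queue, mirroring the production behaviour.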
3. Speed Through Asynchronous Processing
The queue processor implementation is deliberately simple, which turns out to be a performance advantage:
def process_queue(correlation_id, work_queue, logger, ...):
    queue_empty = False
    while not queue_empty:
        message = work_queue.dequeue()
        if message is None:
            queue_empty = True
            continue
        try:
            # Dispatch to appropriate processor
            WorkItemDispatcher.dispatch_work_item(
                message.work_item,
                logger,
                **kwargs
            )
            # Success: remove from queue
            work_queue.remove(message)
        except Exception as e:
            # Failure: message stays in queue for retry
            logger.error(f"Error processing: {e}")
This simple loop runs on each worker container. There's no complex orchestration, no distributed locking, no coordination between workers. Each worker independently:
- Dequeues a message (or receives None when the queue is empty)
- Processes it
- Deletes the message on success
- Repeats until the queue is empty
We measured the queue overhead itself - dequeue plus delete operations - at consistently under 50ms. This means for a typical 500ms API call, only 10% of the time is queue overhead. For slower API calls (2-3 seconds), the overhead becomes negligible.
The pattern also enables progressive completion. Unlike batch systems where you wait hours to see any results, data appears in the data lake as soon as each work item completes. This was particularly valuable during testing - we could verify the data pipeline was working correctly within minutes rather than waiting for a full ingestion to complete.
4. Clean Separation of Concerns
The dispatcher pattern keeps code maintainable:
# Define a work item type
@dataclass
class IngestItemsByIdWorkItem(WorkItem):
    from_id: int
    to_id: int

# Register its processor with a decorator
@work_item_processor(IngestItemsByIdWorkItem)
def process_items_by_id(work_item, logger, **kwargs):
    api_client = kwargs["api_client"]
    writer = kwargs["writer"]

    # Fetch data from API
    response = api_client.get_items(
        build_query_params(work_item.from_id, work_item.to_id)
    )

    # Persist to data lake
    writer.persist_assets(work_item.snapshot_time, response)
Benefits:
- Each processor focuses on one specific task
- Easy to add new work item types without modifying existing code
- Type safety through Python dataclasses
- Testable in isolation
- Clear dependency injection through kwargs
Important Considerations
Before implementing this pattern, there are several important factors to consider:
Data must be partitionable: Since all the work is enumerated up front, you need a way of partitioning your data into a large number of small chunks. For our full ingestion, we chose ID ranges and for our incremental updates we chose time periods. This will be driven in part by the querying options supported by the API.
Idempotency is critical: Since messages can be retried automatically, your processors must be idempotent. If a worker crashes after persisting data but before deleting the message, another worker will process the same message again. In our implementation, we write data to the data lake with consistent file paths based on the work item parameters, so re-processing simply overwrites the existing file with identical data.
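One way to get this idempotency is to derive the output path purely from the work item's parameters, so a retry always overwrites the same file. The path layout below is illustrative, not our actual one:

```python
def bronze_path(snapshot_time: str, from_id: int, to_id: int, offset: int) -> str:
    """Deterministic data lake path: re-processing a work item
    overwrites the same file rather than creating a duplicate."""
    return (
        f"bronze/assets/snapshot={snapshot_time}/"
        f"ids_{from_id}_{to_id}_offset_{offset}.json"
    )

# The same work item parameters always map to the same file
print(bronze_path("2024-06-01T00-00-00", 1000, 5999, 0))
```

Because the path contains no run-specific randomness (no UUIDs, no wall-clock timestamps taken at write time), two workers processing the same message produce byte-identical results at the same location.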
Message size limits: Azure Storage Queue messages are limited to 64 KB. Our work items are small (typically <1 KB when serialized), but if you need to pass large payloads, consider storing the data in blob storage and passing a reference in the message.
Visibility timeout tuning: The 5-minute visibility timeout works well for our API calls, but you'll need to adjust this based on your workload. Too short and messages might be retried while still being processed (duplicate work). Too long and failed messages won't be retried quickly enough.
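A simple rule of thumb is to set the timeout to a multiple of the slowest observed work item. The safety factor and sample durations below are illustrative, not measured values:

```python
def visibility_timeout_seconds(observed_durations_s, safety_factor=3):
    """Pick a visibility timeout comfortably above the slowest work item."""
    return int(max(observed_durations_s) * safety_factor)

# Work items averaging ~25 s with occasional 60 s outliers suggest ~180 s;
# rounding up to 300 s (5 minutes) gives extra headroom.
print(visibility_timeout_seconds([25, 40, 60]))  # 180
```

Re-run the calculation whenever batch sizes change, since a bigger batch means a longer-running work item and therefore a longer required timeout.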
Cost considerations: While Azure Storage Queue costs are minimal (~£0.0028 per 10,000 operations), container runtime costs can add up. With 12 workers running for 2 hours:
12 workers × 0.5 vCPU × 2 hours × 3,600 seconds/hour × £0.0001/vCPU-second
= approximately £4.32 per full ingestion
This is cheaper than the Synapse cost but provides much better performance and flexibility.
Avoid sensitive data in messages: Work items should contain only metadata (IDs, timestamps, pagination offsets), not actual sensitive data. The actual data from API responses goes directly to the data lake, not through the queue.
Real-World Implementation Example
Let's walk through a complete flow showing how these components work together for ingesting data:
Step 1: Enqueue Work (Main Orchestrator)
The first step runs in a dedicated "enqueuer" container that breaks the full ingestion job into work items:
class BronzeIngestor:
    def enqueue_full_ingestion(self, snapshot_time, correlation_id):
        # First, query the API to determine the scope of work
        # This makes a single API call to get the maximum asset ID
        max_asset_id = self._get_maximum_asset_id()

        # Break into batches of 5,000 assets each
        batch_size = 5_000
        current_id = 0
        while current_id < max_asset_id:
            self._work_queue.enqueue(
                IngestItemsByIdWorkItem(
                    snapshot_time=snapshot_time,
                    correlation_id=correlation_id,
                    from_id=current_id,
                    to_id=current_id + batch_size - 1
                )
            )
            current_id += batch_size
This enqueuing process typically completes in under 30 seconds for 20,000 work items. The snapshot_time ensures all workers use the same timestamp for file paths, making the data lake files consistent. The correlation_id ties all work items to the same ingestion job for tracing.
Depending on how many items you end up needing to enqueue, and what else lives in your queue, you will also need to consider error and recovery scenarios here. What if you fail halfway through enqueuing work items? While processing the items must be idempotent, we don't want to put multiple copies of the same item on the queue if we can avoid it.
If you have one queue per process, then the simplest option will be to ensure the queue is empty before you start processing, and have a recovery process which clears messages down in case of error.
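A guard along these lines can enforce that. The queue methods here are assumptions about our queue abstraction; the Azure SDK offers comparable operations (for example, clear_messages on a QueueClient):

```python
def prepare_queue_for_run(work_queue, allow_clear=False):
    """Refuse to enqueue into a non-empty queue unless we are explicitly
    recovering from a failed run, in which case clear the leftovers."""
    count = work_queue.approximate_message_count()
    if count == 0:
        return
    if allow_clear:
        work_queue.clear()  # recovery: drop leftovers from a failed run
    else:
        raise RuntimeError(
            f"Queue not empty ({count} messages); refusing to enqueue"
        )
```

Calling this at the start of the enqueuer keeps a crashed previous run from silently doubling up work items.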
Step 2: Process Queue (Worker Containers)
Multiple worker containers run simultaneously, each executing the same code but processing different messages:
class BronzeIngestor:
    def process_queue(self, correlation_id):
        # Set up dependencies that processors will need
        processor_kwargs = {
            "api_client": self._api_client,
            "writer": self._bronze_writer
        }

        # This runs until the queue is empty
        # Each worker container runs this independently
        process_queue(
            correlation_id=correlation_id,
            work_queue=self._work_queue,
            logger=self._logger,
            get_processor_kwargs=lambda msg: processor_kwargs
        )
The process_queue function (shown in the "Speed" section above) is just a simple loop. When the queue is empty, the worker exits gracefully.
Step 3: Execute Work (Registered Processor)
The dispatcher routes each work item to its registered processor. The @work_item_processor decorator registers this function to handle IngestItemsByIdWorkItem instances:
@work_item_processor(IngestItemsByIdWorkItem)
def process_items_by_id(work_item, logger, **kwargs):
    api_client = kwargs["api_client"]
    writer = kwargs["writer"]

    # Each work item handles a range of IDs (e.g., 1000-5999)
    # But the API paginates responses, so we need an inner loop
    offset = 0
    limit = 100  # API returns 100 assets per page

    while True:
        # Make API call for one page of results within this ID range
        response = api_client.get_items(
            build_query_params(
                work_item.from_id,
                work_item.to_id,
                offset,
                limit
            )
        )

        assets = json.loads(response)["assets"]
        if not assets:
            break  # No more results, we're done with this work item

        # Write this page to the data lake
        # File path includes snapshot_time for consistency across workers
        writer.persist_assets(
            work_item.snapshot_time,
            work_item.from_id,
            work_item.to_id,
            offset,
            response
        )
        offset += limit
For a work item covering IDs 1000-5999 (5,000 assets), this typically makes 50 API calls (100 assets per page). The entire work item takes about 25 seconds to process, which fits comfortably within the 5-minute visibility timeout.
Step 4: Experimentation and tuning
Once you have a working implementation, then some experimentation is needed to validate and tune batch sizes and the amount of parallelism you can support.
When considering batch sizes, there are a variety of factors to take account of. Firstly, how many API requests will be required to ingest a single batch? As mentioned above, our 5,000 asset batches result in around 50 API calls.
If any one of them fails, they will all be retried. If you can't rely on the API to consistently ingest an entire batch of data without failure, you should tune your batch size to minimise the impact of this.
Also, how much data will the API return in a single batch? If you need to process an entire batch of data at once (even if it requires multiple API calls to retrieve), and your payload size is large, will you have enough memory to process it all?
How long will a single batch take to process? This has an effect on your queue visibility timeout; if your batch takes 5 minutes to process but your queue visibility timeout is 2 minutes, this means different instances of your queue processor will likely end up processing the same message.
When considering parallelism, this is mainly down to the API you're ingesting from. Depending on your deployment architecture (discussed below) the overall cost for the process may be similar regardless of whether you run 10, 20 or 50 processors in parallel. However, the API you're ingesting from might not be able to cope with this - at worst, you could end up crashing the API, or making it unresponsive for other users. In our scenario, the API we retrieve the data from is the same as that used by the front end application, so we needed to avoid this.
Clearly the two factors are linked. For example when evaluating the API we are ingesting from we discovered that:
- larger batch sizes are more efficient because the system behind the API caches the query, so retrieving subsequent pages for a batch is relatively fast compared to retrieving the first page.
- smaller page sizes are better, as the API struggled to serialize large pages of data quickly.
- the API could reliably cope with running 12 processors in parallel; more than that started to significantly impact production performance for other users. However, we also established that we got better results when running the ingestion process outside working hours.
This experimentation process needs to be done carefully. If you are experimenting on a production API, you'll likely need to work with its owners to ensure you don't end up rendering the API unusable. If you're working in a sandbox environment, bear in mind that it will likely have a different allocation of compute resources to the production environment, so you may need to include levers to allow you to tune the process once you reach production.
Underpinning this experimentation is ensuring you have baked in observability so that you can evaluate the resulting telemetry to inform your decision - more on this later.
Deployment Architecture
In production, this pattern runs on Azure Container Apps, which provides the perfect hosting model for this workload.
Container Jobs
Azure Container Apps Jobs are designed for workloads that run to completion then exit - exactly what we need.
We have a single container job which is called with arguments to either run the enqueuing or processing. The job can be started multiple times to support parallelism.
Container size
ACA offers a variety of combinations of CPU and memory, which allows you to select an appropriate size for your workload. The exact options depend on whether you're using a Dedicated plan or a Consumption one. We used a Consumption plan, which allows you to size your container from as small as 0.25 cores and 0.5 GB RAM up to 4 cores and 8 GB RAM (as of March 2026).
Since we're not doing a lot of processing, but we are processing reasonably large chunks of data, we chose the 0.5 core and 1 GB RAM option, and it's serving us well. As with most other things, choosing the right container size is a matter of doing some initial calculations and then experimenting until you achieve consistent performance and reliability.
Orchestration
Although we're running the enqueuing and processing logic in Azure Container Apps, we're still orchestrating things via a Synapse pipeline. We chose this route for consistency; we orchestrate all of the other processes in our data platform via Synapse pipelines and we didn't want to introduce other approaches.
We've created a pipeline that can trigger an Azure Container App Job, passing in the necessary parameters. This can run either as a fire-and-forget process, or it can poll for job completion. We use the former for the full ingestion, and the latter for the incremental ingestion as this allows us to immediately trigger data processing once ingestion is complete.
Cost Optimization
Container Apps Jobs only consume resources while running:
- Enqueue job: Runs for ~1 minute, costs negligible
- Worker jobs: Run for ~2 hours during full ingestion
- Zero cost when idle - no minimum running instances required
This is significantly more cost-effective than keeping functions warm or maintaining always-on compute resources.
Scaling Strategy
While Azure Container Apps supports automatic queue-based scaling using KEDA, we've found that starting with a fixed number of workers (12) works well for predictable workloads. For unpredictable workloads, especially when the target API can support a higher rate of requests than ours, you could configure scaling rules based on queue depth:
Queue depth > 1000 messages → Scale to 50 workers
Queue depth < 100 messages → Scale to 10 workers
Queue empty → Scale to 0
Observability
OpenTelemetry tracing provides end-to-end visibility. Each work item creates its own trace span, making it easy to:
- Identify slow API endpoints
- Track which work items failed and why
- Identify failure patterns
- Measure end-to-end ingestion duration
- Correlate all work items for a specific ingestion job using the correlation_id
Benefits Beyond the Original Requirements
While we initially focused on solving the core challenges of scale, fault tolerance, and speed, the pattern has delivered several unexpected benefits:
Incremental ingestion came for free: Once the framework was in place, adding incremental ingestion was trivial. We created a new work item type (IngestItemsModifiedSinceWorkItem) that queries for recently modified assets instead of ID ranges. The dispatcher, queue processing, and retry logic all work identically.
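The new work item type and a time-window enqueuer can be sketched as follows. Only the class name IngestItemsModifiedSinceWorkItem comes from our implementation; the field names and window logic are assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class WorkItem:
    snapshot_time: str
    correlation_id: str

@dataclass
class IngestItemsModifiedSinceWorkItem(WorkItem):
    modified_since: datetime
    modified_until: datetime

def enqueue_incremental(work_queue, snapshot_time, correlation_id,
                        since, until, window=timedelta(hours=6)):
    """Split [since, until) into fixed windows, one work item per window."""
    current = since
    while current < until:
        work_queue.enqueue(IngestItemsModifiedSinceWorkItem(
            snapshot_time=snapshot_time,
            correlation_id=correlation_id,
            modified_since=current,
            modified_until=min(current + window, until),
        ))
        current += window
```

Because the dispatcher routes on work item type, registering a processor for this class is the only other change needed; the queue, retry, and poison handling are untouched.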
Debugging became significantly easier: When something goes wrong, the poison queue contains the exact work item that failed, complete with all parameters. We can inspect it, fix the underlying issue (often a data quality problem in the source system), then manually re-queue it without reprocessing thousands of successful items. This has saved hours of debugging time.
Testing improved dramatically: Previously, testing the full ingestion pipeline required running against the actual API or building complex mocks. Now we can test individual processors in isolation by constructing work items with test data. Integration tests can enqueue a few work items and verify the results without processing the entire dataset.
Cost visibility is excellent: Azure Storage Queue operations cost approximately £0.28 per million operations. For our 20,000-message ingestion, that's less than £0.01 in queue costs. The predictable, pay-per-use model makes capacity planning straightforward.
Ability to modify the structure of the written data: Data received from our source system arrives as partially formatted JSON. This can cause issues with some processing libraries, such as Polars. We ended up rewriting the retrieved data using the ndjson format, which Polars can process with no issues.
When to Use This Pattern
The queue-of-work pattern is ideal when:
- You need to make thousands or millions of API calls
- Work can be divided into independent, idempotent units
- Fault tolerance is critical (some items may fail, but others should continue)
- You want to scale horizontally by adding more workers
- Processing time per item varies significantly
- You need to prioritize certain types of work
It may be overkill for:
- Small-scale operations (< 100 items)
- Tightly coupled sequential processing
- Real-time streaming (consider event-driven architectures instead)
- Simple ETL where Synapse pipelines or Data Factory are sufficient
Expansion to a second data source
We've since extended the pattern to a second data source in the same solution, Amazon Elasticsearch. This required us to go through the same process of evaluation and experimentation again, and brought with it a new consideration.
This time, the main benefit of the pattern is not in error handling, as the Elasticsearch service is highly resilient and less susceptible to transient errors. However, it is highly scalable, which means we can use a much higher degree of parallelism than with our original service.
Finally, the size of individual data items is relatively small, but the volume of items we retrieve is high, as each item represents a user event in the system. Regardless, we found that retrieving a day's worth of data at a time worked well.
Because of the high degree of parallelism permitted, a full ingestion of several years' worth of data takes seconds rather than the hours it would likely take using a Synapse pipeline with an HTTP connector.
However, we had an additional consideration for this source; an IP allow-list on the service. In Synapse pipelines, this is dealt with using a Self Hosted Integration Runtime (SHIR) to allow you to ingest data from a well-known IP address. Fortunately, we already had the necessary infrastructure in place in our Azure Container App Environment, but this is something to be wary of when introducing additional moving parts into a solution.
Conclusion
The queue-of-work pattern transformed what initially seemed like an intractable data ingestion problem into a manageable, scalable solution. By decoupling work distribution from execution through Azure Storage Queues, we achieved the performance and reliability goals that seemed out of reach with traditional orchestration approaches.
The pattern has run in production since mid-2024, ingesting large volumes of records weekly from multiple source systems. It's proven robust enough that ingestion failures are now rare, and when they do occur, the poison queue pattern makes debugging straightforward.
Key takeaways from this implementation:
- Plan for the worst case: In a perfect world, we could potentially have lived with the sequential ingestion, as we'd only need to do the full ingestion once and then keep everything in sync. However, there are plenty of error scenarios, and these are what you need to make sure your process can handle.
- Simple is fast: The basic queue-and-worker model introduces minimal overhead (<50ms per operation) compared to complex orchestration systems
- Built-in resilience is valuable: Azure Storage Queues' native retry semantics and dequeue counting eliminated the need for custom retry logic
- Horizontal scaling works: Adding workers linearly improves throughput without coordination overhead
- Progressive completion aids debugging: Seeing results immediately rather than waiting hours made development and testing significantly faster
- Determining batch sizes and concurrency limits is an experimental process: It will likely take time to determine the limits of the API you are using and there will be a number of factors to take into account. If you have a sandbox environment for the API, it will likely have different characteristics to production. And even if you find the limits of concurrency that an API can support, the owners of that API will likely not want you to push it to the limit as this could negatively impact other users.
When this pattern works well:
This approach excels when you can break work into independent units that don't depend on each other's results. The API ingestion scenario is ideal - fetching assets with IDs 1000-5999 has no dependency on fetching assets 6000-11999. If your workload requires sequential processing or complex dependencies between tasks, this pattern may not be the best fit.
If you've made it this far, thanks for reading! If you've got questions about implementing this pattern or would like to discuss your specific use case, feel free to leave a comment below. And as a final reminder, sample code to go with this blog post can be found on GitHub.