Scaling API Ingestion with the Queue-of-Work Pattern

Jonathan George

The queue-of-work pattern enables massive parallelism for API ingestion by breaking large jobs into thousands of independent work items processed by concurrent workers. This approach reduced data ingestion time for our use case from 15 hours to under 2 hours while providing automatic retry handling and fault tolerance at a fraction of the cost of traditional orchestration tools.
Polars Workloads on Microsoft Fabric

Barry Smart

Polars now ships inside Microsoft Fabric by default. Here's how to use it alongside Fabric's other analytics tools and what that means for your data workflows.
Practical Polars: Code Examples for Everyday Data Tasks

Barry Smart

Unlock Python Polars with this hands-on guide featuring practical code examples for data loading, cleaning, transformation, aggregation, and advanced operations that you can apply to your own data analysis projects.
Under the Hood: What Makes Polars So Scalable and Fast?

Barry Smart

Polars gets its speed from a strict type system, lazy evaluation, and automatic parallelism. Here's how each piece works under the hood.
Polars: Faster Pipelines, Simpler Infrastructure, Happier Engineers

Barry Smart

We've migrated our own IP and several customers from Pandas and Spark to Polars. The benefits go beyond raw speed: faster test suites, lower platform costs, and an API developers actually enjoy using.
Top Features of Notebooks in Microsoft Fabric

Jessica Hill

Discover the key features of notebooks in Microsoft Fabric.
DuckLake in Perspective: Advanced Features and Future Implications

Barry Smart

Explore DuckLake's advanced capabilities including built-in encryption, sophisticated conflict resolution, and the strategic implications for future data architecture. Understand how DuckLake enables new business models and positions itself against established lakehouse formats.
DuckLake in Practice: Hands-On Tutorial and Core Features

Barry Smart

Get hands-on with DuckLake through a comprehensive tutorial covering installation, basic operations, file organization, snapshots, and time travel functionality. Learn how DuckLake's database-backed metadata management works in practice.
Introducing DuckLake: Lakehouse Architecture Reimagined for the Modern Era

Barry Smart

DuckDB Labs introduces DuckLake, a new approach to lakehouse architecture that addresses fundamental problems with existing formats by bringing database principles back to data lake metadata management.
What is a Data Lakehouse?

Carmel Eve

What exactly is a Data Lakehouse? This blog gives a general introduction to their history, functionality, and what they might mean for you!
DuckDB in Practice: Enterprise Integration and Architectural Patterns

Barry Smart

DuckDB comes pre-installed in Microsoft Fabric Python notebooks, so code developed locally deploys straight to production with enterprise monitoring, governance, and OneLake integration.
DuckDB in Depth: How It Works and What Makes It Fast

Barry Smart

Dive deep into the technical details of DuckDB, exploring its columnar architecture, vectorized execution, SQL enhancements, and the performance optimizations that make it exceptionally fast on a single machine.
DuckDB: the Rise of In-Process Analytics and Data Singularity

Barry Smart

Modern laptops can now handle datasets up to a billion rows, yet 94% of query spending goes on big-data compute that isn't needed. DuckDB brings analytical SQL directly into your process.
Creating Quality Gates in the Medallion Architecture with Pandera

Liam Mooney

This blog explores how to implement robust validation strategies within the medallion architecture using Pandera, helping you catch issues early and maintain clean, trustworthy data.
Carbon Optimised Data Pipelines - minimise CO2 emissions through intelligent scheduling (Next Steps)

James Broome

Intelligently scheduling cloud data pipelines based on carbon impact can optimize both environmental sustainability and operational efficiency.
Carbon Optimised Data Pipelines - minimise CO2 emissions through intelligent scheduling (Pipeline Definition)

James Broome

Intelligently scheduling cloud data pipelines based on carbon impact can optimize both environmental sustainability and operational efficiency.
Carbon Optimised Data Pipelines - minimise CO2 emissions through intelligent scheduling (Architecture Overview)

James Broome

Intelligently scheduling cloud data pipelines based on carbon impact can optimize both environmental sustainability and operational efficiency.
Carbon Optimised Data Pipelines - minimise CO2 emissions through intelligent scheduling (Introduction)

James Broome

Intelligently scheduling cloud data pipelines based on carbon impact can optimize both environmental sustainability and operational efficiency.