Data Engineering

Supercharge Your Dev Containers on Windows

Mike Evans-Larah, 24/07/2025

Running VS Code Dev Containers on Windows? Clone your repos inside the WSL filesystem to eliminate I/O bottlenecks and dramatically boost performance.

DuckLake in Perspective: Advanced Features and Future Implications

Barry Smart, 30/06/2025

Explore DuckLake's advanced capabilities including built-in encryption, sophisticated conflict resolution, and the strategic implications for future data architecture. Understand how DuckLake enables new business models and positions itself against established lakehouse formats.

DuckLake in Practice: Hands-On Tutorial and Core Features

Barry Smart, 30/06/2025

Get hands-on with DuckLake through a comprehensive tutorial covering installation, basic operations, file organization, snapshots, and time travel functionality. Learn how DuckLake's database-backed metadata management works in practice.

Introducing DuckLake: Lakehouse Architecture Reimagined for the Modern Era

Barry Smart, 30/06/2025

DuckDB Labs introduces DuckLake, a revolutionary approach to lakehouse architecture that solves fundamental problems with existing formats by bringing database principles back to data lake metadata management.

What is a Data Lakehouse?

Carmel Eve, 13/05/2025

What exactly is a Data Lakehouse? This blog gives a general introduction to lakehouses: their history, their functionality, and what they might mean for you!

DuckDB in Practice: Enterprise Integration and Architectural Patterns

Barry Smart, 30/04/2025

Learn how to integrate DuckDB into enterprise environments, including Microsoft Fabric deployment, and explore the architectural patterns it enables for modern data processing workflows.

DuckDB in Depth: How It Works and What Makes It Fast

Barry Smart, 30/04/2025

Dive deep into the technical details of DuckDB, exploring its columnar architecture, vectorized execution, SQL enhancements, and the performance optimizations that make it exceptionally fast on a single machine.

DuckDB: the Rise of In-Process Analytics and Data Singularity

Barry Smart, 30/04/2025

Explore the concept of the 'data singularity' and how in-process analytics tools like DuckDB are transforming how we work with data by leveraging modern hardware capabilities.

Creating Quality Gates in the Medallion Architecture with Pandera

Liam Mooney, 25/04/2025

This blog explores how to implement robust validation strategies within the medallion architecture using Pandera, helping you catch issues early and maintain clean, trustworthy data.

Working locally with Spark dev containers

Ian Griffiths, 10/01/2025

Running Spark locally in a dev container can significantly improve development feedback loops. This first article explains why, and the rest of the series will show how.

Per-Property Rows from JSON in Spark on Microsoft Fabric

Ian Griffiths, 23/08/2024

Spark doesn't always interpret JSON how we'd like. For example, if each key/value pair in a JSON object is conceptually one item, Spark won't give you a row per item by default. This article shows how to nudge Spark in the right direction.

Introduction to Python Logging in Synapse Notebooks

Jonathan George, 07/03/2024

The first step on the road to implementing observability in your Python notebooks is basic logging. In this post, we look at how you can use Python's built-in logging inside a Synapse notebook.

Star Schemas are fundamental to unleashing value from data in Microsoft Fabric

Barry Smart, 09/11/2023

Ralph Kimball's 1996 Star Schema principles still apply in Cloud Native Analytics.

Adopt A Product Mindset To Maximise Value From Microsoft Fabric

Barry Smart, 31/08/2023

In this post I describe how adopting a product mindset will help you to extract maximum value from Microsoft Fabric.

Exploring Strategies Enabled By Microsoft Fabric

Barry Smart, 25/08/2023

Explore building situational awareness and leveraging strategic opportunities with Microsoft Fabric in this concise overview.

Developing a Data Mesh Inspired Vision Using Microsoft Fabric

Barry Smart, 14/08/2023

Explore how Microsoft Fabric, inspired by Data Mesh, can support a data-driven strategy, and learn how to approach a Data Mesh vision using it.

How Does Microsoft Fabric Measure Up To Data Mesh?

Barry Smart, 07/08/2023

Explore Data Mesh's influence on Microsoft Fabric, addressing gaps in the data product marketplace, standards, master data management, and governance.

Microsoft Fabric Is A Socio-Technical Endeavour

Barry Smart, 01/08/2023

Creating a successful organisation-wide data and analytics platform isn't just about architecture, schemas and semantic models. It's also about culture, organisational design and people. This blog explores the socio-technical nature of data and analytics and how this should influence your approach to adoption of Microsoft Fabric.

Azure Synapse Analytics versus Microsoft Fabric: A Side by Side Comparison

Barry Smart, 23/05/2023

In this Microsoft Fabric vs Synapse comparison, we examine how features map from Azure Synapse to Fabric.

Data validation in Python: a look into Pandera and Great Expectations

Liam Mooney, 08/03/2023

Implement Python data validation with Pandera & Great Expectations in this comparison of their features and use cases.

How to set up Python, PyEnv & Poetry on Windows

James Dawson, 07/03/2023

Explore using Python virtual environments & Poetry on Windows for smoother workflows, with a script & guide to enhance your dependency management experience.

How To Implement Continuous Deployment of Python Packages with GitHub Actions

Liam Mooney, 09/02/2023

Discover how to use GitHub Actions to automatically publish updates to Python packages on PyPI, and assess its role in Continuous Deployment.

Customizing Lake Databases in Azure Synapse Analytics

Ed Freeman, 24/10/2022

Explore Custom Objects in Lake Databases for user-friendly column names, calculated columns, and pre-defined queries in Azure Synapse Analytics.

How to create a semantic model using Synapse Analytics Database Templates

Barry Smart, 21/10/2022

Explore Azure Synapse Analytics Database Templates and learn to create semantic models in this second post of the series.

Continuous Integration with GitHub Actions

Liam Mooney, 28/09/2022

This post gives an overview of Continuous Integration and shows how you can implement it with GitHub Actions, with an accompanying example Python project.

How to apply behaviour driven development to data and analytics projects

Barry Smart, 02/09/2022

In this blog we demonstrate how the Gherkin specification can be adapted to enable BDD to be applied to data engineering use cases.

What is the Shared Metadata Model in Azure Synapse Analytics, and why should I use it?

Ed Freeman, 12/07/2022

Explore Azure Synapse's 'Shared Metadata Model' feature. Learn how it syncs Spark tables with SQL Serverless, along with its benefits and trade-offs.

Extract insights from tag lists using Python Pandas and Power BI

Barry Smart, 22/06/2022

Discover how to extract insights from spreadsheets and CSV files using Pandas and Power BI in this blog post.

Introduction to Containers and Docker

Liam Mooney, 11/01/2022

Explore containerisation & Docker for app development & deployment. Learn to create containerised applications with examples in this intro guide.

How to test Azure Synapse notebooks

James Broome, 10/05/2021

Explore data with Azure Synapse's interactive Spark notebooks, integrated with Pipelines & monitoring tools. Learn how to add tests for business rule validation.

How Azure Synapse unifies your development experience

Ian Griffiths, 11/12/2020

Modern analytics requires a multi-faceted approach, which can cause integration headaches. Azure Synapse's Swiss army knife approach can remove a lot of friction.

How to use SQL Notebooks to access Azure Synapse SQL Pools & SQL on demand

Howard van Rooijen, 30/09/2020

Wishing Azure Synapse Analytics had support for SQL notebooks? Fear not, it's easy to take advantage of rich interactive notebooks for SQL Pools and SQL on Demand.

Azure Synapse for C# Developers: 5 things you need to know

James Broome, 29/05/2020

Did you know that Azure Synapse has great support for .NET and C#? Learning new languages is often a barrier to digital transformation, so being able to use existing people, skills, tools and engineering disciplines can be a massive advantage.

Import and export notebooks in Databricks

Ed Freeman, 09/09/2019

Learn to import/export notebooks in Databricks workspaces manually or programmatically, and transfer content between workspaces efficiently.

Azure Databricks CLI "Error: JSONDecodeError: Expecting property name enclosed in double quotes:..."

Ed Freeman, 04/07/2019

Explore solutions for JSONDecodeError in the Databricks CLI & clusters. Learn how pre-built CLIs/SDKs simplify requests & authentication in REST APIs.

Using Databricks Notebooks to run an ETL process

Carmel Eve, 10/05/2019

Explore data analysis & ETL with Databricks Notebooks on Azure. Utilize Spark's unified analytics engine for big data & ML, and integrate with ADF pipelines.

Using Python inside SQL Server

Ed Freeman, 16/01/2018

Learn to use SQL Server's Python integration for efficient data handling. Eliminate clunky transfers and easily operationalize Python models/scripts.