Blog
DuckDB: the Rise of In-Process Analytics and Data Singularity
Modern laptops can now handle datasets up to a billion rows, yet 94% of query spending goes on big-data compute that isn't needed. DuckDB brings analytical SQL directly into your process.
Creating Quality Gates in the Medallion Architecture with Pandera
This blog explores how to implement robust validation strategies within the medallion architecture using Pandera, helping you catch issues early and maintain clean, trustworthy data.
What are record types in C# / .NET?
Records are primarily meant for representing data. They are usually immutable and allow you to copy, equate, and print object properties.
C# 12.0: ref readonly
C# 12.0 adds a new way to annotate parameters: ref readonly. This seems like it should mean exactly the same as the older in annotation. This post explains why this new syntax is useful.
Power BI: using label encoded vs one-hot encoded data
Understand why label encoding is the preferred technique for encoding categorical data for analysis in Power BI over one-hot encoding.
Co/Contravariance in C# Interfaces
Covariance and contravariance in C# generic interfaces, unpacked from first principles using implicit reference conversions — demystified with simple examples.
Encoding categorical data for Power BI: Label vs one-hot
One-hot encoding and label encoding are two methods used to encode categorical data. Understand the specific advantages and disadvantages of these techniques.
Power BI Images That Pop: Intuitive, easy-to-maintain reports
Explore integrating icons, pictograms and images into Power BI in the optimal way to enhance the user experience and minimise effort required to build and maintain reports.
Spark dev containers: packaging code for testability
Once you've thoroughly tested your code against the local Spark service in your dev container, you'll want to run it in a real Spark cluster. This post shows how to deploy such code to Microsoft Fabric.
Spark dev containers: writing tests
Having seen earlier in the series how to configure a dev container to run Spark locally, this post shows how to write tests that use that local Spark service.
Spark dev containers: running Spark locally
Configure a Docker-based dev container that runs Spark locally — shorten your inner dev loop and make automated tests practical with the right base image.
Working locally with Spark dev containers
Running Spark locally in a dev container can significantly improve development feedback loops. This first article explains why, and the rest of the series will show how.
C# 12.0: collection expressions
C# 12.0 provides a new, simpler syntax for initializing collections. It typically generates the most efficient code possible, although as you'll see, it's useful to understand the choices it makes.
Why Power BI developers should care about TMDL
Power BI's adoption of TMDL improves the readability of the semantic model, enables version control and enhances collaboration and efficiency for developers.
Women of Silicon Roundabout: Day 2
Women of Silicon Roundabout is the UK's largest women in tech event. Day two topics included: green tech, burnout, and Python!
Women of Silicon Roundabout: Day 1
Women of Silicon Roundabout is the UK's largest women in tech event. Day one topics included: AI, career pathways, and generations of Women in Tech.
C# 12.0: inline arrays
A new feature in C# 12.0 enables data types to define fixed-size arrays that don't require separate array objects on the heap. Learn how this is useful in performance-oriented and interop scenarios.
There's something wrong with the Pandas API on Spark
Fix the following issues: errors converting large datasets to pandas, pandas for Spark being very slow, and pandas for Spark column reduction not reducing data.
How .NET 9.0 boosted JSON Schema performance by 32%
We benchmarked endjin's JSON Schema library on .NET 9.0 and saw large performance gains. There are even more gains to be had with new System.Text.Json features.
How .NET 9.0 boosted AIS.NET performance by 9%
.NET 9.0 has shipped, and for the fourth year running, we benchmarked endjin's AIS.NET library and were very happy to see substantial performance gains, with no extra work required.
Carbon Optimised Data Pipelines: Next Steps
Extending carbon-optimised pipelines: choose between Azure regions at runtime, work around Wait activity limits, and adapt the pattern beyond the UK.
Carbon Optimised Data Pipelines: Pipeline Definition
A portable Data Factory, Synapse, or Fabric pipeline that calls the Carbon Intensity API and waits for the greenest scheduling window — no custom code.
Modern Compute: Compute-Intensive Workloads
We have a wide range of computational mechanisms at our disposal, some of which emerged thanks to recent advances in AI. In this post, we look at the kinds of workloads that can take advantage of these.
C# 12.0: primary constructors
C# 12.0's most prominent new feature is the primary constructor syntax. This post describes how it works, and looks at some pros and cons.
Carbon Optimised Data Pipelines: Architecture Overview
Translating carbon-optimised scheduling into a modern data pipeline architecture for Microsoft Fabric, Azure Synapse and Azure Data Factory.