Browse our archives by topic…
Big Data
Scaling API Ingestion with the Queue-of-Work Pattern
The queue-of-work pattern enables massive parallelism for API ingestion by breaking large jobs into thousands of independent work items processed by concurrent workers. For our use case it cut data ingestion time from 15 hours to under 2, with automatic retries and fault tolerance built in, at a fraction of the cost of traditional orchestration tools.
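As a flavour of the pattern, here is a minimal sketch only, not the architecture from the post: the worker count, retry limit, and fetch_page stand-in are all illustrative.

```python
import queue
import threading

# A shared queue of independent work items drained by concurrent workers,
# with a simple retry on failure.
MAX_RETRIES = 3
work: queue.Queue = queue.Queue()

def fetch_page(page: int) -> None:
    """Stand-in for a single API call; raise to simulate a transient failure."""
    print(f"ingested page {page}")

def worker() -> None:
    while True:
        try:
            page, attempts = work.get(block=False)
        except queue.Empty:
            return  # queue drained, this worker is done
        try:
            fetch_page(page)
        except Exception:
            if attempts + 1 < MAX_RETRIES:
                work.put((page, attempts + 1))  # re-queue for another attempt
        finally:
            work.task_done()

# Break one large job into thousands of independent work items...
for page in range(10_000):
    work.put((page, 0))

# ...then drain the queue with a pool of concurrent workers.
threads = [threading.Thread(target=worker) for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```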
Polars Workloads on Microsoft Fabric
Polars now ships inside Microsoft Fabric by default. Here's how to use it alongside Fabric's other analytics tools and what that means for your data workflows.
Practical Polars: Code Examples for Everyday Data Tasks
Unlock Python Polars with this hands-on guide featuring practical code examples for data loading, cleaning, transformation, aggregation, and advanced operations that you can apply to your own data analysis projects.
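For a taste of the kind of examples the guide covers, a hypothetical sales.csv and its columns are assumed here:

```python
import polars as pl

# Everyday tasks in one chain; assumes a sales.csv with region,
# amount, and date columns.
df = pl.read_csv("sales.csv", try_parse_dates=True)     # loading

summary = (
    df.drop_nulls(subset=["amount"])                    # cleaning
      .with_columns(pl.col("amount").cast(pl.Float64))  # transformation
      .group_by("region")                               # aggregation
      .agg(pl.col("amount").sum().alias("total_sales"))
      .sort("total_sales", descending=True)
)
print(summary)
```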
Under the Hood: What Makes Polars So Scalable and Fast?
Polars gets its speed from a strict type system, lazy evaluation, and automatic parallelism. Here's how each piece works under the hood.
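The lazy evaluation piece in miniature (the file and columns are hypothetical):

```python
import polars as pl

# scan_csv builds a query plan instead of reading the file; the optimizer
# pushes the filter down into the scan, and collect() executes the plan
# in parallel across all cores.
lazy = (
    pl.scan_csv("sales.csv")
      .filter(pl.col("amount") > 100)
      .group_by("region")
      .agg(pl.col("amount").mean().alias("avg_amount"))
)
print(lazy.explain())    # inspect the optimised plan before running it
result = lazy.collect()  # nothing is read from disk until this call
```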
Polars: Faster Pipelines, Simpler Infrastructure, Happier Engineers
We've migrated our own IP and several customers from Pandas and Spark to Polars. The benefits go beyond raw speed: faster test suites, lower platform costs, and an API developers actually enjoy using.
Top Features of Notebooks in Microsoft Fabric
Discover the key features of notebooks in Microsoft Fabric.
DuckLake in Perspective: Advanced Features and Future Implications
Explore DuckLake's advanced capabilities including built-in encryption, sophisticated conflict resolution, and the strategic implications for future data architecture. Understand how DuckLake enables new business models and positions itself against established lakehouse formats.
DuckLake in Practice: Hands-On Tutorial and Core Features
Get hands-on with DuckLake through a comprehensive tutorial covering installation, basic operations, file organization, snapshots, and time travel functionality. Learn how DuckLake's database-backed metadata management works in practice.
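A condensed taste of the tutorial's first steps; the file names are placeholders and the syntax reflects the DuckLake extension's initial release, so check the current docs before copying:

```python
import duckdb

# Install the extension and attach a DuckLake catalog.
con = duckdb.connect()
con.sql("INSTALL ducklake")
con.sql("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'lake_files/')")

# Tables behave like ordinary SQL tables: metadata goes into the catalog
# database, data is written as Parquet files under DATA_PATH.
con.sql("CREATE TABLE lake.events AS SELECT 42 AS id, 'created' AS status")
con.sql("UPDATE lake.events SET status = 'updated'")

# Every commit is a snapshot, so earlier versions stay queryable
# (time travel); version numbers depend on the commit history.
print(con.sql("SELECT * FROM lake.events AT (VERSION => 1)"))
```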
Introducing DuckLake: Lakehouse Architecture Reimagined for the Modern Era
DuckDB Labs introduces DuckLake, a revolutionary approach to lakehouse architecture that solves fundamental problems with existing formats by bringing database principles back to data lake metadata management.
What is a Data Lakehouse?
What exactly is a Data Lakehouse? This blog gives a general introduction to its history, its functionality, and what it might mean for you!
DuckDB in Practice: Enterprise Integration and Architectural Patterns
DuckDB comes pre-installed in Microsoft Fabric Python notebooks, so code developed locally deploys straight to production with enterprise monitoring, governance, and OneLake integration.
DuckDB in Depth: How It Works and What Makes It Fast
Dive deep into the technical details of DuckDB, exploring its columnar architecture, vectorized execution, SQL enhancements, and the performance optimizations that make it exceptionally fast on a single machine.
DuckDB: the Rise of In-Process Analytics and Data Singularity
Modern laptops can now handle datasets up to a billion rows, yet 94% of query spending goes on big-data compute that isn't needed. DuckDB brings analytical SQL directly into your process.
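The in-process idea in a few lines (taxi.parquet is a stand-in file):

```python
import duckdb

# No server, no cluster: a library call queries a local file directly.
duckdb.sql("""
    SELECT passenger_count, AVG(fare_amount) AS avg_fare
    FROM 'taxi.parquet'
    GROUP BY passenger_count
    ORDER BY passenger_count
""").show()
```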
Creating Quality Gates in the Medallion Architecture with Pandera
This blog explores how to implement robust validation strategies within the medallion architecture using Pandera, helping you catch issues early and maintain clean, trustworthy data.
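A minimal sketch of such a quality gate; the schema, column names, and layer naming are illustrative, not the post's exact code:

```python
import pandas as pd
import pandera as pa

# A bronze-to-silver gate in miniature: rows are only promoted to the
# silver layer if they pass the schema, so bad data is caught early.
silver_schema = pa.DataFrameSchema({
    "order_id": pa.Column(int, unique=True),
    "amount": pa.Column(float, pa.Check.ge(0)),
    "country": pa.Column(str, pa.Check.isin(["UK", "US", "NL"])),
})

bronze = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [10.0, 25.5, 3.2],
    "country": ["UK", "NL", "US"],
})

silver = silver_schema.validate(bronze)  # raises SchemaError on violations
print(silver)
```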
Carbon Optimised Data Pipelines - minimise CO2 emissions through intelligent scheduling (Next Steps)
Intelligently scheduling cloud data pipelines based on carbon impact can optimise both environmental sustainability and operational efficiency.
Carbon Optimised Data Pipelines - minimise CO2 emissions through intelligent scheduling (Pipeline Definition)
Intelligently scheduling cloud data pipelines based on carbon impact can optimise both environmental sustainability and operational efficiency.
Carbon Optimised Data Pipelines - minimise CO2 emissions through intelligent scheduling (Architecture Overview)
Intelligently scheduling cloud data pipelines based on carbon impact can optimise both environmental sustainability and operational efficiency.
Carbon Optimised Data Pipelines - minimise CO2 emissions through intelligent scheduling (Introduction)
Intelligently scheduling cloud data pipelines based on carbon impact can optimise both environmental sustainability and operational efficiency.
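To make the series' core idea concrete, a toy scheduler; the forecast numbers and deadline are invented for illustration:

```python
# Given an hourly forecast of grid carbon intensity (gCO2/kWh), start the
# pipeline in the cleanest window before its deadline instead of running
# immediately.
forecast = {0: 310, 1: 280, 2: 190, 3: 170, 4: 220, 5: 340}

def best_start_hour(forecast: dict[int, float], deadline_hour: int) -> int:
    """Pick the hour with the lowest forecast intensity that meets the deadline."""
    eligible = {hour: co2 for hour, co2 in forecast.items() if hour <= deadline_hour}
    return min(eligible, key=eligible.get)

print(best_start_hour(forecast, deadline_hour=5))  # -> 3
```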