Browse our archives by topic…
Data Engineering
Per-Property Rows from JSON in Spark on Microsoft Fabric
Spark doesn't always interpret JSON how we'd like. For example, if each key/value pair in a JSON object is conceptually one item, Spark won't give you a row per item by default. This article shows how to nudge Spark in the right direction.
Introduction to Python Logging in Synapse Notebooks
The first step on the road to implementing observability in your Python notebooks is basic logging. In this post, we look at how you can use Python's built-in logging inside a Synapse notebook.
Star Schemas are fundamental to unleashing value from data in Microsoft Fabric
Ralph Kimball's 1996 Star Schema principles still apply in Cloud Native Analytics.
Adopt A Product Mindset To Maximise Value From Microsoft Fabric
In this post I describe how adopting a product mindset will help you to extract maximum value from Microsoft Fabric.
Exploring Strategies Enabled By Microsoft Fabric
Explore building situational awareness and leveraging strategic opportunities with Microsoft Fabric in this concise overview.
Developing a Data Mesh Inspired Vision Using Microsoft Fabric
Explore Microsoft Fabric, inspired by Data Mesh, for a data-driven strategy. Learn to approach a Data Mesh vision using this powerful tool.
How Does Microsoft Fabric Measure Up To Data Mesh?
Explore Data Mesh's influence on Microsoft Fabric, addressing gaps in data product marketplace, standards, master data management, and governance.
Microsoft Fabric Is A Socio-Technical Endeavour
Creating a successful organisation-wide data and analytics platform isn't just about architecture, schemas and semantic models. It's also about culture, organisational design and people. This blog explores the socio-technical nature of data and analytics and how this should influence your approach to adoption of Microsoft Fabric.
Azure Synapse Analytics versus Microsoft Fabric: A Side by Side Comparison
In this Microsoft Fabric vs Synapse comparison we examine how features map from Azure Synapse to Fabric.
Data validation in Python: a look into Pandera and Great Expectations
Implement Python data validation with Pandera & Great Expectations in this comparison of their features and use cases.
How to setup Python, PyEnv & Poetry on Windows
Explore using Python virtual environments & Poetry on Windows for smoother workflows, with a script & guide to enhance your dependency management experience.
How To Implement Continuous Deployment of Python Packages with GitHub Actions
Discover using GitHub Actions for auto-updates to Python packages on PyPI, assessing its role in Continuous Deployment.
Customizing Lake Databases in Azure Synapse Analytics
Explore Custom Objects in Lake Databases for user-friendly column names, calculated columns, and pre-defined queries in Azure Synapse Analytics.
How to create a semantic model using Synapse Analytics Database Templates
Explore Azure Synapse Analytics Database Templates and learn to create semantic models in this 2nd blog of the series.
Continuous Integration with GitHub Actions
This post gives an overview of Continuous Integration and shows how you can implement it with GitHub Actions, with an accompanying example Python project.
How to apply behaviour driven development to data and analytics projects
In this blog we demonstrate how the Gherkin specification can be adapted to enable BDD to be applied to data engineering use cases.
What is the Shared Metadata Model in Azure Synapse Analytics, and why should I use it?
Explore Azure Synapse's 'Shared Metadata Model' feature. Learn how it syncs Spark tables with SQL Serverless, its benefits, and tradeoffs.
Extract insights from tag lists using Python Pandas and Power BI
Discover how to extract insights from spreadsheets and CSV files using Pandas and Power BI in this blog post.
Introduction to Containers and Docker
Explore containerisation & Docker for app development & deployment. Learn to create containerised applications with examples in this intro guide.
How to test Azure Synapse notebooks
Explore data with Azure Synapse's interactive Spark notebooks, integrated with Pipelines & monitoring tools. Learn how to add tests for business rule validation.
How Azure Synapse unifies your development experience
Modern analytics requires a multi-faceted approach, which can cause integration headaches. Azure Synapse's Swiss army knife approach can remove a lot of friction.
How to use SQL Notebooks to access Azure Synapse SQL Pools & SQL on demand
Wishing Azure Synapse Analytics had support for SQL notebooks? Fear not, it's easy to take advantage of rich interactive notebooks for SQL Pools and SQL on Demand.
Azure Synapse for C# Developers: 5 things you need to know
Did you know that Azure Synapse has great support for .NET and C#? Learning new languages is often a barrier to digital transformation, so being able to use existing people, skills, tools and engineering disciplines can be a massive advantage.
Import and export notebooks in Databricks
Learn to import/export notebooks in Databricks workspaces manually or programmatically, and transfer content between workspaces efficiently.
Azure Databricks CLI "Error: JSONDecodeError: Expecting property name enclosed in double quotes:..."
Explore solutions for JSONDecodeError in Databricks CLI & Clusters. Learn how pre-built CLIs/SDKs simplify requests & authentication in REST APIs.
Using Databricks Notebooks to run an ETL process
Explore data analysis & ETL with Databricks Notebooks on Azure. Utilize Spark's unified analytics engine for big data & ML, and integrate with ADF pipelines.
Using Python inside SQL Server
Learn to use SQL Server's Python integration for efficient data handling. Eliminate clunky transfers and easily operationalize Python models/scripts.