13 min watch
By Barry Smart, Director of Data & AI

A deep dive into how we test data wrangling logic in Microsoft Fabric, and how doing so leads to cleaner, more modular code that is easier to understand and maintain.

About this talk

In part 3 of this course, Barry Smart, Director of Data and AI, walks through a demo showing how to apply a test-driven development (TDD) approach to Microsoft Fabric notebooks. The approach establishes a set of tests that can be automated, whilst also driving code that is clean, extensible, reusable and easy to understand.

He focuses on the notebook that applies the data wrangling steps to "project to gold", and in particular on the logic used to clean and enrich the Titanic passenger data.

Barry splits this logic into three notebooks:

  • The first defines the functionality as a series of discrete data wrangling functions wrapped up in a Titanic Wrangler class. He uses the pandas pipe method to chain these individual functions together to perform all of the steps needed to clean and enrich the passenger data.
  • The second tests this functionality using the Arrange, Act, Assert (AAA) pattern.
  • The third puts this functionality to use as part of the wider "project to gold" process, which projects a fact table and a set of dimension tables to the Gold area of the lake in Delta format.
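
The split between wrangling functions and their tests can be illustrated with a minimal sketch in Python and pandas. This is not the code from the video: the method names, the port mapping and the age bands are hypothetical, and the sketch assumes the standard Kaggle Titanic column names (Embarked, SibSp, Parch, Age). Each step is a small function that takes a DataFrame and returns a new one, DataFrame.pipe chains the steps together, and the test follows the Arrange, Act, Assert pattern.

```python
import pandas as pd


class TitanicWrangler:
    """Hypothetical sketch: each step is a small, pure function that takes a
    DataFrame and returns a new one, so it can be tested in isolation."""

    # Assumed mapping from raw Embarked codes to port names (illustrative).
    PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

    @staticmethod
    def fill_missing_embarked(df: pd.DataFrame, default: str = "S") -> pd.DataFrame:
        # Replace missing embarkation codes with an assumed default port.
        return df.assign(Embarked=df["Embarked"].fillna(default))

    @staticmethod
    def decode_embarked(df: pd.DataFrame) -> pd.DataFrame:
        # Map single-letter codes to readable port names in a new column.
        return df.assign(EmbarkedPort=df["Embarked"].map(TitanicWrangler.PORTS))

    @staticmethod
    def add_family_size(df: pd.DataFrame) -> pd.DataFrame:
        # Siblings/spouses + parents/children + the passenger themself.
        return df.assign(FamilySize=df["SibSp"] + df["Parch"] + 1)

    @staticmethod
    def add_age_group(df: pd.DataFrame) -> pd.DataFrame:
        # Bucket ages into illustrative bands: (0,12], (12,18], (18,60], (60,120].
        # Missing ages remain missing.
        bins = [0, 12, 18, 60, 120]
        labels = ["Child", "Teen", "Adult", "Senior"]
        return df.assign(AgeGroup=pd.cut(df["Age"], bins=bins, labels=labels))

    @classmethod
    def wrangle(cls, df: pd.DataFrame) -> pd.DataFrame:
        # DataFrame.pipe chains the discrete steps into one readable pipeline.
        return (
            df.pipe(cls.fill_missing_embarked)
            .pipe(cls.decode_embarked)
            .pipe(cls.add_family_size)
            .pipe(cls.add_age_group)
        )


def test_add_family_size_counts_passenger_and_relatives():
    # Arrange: a tiny, hand-built input with known answers.
    df = pd.DataFrame({"SibSp": [0, 1], "Parch": [0, 2]})

    # Act: run only the step under test.
    result = TitanicWrangler.add_family_size(df)

    # Assert: FamilySize = SibSp + Parch + 1.
    assert result["FamilySize"].tolist() == [1, 4]
```

Because every step takes and returns a DataFrame, a test can exercise one step on a handful of hand-built rows, while the production notebook only needs to call TitanicWrangler.wrangle(df). The test function can be collected by pytest or simply called from a notebook cell.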

Barry begins the video by explaining the architecture adopted in the demo, including the Medallion Architecture and DataOps practices. He explains how these patterns have been applied to create a data product that provides Diagnostic Analytics of the Titanic data set. This forms part of an end-to-end demo of Microsoft Fabric that we will be providing as a series of videos over the coming weeks.

Chapters:

  • 00:00 Introduction and Video Overview
  • 00:46 Project to Gold Pipeline
  • 01:08 Benefits and Pitfalls of Notebooks
  • 04:20 Addressing Notebook Pitfalls with DataOps
  • 05:12 Test-Driven Development in Data Engineering
  • 08:03 Implementing the Titanic Wrangler Class
  • 09:17 Testing the Titanic Wrangler Class
  • 10:52 Running the Code in Production
  • 12:15 Conclusion and Next Steps

About the presenter

Barry Smart

Director of Data & AI

Barry has spent over 25 years in the tech industry, moving from developer to solution architect, business transformation manager to IT Director, and CTO of a £100m FinTech company. In 2020, Barry's passion for data and analytics led him to gain an MSc in Artificial Intelligence and Applications.