By Mike Evans-Larah, Software Engineer III
Building data quality into Microsoft Fabric

Data quality issues are one of the biggest silent killers of analytics initiatives. Teams invest significant time and resources into building dashboards and reports, only to discover their data pipeline was feeding them incorrect information all along.

The hidden cost of poor data quality

We’ve seen this play out in many organizations: ETL jobs run without errors, reports refresh on schedule, and stakeholders confidently act on the data in front of them - until someone takes a closer look and uncovers a critical flaw in the underlying data.

The impact goes beyond just incorrect numbers. Poor data quality erodes trust in your analytics platform, creates expensive firefighting exercises, and can lead to costly business decisions based on faulty information.

We've learned that traditional approaches to data quality - where you build first and validate later - simply don't work. By the time you discover quality issues, they've already propagated through your entire analytics ecosystem.

A validation-first approach

The solution isn't just better testing (although testing forms an important part of the story too - see James Broome's talk on How to ensure quality and avoid inaccuracies in your data insights) - it's rethinking how to approach data quality from the ground up. We use a "validation-first" mindset, where data quality checks become first-class citizens in the pipeline design.

Here are four strategic principles we follow when building data quality into Fabric implementations:

Principle 1: Validate early and often in your pipeline design

The earlier you catch data quality issues, the cheaper they are to fix. We weave validation checks throughout the entire data pipeline.

In Fabric, this means building validation logic directly into your data engineering pipelines using notebooks and dataflows. Instead of waiting until data reaches your lakehouse, we validate at multiple checkpoints:

  • Source validation: Check data quality as it enters Fabric from external systems
  • Transformation validation: Verify data integrity after each major transformation step
  • Business rule validation: Ensure data meets your organization's specific requirements
  • Output validation: Final checks before data reaches consumption layers

This multi-layered approach aligns well with medallion architecture principles, where data quality is maintained at each stage of the data lifecycle. For a deep dive into how this can look in practice, see Creating Quality Gates in the Medallion Architecture with Pandera.
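The checkpoint idea above can be sketched in plain Python. This is an illustrative example, not a Fabric API: the function names, field names, and rules are assumptions, and in a real pipeline each checkpoint would run inside a notebook or dataflow (or be expressed as a Pandera schema).

```python
# Minimal sketch of multi-checkpoint validation. All names and rules
# here are illustrative assumptions, not Fabric or Pandera APIs.

def validate_source(rows):
    """Source validation: reject records missing required fields."""
    errors = []
    for i, row in enumerate(rows):
        if row.get("customer_id") is None:
            errors.append((i, "missing customer_id"))
        if row.get("amount") is None:
            errors.append((i, "missing amount"))
    return errors

def validate_business_rules(rows):
    """Business rule validation: amounts must be non-negative."""
    return [
        (i, f"negative amount {row['amount']}")
        for i, row in enumerate(rows)
        if row.get("amount") is not None and row["amount"] < 0
    ]

def run_checkpoints(rows):
    """Run checkpoints in order; stop at the first failing stage."""
    checkpoints = [
        ("source", validate_source),
        ("business_rules", validate_business_rules),
    ]
    for name, check in checkpoints:
        errors = check(rows)
        if errors:
            return name, errors
    return None, []

rows = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": None, "amount": -5.0},
]
stage, errors = run_checkpoints(rows)
print(stage, errors)  # the source checkpoint fails first
```

Stopping at the first failing stage keeps bad records from propagating downstream, which is the essence of a quality gate between medallion layers.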

Principle 2: Build rich, actionable error reporting from the start

Generic error messages like "data validation failed" are useless when an urgent issue arises in production. We design our validation systems to provide rich, contextual information that helps teams quickly identify and resolve problems.


This means building user-friendly validation reports that include specific error descriptions, which records are problematic and why, and suggested next steps for resolution.

These can be shared in the form of HTML-based emails, Power BI reports, or custom web applications, depending on the needs of the team.
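As a hedged sketch of what such a report might look like, the function below renders validation failures as a simple HTML table. The failure fields (`record_id`, `rule`, `detail`, `next_step`) are assumptions chosen for illustration; the real structure would depend on your validation framework.

```python
# Illustrative only: turning raw validation failures into an
# actionable HTML report. Field names are assumptions.
from datetime import datetime, timezone

def build_report(failures):
    """Render failures as an HTML table with suggested next steps."""
    rows = "".join(
        f"<tr><td>{f['record_id']}</td><td>{f['rule']}</td>"
        f"<td>{f['detail']}</td><td>{f['next_step']}</td></tr>"
        for f in failures
    )
    return (
        f"<h2>Validation report ({datetime.now(timezone.utc):%Y-%m-%d})</h2>"
        "<table><tr><th>Record</th><th>Rule</th>"
        "<th>Detail</th><th>Next step</th></tr>"
        f"{rows}</table>"
    )

failures = [{
    "record_id": "ORD-1042",
    "rule": "amount_non_negative",
    "detail": "amount = -5.0",
    "next_step": "Check the refunds feed for sign errors",
}]
html = build_report(failures)
```

The same structure could just as easily feed a Power BI dataset or a custom web application instead of an email body.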

NOTE: Take care to ensure that any sensitive information is appropriately redacted before sharing validation reports.

Principle 3: Create feedback loops that teams actually use

The best validation system in the world is worthless if teams ignore the alerts. We design feedback systems that integrate naturally into how teams already work.

Alerts should arrive in the right channel at the right time (and be directed at the right people). Combined with rich, contextual information in the validation reports, this ensures that teams have everything they need to take action.

Fabric's integration with Microsoft Teams and email systems makes it easy to create notification workflows that fit into existing team communication patterns.
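One common pattern is posting validation alerts to a Teams channel via an incoming webhook. The sketch below builds a MessageCard payload; the pipeline name, card fields, and webhook URL are placeholders, and the actual POST is wrapped in a function rather than executed here.

```python
# Sketch of a Teams incoming-webhook alert. The webhook URL and card
# fields are placeholders, not a real endpoint.
import json
import urllib.request

def build_alert(pipeline, stage, error_count):
    """Build a MessageCard payload for a Teams incoming webhook."""
    return {
        "@type": "MessageCard",
        "@context": "https://schema.org/extensions",
        "summary": f"Data quality alert: {pipeline}",
        "title": f"Validation failed in {pipeline} ({stage})",
        "text": f"{error_count} records failed validation. "
                "See the validation report for details.",
    }

def send_alert(webhook_url, payload):
    """POST the card to the webhook (not executed in this sketch)."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

payload = build_alert("sales_ingest", "business_rules", 17)
```

In Fabric, the same notification could equally be triggered from a Data Activator alert or a pipeline activity, so the message lands in the channel the team already watches.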

Principle 4: Integrate quality metrics into your monitoring strategy

Data quality shouldn't be an afterthought in your monitoring approach - it should be a core metric alongside performance and availability. Validation results should be captured and analyzed just like any other operational metric.

Fabric's built-in workspace monitoring capabilities, combined with Power BI dashboards, enable teams to track data quality metrics alongside other operational KPIs and visualize trends over time.

Steps can be taken to automate the collection and reporting of these metrics, ensuring that data quality remains a top priority.
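To make that concrete, the sketch below records each validation run as an operational metric and computes a pass-rate trend. In Fabric these records would more likely land in a lakehouse table feeding a Power BI dashboard; the in-memory log and field names here are assumptions for illustration.

```python
# Illustrative sketch: capturing validation results as operational
# metrics. In practice this would write to a lakehouse table rather
# than an in-memory list.
from datetime import datetime, timezone

metrics_log = []

def record_run(pipeline, checked, failed):
    """Append one validation run's quality metrics to the log."""
    metrics_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "pipeline": pipeline,
        "records_checked": checked,
        "records_failed": failed,
        "pass_rate": (checked - failed) / checked if checked else None,
    })

record_run("sales_ingest", checked=10_000, failed=25)
record_run("sales_ingest", checked=10_000, failed=10)

# Trend over time: pass rate per run, ready to chart alongside
# other operational KPIs.
trend = [m["pass_rate"] for m in metrics_log]
print(trend)  # [0.9975, 0.999]
```

Tracking pass rate per pipeline over time turns data quality into a monitored metric like latency or availability, rather than something discovered only when a report looks wrong.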

The results

By implementing these principles in Fabric environments, you can expect:

  • Faster issue resolution: Teams can identify and fix data problems in minutes rather than hours or days
  • Increased stakeholder confidence: Business users trust the data because they see consistent quality
  • Reduced firefighting: Fewer emergency meetings about "why the numbers are wrong"

Perhaps most importantly, teams can feel more confident about their analytics outputs. When you know your data quality processes are robust, you can focus on deriving insights rather than constantly second-guessing your numbers.

Mike Evans-Larah

Software Engineer III

Mike is a Software Engineer at endjin with over a decade of experience in solving business problems with technology. He has worked on a wide range of projects for clients across industries such as financial services, recruitment, and retail, with a strong focus on Azure technologies.