Skip to content
Carmel Eve By Carmel Eve Software Engineer II · 2 min read
Optimising DAX: Why Cardinality Matters

Hello again! In the previous post, we covered VertiPaq's three encoding techniques (value encoding, hash encoding, and run-length encoding) for compressing your data. From all of that, we can see that alongside the number of columns, the biggest factor in how much room your model takes up is the cardinality of those columns - the number of unique values.

Why Cardinality Drives Everything

Think back to the encoding techniques:

Hash encoding stores a dictionary of unique values. Higher cardinality means a larger dictionary, which means more memory.

Run-length encoding stores how many times values repeat in sequence. Higher cardinality means values change more frequently, so you get shorter runs and less compression.

So a column with 10 unique values and a million rows will compress brilliantly. A column with a million unique values and a million rows will barely compress at all.

And it's not just about memory. Scanning a high-cardinality column is slower too - there's simply more data to read through in the run-length encoded representation. If you do a SUM over a column with high cardinality, it will take noticeably longer than the same operation over a low-cardinality column.

Practical Ways to Reduce Cardinality

Some columns are naturally high-cardinality and you can't do much about it (primary keys, for example). But there are often easy wins hiding in your model.

Datetimes

Datetime columns are a classic culprit. A datetime with second-level precision across a few years can easily have millions of unique values. Some easy ways to reduce this:

  • Split the date and time into separate columns - times are repeated each day, and dates will have far lower cardinality than the combined column.
  • Reduce time precision - do you really need seconds, or would hours suffice?
  • Remove the time entirely if it's not needed for your analysis.

Floating-Point Precision

Fix floating-point precision before import. A column of decimal values calculated to 15 decimal places has far higher cardinality than the same values rounded to 2 places. If your reporting doesn't need that precision, it's an easy win.

Pre-Aggregation

Group data prior to Power BI where you can. If you can aggregate in your ETL pipeline, you reduce both row count and column cardinality. This is particularly relevant if you're loading transactional data that could be summarised at a higher grain.

Finding Problem Columns

You can use DAX Studio to inspect your model size and identify "problem" columns with unexpectedly high cardinality or large memory footprints. You can do this by opening your Power BI report, opening DAX studio, and connecting to the model. If you then go on the "Advanced" tab and click "View Metrics", this will run the VertiPaq analyzer and give you your model statistics (number of columns, column sizes, cardinality, etc.).

Screenshot of DAX Studio showing model statistics with high-cardinality columns highlighted

It's worth doing this periodically - you might be surprised which columns are taking up the most space.

What's Next

So cardinality matters a lot for compression and scan speed. But it also has a major impact on something else: the cost of traversing relationships between tables. Look out for the next post where we'll dig into that.

FAQs

Why does column cardinality matter in Power BI? Higher cardinality means larger hash encoding dictionaries, less effective run-length encoding, and therefore larger model sizes and slower column scans. Reducing cardinality where possible is one of the most impactful model optimisations.
How can I reduce datetime cardinality in Power BI? Split date and time into separate columns, reduce time precision (e.g. round to nearest hour), or remove the time component if it's not needed for analysis.

Carmel Eve

Software Engineer II

Carmel Eve

Carmel is a software engineer and LinkedIn Learning instructor. She worked at endjin from 2016 to 2021, focused on delivering cloud-first solutions to a variety of problems. These included highly performant serverless architectures, web applications, reporting and insight pipelines, and data analytics engines. After a three-year career break spent travelling around the world, she rejoined endjin in 2024.

Carmel has written many blog posts covering a huge range of topics, including deconstructing Rx operators, agile estimation and planning and mental well-being and managing remote working.

Carmel has released two courses on LinkedIn Learning - one on the Az-204 exam (developing solutions for Microsoft Azure) and one on Azure Data Lake. She has also spoken at NDC, APISpecs, and SQLBits, covering a range of topics from reactive big-data processing to secure Azure architectures.

She is passionate about diversity and inclusivity in tech. She spent two years as a STEM ambassador in her local community and taking part in a local mentorship scheme. Through this work she hopes to be a part of positive change in the industry.

Carmel won "Apprentice Engineer of the Year" at the Computing Rising Star Awards 2019.