Skip to content
  1. We help small teams achieve big things.
  2. Our Technical Fellow just became a Microsoft MVP!
  3. We run the Azure Weekly newsletter.
  4. We just won 2 awards.
  5. We just published Programming C# 8.0 book.
  6. We run the Power BI Weekly newsletter.
  7. Our NDC London 2020 talk is now available online!
  8. We are school STEM ambassadors.
Carmel Eve By Carmel Eve Software Engineer I
Data modelling with Power BI - Loading and shaping data

I have recently written a series of blog post on DAX and Power BI. But in order to effectively work with the data we need to structure the underlying data model as best to support higher level representations we need. Data modelling includes loading, shaping, cleansing and enhancing the data.

Loading data

In Power BI we can load data from multiple sources:

  • Databases
  • Files
  • Web
  • Azure
  • Online services
  • Power BI datasets & dataflows

There are also two options for data loading: import or direct query.

In direct querying a direct connection is established to the data source. In this mode the data stays in the source, and it's queried each time the report is loaded. This does restrict you in a couple of ways:

  • You are limited to one database
  • You can't add calculated tables on the model

You can however add calculated columns to the model.

Best practices

There are a few best practices for data loading. It is a good idea to reduce the memory used by the model as quickly as possible. To achieve this, you can:

  • Reduce the rows
  • Reduce the columns

It can also be a good idea to merge tables which have a common column. This can reduce the number of tables and therefore the complexity within the model.

Shaping data

There are many different ways in which we can shape data to support better reporting. Some of these can be done on import and some are done once the data has been loaded. This can include:

  • Combining tables with the same schema
  • Filtering rows / columns
  • Appending queries
  • Merging queries
  • Renaming columns
  • Fixing data types

For example...

We can connect to an excel workbook:

Connecting to an excel workbook from Power BI.

And from this navigator we can transform the data via the query editor. We can see the steps performed on the data in the query editor:

Viewing steps performed using the query editor.

We can even rename the steps to make it clearer what's been done to the data:

Rename "ChangedType1" step to "Changed type of chocolate".

Power BI is usually pretty good at guessing the types of the data, but sometimes these types need to be updated. Using this query editor we can remove null data, replace values, etc.

A view over the data.

Showing a variety of transform steps, changing types, replacing nulls, splitting columns, renaming columns.

We can see all the individual steps in the query. These include removing blank rows, replacing empty values with nulls, and splitting the name column in to first and last name. After these transformations the data looks like this:

Showing the data after these transformations.

We can also see that there are values in this table which are in a list form. We also need to take a look at the model and see if our table structure makes sense. In our model we have the above table, along with two others. One for the parents:

Showing parents table.

And one linking the children with their parents:

Showing the link table between children and parents.

Now looking at this model we have three tables:

  • Parents - which contains information about parents, their date of birth and the city they live in.
  • Children - which contains the children and connects those children with their parents.
  • Children2 - which contains information about the children (their favourite colour, animal, etc.)

It would make sense to merge the two tables which contain data about the children:

Merging the children and the links tables.

And once we've done that we can then see the merge in the applied steps (including the removal of duplicate data and renaming of the expanding Children2 columns).

Showing the steps for renaming the merged columns.

Now we have two tables: parents and children. These tables can be linked in the data model:

Showing the relationship between the two tables.

Once we've done this, we can add additional values to the parents table based off of their children:

Showing calculated column on parents table which calculates the number of children each parent has.

The final thing we could do is expand the listed values in the "Interests" column. A list like this will be difficult to represent in the visuals because you will end up with a filter for each permutation:

Showing bar chart with bar for each combination of interests.

Duplicating the table copies all of the query steps into a second table. If we instead referenced the table, then each change in the first table would propagate into the second. Once we have duplicated it, we can perform the following steps:

Steps for splitting up interests in a separate table.

Which leaves us with an "Interests" table like this, which is linked to our children table.

Showing the interests table.

With our data model now complete we can set the Children2 table (which we merged with the Children table) to be hidden in report view and our data model looks like this:

Showing the relationships between all of the tables.

And we can now start building up interesting visuals, combining values from both tables:

Showing a graph of children count by interest and whether they like chocolate.

We have built up a model which allows us to support fairly complex visualisations but which is also simple enough to be discoverable.

Carmel Eve

Software Engineer I

Carmel Eve

Carmel has recently graduated from our apprenticeship scheme.

Over the past four years she has been focused on delivering cloud-first solutions to a variety of problems. These have ranged from highly-performant serverless architectures, to web applications, to reporting and insight pipelines and data analytics engines. She has been involved in every aspect of the solutions built, from deployment, to data structures, to analysis, querying and UI, as well as non-functional concerns such as security and performance.

Throughout her apprenticeship, she has written many blogs, covering a huge range of topics. She has also given multiple talks focused on serverless architectures. The talks highlighted the benefits of a serverless approach, and delved into how to optimise the solutions in terms of performance and cost.

She is also passionate about diversity and inclusivity in tech. Last year, she became a STEM ambassador in her local community and is taking part in a local mentorship scheme. Through this work she hopes to be a part of positive change in the industry.

Carmel won "Apprentice Engineer of the Year" at the Computing Rising Star Awards 2019.