Power Query - Where can you use it? - Power BI
What is Power Query?
Power Query is data processing / transformation engine capable of performing a wide variety of ETL tasks and allowing users to "mash-up" data across sources. It has a graphical interface for building up the queries and transformations. Under the covers, it uses the Power Query M formula language for expressing its queries, which can be edited directly for advanced queries. There are hundreds of connectors that can used for reading data across different sources.
The engine began life as part of the Power Pivot extension for Excel, used for getting and transforming data to use in Pivot Tables.
Since then, it has been integrated across many more applications in the Microsoft ecosystem, and we'll take a look at them in this series of posts. In this post, we look at Power BI integrations.
Power BI
There are a few different ways of using Power Query from within Power BI.
Power BI Desktop
Firstly, you can use Power BI Desktop to utilize Power Query.
Similar to Excel, you can start by either using the 'Get Data' button or select one of the common data sources to use a wizard interface to connect to a data source, or use 'Transform Data' to open the Power Query Editor directly.
Once you have finished building your query, the data is loaded into the report's model and can be used for building visualizations.
How data is refreshed in Power BI depends on the storage modes for the datasets used in the report. Each dataset can use one of the following storage modes:
- Import
- DirectQuery
- LiveConnect
- Push
Only datasets using Import mode require refreshing; the DirectQuery and LiveConnect query the underlying datasets on every user interaction, and Push mode requires data to be pushed into the dataset externally.
For Import mode datasets, refresh can either be done manually (for example, using the 'Refresh' button in Power BI Desktop) or, once the report is published to a Power BI workspace, refresh can be scheduled to run on a regular basis.
Power BI Dataflows
In Power BI online, you can create dataflows, which are reusable units of transformation logic powered by Power Query. You need either a Pro or PPU (Premium Per User) license to create dataflows, or to be using a Premium capacity workspace.
To create a dataflow, in the PBI workspace, select 'Dataflow' under the + New
menu:
Then the basic option is to create a new table in your dataflow. This will take you to page where you can select your data source, either connecting to an online source or uploading a file:
Once you have selected your data source and configured any necessary credentials for connecting to it, you will be able to edit the transformation using the familiar Power Query editor.
Once you have saved the transformation, the table will be available in the dataflow. You will be offered an opportunity to either refresh the data now or set up a refresh schedule for the dataflow.
If using a refresh schedule, it supports up to 8 refreshes of the data per day:
There is also the option of using incremental refresh for tables in the dataflow. This allows you to process only a subset of the data based on a DateTime column in your table, selecting the period in which to store data and the period in which to refresh data. This can be beneficial for large datasets to reduce refresh time, increase reliability and reduce resource consumption
To consume the tables in the dataflow, you can connect to the dataflow when building reports in Power BI Desktop:
Power BI Datamarts
The third option for using Power Query in Power BI is through Datamarts.
Power BI Datamarts use Power Query in a near identical way to as described above for Power BI Dataflows. Creating a Datamart requires a Premium workspace, but other than that the Power Query features and processes are the same.