By Jessica Hill, Software Engineer I
How to access multi-select choice column choice labels from Azure Synapse Link for Dataverse with PySpark or SQL

When you export a table from Dataverse into Synapse using the Azure Synapse Link for Dataverse, the multi-select choice columns in your linked tables will display a list of numerical choice values. This isn't very meaningful to the end-user; what we really want to display in our choice columns is the associated text choice labels. This blog post explains how to access the multi-select choice column choice labels from Azure Synapse Link for Dataverse using PySpark, Spark SQL or T-SQL through SQL Serverless.

Multi-select choice columns in Dataverse

A multi-select choice column is a type of column that can be used in a Microsoft Dataverse table. In a multi-select choice column, the user can select one or more values from a set of options using a drop-down list control.

Dataverse multi-select choice column.

A multi-select choice column can be created in Dataverse by ticking the 'Selecting multiple choices is allowed' check-box when creating the choice column.

Creating a Dataverse multi-select choice column.

When you export a table from Dataverse into Synapse using the Azure Synapse Link for Dataverse, the multi-select choice columns in your linked tables will display a semi-colon separated list of numerical choice values.
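To make this concrete, the exported value is just a delimited string. Splitting it (shown here in plain Python, with hypothetical choice values) recovers the individual numerical values, each of which still needs to be mapped to its text label:

```python
# Hypothetical raw value exported for a multi-select choice column
raw_value = "606820000;606820001;606820002"

# Splitting on the semi-colon recovers the individual numerical choice values
choice_values = raw_value.split(";")
print(choice_values)  # ['606820000', '606820001', '606820002']
```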

T-SQL query over the dv_order_details table in the Azure Synapse Link for Dataverse database.

This isn't very meaningful to the end-user; what we really want to display in our choice columns is the associated text choice labels, which are stored in the OptionsetMetadata table.

T-SQL query over the OptionsetMetadata table in the Azure Synapse Link for Dataverse database.

For detailed, contextual background on choice values and choice labels in Dataverse, and the storage of choice labels in the OptionsetMetadata table of the Azure Synapse Link for Dataverse database, please refer back to the first blog post in this series: How to access choice labels from Azure Synapse Link for Dataverse using PySpark.

Replacing the choice values with the choice labels using PySpark in a Synapse Notebook

First you will need to read your data from the Azure Synapse Link for Dataverse database into PySpark dataframes in a Synapse Notebook. For an explanation of how to do this, please refer back to the first blog post in this series: How to access choice labels from Azure Synapse Link for Dataverse using PySpark.

Then you can replace the numerical choice values in your linked Dataverse table with the text choice labels using the following PySpark function:

from pyspark.sql.functions import split, explode

def swap_choice_value_with_choice_label(core_table_df, choice_options_df, column_name, table_name, option_set_name):
    # Explode core_table_df on the column_name
    exploded_df = core_table_df.withColumn(column_name, explode(split(core_table_df[column_name], ';')))
    
    # Join exploded_df with the choice_options_df using table_name and option_set_name as additional join conditions
    joined_df = exploded_df.join(
        choice_options_df,
        (exploded_df[column_name] == choice_options_df['Option']) &
        (choice_options_df['EntityName'] == table_name) &
        (choice_options_df['OptionSetName'] == option_set_name),
        "left"
    ).drop(column_name, 'Option', 'EntityName', 'OptionSetName').withColumnRenamed('LocalizedLabel', column_name)
    
    return joined_df
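To see what this explode-and-join logic produces, here is a plain-Python sketch using hypothetical rows and labels. This is only an illustration of the transformation, not Spark code:

```python
# Hypothetical OptionsetMetadata rows, keyed by (EntityName, OptionSetName, Option)
labels = {
    ("dv_orderdetails", "dv_products", "606820000"): "Apples",
    ("dv_orderdetails", "dv_products", "606820001"): "Pears",
}

# A hypothetical row from the linked table, with a multi-select choice column
row = {"order_id": 1, "dv_products": "606820000;606820001"}

# 'Explode' the semi-colon separated values into one row per value,
# then 'join' each value to its text label via the lookup
exploded = [
    {"order_id": row["order_id"],
     "dv_products": labels.get(("dv_orderdetails", "dv_products", value))}
    for value in row["dv_products"].split(";")
]
print(exploded)
```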

Here is an example implementation of the above PySpark function. This is how we transform a linked Dataverse table called dv_orderdetails, exploding and replacing the numerical choice values in the dv_products column with the text choice labels stored in the LocalizedLabel column of the OptionsetMetadata table:

core_table_df # This is your PySpark dataframe for your linked Dataverse table
choice_options_df # This is your PySpark dataframe for the OptionsetMetadata table

column_name = 'dv_products' # This is the name of the choice column in Synapse
table_name = 'dv_orderdetails' # This is the schema name of the core table in Dataverse
option_set_name = 'dv_products' # This is the schema name of the choice column in Dataverse

final_df = swap_choice_value_with_choice_label(core_table_df, choice_options_df, column_name, table_name, option_set_name)

And here is the resulting PySpark dataframe:

Final PySpark dataframe in Synapse notebooks with the choice labels.
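One edge case worth bearing in mind: explode drops any row where the multi-select column is null, because split of a null value yields no array to explode (in PySpark you could use explode_outer instead of explode to keep such rows). The guard can be sketched in plain Python with a hypothetical helper, purely for illustration:

```python
# Hypothetical helper illustrating null-safe splitting of a multi-select value
def split_choices(raw):
    # An empty or missing value yields no choice values rather than an error
    return raw.split(";") if raw else []

print(split_choices("606820000;606820001"))  # ['606820000', '606820001']
print(split_choices(None))                   # []
```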

Replacing the choice values with the choice labels using Spark SQL in a Synapse Notebook

You can also replace the numerical choice values in your linked Dataverse table with the text choice labels using the following Spark SQL query within a Synapse Notebook:

SELECT
    <'insert column 1 name here'>,
    <'insert column 2 name here'>,
    /* List all of the columns you would like to return from your linked table */
    LocalizedLabel AS <'assign alias here'>
FROM (
    SELECT
    <'insert column 1 name here'> AS <'assign alias here'>,
    <'insert column 2 name here'> AS <'assign alias here'>,
    explode(split(<'insert choice column name here'>, ';')) AS <'assign choice column id alias here'>
    FROM <'insert database name here'>.<'insert table name here'>
) tn

LEFT JOIN <'insert database name here'>.OptionsetMetadata om
    ON tn.<'choice column id alias'> = om.Option
    AND om.EntityName = '<insert table name here>'
    AND om.OptionSetName = '<insert choice column name here>'

Here is an example implementation of the above Spark SQL query. This is how we would query a linked Dataverse table called dv_orderdetails, exploding and replacing the numerical choice values in the dv_products column with the text choice labels stored in the LocalizedLabel column of the OptionsetMetadata table:

SELECT 
    order_details_id,
    customer,
    LocalizedLabel AS products
FROM (
    SELECT
    dv_orderdetailsid AS order_details_id,
    dv_customer AS customer,
    explode(split(dv_products, ';')) AS product_id
    FROM dataverse_edfreemansen.dv_orderdetails
) od

LEFT JOIN dataverse_edfreemansen.OptionsetMetadata om
    ON od.product_id = om.Option
    AND om.EntityName = 'dv_orderdetails'
    AND om.OptionSetName = 'dv_products';

And here is the result of the query:

Result of Spark SQL query to select columns and replace numerical choice values with text choice labels.

Replacing the choice values with the choice labels using T-SQL through SQL Serverless

You can also replace the numerical choice values in your linked Dataverse table with the text choice labels using the following T-SQL query within a SQL script in Synapse:

SELECT
    [<'insert column 1 name here'>] AS [<'assign alias here'>],
    [<'insert column 2 name here'>] AS [<'assign alias here'>],
    /* List all of the columns you would like to return from your linked table */
    [LocalizedLabel] AS [<'assign alias here'>]
FROM [<'insert database name here'>].[dbo].[<'insert table name here'>] 
    CROSS APPLY STRING_SPLIT([<'insert choice column name here'>], ';') AS tn

LEFT JOIN [<'insert database name here'>].[dbo].[OptionsetMetadata] om
    ON tn.value = om.[Option]
    AND om.[EntityName] = '<insert table name here>'
    AND om.[OptionSetName] = '<insert choice column name here>'

Here is an example implementation of the above T-SQL query. This is how we would query a linked Dataverse table called dv_orderdetails, exploding and replacing the numerical choice values in the dv_products column with the text choice labels stored in the LocalizedLabel column of the OptionsetMetadata table:

SELECT
    [dv_orderdetailsid] AS [order_details_id],
    [dv_customer] AS [customer],
    [LocalizedLabel] AS [products]
FROM [dataverse_edfreemansen].[dbo].[dv_orderdetails]
    CROSS APPLY STRING_SPLIT([dv_products], ';') AS od

LEFT JOIN [dataverse_edfreemansen].[dbo].[OptionsetMetadata] AS om
    ON od.value = om.[Option]
    AND om.[EntityName] = 'dv_orderdetails'
    AND om.[OptionSetName] = 'dv_products';

And here is the result of the query:

Result of T-SQL query to select columns and replace numerical choice values with text choice labels.

Conclusion

This blog post has explored how to access multi-select choice column choice labels from Azure Synapse Link for Dataverse using PySpark, Spark SQL or T-SQL through SQL Serverless.

FAQs

Why do multi-select choice columns show numerical values instead of text labels when exported from Dataverse to Synapse? Azure Synapse Link for Dataverse exports multi-select choice columns as semicolon-separated numerical choice values. The human-readable text labels are stored separately in the OptionsetMetadata table and must be joined to get meaningful values.
How do I handle semi-colon separated values in multi-select choice columns using SQL? In T-SQL through SQL Serverless, use CROSS APPLY with STRING_SPLIT to split the semi-colon separated values into individual rows. In Spark SQL, use the explode and split functions to achieve the same result.
What is the OptionsetMetadata table and how do I use it to get choice labels? The OptionsetMetadata table is automatically created by Azure Synapse Link for Dataverse and contains the mapping between numerical choice values (Option column) and their text labels (LocalizedLabel column). Join this table with your data using the Option value, EntityName, and OptionSetName to retrieve the labels.
How do you replace choice values with labels using PySpark in a Synapse Notebook? Use a function that explodes the semi-colon separated values using split and explode functions, then joins the exploded dataframe with the OptionsetMetadata dataframe on the Option value, EntityName, and OptionSetName columns to retrieve the LocalizedLabel text values.
What is the difference between a single-select and multi-select choice column in Dataverse? A multi-select choice column allows users to select one or more values from a drop-down list, whereas a single-select choice column only allows one selection. Multi-select columns are created by ticking the 'Selecting multiple choices is allowed' checkbox when creating the choice column in Dataverse.

Jessica Hill

Software Engineer I


Jessica comes from a Biosciences background, having gained a 1st Class Bachelor of Science in Biology from The University of Manchester.

During the lockdown Jessica used the opportunity to explore her interest in technology by taking two Code First Girls online courses; Introduction to Web Development and Python Programming. This led Jessica to look for job opportunities in the technology sector and she joined endjin's 2021 apprenticeship cohort, which had over 200 applicants.

Since joining endjin, Jessica has developed deep expertise in Power BI, DAX, and the Azure data platform. She is a Microsoft Certified Power BI Data Analyst Associate (PL-300) and works across data engineering and analytics projects using technologies such as Microsoft Fabric, Azure Synapse Analytics, Azure Databricks, and Power BI.

Jessica has written extensively about her areas of expertise, with blog posts covering topics from DAX deep dives and Power BI performance optimisation to Azure notebooks and C# pattern matching.