Skip to content
Ed Freeman By Ed Freeman Software Engineer I
Sharing access to synchronized Shared Metadata Model objects in Azure Synapse Analytics

Shared Metadata Model / Shared Metastore recap

As described in the previous article What is the Shared Metadata Model in Azure Synapse Analytics, and why should I use it?, the shared metadata model synchronizes Spark tables with SQL Serverless such that you can query your Spark table data using normal T-SQL through the SQL Serverless endpoint.

Remember, the underlying data that forms these Spark tables isn't stored in the SQL Serverless DB - rather the underlying data is stored in your ADLS account. So, just like when querying data using OPENROWSET, you must make sure that the caller has the appropriate permissions to read the corresponding files in the lake. This can be done using RBAC (+ABAC) roles at the storage account/filesystem level, or by using ACLs at the directory/file level.

In this article we'll take a look at various scenarios you might encounter when configuring access control on the synchronized objects in the shared metadata model.

How to assign access to the Shared Metadata Model objects

Up until recently, there were two primary ways of assigning access to the synchronized objects:

  1. Assign the sysadmin server role to the target LOGIN. This can be directly assigned, or can be inherited by the Synapse Administrator or Synapse SQL Administrator RBAC roles.
  2. Grant a LOGIN the CONNECT ANY DATABASE and SELECT ALL USER SECURABLES permissions.

(as described in the official docs How to set up access control on synchronized objects in serverless SQL pool).

On their own, these were both quite high-privileged permissions sets, as they applied to all replicated Spark databases. So this wasn't ideal if you wanted to reduce the scope of permissions. (Although in conjunction with storage-level permissions, it was possible to further lock-down what end-users could see, even if they were assigned to these roles.)

But recently there's been an update to Synapse to allow the creation of customized objects in the Spark replicated database. This update provides us with another option:

  1. Create a USER on specific replicated database, and assign db_datareader (or some other role that SQL Serverless supports - see Database-level roles). This must be an AAD security principal (User/Group/App), as currently this is the only authentication mechanism supported by the synchronized tables.

This is much better - you can now assign USERs access to individual databases and add them to a role (most likely db_datareader). Unfortunately GRANT statements aren't supported, so you can't assign more granular access to objects within the database. Maybe that will come in the future - who knows.

Here's the code snippet (creating a contained AAD user):

USE [MyReplicatedSparkDB];

CREATE USER [foo.bar@contoso.com] FROM EXTERNAL PROVIDER;

ALTER ROLE db_datareader ADD MEMBER [foo.bar@contoso.com];

So long as the underlying storage access has been assigned for the backing data of the synchronized tables (using the Storage Blob Data * RBAC permissions, or using ACLs), there shouldn't be anything more to do.

Common errors

Login failed for user '<token-identified principal>'.

This is common if you're logging in via SSMS or Azure Data Studio, or some other SQL IDE. This error is nothing to do with the Shared Metadata Model - rather the AAD user trying to access doesn't have access to the Serverless pool at all. If you're using Synapse RBAC, then the user needs to be granted either the Synapse Administrator or Synapse SQL Administrator roles. Otherwise they need to be given explicit SQL permissions on the server and/or DB.

The SELECT permission was denied on the object...

USER can log-in to the database but has no permissions to access any of the objects (or at least no "SELECT" permissions). They'll need to be assigned to a role like db_datareader to be granted access to the objects. Or, if this is a non-replicated database, then more granular GRANTs can be assigned.

You do not have permission to use the bulk load statement.

In order to use the OPENROWSET command (directly or indirectly (e.g. via a VIEW)), the USER needs to be granted the ADMINISTER DATABASE BULK OPERATIONS permission.

File 'https://...' cannot be opened because it does not exist or it is used by another process.

We've finally reached an error caused by storage level permissions! This implies that the AAD user (or the identity under which the call is being made) doesn't have the appropriate storage level permissions. If it's an AAD identity (which it will be for queries against synchronized objects in the replicated DB), these permissions can be assigned either through the Storage Blob Data * RBAC permissions, or by ACLs against the filesystem/directories/files.

External table '...' is not accessible because content of directory cannot be listed.

Another storage level permissions issue: this is when a user has access to a database and its objects (synchronized external table in this instance), but doesn't have appropriate permissions to the underlying files in the lake. Again, an RBAC role or ACLs are required.

The best hour you can spend to refine your own data strategy and leverage the latest capabilities on Azure to accelerate your road map.
What is the Shared Metadata Model in Azure Synapse Analytics, and why should I use it?

What is the Shared Metadata Model in Azure Synapse Analytics, and why should I use it?

Ed Freeman

A lesser known feature of Azure Synapse is the "Shared Metadata Model". Synapse has the capability to automatically synchronize tables created via Synapse Spark with objects you can query via the usual SQL Serverless endpoint, without any additional configuration. This article brings attention to this capability, highlighting the benefits and tradeoffs vs rolling your own SQL Serverless VIEWs.
Customizing Lake Databases in Azure Synapse Analytics

Customizing Lake Databases in Azure Synapse Analytics

Ed Freeman

Great, so I've configured my Lake Database in Azure Synapse Analytics. But since I'm using parquet-backed files, my column names aren't very user-friendly. I also have these calculated columns incorporating business logic that I'd like to query on the fly rather than persist them to backing data. I also want to give specific end-users access to this database and provide them with pre-defined reporting queries to get them up and running as quickly as possible. How can I do this? Enter Custom Objects in Lake Databases - now you can create VIEWs, Stored Procedures, USERs (amongst other objects) in what used to be a read-only database. This article explores the customization options and how it can help you organize your reporting data in Azure Synapse Analytics
What is a Lake Database in Azure Synapse Analytics?

What is a Lake Database in Azure Synapse Analytics?

Ed Freeman

Lake Databases in Azure Synapse Analytics are just great. If you're starting on a new Synapse Analytics project, chances are you can benefit from Lake Databases. Whether you need to analyze business data from Dataverse, share your Spark tables of data with SQL Serverless, or use Database Templates to visually design and define your standardized/semantic analytical data model, there's a use-case for you. In this blog we'll explore what Lake Databases are, why they're useful, and what you need to watch-out for when using them.

Ed Freeman

Software Engineer I

Ed Freeman

Ed is a Software Engineer helping to deliver projects for clients of all shapes and sizes, providing best of breed technology solutions to industry specific challenges. He focusses primarily on cloud technologies, data analytics and business intelligence, though his Mathematical background has also led to a distinct interest in Data Science, Artificial Intelligence, and other related fields.

He also curates a weekly newsletter, Power BI Weekly, where you can receive all the latest Power BI news, for free.

Ed won the Cloud Apprentice of the Year at the Computing Rising Star Awards 2019.