Microsoft Fabric - Local OneLake Tools
Tutorial
Sometimes you need to be able to interact with your cloud data locally - maybe it's to troubleshoot/diagnose an issue, or just to do some analysis using your favourite local tools. Ideally you'd be able to browse your cloud data locally and avoid always having to download new copies of your files just to do this.
Fortunately, there are a couple of tools that allow you to do just this with your OneLake data - namely the OneLake File Explorer and the trusty old Azure Storage Explorer. In this video we take a look at the two tools, and discuss some of the caveats you should be aware of when using them. The full transcript is available below.
The talk contains the following chapters:
- 00:00 Intro
- 00:13 What is the OneLake File Explorer?
- 01:16 Recap: OneLake Lakehouse view
- 02:12 OneLake File Explorer Online Docs
- 02:33 Admin installation settings
- 03:12 Downloading OneLake File Explorer
- 03:30 Demo: Navigating OneLake File Explorer
- 04:24 Adding, deleting and syncing files
- 06:09 Gotcha 1 - "System" file access in OneLake File Explorer
- 07:20 Gotcha 2 - possible egress costs
- 09:55 What is Azure Storage Explorer?
- 10:32 Demo: Connecting to OneLake in Azure Storage Explorer
- 12:32 Navigating OneLake data in Azure Storage Explorer
- 13:57 Roundup & Outro
Useful links:
Microsoft Fabric End to End Demo Series:
- Part 1 - Lakehouse & Medallion Architecture
- Part 2 - Plan and Architect a Data Project
- Part 3 - Ingest Data
- Part 4 - Creating a shortcut to ADLS Gen2 in Fabric
- Part 5 - Local OneLake Tools
- Part 6 - Role of the Silver Layer in the Medallion Architecture
- Part 7 - Processing Bronze to Silver using Fabric Notebooks
- Part 8 - Good Notebook Development Practices
From Descriptive to Predictive Analytics with Microsoft Fabric:
Microsoft Fabric First Impressions:
Decision Maker's Guide to Microsoft Fabric
and find all the rest of our content here.
Transcript
Ed Freeman: In this video we're going to take a look at a couple of tools we can use to browse and modify our OneLake data locally, and a couple of things we need to watch out for. Let's crack on. The first tool we're going to take a look at is the OneLake File Explorer. This is a Windows File Explorer add-on which allows us to see our OneLake data in our local file explorer, and everything stays synced between the two.
It's very much like the OneDrive file explorer that already exists for your files and documents, but this is for OneLake. There are use cases right from data engineers and software engineers who need to interrogate and wrangle the data locally, maybe to do some ad hoc analysis, right through to business users and knowledge workers who just want to view a CSV file or potentially upload some user-dropped files into the lake without having to open up their browser. They just want to do it all within their file explorer.
So there are some good use cases for both of them. Now, there are a couple of things to watch out for, but I'll come to those a little bit later on. Let's take a look at what the OneLake File Explorer looks like.
I'm actually in Fabric here, and you probably remember that the way we view the files we've already uploaded in the previous episodes is through the Lakehouse view. The Lakehouse view gives us an explorer over our tables and our files. At the moment, we don't have any tables.
In the files, we have our raw data. We have both the Land Registry data, which we used the Copy Data activity in Data Factory to get in there, and the ONS shortcut that we created in the last episode. That forms, as a whole, all of our raw data. But what if I don't want to use the browser to look at it?
What if I want to look at some of these things offline, just browsing these files? This is where the OneLake File Explorer could come in handy for your use case in your organization. So the OneLake File Explorer, I'll post the link in the description so that you can find this yourself.
But essentially it's an application that adds on as a tab to your file explorer in Windows, and it gives you access to view all of the workspaces that you have access to and the data within them. Now, it's worth mentioning that only some people can install it, and that's dependent on an admin setting.
You might not be an admin of your tenant, but in your tenant settings there is an admin setting which says "Users can sync data in OneLake with the OneLake File Explorer app". Now, this is configurable either for the entire organization or for none of the organization, so at the moment there's no middle ground of being able to do it for a specific set of users.
Maybe that will come in the future. But this has been put in place for the reasons I'll speak about shortly. Anyway, we can use the link here to take ourselves to the download center, where we can actually download the OneLake File Explorer. And once you've downloaded it and logged in, you don't really have to do very much configuration at all.
I've already done it in the past. We actually get this tab, as I said, in the file explorer, and you can see all of the workspaces that you have access to. So I've got two workspaces in this tenant that have data in them, either in a lakehouse or a warehouse, for example.
And this Fabric End to End Demo is the workspace that we've been working with so far throughout the demo series. In there we've got our Bronze Lakehouse, and in there we've got our files and tables. We don't have anything in tables, so there's no point looking in there. In the files, we have our raw data, and then we see both our Land Registry and ONS data.
And I also have this other workspace over here, which I can use to navigate to that workspace's data if I wanted to. So as I said earlier, we can fully browse and modify and upload and delete files within here if we wanted to. So I've just got a demo file over here that has about two rows in it.
I just want to demonstrate copying that into a new folder within raw, which I'm going to call a user drop zone. So, user drop: this is where end users drop their files for further processing, probably in a notebook or something in Fabric. And if I copy the data in there, that data has been dropped in my local file explorer into the OneLake area.
So if I go back into my Lakehouse view now and hit refresh on the ellipsis, you can see my data there. I've got the user drop folder, then I can click on countries and it gives me the file preview, which is only a couple of lines, as I said, but that was how easy it was to upload that.
And I can actually do the reverse now. If I don't want that information anymore and I delete it, the data won't actually show as deleted in here straight away. But what I can do is right-click at the directory level, select "Sync from OneLake", and it notices that that folder has now gone and it will remove it.
And that's how easy it is. Easy as that. I really recommend you take a look at the documentation, because there are a few specific limitations. And with these things, as you'll know from working with OneDrive, every so often you might find that there are syncing issues and things get a little bit out of whack with one another.
But you should try it out yourself and see whether it would work for your use case. Now, on to those potential issues that we have with this. The first one is that OneLake stores all of your Fabric data, and therefore you can actually browse to see certain warehouse files within here.
Now, warehouse files, thankfully, are read-only, so you can't actually modify them. But you also get access to the lakehouse files in your lakehouse tables. And if you have permissions to read and write at the workspace level on that lakehouse, then technically you can modify and delete files directly within the file explorer that might be underlying Delta files that back your Delta tables.
Now that's obviously something that could corrupt your table, so you want to be very careful with the users that you give access to read and write to those tables, and make sure they understand the implications of using a tool like this. If all you're ever doing is reading, or uploading files to the Files area, there should be no problems.
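One way to make that caution concrete is to check whether a locally synced path sits inside a Delta table before a script touches it. The sketch below is a minimal illustration and not part of the video: it relies only on the Delta Lake convention that a table folder contains a `_delta_log` directory, and the function names `enclosing_delta_table` and `safe_to_modify` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def enclosing_delta_table(path: Path) -> Optional[Path]:
    """Return the Delta table folder that contains `path`, or None.

    A folder is treated as a Delta table if it has a `_delta_log`
    subdirectory (the Delta Lake transaction log convention).
    """
    for folder in [path, *path.parents]:
        if (folder / "_delta_log").is_dir():
            return folder
    return None


def safe_to_modify(path: Path) -> bool:
    """True if `path` is not inside any Delta table folder."""
    return enclosing_delta_table(path) is None
```

A script that cleans up synced files could call `safe_to_modify` first and refuse to delete anything for which it returns `False`.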
If you have people going in and accidentally deleting files from a Delta table, though, then you might have some issues. So that's the first thing to be wary of. The second thing to be wary of is that, unlike OneDrive, OneLake potentially charges egress costs. Now, there's a disclaimer here to say nothing is GA yet, so we don't know exactly what the pricing is going to be once GA lands, with the specific networking costs and bandwidth pricing.
But it's quite likely that OneLake will conform to the usual Azure bandwidth pricing, which essentially gives you a limit on how much data you can take out of Azure for free within a month. For most tenants, it's a hundred gigabytes that you get for free.
So you can imagine that for smaller or medium enterprises, with not many concurrent users and where the sizes of data you're working with aren't very large, that might not be an issue, and it's certainly not an issue for a vast number of the clients that we've had in the past. But in certain scenarios, where you're working with big data or with larger enterprises who have lots of concurrent users, you've got to be cautious when using a tool like this, because you might be downloading quite a lot of data, running over those thresholds of free usage and starting to pay networking costs. That being said, the file explorer doesn't actually download files by default. It will only show you the metadata of those files, just like OneDrive. To actually interact with a file or open it locally, you have to double-click it and it will download.
But you can imagine in big data scenarios, where you might have, I don't know, a 10 gigabyte file, you download that and suddenly that's one tenth of your monthly allowance across your whole tenant gone. So that's something to be wary of. And that's why there's that tenant setting that allows you to turn it on and off as an admin.
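The arithmetic behind that "one tenth" figure can be sketched in a few lines. This is only an illustration: the 100 GB free allowance is the figure quoted in the video, not confirmed OneLake pricing, and the function name `allowance_used` is hypothetical.

```python
# Assumption: 100 GB/month free egress, as quoted in the video.
# Actual OneLake pricing was not confirmed at the time of recording.
FREE_EGRESS_GB_PER_MONTH = 100


def allowance_used(download_sizes_gb, free_allowance_gb=FREE_EGRESS_GB_PER_MONTH):
    """Fraction of the monthly free allowance consumed by the given downloads."""
    return sum(download_sizes_gb) / free_allowance_gb


# One 10 GB file opened locally uses a tenth of the tenant-wide allowance.
print(f"{allowance_used([10]):.0%}")  # prints "10%"
```

Because the allowance is shared across the tenant, the list passed in would need to cover every user's downloads for the month, not just your own.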
But it's also a good argument for making sure that whoever looks after cost management on the Azure side is aware of this and keeps track of it in the cost management tools in Azure. Those are the only two things really to be wary of from my perspective. Otherwise, this is a great tool to let your end users or your engineers replicate or sync that data locally so that they can more easily access it should they need to.
So that's the File Explorer. Now we'll move on to the Storage Explorer. Many of you will be familiar with Azure Storage Explorer if you've already been developing data solutions in Azure. It's a really popular local file explorer for all of your Azure data.
Up until now, it's been primarily used for things like blob storage accounts or Azure Data Lake Storage accounts. But it can also be used for OneLake now, and the reason is that OneLake is built on top of Azure Data Lake Storage, and therefore it can benefit from all of the same APIs.
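Because OneLake exposes the same ADLS Gen2 APIs, the Azure Data Lake SDK for Python can, in principle, list OneLake files in the same way Storage Explorer does. The following is a rough sketch rather than anything shown in the video. It assumes the `azure-identity` and `azure-storage-file-datalake` packages are installed, that the workspace acts as the file system, and that item folders are addressed as `<item name>.<item type>`. The helper names and the lakehouse name in the usage note are hypothetical.

```python
# Assumption: the OneLake DFS endpoint from the video, with an https:// scheme.
ONELAKE_ACCOUNT_URL = "https://onelake.dfs.fabric.microsoft.com"


def item_path(item_name, item_type, *parts):
    """Build a path inside a Fabric item, e.g. 'Bronze.Lakehouse/Files/raw'."""
    return "/".join([f"{item_name}.{item_type}", *parts])


def list_onelake_paths(workspace, directory):
    """List the names directly under `directory` in a workspace.

    Calling this needs network access and a signed-in Azure identity.
    """
    # Imported lazily so item_path() works without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        ONELAKE_ACCOUNT_URL, credential=DefaultAzureCredential()
    )
    # In OneLake the workspace plays the role of the ADLS file system (container).
    file_system = service.get_file_system_client(workspace)
    return [p.name for p in file_system.get_paths(path=directory, recursive=False)]
```

With a placeholder lakehouse called `Bronze`, a call might look like `list_onelake_paths("Fabric End to End Demo", item_path("Bronze", "Lakehouse", "Files", "raw"))`. Listing only returns metadata, but reading file contents through the same client would count as data leaving Azure.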
So what does Azure Storage Explorer look like? Here we are within Azure Storage Explorer. The first thing you need to do if you've just downloaded it is authenticate your Azure account. You may have multiple accounts and multiple tenants, so it'll ask you what tenant you want to log into and what subscriptions you want to sync by default.
But with OneLake, you actually have to create a connection to specific ADLS file systems or containers. And the way you do that is you click on this button over here, this connection button, and you select the ADLS Gen2 container or directory. And the way we're going to be authenticating is going to be Azure Active Directory.
I'm in the correct tenant and I'm using my normal user account, which is already authenticated within Azure Storage Explorer. So now all I need to do is select the display name and the endpoint within OneLake to go and grab my data. I'm just going to go over to a different screen to grab this, but I'm going to call this Fabric End to End Demo, and then I'm going to add the base URL, which is onelake.dfs.fabric.microsoft.com.
Bit of a mouthful. At the end of this, I essentially need to point it towards a specific workspace or a specific artifact that I want to sync to Azure Storage Explorer. Now, there are two ways we can do it at the workspace level: using the actual workspace name, or using the workspace GUID, so the workspace ID.
Now, I prefer things that are human-readable, so I'm just going to type in Fabric End to End Demo, and I can keep the spaces in there; it will handle that fine. Now, if you have name clashes, potentially you'll need to use the ID route, but that's probably not likely to be an issue most of the time.
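The URL being typed into the connection dialog can be sketched as a small helper. This is illustrative only: the `https://` scheme and the function name `workspace_url` are assumptions, and the GUID in the usage note is a placeholder. Storage Explorer accepts the spaces as typed, but in code it is safer to percent-encode them.

```python
from urllib.parse import quote

# Base endpoint from the video; the https:// scheme is an assumption.
ONELAKE_DFS_ENDPOINT = "https://onelake.dfs.fabric.microsoft.com"


def workspace_url(workspace: str) -> str:
    """Build a workspace-level OneLake URL from a workspace name or GUID.

    Spaces and other reserved characters in a workspace name are
    percent-encoded; a GUID passes through unchanged.
    """
    return f"{ONELAKE_DFS_ENDPOINT}/{quote(workspace, safe='')}"
```

For example, `workspace_url("Fabric End to End Demo")` yields a URL ending in `Fabric%20End%20to%20End%20Demo`, while `workspace_url("00000000-0000-0000-0000-000000000000")` keeps the placeholder GUID as-is, which is the route to take when two workspaces share a name.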
So if I click next, it'll give me a bit of an overview of what I'm configuring. And then I'm going to click connect. And suddenly it's all authenticated and it's found that data in my workspace. And now I have two Fabric workspaces alongside each other, just like I did in the File Explorer.
And I can now browse this information. So files, raw. And I have both my ONS and land registry data. Here's the postcode directory within my ONS. This is all in the shortcut, remember, so this is referencing the ADLS source. And another thing that Azure Storage Explorer gives as a feature is this ability to preview.
So now, again, with preview, you've got to be aware that this is actually going to retrieve some data, and therefore that will count towards your egress threshold. So bear in mind that all of the things I mentioned earlier about OneLake File Explorer also apply to Azure Storage Explorer, because it's all based on the premise of taking data out of Azure onto your local machine.
But it can be really useful for querying smaller files of data. Azure Storage Explorer has also just added the ability to preview Parquet files. Now, I've had some issues with that. I'm not sure whether they've fixed that yet, but we shall see. Should they fix those issues, then you've got a really easy way of just previewing Parquet and CSV files locally if that's something that would be useful for your organization.
So that's it. We've got two different and really useful tools that we can use to browse, navigate and modify our OneLake data locally. There are the two things that we need to be cautious of, one being accidentally modifying table files that we don't want to be modifying.
The second is keeping one eye open with regards to egress and bandwidth costs. But otherwise, these are really cool tools that you should take a look at and discuss with your team members to see whether they're going to be useful for you. Anyway, that's it for this video.
Thank you for tuning in. In the next video, we're going to be diving deep into the notebook experience in Fabric to see how we can process our files from our bronze layer into managed Delta tables in our silver layer. As ever, please hit like if you've enjoyed this video and hit subscribe to keep following all of our content.
Thanks for watching.