By Ed Freeman, Software Engineer II
Import and export notebooks in Databricks

Sometimes we need to import and export notebooks from a Databricks workspace. This might be because you have a bunch of generic notebooks that can be useful across numerous workspaces, or it could be that you're having to delete your current workspace for some reason and therefore need to transfer content over to a new workspace.

Some of this can be done manually, and relatively easily. You can export workspace directories from the drop-down menu in the workspace view of the UI. This works at any level - at the root or in child directories (provided you have access to the directory in question).

You can export files and directories as .dbc files (Databricks archives). If you rename the .dbc extension to .zip, you can open the archive and see the same directory structure that appears in the Databricks UI. Exporting the root of a Databricks workspace downloads a file called Databricks.dbc.
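For example, here's a quick way to peek inside the archive (a minimal sketch assuming a PowerShell session and a Databricks.dbc file in the current directory):

# Copy the archive with a .zip extension, then extract it to browse the directory structure
Copy-Item Databricks.dbc Databricks.zip
Expand-Archive Databricks.zip -DestinationPath DatabricksContents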


You can also import .dbc files in the UI, in the same manner. This is fine for importing the odd file (which doesn't already exist). However, through the UI there is no way to overwrite files/directories; if you try to import a file/directory that already exists, a copy of that artifact will be created.

An alternative solution is to use the Databricks CLI. The CLI offers two subcommands to the databricks workspace utility, called export_dir and import_dir. These recursively export/import a directory and its files from/to a Databricks workspace, and, importantly, include an option to overwrite artifacts that already exist. Individual files will be exported as their source format.

How it works

First of all, if you don't have the Databricks CLI installed locally, run pip install databricks-cli.
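For example (assuming pip is on your PATH), the following installs the CLI and then prints its version as a quick sanity check:

# Install the CLI and confirm the databricks command is available
pip install databricks-cli
databricks --version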

Next, we need to authenticate to the Databricks CLI. The easiest way to do this is to set the session's environment variables DATABRICKS_HOST and DATABRICKS_TOKEN. Otherwise, you will need to run databricks configure --token and insert your values for the host and token when you are prompted. The value for the host is the Databricks URL of the region in which your workspace lives (for me, that's https://uksouth.azuredatabricks.net). If you don't know where to get an access token, see this link.
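For example, using environment variables (a minimal sketch assuming a PowerShell session; substitute your own workspace URL and personal access token):

# Environment variables the Databricks CLI reads for authentication
$env:DATABRICKS_HOST = "https://uksouth.azuredatabricks.net"
$env:DATABRICKS_TOKEN = "<your-personal-access-token>"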

Now that authentication is out of the way, we can address the subject of this blog.

Export

The general template is:

databricks workspace export_dir "<databricks-source-path>" "<local-path-to-export-to>"


To export the workspace root to the temp folder on your C drive, this would be:

databricks workspace export_dir "/" "C:/Temp/"

If you try to export any files that already exist in your local directory, the CLI will skip those files. You can tell the CLI to overwrite the local files by passing the -o flag:

databricks workspace export_dir "/" "C:/Temp/" -o

Import

The general template is:

databricks workspace import_dir "<local-path-where-exports-live>" "<databricks-target-path>"

For example, if my exported directories live in C:/Temp/DatabricksExport/ on my machine, and I want to import them into the root of a Databricks workspace, this is the command:

databricks workspace import_dir "C:/Temp/DatabricksExport" "/"

However, if you're importing any files that already exist, you'll get an error. Get around this error by, again, adding -o to the command.

databricks workspace import_dir "C:/Temp/DatabricksExport" "/" -o

In an ideal world

A Databricks notebook can be synced to an ADO/GitHub/Bitbucket repo. However, I don't believe there's currently a way to clone a repo containing a directory of notebooks into a Databricks workspace. It'd be great if Databricks supported this natively. However, using the CLI commands I've shown above, there are certainly ways around this - but we'll leave that as content for another blog!
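As a taster, here's one rough approach (a sketch only, using a hypothetical repo URL and target folder, and assuming the repo contains notebooks in source format such as .py or .scala files): clone the repo locally, then push its contents into the workspace with import_dir.

# Clone the repo containing the notebooks (hypothetical URL and local path)
git clone https://github.com/my-org/my-notebooks.git C:/Temp/my-notebooks

# Import the cloned notebooks into a workspace folder, overwriting any existing copies
databricks workspace import_dir "C:/Temp/my-notebooks" "/Shared/my-notebooks" -o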

Ed Freeman

Software Engineer II

Ed is a Data Engineer helping to deliver projects for clients of all shapes and sizes, providing best-of-breed technology solutions to industry-specific challenges. He focusses primarily on cloud technologies, data analytics and business intelligence, though his mathematical background has also led to a distinct interest in Data Science, Artificial Intelligence, and other related fields.

He also curates a weekly newsletter, Power BI Weekly, where you can receive all the latest Power BI news, for free.

Ed won the Cloud Apprentice of the Year at the Computing Rising Star Awards 2019.