By Elisenda Gascon, Apprentice Engineer II
Version Control in Databricks

Notebooks provide an interactive and collaborative environment for developing code. As such, in Databricks, notebooks are the main tool for creating workflows. With them, you can develop code using a variety of languages, schedule notebooks to automatically run pipelines, collaborate by sharing notebooks, use real-time co-authoring, and use Git integration for version control.

With Databricks Notebooks, you can apply software engineering best practices, such as using version control to track changes and collaborate on your code. In this post, we will see how to use version control and Git integration with Databricks Notebooks.

Version Control in Databricks Notebooks

By default, Databricks notebooks have version history built into them. When you're working on a notebook, you'll see a tab called "Revision history". Here, you'll find every version of your notebook that has been automatically saved.

Showing a screenshot of the revision history panel of a Databricks notebook.

You can restore an earlier notebook by selecting an earlier version and choosing “Restore this revision”:

Showing a screenshot of the revision history panel of a Databricks notebook. The "Restore this revision" link is highlighted.

Databricks has automated version control, which means that version history is always available in your Databricks notebooks without any configuration needed.

This also means that real-time co-authoring of notebooks is possible. Two people can collaborate on the same notebook at the same time and see the changes being made by the other person in real time. In the following screenshot, my colleague has added a cell to my notebook. I can see that they're viewing the notebook, as well as the position of their cursor.

Screenshot of a Databricks notebook. At the top of the page, you can see the initials of my colleague who's editing the notebook. You can also see their cursor on the cell they're editing.

Note that all users editing the notebook need to have permission to use the cluster attached to the notebook in order to run any of the cells. Otherwise, an error message will appear when they try to run a cell.

Git integration in Databricks Notebooks

We've seen that version control is set up by default in Databricks notebooks. However, this versioning lives in the Databricks environment being used. If this environment were to be deleted, all the work and version history would be lost.


Let's see how to use Git with Databricks notebooks to implement version control.

The recommended way to use Git integration with Databricks is to use Databricks Repos.


Databricks Repos provides Git integration within your Databricks environment, allowing developers to use Git functionality such as creating or cloning repositories, managing branches, reviewing changes, and committing them. This allows developers to apply software engineering best practices when using Databricks notebooks.

Let's see how to set up Databricks Repos step by step. In this example, we will be setting up version control using GitHub.

Step 1: Create a repo

Azure Databricks supports the following Git providers:

  • GitHub
  • Bitbucket Cloud
  • GitLab
  • Azure DevOps
  • AWS CodeCommit
  • GitHub AE

For this, we have created a private repository in GitHub called databricks-version-control, where we will store our code.

Step 2: Get a Git access token

If your repository is set to public, you can skip this step.


To connect your repository to Databricks Repos, you will need to create a personal access token (PAT).

In GitHub, go to Settings > Developer settings > Personal access tokens and click on “Generate new token”.

Showing a screenshot of GitHub in Settings > Developer settings > Personal access tokens. We are hovering over "Generate new token".

Provide a description for your token and select the scope to define the access for the personal token:

Showing a screenshot of GitHub. We're setting the configuration to generate our PAT.

Once you get the token, make sure you copy and save it somewhere, as it will only be displayed at this stage and you will need it shortly.

Step 3: Activate Git integration within Databricks

In Databricks, click on your user email at the top right of your screen and select User Settings. Under the Git Integration tab, choose the Git provider you want to use (GitHub in our example), add your username or email, and enter the PAT that you generated earlier.
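The same Git credential can also be registered programmatically via the Databricks REST API's Git credentials endpoint. The following is a minimal sketch, assuming placeholder values for the workspace URL, Databricks token, and GitHub details (the request is constructed but not sent, since the values are placeholders):

```python
import json
import urllib.request

# Placeholder values -- replace with your own workspace URL and tokens.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
DATABRICKS_TOKEN = "<databricks-pat>"

# Payload for the Git credentials endpoint (POST /api/2.0/git-credentials):
# the provider, your Git username, and the PAT generated in step 2.
payload = {
    "git_provider": "gitHub",
    "git_username": "<github-username>",
    "personal_access_token": "<github-pat>",
}

request = urllib.request.Request(
    url=f"{WORKSPACE_URL}/api/2.0/git-credentials",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request) would send the call; it is not executed
# here because the values above are placeholders.
```

This can be handy when provisioning several workspaces, where clicking through User Settings in each one would be tedious.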

Showing a screenshot of Databricks. We're in the User Settings page, under the Git integration tab. We have set the Git provider to GitHub.

Step 4: Add a repo

Now we're ready to link the repo that we created earlier to Databricks Repos.

Go into “Repos” and select “Add Repo”.

Showing a screenshot of Databricks. The Repos menu is expanded. The "Add Repo" button is highlighted.

Enter the URL to your Git repository and create the repo.

Showing a screenshot of the dialog box to add a repo. We have entered the URL to our Git repository and selected the Git provider as GitHub.

Once your repo has been created, you will see it appear on your menu.

Showing a screenshot of the repos menu again. This time, we can see our repo listed.

Now that your repo is available in Databricks, you can use it as you would in any other IDE. You can create a branch, update it, and commit your changes for review.
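The "Add Repo" step can also be scripted against the Databricks Repos REST API. Here is a minimal sketch, again using placeholder values; the `url`, `provider`, and `path` fields mirror what we entered in the dialog:

```python
import json
import urllib.request

# Placeholder values -- replace with your own workspace URL and token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
DATABRICKS_TOKEN = "<databricks-pat>"

# Payload for the Repos endpoint (POST /api/2.0/repos): the clone URL,
# the Git provider, and the workspace path to check the repo out under.
payload = {
    "url": "https://github.com/<your-account>/databricks-version-control.git",
    "provider": "gitHub",
    "path": "/Repos/<your-user>/databricks-version-control",
}

request = urllib.request.Request(
    url=f"{WORKSPACE_URL}/api/2.0/repos",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request) would send the call; it is not executed
# here because the values above are placeholders.
```

The response includes an ID for the new repo, which later API calls (such as checking out a branch) refer to.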

Step 5: Create a branch

To create a branch, select the “master” branch icon showing next to the repository name. A dialog will open. From here you can see if any changes have been made, and create a branch.

Showing a screenshot of the dialog with an option to create a new branch.

Select "Create Branch", enter a name, and create the branch.

Showing a screenshot of the dialog to create a branch. We are naming the branch "feature/version-control-demo".

Step 6: Create a notebook

Once on our branch, we can create a notebook.

Showing a screenshot of Databricks. Under the repos menu, hovering over the repo shows the option to create a new notebook.

Let's add some markdown to our notebook.

Showing a screenshot of the new notebook in Databricks. The notebook has only one cell with a title in markdown that reads "This is a notebook".

Step 7: Review and commit your changes

Now, let's commit these changes. By selecting our branch again, we see the changes we made. This is very similar to the view you get in GitHub or any other Git provider.

Showing a screenshot of the dialog that appears when we select our branch. We can see the changes made to our notebook.

From here, review your changes, add a commit message and description, and commit your changes.

Step 8: Create a PR

Back in GitHub, I can review the changes and create a PR as usual.

Showing a screenshot of GitHub. We can see the changes from our last commit.

Source control is now set up.
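Committing is done through the Repos UI as shown above, but checking out a branch and pulling its latest commit can also be scripted via the Repos REST API's update endpoint. A minimal sketch, assuming placeholder values for the workspace URL, token, and the repo ID returned when the repo was added:

```python
import json
import urllib.request

# Placeholder values -- replace with your own workspace URL, token, and repo ID.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
DATABRICKS_TOKEN = "<databricks-pat>"
REPO_ID = "<repo-id>"  # returned by the Repos API when the repo was added

# PATCH /api/2.0/repos/{repo_id} checks the repo out to the given branch
# and pulls that branch's latest commit.
payload = {"branch": "feature/version-control-demo"}

request = urllib.request.Request(
    url=f"{WORKSPACE_URL}/api/2.0/repos/{REPO_ID}",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        "Content-Type": "application/json",
    },
    method="PATCH",
)
# urllib.request.urlopen(request) would send the call; it is not executed
# here because the values above are placeholders.
```

This is useful in deployment pipelines, where a workspace repo needs to be updated to the latest commit of a branch after a PR is merged.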


In this post, we have seen how to use version control in Databricks notebooks and how to implement source control using Git integration with Databricks Repos. Version control allows developers to easily collaborate on their work by sharing and reviewing changes. With Databricks Repos, you can use Git functionality to clone, push and pull from a remote Git repository, manage branches, and compare differences before committing your work. After that, anyone with access to the repo can see the changes and perform tasks such as creating pull requests, merging or deleting branches, and resolving merge conflicts.
