By Ed Freeman, Software Engineer II
Azure Databricks CLI "Error: JSONDecodeError: Expecting property name enclosed in double quotes:..."

Have you been trying to create a Databricks cluster using the CLI? Have you been getting infuriated by something seemingly so trivial? Well, join the club. Although get ready to depart it, because I may have the solution you need.

When creating a cluster using the CLI command databricks clusters create, you're required to pass in either a JSON string or a path to a JSON file. I recently opted for the first option. [Note: I'm using PowerShell to talk to the Databricks CLI.]

How it transpired

Here's what (I feel) should have worked.
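A sketch of that first attempt, reconstructed from the commands later in this post (the cluster settings and variable names here are illustrative, not the exact config from the original):

```powershell
# Illustrative cluster config -- field names follow the Databricks clusters
# API, but the specific values here are placeholders.
$config = @{
    cluster_name  = "my-cluster"
    spark_version = "7.3.x-scala2.12"
    node_type_id  = "Standard_DS3_v2"
    num_workers   = 2
}
$configToJson = $config | ConvertTo-Json

databricks clusters create --json $configToJson
```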

But running that, we receive the output:

Error: JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 5 (char 7)

Huh? You asked for a JSON string, I gave you a JSON string. Why are you complaining? Maybe the Databricks CLI wants me to wrap more quotes around my --json argument. Not sure why, but let's try that.

databricks clusters create --json "$configToJson"

Nope. Same error. More quotes?

databricks clusters create --json "'$configToJson'"

Of course, there's no chance that's going to work and it gives us the error:

Error: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Hmm. What could be happening?

A little bit of Googling later, I found someone who had the same problem as me (they were using the Windows Command Prompt). They alluded to the need to "escape" the double quotes with a backslash within the JSON string. Sounds odd (a backslash is neither the escape character in PowerShell nor the Windows Command Prompt), but let's give it a whirl:
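One way to apply that escaping is with PowerShell's `-replace` operator (a sketch, assuming the same illustrative `$configToJson` variable as before; the original may equally have hand-written the `\"` sequences):

```powershell
# Prefix every double quote inside the JSON string with a backslash,
# so the Python process on the other end sees \" rather than a bare "
$escapedJson = $configToJson -replace '"', '\"'
databricks clusters create --json $escapedJson
```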

And hey presto, it works:

{ "cluster_id": "0704-090525-blocs355" }

But... why?

At this point, it's very easy to shrug this off now that it's working and not bother to try and understand why. I mentioned that "a backslash is neither the escape character in PowerShell nor the Windows Command Prompt", but there are numerous common languages for which it is the escape character. After locating the Databricks CLI GitHub repo, I saw that it was written in Python, a language which uses the backslash as an escape character.

Under the covers, the Databricks CLI is using the json.loads() method to parse our --json argument, and the error we're getting is a JSONDecodeError coming from that json package. The loads() method takes a string in the form '{ "name":"John", "age":30, "city":"New York"}' and converts it to a Python dictionary. This should work with the argument we used in the first attempt earlier, but it doesn't. My hypothesis (which I'm unsure how to prove) is that the JSON string argument is implicitly being wrapped in double quotes in the process of being passed to the Python method, so the method receives something like this:

"{ "name":"John", "age":30 }"

which isn't a valid representation of a string in Python (because the 2nd double quote closes the first double quote), and cannot be parsed into a dictionary. Add in the interior double quote escaping, and all's good - we now have a string that Python can understand.
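You can reproduce the parser's behaviour directly in Python. The first call below mirrors the documented happy path for json.loads(); the second illustrates what happens if the inner double quotes don't survive the trip to the parser (the quote-stripped input is my illustration, not captured from the CLI):

```python
import json

# The happy path: a well-formed JSON string becomes a dictionary.
parsed = json.loads('{ "name": "John", "age": 30, "city": "New York" }')
print(parsed["city"])  # New York

# If the property names reach the parser without their double quotes,
# json.loads raises the same JSONDecodeError the CLI reported.
try:
    json.loads('{ name: "John" }')
except json.JSONDecodeError as err:
    print(err)  # Expecting property name enclosed in double quotes: ...
```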

A sensible alternative

I mentioned earlier that you can also pass in a path to a JSON file using the --json-file parameter. This works exactly how you'd expect (no funny business with escaping quotes since you're passing it as a .json file as opposed to a JSON string). In our case, however, having to define the JSON object elsewhere in our code-base would have been sub-optimal, but YMMV.
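For completeness, the file-based route looks something like this (a sketch, reusing the illustrative `$config` from earlier; the file name is arbitrary):

```powershell
# Write the config to disk and pass the path instead of the raw string --
# no quote escaping needed, since the shell never touches the JSON itself.
$config | ConvertTo-Json | Set-Content cluster-config.json
databricks clusters create --json-file cluster-config.json
```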

I hope this blog has helped some of you!


Ed Freeman, Software Engineer II

Ed is a Data Engineer helping to deliver projects for clients of all shapes and sizes, providing best of breed technology solutions to industry specific challenges. He focusses primarily on cloud technologies, data analytics and business intelligence, though his Mathematical background has also led to a distinct interest in Data Science, Artificial Intelligence, and other related fields.

He also curates a weekly newsletter, Power BI Weekly, where you can receive all the latest Power BI news, for free.

Ed won the Cloud Apprentice of the Year at the Computing Rising Star Awards 2019.