Databricks

How to use SQL Notebooks to access Azure Synapse SQL Pools & SQL on demand
Wishing Azure Synapse Analytics had support for SQL notebooks? Fear not, it's easy to take advantage of rich interactive notebooks for SQL Pools and SQL on demand.

Does Azure Synapse Analytics spell the end for Azure Databricks?
Have you invested, or are you about to invest, in Azure Databricks? If so, the new Spark offering in Azure Synapse Analytics is likely to have grabbed your attention, and rightly so. Why is Microsoft putting yet another Spark offering on the table, and what does it mean for you?

Import and export notebooks in Databricks
Sometimes it's necessary to import and export notebooks from a Databricks workspace. This might be because you have some generic notebooks that can be useful across numerous workspaces, or it could be that you're having to delete your current workspace for some reason and therefore need to transfer content over to a new workspace. Importing and exporting can be done either manually or programmatically. In this blog, we outline a way to recursively export/import a directory and its files from/to a Databricks workspace.

Azure Databricks CLI "Error: JSONDecodeError: Expecting property name enclosed in double quotes:..."
Quite often it's beneficial to work with pre-built CLIs/SDKs to interact with your favourite tools, instead of making requests to the underlying REST API. Much of the complexity around constructing requests has been abstracted, and authentication is often easier. The Databricks CLI makes it easier to interact with your Databricks instance, but sometimes you can run into strange errors when constructing the values passed in as arguments. In this blog, we take a look at a JSONDecodeError that can occur when using the Clusters CLI, and look at a way we can avoid this error.
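As a rough illustration of where a message like this can come from, the sketch below uses only Python's standard json module. It assumes the argument was built with single-quoted property names, which is one common trigger and may not be the exact cause the post discusses. The cluster settings and the parse_error helper are hypothetical, made up for the example.

```python
import json

# Hypothetical cluster settings, used only for illustration.
cluster_config = {"cluster_name": "example-cluster", "num_workers": 2}

# str() on a dict produces single-quoted keys, which is not valid JSON.
single_quoted = str(cluster_config)


def parse_error(text):
    """Return the JSON decoding error message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.msg
    return None


# Parsing the single-quoted text fails with a JSONDecodeError.
error_message = parse_error(single_quoted)

# json.dumps emits double-quoted property names, so the result parses cleanly.
valid_json = json.dumps(cluster_config)
round_tripped = json.loads(valid_json)
```

Building the argument with json.dumps, rather than typing or formatting the JSON by hand, sidesteps quoting mistakes; how the string is then escaped for a particular shell is a separate concern.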

Using Databricks Notebooks to run an ETL process
Here at endjin we've done a lot of work around data analysis and ETL. As part of this we have done some work with Databricks Notebooks on Microsoft Azure. Notebooks can be used for complex and powerful data analysis using Spark. Spark is a "unified analytics engine for big data and machine learning". It allows you to run data analysis workloads, and can be accessed via many APIs. This means that you can build up data processes and models using a language you feel comfortable with. Notebooks can also be run as an activity in an ADF pipeline, and combined with Mapping Data Flows to build up a complex ETL process which can be run via ADF.

Exploring Azure Data Factory - Mapping Data Flows
Mapping Data Flows are a relatively new feature of ADF. They allow you to visually build up complex data transformation sequences. This can aid in the streamlining of data manipulation and ETL processes, without the need to write any code! This post gives a brief introduction to the technology, and what this could enable!