Data Exploration & Experimentation with Notebooks in Azure

16th October 2020 · 47 min watch

By Ian Griffiths, Technical Fellow I

Microsoft invests in notebook technologies such as Jupyter and .NET Interactive for data analysis and is adding .NET support for Spark, while Azure's Databricks support scales notebooks to massive datasets.

About this talk

SQLBits 2020

Microsoft are investing heavily in Notebook technologies (such as Jupyter and .NET Interactive) to provide an interactive environment for experimental and exploratory data inspection and analysis.

These kinds of environment are becoming increasingly important for a growing range of activities including data cleaning and normalization, data import, statistical analysis, insight generation, and testing hypotheses. And in some applications, it can make sense to take notebooks that started out as an interactively developed set of ad hoc operations and transform them into part of an automated workflow.

Jupyter notebooks are most commonly authored in Python or R. Microsoft has been working to add support for .NET languages, enabling the use of C# and F# in notebooks. They are also adding .NET support for Spark, so that Spark clusters can be controlled from .NET, with a view to running custom .NET code inside the cluster as part of the core processing.

Azure's growing support for notebooks enables this approach across a range of scales. You can work with datasets that fit easily in a single machine's memory, but if you need more firepower, Azure's Databricks support lets you spin up a server farm to process your data in parallel, so you can perform complex computations across massive datasets.

About the presenter

Ian Griffiths

Ian has worked across an extraordinary breadth of computing - from embedded real-time systems and broadcast television to medical imaging and cloud-scale architectures. As Technical Fellow at endjin, he brings this deep cross-domain experience to bear on the hardest technical problems.

A 17-time Microsoft MVP in Developer Technologies, Ian is the author of O'Reilly's Programming C# 12.0 and one of the foremost authorities on the C# language and high-performance .NET development. He's a maintainer of Reactive Extensions for .NET, Reaqtor, and endjin's 50+ open source projects.

Ian has created Pluralsight courses on WPF fundamentals, WPF advanced topics, WPF v4, and the TPL, and has given over 20 talks at conferences worldwide. Technology brings him joy.