Unification is central to Azure Synapse Analytics. As Satya Nadella said when announcing its general availability at the start of December, Synapse "brings together all your data capabilities in one place with deep integration." It has the potential to streamline development by reducing the friction encountered when trying to integrate disparate services.
Although the breadth of services is the key to Synapse's value, it also means it can take a while to understand what Synapse is and what it can do. A list of its specific capabilities (support for big data, data warehousing, data integration, real-time analytics, and data transformation pipelines, for example) may sound more like a list of separate services than a coherent product. And while higher-level descriptions such as "the foundation for unified analytics and insights" might shed slightly more light on the purpose, they're bafflingly vague.
Increasingly diverse analytics solutions
At endjin, we have another way of looking at this: Synapse is a response to the increasingly diverse nature of business intelligence. The way organizations use data to produce actionable insights has evolved. Once upon a time, a company might have devised an ETL process to collect data from various internal systems such as CRM and billing into a single data warehouse. However, a one-size-fits-all approach typically won't do a great job of supporting both exploratory data analysis and optimized automated reporting. So over time, many companies have bifurcated their data processing to support both modes, typically introducing a data lake as an initial common ingestion point.
This in turn has often evolved so that there may be multiple kinds of ingestion mechanism, depending on where the data originates, and multiple models for processing the data. Some data exploration might use relatively ordinary Python notebooks, but in other scenarios the volumes of data may necessitate the use of technologies such as Spark. Perhaps machine learning systems will also come into play. And even once some data processing work completes, there may be multiple variations on what to do next: results might be shown in a report, or made available for a business analyst to explore in a tool such as Power BI, but it might also become apparent that the results could be used to drive automated processes directly. Why force a human to read something off a screen and then start some process manually if that job could be started automatically?
Although the widening diversity in these systems can increase the amount of value organizations extract from their data, it also increases the amount of technical integration work required. And if each of the tools being brought to bear here is a distinct standalone service or technology, that work can start to become overwhelming.
Unified services simplify integration and governance
This is why a system that can offer a wide range of ways to work with data can be valuable: it can lower the barrier to getting these different mechanisms to work together.
Synapse offers a hosted SQL data warehouse. It also has a serverless mode, enabling you to run SQL queries directly over files in a data lake. (So if you have a collection of CSV or Parquet files, you can run a SQL query directly over them without needing to load them into a database first.) And it's not all about SQL. Synapse offers hosted Spark clusters. It has a native Notebook system supporting Python, Scala, F#, or C#.
Synapse also offers a powerful integration mechanism, pipelines (an evolution of Azure Data Factory, or ADF), to orchestrate the use of its many features. By helping you move easily from an experimental, investigative mode to automation, pipelines minimize the friction encountered in taking the insights that emerge from data science investigations and operationalizing them to drive automated processes that translate those insights into practical value.
As well as tying together all of Synapse's native capabilities, pipelines can also integrate with external services, including older services such as Hive or Data Lake Analytics, enabling new Synapse-based solutions to benefit from existing investments. Pipelines can also integrate with other external services such as Azure Functions, Azure Cognitive Search, or Machine Learning. Furthermore, the integration between Synapse and Azure Purview means you also get a data governance solution that will be able to cover your whole process.
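To make the serverless SQL mode described above a little more concrete, here is a minimal sketch in Python that builds the kind of T-SQL query you might send to a serverless SQL endpoint to read files in place, using the OPENROWSET function. The helper name `build_openrowset_query` and the storage URL are hypothetical, chosen purely for illustration, and the sketch only constructs the query text; how you connect and execute it will depend on your own environment.

```python
def build_openrowset_query(file_url: str, file_format: str = "PARQUET", top: int = 10) -> str:
    """Build a T-SQL query that reads data lake files in place via OPENROWSET.

    Hypothetical helper for illustration only: it returns the query text and
    does not connect to or execute anything against Synapse.
    """
    fmt = file_format.upper()
    if fmt not in ("PARQUET", "CSV"):
        raise ValueError(f"Unsupported format: {file_format}")
    if not isinstance(top, int) or top < 1:
        raise ValueError("top must be a positive integer")

    # Double any single quotes so the URL is a valid T-SQL string literal.
    safe_url = file_url.replace("'", "''")

    options = [f"BULK '{safe_url}'", f"FORMAT = '{fmt}'"]
    if fmt == "CSV":
        # Assumption: the CSV files have a header row.
        options += ["PARSER_VERSION = '2.0'", "HEADER_ROW = TRUE"]

    option_lines = ",\n".join(f"    {opt}" for opt in options)
    return f"SELECT TOP {top} *\nFROM OPENROWSET(\n{option_lines}\n) AS [result];"


if __name__ == "__main__":
    # Hypothetical storage account, container, and folder.
    print(build_openrowset_query(
        "https://exampleaccount.dfs.core.windows.net/data/sales/*.parquet"
    ))
```

The point of the sketch is that the query names the files directly: there is no table to create and no load step before you can start exploring the data.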
No silver bullet...but
Many customers come to us asking for a silver bullet for their data and analytics needs. While we don't think that's possible in such a complicated and nuanced space, we do believe Synapse is like a Swiss Army Knife that contains all the tools you need in one package. By providing a unified environment and toolset, Synapse ensures that all the various moving parts already fit together, enabling you to focus on the application-specific aspects of the task.
For more from Ian on data services, see his talks from this year's SQLBits on "Data Exploration & Experimentation with Notebooks in Azure" and "Navigating the Bewildering Array of Data Services in Azure".