Insight Discovery (part 1) – why do data projects often fail?
TL;DR: The traditional, bottom-up, data modelling approach to data warehousing leads to compromised data platforms that are hard to evolve, expensive to run, and don't meet the needs of the business. Endjin's Insight Discovery process helps you to ask the right questions of the business, so that you can design a data platform that fully meets their needs.
Insight Discovery
This series of posts, and the process that they describe, will help you ensure that your data projects are successful: that the time, money and energy being invested in strategic data initiatives, by you or your organisation, are well spent and deliver real business value.
They describe a different way of thinking, a shift in mindset, in how to approach data projects, that puts the consumer and the outputs front and centre of the process.
None of it is complicated, but putting it into practice can be hard - and it might go against a lot of natural inclinations as to how you've typically done things.
These posts explain why some common, long-established methodologies are actually setting you up for failure, and how a simple change in approach can reverse that and ensure that whatever you deliver is valuable.
A familiar story?
Before we dig into the Insight Discovery process in more detail, I'm going to start with a story - one that I'm guessing is familiar to any of you who have worked on data projects for any length of time...
Let's imagine that your business has identified that they could and should be using data to make informed decisions - about internal processes, about your customers, about business strategy. They want to become more "data driven", and set about embarking on creating an enterprise data warehouse or centralised data platform that will act as the "single source of the truth" for all your business intelligence needs.
As part of the BI or data team, you've been tasked with designing and implementing this new data warehouse and you kick off the project by trying to find out what kinds of data you're going to need to get hold of and how it's going to be used. But of course, the business users of the system can't tell you exactly what they want, and they talk about "self-service" BI, and being able to "slice and dice" the metrics by various different dimensions.
So, you start designing a data model - a centralised view of the organisation, containing all the facts and dimensions that the business might care about, in order to report on all the things that they haven't been able to clearly define yet.
It takes a long time, as there's lots of data, lots of questions, lots of unknowns, and lots of compromises needed when consolidating data from different sources into one logical model. And of course, different business users from different teams have different views on how things should be modelled, and what things should be called. So, development of the model is slow.
The enterprise data warehouse technology that you're using, which naturally lends itself to fact and dimension tables, is expensive to run. So you have to make some tough decisions on what granularity of data you can store, and how long you can hold onto it. And it's really hard to optimise the model for "self-service" - because you still don't know exactly what types of queries the business is going to want to perform.
After some months, maybe a year, you get the first version of the data warehouse out into the business. But guess what - they don't want to use it, because it doesn't give them the information they need in the way that they need it. Which is frustrating. But now that they've got something to try, they're able to tell you why it's wrong and what changes they'd like you to make.
But it's really slow to make those changes, as any updates to the model need to be considered in light of the whole thing - one team's requests are going to impact another's, and vice versa, so the model starts to become a series of compromises - a long way from a perfect view of the organisation.
The limitations and trade-offs that you have to bake in as you go mean that it's hard to explain the rationale and thinking behind some parts of the model, meaning you need to write and maintain extensive documentation, and become a gatekeeper for any future changes, which slows down innovation with bureaucracy as more and more requests come in.
But eventually, the model, the tools that are used to query it, and the reports that the business asked to be generated, start to gain adoption across the organisation. This should be considered a success, but in fact, by the time that you get to this point, you have a data warehouse that is:
- Slow and costly to develop
- Expensive to run and maintain
- Brittle and hard to evolve
- A compromised view of the organisation
Sound familiar?
So, why do data projects often fail?
Where do things typically go wrong?
"The business users of the system can't tell you exactly what they want".
This is the key thing, the starting point that is accepted all too often, that leads to the compromised output that I described. And the tools and techniques around data modelling that you're all familiar with, in some ways (albeit indirectly), hide this problem from view. Because by not addressing it, we're effectively saying that this is OK:
"We can design the perfect model that will allow you to query anything you want, in any way that you want".
Except, of course, that's not the case. You can't do that, and no amount of modelling will iterate you towards a perfect result. By definition it will always be a compromise.
So we need to think differently. About how we help the business users describe what they need, so that we don't have to build a compromise. And we need to frame that need in a way that relates back to the objectives of the business. So that when we help them, we know we're adding real value.
If you're interested in learning more, I gave a 20-minute talk about the Insight Discovery process for Data & Analytics projects at the Virtual Data Platform Summit 2021. You can see a recording of the talk below:
The rest of this series of posts will help with that, by exploring endjin's Insight Discovery process, explaining how to ask the right questions of the business, and also how underlying architectural and design choices - specifically in modern cloud data platforms like Azure Synapse Analytics - can fully support this process, so that your projects are set up for success.