By Barry Smart, Director of Data & AI
How to apply behaviour driven development to data and analytics projects

TLDR; we explore the application of behaviour driven development (BDD) to data and analytics projects. We demonstrate this by writing a Gherkin specification for a specific data engineering requirement and using the Python Behave package to execute it.

In our previous blog Extract insights from tag lists using Python Pandas and Power BI, we highlighted a number of engineering practices that hadn't been applied to the solution and we promised to tackle them in future blogs, one of which was:

"How can we verify that the requirements are being met?"

The obvious answer to the question is to implement a set of tests to prove that the code is working. Packages such as pytest make it possible to script and automate those tests. This will deliver some value to the project.

But we want to take a step back before leaping into solution mode. Fundamentally, we want to explore what comes before any code (or tests) is written: i.e. the process of exploring, defining and refining the behaviour of the system you are creating in collaboration with end users and other stakeholders; and how you can then make that specification executable in order to verify that your requirements are met. This approach is quite nuanced, but when applied in the correct manner, it can unlock a range of benefits that will accelerate time to value, reduce risk and minimise the total cost of ownership (TCO).

In this blog we will demonstrate how a methodology called behaviour driven development (BDD) can be applied to data and analytics projects. We will do this by using BDD to capture the requirements introduced in the previous blog which are concerned with exploring the relationship between planets and terrains in the Star Wars planets dataset. The output will be an executable specification that we will use to verify that the requirements have been met. Along the way we will highlight how BDD provides a range of other benefits in addition to quality assurance.

What is BDD?

Behaviour driven development (BDD) has become popular in agile software development. The overall objective of BDD is stated succinctly by Gojko Adzic in his book "Specification by Example":

to specify, develop, and deliver the right software, without defects, in very short cycles

The focus of the BDD process is the executable specification. This is a natural language document, written using a specific syntax to describe the desired behaviour of the software system.

Let's explore the key aspects highlighted above in more detail:

  • Natural language (rather than code) is used to enable all stakeholders to get involved in capturing, exploring, challenging, prioritising, refining and evolving the requirements. This enables multi-disciplinary teams (product owners, end users, software engineers, UX specialists and test analysts) to collaborate around a single set of requirements documents. For a broader perspective on this, see Carmel's blog How to enable intra-business communication using user stories, BDD specs and a ubiquitous language

  • A common syntax is adopted to enforce consistency and to encourage requirements to be defined at a sufficient level of detail. In this blog, we use the Gherkin syntax (more on this below) to encourage authors to adopt a consistent format. One of the key benefits of establishing this ubiquitous approach is that you are modelling the language of the business domain: as the language evolves, or as you spot inconsistencies, this will have an impact on the implementation of the code as you hammer out those details.

  • Articulation of the desired behaviour is the focus of the BDD executable specification. Each feature is captured using the popular user story format: "As a [persona / role], I want to [action to perform], so that [business value delivered]." A user story is then expanded with a set of acceptance criteria (scenarios), each of which is captured using the syntax "Given [initial conditions], when [event], then [expected outcome]." This promotes a focus on business value, enables effective prioritisation of features and provides clarity by defining concrete examples.

  • Executable specifications are perhaps the most significant difference between BDD and other more traditional requirements documents. This involves writing a "thin layer" of code which enables the structured natural language statements captured above to be executed as tests over the software. Initially the tests will fail, because there is no software written, but as the software is written to address the acceptance criteria, the tests should then pass.

The BDD process

Application of BDD is based on the following approach:

  1. Capture requirements - by documenting them as features with supporting user stories. User stories adopt the "As a... I want... So that..." syntax.
  2. Prioritise - the user stories, identifying the user story that should be implemented next.
  3. Refine - by adding acceptance criteria to the chosen user story, in the "Given... When... Then..." syntax.
  4. Run the tests - as defined by the acceptance criteria above, initially they will fail.
  5. Write the code - that is required to make the tests pass.
  6. Verify - that the code implements the user story by reviewing the test results.
  7. Release - once tests pass, the code is feature complete and is ready for release.
  8. Refactor - code can continue to be refactored with confidence given the existence of a comprehensive suite of BDD tests.

This is illustrated below:

Illustration of BDD process

This approach has a range of benefits:

  • Is consistent with Agile principles.
  • Fits well with short iterations and flow-based processes, by allowing the right level of information to be produced just-in-time.
  • Focuses engineering effort on the delivery of the business value with results that are verifiable.
  • Helps to avoid scope creep - it will tell you that you can stop and move on to the next feature, because when the test is "green", you are feature complete, thus avoiding over-polishing / gold-plating a solution.
  • Allows the documentation and the code which it implements to be kept in sync. Documentation can "live" with the code in source control and be subject to the same code-ops practices. This results in documentation that is reliable, relevant and incurs minimum overhead to maintain.
  • Empowers multi-disciplinary teams by providing a central artefact around which experts from business, quality assurance and technology domains can collaborate.

The following picture captures this type of multi-disciplinary collaboration in action. This was a hackathon I was involved in about 4 years ago to explore how we could disrupt a 20 year old data driven process in the pensions industry. Here we have the sponsor (CTO, me), a business domain expert, a software engineer and a data expert from endjin working collaboratively to capture and prioritise requirements. By taking this approach, we were able to identify the most critical features and then work collaboratively to build a working prototype. In less than two weeks, we were able to demonstrate how the process could be digitally transformed to reduce the elapsed time for data processing from a time frame measured in months down to hours.

Multi-disciplinary working in action

We will now demonstrate how this approach can be adapted and applied to data and analytics projects. We do this by stepping through the process above, applying it to the Star Wars planets requirement explored in the previous blog.

Step 1 - capture requirements

We begin by authoring requirements using the Gherkin syntax. It comprises two main parts:

  • Feature - a short title for the feature.
  • One or more user stories for the feature - which adopt the "As a ... I want ... So that ..." syntax.

Feature authoring doesn't need specialist tools such as an IDE. Statements can be captured with readily available tools such as Notepad or Word. There is also a free online Gherkin editor from Specflow (watch out for cookie policy pop-up!)

Turning to the domain we addressed in the previous blog, let's consider what types of features and user stories may emerge if we were to tackle this from first principles.

What is interesting about this stage is that you are forced to articulate the requirements from the perspective of the different personas who are involved in the process. This encourages you to engage with end users and other stakeholders, to understand their needs. This tends to lead to a better outcome - you will build something that the people who use it actually care about, and it will therefore deliver value.

For example, exploring the requirements of a Star Wars fan, you may capture features such as the one below. It is written using language that is familiar to the Star Wars fan, capturing their key requirements:

Feature: Planetary terrains

    As a Star Wars fan
    I want to explore planetary terrains
    So that I can get a deeper understanding of the universe

Whereas, exploring the requirements of a data engineer, you may capture features such as the example below. Here the data engineer is thinking about the requirement in a more abstract way: there is no mention of Star Wars, they are thinking about the more generic challenge of reshaping the data in order to meet the higher level requirements.

Feature: Create bridge tables

    As a data engineer
    I want to parse entity tags into bridge tables
    So that I can create many-to-many relationships between entities

What's interesting about the examples provided above is that they capture different perspectives of the requirement, at different levels of abstraction as is appropriate for the specific persona.

  • The Star Wars fans don't want a bridge table. They don't know what a bridge table is. The Star Wars fan wants to explore the different terrains on the different planets, that's their domain / language.
  • The Data Engineer has started to think about the requirements in a more abstract manner. They don't mention Star Wars, they are thinking about how to wrangle the data in order to meet the higher level requirements by expressing the operations they need to perform on the underlying data.

This gets to the crux of BDD: it is fundamentally about establishing a common language that can be used to express the requirements from the perspective of different personas who are involved directly in the process at hand. As you capture more of these statements, you will build up a comprehensive understanding of this common language and how it integrates the different personas. Once the common language is established it will deliver significant benefits: reducing friction in the project, and leading to requirements that are unambiguous and consistent.

Step 2 - Prioritise

We're not going to cover this in depth in this blog. Imagine that you have decomposed the requirements into a set of features: each feature (and its lower level user stories) represents a unit of work that can be prioritised. Here, you gather the multi-disciplinary team and use techniques such as T-shirt sizing and business value to prioritise the features.

The output is a prioritised backlog of features that you can feed into your Sprint / Kanban based processes.

Step 3 - Refine

For each user story that has been prioritised, further effort is required to refine it and establish the acceptance criteria. These are expressed as a set of scenarios that adopt the Gherkin "Given [initial conditions], when [event], then [expected outcome]" syntax.

This process is iterative: analysis will spot inconsistencies and identify patterns across acceptance criteria statements, enabling you to adopt a consistent model of the language. The more consistent your language, the more you can reuse step language across different scenarios and use cases. One of the less commonly discussed benefits of BDD is the productivity curve you get compared to traditional unit tests:

Once you reach critical mass, you are not writing new given / when / then statements, you are composing pre-existing ones.

This means you can create new specifications that leverage your existing BDD APIs, allowing you to write new textual specification without needing to implement any more test code.

Let's now imagine that the two example features we provided above have been prioritised and need to be refined.

In the first instance, we refine the feature captured for the Star Wars fan persona:

Feature: Planetary terrains

    As a Star Wars fan
    I want to explore planetary terrains
    So that I can get a deeper understanding of the universe

We expand the feature, bringing it to life by asking Star Wars fans to provide specific "real world" scenarios. For example:

Feature: Planetary terrains 

    As a Star Wars fan
    I want to explore planetary terrains
    So that I can get a deeper understanding of the universe

    Scenario: Two planets which share a terrain
        Given we have a planet called 'Yavin IV'
        And the planet 'Yavin IV' has a 'jungle' terrain
        And the planet 'Yavin IV' has a 'rainforests' terrain
        And we have a planet called 'Dagobah'
        And the planet 'Dagobah' has a 'swamp' terrain
        And the planet 'Dagobah' has a 'jungles' terrain
        When we look for planets with a 'jungle' terrain
        Then the 'Yavin IV' planet should be displayed
        And the 'Dagobah' planet should be displayed

In practice, many scenarios would be developed for each feature. The key principle is that the domain experts should be the authors of the scenarios, with support as necessary from others in the team.

But even this single, simple scenario generates value at this stage by identifying an additional requirement that may not have been captured otherwise: there is a need to rationalise the use of singular and plural examples of some terrains, in this case "jungle" and "jungles". This highlights a benefit of the refinement step: the creation of real world scenarios as acceptance criteria drives out features that we had not identified. By identifying these new features as early in the lifecycle as possible we avoid waste: how many times do nuanced requirements such as this only come to light once the solution is in production?
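To make that gap concrete, here is a minimal sketch in plain Python. It assumes a hypothetical `planets_with_terrain` lookup that matches terrain tags exactly; the function name and the dictionary layout are illustrative and are not part of the original solution. Using the data from the scenario, an exact match on 'jungle' finds Yavin IV but misses Dagobah:

```python
# Terrain tags exactly as they appear in the scenario
planet_terrains = {
    "Yavin IV": ["jungle", "rainforests"],
    "Dagobah": ["swamp", "jungles"],
}


def planets_with_terrain(planet_terrains, terrain):
    """Hypothetical lookup: return planets whose tags contain an exact match."""
    return [
        planet
        for planet, terrains in planet_terrains.items()
        if terrain in terrains
    ]


matches = planets_with_terrain(planet_terrains, "jungle")
print(matches)  # ['Yavin IV']
```

Dagobah is tagged 'jungles' rather than 'jungle', so the scenario's final "Then" step would fail until singular and plural terrains are reconciled.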

If it wasn't already captured, we would then capture the new feature (see below), then assess relative priority to determine whether it needs to be implemented immediately or can be put onto the backlog for implementation in the future.

Feature: Consolidate singular and plural examples of terrains 

    As a Star Wars fan
    I want only singular terrains to be displayed
    So that I don't have to reconcile singular and plural examples of terrains in my analysis

    Scenario: Singular and plural example of terrain
        Given we have a terrain called 'jungle'
        And we have a terrain called 'jungles'
        When we consolidate terrains
        Then the 'jungle' terrain should be displayed
        But the 'jungles' terrain should not be displayed
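One way the "When we consolidate terrains" step might be backed is sketched below. It rests on a deliberately naive assumption: a plural is folded into its singular only when the singular form also appears in the data, and only a trailing "s" is handled. The function name `consolidate_terrains` is hypothetical.

```python
def consolidate_terrains(terrains):
    """Fold plural terrain names into a singular form that is also present.

    Naive sketch: only a trailing 's' is treated as a plural marker, and a
    plural is only collapsed when its singular appears in the same list.
    """
    known = set(terrains)
    consolidated = []
    for terrain in terrains:
        if terrain.endswith("s") and terrain[:-1] in known:
            terrain = terrain[:-1]  # e.g. 'jungles' -> 'jungle'
        if terrain not in consolidated:
            consolidated.append(terrain)
    return consolidated


displayed = consolidate_terrains(["jungle", "jungles"])
print(displayed)  # ['jungle']
```

A real implementation would need rules for irregular plurals, which is exactly the kind of detail that further scenarios written by domain experts would surface.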

Now we turn our attention to the feature captured for the Data Engineer persona:

Feature: Create bridge tables

    As a data engineer
    I want to parse entity tags into bridge tables
    So that I can create many-to-many relationships between entities

For data features such as this, there will be a more technical focus, where we are seeking to identify:

  • The range of data scenarios that the feature needs to support. This should consider important edge cases.
  • A concrete example of the inputs and the expected outputs for each scenario.

These scenarios may require the preparation of suitable test data and calculation of expected results.

For our Star Wars planets project, we may identify a range of scenarios:

  • Single planet with one terrain.
  • Single planet with multiple terrains.
  • Single planet with no terrains.
  • Multiple planets with mix of no, one and multiple terrains.

For data intensive scenarios such as this, we can exploit Gherkin's table syntax to capture tabular data (see below). This enables the specification to be less verbose but without losing the ability to get end users and other stakeholders involved in authoring / reviewing / refining and approving the document.

Supplying tables of data to articulate scenarios in this way enables people to transition from using Excel as a source of test data to Gherkin. This has the benefit of keeping the context of the test together with the inputs / outputs.

Feature: Create bridge tables

    As a data engineer
    I want to parse entity tags into bridge tables
    So that I can create many-to-many relationships between entities

    Scenario: Single entity with multiple tags
        Given we have an entity with the following tags
            | entity | tags                               |
            | Hoth   | tundra, ice caves, mountain ranges |
        When we parse the tags into the bridge table
        Then we expect the bridge table to contain the following entries
            | entity      | attribute       |
            | Hoth        | tundra          |
            | Hoth        | ice caves       |
            | Hoth        | mountain ranges |

What's important about all of the examples above is that there is no notion of any internal implementation: they do not describe what's going on under the covers, just how the system should behave from the perspective of the persona in each case. Authoring implementation independent specifications is important to protect the long term value of the document: we want to avoid re-work as a result of downstream work such as refactoring, where the software or infrastructure is being changed but the functionality it should deliver to end users is not.

Step 4 - Run the tests (expect them to fail)

Now that we have the executable specifications, we can execute them! From this point, for brevity, we'll take only the last feature forward.

The Gherkin specification is saved as a *.feature file in the ./features folder. In this case create_bridge_table.feature.

To enable execution, we have installed the Behave package which is an implementation of BDD for Python.

The documentation for Behave is comprehensive, so please reference that for more information about how to install and use the package.

With the Behave package installed, we simply run it from the command line. As you can see from the example below, Behave parses and executes the specified feature file:

$ behave features/create_bridge_table.feature
Feature: Create bridge tables # features/create_bridge_table.feature:1
  As a data engineer
  I want to parse entity tags into bridge tables
  So that I can create many-to-many relationships between entities
  Scenario: Single planet with multiple terrains  # features/create_bridge_table.feature:7
    Given we have the following input dataset     # None
      | planet_name | terrain                            |
      | Hoth        | tundra, ice caves, mountain ranges |
    When we create the bridge table               # None
    Then we expect the resulting dataset to be    # None
      | planet_name | terrain         |
      | Hoth        | tundra          |
      | Hoth        | ice caves       |
      | Hoth        | mountain ranges |

Failing scenarios:
  features/create_bridge_table.feature:7  Single planet with multiple terrains

0 features passed, 1 failed, 0 skipped
0 scenarios passed, 1 failed, 0 skipped
0 steps passed, 0 failed, 0 skipped, 3 undefined
Took 0m0.000s

You can implement step definitions for undefined steps with these snippets:

@given(u'we have the following input dataset')
def step_impl(context):
    raise NotImplementedError(u'STEP: Given we have the following input dataset')

@when(u'we create the bridge table')
def step_impl(context):
    raise NotImplementedError(u'STEP: When we create the bridge table')

@then(u'we expect the resulting dataset to be')
def step_impl(context):
    raise NotImplementedError(u'STEP: Then we expect the resulting dataset to be')

As expected, our tests fail. The important snippet from the output above is:

0 features passed, 1 failed, 0 skipped
0 scenarios passed, 1 failed, 0 skipped
0 steps passed, 0 failed, 0 skipped, 3 undefined

This indicates that the steps are "undefined" rather than having been executed and then "failed". In the output above, Behave helpfully provides three code stubs that can be used as a starting point for defining these steps. We will use these as the basis for the next step in the process.

Note - to execute all specifications, you would simply invoke:

$ behave

Behave then discovers all ./features/*.feature files which it will parse and execute in turn.

Step 5 - Write the code

We can now write the code necessary to implement the feature - this involves:

  • Implementing the functionality - in this case a class called PlanetTerrainWrangler with a method create_planet_terrain_bridge_table.
  • A "thin layer" of step code that will bridge the natural language statements in the Gherkin specification to the underlying functionality (the method above) that implements it.

In this example the underlying technology is Python and Pandas, but the same approach can be applied to any data engineering architecture.

This is visualised in the diagram below:

Illustrating the role that step code plays in bridging gap between specification and functionality
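The functionality itself is not listed in this blog, so the following is only a sketch of what PlanetTerrainWrangler could look like. It assumes a comma-separated terrain column alongside planet_name, as in the scenario, and that the class lives in the wranglers module imported by the step code. The body (split, explode, strip) is an assumption, not the original implementation.

```python
import pandas as pd


class PlanetTerrainWrangler:
    """Hypothetical sketch of the functionality exercised by the specification."""

    @staticmethod
    def create_planet_terrain_bridge_table(planets: pd.DataFrame) -> pd.DataFrame:
        # Turn "tundra, ice caves" into a list: ["tundra", " ice caves"]
        bridge = planets.assign(terrain=planets["terrain"].str.split(","))
        # One output row per (planet, terrain) pair
        bridge = bridge.explode("terrain")
        # Remove the whitespace left behind by the comma-separated format
        bridge["terrain"] = bridge["terrain"].str.strip()
        # Planets with no terrain contribute no rows to the bridge table
        bridge = bridge.dropna(subset=["terrain"])
        return bridge.reset_index(drop=True)


planets = pd.DataFrame(
    {"planet_name": ["Hoth"], "terrain": ["tundra, ice caves, mountain ranges"]}
)
bridge = PlanetTerrainWrangler.create_planet_terrain_bridge_table(planets)
print(bridge)
```

Keeping the method free of any Behave dependency means it can be called from the step code, from a notebook or from a pipeline without change.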

For the step code, we use the stubs provided above in step 4 in the output from Behave, ending up with the following Python code which is saved in the ./features/steps folder in the project:

from behave import given, when, then
from pandas import DataFrame
import numpy as np
from pandas.testing import assert_frame_equal

from wranglers import PlanetTerrainWrangler

def table_to_dataframe(table):

    # Extract records from Behave's table object
    list_of_records = []
    for row in table:
        list_of_records.append(dict(zip(row.headings, row.cells)))

    # Create a Pandas DataFrame
    df = DataFrame.from_records(list_of_records)
    # Parse any NaN strings, replacing them with a null value
    df = df.replace(to_replace="NaN", value=np.nan)
    return df

@given("we have the following input dataset")
def extract_dataset(context):
    context.input_dataset = table_to_dataframe(context.table)

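@when("we create the bridge table")
def create_bridge_table(context):
    # Pass the DataFrame captured by the "given" step to the functionality under test
    context.result = PlanetTerrainWrangler.create_planet_terrain_bridge_table(
        context.input_dataset
    )

@then("we expect the resulting dataset to be")
def compare_datasets(context):
    context.expected_result = table_to_dataframe(context.table)
    # Raises AssertionError if the actual and expected DataFrames differ
    assert_frame_equal(
        context.result, context.expected_result, check_like=True, check_index_type=False
    )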

Tip 1 - note the use of a table_to_dataframe helper function in the "given" and "then" steps. This enables the table syntax in the Gherkin specification to be converted into a Pandas DataFrame object that the create_planet_terrain_bridge_table method both needs as an input parameter and returns as an output.

Tip 2 - in the "then" step we make use of the Pandas assert_frame_equal method to verify that the result generated matches what is expected.
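The check_like=True argument is worth a closer look. The sketch below, using two small illustrative frames that are not taken from the project, shows that it makes the comparison ignore the order of column (and index) labels, whereas the default comparison treats a different column order as a failure:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Same data, but the columns are declared in a different order
result = pd.DataFrame(
    {"planet_name": ["Hoth", "Hoth"], "terrain": ["tundra", "ice caves"]}
)
expected = pd.DataFrame(
    {"terrain": ["tundra", "ice caves"], "planet_name": ["Hoth", "Hoth"]}
)

# Passes: label order is ignored, values are still compared label by label
assert_frame_equal(result, expected, check_like=True)

# The default (check_like=False) also compares the order of the columns
try:
    assert_frame_equal(result, expected)
    strict_comparison_passed = True
except AssertionError:
    strict_comparison_passed = False

print(strict_comparison_passed)  # False
```

This is useful for specifications because the author of a Gherkin table should not need to know the column order that the implementation happens to produce.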

The step code above is a simple example of the approach. In practice, we would:

  • Define the helper function in a separate Python module as it is likely to be used across multiple features.
  • Add more logic to allow for more sophisticated use cases. For example, the current implementation results in all columns being of type "object" (i.e. a string). We may need to add step logic to allow Gherkin scenarios to specify other data types in each column of the dataframe.
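As a sketch of the second point, one option is to let the table headings carry an optional type suffix. The "name:type" convention, the function name `typed_table_to_dataframe`, the diameter column and its value are all assumptions made for illustration; the function takes plain lists so that it can be tried without Behave.

```python
import pandas as pd


def typed_table_to_dataframe(headings, rows):
    """Build a DataFrame from headings such as 'diameter:int' and string cells.

    Hypothetical convention: a heading may end in ':<dtype>'; columns without
    a suffix are left as strings.
    """
    names, dtypes = [], {}
    for heading in headings:
        name, _, dtype = heading.partition(":")
        names.append(name)
        if dtype:
            dtypes[name] = dtype
    df = pd.DataFrame(rows, columns=names)
    # Only cast when at least one column declared a type
    return df.astype(dtypes) if dtypes else df


df = typed_table_to_dataframe(
    ["planet_name", "diameter:int"],
    [["Hoth", "7200"]],
)
print(df.dtypes)
```

Inside a step this could be called as typed_table_to_dataframe(context.table.headings, [row.cells for row in context.table]).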

Step 6 - Verify

With the step code in place, we can now re-run our tests by executing the Gherkin specification:

$ behave
Feature: Create bridge tables # features/create_bridge_table.feature:1
  As a data engineer
  I want to parse entity tags into bridge tables
  So that I can create many-to-many relationships between entities
  Scenario: Single planet with multiple terrains  # features/create_bridge_table.feature:7
    Given we have the following input dataset     # features/steps/ 0.007s
      | planet_name | terrain                            |
      | Hoth        | tundra, ice caves, mountain ranges |
    When we create the bridge table               # features/steps/ 0.017s
    Then we expect the resulting dataset to be    # features/steps/ 0.005s
      | planet_name | terrain         |
      | Hoth        | tundra          |
      | Hoth        | ice caves       |
      | Hoth        | mountain ranges |

1 feature passed, 0 failed, 0 skipped
1 scenario passed, 0 failed, 0 skipped
3 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.028s

We now have a specification that passes. In practice, it is unlikely that all steps will pass at the first attempt; this process may find bugs that you need to address.

Step 7 - Release

We now have code that is fully documented and tested, with a detailed audit trail of the tests that have been run, the inputs used, the outputs generated and overall pass / fail result. This provides a powerful feedback loop to end users and stakeholders who have been involved in writing the specifications, giving them the reassurance that the code is in line with their requirements and can therefore be released.

The automated execution of the Gherkin specifications would typically be integrated into the CI/CD pipelines such that passing all steps becomes a pre-requisite for release of the code, ensuring that quality assurance is baked into the release process.

Step 8 - Refactor

Refactoring and other actions to maintain the code such as bug fixing, patching / upgrading and infrastructure migration will be an ongoing concern.

The presence of implementation independent executable specifications will provide ongoing quality assurance for these tasks. Even under extreme circumstances such as implementing new underlying services (e.g. moving from Pandas to SQL API for data processing) only minor changes should be required to the "thin layer" of step code to get the tests to pass again.
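As an illustration of that implementation independence, here is a sketch of a hypothetical replacement that uses no Pandas at all. The function name `create_bridge_rows` and its list-of-dictionaries interface are assumptions. It produces the rows that the Hoth scenario expects, so only the step code that converts Gherkin tables to and from this shape would need to change; the specification itself would not.

```python
def create_bridge_rows(records, entity_key="planet_name", tags_key="terrain"):
    """Hypothetical plain-Python alternative to the Pandas implementation."""
    rows = []
    for record in records:
        # Treat a missing or empty tag list as "no terrains"
        tags = record.get(tags_key) or ""
        for tag in tags.split(","):
            tag = tag.strip()
            if tag:
                rows.append({entity_key: record[entity_key], tags_key: tag})
    return rows


rows = create_bridge_rows(
    [{"planet_name": "Hoth", "terrain": "tundra, ice caves, mountain ranges"}]
)
for row in rows:
    print(row)
```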


In summary, we have demonstrated how the methodology, tools and technology that support BDD can be applied to a data intensive use case.

We have used a simple "toy problem" to illustrate how this is possible. In the real world, the same principles can be applied to enable complex business critical use cases to be tackled.

The initial investment required to establish BDD involves creating the DevOps infrastructure and building up the critical mass of statements (and associated step code) to underpin the Gherkin statements that reflect the language specific to your business domain. Experience shows that once that investment has been made, the downstream benefits are significant:

  • It facilitates multi-disciplinary collaboration by providing an artefact around which business stakeholders and technology professionals can collaborate effectively.
  • By expressing requirements as features and user stories, it enables effective prioritisation and rapid delivery cycles.
  • By setting out clear requirements and concrete executable acceptance criteria, defects, waste and re-work are minimised.
  • Time to value is accelerated.
  • The requirements documentation and the code which implements it are kept in sync. We no longer have documentation that gets out of sync with the solution.
  • Individual features can be traced through the end to end lifecycle, making it possible to establish a "time to value" KPI for your product / service.

For example, we used BDD specs, written predominantly by a payroll expert, to define all of the HMRC payroll rules which we then implemented in code. This approach allowed our client to become the first firm in the UK to gain HMRC RTI accreditation.


How can I run Gherkin feature files in a Python environment? We recommend using the Behave Python package.
How can I use BDD to develop a data engineering solution? If the datasets required to provide concrete scenarios are relatively small, we recommend using the table syntax supported by Gherkin to enable inputs and expected outputs to be defined in the body of the specification. Then it's a case of writing the step code to enable these tables to be translated into a suitable object to enable the underlying functionality to be triggered.

Barry Smart

Director of Data & AI

Barry has spent over 25 years in the tech industry; from developer to solution architect, business transformation manager to IT Director, and CTO of a £100m FinTech company. In 2020 Barry's passion for data and analytics led him to gain an MSc in Artificial Intelligence and Applications.