By Liam Mooney, Software Engineer I
How To Implement Continuous Deployment of Python Packages with GitHub Actions

TL;DR: In this blog post we see how to use GitHub Actions, along with some nifty Python tools, to automatically publish updates to a Python package to a package index, with automated semantic versioning and changelog updates. It also touches on whether the presented solution constitutes Continuous Deployment. This is a follow-on from a previous post on CI with GitHub Actions.

In my previous post I introduced the components of a GitHub Actions workflow; I suggest you read that, or take a look at the GitHub Actions documentation, if you aren't already familiar with them. This post is a follow-on: it covers how to "complete the loop", going from making code changes to having those changes appear in production, where production in this case means an updated Python package available for download from PyPI. You might call it Continuous Deployment (or should you? I'll touch on this a little towards the end of the post).

Continuous Integration (CI) is the first part of that loop. As a reminder, CI means automating the process of integrating code changes into a repository. It typically goes as follows: you commit code to a remote code repository (e.g. GitHub), which triggers an automated build and test process on a remote server; if the code builds and the tests pass, the code is automatically integrated into the main branch. The build and test process acts as a quality gate over the main branch and ensures it is always in a deployable state. After the CI process succeeds, the updated code should be ready to be deployed into production, so that's the next step, i.e. "closing the loop" (the loop as a whole is often referred to as CI/CD).

Automating the deployment to production step can reap substantial benefits. Just as CI reduces the friction and drama associated with integrating changes into a code base, automating deployment reduces friction associated with deployment. It also helps get software into the hands of users quicker, which makes you more competitive, and, crucially, it tightens the feedback loop between making changes and receiving feedback from users, which allows you to adapt to users' needs quicker – further boosting competitiveness. All of this helps to unlock the value of the investment made in the code change.

Automating deployment in GitHub Actions

I'm using the same sample Python project as in my previous post on CI; you can find the code in this GitHub repository. The structure of the project is shown below.

shapes
 ┣ .github
 ┃ ┗ workflows
 ┃ ┃ ┗ python-app.yml
 ┣ lgm_shapes_test
 ┃ ┣ shapes_2d.py
 ┃ ┣ shapes_3d.py
 ┃ ┗ __init__.py
 ┣ tests
 ┃ ┣ test_shapes_2d
 ┃ ┃ ┗ test_circle.py
 ┃ ┣ test_shapes_3d
 ┃ ┃ ┗ test_sphere.py
 ┣ CHANGELOG.md
 ┣ poetry.lock
 ┣ pyproject.toml
 ┗ README.md

Some notable differences in the project structure compared to the CI post are the addition of a CHANGELOG.md file, which we'll come to later, and the renaming of the folder containing the Python modules that form the package, from shapes to lgm_shapes_test. This was necessary because the name of this folder becomes the name of the package when deployed, and the name 'shapes' was already taken in the package index.

As before, we have a couple of Python modules, shapes_2d.py and shapes_3d.py, containing some source code for working with shapes, and some test code under the tests folder: test_circle.py and test_sphere.py.

(There are some other files and folders - like .venv and .gitignore - that are in my repository but are not shown in the diagram above as they're not relevant to the topic of this blog post.)

The workflow file

Inside the .github/workflows/ directory at the root of the repo is the workflow file for this repo: python-app.yml. .github/workflows/ is the well-known directory that GitHub Actions looks in for workflow files, so be sure to store them there. The full workflow file, including the new CD piece, is shown below.

name: Python package

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9"]

    steps:
      - uses: actions/checkout@v3
      - name: Install Poetry
        run: pipx install poetry
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
          cache: "poetry"
      - name: Install dependencies
        run: poetry install
      - name: Test with Pytest
        run: poetry run pytest
      - name: Lint with flake8
        run: poetry run flake8

  release:
    needs: build
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9"]
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Install Poetry
        run: pipx install poetry
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
          cache: "poetry"
      - name: Install dependencies
        run: poetry install
      - name: Prepare package for release
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          poetry run semantic-release publish
      - name: Publish to TestPyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.TEST_PYPI_API_TOKEN }}
          repository_url: https://test.pypi.org/legacy/

The CI process is formed by the build job and is unchanged from my previous post, except that the value of the python-version key has changed from ["3.9", "3.10"] to ["3.9"]. This just means the CI process now runs once, with Python 3.9, instead of twice, with Python 3.9 and 3.10. I have also adjusted the workflow trigger, set by the on key near the top, from [push] to

on:
  push:
    branches:
      - main
  pull_request:

This means the workflow will be triggered when a PR is opened, when changes are pushed to a branch with an open PR, and when changes are pushed to main. I've also added a branch protection policy on main to prevent changes being pushed directly to it. So the only way to get changes into main on GitHub is to push them to a branch and open a PR to merge that branch into main; merging a PR is therefore the only situation in which the push trigger in my workflow will cause a workflow run.
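To make these trigger rules concrete, here is a small Python model of them. It's a simplified sketch, not how GitHub evaluates workflows internally: the function name workflow_triggers is hypothetical, and it assumes GitHub's default pull_request activity types (opened, synchronize, reopened) apply, since the workflow doesn't list any types.

```python
from typing import Optional

# Branch filter from the workflow's `on.push.branches` key.
PUSH_BRANCHES = {"main"}

# Activity types GitHub uses for `pull_request` when no `types` key is given.
DEFAULT_PR_TYPES = {"opened", "synchronize", "reopened"}


def workflow_triggers(event_name: str, branch: str, pr_action: Optional[str] = None) -> bool:
    """Simplified model of whether python-app.yml starts a run for an event."""
    if event_name == "push":
        # Pushes only start a run when they land on a listed branch.
        return branch in PUSH_BRANCHES
    if event_name == "pull_request":
        # No branch filter on pull_request, so any PR activity of a default type counts.
        return pr_action in DEFAULT_PR_TYPES
    return False


# A push to the feature branch alone doesn't match the push trigger...
print(workflow_triggers("push", "feature/cylinder"))  # False
# ...but with a PR open, that same push raises a pull_request "synchronize" event.
print(workflow_triggers("pull_request", "feature/cylinder", "synchronize"))  # True
```

In other words, commits pushed to a feature branch are covered by the pull_request trigger (once a PR exists), while the push trigger only fires for main.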

The CD process is formed by the release job. Let's walk through it step by step to see what's going on.

needs: build

The needs key allows you to specify jobs that must complete successfully before this job will run. In this case, we're saying that the release job should only run when the build job completes successfully. This makes sense: we don't want to deploy to production if the CI process has failed.

if: github.event_name == 'push' && github.ref == 'refs/heads/main'

The if key on a job allows you to specify a condition that must be true for the job to run. Here, we're saying that the release job should only run if the workflow was triggered by a push to main. Given the workflow trigger configuration described earlier, this means the release job will only run when a PR is merged into main, which is the behaviour we want: we don't want to rebuild the package and push it to the package index when a PR is opened, or when a commit is pushed to a PR branch.
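The combined effect of needs and if can be modelled in a few lines of Python. This is a hypothetical helper (jobs_that_run) that mirrors the rules in the workflow, not GitHub's expression evaluator; it treats a failed build as "needs not satisfied", in which case GitHub skips the release job.

```python
def jobs_that_run(event_name: str, ref: str, build_succeeded: bool) -> list:
    """Simplified model of which jobs in python-app.yml execute once a run has started."""
    jobs = ["build"]  # build has no `needs` or `if`, so it always runs
    # `needs: build` -> release waits for build and is skipped if build fails.
    # `if:` -> the expression must also be true for release to run.
    if build_succeeded and event_name == "push" and ref == "refs/heads/main":
        jobs.append("release")
    return jobs


# Opening a PR: github.ref looks like refs/pull/<number>/merge, so release is skipped.
print(jobs_that_run("pull_request", "refs/pull/1/merge", build_succeeded=True))  # ['build']
# Merging the PR: a push to main, and build passed, so release runs too.
print(jobs_that_run("push", "refs/heads/main", build_succeeded=True))  # ['build', 'release']
```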

runs-on: ubuntu-latest
strategy:
  matrix:
    python-version: ["3.9"]

I covered these two pieces in my previous post. The first line specifies that the job runs on a VM (what GitHub calls a runner) with the latest version of Ubuntu; the next three lines define a matrix of job configurations. Typically you would have multiple values in the matrix (as I had previously).
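GitHub expands a matrix into the cartesian product of its values and starts one job run per combination. The sketch below models just that part (it ignores the include and exclude keys, which GitHub also supports); expand_matrix is a hypothetical name.

```python
from itertools import product


def expand_matrix(matrix: dict) -> list:
    """Return one configuration dict per job run: the cartesian product of the values."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[key] for key in keys))]


print(expand_matrix({"python-version": ["3.9"]}))
# [{'python-version': '3.9'}]
print(len(expand_matrix({"python-version": ["3.9", "3.10"]})))
# 2
```

So the previous ["3.9", "3.10"] value produced two runs of the build job, while ["3.9"] produces one; adding a second key (say, a hypothetical list of operating systems) would multiply the count.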

Next is the steps key, a child of the job id (release). This is the heart of the job: it defines the steps that make up the work the job executes.

The first step of the release job is

- uses: actions/checkout@v3
  with:
    fetch-depth: 0

which will run the actions/checkout@v3 action with an argument of 0 for the parameter fetch-depth.

The actions/checkout@v3 action copies your repository onto the runner VM, allowing the workflow to run scripts and actions against a copy of the code. Specifying fetch-depth: 0 tells the action to fetch the full commit history onto the runner VM (by default it fetches only the latest commit). This is necessary because I'm using an automatic semantic versioning tool, Python semantic release, which parses the repo's entire commit history to figure out what the version of the package should be.

Also, it's necessary to include this action despite having it in the build job, as each job runs in its own fresh runner environment.
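To see why the full history matters, consider a simplified model of how a versioning tool might find the commits made since the last release: walk back from the newest commit until it reaches a tagged (released) commit. This is a sketch under that assumption, not python-semantic-release's actual algorithm; the function name and the commit history are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Commit = Tuple[str, str]  # (sha, message)


def commits_since_last_release(history: List[Commit], tagged: Set[str]) -> Optional[List[str]]:
    """Collect messages of commits newer than the latest tagged (released) commit.

    `history` is the list of commits present in the clone, newest first.
    Returns None if no tagged commit is present, i.e. the last release can't be found.
    """
    messages = []
    for sha, message in history:
        if sha in tagged:
            return messages
        messages.append(message)
    return None


# Hypothetical history, newest first; c1 is the commit tagged for the 0.2.2 release.
sample_history = [
    ("c3", "feat: adds cylinder type"),
    ("c2", "docs: update README"),
    ("c1", "0.2.2"),
    ("c0", "initial commit"),
]
sample_tags = {"c1"}

print(commits_since_last_release(sample_history, sample_tags))      # fetch-depth: 0
# ['feat: adds cylinder type', 'docs: update README']
print(commits_since_last_release(sample_history[:1], sample_tags))  # fetch-depth: 1
# None
```

With only the latest commit available, the previous release simply isn't visible, so the tool has nothing to compare against.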

The next step

- name: Install Poetry
  run: pipx install poetry

executes the shell command pipx install poetry, which installs Poetry, the Python dependency management tool I'm using in this project.

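Skipping over the Set up Python step (covered in my previous post) and the Install dependencies step (covered below), the next step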

- name: Prepare package for release
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
      git config user.name github-actions
      git config user.email github-actions@github.com
      poetry run semantic-release publish

runs three commands. The last of these, poetry run semantic-release publish, calls the publish command of the semantic-release tool. This command is the core of this step and an important part of the workflow, so let's talk about it a little.

The semantic-release publish command does a sequence of things:

  1. Updates the CHANGELOG.md file to document what has changed since the previous release
  2. Bumps the package version number based on the commits made since the latest release; the official documentation explains how it does this.
  3. Tags the repo at the current state, pushes the tag to the GitHub repository, and creates a release from the tag on GitHub
  4. Builds the sdist and wheel files, which make up the package
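Step 2 is the most interesting. Below is a simplified model of how a version bump can be derived from commit messages, assuming Angular-style messages (feat:, fix:, perf:, and a ! or "BREAKING CHANGE" for breaking changes), which is roughly the convention python-semantic-release parses by default. It's a sketch, not the tool's implementation; the function names are hypothetical, and the patch_without_tag argument mirrors the setting of that name in the configuration below.

```python
import re
from typing import Iterable, Optional

RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}
HEADER = re.compile(r"(\w+)(\([^)]*\))?(!)?:")  # type(scope)!: subject


def bump_level(message: str) -> Optional[str]:
    """Classify one commit message using simplified Angular-style rules."""
    if "BREAKING CHANGE" in message:
        return "major"
    match = HEADER.match(message)
    if match is None:
        return None  # not a conventional commit message
    kind, _scope, bang = match.groups()
    if bang:
        return "major"
    if kind == "feat":
        return "minor"
    if kind in ("fix", "perf"):
        return "patch"
    return None  # e.g. docs, style, refactor, test


def next_version(current: str, messages: Iterable[str], patch_without_tag: bool = True) -> str:
    """Work out the next a.b.c version from the commit messages since the last release."""
    level = max((bump_level(m) for m in messages), key=RANK.__getitem__, default=None)
    if level is None and patch_without_tag:
        level = "patch"  # mirrors `patch_without_tag = true` in pyproject.toml
    major, minor, patch = (int(part) for part in current.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current


print(next_version("0.2.2", ["feat: adds cylinder type"]))  # 0.3.0
```

The highest-ranking commit wins, so a release containing both a fix and a feature gets a minor bump. With the feat commit used later in this post, the model goes from 0.2.2 to 0.3.0.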

I have provided configuration settings to the Python semantic release tool via this section in the pyproject.toml:

[tool.semantic_release]
version_variable = "pyproject.toml:version" # version location
branch = "main"                             # branch to make releases of
changelog_file = "CHANGELOG.md"             # changelog file
build_command = "poetry build"              # build dists
dist_path = "dist/"                         # where to put dists
upload_to_release = true                    # auto-create GitHub release
upload_to_pypi = false                      # don't auto-upload to PyPI
remove_dist = false                         # don't remove dists
patch_without_tag = true                    # patch release by default

I learned about these configuration settings, and copied the explanatory comments, from this online book; you can also find the Python semantic release tool's configuration options on this page of the official documentation.
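The version_variable setting names a file and a variable within it, separated by a colon. As a rough sketch of what that amounts to (an assumption about the mechanism, not the tool's code; apply_version_variable is a hypothetical helper), the setting can be split and a regular expression used to rewrite the quoted version string:

```python
import re


def apply_version_variable(setting: str, files: dict, new_version: str) -> dict:
    """Rewrite `<variable> = "x.y.z"` in the file named by a "file:variable" setting.

    `files` maps file names to their contents (an in-memory stand-in for the repo);
    a new mapping is returned and the input is left unchanged.
    """
    path, variable = setting.split(":", 1)
    # `^` restricts matches to the start of a line; requiring `\s*=` right after the
    # name stops longer keys such as `version_variable` matching `version`.
    pattern = re.compile(rf'^({re.escape(variable)}\s*=\s*")[^"]*(")', re.MULTILINE)
    updated, count = pattern.subn(rf"\g<1>{new_version}\g<2>", files[path], count=1)
    if count == 0:
        raise ValueError(f"{variable!r} not found in {path}")
    return {**files, path: updated}


pyproject = '[tool.poetry]\nname = "lgm_shapes_test"\nversion = "0.2.2"\n'
updated = apply_version_variable("pyproject.toml:version", {"pyproject.toml": pyproject}, "0.3.0")
print(updated["pyproject.toml"])
```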

So, that's a pretty nifty command: it's doing a lot for us. Now for the other parts of this step. The semantic release tool makes changes to our repo (updating the CHANGELOG.md file, and creating the tag and release), so it needs permission to do this. GitHub provides that permission via the secret secrets.GITHUB_TOKEN; the next two lines

  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

set an environment variable called GH_TOKEN with a value equal to this secret. semantic-release publish then uses this environment variable to authenticate with GitHub to make changes on the repo.
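The environment variable is simply read by the process at runtime. As an illustration of that mechanism only (python-semantic-release handles this internally, and github_auth_header is a hypothetical helper, not part of its API), a Python tool could turn the variable into an Authorization header for GitHub's REST API like this:

```python
import os
from typing import Mapping


def github_auth_header(env: Mapping[str, str] = os.environ) -> dict:
    """Build an HTTP Authorization header for GitHub's REST API from GH_TOKEN."""
    token = env.get("GH_TOKEN")
    if not token:
        raise RuntimeError("GH_TOKEN is not set; check the step's env block in the workflow")
    # GitHub's REST API accepts a token sent as "Bearer <token>".
    return {"Authorization": f"Bearer {token}"}
```

Defining the variable under the step's env, rather than at workflow level, keeps the token scoped to the one step that needs it.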

The other two commands, git config user.name github-actions and git config user.email github-actions@github.com, are necessary for reasons explained in this bit of GitHub documentation; in short, Git needs an author identity before the tool can commit the updated version number and changelog.

Going back a step, the Install dependencies step, which runs just before Prepare package for release,

- name: Install dependencies
  run: poetry install

runs poetry install, which installs the project's dependencies as pinned in the repo's poetry.lock file. This ensures that the environment on the GitHub runner in which the code runs has the dependencies the code requires. These are the same dependencies as in the development environment (i.e. the environment used to develop the package), and they include the runtime dependencies that anyone wanting to use this package will also need to install (i.e. the production environment).

The final step

- name: Publish to TestPyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    user: __token__
    password: ${{ secrets.TEST_PYPI_API_TOKEN }}
    repository_url: https://test.pypi.org/legacy/

publishes the build outputs to TestPyPI, a test instance of the official Python Package Index (PyPI). I have created an account on TestPyPI, generated an API token, and stored its value as a secret on GitHub called TEST_PYPI_API_TOKEN. This step uses the pypa/gh-action-pypi-publish action to perform the publication (you could also do this directly with poetry publish), and I'm providing the API token value as the argument to the password parameter.


The workflow in action

I've added a new Cylinder class to shapes_3d.py, as the screenshot below shows.

[Screenshot: Cylinder class added to shapes_3d.py]

I then committed this change to a feature branch (feature/cylinder) with the commit message 'feat: adds cylinder type', and pushed it up to GitHub. After that, I opened a PR into main; the name of the PR is the same as the commit message. This triggered a workflow run, but the run didn't meet the release job's conditions, so only the build job ran, as the screenshot below shows.

[Screenshot: GitHub Actions workflow run for the PR]

Later, I merged the PR into main, which triggered another workflow run (merging the PR causes a push event on main). This time the conditions of the release job were met, as the screenshot below shows.

[Screenshot: GitHub Actions workflow run after merging the PR into main]

The version of the package before this change was 0.2.2; the semantic release tool has recognised the 'feat' keyword in the commit message and bumped the minor part of the version number, resulting in a new package version: 0.3.0.


As I explained earlier, the repo will be tagged at this point in its history and a release will be created from that tag on GitHub, as shown by the screenshot below.

[Screenshot: GitHub tag and release]

And, the new package version has been pushed to TestPyPI, as shown by the screenshot below.

[Screenshot: New package version on TestPyPI]

Although the example shown demonstrates the package being pushed to TestPyPI, the process would be exactly the same if you wanted to push to the real PyPI. However, you may want to keep the push to TestPyPI as part of your workflow, to test that the package can be downloaded correctly. In that case you would have a couple of extra steps in your workflow: push to TestPyPI, then download the package from TestPyPI, then, if the previous steps ran smoothly, push to the real PyPI.

One area that could be improved in the workflow presented is preventing slippage between the build and release jobs. Notice that the build and release jobs each install the package's dependencies via poetry install; we can't be certain that the dependency versions installed in these two steps are exactly the same, and consequently we cannot be certain that the code behaviour tested in the build job is exactly the same as what's built and deployed in the release job. Ideally, the exact code that is tested in the build job would be what gets released in the release job. This could be achieved by building the package artifacts in the build job and caching them (or uploading them as workflow artifacts), then having the release job retrieve them and use them for publication.
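If you go down that route, one cheap extra safeguard is to record a checksum of each built file in the build job and check it again in the release job just before publishing. A minimal sketch, assuming the files sit in dist/ (the dist_path configured earlier) and that the manifest travels between jobs alongside the files, for example via the actions/upload-artifact and actions/download-artifact actions; the helper names are hypothetical:

```python
import hashlib
from pathlib import Path
from typing import Dict


def dist_manifest(dist_dir: str) -> Dict[str, str]:
    """Map each file in dist_dir (e.g. the sdist and wheel) to its SHA-256 digest."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(dist_dir).iterdir())
        if path.is_file()
    }


def verify_dists(dist_dir: str, expected: Dict[str, str]) -> None:
    """Raise if the files about to be published differ from the ones recorded after testing."""
    actual = dist_manifest(dist_dir)
    if actual != expected:
        raise RuntimeError(f"dist/ does not match the tested build: {actual} != {expected}")
```

The build job would save dist_manifest("dist") after the tests pass, and the release job would call verify_dists("dist", saved_manifest) before the publish step, failing the run on any mismatch.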

Is this Continuous Deployment?

You may have noticed that I've refrained from using the term "continuous deployment" up to now. Why? Well, I originally wrote the post centred around CD: the first draft was titled "Continuous deployment with GitHub Actions" and contained exactly the same workflow as the one shown here. I essentially declared "Continuous Deployment = Done". Now, if you define Continuous Deployment as something like "automating the process of deploying code changes to production", then I think the workflow presented constitutes CD. But that glosses over many of the concerns and challenges associated with implementing CD.

If you think about what the statement "automating the process of deploying code changes to production" means, it means removing the human from the loop. In practice that might look something like this: a developer makes changes on a feature branch and issues a pull request into main, at which point the CI process kicks in (you might have a policy here that prevents merging without a reviewer's approval, but you might not); if CI passes, the code gets merged, and the new main branch, now including the code changes, gets pushed to production. This is clearly a much riskier situation: you make it much easier to introduce bad code into the production system.

So, if you want to completely take the human out of the loop you need to ask: "what processes do we need in place to be confident before taking the human out of the loop?". Put another way, the work required to go from a situation where everything in your deployment pipeline is automated up to the final "Push to Production button", which has to be pressed by a human, to a situation where you're confident enough to allow a computer to push that button for you (i.e. fully automated), is likely going to be significant.

For reference, Microsoft provide definitions for some of these popular terms (Continuous Integration, Continuous Delivery, Continuous Integration/Continuous Delivery (CI/CD), and Continuous Deployment) in this Azure article.

Continuous Integration

Under continuous integration, the develop phase—building and testing code—is fully automated. Each time you commit code, changes are validated and merged to the master branch, and the code is packaged in a build artifact.

Continuous Delivery

Continuous delivery automates the next phase: deliver. Under continuous delivery, anytime a new build artifact is available, the artifact is automatically placed in the desired environment and deployed.

Continuous Integration/ Continuous Delivery (CI/CD)

When teams implement both continuous integration and continuous delivery (CI/CD), the develop and the deliver phases are automated. Code remains ready for production at any time. All teams must do is manually trigger the transition from develop to deploy—making the automated build artifact available for automatic deployment—which can be as simple as pressing a button.

Continuous Deployment

With continuous deployment, you automate the entire process from code commit to production. The trigger between the develop and deliver phases is automatic, so code changes are pushed live once they receive validation and pass all tests. This means customers receive improvements as soon as they’re available.

It's also worth saying that a complete CI/CD process for an application that runs live and interacts with infrastructure such as databases (a web app, for example) is likely to be considerably more complicated than what I've presented. Such an example would usually involve deploying to multiple environments in addition to production, such as an integration environment to test whether the code integrates successfully with a replica of the production infrastructure, and a performance environment for performance testing. A full DevOps solution would also use infrastructure as code (IaC) for programmatic deployment and management of infrastructure, which would have its own CI/CD processes for automated change management and deployment of infrastructure, just like application code; see my colleague James Dawson's article on GitOps.

Perhaps I'll get into this more in a future post.

@lg_mooney | @endjin

FAQs

What is Continuous Integration? Continuous Integration is a practice for automating the process of *integrating* code changes into the main branch. Under Continuous Integration, when a developer pushes changes to the remote repository, an automated build and test process is triggered; changes cannot be merged into main unless these processes succeed. They therefore act as a quality gate over the main branch and ensure it is always in a deployable state.
What is Continuous Delivery? Continuous delivery comes from the idea of continuously delivering value to customers by frequently releasing new software, i.e. automating *delivery*. In practice this means automating everything after the CI process up to the point where the build artifact is ready to be deployed to production at the push of a button.
What is CI/CD? Continuous Integration and Continuous Delivery (CI/CD) is achieved when both Continuous Integration and Continuous Delivery are implemented - automating the process from development up to the point of manually pushing a button to deploy.
What is Continuous Deployment? Under continuous deployment, you fully automate the entire development to release process. As soon as changes pass the CI process and any other validation checks, they’re automatically deployed to production. In other words, the human is removed from the loop; there is no final “push to production” button, it’s fully automated.
What is semantic versioning? Semantic versioning is a particular strategy for versioning software that provides a common language for version numbers. A semantic version number has three parts, a.b.c; for example 2.1.1, pronounced "two dot one dot one". The first number, a, is called the major version and is incremented when a breaking change is made. The second, b, is called the minor version and is incremented when functionality is added in a backward-compatible way. The third, c, is called the patch version and is incremented when backward-compatible bug fixes are made.
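Because each part is a number, versions should be compared part by part as integers rather than as strings. A minimal sketch (parse_semver is a hypothetical helper, and it ignores the pre-release and build-metadata suffixes the full specification allows):

```python
from typing import Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Split an a.b.c version into integers (pre-release/build suffixes not handled)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


versions = ["0.10.0", "0.3.0", "0.2.2", "1.0.0", "0.9.1"]
print(sorted(versions))                    # ['0.10.0', '0.2.2', '0.3.0', '0.9.1', '1.0.0']
print(sorted(versions, key=parse_semver))  # ['0.2.2', '0.3.0', '0.9.1', '0.10.0', '1.0.0']
```

Plain string sorting puts 0.10.0 before 0.2.2; comparing tuples of integers gives the intended order.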

Liam Mooney

Software Engineer I


Liam studied an MSci in Physics at University College London, which included modules on Statistical Data Analysis, High Performance Computing, Practical Physics and Computing. This led to his dissertation exploring the use of machine learning techniques for analysing LHC particle collision data.

Before joining endjin, Liam had a keen interest in data science and engineering, and did a number of related internships. However, since joining endjin he has developed a much broader set of interests, including DevOps and more general software engineering. He is currently exploring those interests and finding his feet in the tech space.