How To Implement Continuous Deployment of Python Packages with GitHub Actions
TL;DR: In this blog post we see how to use GitHub Actions, along with some nifty Python tools, to automatically publish updates to a Python package to a package index, with automated semantic versioning and changelog updates. It also touches on whether the presented solution constitutes Continuous Deployment. This is a follow-on from a previous post on CI with GitHub Actions.
In my previous post I introduced the components of a GitHub Actions workflow; I suggest you read that, or take a look at the GitHub Actions documentation, if you aren't already familiar. This post is a follow-on: it covers how to "complete the loop" - to go from making code changes to having those changes appear in production, where production in this case means having an updated Python package available for download from PyPI. You might call it Continuous Deployment (or should you? I'll touch on this a little towards the end of the post).
Continuous Integration (CI) is the first part of that loop; as a reminder, CI basically means automating the process of integrating code changes into a repository. It typically goes as follows: you commit code to a remote code repository (e.g. GitHub), which triggers an automated build and test process on a remote server; if the code builds and the tests pass, the code is automatically integrated into the main branch. The build & test process acts as a quality gate over the main branch and ensures it is always in a deployable state. After the CI process succeeds the updated code should be ready to be deployed into production, so that's the next step - i.e. "closing the loop" (where the loop is often referred to as CI/CD).
Automating the deployment-to-production step can reap substantial benefits. Just as CI reduces the friction and drama associated with integrating changes into a code base, automating deployment reduces the friction associated with deploying them. It also helps get software into the hands of users quicker, which makes you more competitive, and, crucially, it tightens the feedback loop between making changes and receiving feedback from users, which allows you to adapt to users' needs quicker - further boosting competitiveness. All of this helps to unlock the value of the investment made in the code change.
Automating deployment in GitHub Actions
I'm using the same sample Python project as in my previous post on CI; you can find the code in this GitHub repository. The structure of the project is shown below.
shapes
┣ .github
┃ ┗ workflows
┃ ┃ ┗ python-app.yml
┣ lgm_shapes_test
┃ ┣ shapes_2d.py
┃ ┣ shapes_3d.py
┃ ┗ __init__.py
┣ tests
┃ ┣ test_shapes_2d
┃ ┃ ┗ test_circle.py
┃ ┣ test_shapes_3d
┃ ┃ ┗ test_sphere.py
┣ CHANGELOG.md
┣ poetry.lock
┣ pyproject.toml
┗ README.md
Some notable differences in the project structure compared to the CI post include the addition of a CHANGELOG.md file, which we'll come to later, and the renaming of the folder containing the Python modules that form the package, from shapes to lgm_shapes_test. This was necessary because the name of this folder becomes the name of the package when deployed, and the name 'shapes' was already taken in the package index.
As before, we have a couple of Python modules, shapes_2d.py and shapes_3d.py, containing some source code for working with shapes, and some test code underneath the tests folder - test_circle.py and test_sphere.py.
(There are some other files and folders - like .venv and .gitignore - that are in my repository but are not shown in the diagram above, as they're not relevant to the topic of this blog post.)
The workflow file
Inside the .github/workflows/ directory at the root of the repo is the workflow file for this repo: python-app.yml. .github/workflows/ is the well-known directory that GitHub Actions looks in for workflow files, so be sure to store them there. The full workflow file, including the new CD piece, is shown below.
name: Python package

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9"]
    steps:
      - uses: actions/checkout@v3
      - name: Install Poetry
        run: pipx install poetry
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
          cache: "poetry"
      - name: Install dependencies
        run: poetry install
      - name: Test with Pytest
        run: poetry run pytest
      - name: Lint with flake8
        run: poetry run flake8

  release:
    needs: build
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9"]
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Install Poetry
        run: pipx install poetry
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
          cache: "poetry"
      - name: Install dependencies
        run: poetry install
      - name: Prepare package for release
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          poetry run semantic-release publish
      - name: Publish to TestPyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.TEST_PYPI_API_TOKEN }}
          repository_url: https://test.pypi.org/legacy/
The CI process is formed by the build job and is unchanged from previously, except for a change to the array value of the python-version key, from ["3.9", "3.10"] to ["3.9"]; this just means the CI process will run once with Python 3.9 instead of twice, with Python 3.9 and 3.10. I have also adjusted the trigger on the workflow - set by the on key near the top - from [push] to

push:
  branches:
    - main
pull_request:

meaning the workflow will be triggered when a PR is opened, when changes are pushed to a branch with an open PR, and when changes are pushed into main. I've also added a branch protection policy on main to prevent changes being pushed directly to it, meaning the only way to get changes into main on GitHub is to push changes to a branch and submit a PR to get that branch merged into main; therefore, that's the only situation in which the push trigger in my workflow will cause a workflow run.
The CD process is formed by the release job; let's walk through it step-by-step to see what's going on.
needs: build
The needs key allows you to specify any jobs that must complete successfully before this job will run. In this case, we're saying that the release job should only run when the build job completes successfully, which makes sense - we don't want to deploy to production if the CI process has failed.
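As a hypothetical illustration (the docs job name here is invented), a job can also wait on several jobs at once:

# Sketch: 'deploy' runs only after both 'build' and 'docs' succeed
deploy:
  needs: [build, docs]
  runs-on: ubuntu-latest
  steps:
    - run: echo "all prerequisite jobs passed"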
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
The if key on a job allows you to specify a condition that must pass in order for the job to run. Here, we're saying that the release job should only run if the event that triggered the workflow was a commit pushed to main. Given the workflow trigger configuration described earlier, this means that the release job will only run when a PR is merged into main, which is the behaviour we want - we don't want to rebuild the package and push to the package index when a PR is opened, or when a commit is pushed to a PR branch.
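The same mechanism supports other conditions via GitHub's expression syntax; the following are illustrative sketches, not part of my workflow:

if: github.event_name == 'pull_request'     # run the job only for PR events
if: startsWith(github.ref, 'refs/tags/')    # run the job only for pushed tags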
runs-on: ubuntu-latest
strategy:
  matrix:
    python-version: ["3.9"]
I covered these two pieces in my previous post: the first line specifies the OS of the VM (or what GitHub calls a runner) that the job will run on - the latest version of Ubuntu; the next three lines define a matrix of job configurations. Typically you would have multiple values in the matrix (as I had previously), as sketched below.
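For example, a hypothetical multi-value matrix like this would run the job once per listed Python version:

strategy:
  matrix:
    python-version: ["3.9", "3.10", "3.11"]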
Next is the steps key, which is a child of the job id (release). This is the heart of the job: it defines the work that the job executes.
The first step of the release job is
- uses: actions/checkout@v3
  with:
    fetch-depth: 0
which runs the actions/checkout@v3 action with an argument of 0 for the parameter fetch-depth.
The actions/checkout@v3 action copies your repository onto the runner VM, allowing the workflow to run scripts and actions against a copy of the code. Specifying fetch-depth: 0 tells the action to copy the full history of commits onto the runner VM. This is necessary because I'm using an automatic semantic versioning tool - Python Semantic Release - that parses the entire commit history of the repo to figure out what the version of the package should be.
Also, it's necessary to include this action despite having it in the build job, because each job runs in its own fresh runner environment.
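If you want to see the effect of fetch-depth, a hypothetical debugging step like the one below would report how much history the runner can see - the full commit count with fetch-depth: 0, but just 1 with the default shallow clone:

- name: Count commits available to the job
  run: git rev-list --count HEAD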
The next step

- name: Install Poetry
  run: pipx install poetry

executes the shell command pipx install poetry, which installs a Python package called Poetry - the dependency management tool I'm using in this project.
Skipping ahead past the Python setup and dependency installation steps (the former is unchanged from the build job; I'll come back to the latter below), the next notable step

- name: Prepare package for release
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    git config user.name github-actions
    git config user.email github-actions@github.com
    poetry run semantic-release publish

runs three commands, the last of which is poetry run semantic-release publish; this calls the publish command of the 'semantic-release' tool. This command is the core of this step and an important part of the workflow, so let's talk about it a little.
The semantic-release publish command does a sequence of things:

- Updates the CHANGELOG.md file to document what has changed since the previous release
- Bumps the package version number by parsing the commit history since the latest release - you can see how it does that by reading the official documentation (and in the example after this list)
- Tags the repo at the current state, pushes the tag to the GitHub repository, and creates a release from the tag on GitHub
- Builds the sdist and wheel files, which make up the package
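To make the version bumping concrete, here's a sketch of how Angular-style commit prefixes typically map to bumps under the tool's default commit parser (the commit messages and version numbers here are hypothetical):

fix: correct circle area calculation   -> patch release (e.g. 0.2.2 to 0.2.3)
feat: adds cylinder type               -> minor release (e.g. 0.2.2 to 0.3.0)
feat: rework the shapes API            -> major release (e.g. 0.2.2 to 1.0.0)
(the last with "BREAKING CHANGE: ..." in the commit body)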
I have provided configuration settings to the Python Semantic Release tool via this section in the pyproject.toml:
[tool.semantic_release]
version_variable = "pyproject.toml:version" # version location
branch = "main" # branch to make releases of
changelog_file = "CHANGELOG.md" # changelog file
build_command = "poetry build" # build dists
dist_path = "dist/" # where to put dists
upload_to_release = true # auto-create GitHub release
upload_to_pypi = false # don't auto-upload to PyPI
remove_dist = false # don't remove dists
patch_without_tag = true # patch release by default
I took these configuration settings (and the explanatory comments) from this online book; you can also find the Python Semantic Release tool's configuration options on this page of the official documentation.
So, that's a pretty nifty command - it's doing a lot for us. Now for the other parts of this step. The semantic release tool is making changes to our repo - updating the CHANGELOG.md file and creating the tag & release - and it needs permission to do this. GitHub provides a mechanism to grant permission via the secret secrets.GITHUB_TOKEN; these two lines

env:
  GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

set an environment variable called GH_TOKEN with a value equal to this secret. semantic-release publish then uses this environment variable to authenticate with GitHub to make changes to the repo.
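One aside worth knowing (not something my workflow needed): depending on your repository's Actions settings, the default token can be read-only, in which case you can grant write access explicitly at the top of the workflow file with the permissions key:

permissions:
  contents: write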
The other two commands - git config user.name github-actions & git config user.email github-actions@github.com - are necessary for reasons that are explained in this bit of GitHub documentation.
Now, back to the step I skipped over

- name: Install dependencies
  run: poetry install

which runs poetry install to install the project's dependencies, as specified in the poetry.lock file in the repo. This ensures that the environment on the GitHub runner in which the code is being run has the dependencies the code requires installed. These are the same dependencies as in the development environment (i.e. the environment used to develop the package), and are the dependencies that will need to be installed by anyone wanting to use this package (i.e. the production environment).
The final step

- name: Publish to TestPyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    user: __token__
    password: ${{ secrets.TEST_PYPI_API_TOKEN }}
    repository_url: https://test.pypi.org/legacy/

publishes the build outputs to TestPyPI, which is a test instance of the official Python Package Index (PyPI). I have created an account on TestPyPI, generated an API token, and set the value of this token as a secret on GitHub called TEST_PYPI_API_TOKEN. This step uses the pypa/gh-action-pypi-publish action to perform the publication (you could also do this directly with poetry publish); I'm providing the API token value as an argument to the password parameter.
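For reference, a roughly equivalent step using Poetry directly might look like the sketch below; check the Poetry documentation for the exact flags and repository configuration:

- name: Publish to TestPyPI with Poetry
  run: |
    poetry config repositories.testpypi https://test.pypi.org/legacy/
    poetry publish --repository testpypi --username __token__ --password ${{ secrets.TEST_PYPI_API_TOKEN }}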
The workflow in action
I've added a new Cylinder class to shapes_3d.py, as the screenshot below shows.
I then committed this change to a feature branch (feature/cylinder) with the commit message 'feat: adds cylinder type', and pushed it up to GitHub; after that I opened a PR into main. This triggers a workflow run but doesn't meet the conditions required of the release job, so only the build job runs, as the screenshot below shows. The name of the PR is the same as the commit message.
Later, I merged the PR into main, which triggered another workflow run (merging the PR causes a push event into main); this time the conditions of the release job were met, as shown by the screenshot below.
The version of the package before this change was 0.2.2; the semantic release tool has recognised the 'feat' keyword in the commit message and bumped the minor part of the version number, resulting in a new package version: 0.3.0.
As I explained earlier, the repo will be tagged at this point in its history and a release will be created from that tag on GitHub, as shown by the screenshot below.
And the new package version has been pushed to TestPyPI, as shown by the screenshot below.
Although the example shown demonstrates the package being pushed to TestPyPI, the process would be exactly the same if you wanted to push to the actual PyPI. However, you may want to keep the push to TestPyPI as part of your workflow, to test that the package can be downloaded correctly; you would then have a couple of extra steps in your workflow: push to TestPyPI, download the package from TestPyPI, and then, if the previous steps ran smoothly, push to the actual PyPI.
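Here's a sketch of what those extra steps might look like; the package name, the import, and the PYPI_API_TOKEN secret are assumptions based on this project:

- name: Smoke-test the package from TestPyPI
  run: |
    pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ lgm-shapes-test
    python -c "import lgm_shapes_test"
- name: Publish to PyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    user: __token__
    password: ${{ secrets.PYPI_API_TOKEN }}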
One area that could be improved in the workflow presented is preventing slippage between the build and release jobs. Notice that the build and release jobs each install the package's dependencies via poetry install; we can't be certain that the dependency versions installed in these two jobs are exactly the same, and consequently we can't be certain that the code behaviour being tested in the build job is exactly the same as what's being built and deployed in the release job. In an ideal world, the exact code that is tested in the build job would be what gets released in the release job. This could be achieved by building the package artifacts in the build job and caching them, then having the release job pull them out of the cache and use them for publication, as sketched below.
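One way to do this is with GitHub's official artifact actions; below is a sketch (the exact step placement within each job is illustrative). In the build job, after building the distributions:

- name: Upload built distributions
  uses: actions/upload-artifact@v3
  with:
    name: dist
    path: dist/

And in the release job, in place of rebuilding:

- name: Download built distributions
  uses: actions/download-artifact@v3
  with:
    name: dist
    path: dist/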
Is this Continuous Deployment?
You may have noticed that I've refrained from using the term "continuous deployment" up to now. Why? Well, I originally wrote this post centred around CD; the first draft was titled "Continuous deployment with GitHub Actions" and contained the exact same workflow shown here - I essentially declared "Continuous Deployment = Done". Now, if you define Continuous Deployment as something like "automating the process of deploying code changes to production", then I think the workflow presented constitutes CD. But that glosses over many of the concerns and challenges associated with implementing CD.
If you think about what the statement "automating the process of deploying code changes to production" means, it means removing the human from the loop. In practice that might look like this: a developer makes changes on a feature branch and issues a pull request into main, at which point the CI process kicks in (you might have a policy here that prevents branch merging without the approval of a reviewer, but you might not); if the CI passes, the code gets merged, and the new main branch - now including the code changes - gets pushed to production. This is clearly a much riskier situation: it becomes much easier to introduce bad code into the production system.
So, if you want to completely take the human out of the loop you need to ask: "what processes do we need in place to be confident in taking the human out of the loop?". Put another way, the work required to go from a situation where everything in your deployment pipeline is automated up to a final "push to production" button that has to be pressed by a human, to a situation where you're confident enough to let a computer press that button for you (i.e. fully automated), is likely to be significant.
For reference, Microsoft provides definitions for these popular terms - Continuous Integration, Continuous Delivery, Continuous Integration/Continuous Delivery (CI/CD), and Continuous Deployment - in this Azure article.
Continuous Integration
Under continuous integration, the develop phase—building and testing code—is fully automated. Each time you commit code, changes are validated and merged to the master branch, and the code is packaged in a build artifact.
Continuous Delivery
Continuous delivery automates the next phase: deliver. Under continuous delivery, anytime a new build artifact is available, the artifact is automatically placed in the desired environment and deployed.
Continuous Integration/Continuous Delivery (CI/CD)
When teams implement both continuous integration and continuous delivery (CI/CD), the develop and the deliver phases are automated. Code remains ready for production at any time. All teams must do is manually trigger the transition from develop to deploy—making the automated build artifact available for automatic deployment—which can be as simple as pressing a button.
Continuous Deployment
With continuous deployment, you automate the entire process from code commit to production. The trigger between the develop and deliver phases is automatic, so code changes are pushed live once they receive validation and pass all tests. This means customers receive improvements as soon as they’re available.
It's also worth saying that a complete CI/CD process for an application that is running live and interacting with infrastructure like databases (such as a web app) is likely to be considerably more complicated than what I've presented. Such an example would usually involve deploying to multiple environments in addition to production, such as an integration environment, to test whether the code integrates successfully with a replica of the production infrastructure setup, and a performance environment, for performance testing. A full DevOps solution would also utilise infrastructure as code (IaC) for programmatic deployment and management of infrastructure, which would have its own CI/CD processes for automated change management and deployment of infrastructure, just like application code - see my colleague James Dawson's article on GitOps.
Perhaps I'll get into this more in a future post.