Wednesday, January 12, 2022

Python Monorepo

Work in progress

Documenting the process of creating a Python monorepo[1] with tooling that supports an enjoyable developer experience.

Why monorepo?

Most applications consist of more than a single deployable component. Examples include a public API for a web app, a private API for internal consumption and job processors, to name a few.

These components are often closely related within a domain[2]. One option is to define this domain once and distribute it to the components as a package (e.g. via NPM[3] or PyPI[4]). Any change to the package then has to be published as a new version, and every component has to be updated to pick it up. If the package changes often, the cycle of making a change and then updating the components quickly becomes toilsome.

Alternatively, we could keep all the deployable components and the domain definition together in a single repository. This reduces the friction of sharing the domain with the components. Support for monorepos differs between programming languages; this post focuses on how to achieve one with Python.

Structure

The repository will be structured around Clean Architecture[5]. This approach defines a core consisting of the domain models, interfaces and business logic handlers. The core interfaces are then implemented in an infrastructure layer, where specific technologies and dependencies are chosen. Finally, the deployable components (e.g. API) depend on the infrastructure and core layers to provide the functionality for the presentation layer.

[Figure: onion architecture]

The example domain will be an address book where we can save addresses. The structure will be as follows:

  • /libs
    • core: Address, AddressRepository, SaveAddressHandler
    • infrastructure: MemoryAddressRepository
  • /api: FastApi

Arrows as drawn in the original diagram:

  • Address -> AddressRepository, SaveAddressHandler
  • AddressRepository -> SaveAddressHandler, MemoryAddressRepository
  • SaveAddressHandler -> FastApi
  • MemoryAddressRepository -> FastApi
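
The layering described above can be sketched in plain Python. The class names come from the structure listed above; the fields on Address, the method names (save, get, handle) and the sample values are assumptions made for illustration and are not taken from the template repository.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


# --- core: domain model (the fields are illustrative assumptions) ---
@dataclass(frozen=True)
class Address:
    id: str
    street: str
    city: str


# --- core: interface that the infrastructure layer must implement ---
class AddressRepository(Protocol):
    def save(self, address: Address) -> None: ...

    def get(self, address_id: str) -> Optional[Address]: ...


# --- core: business logic that depends only on the interface ---
class SaveAddressHandler:
    def __init__(self, repository: AddressRepository) -> None:
        self._repository = repository

    def handle(self, address: Address) -> None:
        self._repository.save(address)


# --- infrastructure: one concrete storage choice ---
class MemoryAddressRepository:
    def __init__(self) -> None:
        self._addresses: Dict[str, Address] = {}

    def save(self, address: Address) -> None:
        self._addresses[address.id] = address

    def get(self, address_id: str) -> Optional[Address]:
        return self._addresses.get(address_id)


# --- presentation: an API layer would wire the pieces together like this ---
repository = MemoryAddressRepository()
handler = SaveAddressHandler(repository)
handler.handle(Address(id="1", street="1 Example Street", city="Exampleville"))
```

Because SaveAddressHandler only knows about the AddressRepository interface, the in-memory implementation could later be swapped for a database-backed one without touching the core.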

Features

The monorepo will support the following features:

  • Simple and predictable dependency management
    • Easy to add, remove and document packages
    • Consistent package versions installed
  • Testing
    • Debugging should be easy
  • Typechecking
  • Linting
    • Enforce consistent best practices
    • Sorting imports
    • Configurable
  • Formatting
    • Automated fixes
    • Enforce consistent style for readability
    • Configurable

All of the above should work both from the command line and in an integrated development environment (IDE), Visual Studio Code[6] in this case. Command line support is important to enable scripting for use in CI/CD pipelines, as well as to provide an escape hatch that prevents IDE lock-in.

Simple and predictable dependency management

A simple approach to Python package management would be to use pip[7]. This is fine if you work on a single Python repository on your machine. Once you start working on multiple Python repositories, dependency conflicts can arise because pip installs dependencies into the global scope by default. To isolate the dependencies of each repository, we can use virtual environments (venv)[8], although this requires some ceremony to ensure we are installing and running Python within the venv.
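
As a rough illustration of that ceremony, the sketch below uses only the standard library to create a venv and to check whether the current interpreter is running inside one. The helper names (in_virtualenv, create_venv) and the .venv folder name are assumptions chosen for this example.

```python
import sys
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv, while sys.base_prefix
    # still points at the Python installation the venv was created from.
    return sys.prefix != sys.base_prefix


def create_venv(project_dir: Path) -> Path:
    """Create <project_dir>/.venv (without pip, to keep the sketch fast)."""
    env_dir = project_dir / ".venv"
    venv.create(env_dir, with_pip=False)
    return env_dir


print("running inside a venv:", in_virtualenv())
```

Forgetting to activate the venv before running pip is the typical mistake this check would catch.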

Poetry[9] is a dependency management tool for Python that abstracts away the above complexity, providing a consistent interface for installing and running Python code.

Creating projects

A new project can be created using:

poetry new [project-name]

The example has two projects: libs and api.

Each has a pyproject.toml that uses Poetry as the build system and defines the dependencies used.

[tool.poetry]
name = "libs"
version = "0.1.0"
description = ""
authors = ["tsukiy0 <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.9"

[tool.poetry.dev-dependencies]
black = "^21.12b0"
Faker = "^11.3.0"
flake8 = "^4.0.1"
isort = "^5.10.1"
mypy = "^0.931"
pytest = "^6.2.5"
pytest-asyncio = "^0.17.0"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
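
The version strings above use Poetry's caret constraints, which allow updates that do not change the leftmost non-zero component. The following sketch shows how such a constraint translates into a version range; caret_range is a hypothetical helper written for this post (not part of Poetry), and it only handles plain numeric versions, so a pre-release such as "^21.12b0" is out of scope.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def caret_range(constraint: str) -> Tuple[Version, Version]:
    """Return (inclusive lower bound, exclusive upper bound) for a caret constraint."""
    parts = [int(part) for part in constraint.lstrip("^").split(".")]
    lower = (parts + [0, 0])[:3]

    # The leftmost non-zero component is bumped to form the upper bound;
    # if every given component is zero, the last given one is bumped instead.
    bump = next((i for i, part in enumerate(parts) if part != 0), len(parts) - 1)
    upper = (parts[:bump] + [parts[bump] + 1] + [0, 0])[:3]

    return (lower[0], lower[1], lower[2]), (upper[0], upper[1], upper[2])


print(caret_range("^3.9"))    # ((3, 9, 0), (4, 0, 0))
print(caret_range("^0.931"))  # ((0, 931, 0), (0, 932, 0))
```

So python = "^3.9" accepts any 3.x release from 3.9 onwards, while mypy = "^0.931" stays within the 0.931.x series because its major version is zero.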

By default, Poetry does not install packages into a local venv folder inside the project. This can be an issue for IDEs that resolve tools from a local venv, resulting in broken tool integrations such as linting, formatting and testing. To fix this, we can configure Poetry to install to a local venv[10] in a poetry.toml.

[virtualenvs]
in-project = true
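
With this setting, each project gets its own .venv folder, which is where an IDE would look for the interpreter. As a small sketch, the hypothetical helper below (in_project_python is a name made up for this example) builds the expected interpreter path for each project, assuming the usual venv layout on Windows and POSIX systems.

```python
import os
from pathlib import Path


def in_project_python(project_dir: Path) -> Path:
    """Return where the interpreter is expected inside <project_dir>/.venv."""
    venv_dir = project_dir / ".venv"
    if os.name == "nt":
        # Windows virtual environments keep executables under Scripts
        return venv_dir / "Scripts" / "python.exe"
    # POSIX virtual environments keep executables under bin
    return venv_dir / "bin" / "python"


for project in ("libs", "api"):
    interpreter = in_project_python(Path(project))
    status = "found" if interpreter.exists() else "not installed yet"
    print(f"{project}: {interpreter} ({status})")
```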

Adding dependencies

poetry add [dependency-name]

Removing dependencies

poetry remove [dependency-name]

Reinstall

poetry env remove .venv
poetry install

Testing

poetry run pytest

Linting

poetry run flake8 ./[project-name] ./tests
poetry run isort --profile black .

Typechecking

poetry run mypy .
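
Since every check above is a plain command, a CI pipeline could run them across both projects with a small script. The sketch below is one possible way to do that; it assumes the projects live in folders named libs and api at the repository root and that each package folder shares its project's name. checks_for and run_all are hypothetical helpers, not part of Poetry.

```python
import subprocess
from typing import List

PROJECTS = ["libs", "api"]


def checks_for(project: str) -> List[List[str]]:
    """Return the commands from this post, to be run inside one project folder."""
    return [
        ["poetry", "run", "pytest"],
        ["poetry", "run", "flake8", f"./{project}", "./tests"],
        ["poetry", "run", "isort", "--profile", "black", "."],
        ["poetry", "run", "mypy", "."],
    ]


def run_all(projects: List[str]) -> int:
    """Run every check in every project; return 0 only if all of them pass."""
    exit_code = 0
    for project in projects:
        for command in checks_for(project):
            print(f"[{project}] {' '.join(command)}")
            result = subprocess.run(command, cwd=project)
            if result.returncode != 0:
                exit_code = 1
    return exit_code
```

A CI step could then call sys.exit(run_all(PROJECTS)) from the repository root so that a single failing check fails the whole pipeline.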

Footnotes

  1. https://github.com/tsukiy0/python-monorepo-template

  2. https://martinfowler.com/bliki/BoundedContext.html

  3. https://www.npmjs.com/

  4. https://pypi.org/

  5. https://docs.microsoft.com/en-us/dotnet/architecture/modern-web-apps-azure/common-web-application-architectures#clean-architecture

  6. https://code.visualstudio.com/

  7. https://packaging.python.org/en/latest/tutorials/installing-packages/

  8. https://docs.python.org/3/library/venv.html

  9. https://python-poetry.org/

  10. https://python-poetry.org/docs/configuration/#virtualenvsin-project