Thursday, January 20, 2022

Managing Python dependencies

The Python community has adopted several approaches to managing dependencies to meet the requirements of developers. This post will touch on the following:

  • pip
  • virtualenv
  • poetry

We will demonstrate how each is used, where it shines, and what tradeoffs it carries.

Demo project

The project only requires Docker [1] to be installed.

Contents

There are three sub-projects, creatively named project_1, project_2, and project_3.

Each project depends on pandas [2], and the main.py script prints the pandas version in use.
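The contents of main.py are not shown in this post. As a rough sketch of what such a script could look like, the version below reads the installed version from package metadata rather than importing pandas. The helper name installed_version is made up for illustration, and the real script may simply print pandas.__version__.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    # Hypothetical stand-in for the demo's main.py.
    print(installed_version("pandas"))
```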

Run

To start up the demo environment, run:

docker-compose run env /bin/bash

This will start a shell in a container where Python is installed and the three sub-projects are mounted. Validate this by listing the folder contents:

ls
# project_1  project_2  project_3

Using pip

pip [3][4] is a command-line tool for installing and uninstalling Python packages, fetched from PyPI (or a custom repository [5]).

Using pip is the simplest way to manage a project's dependencies. Dependencies can be explicitly declared in a requirements.txt and installed by referencing it.

Install the packages in project_1:

pip install -r requirements.txt
# Successfully installed numpy-1.22.1 pandas-1.3.5 python-dateutil-2.8.2 pytz-2021.3 six-1.16.0

Print the version used:

python main.py
# 1.3.5

The version 1.3.5 matches what was installed. Looks good to me.

Conflicting dependencies

By default, pip installs dependencies into the interpreter's global site-packages directory. When multiple projects need different versions of the same dependency, a conflict arises.

Install the packages in project_2:

pip install -r requirements.txt
# Installing collected packages: pandas
#   Attempting uninstall: pandas
#     Found existing installation: pandas 1.3.5
#     Uninstalling pandas-1.3.5:
#       Successfully uninstalled pandas-1.3.5
# Successfully installed pandas-1.2.5

project_2 requires pandas 1.2.5, which differs from the pandas 1.3.5 required by project_1. pip resolved the conflict by uninstalling 1.3.5 and installing 1.2.5 in its place.

Print the version in project_2:

python main.py
# 1.2.5

Print the version in project_1:

python main.py
# 1.2.5

In a real-world scenario, if project_1 were using pandas APIs only available in 1.3.5, it would likely break.

A simple solution would be to remember to reinstall the requirements before starting work in a project. Though simple, it is easy to forget, and it still prevents us from running the two projects simultaneously.
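One way to catch a forgotten reinstall is to compare the pins in requirements.txt against what is actually installed. The following is only a sketch: the function names are invented for this example, it handles only simple name==version lines (no extras, environment markers, or name normalization), and it is not part of the demo project.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Collect 'name==version' pins, ignoring comments and unpinned lines."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip().lower()] = pinned.strip()
    return pins


def installed_or_none(name: str) -> Optional[str]:
    """Look up the installed version, returning None when the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    pins: dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_or_none,
) -> dict[str, tuple]:
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    with open("requirements.txt") as handle:
        for name, (wanted, found) in find_mismatches(parse_pins(handle.read())).items():
            print(f"{name}: wanted {wanted}, found {found}")
```

Run in project_1 after installing project_2's requirements, a check like this would report pandas as wanted 1.3.5 but found 1.2.5.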

Isolating dependencies with virtualenv

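virtualenv [6] is a tool for creating isolated Python environments; that is, it installs dependencies local to a project. The commands below use venv, the subset of virtualenv that ships with Python's standard library.

Set up a virtual environment in project_1: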

python3 -m venv .venv

This created a .venv directory, which will house the installed dependencies.
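The same step can be done from Python through the standard library's venv module. This is a minimal sketch, with create_environment being a name chosen for this example. The demo at the bottom uses a temporary directory and skips the pip bootstrap to stay fast, whereas python3 -m venv installs pip by default.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(project_dir: Path, with_pip: bool = True) -> Path:
    """Create a .venv directory inside project_dir and return its path."""
    env_dir = project_dir / ".venv"
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # with_pip=False skips bootstrapping pip to keep the demo quick.
        env_dir = create_environment(Path(tmp), with_pip=False)
        # pyvenv.cfg is the marker file that identifies a virtual environment.
        print((env_dir / "pyvenv.cfg").read_text())
```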

Activate the environment:

source .venv/bin/activate

Activation prepends .venv/bin to the shell's PATH, so that both python and pip resolve to the copies inside .venv:

which python
# /opt/app/project_1/.venv/bin/python
which pip
# /opt/app/project_1/.venv/bin/pip
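The interpreter itself can also report which environment it is running from. As a small sketch (describe_environment is a made-up helper name), comparing sys.prefix with sys.base_prefix shows whether a venv is in use, and sysconfig reveals where pip would place installed packages.

```python
import sys
import sysconfig


def describe_environment() -> dict:
    """Summarise which interpreter is running and where packages get installed."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # Inside a venv, sys.prefix points at the environment directory,
        # while sys.base_prefix still points at the base installation.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```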

We can now proceed to install the dependencies in project_1 again:

pip install -r requirements.txt

The correct version should be printed:

python main.py
# 1.3.5

While the virtual environment is active, the dependencies installed within it are used, effectively isolating the project from the globally installed packages.

To deactivate the virtual environment, run:

deactivate

This is a simple way to solve the issues we had using just pip, since it lets us isolate each project's dependencies. There is still some ceremony in creating and activating the virtual environment, but it is not too bad!

Much more with poetry

Poetry is a community tool for managing Python projects through the standard pyproject.toml [7][8]. This file houses metadata about the project, such as its dependencies. It is similar to Node.js's package.json [9].

It is worth noting that venv is part of the Python standard library and pip ships with most Python installations, whereas Poetry (like virtualenv) is a third-party tool that must be installed separately.

Install the dependencies in project_3:

poetry install
# Installing dependencies from lock file
#
# Package operations: 5 installs, 0 updates, 0 removals
#
#   • Installing six (1.16.0)
#   • Installing numpy (1.22.1)
#   • Installing python-dateutil (2.8.2)
#   • Installing pytz (2021.3)
#   • Installing pandas (1.3.5)

Print the version:

poetry run python main.py
# 1.3.5

Poetry installs the dependencies into a managed virtual environment [10]. This can be inspected using:

poetry env info
# Virtualenv
# Python:         3.9.10
# Implementation: CPython
# Path:           /root/.cache/pypoetry/virtualenvs/project-3-AfR2mWZY-py3.9
# Valid:          True
#
# System
# Platform: linux
# OS:       posix
# Python:   /usr/local

Poetry is built on pip and virtualenv while offering additional features such as:

  • Lock files [11] for consistent installs
  • Dependency constraints when upgrading
    • e.g. pandas = "^1.3.5" allows any version >=1.3.5 and <2.0.0 (minor and patch upgrades within 1.x)
  • Separation between dependencies and development dependencies
    • Development dependencies are anything not required to run the program in production (e.g. test framework, linting tools, etc.)
  • Local dependency support [12]
    • libs = {path = "../libs", develop = true}
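To make the caret constraint concrete, here is a simplified sketch of how a range like ^1.3.5 can be evaluated. It is not Poetry's own resolver: the function names are invented, and it only handles plain three-part numeric versions (no pre-releases or shortened forms such as ^1.3).

```python
def parse(version: str) -> tuple:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_bounds(constraint: str) -> tuple:
    """Return (inclusive lower, exclusive upper) bounds for '^constraint'."""
    major, minor, patch = parse(constraint)
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return (major, minor, patch), upper


def satisfies_caret(candidate: str, constraint: str) -> bool:
    """Check whether candidate falls inside the caret range of constraint."""
    lower, upper = caret_bounds(constraint)
    return lower <= parse(candidate) < upper


if __name__ == "__main__":
    for candidate in ("1.2.5", "1.3.5", "1.4.0", "2.0.0"):
        print(candidate, satisfies_caret(candidate, "1.3.5"))
```

The upper bound bumps the left-most non-zero component, which is why ^1.3.5 admits 1.4.0 but rejects both 1.2.5 and 2.0.0.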

Conclusion

The Python ecosystem is always changing to suit the needs of the community, with dependency management being one of the cornerstones. There will always be a new shiny thing, but it is always worth understanding the problems being solved by each tool.

I recommend trying Poetry, as it builds on the standard tools pip and virtualenv while exposing a familiar command-line interface comparable to tools in other ecosystems, such as npm [13] for JavaScript and dotnet [14] for .NET.

Footnotes

  1. https://docs.docker.com/get-docker/

  2. https://pypi.org/project/pandas/

  3. https://pip.pypa.io/en/stable/

  4. https://realpython.com/what-is-pip/

  5. https://packaging.python.org/en/latest/guides/hosting-your-own-index/

  6. https://virtualenv.pypa.io/en/latest/

  7. https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/

  8. https://www.python.org/dev/peps/pep-0621/

  9. https://docs.npmjs.com/cli/v8/configuring-npm/package-json

  10. https://python-poetry.org/docs/managing-environments/

  11. https://python-poetry.org/docs/basic-usage/#installing-with-poetrylock

  12. https://python-poetry.org/docs/dependency-specification/#path-dependencies

  13. https://docs.npmjs.com/cli/v7/commands

  14. https://docs.microsoft.com/en-us/dotnet/core/tools/