The Python community has adopted several approaches to managing dependencies to meet the requirements of developers. This post will touch on the following:
- pip
- virtualenv
- poetry
We will demonstrate how each is used, where it shines, and what the tradeoffs are.
Demo project
The project only requires docker [1] to be installed.
Contents
There are three sub-projects, creatively named project_1, project_2, and project_3.
Each project depends on pandas [2], and the main.py script prints the pandas version used.
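The contents of main.py are not shown in this post. As a rough sketch of what such a script could look like, the snippet below looks up the installed version through the standard library's importlib.metadata instead of importing pandas directly. The helper name installed_version is made up for illustration, and the real script may simply print pandas.__version__.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the pandas version visible to the current interpreter,
    # or None when pandas has not been installed yet.
    print(installed_version("pandas"))
```

Because the lookup goes through package metadata, the script also runs (and prints None) before any dependencies have been installed.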
Run
To start up the demo environment, run:
docker-compose run env /bin/bash
This will start a shell in a container where Python is installed and the three sub-projects are mounted. Validate this by listing the folder contents:
ls
# project_1 project_2 project_3
pip
Using pip
pip [3][4] is a command-line tool for adding and removing Python packages from PyPI (or a custom repository [5]).
Using pip is the simplest way to manage a project's dependencies. Dependencies can be declared explicitly in a requirements.txt file and installed by referencing it.
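To make the file format concrete, here is a small sketch that reads exact pins out of requirements.txt-style text. The sample content (a single pandas==1.3.5 line) is an assumption based on the versions shown in this post, and parse_pins is a hypothetical helper that only understands the name==version form, not the full requirements syntax that pip accepts.

```python
def parse_pins(text):
    """Parse 'name==version' lines into a dict, ignoring blanks and comments."""
    pins = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace (simplified handling).
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, separator, version = line.partition("==")
        if not separator:
            raise ValueError(f"not an exact pin: {line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


if __name__ == "__main__":
    # Assumed content of project_1/requirements.txt
    sample = "# project_1 dependencies\npandas==1.3.5\n"
    print(parse_pins(sample))  # {'pandas': '1.3.5'}
```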
Install the packages in project_1:
pip install -r requirements.txt
# Successfully installed numpy-1.22.1 pandas-1.3.5 python-dateutil-2.8.2 pytz-2021.3 six-1.16.0
Print the version used:
python main.py
# 1.3.5
The version 1.3.5 matches what was installed. Looks good to me.
Conflicting dependencies
By default, dependencies installed using pip are stored globally, in the site-packages directory of the Python interpreter. When multiple projects use the same dependency but require different versions, a conflict arises.
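To see where "globally" actually is, the interpreter can report its own install location. The sketch below uses the standard sysconfig module; site_packages_dir is a name chosen for this example, and the exact path printed will depend on the machine or container.

```python
import sys
import sysconfig


def site_packages_dir():
    """Return the directory where pure-Python packages are installed."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    # The interpreter in use and the directory its packages are installed into.
    print(sys.executable)
    print(site_packages_dir())
```

Every project that runs with the same interpreter shares this one directory, which is why only a single version of pandas can be present at a time.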
Install the packages in project_2:
pip install -r requirements.txt
# Installing collected packages: pandas
# Attempting uninstall: pandas
# Found existing installation: pandas 1.3.5
# Uninstalling pandas-1.3.5:
# Successfully uninstalled pandas-1.3.5
# Successfully installed pandas-1.2.5
project_2 requires pandas==1.2.5, which differs from the pandas==1.3.5 requirement of project_1. pip resolved the conflict by uninstalling 1.3.5 and installing 1.2.5 instead.
Print the version in project_2:
python main.py
# 1.2.5
Print the version in project_1:
python main.py
# 1.2.5
In a real-world scenario, if project_1 were using APIs only available in pandas 1.3.5, it would likely break.
A simple solution would be to remember to reinstall the dependencies before starting work on a project. Though simple, this is easy to forget, and it still prevents us from running the two projects simultaneously.
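The conflict can also be spotted before anything is installed by comparing the pins of each project. The following is only an illustrative sketch: find_conflicts is a hypothetical helper, and the pins are hard-coded from the versions shown above rather than read from the real requirements.txt files.

```python
def find_conflicts(pins_by_project):
    """Return packages that are pinned to different versions across projects."""
    versions = {}
    for project, pins in pins_by_project.items():
        for package, version in pins.items():
            versions.setdefault(package, {})[project] = version
    # Keep only the packages where at least two projects disagree.
    return {
        package: by_project
        for package, by_project in versions.items()
        if len(set(by_project.values())) > 1
    }


if __name__ == "__main__":
    pins = {
        "project_1": {"pandas": "1.3.5"},
        "project_2": {"pandas": "1.2.5"},
    }
    print(find_conflicts(pins))
    # {'pandas': {'project_1': '1.3.5', 'project_2': '1.2.5'}}
```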
virtualenv
Isolating dependencies with virtualenv
virtualenv [6] is a tool for creating isolated Python environments; that is, it installs dependencies local to a project.
Set up a virtual environment in project_1, using the venv module from the standard library (which offers a subset of virtualenv's features):
python3 -m venv .venv
This creates a .venv directory, which will house the installed dependencies.
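The same environment can be created from Python through the venv module's create function, which may be handy in setup scripts. This is a sketch under a couple of assumptions: create_env is a made-up helper, it writes into a temporary directory instead of project_1, and it passes with_pip=False to stay fast and offline, whereas the command line installs pip into the environment by default.

```python
import tempfile
import venv
from pathlib import Path


def create_env(project_dir):
    """Create a .venv directory inside `project_dir` and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False skips bootstrapping pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_env(tmp)
        # pyvenv.cfg marks the directory as a virtual environment.
        print(sorted(path.name for path in env_dir.iterdir()))
```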
Activate the environment:
source .venv/bin/activate
Activation prepends .venv/bin to the shell's PATH (and sets the VIRTUAL_ENV environment variable), so that both python and pip resolve to the copies inside .venv.
which python
# /opt/app/project_1/.venv/bin/python
which pip
# /opt/app/project_1/.venv/bin/pip
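The same check can be made from inside Python. When the interpreter runs from a virtual environment, sys.prefix points at the environment while sys.base_prefix still points at the base installation. A minimal sketch (the function name in_virtual_environment is made up for illustration):

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(sys.prefix)
    print(in_virtual_environment())
```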
We can now proceed to install the dependencies in project_1 again:
pip install -r requirements.txt
The correct version should be printed:
python main.py
# 1.3.5
While the virtual environment is active, the dependencies installed within it are used, effectively isolating the project from the globally installed packages.
To deactivate the virtual environment, run:
deactivate
This is a simple way to solve the issues we had using just pip, enabling us to isolate project dependencies. There is still some ceremony in creating and activating the virtual environment, but it is not too bad!
poetry
Much more with Poetry
Poetry is a community tool for managing Python projects through the standard pyproject.toml [7][8] file. This file houses metadata about the project, such as its dependencies. It is similar to Node.js's package.json [9].
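It is worth noting that pip ships with Python (bootstrapped by the standard library's ensurepip module) and the venv module is part of the standard library, whereas virtualenv and Poetry must be installed separately.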
Install the dependencies in project_3:
poetry install
# Installing dependencies from lock file
#
# Package operations: 5 installs, 0 updates, 0 removals
#
# • Installing six (1.16.0)
# • Installing numpy (1.22.1)
# • Installing python-dateutil (2.8.2)
# • Installing pytz (2021.3)
# • Installing pandas (1.3.5)
Print the version:
poetry run python main.py
# 1.3.5
Poetry installs the dependencies into a managed virtual environment [10]. This can be inspected using:
poetry env info
# Virtualenv
# Python: 3.9.10
# Implementation: CPython
# Path: /root/.cache/pypoetry/virtualenvs/project-3-AfR2mWZY-py3.9
# Valid: True
#
# System
# Platform: linux
# OS: posix
# Python: /usr/local
Poetry is built on pip and virtualenv while offering additional features such as:
- Lock files [11] for consistent installs
- Dependency constraints when upgrading
  - e.g. pandas = "^1.3.5" allows minor and patch upgrades (>=1.3.5, <2.0.0)
- Separation between dependencies and development dependencies
  - Development dependencies are anything not required to run the program in production (e.g. test framework, linting tools, etc.)
- Local dependency support [12]
  - e.g. libs = {path = "../libs", develop = true}
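The caret constraint from the list above can be read as a version range: the leftmost non-zero component stays fixed and everything to its right may increase. The sketch below spells that rule out for plain MAJOR.MINOR.PATCH versions. It is a simplified illustration and not Poetry's own implementation; the function names are invented, and pre-releases and shortened forms such as ^1.3 are not handled.

```python
def parse_version(text):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    parts = tuple(int(part) for part in text.split("."))
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    return parts


def caret_bounds(constraint):
    """Return (inclusive lower, exclusive upper) bounds for a '^X.Y.Z' constraint."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    lower = parse_version(constraint[1:])
    # Find the leftmost non-zero component; fall back to the patch component.
    index = 2
    for position, component in enumerate(lower):
        if component != 0:
            index = position
            break
    upper = lower[:index] + (lower[index] + 1,) + (0,) * (2 - index)
    return lower, upper


def satisfies(version, constraint):
    """Check whether `version` falls inside the range described by `constraint`."""
    lower, upper = caret_bounds(constraint)
    return lower <= parse_version(version) < upper


if __name__ == "__main__":
    print(caret_bounds("^1.3.5"))        # ((1, 3, 5), (2, 0, 0))
    print(satisfies("1.4.0", "^1.3.5"))  # True
    print(satisfies("1.2.5", "^1.3.5"))  # False
    print(satisfies("2.0.0", "^1.3.5"))  # False
```

Under this reading, the pandas 1.2.5 release required by project_2 would not satisfy a "^1.3.5" constraint, while any later 1.x release would.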
Conclusion
The Python ecosystem is always changing to suit the needs of the community, with dependency management being one of the cornerstones. There will always be a new shiny thing, but it is always worth understanding the problems being solved by each tool.
I recommend trying Poetry, as it builds on the standard tools pip and virtualenv while exposing a familiar command-line interface comparable to those of other ecosystems, such as npm [13] for JavaScript and dotnet [14] for .NET.
Footnotes
- https://packaging.python.org/en/latest/guides/hosting-your-own-index/
- https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/
- https://docs.npmjs.com/cli/v8/configuring-npm/package-json
- https://python-poetry.org/docs/basic-usage/#installing-with-poetrylock
- https://python-poetry.org/docs/dependency-specification/#path-dependencies