Recommended environments for machine learning on Arch Linux with Python
Arch Linux is a rolling-release distribution, which gets you the most up-to-date packages, but it also means updates can pull in new versions of the drivers or libraries your projects depend on.
By default, pip installs Python dependencies into the system or user site-packages for the installed Python version. This becomes a problem when different projects need conflicting versions of the same packages.
Note: newer versions of pip refuse to install into these system and user packages on distributions that mark their Python as externally managed (PEP 668), which includes Arch.
I recommend creating an environment for each project and installing your dependencies into it. For projects that depend on specific drivers or libraries, such as GPU drivers, CUDA, or a particular Python version, you want a more encapsulating environment to handle this.
Here are some options for creating a new environment for your project.
Anaconda
Anaconda is a distribution of Python that aims to simplify package management. It lets you install specific CUDA versions, Python versions, and other dependencies.
Look into making an environment.yml for your Anaconda environment; it plays the same role as a requirements.txt does for pip.
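As a rough sketch, an environment.yml might look like the following. The project name, channels, and package versions here are just examples; pick the CUDA and PyTorch versions that match your hardware.

```yaml
name: my-ml-project        # example name, use your own
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
  - python=3.11
  - pytorch
  - pytorch-cuda=12.1      # example CUDA version from the nvidia channel
  - pip
  - pip:
      - transformers       # pip-only packages go under this key
```

You can then create and activate the environment with `conda env create -f environment.yml` followed by `conda activate my-ml-project`.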
Docker/Podman
Docker/Podman is another option that lets you install specific versions of any dependency. It is also useful for deploying your application to cloud services that support Docker containers, such as Amazon SageMaker or Vast.ai.
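A minimal Dockerfile for a GPU project might look something like this. The base image tag, file names, and entry point are examples; choose a CUDA image that matches your host driver.

```dockerfile
# Example CUDA base image; pick a tag compatible with your driver
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install pinned Python dependencies first to cache this layer
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python3", "train.py"]
```

Build with `docker build -t my-ml-app .` and run with `docker run --gpus all my-ml-app`; GPU access inside the container requires the NVIDIA Container Toolkit on the host.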
Cog
Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container.
It is designed from the ground up to support containerizing machine learning models, so it could be a good option for you.
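Cog is configured through a cog.yaml file that describes the build and points at your predictor. A minimal sketch, with the Python and package versions as placeholder examples:

```yaml
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - "torch==2.1.0"      # example pinned version
predict: "predict.py:Predictor"
```

From there, `cog build` produces the container image and `cog predict` runs the model locally.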
Distrobox
Use any Linux distribution inside your terminal, with both backward and forward compatibility with software and the freedom to use whatever distribution you're most comfortable with.
Distrobox essentially lets you run another Linux distribution, much as Docker runs containers. You can then install all your dependencies and run your project inside it.
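The basic workflow looks like this; the container name and image are arbitrary choices.

```
# Create a container from another distro's image
distrobox create --name ml-box --image ubuntu:22.04

# Enter it; your home directory is shared with the host
distrobox enter ml-box

# Inside, install dependencies with that distro's package manager
sudo apt-get update && sudo apt-get install -y python3-venv
```

If you need GPU access, `distrobox create` also has an `--nvidia` flag to share the host's NVIDIA driver with the container.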
Considerations
Weigh these points when deciding which option works for you.
- Anaconda is the easiest for many people, since it works similarly to pip.
- Docker/Podman is probably the most flexible once you get the container set up well, and many end-user tools with Docker integration can serve as references. GPU access inside the container can be difficult to debug, but once it's working it works well.
- Cog may be a good option if you want a full system designed around machine learning containers, but I personally have not used it; I find the Docker/Podman route easier to work with.
- Distrobox can be great because it gives you more control, but it can be more involved for others to use and adds some overhead. It is also probably more limited if you want to deploy to a cloud GPU service.
venv
In a lot of cases a venv can be fine if you can use the more up-to-date package versions. The problem comes when you need to update to a new version and then have to hack up the requirements to work across multiple versions of things like PyTorch or TensorFlow.
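Creating a venv takes only a few commands; the `.venv` directory name is just a common convention.

```shell
# Create an isolated environment in the project directory
python3 -m venv .venv

# Activate it; pip now installs into .venv instead of system packages
. .venv/bin/activate

# Snapshot whatever is currently installed so the setup is reproducible
python -m pip freeze > requirements.txt

deactivate
```

Later, `pip install -r requirements.txt` inside a fresh venv restores the same set of packages.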