Building the Python Wheels

Proteus now builds Python wheels as a package family rather than a single monolithic wheel:

  • proteus-python
      • pure-Python shim package
      • installs the host backend by default
  • proteus-python-backend-host
      • native host backend wheel for Linux and macOS
  • proteus-python-backend-cu12
      • native CUDA 12 superset backend wheel for Linux x86_64

The user-facing import surface remains:

import proteus

At runtime, the shim discovers installed backend entry points and prefers the highest-priority backend:

  1. cuda12
  2. host
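
A minimal sketch of that selection, using the proteus.backends entry-point group described below (pick_backend and its loading details are illustrative, not the shim's actual internals):

from importlib.metadata import entry_points

PRIORITY = ("cuda12", "host")  # highest-priority backend first

def pick_backend():
    # Discover installed backend wheels via their entry-point registrations.
    installed = {ep.name: ep for ep in entry_points(group="proteus.backends")}
    for name in PRIORITY:
        if name in installed:
            return installed[name].load()  # import the backend package
    raise ImportError("no Proteus backend wheel is installed")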

Packaging Model

The shim package is built from the repository root with setuptools.

The backend packages are built from dedicated subprojects:

  • packaging/python/backend-host
  • packaging/python/backend-cu12

Those backend projects use scikit-build-core and the repository CMake build, but they install their native payload into backend-local packages instead of directly under proteus/:

  • proteus_backend_host/
  • proteus_backend_cu12/

Each backend package contains:

  • backend-local _proteus.*
  • backend-local libproteus.*
  • vendored LLVM/Clang shared libraries, repaired by auditwheel or delocate
  • backend registration metadata under the proteus.backends entry-point group

The shim package exports:

  • proteus.active_backend
  • proteus.available_backends()
  • the selected backend's native API re-exported from proteus
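
For example, after a default install (the printed values are illustrative and depend on which backend wheels are present):

import proteus

print(proteus.active_backend)        # e.g. "host", or "cuda12" with the superset backend
print(proteus.available_backends())  # e.g. ["cuda12", "host"]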

Build-Time Requirements

All backend wheel builds require:

  • Python with the build package
  • CMake
  • pybind11
  • scikit-build-core
  • a usable LLVM/Clang installation discovered via LLVM_INSTALL_DIR

Platform-specific requirements:

  • macOS host backend:
      • Homebrew LLVM 22
      • delocate
  • Linux host backend:
      • the manylinux_2_28 LLVM container
      • auditwheel
  • Linux CUDA backend:
      • the CUDA-capable manylinux_2_28 LLVM container
      • CUDA Toolkit 12 with libnvptxcompiler_static.a
      • auditwheel

Local Builds

Shim Wheel

python3 -m venv /tmp/proteus-wheel-venv
/tmp/proteus-wheel-venv/bin/python -m pip install -U pip build setuptools setuptools-scm
/tmp/proteus-wheel-venv/bin/python -m build --wheel --outdir dist .

Host Backend Wheel on macOS arm64

python3 -m venv /tmp/proteus-wheel-venv
/tmp/proteus-wheel-venv/bin/python -m pip install -U pip build scikit-build-core pybind11 delocate

brew install llvm@22

MACOSX_DEPLOYMENT_TARGET=14.0 \
LLVM_INSTALL_DIR=/opt/homebrew/opt/llvm \
  /tmp/proteus-wheel-venv/bin/python -m build --wheel \
  --outdir wheelhouse \
  packaging/python/backend-host

Host Backend Wheel on Linux

bash packaging/python/image-scripts/build-manylinux-llvm-container.sh

python -m pip install -U pip build cibuildwheel
CIBW_MANYLINUX_X86_64_IMAGE=ghcr.io/olympus-hpc/proteus-manylinux-llvm:22.1.3 \
  python -m cibuildwheel packaging/python/backend-host --output-dir wheelhouse

CUDA 12 Backend Wheel on Linux

bash packaging/python/image-scripts/build-manylinux-cuda-llvm-container.sh

python -m pip install -U pip build cibuildwheel
CIBW_MANYLINUX_X86_64_IMAGE=ghcr.io/olympus-hpc/proteus-manylinux-cuda-llvm:12.4.1-22.1.3 \
  python -m cibuildwheel packaging/python/backend-cu12 --output-dir wheelhouse

Linux Container Images

The Linux wheel workflows use prebuilt GHCR images.

Host backend image inputs:

  • packaging/python/image-scripts/manylinux-llvm.Dockerfile
  • packaging/python/image-scripts/build-llvm-manylinux.sh
  • packaging/python/image-scripts/build-manylinux-llvm-container.sh

CUDA backend image inputs:

  • packaging/python/image-scripts/manylinux-cuda-llvm.Dockerfile
  • packaging/python/image-scripts/build-llvm-manylinux.sh
  • packaging/python/image-scripts/build-manylinux-cuda-llvm-container.sh

The CUDA image installs:

  • LLVM 22.1.3
  • CUDA Toolkit 12.4
  • static libnvptxcompiler_static.a

Installed-Wheel Verification

Do not treat build-tree imports as sufficient verification. Install the built wheels into a clean environment.

Default Host Install

/tmp/proteus-wheel-venv/bin/python -m pip install \
  --no-index \
  --find-links dist \
  --find-links wheelhouse \
  dist/proteus_python-*.whl

export PROTEUS_CLANGXX_BIN=/path/to/clang++-22

/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_backend_loader.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_host_cpp_smoke.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_host_cpp_validation.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_host_cpp_std_headers.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_invalid_clang_override.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_wheel_layout.py

CUDA Superset Install

Install the shim first, then the CUDA backend wheel from the local wheelhouse:

/tmp/proteus-wheel-venv/bin/python -m pip install \
  --no-index \
  --find-links dist \
  --find-links wheelhouse \
  dist/proteus_python-*.whl

/tmp/proteus-wheel-venv/bin/python -m pip install \
  --no-index \
  --find-links wheelhouse \
  proteus-python-backend-cu12==<version>
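
With the superset backend installed, the shim should now select it. A quick sanity check (assuming active_backend reports the backend name as a string, as listed in the packaging model above):

import proteus

print(proteus.active_backend)  # expected: cuda12, since it outranks host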

On CPU-only machines, validate the host path:

export PROTEUS_CLANGXX_BIN=/path/to/clang++-22

/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_backend_loader.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_host_cpp_smoke.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_host_cpp_validation.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_host_cpp_std_headers.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_invalid_clang_override.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_wheel_layout.py

On Linux systems with an NVIDIA GPU and driver, additionally validate:

export CUDA_HOME=/usr/local/cuda

/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_gpu_cpp_smoke.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_gpu_cpp_launch_validation.py
/tmp/proteus-wheel-venv/bin/python bindings/python/tests/test_gpu_cpp_pointer_validation.py

Runtime Contract

Host Backend

  • ships Proteus and the LLVM/Clang runtime libraries it links against
  • does not ship a host C++ compiler toolchain
  • still requires a host clang++ whose major version matches the bundled LLVM/Clang runtime for frontend="cpp", target="host"
  • for the current wheel line, use clang++-22 or equivalent and set PROTEUS_CLANGXX_BIN when it is not the default clang++ on PATH
  • a mismatched clang++ may fail while resolving builtin/system headers during the in-process frontend compile

Install options, backends, targets, and required toolchains:

  • pip install proteus-python (equivalently proteus-python[host])
      • target: Host CPU
      • required compiler/toolchain: LLVM/Clang 22.x
  • pip install proteus-python[cuda12]
      • target: Host CPU + NVIDIA CUDA GPU
      • required compiler/toolchain: LLVM/Clang 22.x, or the NVIDIA toolchain when using NVCC for host/device compilation

CUDA 12 Backend

  • is a superset backend: host functionality remains available
  • does not vendor libcuda.so, libcudart, or the CUDA Toolkit
  • requires an installed NVIDIA driver for CUDA functionality
  • requires a matching CUDA 12 toolkit root for runtime compilation

CUDA toolkit resolution remains:

  1. PROTEUS_CUDA_HOME
  2. CUDA_HOME
  3. CUDA_PATH
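
The equivalent resolution logic as a sketch (the real lookup happens inside the native runtime; this Python version only mirrors the documented precedence):

import os

def resolve_cuda_home():
    # First match wins, mirroring the precedence listed above.
    for var in ("PROTEUS_CUDA_HOME", "CUDA_HOME", "CUDA_PATH"):
        path = os.environ.get(var)
        if path:
            return path
    return None  # no toolkit root configured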

For wheel-targeted CUDA builds on Linux, Proteus now resolves libcuda.so.1 with dlopen() at runtime instead of linking the wheel directly against the CUDA driver shared library. This keeps the wheel compatible with manylinux repair and lets host-only functionality work on machines without an NVIDIA driver.
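
The effect is observable from Python with ctypes, which wraps the same dlopen() mechanism (illustrative only; the actual lookup lives in the native backend):

import ctypes

try:
    libcuda = ctypes.CDLL("libcuda.so.1")  # provided by the NVIDIA driver, not the wheel
except OSError:
    libcuda = None  # no driver present: host-only functionality still works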

CI Workflow

The wheel workflow is defined in .github/workflows/ci-wheels.yml.

It produces three artifact groups:

  • shim wheel
  • host backend wheels
  • CUDA backend wheels

Current target matrix:

  • shim: pure Python
  • host backend:
      • macOS arm64
      • manylinux_2_28 x86_64
  • CUDA backend:
      • manylinux_2_28 x86_64

The workflow builds all backend wheels with cibuildwheel, then performs explicit installed-wheel validation using the newly built shim and backend wheels from the local artifact directories.